# TV Shows to Watch 

This notebook shows how to:

1) Scrape a website (Rotten Tomatoes' ranking of Netflix series)

2) Write the scraped information to an Excel file

3) Use conditionals to produce a list of show suggestions


### Imports: 

In [106]:
# For web scraping
import bs4
from bs4 import BeautifulSoup

# Fetching the web page
import requests

# Dataframe 
import pandas as pd
import numpy as np

# Excel 
import openpyxl

### Excel Workbook Creation:

In [107]:
# Creates an Excel workbook to hold the scraped data
excel = openpyxl.Workbook()

# Assigns the workbook's active worksheet to a new variable
sheet = excel.active

# Sets the sheet title
sheet.title = 'TV Show Rankings Rotten Tomato'

# Creates column headers 
sheet.append(["Show Name","Rating", "Release Year"])
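
As a quick sanity check on the workbook setup, the sketch below builds a similar workbook, saves it to a temporary file, and reads the rows back. The file name `demo.xlsx`, the sheet title `Demo`, and the sample row are placeholders, not part of the notebook.

```python
import os
import tempfile

import openpyxl

# Build a small workbook the same way as above (sheet title and row are made up)
workbook = openpyxl.Workbook()
sheet_demo = workbook.active
sheet_demo.title = "Demo"
sheet_demo.append(["Show Name", "Rating", "Release Year"])
sheet_demo.append(["Example Show", 75, 2020])

# Save to a temporary folder so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.xlsx")
    workbook.save(path)

    # Reload the file and read the rows back as tuples of cell values
    reloaded = openpyxl.load_workbook(path)
    rows = list(reloaded["Demo"].iter_rows(values_only=True))
    reloaded.close()

print(rows)
```

Each call to `append` adds one row below the last used row, so the header ends up in row 1 and the data follows it.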


### Webscraping:
    - Fetching the web page 
    - Parsing the page's HTML so it can be searched 

In [108]:
# Accesses URL (Rotten Tomatoes)
url = requests.get('https://editorial.rottentomatoes.com/guide/best-netflix-shows-and-movies-to-binge-watch-now/')

# Extracting the HTML source code using BeautifulSoup
    # html.parser is the parser built into Python
    # url.content (raw bytes) would also work; url.text (decoded string) is easier to read
soup = BeautifulSoup(url.text,'html.parser')


#print(soup)
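
The request above assumes the page loads successfully. A minimal sketch of a more defensive fetch follows; the helper name `fetch_soup` and the 10-second timeout are assumptions for illustration, not part of the notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(page_url, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    response = requests.get(page_url, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


# A URL without a scheme fails before any network traffic is sent
try:
    fetch_soup("not-a-url")
    failed = False
except requests.exceptions.RequestException as error:
    failed = True
    print("Request failed:", error)
```

Catching `requests.exceptions.RequestException` covers malformed URLs, timeouts, connection errors, and the `HTTPError` raised by `raise_for_status`.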



### Webscraping: 
    - Breaking down the HTML 

In [109]:
# Go to the website
    # Open the page's HTML in the browser with command + alt (fn + option) + i

    # All data to be accessed is in a "tag": <Tag Name>
    # Tags can have ids, classes, etc.

# find() returns the first element that matches, here a "div" with class "articleContentBody"
# Create a variable: shows 

shows = soup.find('div', class_="articleContentBody")
print(shows)


<div class="articleContentBody">
<img alt="Hellbound" class="wp-image-210047 size-full" height="250" loading="lazy" sizes="(max-width: 700px) 100vw, 700px" src="https://editorialadmin.rottentomatoes.com/wp-content/uploads/2021/11/Hellbound_unit_101_700x250.jpg" srcset="https://prd-rteditorial.s3.us-west-2.amazonaws.com/wp-content/uploads/2021/11/01111801/Hellbound_unit_101_700x250.jpg 700w, https://prd-rteditorial.s3.us-west-2.amazonaws.com/wp-content/uploads/2021/11/01111801/Hellbound_unit_101_700x250-300x107.jpg 300w" width="700"><p class="media-credit image-text" style="display: block;">(Photo by Netflix)</p>
<h1>The 213 Best Netflix Series to Watch Right Now</h1>
<p><strong>Updated: December 1, 2021</strong></p>
<p>Looking for the best shows on Netflix? Look no further, because Rotten Tomatoes has put together a list of the best original Netflix series available to watch right now, ranked according to the Tomatometer.</p>
<p><em>Arcane: League of Legends</em>, based on the Riot Gam

### Webscraping: 
    - Break the HTML down further 
    - Narrow the "shows" variable to the tags that hold each show 
    - Double-check that the web scraper is pulling the correct info

In [110]:
# Extracting information on the shows listed on the site 
    # Gives us info such as Show name, rating, year the show started 
    
shows = soup.find('div', class_="articleContentBody").find_all('h2')
print(shows)


[<h2><a href="//www.rottentomatoes.com/tv/on_the_verge">On the Verge</a> <span class="subtle start-year">(2021)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">60%</span></h2>, <h2><a href="//www.rottentomatoes.com/tv/anne">Anne With an E</a> <span class="subtle start-year">(2017)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">60%</span></h2>, <h2><a href="//www.rottentomatoes.com/tv/derek">Derek</a> <span class="subtle start-year">(2013)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">60%</span></h2>, <h2><a href="//www.rottentomatoes.com/tv/ratched">Ratched</a> <span class="subtle start-year">(2020)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">61%</span></h2>, <h2><a href="//www.rottentomatoes.com/tv/behind_her_eyes">Behind Her Eyes</a> <span class="subtle start-year">(2021)</span> <span class="icon tiny fresh" title="Fresh"></span> <

In [111]:
# Double-check that the result is correct by counting the h2 tags
# Use print(len(shows))
# This number should be 213, matching the page title

shows = soup.find('div', class_="articleContentBody").find_all('h2')
print(len(shows))


213
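
To make the difference between `find` and `find_all` concrete, here is a small sketch on a made-up HTML snippet. The markup mimics the page structure shown above, but the show names and scores are invented.

```python
from bs4 import BeautifulSoup

# Invented markup that mimics the structure of the scraped page
sample_html = """
<div class="articleContentBody">
  <h2><a href="#">Show A</a> <span class="subtle start-year">(2019)</span>
      <span class="tMeterScore">71%</span></h2>
  <h2><a href="#">Show B</a> <span class="subtle start-year">(2021)</span>
      <span class="tMeterScore">88%</span></h2>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
body = sample_soup.find("div", class_="articleContentBody")

first_heading = body.find("h2")        # first matching tag only
all_headings = body.find_all("h2")     # list-like ResultSet of every match
missing = body.find("table")           # None when nothing matches

print(first_heading.a.text, len(all_headings), missing)
```

Because `find` returns `None` when nothing matches, chaining `.find_all` directly onto it raises an `AttributeError` if the page layout changes.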


### Webscraping: 
    - Loops through the list of h2 tags, one per show 

In [112]:
# This loop goes through every h2 tag found in "articleContentBody"
# and takes out the show name, rating, and year started
# This information is put into the Excel sheet that was created previously


for show in shows:
    Name = show.find("a").text
    Rating = show.find('span', class_="tMeterScore").text[:-1]
    Year_Started = show.find("span").text[1:-1]
    print(Name, Rating, Year_Started)
    sheet.append([Name, Rating, Year_Started])

excel.save('Rotten Tomatoe Ratings.xlsx')

On the Verge 60 2021
Anne With an E 60 2017
Derek 60 2013
Ratched 61 2020
Behind Her Eyes 62 2021
Bloodline 62 2015
Emily in Paris 63 2020
White Lines 64 2020
Marvel's The Punisher 64 2017
The Duchess 65 2020
Lilyhammer 65 2012
Halston 65 2021
Disenchantment 65 2018
Marco Polo 66 2014
Grand Army 67 2020
Killer Inside: The Mind of Aaron Hernandez 67 2020
Cursed 67 2020
Fear City: New York vs. the Mafia 68 2020
Self Made: Inspired by the Life of Madam C.J. Walker 68 2020
Love Is Blind 68 2020
Ginny & Georgia 68 2021
Warrior Nun 68 2020
The Eddy 68 2020
The Witcher 68 2019
Pacific Rim: The Black 69 2021
The Liberator 69 2020
Dark Tourist 70 2018
History of Swear Words 70 2021
Marcella 70 2016
The Serpent 70 2021
Daybreak 70 2019
Bonding 71 2019
Requiem 71 2018
Troy: Fall of a City 71 2018
Safe 71 2018
Wanderlust 71 2018
Dracula 71 2020
Designated Survivor 71 2016
Japan Sinks: 2020 72 2020
Everything Sucks! 72 2018
Locke & Key 73 2020
Night Stalker: The Hunt for a Serial Killer 73 2021
Wat
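
The loop above assumes every `<h2>` contains a link, a year, and a score. A hedged sketch of a more defensive version follows. `parse_show` is a hypothetical helper and the sample markup is invented. It also stores the rating and year as integers rather than text.

```python
from bs4 import BeautifulSoup


def parse_show(heading):
    """Return (name, rating, year) from one <h2>, or None if a part is missing."""
    link = heading.find("a")
    score = heading.find("span", class_="tMeterScore")
    year = heading.find("span", class_="start-year")
    if link is None or score is None or year is None:
        return None
    # "71%" -> 71 and "(2019)" -> 2019
    rating_value = int(score.text.strip().rstrip("%"))
    year_value = int(year.text.strip().strip("()"))
    return link.text.strip(), rating_value, year_value


sample = BeautifulSoup(
    '<h2><a href="#">Show A</a> <span class="subtle start-year">(2019)</span> '
    '<span class="tMeterScore">71%</span></h2><h2>No data here</h2>',
    "html.parser",
)
parsed = [parse_show(h2) for h2 in sample.find_all("h2")]
print(parsed)
```

Looking up the year by its `start-year` class, rather than taking the first `span`, keeps the extraction correct even if the order of the spans changes.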

### Extracting from Excel 

In [113]:
# Reassigns the variable excel to the saved workbook, opened with pandas 
# The path points to the folder the Excel file was saved in 

excel = pd.ExcelFile('/Users/kaitlinneville/Desktop/CSC593/Rotten Tomatoe Ratings.xlsx')

# Specifically tells python to read the show rankings sheet
df = pd.read_excel(excel, "TV Show Rankings Rotten Tomato")

# Display the first ten rows 
df.head(10)

Unnamed: 0,Show Name,Rating,Release Year
0,On the Verge,60.0,2021
1,Anne With an E,60.0,2017
2,Derek,60.0,2013
3,Ratched,61.0,2020
4,Behind Her Eyes,62.0,2021
5,Bloodline,62.0,2015
6,Emily in Paris,63.0,2020
7,White Lines,64.0,2020
8,Marvel's The Punisher,64.0,2017
9,The Duchess,65.0,2020


In [114]:
df.dtypes
# Rating is already float64; Release Year is int64 and is converted to float below so both numeric columns share one type

Show Name        object
Rating          float64
Release Year      int64
dtype: object

In [115]:
df['Release Year'] = df['Release Year'].astype(float)

In [116]:
df.dtypes
# Confirms that Release Year was changed to float 

Show Name        object
Rating          float64
Release Year    float64
dtype: object
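
If scraped values are written to Excel as text, the columns may load as `object` rather than as numeric types. One way to handle that, sketched here on an invented DataFrame, is `pd.to_numeric`. The column values are placeholders.

```python
import pandas as pd

# Invented rows; the numbers are stored as strings on purpose
demo = pd.DataFrame({
    "Show Name": ["Show A", "Show B", "Show C"],
    "Rating": ["71", "88", "n/a"],
    "Release Year": ["2019", "2021", "2017"],
})

# errors="coerce" turns values that cannot be parsed into NaN
demo["Rating"] = pd.to_numeric(demo["Rating"], errors="coerce")
demo["Release Year"] = pd.to_numeric(demo["Release Year"]).astype(float)

print(demo.dtypes)
```

A column that contains `NaN` is stored as `float64`, which is one reason a rating column can show up as a float even when every score is a whole number.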

### User Input: 
    - Rating 
    - Release_Year 

In [120]:
# Inputs 
Rating = 70
Release_Year = 2000


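# Filter for choosing TV shows to watch
# Keeps the shows rated above Rating and released after Release_Year
# Re-reading the file resets Release Year to int; the comparison works either way
df = pd.read_excel('Rotten Tomatoe Ratings.xlsx')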
df.loc[(df['Rating']> Rating ) & (df['Release Year'] > Release_Year)]


Unnamed: 0,Show Name,Rating,Release Year
31,Bonding,71.0,2019
32,Requiem,71.0,2018
33,Troy: Fall of a City,71.0,2018
34,Safe,71.0,2018
35,Wanderlust,71.0,2018
...,...,...,...
208,Ugly Delicious,100.0,2018
209,Dash & Lily,100.0,2020
210,Mystery Science Theater 3000: The Return,100.0,2017
211,Feel Good,100.0,2020
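
The filter can also be wrapped in a small function so that different thresholds are easy to try. This is a sketch, not the notebook's own code: `suggest_shows` and the sample rows are made up for illustration.

```python
import pandas as pd


def suggest_shows(frame, min_rating, min_year):
    """Return rows rated above min_rating and released after min_year."""
    mask = (frame["Rating"] > min_rating) & (frame["Release Year"] > min_year)
    # Highest-rated suggestions first
    return frame.loc[mask].sort_values("Rating", ascending=False)


demo_shows = pd.DataFrame({
    "Show Name": ["Show A", "Show B", "Show C"],
    "Rating": [71, 88, 65],
    "Release Year": [2019, 1999, 2021],
})
suggestions = suggest_shows(demo_shows, min_rating=70, min_year=2000)
print(suggestions)
```

Both conditions use strict `>` to match the filter above, so a show rated exactly 70 would not be suggested when `min_rating=70`.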


# Thank you! 
# The End 