# Top 250 IMDB Rated TV Shows and their episodes

In this notebook, we collect data by scraping the IMDB website to get the top 250 TV shows. We then use the OMDB API to get the rating, release date and title of each episode in every season.

In [1]:
# Dependencies
from bs4 import BeautifulSoup
import pandas as pd
import requests
import json
from config import api_key

In [2]:
#Get the top 250 TV Shows
url = 'https://www.imdb.com/chart/toptv/'

# Retrieve page with the requests module
response = requests.get(url)

show_name = []
show_rating = []

#Create the soup object to extract the data
soup = BeautifulSoup(response.text, 'html.parser')
tv_shows = soup.find('tbody', class_='lister-list').find_all('tr')
        
for show in tv_shows:
    my_text = show.find('td', class_='titleColumn')
    show_name.append(my_text.a.text)
    show_rating.append(show.find('td', class_='ratingColumn').text.strip())

tvshow_dt = pd.DataFrame({"TV Show": show_name, "TV Show Rating":show_rating})
tvshow_dt.head()

|   | TV Show | TV Show Rating |
|---|---|---|
| 0 | Planet Earth II | 9.5 |
| 1 | Band of Brothers | 9.4 |
| 2 | Game of Thrones | 9.4 |
| 3 | Planet Earth | 9.4 |
| 4 | Breaking Bad | 9.4 |


## Get all episodes

For each show, we request one season at a time from the OMDB API, which returns JSON, and extract each episode's title, release date and rating.

In [3]:
# Save config information.
url = "http://www.omdbapi.com/?"

# Build partial query URL
query_url = f"{url}apikey={api_key}&t="
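The partial URL above appends the show title as-is, so titles containing characters such as `?` or `&` may not reach the API intact. A minimal sketch of one way to encode the parameters with the standard library follows. The helper name `build_query_url` and the key `"demo_key"` are hypothetical and not part of the original notebook; the parameter names `apikey`, `t` and `season` are the ones the notebook already uses.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

OMDB_URL = "http://www.omdbapi.com/"  # same endpoint as the notebook


def build_query_url(title, season, api_key):
    """Hypothetical helper: build an OMDB season query with encoded parameters."""
    params = {"apikey": api_key, "t": title, "season": season}
    # urlencode percent-encodes reserved characters and joins the pairs with "&"
    return f"{OMDB_URL}?{urlencode(params)}"


# "demo_key" is a placeholder, not a real API key
url = build_query_url("Whose Line Is It Anyway?", 1, "demo_key")
print(url)

# Decoding the query string recovers the original title unchanged
print(parse_qs(urlsplit(url).query)["t"][0])
```

Passing a `params` dictionary to `requests.get` gives the same encoding without building the string by hand.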

In [4]:
# Return a DataFrame with one row per episode of the given TV show
def get_episodes(show):
    total_seasons = 1  # placeholder, updated from the first API response
    curr_season = 1
    show_title = []
    show_totseason = []
    season = []
    episode_number = []
    episode_title = []
    release_date = []
    episode_rating = []
    try:
        while curr_season <= total_seasons:
            # One request per season; the response is parsed from JSON
            response = requests.get(f"{query_url}{show}&season={curr_season}").json()
            total_seasons = int(response["totalSeasons"])
            episodes = response["Episodes"]
            for episode in episodes:
                show_title.append(response["Title"])
                show_totseason.append(total_seasons)
                season.append(curr_season)
                episode_number.append(int(episode["Episode"]))
                episode_title.append(episode["Title"])
                release_date.append(episode["Released"])
                episode_rating.append(episode["imdbRating"])
            curr_season += 1
    except (KeyError, ValueError, requests.RequestException):
        # Raised when the show is not found (missing keys), a value cannot be
        # converted, or the request fails. Episodes collected so far are kept.
        print(f"There is no season found for {show}")

    show_df = pd.DataFrame({"TV Show": show_title,
                            "Number of Seasons": show_totseason,
                            "Season": season,
                            "Episode": episode_number,
                            "Title": episode_title,
                            "Released": release_date,
                            "Episode Rating": episode_rating})
    return show_df
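The parsing step inside `get_episodes` can be tried without calling the API. The sketch below runs on a hand-written dictionary that mimics the keys the function reads (`Title`, `totalSeasons`, `Episodes`, `Episode`, `Released`, `imdbRating`). The sample values are invented, the helper `parse_season` is hypothetical, and the conversion of a `"N/A"` rating to `None` assumes the API uses that placeholder for missing ratings.

```python
def parse_season(response, season):
    """Hypothetical helper: turn one season response into a list of row dicts."""
    rows = []
    for episode in response.get("Episodes", []):
        rating = episode["imdbRating"]
        rows.append({
            "TV Show": response["Title"],
            "Season": season,
            "Episode": int(episode["Episode"]),
            "Title": episode["Title"],
            "Released": episode["Released"],
            # Assumption: a missing rating is reported as the string "N/A"
            "Episode Rating": float(rating) if rating != "N/A" else None,
        })
    return rows


# Invented sample shaped like the fields the notebook reads, not real API output
sample = {
    "Title": "Example Show",
    "totalSeasons": "2",
    "Episodes": [
        {"Title": "Pilot", "Released": "2001-01-01", "Episode": "1", "imdbRating": "8.1"},
        {"Title": "Second", "Released": "2001-01-08", "Episode": "2", "imdbRating": "N/A"},
    ],
}

rows = parse_season(sample, 1)
for row in rows:
    print(row)
```

Building a list of row dictionaries keeps each episode's fields together, and `pd.DataFrame(rows)` turns the list into a table with the same kind of columns.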

In [5]:
tvepisode_dt = pd.concat(map(get_episodes, tvshow_dt["TV Show"]))

There is no season found for Apocalypse: La 2ème guerre mondiale
There is no season found for House M.D.
There is no season found for Dragon Ball Z
There is no season found for Dragon Ball Z: Doragon bôru zetto
There is no season found for Kôdo giasu: Hangyaku no rurûshu
There is no season found for Long Way Round
There is no season found for Gomorra: La serie
There is no season found for Leyla ile Mecnun
There is no season found for Shin Seiki Evangerion
There is no season found for Sseulsseulhago Chalranhashin: Dokkaebi
There is no season found for Dr. Horrible's Sing-Along Blog
There is no season found for Whose Line Is It Anyway?
There is no season found for Boku dake ga inai machi
There is no season found for Kôkaku kidôtai: Stand Alone Complex
There is no season found for I'm Alan Partridge
There is no season found for Enaaya
There is no season found for Avrupa Yakasi
There is no season found for Naruto: Shippûden


In [6]:
tvepisode_dt.head()

|   | TV Show | Number of Seasons | Season | Episode | Title | Released | Episode Rating |
|---|---|---|---|---|---|---|---|
| 0 | Planet Earth II | 1.0 | 1.0 | 1.0 | Islands | 2017-02-18 | 8.1 |
| 1 | Planet Earth II | 1.0 | 1.0 | 2.0 | Mountains | 2017-02-25 | 7.7 |
| 2 | Planet Earth II | 1.0 | 1.0 | 3.0 | Jungles | 2016-11-20 | 8.0 |
| 3 | Planet Earth II | 1.0 | 1.0 | 4.0 | Deserts | 2016-11-27 | 7.6 |
| 4 | Planet Earth II | 1.0 | 1.0 | 5.0 | Grasslands | 2017-03-18 | 6.8 |


### Merge with the Overall TV Show Rating

In [7]:
tv_show_data = pd.merge(tvshow_dt, tvepisode_dt, how="inner", on = "TV Show")
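Because the merge uses `how="inner"`, any show for which no episodes were retrieved drops out of the result. The sketch below illustrates this on two toy frames and uses a left merge with `indicator=True` to list the shows that have no episode rows. The frame contents are invented, and the names `shows`, `episodes`, `checked` and `missing` are illustrative only.

```python
import pandas as pd

# Toy stand-ins for tvshow_dt and tvepisode_dt (invented values)
shows = pd.DataFrame({"TV Show": ["Show A", "Show B"],
                      "TV Show Rating": ["9.0", "8.5"]})
episodes = pd.DataFrame({"TV Show": ["Show A", "Show A"],
                         "Episode": [1, 2],
                         "Episode Rating": ["8.0", "8.2"]})

# Inner merge: only shows present in both frames are kept
inner = pd.merge(shows, episodes, how="inner", on="TV Show")
print(inner["TV Show"].unique().tolist())

# Left merge with an indicator column marks shows without any episode rows
checked = pd.merge(shows, episodes, how="left", on="TV Show", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "TV Show"].tolist()
print(missing)
```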

In [8]:
# Save to an Excel file (.xlsx, since recent pandas versions no longer write the legacy .xls format)
tv_show_data.to_excel("Top_TV_shows.xlsx", sheet_name="Top TV Shows")