# Data Visualization - Episode Ratings for "Game of Thrones"

https://datascientyst.com/dataisbeautiful-the-absolute-quality-of-breaking-bad/

## Step 1: Go to IMDb and choose a TV series

### Get the identifier: the digits that follow `tt` in the URL. For example, from https://www.imdb.com/title/tt0944947/ take 0944947
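
The identifier can also be pulled out of the URL programmatically. The snippet below is a minimal sketch; the helper name `imdb_id_from_url` is hypothetical and not part of any library, and it assumes the URL contains a `/title/tt<digits>` segment.

```python
import re

def imdb_id_from_url(url: str) -> str:
    """Return the digits that follow 'tt' in an IMDb title URL."""
    match = re.search(r"/title/tt(\d+)", url)
    if match is None:
        raise ValueError(f"no IMDb title ID found in {url!r}")
    # keep the ID as a string so leading zeros are preserved
    return match.group(1)

series_id = imdb_id_from_url("https://www.imdb.com/title/tt0944947/")
print(series_id)  # 0944947
```

The ID is kept as a string on purpose: converting it to an integer would drop the leading zero.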

## Step 2: Scrape IMDb movie data

### Install cinemagoer

In [1]:
pip install cinemagoer

Note: you may need to restart the kernel to use updated packages.


### Get IMDb movie info

In [2]:
from imdb import Cinemagoer

# create an instance of the Cinemagoer class
cg = Cinemagoer()

In [3]:
# get a movie and print its director(s)
avatar = cg.get_movie('0499549')
for director in avatar['directors']:
    print(director['name'])

James Cameron


### Search for movies by title

In [4]:
# movies = cg.search_movie('Game of Thrones')
# movies

### Extract IMDb series

In [5]:
# get information for "Game of Thrones"
series = cg.get_movie('0944947')
series

<Movie id:0944947[http] title:_"Game of Thrones" (2011)_>

In [6]:
# series['episodes']

In [7]:
cg.update(series, 'episodes')
seasons_info = series['episodes']
seasons_info

{1: {1: <Movie id:1480055[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" Winter Is Coming (2011)_>,
  2: <Movie id:1668746[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" The Kingsroad (2011)_>,
  3: <Movie id:1829962[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" Lord Snow (2011)_>,
  4: <Movie id:1829963[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" Cripples, Bastards, and Broken Things (2011)_>,
  5: <Movie id:1829964[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" The Wolf and the Lion (2011)_>,
  6: <Movie id:1837862[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" A Golden Crown (2011)_>,
  7: <Movie id:1837863[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" You Win or You Die (2011)_>,
  8: <Movie id:1837864[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" The Pointy End (2011)_>,
  9: <Movie id:1851398[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" Baelor (2011)_>,
  1

In [8]:
seasons = seasons_info.keys()
seasons

dict_keys([1, 2, 3, 4, 5, 6, 7, 8])

In [9]:
seasons_info[1][3].data

{'title': 'Lord Snow',
 'kind': 'episode',
 'episode of': <Movie id:0944947[http] title:_"Game of Thrones (TV Series 2011–2019) - IMDb" (None)_>,
 'season': 1,
 'episode': 3,
 'rating': 8.501234567891,
 'votes': 35708,
 'original air date': '1 May 2011',
 'year': '2011',
 'plot': "\nJon begins his training with the Night's Watch; Ned confronts his past and future at King's Landing; Daenerys finds herself at odds with Viserys.    "}

## Step 3: Create DataFrame from episodes data

### Collect episode data

In [10]:
import pandas as pd

In [11]:
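ep_data = []

for season in seasons_info.values():
    for episode in season.values():
        # copy the dict so the Movie object's own data is left untouched
        data = dict(episode.data)
        # drop the back-reference to the series (a Movie object, not a scalar)
        data.pop('episode of', None)
        df_temp = pd.DataFrame.from_records([data])
        ep_data.append(df_temp)

# stack the one-row frames; each row keeps its original index 0
df = pd.concat(ep_data)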
df

| | title | kind | season | episode | rating | votes | original air date | year | plot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Winter Is Coming | episode | 1 | 1 | 8.901235 | 49865 | 17 Apr. 2011 | 2011 | \nEddard Stark is torn between his family and ... |
| 0 | The Kingsroad | episode | 1 | 2 | 8.601235 | 37728 | 24 Apr. 2011 | 2011 | \nWhile Bran recovers from his fall, Ned takes... |
| 0 | Lord Snow | episode | 1 | 3 | 8.501235 | 35708 | 1 May 2011 | 2011 | \nJon begins his training with the Night's Wat... |
| 0 | Cripples, Bastards, and Broken Things | episode | 1 | 4 | 8.601235 | 33955 | 8 May 2011 | 2011 | \nEddard investigates Jon Arryn's murder. Jon ... |
| 0 | The Wolf and the Lion | episode | 1 | 5 | 9.001235 | 35295 | 15 May 2011 | 2011 | \nCatelyn has captured Tyrion and plans to bri... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0 | A Knight of the Seven Kingdoms | episode | 8 | 2 | 7.901235 | 131945 | 21 Apr. 2019 | 2019 | \nJaime faces judgment and Winterfell prepares... |
| 0 | The Long Night | episode | 8 | 3 | 7.501235 | 217621 | 28 Apr. 2019 | 2019 | \nThe Night King and his army have arrived at ... |
| 0 | The Last of the Starks | episode | 8 | 4 | 5.501235 | 166203 | 5 May 2019 | 2019 | \nThe Battle of Winterfell is over and a new c... |
| 0 | The Bells | episode | 8 | 5 | 6.001235 | 193828 | 12 May 2019 | 2019 | \nForces have arrived at King's Landing for th... |
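
The repeated index `0` comes from concatenating many one-row frames. As an alternative, the rows can be collected as plain dictionaries and turned into a single DataFrame, which gives a running index. This is only a sketch: the nested `seasons_info` dictionary below is a hypothetical stand-in for `series['episodes']`, with plain dicts in place of the `.data` attribute of each episode, and with most fields left out.

```python
import pandas as pd

# Hypothetical stand-in for series['episodes']: {season: {episode: data dict}}.
# With Cinemagoer each inner value would be a Movie object and the dict
# would come from its .data attribute.
seasons_info = {
    1: {
        1: {"title": "Winter Is Coming", "season": 1, "episode": 1, "episode of": "series"},
        2: {"title": "The Kingsroad", "season": 1, "episode": 2, "episode of": "series"},
    },
    2: {
        1: {"title": "The North Remembers", "season": 2, "episode": 1, "episode of": "series"},
    },
}

# One dict per episode, without the 'episode of' back-reference.
rows = [
    {key: value for key, value in data.items() if key != "episode of"}
    for season in seasons_info.values()
    for data in season.values()
]

# A single constructor call yields the default index 0, 1, 2, ...
df = pd.DataFrame(rows)
print(df)
```

Building one frame from a list of dicts also avoids creating a temporary DataFrame per episode.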


## Step 4: Transform DataFrame

In [12]:
rating_df = pd.pivot_table(df, index='episode', columns='season', values='rating')
rating_df

| episode / season | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.901235 | 8.601235 | 8.601235 | 9.001235 | 8.301235 | 8.401235 | 8.501235 | 7.601235 |
| 2 | 8.601235 | 8.401235 | 8.501235 | 9.701235 | 8.301235 | 9.301235 | 8.801235 | 7.901235 |
| 3 | 8.501235 | 8.701235 | 8.701235 | 8.701235 | 8.401235 | 8.601235 | 9.101235 | 7.501235 |
| 4 | 8.601235 | 8.601235 | 9.501235 | 8.701235 | 8.501235 | 9.001235 | 9.701235 | 5.501235 |
| 5 | 9.001235 | 8.601235 | 8.901235 | 8.601235 | 8.501235 | 9.701235 | 8.701235 | 6.001235 |
| 6 | 9.101235 | 8.901235 | 8.701235 | 9.701235 | 7.901235 | 8.301235 | 9.001235 | 4.001235 |
| 7 | 9.101235 | 8.801235 | 8.601235 | 9.001235 | 8.801235 | 8.501235 | 9.401235 | |
| 8 | 8.901235 | 8.601235 | 8.901235 | 9.701235 | 9.801235 | 8.301235 | | |
| 9 | 9.601235 | 9.601235 | 9.901235 | 9.601235 | 9.401235 | 9.901235 | | |
| 10 | 9.401235 | 9.301235 | 9.001235 | 9.601235 | 9.101235 | 9.901235 | | |
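
To see what the pivot does, here is a small self-contained sketch on toy data. The ratings in it are made-up placeholders, not IMDb values, and the names `long_df` and `wide_df` are only illustrative. Each (episode, season) pair becomes one cell, and pairs that do not exist in the long table come out as `NaN`, which is why the shorter seasons have empty cells.

```python
import pandas as pd

# Long format: one row per episode. Ratings are made-up placeholders.
long_df = pd.DataFrame({
    "season":  [1, 1, 1, 2, 2],
    "episode": [1, 2, 3, 1, 2],
    "rating":  [8.0, 7.5, 9.0, 6.5, 7.0],
})

# Wide format: episodes as rows, seasons as columns.
# pivot_table aggregates with the mean by default; with one rating per
# (episode, season) pair the mean is just that rating.
wide_df = pd.pivot_table(long_df, index="episode", columns="season", values="rating")
print(wide_df)
```

Season 2 has no third episode in the toy data, so `wide_df.loc[3, 2]` is `NaN`.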


In [13]:
rating_df.round(1)

| episode / season | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.6 | 9.0 | 8.3 | 8.4 | 8.5 | 7.6 |
| 2 | 8.6 | 8.4 | 8.5 | 9.7 | 8.3 | 9.3 | 8.8 | 7.9 |
| 3 | 8.5 | 8.7 | 8.7 | 8.7 | 8.4 | 8.6 | 9.1 | 7.5 |
| 4 | 8.6 | 8.6 | 9.5 | 8.7 | 8.5 | 9.0 | 9.7 | 5.5 |
| 5 | 9.0 | 8.6 | 8.9 | 8.6 | 8.5 | 9.7 | 8.7 | 6.0 |
| 6 | 9.1 | 8.9 | 8.7 | 9.7 | 7.9 | 8.3 | 9.0 | 4.0 |
| 7 | 9.1 | 8.8 | 8.6 | 9.0 | 8.8 | 8.5 | 9.4 | |
| 8 | 8.9 | 8.6 | 8.9 | 9.7 | 9.8 | 8.3 | | |
| 9 | 9.6 | 9.6 | 9.9 | 9.6 | 9.4 | 9.9 | | |
| 10 | 9.4 | 9.3 | 9.0 | 9.6 | 9.1 | 9.9 | | |


In [14]:
rating_df.round(1).style.background_gradient(cmap='GnBu')

| episode / season | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.6 | 9.0 | 8.3 | 8.4 | 8.5 | 7.6 |
| 2 | 8.6 | 8.4 | 8.5 | 9.7 | 8.3 | 9.3 | 8.8 | 7.9 |
| 3 | 8.5 | 8.7 | 8.7 | 8.7 | 8.4 | 8.6 | 9.1 | 7.5 |
| 4 | 8.6 | 8.6 | 9.5 | 8.7 | 8.5 | 9.0 | 9.7 | 5.5 |
| 5 | 9.0 | 8.6 | 8.9 | 8.6 | 8.5 | 9.7 | 8.7 | 6.0 |
| 6 | 9.1 | 8.9 | 8.7 | 9.7 | 7.9 | 8.3 | 9.0 | 4.0 |
| 7 | 9.1 | 8.8 | 8.6 | 9.0 | 8.8 | 8.5 | 9.4 | |
| 8 | 8.9 | 8.6 | 8.9 | 9.7 | 9.8 | 8.3 | | |
| 9 | 9.6 | 9.6 | 9.9 | 9.6 | 9.4 | 9.9 | | |
| 10 | 9.4 | 9.3 | 9.0 | 9.6 | 9.1 | 9.9 | | |


In [15]:
rating_df.round(1).fillna('')


| episode / season | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.6 | 9.0 | 8.3 | 8.4 | 8.5 | 7.6 |
| 2 | 8.6 | 8.4 | 8.5 | 9.7 | 8.3 | 9.3 | 8.8 | 7.9 |
| 3 | 8.5 | 8.7 | 8.7 | 8.7 | 8.4 | 8.6 | 9.1 | 7.5 |
| 4 | 8.6 | 8.6 | 9.5 | 8.7 | 8.5 | 9.0 | 9.7 | 5.5 |
| 5 | 9.0 | 8.6 | 8.9 | 8.6 | 8.5 | 9.7 | 8.7 | 6.0 |
| 6 | 9.1 | 8.9 | 8.7 | 9.7 | 7.9 | 8.3 | 9.0 | 4.0 |
| 7 | 9.1 | 8.8 | 8.6 | 9.0 | 8.8 | 8.5 | 9.4 | |
| 8 | 8.9 | 8.6 | 8.9 | 9.7 | 9.8 | 8.3 | | |
| 9 | 9.6 | 9.6 | 9.9 | 9.6 | 9.4 | 9.9 | | |
| 10 | 9.4 | 9.3 | 9.0 | 9.6 | 9.1 | 9.9 | | |


## Step 5: Styling for the table

In [16]:
rating_df.columns

Int64Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int64', name='season')

In [17]:
rating_df.index

Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64', name='episode')

In [18]:
rating_df.columns.name = None  # remove the 'season' axis label
rating_df.index.name = None    # remove the 'episode' axis label
rating_df

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.901235 | 8.601235 | 8.601235 | 9.001235 | 8.301235 | 8.401235 | 8.501235 | 7.601235 |
| 2 | 8.601235 | 8.401235 | 8.501235 | 9.701235 | 8.301235 | 9.301235 | 8.801235 | 7.901235 |
| 3 | 8.501235 | 8.701235 | 8.701235 | 8.701235 | 8.401235 | 8.601235 | 9.101235 | 7.501235 |
| 4 | 8.601235 | 8.601235 | 9.501235 | 8.701235 | 8.501235 | 9.001235 | 9.701235 | 5.501235 |
| 5 | 9.001235 | 8.601235 | 8.901235 | 8.601235 | 8.501235 | 9.701235 | 8.701235 | 6.001235 |
| 6 | 9.101235 | 8.901235 | 8.701235 | 9.701235 | 7.901235 | 8.301235 | 9.001235 | 4.001235 |
| 7 | 9.101235 | 8.801235 | 8.601235 | 9.001235 | 8.801235 | 8.501235 | 9.401235 | |
| 8 | 8.901235 | 8.601235 | 8.901235 | 9.701235 | 9.801235 | 8.301235 | | |
| 9 | 9.601235 | 9.601235 | 9.901235 | 9.601235 | 9.401235 | 9.901235 | | |
| 10 | 9.401235 | 9.301235 | 9.001235 | 9.601235 | 9.101235 | 9.901235 | | |


In [19]:
rating_df = rating_df.rename(columns=lambda x: 'S' + str(x))
rating_df = rating_df.rename(index=lambda x: 'Episode ' + str(x))
rating_df

| | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| Episode 1 | 8.901235 | 8.601235 | 8.601235 | 9.001235 | 8.301235 | 8.401235 | 8.501235 | 7.601235 |
| Episode 2 | 8.601235 | 8.401235 | 8.501235 | 9.701235 | 8.301235 | 9.301235 | 8.801235 | 7.901235 |
| Episode 3 | 8.501235 | 8.701235 | 8.701235 | 8.701235 | 8.401235 | 8.601235 | 9.101235 | 7.501235 |
| Episode 4 | 8.601235 | 8.601235 | 9.501235 | 8.701235 | 8.501235 | 9.001235 | 9.701235 | 5.501235 |
| Episode 5 | 9.001235 | 8.601235 | 8.901235 | 8.601235 | 8.501235 | 9.701235 | 8.701235 | 6.001235 |
| Episode 6 | 9.101235 | 8.901235 | 8.701235 | 9.701235 | 7.901235 | 8.301235 | 9.001235 | 4.001235 |
| Episode 7 | 9.101235 | 8.801235 | 8.601235 | 9.001235 | 8.801235 | 8.501235 | 9.401235 | |
| Episode 8 | 8.901235 | 8.601235 | 8.901235 | 9.701235 | 9.801235 | 8.301235 | | |
| Episode 9 | 9.601235 | 9.601235 | 9.901235 | 9.601235 | 9.401235 | 9.901235 | | |
| Episode 10 | 9.401235 | 9.301235 | 9.001235 | 9.601235 | 9.101235 | 9.901235 | | |


In [20]:
rating_df.style.background_gradient(cmap='gist_heat_r', axis=None).format(precision=1, na_rep='')

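| | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| Episode 1 | 8.9 | 8.6 | 8.6 | 9.0 | 8.3 | 8.4 | 8.5 | 7.6 |
| Episode 2 | 8.6 | 8.4 | 8.5 | 9.7 | 8.3 | 9.3 | 8.8 | 7.9 |
| Episode 3 | 8.5 | 8.7 | 8.7 | 8.7 | 8.4 | 8.6 | 9.1 | 7.5 |
| Episode 4 | 8.6 | 8.6 | 9.5 | 8.7 | 8.5 | 9.0 | 9.7 | 5.5 |
| Episode 5 | 9.0 | 8.6 | 8.9 | 8.6 | 8.5 | 9.7 | 8.7 | 6.0 |
| Episode 6 | 9.1 | 8.9 | 8.7 | 9.7 | 7.9 | 8.3 | 9.0 | 4.0 |
| Episode 7 | 9.1 | 8.8 | 8.6 | 9.0 | 8.8 | 8.5 | 9.4 | |
| Episode 8 | 8.9 | 8.6 | 8.9 | 9.7 | 9.8 | 8.3 | | |
| Episode 9 | 9.6 | 9.6 | 9.9 | 9.6 | 9.4 | 9.9 | | |
| Episode 10 | 9.4 | 9.3 | 9.0 | 9.6 | 9.1 | 9.9 | | |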


### Color palettes available in matplotlib

<img src="https://i.stack.imgur.com/cmk1J.png" alt="Overview of all colormaps">
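
The same list of names can be printed from code, which helps when choosing a value for the `cmap` argument used above. This is a small sketch that assumes matplotlib is installed; it checks for the two colormaps used in this notebook, `GnBu` and `gist_heat_r`.

```python
import matplotlib.pyplot as plt

# Names of all registered colormaps; reversed variants end in "_r".
names = plt.colormaps()
print(len(names), "colormaps available")

# The two colormaps used in this notebook.
for name in ("GnBu", "gist_heat_r"):
    print(name, name in names)
```

Any name from this list should be usable as the `cmap` argument of `background_gradient`.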