# <center>TV Episodes Data Analysis</center>

**The dataset contains the following variables:**

* `id` - int 
* `url` - string
* `name` - string
* `season` - int
* `number` - int
* `type` - string 
* `airdate` - date format 
* `airtime` - 12-hour time format
* `runtime` - float
* `average rating` - float
* `summary` - string without html tags
* `medium image link` - string
* `original image link` - string

**Insights to be drawn:**

* [Get all the overall ratings for each season and using plots compare the ratings for all the seasons, like season 1 ratings, season 2, and so on.](#rate)
* [Get all the episode names, whose average rating is more than 8 for every season](#8)
* [Get all the episode names that aired before May 2019](#2019)
* [Get the episode name from each season with the highest and lowest rating](#ses)
* [Get the summary for the most popular ( ratings ) episode in every season](#sum)

In [1]:
# Importing all the required libraries
import pandas as pd
from datetime import datetime
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

In [2]:
# Loading dataset
api_df = pd.read_excel("TV Episodes API Data.xlsx")
api_df.head()  # Top 5 rows

| | Id | Url | Name | Season | Number | Type | Airdate | Airtime | Airstamp | Runtime | Average rating | Medium image | Original image | Summary | Self _links | Show _links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 869671 | https://www.tvmaze.com/episodes/869671/westwor... | The Original | 1 | 1 | regular | 2016-10-02 | 09:00 PM | 2016-10-03T01:00:00+00:00 | 68 | 8.0 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | A woman named Dolores is a free spirit in the ... | {'href': 'https://api.tvmaze.com/episodes/8696... | {'href': 'https://api.tvmaze.com/shows/1371'} |
| 1 | 911201 | https://www.tvmaze.com/episodes/911201/westwor... | Chestnut | 1 | 2 | regular | 2016-10-09 | 09:00 PM | 2016-10-10T01:00:00+00:00 | 60 | 7.7 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | Bernard suspects that someone is sabotaging th... | {'href': 'https://api.tvmaze.com/episodes/9112... | {'href': 'https://api.tvmaze.com/shows/1371'} |
| 2 | 911204 | https://www.tvmaze.com/episodes/911204/westwor... | The Stray | 1 | 3 | regular | 2016-10-16 | 09:00 PM | 2016-10-17T01:00:00+00:00 | 60 | 7.6 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | Bernard continues to investigate Dolores' supp... | {'href': 'https://api.tvmaze.com/episodes/9112... | {'href': 'https://api.tvmaze.com/shows/1371'} |
| 3 | 911205 | https://www.tvmaze.com/episodes/911205/westwor... | Dissonance Theory | 1 | 4 | regular | 2016-10-23 | 09:00 PM | 2016-10-24T01:00:00+00:00 | 60 | 7.9 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | While Dolores joins William and Logan on their... | {'href': 'https://api.tvmaze.com/episodes/9112... | {'href': 'https://api.tvmaze.com/shows/1371'} |
| 4 | 927174 | https://www.tvmaze.com/episodes/927174/westwor... | Contrapasso | 1 | 5 | regular | 2016-10-30 | 09:00 PM | 2016-10-31T01:00:00+00:00 | 60 | 8.0 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | Dolores takes the first step on her path of di... | {'href': 'https://api.tvmaze.com/episodes/9271... | {'href': 'https://api.tvmaze.com/shows/1371'} |


In [3]:
print("Size of Dataset :")
api_df.shape

Size of Dataset :


(36, 16)

In [4]:
print("Info of our Dataset:")
api_df.info()

Info of our Dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Id              36 non-null     int64  
 1   Url             36 non-null     object 
 2   Name            36 non-null     object 
 3   Season          36 non-null     int64  
 4   Number          36 non-null     int64  
 5   Type            36 non-null     object 
 6   Airdate         36 non-null     object 
 7   Airtime         36 non-null     object 
 8   Airstamp        36 non-null     object 
 9   Runtime         36 non-null     int64  
 10  Average rating  36 non-null     float64
 11  Medium image    36 non-null     object 
 12  Original image  36 non-null     object 
 13  Summary         36 non-null     object 
 14  Self _links     36 non-null     object 
 15  Show _links     36 non-null     object 
dtypes: float64(1), int64(4), object(11)
memory usage: 4.6+ KB


In [5]:
# Check for missing values
api_df.isnull().sum()

Id                0
Url               0
Name              0
Season            0
Number            0
Type              0
Airdate           0
Airtime           0
Airstamp          0
Runtime           0
Average rating    0
Medium image      0
Original image    0
Summary           0
Self _links       0
Show _links       0
dtype: int64

In [9]:
# Check for duplicate rows
api_df.duplicated().sum()

0

In [10]:
# Drop the columns that are not required
drop_col = ["Id", "Self _links", "Show _links"]
api_df.drop(columns=drop_col, inplace=True)

In [11]:
# Statistical description of numerical columns
api_df.describe()

| | Season | Number | Runtime | Average rating |
|---|---|---|---|---|
| count | 36.0 | 36.0 | 36.0 | 36.0 |
| mean | 2.388889 | 5.055556 | 62.888889 | 7.872222 |
| std | 1.12828 | 2.714453 | 8.386308 | 0.373869 |
| min | 1.0 | 1.0 | 51.0 | 7.1 |
| 25% | 1.0 | 3.0 | 60.0 | 7.675 |
| 50% | 2.0 | 5.0 | 60.0 | 7.8 |
| 75% | 3.0 | 7.0 | 60.0 | 8.0 |
| max | 4.0 | 10.0 | 90.0 | 8.7 |


In [12]:
# Statistical description of categorical columns
api_df.describe(include=object)

| | Url | Name | Type | Airdate | Airtime | Airstamp | Medium image | Original image | Summary |
|---|---|---|---|---|---|---|---|---|---|
| count | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 |
| unique | 36 | 36 | 1 | 36 | 1 | 36 | 36 | 36 | 36 |
| top | https://www.tvmaze.com/episodes/869671/westwor... | The Original | regular | 2016-10-02 | 09:00 PM | 2016-10-03T01:00:00+00:00 | https://static.tvmaze.com/uploads/images/mediu... | https://static.tvmaze.com/uploads/images/origi... | A woman named Dolores is a free spirit in the ... |
| freq | 1 | 1 | 36 | 1 | 36 | 1 | 1 | 1 | 1 |


In [13]:
# value counts for specific columns
print("Season :")
api_df['Season'].value_counts()

Season :


1    10
2    10
3     8
4     8
Name: Season, dtype: int64

In [14]:
print("Number :")
api_df["Number"].value_counts()

Number :


1     4
2     4
3     4
4     4
5     4
6     4
7     4
8     4
9     2
10    2
Name: Number, dtype: int64

In [15]:
print("Runtime :")
api_df["Runtime"].value_counts()

Runtime :


60    24
55     3
90     2
70     2
68     1
74     1
71     1
75     1
51     1
Name: Runtime, dtype: int64

In [16]:
print("Average Rating :")
api_df["Average rating"].value_counts()

Average Rating :


7.7    7
8.0    6
7.9    4
7.8    4
7.5    4
7.6    3
8.5    2
8.7    2
8.6    1
8.4    1
7.1    1
7.4    1
Name: Average rating, dtype: int64

### Get all the overall ratings for each season and using plots compare the ratings for all the seasons, like season 1 ratings, season 2, and so on.<a name="rate"></a>

In [17]:
overall_rating = api_df.groupby(["Season"])["Average rating"].mean()
overall_rating

Season
1    8.0900
2    8.0000
3    7.7750
4    7.5375
Name: Average rating, dtype: float64
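The mean alone hides how widely ratings vary within a season. As a minimal sketch, the per-season mean can be reported together with the minimum, maximum, and episode count in a single `agg` call. The frame `toy_df` is a hypothetical stand-in that holds only the highest- and lowest-rated episode of each season as reported elsewhere in this notebook, so its means differ from the full-dataset means above.

```python
import pandas as pd

# Hypothetical toy frame: two episodes per season, with names and ratings
# taken from the highest/lowest rating output of this notebook.
toy_df = pd.DataFrame({
    "Season": [1, 1, 2, 2, 3, 3, 4, 4],
    "Name": ["The Bicameral Mind", "The Stray", "Kiksuya", "Akane No Mai",
             "Parce Domine", "Decoherence", "Zhuangzi", "The Auguries"],
    "Average rating": [8.7, 7.6, 8.7, 7.6, 8.0, 7.5, 7.8, 7.1],
})

# One row per season, one column per summary statistic
season_stats = toy_df.groupby("Season")["Average rating"].agg(
    ["mean", "min", "max", "count"]
)
print(season_stats.round(2))
```

On the real data, replacing `toy_df` with `api_df` should give the same means as `overall_rating`, plus the spread for each season.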

In [18]:
# Scatter plot of every episode's rating, colored by season
fig = px.scatter(data_frame=api_df,
                 x="Name",
                 y="Average rating",
                 color=api_df["Season"].astype(str),
                 )
fig.update_layout(title="Scatter plot of episode ratings across all seasons",
                  xaxis_title="Episode Name")
fig.show()

In [19]:
fig1 = px.histogram(data_frame=api_df,
                    x=api_df["Season"].astype(str),
                    y="Average rating",
                    histfunc="avg",
                    color=api_df["Season"].astype(str)
                    )
fig1.update_layout(
    title="Histogram for Average ratings of each season", xaxis_title="Season")
fig1.show()

### Get all the episode names, whose average rating is more than 8 for every season <a name="8"></a>

In [20]:
# Episodes with an average rating above 8
rating_more_than_8 = api_df["Name"].loc[api_df["Average rating"] > 8]
rating_more_than_8

6                 Trompe L'Oeil
8     The Well-Tempered Clavier
9            The Bicameral Mind
17                      Kiksuya
18              Vanishing Point
19                The Passenger
Name: Name, dtype: object
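The filter above returns one flat list, while the question asks for the episodes of every season. A minimal sketch of one way to split the result by season, including seasons with no qualifying episode, follows. `toy_df` is a hypothetical subset built from names and ratings that appear in this notebook; it is not the full dataset.

```python
import pandas as pd

# Hypothetical toy subset; names and ratings are copied from outputs in this notebook.
toy_df = pd.DataFrame({
    "Season": [1, 1, 2, 3, 4],
    "Name": ["The Original", "The Bicameral Mind", "Kiksuya",
             "Parce Domine", "Zhuangzi"],
    "Average rating": [8.0, 8.7, 8.7, 8.0, 7.8],
})

# For each season, keep the names whose rating is strictly greater than 8.
# Seasons without such an episode map to an empty list instead of disappearing.
names_by_season = {
    int(season): group.loc[group["Average rating"] > 8, "Name"].tolist()
    for season, group in toy_df.groupby("Season")
}

for season, names in names_by_season.items():
    print(f"Season {season}: {names}")
```

Iterating over the groups first and filtering inside each one keeps every season in the result, which filtering before grouping would not.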

In [21]:
# Plotting episode ratings, sorted from lowest to highest
sorting_ar = api_df.sort_values("Average rating")
fig2 = px.bar(data_frame=sorting_ar,
              x="Name",
              y="Average rating",
              color=sorting_ar["Season"].astype(str)
              )
fig2.update_layout(title="Episodes Average Rating")
fig2.show()

### Get all the episode names that aired before May 2019 <a name="2019"></a>

In [22]:
# Episode names aired before May 2019
episodes_air_before_may19 = api_df.loc[api_df["Airdate"]
                                       < "2019-05-01", ["Name", "Airdate"]]
episodes_air_before_may19

| | Name | Airdate |
|---|---|---|
| 0 | The Original | 2016-10-02 |
| 1 | Chestnut | 2016-10-09 |
| 2 | The Stray | 2016-10-16 |
| 3 | Dissonance Theory | 2016-10-23 |
| 4 | Contrapasso | 2016-10-30 |
| 5 | The Adversary | 2016-11-06 |
| 6 | Trompe L'Oeil | 2016-11-13 |
| 7 | Trace Decay | 2016-11-20 |
| 8 | The Well-Tempered Clavier | 2016-11-27 |
| 9 | The Bicameral Mind | 2016-12-04 |
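The comparison above works on the raw `Airdate` column, which `info()` reports as `object`. Comparing strings is only reliable when every value is a zero-padded ISO date, so a more defensive variant parses the column first and compares against a `Timestamp`. This is a minimal sketch that assumes the values are `YYYY-MM-DD` strings; `toy_df` and its third row are hypothetical and exist only to show a date past the cutoff.

```python
import pandas as pd

# Hypothetical toy frame: the first two rows come from the output above,
# the third is an invented placeholder dated after the cutoff.
toy_df = pd.DataFrame({
    "Name": ["The Original", "The Bicameral Mind", "hypothetical later episode"],
    "Airdate": ["2016-10-02", "2016-12-04", "2020-01-01"],
})

# Parse once; an explicit format raises on malformed values instead of guessing
airdate = pd.to_datetime(toy_df["Airdate"], format="%Y-%m-%d")
cutoff = pd.Timestamp("2019-05-01")

# Boolean mask on real datetimes, then select the columns of interest
before_may_2019 = toy_df.loc[airdate < cutoff, ["Name", "Airdate"]]
print(before_may_2019)
```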


In [23]:
# graph
color = api_df["Airdate"] < "2019-05-01"
fig3 = px.bar(data_frame=api_df, x="Name", y="Airdate",
              color=color, title="Episodes Name VS Airdate ")

fig3.show()

**`In the above bar plot, True represents episodes that aired before May 2019 and False represents episodes that aired in May 2019 or later.`**

### Get the episode name from each season with the highest and lowest rating <a name="ses"></a>

In [24]:
grp = api_df.groupby("Season")
# Episode with highest rating in each season
highest_rating = api_df.loc[grp['Average rating'].idxmax()]
# Episode with lowest rating in each season
lowest_rating = api_df.loc[grp['Average rating'].idxmin()]

In [25]:
print("Episode with highest rating in each season\n\n",
      highest_rating[["Season", "Name", "Average rating"]])

print("-----"*10)

print("Episode with lowest rating in each season\n\n",
      lowest_rating[["Season", "Name", "Average rating"]])

Episode with highest rating in each season

     Season                Name  Average rating
9        1  The Bicameral Mind             8.7
17       2             Kiksuya             8.7
20       3        Parce Domine             8.0
32       4            Zhuangzi             7.8
--------------------------------------------------
Episode with lowest rating in each season

     Season          Name  Average rating
2        1     The Stray             7.6
14       2  Akane No Mai             7.6
25       3   Decoherence             7.5
28       4  The Auguries             7.1
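`idxmax` and `idxmin` return only the first matching row per group, so an episode that ties for the top or bottom rating in its season is silently dropped. A minimal sketch of a tie-aware alternative compares each rating with its season's maximum via `transform`. `toy_df` is a hypothetical subset that uses three season 1 rows from the `head()` output, two of which share a rating of 8.0.

```python
import pandas as pd

# Hypothetical toy subset: three season 1 episodes from the head() output.
toy_df = pd.DataFrame({
    "Season": [1, 1, 1],
    "Name": ["The Original", "Chestnut", "Contrapasso"],
    "Average rating": [8.0, 7.7, 8.0],
})

ratings_by_season = toy_df.groupby("Season")["Average rating"]

# idxmax keeps a single row per season (the first one encountered)
first_only = toy_df.loc[ratings_by_season.idxmax(), "Name"].tolist()

# transform("max") broadcasts each season's maximum back to every row,
# so the equality test keeps all rows that share the top rating
is_top = toy_df["Average rating"] == ratings_by_season.transform("max")
top_with_ties = toy_df[is_top]

print(first_only)
print(top_with_ties)
```

Swapping `"max"` for `"min"` gives the tie-aware version of the lowest-rating query.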


In [26]:
fig4 = px.histogram(data_frame=api_df, x="Average rating",
                    y="Name", color=api_df["Season"].astype(str))
fig4.show()

### Get the summary for the most popular ( ratings ) episode in every season <a name="sum"></a>

In [29]:
for j in highest_rating[["Season", "Name", "Summary"]].itertuples(index=False):
    print(f"{j}\n")

Pandas(Season=1, Name='The Bicameral Mind', Summary="Delores finds out the truth about William's fate. Meanwhile, Maeve organizes an escape plan, only to discover that someone else is pulling the strings. And Robert plays the final piece in his grand narrative.")

Pandas(Season=2, Name='Kiksuya', Summary="Another of Westworld's Host revolutionaries is revealed. Meanwhile, Emily finds the Man and convinces the Ghost Nation to hand him over to her to ensure his suffering.")

Pandas(Season=3, Name='Parce Domine', Summary="If you're stuck in a loop, try walking in a straight line.")

Pandas(Season=4, Name='Zhuangzi', Summary='God is bored.')
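Printing the raw named tuples exposes the `Pandas(...)` wrapper in the output. If a plainer report is preferred, the fields can be formatted explicitly, as in this minimal sketch. `top_df` is a hypothetical stand-in for `highest_rating`, holding two of the rows shown above.

```python
import pandas as pd

# Hypothetical stand-in for `highest_rating`, using two rows from the output above.
top_df = pd.DataFrame({
    "Season": [3, 4],
    "Name": ["Parce Domine", "Zhuangzi"],
    "Summary": [
        "If you're stuck in a loop, try walking in a straight line.",
        "God is bored.",
    ],
})

# itertuples exposes each column as an attribute when the name is a valid identifier
lines = [
    f"Season {row.Season} - {row.Name}: {row.Summary}"
    for row in top_df[["Season", "Name", "Summary"]].itertuples(index=False)
]
print("\n".join(lines))
```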

