### **A LOOK INTO DISNEY+ SHOWS AND MOVIES [PYTHON EXPLORATORY DATA ANALYSIS]**

As of writing this, streaming has become a dominant form of media, whether through Netflix, Hulu, Disney+, or many other platforms. These platforms have made watching movies and TV shows so accessible that there is now demand for ever more content. We can see this in how Netflix and many other platforms have created their own exclusive shows and movies available only on their service. Disney+ has recently been churning out a lot of its own exclusive content, so naturally I became curious about what makes Disney's platform special. In this exploratory data analysis (EDA), we look at a data set of all shows and movies available on Disney+ in the United States, acquired in May 2022 (via Kaggle), to see whether there are any interesting correlations that may lead us to understand what makes a show or movie well rated and popular. Understanding these factors could give better insight into how to build a better recommendation system and improve new customer retention and engagement.

### **Setting Up Environment**
##### Importing Necessary Libraries

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from os import path
from wordcloud import WordCloud, STOPWORDS
import nltk as nl
from nltk.corpus import stopwords
from plotly.subplots import make_subplots
import plotly.io as pio

import matplotlib.pyplot as plt
import kaleido

##### Loading in Data

In [3]:
#Load in csv files
urlt = 'https://github.com/kekevin12/Disney_EDA/blob/856c2bb8e1c4f5bc568706b7bd6c1bb212278635/titles.csv?raw=true'
titles = pd.read_csv(urlt, index_col=0)

##### Data Cleaning

Now let's take a brief look at the dataset.

In [4]:
titles.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1535 entries, tm74391 to tm1091117
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   title                 1535 non-null   object 
 1   type                  1535 non-null   object 
 2   description           1529 non-null   object 
 3   release_year          1535 non-null   int64  
 4   age_certification     1210 non-null   object 
 5   runtime               1535 non-null   int64  
 6   genres                1535 non-null   object 
 7   production_countries  1535 non-null   object 
 8   seasons               415 non-null    float64
 9   imdb_id               1133 non-null   object 
 10  imdb_score            1108 non-null   float64
 11  imdb_votes            1105 non-null   float64
 12  tmdb_popularity       1524 non-null   float64
 13  tmdb_score            1426 non-null   float64
dtypes: float64(5), int64(2), object(7)
memory usage: 179.9+ KB


In [5]:
titles.head()

| id | title | type | description | release_year | age_certification | runtime | genres | production_countries | seasons | imdb_id | imdb_score | imdb_votes | tmdb_popularity | tmdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tm74391 | Fantasia | MOVIE | Walt Disney's timeless masterpiece is an extra... | 1940 | G | 120 | ['animation', 'family', 'music', 'fantasy'] | ['US'] | NaN | tt0032455 | 7.7 | 94681.0 | 57.751 | 7.4 |
| tm67803 | Snow White and the Seven Dwarfs | MOVIE | A beautiful girl, Snow White, takes refuge in ... | 1937 | G | 83 | ['fantasy', 'family', 'romance', 'animation', ... | ['US'] | NaN | tt0029583 | 7.6 | 195321.0 | 107.137 | 7.1 |
| tm82546 | Pinocchio | MOVIE | Lonely toymaker Geppetto has his wishes answer... | 1940 | G | 88 | ['animation', 'comedy', 'family', 'fantasy'] | ['US'] | NaN | tt0032910 | 7.5 | 141937.0 | 71.16 | 7.1 |
| tm79357 | Bambi | MOVIE | Bambi's tale unfolds from season to season as ... | 1942 | G | 70 | ['animation', 'drama', 'family'] | ['US'] | NaN | tt0034492 | 7.3 | 140406.0 | 68.136 | 7.0 |
| tm62671 | Treasure Island | MOVIE | Enchanted by the idea of locating treasure bur... | 1950 | PG | 96 | ['family', 'action'] | ['GB', 'US'] | NaN | tt0043067 | 6.9 | 8229.0 | 10.698 | 6.5 |


##### Check for Duplicates

In [6]:
# Check for duplicates
duplicateTRows = titles[titles.duplicated(keep='last')]
duplicateTRows.head()

| id | title | type | description | release_year | age_certification | runtime | genres | production_countries | seasons | imdb_id | imdb_score | imdb_votes | tmdb_popularity | tmdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|


We can see that the 'titles' dataset consists of 14 columns of various identifiers and attributes, indexed by title id. There also seem to be no duplicates in the titles dataset. However, the columns we want to analyze have differing numbers of non-null entries: the IMDb and TMDB scores have 1108 and 1426 entries respectively, and the description column has 1529 out of the total 1535 ids. To resolve this, it is best to remove the rows with null values in the columns we use, so that missing entries do not distort any calculations.
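
As a quick check on the two observations above, the missing values and duplicate rows can also be counted directly instead of read off `info()`. The sketch below uses a small made-up frame (`toy` and its values are hypothetical, not taken from the dataset); the same two calls would apply to `titles`.

```python
import pandas as pd

# Toy stand-in for the titles table (values are made up for illustration)
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "A"],
        "description": ["x", None, "z", "x"],
        "tmdb_score": [7.1, 6.5, None, 7.1],
    }
)

# Number of missing values in each column
null_counts = toy.isna().sum()

# Number of rows that exactly repeat an earlier row
n_duplicates = int(toy.duplicated().sum())

print(null_counts)
print("duplicate rows:", n_duplicates)
```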

##### Cleaning Dataset

In [7]:
#Removing unnecessary columns and rows without a TMDB score
ctitles = titles.drop(['imdb_id','imdb_score','imdb_votes','tmdb_popularity','production_countries','seasons'], axis=1)
ctitles = ctitles.dropna(subset=['tmdb_score'])
ctitles['genres'] = ctitles.genres.str.strip('[]')
ctitles.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1426 entries, tm74391 to tm1091101
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   title              1426 non-null   object 
 1   type               1426 non-null   object 
 2   description        1426 non-null   object 
 3   release_year       1426 non-null   int64  
 4   age_certification  1155 non-null   object 
 5   runtime            1426 non-null   int64  
 6   genres             1426 non-null   object 
 7   tmdb_score         1426 non-null   float64
dtypes: float64(1), int64(2), object(5)
memory usage: 100.3+ KB


Here I removed the columns that I will not be using and decided against using IMDb scores to evaluate the ratings, since there are more TMDB entries (1426 versus 1108). The larger sample covers more of the Disney+ catalog and should therefore represent it more accurately. One caveat is that TMDB ratings are entirely crowd-sourced, so they reflect the views of that site's user base and do not include ratings from critics or from a larger population like IMDb's.
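
The coverage argument can be made explicit by computing the share of non-null values per score column. In the source data that is 1108 of 1535 for IMDb (about 72%) versus 1426 of 1535 for TMDB (about 93%). The sketch below shows the idea on a made-up frame; `scores` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical scores with gaps, standing in for the two rating columns
scores = pd.DataFrame(
    {
        "imdb_score": [7.7, None, 7.5, None],
        "tmdb_score": [7.4, 7.1, None, 7.0],
    }
)

# Fraction of rows that have a value in each column
coverage = scores.notna().mean()

# Keep the column with the best coverage and drop rows that lack it
best = coverage.idxmax()
cleaned = scores.dropna(subset=[best])

print(coverage)
print("kept column:", best, "rows left:", len(cleaned))
```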

### Analysis
I am curious to see the distribution of movies and shows by release year to see how extensive Disney+'s library is. I will use the original dataset here, since the rows removed during cleaning still belong to the catalog and are needed for a full picture of the library.

In [8]:
# Histogram
pio.templates.default = 'plotly_dark'
fig = px.histogram(titles, x = "release_year", color = "type", 
                   marginal= "rug", title="Release Year History of Disney+ Catalog (US)", 
                   labels= { "type" : "Type", "release_year" : "Release Year"},
                   color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
                   template="plotly_dark")

fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),
                  barmode='overlay'
                 )

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/cathistplot.png?raw=true)

It appears the oldest movie and show date back to 1928 and 1955 respectively. The distribution is left-skewed, with most titles concentrated in recent years. That is to be expected, as there is probably more recent content available and digital formats are a very modern development. Now let's look at how many movies and shows there are.
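
The earliest release year per type and the direction of the skew can also be computed rather than read off the histogram. This is a minimal sketch on made-up years (`toy` is hypothetical); a negative skewness value corresponds to a long tail toward older years.

```python
import pandas as pd

# Hypothetical release years: a few old titles and many recent ones
toy = pd.DataFrame(
    {
        "type": ["MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW", "SHOW"],
        "release_year": [1928, 2015, 2019, 2021, 1955, 2018, 2020],
    }
)

# Earliest release year for each type
oldest = toy.groupby("type")["release_year"].min()

# Sample skewness of the release years (negative means left-skewed)
skewness = toy["release_year"].skew()

print(oldest)
print("skewness:", skewness)
```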

In [9]:
#Count # of Movies and Shows
type_count = titles['type'].value_counts().rename_axis('Type').reset_index(name='Counts')

In [10]:
# Bar Chart
fig = px.bar(type_count, x ="Counts", color="Type",
                   title="Total Count of Disney+ Movies & Shows (US)", 
                   labels= { "type" : "Type", "release_year" : "Release Year"},
                   color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
                   template="plotly_dark")
fig.update_yaxes(title='y', visible=False, showticklabels=False)

fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/countbar.png?raw=true)

It appears that the Disney+ library consists primarily of movies, with 1120 movies compared to 415 shows.

##### Ratings Analysis

Now I would like to glimpse at the average ratings across all the shows and movies available. This time we will use the cleaned data set to avoid the null values.

In [11]:
# Box Plot
fig = px.box(ctitles, x ="type", y="tmdb_score", color="type",
    title="TMDB Ratings of Disney+ Movies & Shows (US)", 
    labels= { "tmdb_score" : "Rating (Avg)"},
    color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
    template="plotly_dark")
    
fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/boxrating.png?raw=true)

Looking at the overall ratings, shows seem to have a higher average rating of 7.7, compared with 6.7 for movies. However, the overall average alone does not provide much insight, so I would like to break the ratings down by genre to see whether there is any correlation.
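
Because a box plot draws the median rather than the mean, it can help to compute both per type when quoting an "average". A minimal sketch, using made-up scores (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical TMDB scores for a handful of titles
toy = pd.DataFrame(
    {
        "type": ["MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW", "SHOW"],
        "tmdb_score": [6.0, 6.5, 7.0, 8.5, 7.5, 8.0, 8.5],
    }
)

# Mean, median, and number of titles for each type
summary = toy.groupby("type")["tmdb_score"].agg(["mean", "median", "count"])

print(summary)
```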

##### Genre Ratings

In [143]:
# Split each title's genre list into one row per genre, then compute the
# title count and mean TMDB score for every (genre, type) pair
ratings = ctitles[['tmdb_score','type']].assign(
    genres=ctitles['genres'].str.split(r'\s*,\s*')).explode('genres')
# Drop the empty genre left behind by titles whose genre list was empty
ratings = ratings[ratings.genres != ""]

avg = ratings.groupby(['genres','type'])\
    .agg(count=('tmdb_score','size'), tmdb_score=('tmdb_score','mean'))\
    .reset_index()


In [149]:
#Double Bar Graph
fig = px.bar(avg, y ="tmdb_score", x ='genres',
                   color= 'type', barmode='group',
                   title="Average TMDB Ratings of Disney+ Movies & Shows by Genres (US)", 
                   labels= { "tmdb_score" : "Rating (Avg)", "genres" : "Genres"},
                   color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
                   template="plotly_dark")

fig.update_yaxes(nticks=9)

fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/genrerating.png?raw=true)

In [147]:
#Top 5 Show Genres
sho  = avg[avg.type != 'MOVIE']
sho.nlargest(5,['tmdb_score'])

| | genres | type | count | tmdb_score |
|---|---|---|---|---|
| 35 | 'war' | SHOW | 1 | 8.4 |
| 21 | 'horror' | SHOW | 6 | 8.183333 |
| 27 | 'romance' | SHOW | 21 | 8.119048 |
| 9 | 'documentation' | SHOW | 89 | 8.076404 |
| 33 | 'thriller' | SHOW | 23 | 8.06087 |


In [148]:
#Top 5 Movie Genres
mov  = avg[avg.type != 'SHOW']
mov.nlargest(5,['tmdb_score'])

| | genres | type | count | tmdb_score |
|---|---|---|---|---|
| 24 | 'reality' | MOVIE | 3 | 7.8 |
| 18 | 'history' | MOVIE | 28 | 6.989286 |
| 8 | 'documentation' | MOVIE | 201 | 6.968657 |
| 22 | 'music' | MOVIE | 70 | 6.851429 |
| 2 | 'animation' | MOVIE | 335 | 6.807761 |


From our genre breakdown, we can see that shows on average tend to have much higher ratings than movies in each genre. The top five genres for shows are (1) war, (2) horror, (3) romance, (4) documentation, and (5) thriller. The top five genres for movies are (1) reality, (2) history, (3) documentation, (4) music, and (5) animation. Note that some of these rankings rest on very few titles: war has a single show and reality only three movies. I must say I am quite surprised by these results, as I assumed that on Disney+, which has more family-friendly titles, genres such as family and comedy would rank much higher. Now let's take a look at the runtimes of shows and movies and see whether there is any correlation with ratings.
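
One way to keep a genre with only one or two titles from topping the ranking is to require a minimum number of titles before ranking. The sketch below reuses the five show rows from the table above; the threshold `min_count = 10` is an arbitrary assumption, not something the analysis prescribes.

```python
import pandas as pd

# The top-5 show genres reported above (count and mean TMDB score)
genre_avg = pd.DataFrame(
    {
        "genres": ["war", "horror", "romance", "documentation", "thriller"],
        "count": [1, 6, 21, 89, 23],
        "tmdb_score": [8.4, 8.183333, 8.119048, 8.076404, 8.06087],
    }
)

# Assumed cutoff: ignore genres backed by fewer than 10 titles
min_count = 10

# Rank only the genres that pass the cutoff
ranked = genre_avg[genre_avg["count"] >= min_count].nlargest(3, "tmdb_score")

print(ranked)
```

With this cutoff, war and horror drop out and the ranking is led by genres that are backed by at least 21 shows.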

In [145]:
fig = px.scatter(ctitles, x ="runtime", y="tmdb_score", color="type",
                   title="Runtimes of Disney+ Movies & Shows (US)",
                   symbol= 'type',
                   trendline="ols",
                   labels= { "runtime" : "Runtime (mins)", "tmdb_score" : "TMDB Rating"},
                   color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
                   template="plotly_dark")

fig.update_yaxes(nticks=12)
fig.update_xaxes(nticks=20)

fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/runtimegenre.png?raw=true)

In [18]:
fig = px.box(ctitles, x ="type", y="runtime", color="type",
                   title="Runtimes of Disney+ Movies & Shows (US)",
                   labels= { "runtime" : "Runtime (mins)", "tmdb_score" : "TMDB Rating"},
                   color_discrete_map= {"MOVIE": "#B9D8EB", "SHOW" : "#C39BD3"},
                   template="plotly_dark")
                   
fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/boxruntime.png?raw=true)

The scatterplot shows no strong relationship between a title's rating and its runtime. However, we can clearly see two distinct clusters for movies and shows: movie titles tend to cluster around 80 to 100 minutes, while shows fall in the 20 to 30 minute range. This is more apparent when looking at the box plots. Since both formats are heavily skewed, the median gives a more representative typical runtime than the mean: 85 minutes for movies and 24 minutes for shows. This makes sense, since shows are usually a shorter format than movies and do not need to be as long when there are multiple episodes across a season.
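
The visual impression from the scatterplot can be backed by numbers: a Pearson correlation between runtime and rating within each type, plus the median runtime. This is a sketch on made-up values (`toy` is hypothetical); a correlation close to 0 would match the "no strong relationship" reading.

```python
import pandas as pd

# Hypothetical runtimes (minutes) and TMDB scores
toy = pd.DataFrame(
    {
        "type": ["MOVIE"] * 5 + ["SHOW"] * 3,
        "runtime": [80, 85, 85, 95, 120, 22, 24, 30],
        "tmdb_score": [6.5, 7.0, 6.0, 7.2, 6.8, 8.0, 7.5, 7.9],
    }
)

# Median runtime per type, which is robust to a few very long titles
medians = toy.groupby("type")["runtime"].median()

# Pearson correlation between runtime and score, computed within each type
corr = {
    kind: group["runtime"].corr(group["tmdb_score"])
    for kind, group in toy.groupby("type")
}

print(medians)
print(corr)
```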

##### Description Analysis

Now let's take a look at the descriptions for these titles and see whether any plot points or words are commonly repeated.

In [19]:
#Extra stop words added to the list because they are not descriptive or relevant for insight
new_stop_words = ['one','must','disney','series','find','two','get','named','first','make','-','new','back','takes','take','set','also','...','ll','r',"he's",'tries','year'
,'across','around','true','friend','go','three','gets','--','become','time','best','high','–','way','animated','little','big','together','show','story','like'
,'life','world','\\s','\\S',' \'s'," \'S",'—','episode','television','jessica','andent',',','.','max','years','jack','soon','season']

#Requires the NLTK stop-word corpus; run nl.download('stopwords') once if it is missing
stop = stopwords.words('english')
stop.extend(new_stop_words)

In [20]:
#Lowercase every description and remove the stop words from it
dcount = ctitles["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]))

dcount = dcount.to_frame().reset_index()
dcount = dcount.iloc[:,1:]

# Counting most common words found in description
wcount = dcount["description"].str.split(expand=True).stack().value_counts()
wcount = wcount.to_frame().reset_index()
wcount.columns = ['word','count']
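
One caveat with splitting on whitespace is that punctuation stays attached to words, so "family," and "family" are counted as different words and a stop word followed by a comma slips through the filter. A minimal sketch of tokenizing with a regular expression before filtering is shown below; the descriptions and the short `stop_words` set are made up for illustration and stand in for the NLTK list.

```python
import re
from collections import Counter

# Hypothetical short stop-word set standing in for the full list
stop_words = {"the", "a", "and", "of", "in", "to", "is"}

# Made-up descriptions with punctuation attached to words
descriptions = [
    "A family adventure, and the family dog.",
    "The kid finds a friend in the family.",
]


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


# Count every token that is not a stop word
counts = Counter(
    word
    for text in descriptions
    for word in tokenize(text)
    if word not in stop_words
)

print(counts.most_common(5))
```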

In [21]:
fig = px.bar(wcount.head(50), y ="count", x ='word',
                   title="Top 50 Common Words in Descriptions of Disney+ Catalog (US)", 
                   labels= { "tmdb_score" : "Rating (Avg)", "genres" : "Genres"},
                   template="plotly_dark")
                   
fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/comdesc.png?raw=true)

In [22]:
test = " ".join(word for word in dcount["description"])

wordcloud = WordCloud(width=800,height=400,max_words=100, background_color="white", stopwords= stop).generate(test)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/descwordcloud.png?raw=true)

Several things in the descriptions surprised me. I expected many of the descriptions to consist of words related to the top 5 genres for shows and movies noted earlier. However, looking at the top 50 common words, many stood out to me as very kid-friendly, or at least not in line with what you would expect from those top genres, such as "family", "friends", and "kid". Though not very useful analytically, putting all the words into a word cloud makes the result more striking visually and shows which words stand out. Perhaps ratings are less of a factor in what type of content gets made, so I would like to take a quick look at age certification to understand the makeup of the audience Disney+ caters to.

In [150]:
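age = ctitles['age_certification'].value_counts().rename_axis('Age Certification').reset_index(name='Counts')
#Label each certification by format instead of relying on row order: TV-* certifications belong to shows, the rest (G, PG, PG-13) to movies
age.insert(2, "Type", np.where(age['Age Certification'].str.startswith('TV'), 'Show', 'Movie'))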

In [24]:
fig = px.bar(age, y = "Counts", x = "Type", barmode='stack', color="Age Certification", text= "Age Certification",
                title="Age Certification Count of Disney+ Catalog (US)",
                template="plotly_dark")

fig.update_layout(paper_bgcolor = '#1A1D29', 
                  plot_bgcolor = '#1A1D29',
                  font = dict(family="Verdana",color = '#FFFFFF', size=13),)

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/agecert.png?raw=true)

Seeing the breakdown of age certifications, it appears that I failed to account for the sample size behind the ratings. The number of G-rated movies alone is already greater than all of the show entries combined. This likely explains why the common words found in the descriptions are so kid-friendly. Since G ratings are the most common certification for both movies and shows, separating the descriptions by type would probably not produce any big differences, so it makes more sense to look at common words within each age certification instead.
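
The counts behind this comparison can be tabulated directly with a cross-tabulation of certification against type, which avoids labeling the certifications by hand. A minimal sketch on made-up rows (`toy` is hypothetical); titles without a certification are left out of the table.

```python
import pandas as pd

# Hypothetical titles; the last movie has no certification
toy = pd.DataFrame(
    {
        "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW", "SHOW", "MOVIE"],
        "age_certification": ["G", "G", "PG", "TV-G", "TV-PG", "TV-G", None],
    }
)

# Rows are certifications, columns are types, cells are title counts
table = pd.crosstab(toy["age_certification"], toy["type"])

print(table)
```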

In [152]:
#Creating a Dataset for Each Age Certification
Gdesc = ctitles.loc[ctitles['age_certification'] == 'G']
PGdesc = ctitles.loc[ctitles['age_certification'] == 'PG']
PG13desc = ctitles.loc[ctitles['age_certification'] == 'PG-13']
TVGdesc = ctitles.loc[ctitles['age_certification'] == 'TV-G']
TVPGdesc = ctitles.loc[ctitles['age_certification'] == 'TV-PG']
TVY7desc = ctitles.loc[ctitles['age_certification'] == 'TV-Y7']
TVYdesc = ctitles.loc[ctitles['age_certification'] == 'TV-Y']
TV14desc = ctitles.loc[ctitles['age_certification'] == 'TV-14']
TVMAdesc = ctitles.loc[ctitles['age_certification'] == 'TV-MA']

In [153]:
#G Rating Word Count
Gcount= Gdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
Gcount = Gcount.to_frame().reset_index()
Gcount = Gcount.iloc[:,1:]
Gwcount = Gcount["description"].str.split(expand=True).stack().value_counts()
Gwcount = Gwcount.to_frame().reset_index()
Gwcount.columns = ['word','count']

#PG Word Count
PGcount = PGdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
PGcount = PGcount.to_frame().reset_index()
PGcount = PGcount.iloc[:,1:]
PGwcount = PGcount["description"].str.split(expand=True).stack().value_counts()
PGwcount = PGwcount.to_frame().reset_index()
PGwcount.columns = ['word','count']

#PG 13 Word Count
PG13count = PG13desc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
PG13count = PG13count.to_frame().reset_index()
PG13count = PG13count.iloc[:,1:]
PG13wcount = PG13count["description"].str.split(expand=True).stack().value_counts()
PG13wcount = PG13wcount.to_frame().reset_index()
PG13wcount.columns = ['word','count']

#TV G Word Count
TVGcount = TVGdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TVGcount = TVGcount.to_frame().reset_index()
TVGcount = TVGcount.iloc[:,1:]
TVGwcount = TVGcount["description"].str.split(expand=True).stack().value_counts()
TVGwcount = TVGwcount.to_frame().reset_index()
TVGwcount.columns = ['word','count']

#TV PG Word Count
TVPGcount = TVPGdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TVPGcount = TVPGcount.to_frame().reset_index()
TVPGcount = TVPGcount.iloc[:,1:]
TVPGwcount = TVPGcount["description"].str.split(expand=True).stack().value_counts()
TVPGwcount = TVPGwcount.to_frame().reset_index()
TVPGwcount.columns = ['word','count']

#TVY7 Word Count
TVY7count = TVY7desc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TVY7count = TVY7count.to_frame().reset_index()
TVY7count = TVY7count.iloc[:,1:]
TVY7wcount = TVY7count["description"].str.split(expand=True).stack().value_counts()
TVY7wcount = TVY7wcount.to_frame().reset_index()
TVY7wcount.columns = ['word','count']

#TVY Word Count
TVYcount = TVYdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TVYcount = TVYcount.to_frame().reset_index()
TVYcount = TVYcount.iloc[:,1:]
TVYwcount = TVYcount["description"].str.split(expand=True).stack().value_counts()
TVYwcount = TVYwcount.to_frame().reset_index()
TVYwcount.columns = ['word','count']

#TV14 Word Count
TV14count = TV14desc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TV14count = TV14count.to_frame().reset_index()
TV14count = TV14count.iloc[:,1:]
TV14wcount = TV14count["description"].str.split(expand=True).stack().value_counts()
TV14wcount = TV14wcount.to_frame().reset_index()
TV14wcount.columns = ['word','count']

#TVMA Word Count
TVMAcount = TVMAdesc["description"].str.lower().apply(lambda x:' '.join([word for word in str(x).split() if word not in (stop)]).replace(',','').replace('.',''))
TVMAcount = TVMAcount.to_frame().reset_index()
TVMAcount = TVMAcount.iloc[:,1:]
TVMAwcount = TVMAcount["description"].str.split(expand=True).stack().value_counts()
TVMAwcount = TVMAwcount.to_frame().reset_index()
TVMAwcount.columns = ['word','count']
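
The nine blocks above repeat the same steps, which is how slips in a copied variable name creep in. A possible alternative is a single helper applied to each certification group. The sketch below is an assumption about how that could look, not the notebook's method: `top_words` is a hypothetical helper, the data is made up, and it removes commas and periods before filtering stop words, which differs slightly from the order used above.

```python
import pandas as pd


def top_words(descriptions, stop_words, n=10):
    """Return the n most frequent non-stop words as a word/count frame."""
    words = (
        descriptions.str.lower()
        .str.replace(r"[.,]", "", regex=True)  # drop commas and periods
        .str.split()
        .explode()
    )
    # Remove stop words and the NaN left by empty descriptions
    words = words[~words.isin(stop_words)].dropna()
    counts = words.value_counts().head(n)
    return counts.rename_axis("word").reset_index(name="count")


# Made-up descriptions standing in for the cleaned titles table
toy = pd.DataFrame(
    {
        "age_certification": ["G", "G", "PG"],
        "description": [
            "A family of mice.",
            "The family dog, lost.",
            "A hero saves the city.",
        ],
    }
)
stop_words = {"a", "the", "of"}

# One word-count frame per certification, keyed by its label
by_cert = {
    cert: top_words(group["description"], stop_words)
    for cert, group in toy.groupby("age_certification")
}

print(by_cert["G"])
```

Each entry of `by_cert` has the same `word` and `count` columns as the frames built by hand above.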

In [154]:
#Creating Bar Charts for Each Age Certification
Gwcount1 = Gwcount.head(10)
figG = go.Bar(y = Gwcount1['count'], x = Gwcount1['word'],name='G')
#
PGwcount1 = PGwcount.head(10)
figPG = go.Bar(y = PGwcount1['count'], x = PGwcount1['word'],name='PG')
#
PG13wcount1 = PG13wcount.head(10)
figPG13 = go.Bar(y = PG13wcount1['count'], x = PG13wcount1['word'],name='PG-13')
#
TVGwcount1 = TVGwcount.head(10)
figTVG = go.Bar(y = TVGwcount1['count'], x = TVGwcount1['word'],name='TV-G')
#
TVPGwcount1 = TVPGwcount.head(10)
figTVPG = go.Bar(y = TVPGwcount1['count'], x = TVPGwcount1['word'],name='TV-PG')
#
TVY7wcount1 = TVY7wcount.head(10)
figTVY7 = go.Bar(y = TVY7wcount1['count'], x = TVY7wcount1['word'],name='TV-Y7')
#
TVYwcount1 = TVYwcount.head(10)
figTVY = go.Bar(y = TVYwcount1['count'], x = TVYwcount1['word'],name='TV-Y')
#
TV14wcount1 = TV14wcount.head(10)
figTV14 = go.Bar(y = TV14wcount1['count'], x = TV14wcount1['word'],name='TV-14')
#
TVMAwcount1 = TVMAwcount.head(10)
figTVMA = go.Bar(y = TVMAwcount1['count'], x = TVMAwcount1['word'],name='TV-MA')

In [157]:
#Combining All Bar Charts into One Figure
figAC = make_subplots(rows=3, cols=3)
figAC.add_trace(figG,row=1,col=1)
figAC.add_trace(figPG,row=1,col=2)
figAC.add_trace(figPG13,row=1,col=3)
figAC.add_trace(figTVG,row=2,col=1)
figAC.add_trace(figTVPG,row=2,col=2)
figAC.add_trace(figTVY7,row=2,col=3)
figAC.add_trace(figTVY,row=3,col=1)
figAC.add_trace(figTV14,row=3,col=2)
figAC.add_trace(figTVMA,row=3,col=3)

figAC.update_layout(height=1000, width=1400,title_text="Top 10 Common Words in Descriptions of Disney+ Catalog",
    paper_bgcolor = '#1A1D29', 
    plot_bgcolor = '#1A1D29',
    font = dict(family="Verdana",color = '#FFFFFF', size=13))

![image](https://github.com/kekevin12/Disney_EDA/blob/main/Graphs/agedesc.png?raw=true)

After separating the descriptions by age certification, a noticeable trend is that G and PG rated media share very similar top words, which could suggest that these ideas tend to be very popular with that demographic. Furthermore, TV-Y, TV-G, and TV-PG share some similarities with G and PG rated media, with only minor word differences and an overall emphasis on familial terms. Interestingly enough, PG-13, TV-Y7, TV-14, and TV-MA have elements of the Marvel franchise within their top 10 words, which could be a sign of the franchise's popularity with the community. However, it is worth noting that TV-MA consists of only six shows, all of them interconnected Marvel shows, so this does not accurately represent TV-MA rated content as a whole. A funny find is that TV-14 has a lot of fish-related terms such as 'bluefin', 'fisherman', and 'port', even though only two shows contain all of those elements. These insights should therefore be taken cautiously: a few descriptions that repeat the same words over and over can push those words onto the charts ahead of others.

Overall, the general trend I find is that the majority of the Disney+ catalog is geared towards children. That surprised me, because I always hear about the next Marvel or Star Wars show, so I assumed more of the popularity would lean towards those franchises.
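
One way to check whether a word ranks highly only because a couple of descriptions repeat it is to compare its raw count with the number of descriptions that contain it. The sketch below illustrates this on made-up text; the descriptions are hypothetical and the regular-expression tokenizer is an assumption rather than the notebook's method.

```python
import re
from collections import Counter

# Made-up descriptions in which one repeats the same word several times
descriptions = [
    "Bluefin tuna, bluefin boats and bluefin crews.",
    "A fisherman chases bluefin.",
    "A family story.",
]

term_freq = Counter()  # total occurrences of each word
doc_freq = Counter()   # number of descriptions containing each word

for text in descriptions:
    tokens = re.findall(r"[a-z']+", text.lower())
    term_freq.update(tokens)
    doc_freq.update(set(tokens))  # count each word once per description

print("occurrences of 'bluefin':", term_freq["bluefin"])
print("descriptions containing 'bluefin':", doc_freq["bluefin"])
```

A large gap between the two numbers suggests the word is concentrated in a few titles rather than spread across the certification.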