<h1>Netflix Data Analysis</h1>
<p>The aim of this work is to perform an Exploratory Data Analysis (EDA) of a Netflix dataset. We want to answer the following questions:</p>
<ul>
    <li>What are the most preferred genres?</li>
    <li>Which Netflix shows have the highest ratings?</li>
    <li>What’s the best time of the year to release a show on Netflix?</li>
    <li>Which countries have added the most content?</li>
</ul>
<p>The data for this analysis can be downloaded from <a href="https://www.kaggle.com/datasets/shivamb/netflix-shows">Kaggle</a>.</p>


<h2>Library import</h2>

In [2]:
import numpy as np
import pandas as pd
from collections import Counter
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm
import warnings


In [3]:
# Suppress warnings in the notebook output
warnings.filterwarnings('ignore')

<h2>Dataset</h2>

In [18]:
df = pd.read_csv('../netflix/csvData/netflix_titles.csv')

In [19]:
df.head(3)

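<table>
    <thead>
        <tr><th></th><th>show_id</th><th>type</th><th>title</th><th>director</th><th>cast</th><th>country</th><th>date_added</th><th>release_year</th><th>rating</th><th>duration</th><th>listed_in</th><th>description</th></tr>
    </thead>
    <tbody>
        <tr><td>0</td><td>s1</td><td>Movie</td><td>Dick Johnson Is Dead</td><td>Kirsten Johnson</td><td>NaN</td><td>United States</td><td>September 25, 2021</td><td>2020</td><td>PG-13</td><td>90 min</td><td>Documentaries</td><td>As her father nears the end of his life, filmm...</td></tr>
        <tr><td>1</td><td>s2</td><td>TV Show</td><td>Blood &amp; Water</td><td>NaN</td><td>Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...</td><td>South Africa</td><td>September 24, 2021</td><td>2021</td><td>TV-MA</td><td>2 Seasons</td><td>International TV Shows, TV Dramas, TV Mysteries</td><td>After crossing paths at a party, a Cape Town t...</td></tr>
        <tr><td>2</td><td>s3</td><td>TV Show</td><td>Ganglands</td><td>Julien Leclercq</td><td>Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...</td><td>NaN</td><td>September 24, 2021</td><td>2021</td><td>TV-MA</td><td>1 Season</td><td>Crime TV Shows, International TV Shows, TV Act...</td><td>To protect his family from a powerful drug lor...</td></tr>
    </tbody>
</table>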


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


<h4>Columns' Description</h4>
<p>This dataset contains information on the TV shows and movies added to Netflix between 2008 and 2021.</p>
<ul>
    <li><b>show_id:</b> Unique identifier of each title (e.g. s1, s2).</li>
    <li><b>type:</b> Type of content, with two possible values: Movie or TV Show.</li>
    <li><b>title:</b> Title of the movie or TV show.</li>
    <li><b>director:</b> Director(s) of the movie or TV show.</li>
    <li><b>cast:</b> Actors appearing in the movie or TV show.</li>
    <li><b>country:</b> Country or countries where the title was produced.</li>
    <li><b>date_added:</b> Date on which the title was added to Netflix.</li>
    <li><b>release_year:</b> Year in which the movie or TV show was originally released.</li>
    <li><b>rating:</b> Content (maturity) rating, which indicates the intended audience, e.g. TV-MA (mature audiences only), TV-14 or PG-13. It is not an audience score.</li>
    <li><b>duration:</b> Length of the title: minutes for movies, number of seasons for TV shows.</li>
    <li><b>listed_in:</b> Genre(s) under which the title is listed.</li>
    <li><b>description:</b> Short synopsis of the movie or TV show.</li>
</ul>

In [21]:
df['release_year'].describe()  # The minimum release year (1925) stands out: some titles are much older than the rest

count    8807.000000
mean     2014.180198
std         8.819312
min      1925.000000
25%      2013.000000
50%      2017.000000
75%      2019.000000
max      2021.000000
Name: release_year, dtype: float64

<h4>Data Cleaning</h4>

In [22]:
print(f'Number of missing values in DataFrame:\n{df.isna().sum()}')


Number of missing values in DataFrame:
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64


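<p>Several columns contain missing values. We start with director and cast: we will not analyze these columns, but we drop every row in which either of them is missing. Note that this also removes a large number of titles, many of them TV shows without a listed director.</p>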

In [23]:
df = df.dropna(how='any', subset=['cast', 'director'])

Now, let's drop all the remaining rows that contain null values.

In [26]:
df = df.dropna()
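<p>Dropping rows rather than columns has a side effect worth keeping in mind: every title without a listed director or cast disappears, even though those two columns are not used later. A minimal sketch on a small hypothetical DataFrame (the toy values are made up for illustration) contrasts dropping the rows with dropping the unused columns:</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: some titles, here TV shows, have no director listed
toy = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'TV Show', 'TV Show'],
    'director': ['A', 'B', np.nan, np.nan, 'C'],
    'cast': ['x', 'y', 'z', np.nan, 'w'],
    'country': ['US', 'IN', 'UK', 'US', np.nan],
})

# Option 1 (as in the cell above): drop every row where director or cast is missing
rows_dropped = toy.dropna(subset=['cast', 'director'])

# Option 2: drop the unused columns, then only the rows missing a field we analyze
cols_dropped = toy.drop(columns=['director', 'cast']).dropna(subset=['country'])

print(rows_dropped['type'].value_counts().to_dict())
print(cols_dropped['type'].value_counts().to_dict())
```

<p>In this toy example the first option keeps only one of the three TV shows, while the second keeps two, so the choice directly affects any Movie vs TV Show comparison.</p>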

In [33]:
print(f'Number of missing values in DataFrame:\n{df.isna().sum()}')


Number of missing values in DataFrame:
show_id         0
type            0
title           0
director        0
cast            0
country         0
date_added      0
release_year    0
rating          0
duration        0
listed_in       0
description     0
year_added      0
month_added     0
season_count    0
dtype: int64


Next, we convert date_added into a proper datetime format and extract the year and month in which each title was added.

In [27]:
# Strip stray whitespace before parsing, then extract the year and month each title was added
df['date_added'] = pd.to_datetime(df['date_added'].str.strip())
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
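<p>When no format is given, pd.to_datetime infers it from the data. If all values of date_added follow the "Month D, YYYY" pattern (an assumption worth checking on the full column), passing an explicit format makes the parsing stricter. A small sketch on made-up sample strings:</p>

```python
import pandas as pd

# Made-up samples in the same "Month D, YYYY" style as date_added, one with stray whitespace
raw = pd.Series(['September 25, 2021', ' August 4, 2017', 'December 1, 2019'])

# Strip whitespace, then parse with an explicit format (%B = full month name)
parsed = pd.to_datetime(raw.str.strip(), format='%B %d, %Y')

years = parsed.dt.year.tolist()      # [2021, 2017, 2019]
months = parsed.dt.month.tolist()    # [9, 8, 12]
```

<p>With an explicit format, any value that does not match raises an error instead of being silently interpreted another way.</p>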

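The duration column mixes two units: minutes for movies (e.g. "90 min") and seasons for TV shows (e.g. "2 Seasons"). We therefore split it into two columns: duration (minutes, movies only) and season_count (TV shows only).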

In [30]:
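# Leading number of each value, e.g. "90 min" -> "90", "2 Seasons" -> "2"
duration_value = df['duration'].str.split(' ').str[0]
is_season = df['duration'].str.contains('Season')
# Seasons go to season_count, minutes stay in duration; the other column gets an empty string
df['season_count'] = duration_value.where(is_season, '')
df['duration'] = duration_value.where(~is_season, '')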
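<p>The split above stores the numbers as strings, with an empty string where a value does not apply. If numeric columns are preferred, a sketch using a regular expression with str.extract (the sample values are made up) separates the number from the unit and leaves NaN where a value does not apply:</p>

```python
import pandas as pd

duration = pd.Series(['90 min', '2 Seasons', '1 Season', '125 min'])

# Capture the leading number and the unit word in one pass
parts = duration.str.extract(r'^(?P<value>\d+)\s+(?P<unit>\w+)$')
value = pd.to_numeric(parts['value'])
is_season = parts['unit'].str.startswith('Season')

minutes = value.where(~is_season)   # float, NaN for TV shows
seasons = value.where(is_season)    # float, NaN for movies
```

<p>NaN (rather than an empty string) lets later steps use numeric methods such as describe() or a histogram directly.</p>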


In [32]:
df.head(3)

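<table>
    <thead>
        <tr><th></th><th>show_id</th><th>type</th><th>title</th><th>director</th><th>cast</th><th>country</th><th>date_added</th><th>release_year</th><th>rating</th><th>duration</th><th>listed_in</th><th>description</th><th>year_added</th><th>month_added</th><th>season_count</th></tr>
    </thead>
    <tbody>
        <tr><td>7</td><td>s8</td><td>Movie</td><td>Sankofa</td><td>Haile Gerima</td><td>Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...</td><td>United States, Ghana, Burkina Faso, United Kin...</td><td>2021-09-24</td><td>1993</td><td>TV-MA</td><td>125.0</td><td>Dramas, Independent Movies, International Movies</td><td>On a photo shoot in Ghana, an American model s...</td><td>2021</td><td>9</td><td></td></tr>
        <tr><td>8</td><td>s9</td><td>TV Show</td><td>The Great British Baking Show</td><td>Andy Devonshire</td><td>Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...</td><td>United Kingdom</td><td>2021-09-24</td><td>2021</td><td>TV-14</td><td></td><td>British TV Shows, Reality TV</td><td>A talented batch of amateur bakers face off in...</td><td>2021</td><td>9</td><td>9.0</td></tr>
        <tr><td>9</td><td>s10</td><td>Movie</td><td>The Starling</td><td>Theodore Melfi</td><td>Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...</td><td>United States</td><td>2021-09-24</td><td>2021</td><td>PG-13</td><td>104.0</td><td>Comedies, Dramas</td><td>A woman adjusting to life after a loss contend...</td><td>2021</td><td>9</td><td></td></tr>
    </tbody>
</table>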


Now, let's rename the column "listed_in" to "genre" and keep only the first genre listed for each title.

In [34]:
df = df.rename(columns={'listed_in':'genre'})
df['genre'] = df['genre'].apply(lambda x: x.split(",")[0])
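<p>Keeping only the first genre makes every title count once, but it hides the other genres a title is listed under. As an alternative (not what this notebook uses), str.split combined with explode counts every listed genre. A sketch on three listed_in values from the table above:</p>

```python
import pandas as pd

listed_in = pd.Series([
    'Dramas, Independent Movies, International Movies',
    'British TV Shows, Reality TV',
    'Comedies, Dramas',
])

# First genre only (the approach used above)
first_genre = listed_in.str.split(', ').str[0]

# Every genre: one row per (title, genre) pair
all_genres = listed_in.str.split(', ').explode()

first_counts = first_genre.value_counts()
all_counts = all_genres.value_counts()
```

<p>Here "Dramas" counts once with the first-genre approach but twice when all genres are counted.</p>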

In [35]:
df.head(3)

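<table>
    <thead>
        <tr><th></th><th>show_id</th><th>type</th><th>title</th><th>director</th><th>cast</th><th>country</th><th>date_added</th><th>release_year</th><th>rating</th><th>duration</th><th>genre</th><th>description</th><th>year_added</th><th>month_added</th><th>season_count</th></tr>
    </thead>
    <tbody>
        <tr><td>7</td><td>s8</td><td>Movie</td><td>Sankofa</td><td>Haile Gerima</td><td>Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...</td><td>United States, Ghana, Burkina Faso, United Kin...</td><td>2021-09-24</td><td>1993</td><td>TV-MA</td><td>125.0</td><td>Dramas</td><td>On a photo shoot in Ghana, an American model s...</td><td>2021</td><td>9</td><td></td></tr>
        <tr><td>8</td><td>s9</td><td>TV Show</td><td>The Great British Baking Show</td><td>Andy Devonshire</td><td>Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...</td><td>United Kingdom</td><td>2021-09-24</td><td>2021</td><td>TV-14</td><td></td><td>British TV Shows</td><td>A talented batch of amateur bakers face off in...</td><td>2021</td><td>9</td><td>9.0</td></tr>
        <tr><td>9</td><td>s10</td><td>Movie</td><td>The Starling</td><td>Theodore Melfi</td><td>Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...</td><td>United States</td><td>2021-09-24</td><td>2021</td><td>PG-13</td><td>104.0</td><td>Comedies</td><td>A woman adjusting to life after a loss contend...</td><td>2021</td><td>9</td><td></td></tr>
    </tbody>
</table>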


<h3>Exploratory Data Analysis (EDA)</h3>

<p>Let's start by analyzing which type of content is most common on Netflix.</p>
<h4>Graph 1</h4>

In [40]:

fig_donut = px.pie(df, names='type', height=300, width=600, hole=0.7, title='Movies vs TV Shows on Netflix', color_discrete_sequence=['#b20710', '#221f1f'])

fig_donut.update_traces(hovertemplate=None, textposition='outside', textinfo='percent+label', rotation=90)
fig_donut.update_layout(margin=dict(t=100, b=30, l=0, r=0), showlegend=False,
                        plot_bgcolor='#333', paper_bgcolor='#333', title_font=dict(size=45, color='#8a8d93', family="Lato, sans-serif"), font=dict(size=17, color='#8a8d93'), hoverlabel=dict(bgcolor="#444", font_size=13, font_family="Lato, sans-serif"))


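<h4>Conclusion 1</h4>
<p>From Graph 1, the ratio of Movies to TV Shows is about 97:3, so movies clearly dominate the cleaned catalogue. Two caveats apply: the dataset counts titles, not views, so it says nothing about what people actually watch; and dropping the rows without a listed director removed many TV shows, which inflates the share of movies.</p>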
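<p>The 97:3 split can be checked numerically. With the notebook's data this is simply df['type'].value_counts(normalize=True); the sketch below rebuilds a type column from the totals of the cleaned dataset (5,185 movies and 147 TV shows, obtained by summing the yearly counts of titles added):</p>

```python
import pandas as pd

# Rebuild a 'type' column with the cleaned totals: 5,185 movies and 147 TV shows
types = pd.Series(['Movie'] * 5185 + ['TV Show'] * 147)

# Share of each type in percent
share = types.value_counts(normalize=True).mul(100).round(1)
print(share.to_dict())
```

<p>This gives roughly 97.2% movies and 2.8% TV shows, consistent with the donut chart.</p>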

<h4>Graph 2</h4>
<p>Now, let's see how the number of movies and TV shows added to Netflix has evolved over the years.</p>

In [39]:
d1 = df[df['type'] == "TV Show"]
d2 = df[df['type'] == "Movie"]

col = 'year_added'

# Titles added per year; rename_axis + reset_index gives the columns [year_added, count] in any pandas version
vc1 = d1[col].value_counts().rename_axis(col).reset_index(name='count')
vc1['percent'] = 100 * vc1['count'] / vc1['count'].sum()
vc1 = vc1.sort_values(col)

vc2 = d2[col].value_counts().rename_axis(col).reset_index(name='count')
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)

trace1 = go.Scatter(x=vc1[col], y=vc1['count'], name="TV Shows", marker=dict(color='orange'))
trace2 = go.Scatter(x=vc2[col], y=vc2['count'], name="Movie", marker=dict(color='#b20710'))
data = [trace1, trace2]
fig_line = go.Figure(data)

fig_line.update_traces(hovertemplate=None)
fig_line.update_xaxes(showgrid=False)
fig_line.update_yaxes(showgrid=False)

large_title_format = 'TV Shows and Movies Added over the Years'
small_title_format = "<span style='font-size:13px; font-family:Tahoma'>Due to COVID-19, the addition of new content slowed down.</span>"
fig_line.update_layout(title=large_title_format + "<br>" + small_title_format, height=400,
                    margin=dict(t=130, b=0, l=70, r=40),
                    hovermode='x unified', xaxis_title=' ', yaxis_title=' ', plot_bgcolor='#333', paper_bgcolor='#333', title_font=dict(size=25, color='#8a8d93', family='Lato, sans-serif'), font=dict(color='#8a8d93'), legend=dict(orientation="h", yanchor="bottom", y=1, xanchor="center", x=0.5))

fig_line.add_annotation(dict(x=0.8, y=0.3, ax=0, ay=0, xref="paper", yref="paper",
                        text="Highest number of <b>TV Shows</b><br> were added in <b>2020</b><br> followed by 2019."))
fig_line.add_annotation(dict(x=0.9, y=1, ax=0, ay=0, xref="paper", yref='paper',
                        text="Highest number of <b>Movies</b> were added<br> in <b>2019</b> followed by 2020."))

fig_line.show()
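<p>The two value_counts calls above can also be done in one step with pd.crosstab, which builds a year-by-type table of counts; dividing by the column totals gives the same per-type percentages. A minimal sketch on hypothetical data:</p>

```python
import pandas as pd

# Hypothetical titles with their type and year added
toy = pd.DataFrame({
    'type': ['Movie', 'TV Show', 'Movie', 'Movie', 'TV Show'],
    'year_added': [2019, 2019, 2020, 2020, 2020],
})

# Rows: year_added, columns: type, values: number of titles
per_year = pd.crosstab(toy['year_added'], toy['type'])

# Percentage of each type's titles added in each year (each column sums to 100)
percent = 100 * per_year / per_year.sum()
```

<p>Each column of per_year can then be passed to go.Scatter as a separate trace.</p>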


<h4>Conclusion 2</h4>
<p>As expected, most of the content added to Netflix consists of movies. However, it is striking that the number of movies added peaked in 2019, while the number of TV shows added peaked in 2020. This is probably related to the pandemic and the lockdowns around the world.</p>

<h4>Graph 3</h4>
<p>The next step is to find out in which months the most content is added, as a guide to the best time of the year to release content.</p>

In [41]:
# Titles added per month, as a DataFrame with the columns [month, count]
df_month = df['month_added'].value_counts().rename_axis('month').reset_index(name='count')
df_month['month_final'] = df_month['month'].replace({1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr',
                        5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'})

fig_month = px.funnel(df_month, x='count', y='month_final', title='Best Month for Releasing Content',
                    height=350, width=600, color_discrete_sequence=['#b20710'])
fig_month.update_xaxes(showgrid=False, ticksuffix=' ', showline=True)
fig_month.update_traces(hovertemplate=None, marker=dict(line=dict(width=0)))
fig_month.update_layout(margin=dict(t=60, b=20, l=70, r=40),
                        xaxis_title=' ', yaxis_title=' ',
                        plot_bgcolor='#333', paper_bgcolor='#333',
                        title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                        font=dict(color='#8a8d93'),
                        hoverlabel=dict(bgcolor='black', font_size=13, font_family='Lato, sans-serif'))


<h4>Conclusion 3</h4>
<p>The funnel chart suggests that the last quarter of the year (October to December) is the best time to release content, since that is when the most titles are added.</p>
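<p>To check the quarter-level reading directly, the monthly counts can be grouped into quarters. A sketch on hypothetical month_added values; with the real data, df['date_added'].dt.quarter returns the same quarter numbers:</p>

```python
import pandas as pd

# Hypothetical month_added values (1 = January, ..., 12 = December)
months = pd.Series([1, 10, 11, 12, 12, 7, 10, 3])

# Map months to quarters: 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
quarters = (months - 1) // 3 + 1

per_quarter = quarters.value_counts().sort_index()
print(per_quarter.to_dict())
```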

<h4>Graph 4</h4>
<p>Next, let's see which countries have added the most content to Netflix.</p>

In [43]:
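# Keep only the first country listed for each title (e.g. "United States, Ghana" -> "United States")
df['country'] = df['country'].apply(lambda x: x.split(",")[0])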


In [58]:

# Titles per country, as a DataFrame with the columns [country, count]
df_ctry = df['country'].value_counts().rename_axis('country').reset_index(name='count')

fig_bars_ctry = px.bar(df_ctry[:20], x='count', y='country', text='country',
                    title='Countries that Have Added the Most Content',
                    color_discrete_sequence=['#b20710'])
fig_bars_ctry.update_traces(hovertemplate=None, marker=dict(line=dict(width=0)))
fig_bars_ctry.update_xaxes(visible=False)
fig_bars_ctry.update_yaxes(visible=False, categoryorder='total ascending')
fig_bars_ctry.update_layout(height=600,
                            margin=dict(t=100, b=20, l=70, r=40),
                            hovermode='y unified',
                            plot_bgcolor='#333', paper_bgcolor='#333',
                            title_font=dict(size=25, color='#8a8d93', family='Lato, sans-serif'),
                            font=dict(color='#8a8d93', size=13))



<h4>Conclusion 4</h4>
<ul>
    <li>The USA is the country that has added the most content to Netflix, followed by India and the UK.</li>
    <li>From South America, Argentina and Brazil rank 19th and 20th respectively, with 52 and 51 titles added to Netflix.</li>
</ul>

<h4>Graph 5</h4>
<p>Now, let's look at how the content added by each country has evolved over the years.</p>

In [62]:
df_ctry2 = df.groupby('year_added')['country'].value_counts().reset_index(name='counts')

fig = px.choropleth(df_ctry2, locations='country', color='counts',
                    locationmode='country names',
                    animation_frame='year_added',
                    range_color=[0,200],
                    color_continuous_scale=px.colors.sequential.RdBu)

fig.update_layout(title='Dynamics of content by years')

fig.show()


<h4>Conclusion 5</h4>
<p>The analysis above shows a few interesting things about the content by country:</p>
<ul>
    <li>From 2008 to 2012, the USA was practically the only country adding content to Netflix; the only exception was a single Spanish movie added in 2011.</li>
    <li>Apart from that title, it was not until 2015 that countries outside North America started to contribute content to Netflix, and Mexico was the first Latin American country to do so.</li>
    <li>In 2016, India and several South American countries started to add content to Netflix; since then, India has been adding a large amount of content.</li>
</ul>
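<p>Observations such as "the first year a country appears" can also be read off the long table df_ctry2 instead of the animation. A sketch on a small hypothetical table with the same columns (country names and counts are made up):</p>

```python
import pandas as pd

# Hypothetical rows shaped like df_ctry2: one row per (year_added, country)
df_ctry2_toy = pd.DataFrame({
    'year_added': [2008, 2011, 2011, 2015, 2016, 2016, 2017],
    'country': ['Country A', 'Country A', 'Country B', 'Country C',
                'Country D', 'Country E', 'Country D'],
    'counts': [1, 10, 1, 5, 20, 3, 40],
})

# Earliest year in which each country added at least one title
first_year = df_ctry2_toy.groupby('country')['year_added'].min().sort_values()
print(first_year)
```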

<h4>Graph 6</h4>
<p>Now, let's look at the content ratings of the titles on Netflix, i.e. the maturity classifications such as TV-MA or PG-13 that indicate the intended audience.</p>

In [66]:
# Number of titles per content rating, as DataFrames with the columns [rating, count]
df_tvShow = (df.loc[df['type'] == 'TV Show', 'rating']
             .value_counts().rename_axis('rating').reset_index(name='count'))
df_movie = (df.loc[df['type'] == 'Movie', 'rating']
            .value_counts().rename_axis('rating').reset_index(name='count'))
# Negate the TV-show counts so their bars point left, giving a back-to-back chart
df_tvShow['count_neg'] = -df_tvShow['count']

# chart
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_yaxes=True, horizontal_spacing=0)

# bar plot for TV shows (left panel)
fig.add_trace(go.Bar(x=df_tvShow['count_neg'], y=df_tvShow['rating'], orientation='h', showlegend=True,
                     text=df_tvShow['count'], name='TV Show', marker_color='#221f1f'), row=1, col=1)

# bar plot for movies (right panel)
fig.add_trace(go.Bar(x=df_movie['count'], y=df_movie['rating'], orientation='h', showlegend=True,
                     text=df_movie['count'], name='Movie', marker_color='#b20710'), row=1, col=2)

# styling the chart
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False, categoryorder='total ascending', ticksuffix=' ', showline=False)
fig.update_traces(hovertemplate=None, marker=dict(line=dict(width=0)))
fig.update_layout(title='Content Ratings: TV Shows vs Movies',
                margin=dict(t=80, b=0, l=70, r=40),
                hovermode='y unified',
                xaxis_title=' ', yaxis_title=' ',
                plot_bgcolor= '#333', paper_bgcolor='#333',
                title_font=dict(size=25, color='#8a8d93', family='Lato, sans-serif'),
                font=dict(color='#8a8d93'),
                legend=dict(orientation='h', yanchor='bottom', y=1, xanchor='center', x=0.5),
                hoverlabel=dict(bgcolor='black', font_size=13, font_family= 'Lato, sans-serif'))

fig.show()


<h4>Conclusion 6</h4>
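<p>For both TV shows and movies, the most common content rating on Netflix is TV-MA, followed by TV-14. Keep in mind that these are maturity ratings, which describe the intended audience, not audience scores, so they do not tell us which titles viewers rated most highly.</p>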
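<p>Because rating holds maturity classifications, one way to make the comparison easier to read is to group the ratings by target audience. The mapping below is a hypothetical grouping chosen for illustration, not part of the dataset, and the sample ratings are made up:</p>

```python
import pandas as pd

# Hypothetical grouping of US TV and film ratings into broad audiences
AUDIENCE = {
    'TV-MA': 'Adults', 'R': 'Adults', 'NC-17': 'Adults',
    'TV-14': 'Teens', 'PG-13': 'Teens',
    'TV-PG': 'Older kids', 'PG': 'Older kids', 'TV-Y7': 'Older kids',
    'TV-G': 'All ages', 'G': 'All ages', 'TV-Y': 'All ages',
}

ratings = pd.Series(['TV-MA', 'TV-14', 'R', 'PG-13', 'TV-MA', 'NR'])

# Ratings missing from the mapping (e.g. NR, not rated) end up in their own group
audience = ratings.map(AUDIENCE).fillna('Unrated/other')
counts = audience.value_counts()
print(counts.to_dict())
```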

<h4>Graph 7</h4>
<p>Now, let's analyze the most common genres, which we use as a proxy for the preferred ones.</p>

In [68]:
# Titles per genre for movies, as a DataFrame with the columns [genre, count]
df_m = (df.loc[df['type'] == 'Movie', 'genre']
        .value_counts().rename_axis('genre').reset_index(name='count'))

fig_bars = px.bar(df_m[:5], x='count', y='genre', text='genre',
                title='Most Common Genres for Movies',
                color_discrete_sequence=['#b20710'])
fig_bars.update_traces(hovertemplate=None, marker=dict(line=dict(width=0)))
fig_bars.update_xaxes(visible=False)
fig_bars.update_yaxes(visible=False, categoryorder='total ascending')
fig_bars.update_layout(height=300,
                    margin=dict(t=100, b=20, l=70, r=40),
                    hovermode="y unified",
                    plot_bgcolor='#333', paper_bgcolor='#333',
                    title_font=dict(size=25, color='#8a8d93',family="Lato, sans-serif"),
                    font=dict(color='#8a8d93', size=13))


In [70]:
# Titles per genre for TV shows, as a DataFrame with the columns [genre, count]
df_tv = (df.loc[df['type'] == 'TV Show', 'genre']
         .value_counts().rename_axis('genre').reset_index(name='count'))

fig_tv = px.bar(df_tv[:5], x='count', y='genre', text='genre',
                title='Most Common Genres for TV Shows',
                color_discrete_sequence=['#b20710'])
fig_tv.update_traces(hovertemplate=None, marker=dict(line=dict(width=0)))
fig_tv.update_xaxes(visible=False)
fig_tv.update_yaxes(visible=False, categoryorder='total ascending')
fig_tv.update_layout(height=300,
                    margin=dict(t=100, b=20, l=70, r=40),
                    hovermode="y unified",
                    plot_bgcolor='#333', paper_bgcolor='#333',
                    title_font=dict(size=25, color='#8a8d93',family="Lato, sans-serif"),
                    font=dict(color='#8a8d93', size=13))

fig_tv.show()


<h4>Conclusion 7</h4>
<p>From the graphs above, the most common genres (based on the first genre listed for each title) are:</p>
<ul>
    <li>For movies, dramas and comedies.</li>
    <li>For TV shows, international TV shows and crime TV shows.</li>
</ul>


<h4>Graph 8</h4>
<p>What about the number of movies added over the years?</p>

In [73]:
d2 = df[df["type"] == "Movie"]
col = "year_added"

# Movies added per year, with the columns [year_added, count, percent]
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)

vc2

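<table>
    <thead>
        <tr><th></th><th>year_added</th><th>count</th><th>percent</th></tr>
    </thead>
    <tbody>
        <tr><td>12</td><td>2008</td><td>1</td><td>0.019286</td></tr>
        <tr><td>11</td><td>2009</td><td>2</td><td>0.038573</td></tr>
        <tr><td>13</td><td>2010</td><td>1</td><td>0.019286</td></tr>
        <tr><td>8</td><td>2011</td><td>13</td><td>0.250723</td></tr>
        <tr><td>10</td><td>2012</td><td>3</td><td>0.057859</td></tr>
        <tr><td>9</td><td>2013</td><td>6</td><td>0.115718</td></tr>
        <tr><td>7</td><td>2014</td><td>14</td><td>0.27001</td></tr>
        <tr><td>6</td><td>2015</td><td>47</td><td>0.906461</td></tr>
        <tr><td>5</td><td>2016</td><td>195</td><td>3.760849</td></tr>
        <tr><td>4</td><td>2017</td><td>702</td><td>13.539055</td></tr>
    </tbody>
</table>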


In [77]:
# Waterfall of year-over-year changes: the first bar is an absolute count and every
# later bar is the change from the previous year, so each bar ends at that year's count
counts = vc2['count'].tolist()
deltas = [counts[0]] + [cur - prev for prev, cur in zip(counts, counts[1:])]

fig2 = go.Figure(go.Waterfall(
    name="Movie", orientation="v",
    measure=["absolute"] + ["relative"] * (len(counts) - 1),
    x=vc2["year_added"].astype(str).tolist(),
    textposition="auto",
    text=[str(c) for c in counts],  # label each bar with the number of movies added that year
    y=deltas,
    connector={"line": {"color": "#b20710"}},
    increasing={"marker": {"color": "#b20710"}},
    decreasing={"marker": {"color": "orange"}},
    totals={"marker": {"color": "#b20710"}},
))
fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Movies Added over the Years', height=350,
                margin=dict(t=80, b=20, l=50, r=50),
                hovermode="x unified",
                xaxis_title=' ', yaxis_title=" ",
                plot_bgcolor='#333', paper_bgcolor='#333',
                title_font=dict(size=25, color='#8a8d93',family="Lato, sans-serif"),
                font=dict(color='#8a8d93'))


In [76]:
d3 = df[df["type"] == "TV Show"]
col = "year_added"

# TV shows added per year, with the columns [year_added, count, percent]
vc3 = d3[col].value_counts().rename_axis(col).reset_index(name="count")
vc3['percent'] = 100 * vc3['count'] / vc3['count'].sum()
vc3 = vc3.sort_values(col)

vc3


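<table>
    <thead>
        <tr><th></th><th>year_added</th><th>count</th><th>percent</th></tr>
    </thead>
    <tbody>
        <tr><td>7</td><td>2013</td><td>1</td><td>0.680272</td></tr>
        <tr><td>6</td><td>2015</td><td>3</td><td>2.040816</td></tr>
        <tr><td>5</td><td>2016</td><td>7</td><td>4.761905</td></tr>
        <tr><td>3</td><td>2017</td><td>22</td><td>14.965986</td></tr>
        <tr><td>4</td><td>2018</td><td>16</td><td>10.884354</td></tr>
        <tr><td>1</td><td>2019</td><td>29</td><td>19.727891</td></tr>
        <tr><td>0</td><td>2020</td><td>43</td><td>29.251701</td></tr>
        <tr><td>2</td><td>2021</td><td>26</td><td>17.687075</td></tr>
    </tbody>
</table>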


In [78]:
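# Same waterfall for TV shows: absolute first bar, then year-over-year changes
counts_tv = vc3['count'].tolist()
deltas_tv = [counts_tv[0]] + [cur - prev for prev, cur in zip(counts_tv, counts_tv[1:])]

fig3 = go.Figure(go.Waterfall(
    name="TV Show", orientation="v",
    measure=["absolute"] + ["relative"] * (len(counts_tv) - 1),
    x=vc3["year_added"].astype(str).tolist(),
    textposition="auto",
    text=[str(c) for c in counts_tv],  # label each bar with the number of TV shows added that year
    y=deltas_tv,
    connector={"line": {"color": "#b20710"}},
    increasing={"marker": {"color": "#b20710"}},
    decreasing={"marker": {"color": "orange"}},
    totals={"marker": {"color": "#b20710"}},
))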
fig3.update_xaxes(showgrid=False)
fig3.update_yaxes(showgrid=False, visible=False)
fig3.update_traces(hovertemplate=None)
fig3.update_layout(title='TV Shows Added over the Years', height=350,
                margin=dict(t=80, b=20, l=50, r=50),
                hovermode="x unified",
                xaxis_title=' ', yaxis_title=" ",
                plot_bgcolor='#333', paper_bgcolor='#333',
                title_font=dict(size=25, color='#8a8d93',family="Lato, sans-serif"),
                font=dict(color='#8a8d93'))


<h4>Conclusion 8</h4>
<p>The waterfall charts show that fewer movies were added in 2010 and 2012 than in the previous year. However, the largest drops came in 2020 and 2021, which may be attributed to the worldwide impact of COVID-19.</p>
<p>For TV shows, 2018 saw fewer titles added than 2017 (16 vs 22). Unlike movies, however, 2020 was a record year for TV shows, perhaps because people preferred binge-watching series to watching movies during lockdowns. In 2021, both movies and TV shows fell, although the dataset only covers part of that year (the most recent titles were added in September 2021).</p>
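<p>The year-over-year changes behind these observations can be computed directly with Series.diff. A sketch using the TV-show counts from the vc3 table above; note that 2014 is absent there, so the 2015 change is measured against 2013:</p>

```python
import pandas as pd

# TV shows added per year, taken from the vc3 output above
tv = pd.Series([1, 3, 7, 22, 16, 29, 43, 26],
               index=[2013, 2015, 2016, 2017, 2018, 2019, 2020, 2021])

change = tv.diff()            # change versus the previous row; NaN for the first year
drops = change[change < 0]    # years with fewer titles added than the year before
print(drops.to_dict())
```

<p>Only 2018 (-6) and 2021 (-17) show a decrease for TV shows.</p>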

<h4>Graph 9</h4>
<p>Finally, let's look at the duration of the movies.</p>

In [105]:
import cufflinks as cf

df_movies = df.loc[df['type'] == 'Movie'].copy()  # copy to avoid modifying a slice of df
df_movies['duration'] = df_movies['duration'].astype(float)

cf.go_offline()
cf.set_config_file(sharing='public', theme='space',offline=True, world_readable=True)

df_movies['duration'].iplot(kind='hist', xTitle='Duration (minutes)', yTitle='Num of Movies',
                            title='Histogram: Time Duration of Movies', color='#b20710')


<h4>Conclusion 9</h4>
<ul>
    <li>Very few short films have been added to Netflix.</li>
    <li>There are also some long films, lasting more than 3 hours!</li>
    <li>In general, most movies last around 100 minutes.</li>
</ul>
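<p>These statements can be quantified with a few boolean counts; with the notebook's data the series would be df_movies['duration']. A sketch on made-up durations, where the 40-minute cutoff for a short film and the 180-minute cutoff for a long one are assumptions chosen for illustration:</p>

```python
import pandas as pd

# Made-up movie durations in minutes
durations = pd.Series([90.0, 104.0, 125.0, 95.0, 200.0, 12.0, 100.0])

SHORT_CUTOFF = 40   # assumed threshold for a short film, in minutes
LONG_CUTOFF = 180   # 3 hours

n_short = int((durations <= SHORT_CUTOFF).sum())
n_long = int((durations > LONG_CUTOFF).sum())
median = durations.median()
print(n_short, n_long, median)
```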

<h4>Final Conclusions</h4>
<p>In this notebook, we carried out several analyses to understand the Netflix data and drew a number of conclusions from them. We can now answer the four questions raised at the beginning.</p>
<ol>
    <li><b>What are the most preferred genres?</b> From Conclusion 7, the most common genre for movies is drama, and for TV shows it is international TV shows.</li>
    <li><b>Which Netflix shows have the highest ratings?</b> From Conclusion 6, the most common rating for both movies and TV shows is TV-MA (Mature Audience Only), i.e., programs specifically designed to be viewed by adults and therefore possibly unsuitable for children under 17. Since the dataset only contains maturity ratings, not audience scores, it cannot tell us which shows viewers rated most highly.</li>
    <li><b>What’s the best time of the year to release a show on Netflix?</b> From Conclusion 3, the best period of the year to release content appears to be the last quarter, i.e., from October to December.</li>
    <li><b>Which countries have added the most content?</b> From Conclusion 4, the countries that have added the most content are the USA, followed by India and the United Kingdom.</li>
</ol>