In [129]:
# Import the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [132]:
# Load the dataset from a CSV file
df=pd.read_csv("netflix_titles.csv")

In [131]:
df.head(5)  # look at the first 5 rows of the DataFrame

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Action & Adventure","To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV Comedies","In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life."


In [133]:
df.info()  # high-level overview: row count, column names, non-null counts, dtypes


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


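### What do we observe?
The output above shows the total number of rows, the name and number of columns, and their data types. It also shows that some columns (`director`, `cast`, `country`, `date_added`, `rating`, `duration`) contain null values.

Another approach is to compute the count and percentage of missing values for each column: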

In [20]:
def missing_pct(df):
    # Calculate missing value and their percentage for each column
    missing_count_percent = df.isnull().sum() * 100 / df.shape[0]
    df_missing_count_percent = pd.DataFrame(missing_count_percent).round(2)
    df_missing_count_percent = df_missing_count_percent.reset_index().rename(
                    columns={
                            'index':'Column',
                            0:'Missing_Percentage (%)'
                    }
                )
    df_missing_value = df.isnull().sum()
    df_missing_value = df_missing_value.reset_index().rename(
                    columns={
                            'index':'Column',
                            0:'Missing_value_count'
                    }
                )
    # Combine counts and percentages, then sort by percentage (highest first)
    Final = df_missing_value.merge(df_missing_count_percent, how='inner', on='Column')
    Final = Final.sort_values(by='Missing_Percentage (%)', ascending=False)
    return Final

missing_pct(df)

Unnamed: 0,Column,Missing_value_count,Missing_Percentage (%)
3,director,2634,29.91
5,country,831,9.44
4,cast,825,9.37
6,date_added,10,0.11
8,rating,4,0.05
9,duration,3,0.03
0,show_id,0,0.0
1,type,0,0.0
2,title,0,0.0
7,release_year,0,0.0


The function ```missing_pct``` takes a DataFrame as input and returns a DataFrame in which each row corresponds to a column of the original and contains the column's name, the number of missing values in it, and the percentage of missing values.
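
The same summary can be built more compactly. The sketch below is an illustrative alternative, not the function used above: the name `missing_summary` and the toy DataFrame are assumptions made for the example, and the toy values are not taken from `netflix_titles.csv`.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Return missing-value counts and percentages per column, largest first."""
    counts = frame.isnull().sum()
    percents = (frame.isnull().mean() * 100).round(2)
    summary = pd.concat(
        [counts.rename("Missing_value_count"),
         percents.rename("Missing_Percentage (%)")],
        axis=1,
    )
    return (summary.rename_axis("Column")
                   .reset_index()
                   .sort_values("Missing_Percentage (%)", ascending=False))

# Toy data (hypothetical, only for illustration)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [np.nan, "X", np.nan, np.nan],
    "country": ["US", np.nan, "IN", "UK"],
})
result = missing_summary(toy)
print(result)
```

Using `isnull().mean()` gives the fraction of missing values directly, so no division by `df.shape[0]` and no merge step is needed.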


<a id="section-two"></a>
# Handling the missing data and deleting duplicates

It is important to handle missing data because statistical results based on a dataset with non-random missing values can be biased. So we first want to check whether the values are missing at random or not.

One option is to drop columns that have a high share of missing values.

Another is imputation (filling the missing values using available information such as the mean, median, or mode), but we should look carefully at the pattern of the column before imputing.

Here we handle the columns as follows:

1. Rating - manually correcting the data using information from the Netflix website

2. Country - replacing missing countries with the most common country

3. Cast - replacing null values with "No Data"

4. Director - replacing null values with "No Data"

Any rows that still contain missing values (in `date_added`, `rating`, or `duration`) are then dropped.
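
One simple way to check whether values are missing at random is to compare the share of missing values across groups. The sketch below is only an illustration under assumptions: the toy DataFrame is hypothetical, and grouping by `type` is just one possible grouping to inspect.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 'director' is missing more often for TV shows
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "director": ["A", "B", np.nan, np.nan, np.nan, "C"],
})

# Percentage of missing 'director' values within each content type
missing_by_type = (
    toy["director"].isnull()
       .groupby(toy["type"])
       .mean()
       .mul(100)
       .round(1)
)
print(missing_by_type)
```

If the percentages differ strongly between groups, the values are probably not missing at random, and filling them with a single constant or the mode deserves extra care.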

In [26]:
df["rating"]

0       PG-13
1       TV-MA
2       TV-MA
3       TV-MA
4       TV-MA
        ...  
8802        R
8803    TV-Y7
8804        R
8805       PG
8806    TV-14
Name: rating, Length: 8807, dtype: object

In [32]:
# For a few titles the rating column holds a duration (e.g. '74 min') in the input file; correct it using the maturity rating checked online

df['rating'] = df['rating'].replace({'74 min': 'TV-MA', '84 min': 'TV-MA', '66 min': 'TV-MA'})
df['rating'] = df['rating'].replace({'TV-Y7-FV': 'TV-Y7'})

In [35]:
df['rating'].unique() # these are all the unique values of rating

array(['PG-13', 'TV-MA', 'PG', 'TV-14', 'TV-PG', 'TV-Y', 'TV-Y7', 'R',
       'TV-G', 'G', 'NC-17', 'NR', nan, 'UR'], dtype=object)

In [36]:
# Renaming values for Rating for better understanding
# Source : https://help.netflix.com/en/node/2064
df['rating'] = df['rating'].replace({
                'PG-13': 'Teens - Age above 12',
                'TV-MA': 'Adults',
                'PG': 'Kids - with parental guidance',
                'TV-14': 'Teens - Age above 14',
                'TV-PG': 'Kids - with parental guidance',
                'TV-Y': 'Kids',
                'TV-Y7': 'Kids - Age above 7',
                'R': 'Adults',
                'TV-G': 'Kids',
                'G': 'Kids',
                'NC-17': 'Adults',
                'NR': 'NR',
                'UR': 'UR'
})

In [37]:
df["rating"]

0                Teens - Age above 12
1                              Adults
2                              Adults
3                              Adults
4                              Adults
                    ...              
8802                           Adults
8803               Kids - Age above 7
8804                           Adults
8805    Kids - with parental guidance
8806             Teens - Age above 14
Name: rating, Length: 8807, dtype: object

In [38]:
df['country'] = df['country'].fillna(df['country'].mode()[0])

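# Assign the result back instead of calling replace(..., inplace=True) on a column selection
df['cast'] = df['cast'].fillna('No Data')
df['director'] = df['director'].fillna('No Data')

# Drop the remaining rows with missing values (date_added, rating, duration)
df.dropna(inplace=True)

# Drop duplicates
df.drop_duplicates(inplace=True)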

In [45]:
# Split the genres into separate rows for use in the visualizations later

df_genre = df[['show_id', 'title','type', 'listed_in' ]]
df_genre = (df_genre.drop('listed_in', axis=1)
             .join
             (
             df_genre.listed_in
             .str
             .split(', ',expand=True)
             .stack()
             .reset_index(drop=True, level=1)
             .rename('listed_in')           
             ))


In [48]:
df_genre

Unnamed: 0,show_id,title,type,listed_in
0,s1,Dick Johnson Is Dead,Movie,Documentaries
1,s2,Blood & Water,TV Show,International TV Shows
1,s2,Blood & Water,TV Show,TV Dramas
1,s2,Blood & Water,TV Show,TV Mysteries
2,s3,Ganglands,TV Show,Crime TV Shows
...,...,...,...,...
8805,s8806,Zoom,Movie,Children & Family Movies
8805,s8806,Zoom,Movie,Comedies
8806,s8807,Zubaan,Movie,Dramas
8806,s8807,Zubaan,Movie,International Movies
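
The split-stack-join pattern used for `df_genre` can also be written with `Series.str.split` followed by `DataFrame.explode`. The sketch below is an alternative shown on a two-row toy frame built from the first two rows displayed earlier; the names `toy` and `genre_rows` are assumptions for illustration.

```python
import pandas as pd

# Two rows copied from the head of the dataset shown earlier
toy = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "title": ["Dick Johnson Is Dead", "Blood & Water"],
    "type": ["Movie", "TV Show"],
    "listed_in": ["Documentaries",
                  "International TV Shows, TV Dramas, TV Mysteries"],
})

# Turn the comma-separated string into a list, then emit one row per list element
genre_rows = (
    toy.assign(listed_in=toy["listed_in"].str.split(", "))
       .explode("listed_in")
)
print(genre_rows)
```

As with the join-based version, the original row index is repeated for every genre of a title, so the result can still be related back to `df` by index or by `show_id`.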


In [49]:
# Creating new columns

df['month'] = pd.DatetimeIndex(df['date_added']).month

In [50]:
df['month']

0        9
1        9
2        9
3        9
4        9
        ..
8802    11
8803     7
8804    11
8805     1
8806     3
Name: month, Length: 8790, dtype: int64

In [52]:
# Total number of titles (movies and TV shows)

df_count = df['show_id'].count()
print(df_count)


8790


In [53]:
# Split of movies and TV shows
df_type = df.groupby('type')['show_id'].count().reset_index()
df_type = df_type.rename(columns = {"show_id":"count_showids"})

In [58]:
df_type

Unnamed: 0,type,count_showids
0,Movie,6126
1,TV Show,2664


<a id="section-four"></a>
# Visualization

In [150]:
import plotly.subplots as sp
import plotly.graph_objects as go

# Define a function to create subplots
def make_subplots(rows, cols, specs):
    return sp.make_subplots(rows=rows, cols=cols, specs=specs)

# Create a new figure
fig = go.Figure()

# Add an indicator trace to the figure
fig.add_trace(go.Indicator(
    value=df_count
))

# Update the layout of the figure to customize the appearance
fig = fig.update_layout(
    template={'data': {'indicator': [{'title': {'text': "Total content on Netflix"}}]}}
)

fig = fig.update_layout(
    height=100,
    margin=dict(l=50, r=50, b=0, t=1)
)

# Show the figure
fig.show()

# Create subplots with 1 row and 2 columns, with specified plot types
fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'bar'}, {'type': 'pie'}]])

# Add a bar trace to the first subplot
fig.add_trace(
    go.Bar(
        x=df_type['count_showids'],
        y=df_type['type'],
        orientation='h',
        marker=dict(color=["Maroon", "blue"]),
        showlegend=False,
        text=df_type['count_showids'],
        textposition='auto'
    ),
    row=1,
    col=1
)

# Add a pie trace to the second subplot
fig.add_trace(
    go.Pie(
        labels=df_type['type'],
        values=df_type['count_showids'],
        marker_colors=["Maroon", "blue"]
    ),
    row=1,
    col=2
)

# Update the layout of the subplots
fig.update_layout(
    title_text="What type of content is more uploaded on Netflix?"
)

# Show the subplots
fig.show()


***We observe that there are more movies than TV shows on Netflix***


In [88]:
# Split the countries into separate rows
df_country = df[['show_id', 'title','type', 'country' ]]
df_country = (df_country.drop('country', axis=1)
             .join
             (
             df_country.country
             .str
             .split(', ',expand=True)
             .stack()
             .reset_index(drop=True, level=1)
             .rename('country')           
             ))


In [139]:
import plotly.express as px

# Calculate the total count of titles in each country (top 10).
# The next cell needs df_country_viz_total and final1, so this data preparation must run;
# the charts themselves are drawn with graph_objects in the next cell.
df_country_viz_total = df_country[["title", "country"]]
df_country_viz_total = df_country_viz_total.groupby(['country'])["title"].count().reset_index().sort_values('title', ascending=False).head(10)
df_country_viz_total = df_country_viz_total.rename(columns={"title": "movies_count"})

# # Create a bar chart for the top 10 countries with Netflix content
# fig1 = px.bar(df_country_viz_total, x='country', y='movies_count', color_discrete_sequence=px.colors.sequential.RdBu,
#               title='Top 10 countries with Netflix Content',)

# Calculate the count of titles in each country and the count of titles by type (Movie/TV Show)
df_country_viz = df_country[["title", "country"]]
df_country_viz = df_country_viz.groupby(['country'])["title"].count().reset_index().sort_values('title', ascending=False).head(10)

df_country_viz1 = df_country[["title", "type", "country"]]
df_country_viz1 = df_country_viz1.groupby(['country', 'type'])["title"].count().reset_index().sort_values('title', ascending=False)
df_country_viz1 = df_country_viz1.rename(columns={"title": "movies_count"})

# Merge the dataframes to calculate the percentage of each type (Movie/TV Show) in each country
final1 = df_country_viz.merge(df_country_viz1, how='left', left_on='country', right_on='country')
final1['percentage'] = (final1['movies_count'] / final1['title']) * 100
final1['percentage'] = final1['percentage'].round(1)
final1['percent_string'] = final1['percentage'].astype(str) + '%'

# # Create a bar chart showing the percentage of Movie/TV Show split for the top 10 countries
# fig2 = px.bar(final1, x='country', y='percentage', color='type',
#               title='Top 10 countries with Movie/TV show split')

# # Display the charts
# fig1.show()
# fig2.show()


In [148]:
fig = go.Figure()
fig.add_trace(
    
go.Bar(x= df_country_viz_total['country'], y= df_country_viz_total['movies_count'], marker_color = 'purple',
           text = df_country_viz_total['movies_count'], textposition='auto'))

fig.update_layout(title_text = "Top 10 countries with Netflix Content"
                  , yaxis=dict(title='Movies/TV Shows Count'))
fig.show()

final_movie = final1.query("type == 'Movie'")
final_show = final1.query("type == 'TV Show'")

fig = go.Figure()
fig.add_trace(go.Bar(
    x=  final_movie['country'],
    y= final_movie['percentage'],
    showlegend=True,
    text = final_movie['percent_string'], 
    textposition='auto',
    name='Movie',
    marker_color='skyblue'    
    
))
fig.add_trace(go.Bar(
    x= final_show['country'],
    y= final_show['percentage'],
    showlegend=True,
    text = final_show['percent_string'], 
    textposition='auto',
    name='TV Show',
    marker_color='grey' 
))



# Stack the Movie and TV Show bars so each country's percentages add up to 100
fig.update_layout(barmode='stack', title_text = 'Top 10 countries with Movie/TV show split'
                  , yaxis=dict(title='% Movies/TV Shows Count'))
fig.show()




# We see that the top 5 countries by amount of Netflix content are:
### ['United States', 'India', 'United Kingdom', 'Japan', 'South Korea']
 
#### We will use only those 5 countries' data to see their viewing habits
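
Restricting the analysis to those five countries can be sketched with `Series.isin`, followed by a row-normalized `pd.crosstab` for the Movie/TV Show split. The toy frame below is hypothetical and only mimics the shape of `df_country` (one row per title and country); the names `top5`, `toy_country`, `subset` and `split_pct` are assumptions for illustration.

```python
import pandas as pd

top5 = ["United States", "India", "United Kingdom", "Japan", "South Korea"]

# Toy rows (hypothetical) in the same shape as df_country
toy_country = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie"],
    "country": ["United States", "United States", "India",
                "France", "Japan", "India"],
})

# Keep only rows whose country is one of the five selected countries
subset = toy_country[toy_country["country"].isin(top5)]

# Percentage of movies vs TV shows within each country (each row sums to 100)
split_pct = (pd.crosstab(subset["country"], subset["type"], normalize="index") * 100).round(1)
print(split_pct)
```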

In [144]:
import plotly.express as px

# Selecting relevant columns from the dataframe
df_country_viz = df_country[["title", "type", "country"]]

# Grouping by country and type and counting the number of titles
df_country_viz = df_country_viz.groupby(['country', 'type'])["title"].count().reset_index().sort_values('title', ascending=False)

# Renaming the 'title' column to 'movies_count'
df_country_viz = df_country_viz.rename(columns={"title": "movies_count"})

# Filtering the dataframe for movies and selecting the top 10 countries
df_country_movie = df_country_viz.query("type == 'Movie'").head(10)

# Creating a bar chart for the top 10 countries with the most Netflix movies
fig1 = px.bar(df_country_movie, x='country', y='movies_count', color_discrete_sequence=['purple'],
              title='Top 10 countries with the most Netflix movies')

# Filtering the dataframe for TV shows and selecting the top 10 countries
df_country_tvshow = df_country_viz.query("type == 'TV Show'").head(10)

# Creating a bar chart for the top 10 countries with the most Netflix TV shows
fig2 = px.bar(df_country_tvshow, x='country', y='movies_count', color_discrete_sequence=['gray'],
              title='Top 10 countries with the most Netflix TV Shows')

# Displaying the charts
fig1.show()
fig2.show()


In [93]:
df_2 = df.query("type == 'Movie'")
df_2 = df_2[["title", "rating"]]
df_2 = df_2.groupby(['rating'])["title"].count().reset_index().sort_values('title', ascending = False)
df_2 = df_2.rename(columns = {"title": "movies_count"})
px.bar(df_2, x='rating', y='movies_count', color_discrete_sequence=px.colors.sequential.RdBu,
       title='Which rating category has the most content (Movies)?')


It seems that most movies on Netflix cater to adults, followed by teens.


In [94]:
df_3 = df.query("type == 'TV Show'")
df_3 = df_3[["title", "rating"]]
df_3 = df_3.groupby('rating')["title"].count().reset_index().sort_values('title', ascending = False)
df_3 = df_3.rename(columns = {"title": "movies_count"})
px.bar(df_3, x='rating', y='movies_count', color_discrete_sequence=['grey'],
       title='Which rating category has the most content (TV Shows)?')


It seems that most TV shows on Netflix cater to adults, followed by teens.
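
To put numbers on such statements, the share of each rating category within each content type can be computed with a grouped `value_counts(normalize=True)`. This is a minimal sketch on hypothetical toy data; the name `rating_share` is an assumption for illustration.

```python
import pandas as pd

# Toy data (hypothetical), using the renamed rating labels from above
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "TV Show"],
    "rating": ["Adults", "Adults", "Kids", "Adults", "Teens - Age above 14"],
})

# Percentage of each rating category within Movies and within TV Shows
rating_share = (
    toy.groupby("type")["rating"]
       .value_counts(normalize=True)
       .mul(100)
       .round(1)
)
print(rating_share)
```

The result is indexed by `(type, rating)`, so a single share can be looked up with `rating_share.loc[("Movie", "Adults")]`.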


In [95]:
df_5 = df.query("release_year >= 2007")
df_5 = df_5.groupby("release_year")["show_id"].count().reset_index()

fig = px.area(df_5, x='release_year', y='show_id', color_discrete_sequence=px.colors.sequential.RdBu,
      title='Overall content release Trend')
fig.show()

In 2007, Netflix introduced streaming media and video on demand. We see a slow start, but releases picked up in 2014-2015 and increased rapidly until 2018.

By 2018, the content on Netflix was 13 times that of 2007. It has declined since 2019, around the beginning of Covid. Another factor could be the launch of Disney Plus in 2019: films and television series produced by The Walt Disney Studios and Walt Disney Television, such as Marvel movies, moved to Disney Plus.
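
Note that the chart above groups by `release_year`. Assuming `date_added` records when a title was added to Netflix (as the column name suggests), a trend by year added can be sketched as below. The first two rows reuse dates shown in the head of the dataset; the third row and the names `toy`, `year_added`, `per_release_year` and `per_year_added` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "date_added": ["September 25, 2021", "September 24, 2021", "January 1, 2020"],
    "release_year": [2020, 2021, 2015],
})

# Parse the date string; strip() guards against stray leading/trailing spaces
added = pd.to_datetime(toy["date_added"].str.strip(), format="%B %d, %Y")
toy["year_added"] = added.dt.year

# Compare the two ways of counting titles per year
per_release_year = toy.groupby("release_year")["show_id"].count()
per_year_added = toy.groupby("year_added")["show_id"].count()
print(per_release_year)
print(per_year_added)
```

The two counts answer different questions: the first describes when content was produced, the second when it reached the catalogue.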



In [123]:
# Filter the DataFrame to include only titles released in or after 2007
df_4 = df.query("release_year >= 2007")

# Group the data by type (Movie/TV Show) and release_year, and count the number of show_id values
df_4 = df_4.groupby(["type", "release_year"])["show_id"].count().reset_index()

# Filter the DataFrame to include only movie data
df_4_movie = df_4.query("type == 'Movie'")

# Filter the DataFrame to include only TV show data
df_4_show = df_4.query("type == 'TV Show'")

# Create a new figure
fig = go.Figure()

# Add a scatter plot trace for movies
fig.add_trace(go.Scatter(
    x=df_4_movie['release_year'],
    y=df_4_movie['show_id'],
    showlegend=True,
    text=df_4_movie['show_id'],
    name='Movie',
    marker_color='Maroon'
))

# Add a scatter plot trace for TV shows
fig.add_trace(go.Scatter(
    x=df_4_show['release_year'],
    y=df_4_show['show_id'],
    showlegend=True,
    text=df_4_show['show_id'],
    name='TV Show',
    marker_color='Grey'
))

# Set the trace mode to include lines and markers
fig.update_traces(mode='lines+markers')

# Update the layout with a title for the plot
fig.update_layout(title_text='Movies/TV Shows Release Yearly Trend')

# Display the plot
fig.show()


It seems that Netflix focused on movies, and the movie count increased significantly until 2018. Since 2018 the movie count has declined, while TV shows have grown steadily.


In [124]:
import plotly.graph_objects as go

# Filter the DataFrame for records with release year greater than or equal to 2007
df_4 = df.query("release_year >= 2007")

# Select the required columns for analysis
df_4 = df_4[["type", "month", "release_year", "show_id"]]

# Group the data by release year, month, and type and calculate the count of show IDs
df_4 = df_4.groupby(['release_year', 'month', 'type'])['show_id'].count().reset_index()

# Rename the 'show_id' column to 'total_shows'
df_4 = df_4.rename(columns={"show_id": "total_shows"})

# Group the data by month and type and calculate the average number of shows
df_4 = df_4.groupby(['month', 'type'])['total_shows'].mean().reset_index()

# Filter the DataFrame for records with type 'Movie'
df_4_movie = df_4.query("type == 'Movie'")

# Filter the DataFrame for records with type 'TV Show'
df_4_show = df_4.query("type == 'TV Show'")

# Create a new figure object
fig = go.Figure()

# Add a scatter trace for movies
fig.add_trace(go.Scatter(
    x=df_4_movie['month'],
    y=df_4_movie['total_shows'],
    showlegend=True,
    text=df_4_movie['total_shows'],
    name='Movie',
    marker_color='Maroon'
))

# Add a scatter trace for TV shows
fig.add_trace(go.Scatter(
    x=df_4_show['month'],
    y=df_4_show['total_shows'],
    showlegend=True,
    text=df_4_show['total_shows'],
    name='TV Show',
    marker_color='Grey'
))

# Update trace mode to display lines and markers
fig.update_traces(mode='lines+markers')

# Update the layout with the title
fig.update_layout(title_text='Movies/TV Shows Average Monthly Release Trend')

# Display the figure
fig.show()


There appears to be no specific pattern of more movies being added in particular months.


In [125]:
import plotly.graph_objects as go

def trend_yearwise(year):
    # Create the title for the plot
    title = f"Movies/TV Show Release Month Trend for year {year}"

    # Filter the dataframe for the specified year
    df_year = df[df["release_year"] == year]

    # Group the data by type (Movie/TV Show) and month, and count the number of shows
    df_monthly = df_year.groupby(["type", "month"])["show_id"].count().reset_index()

    # Filter the data for movies
    df_movies = df_monthly[df_monthly["type"] == "Movie"]

    # Filter the data for TV shows
    df_tv_shows = df_monthly[df_monthly["type"] == "TV Show"]

    # Create a new figure
    fig = go.Figure()

    # Add the scatter plot for movies
    fig.add_trace(
        go.Scatter(
            x=df_movies["month"],
            y=df_movies["show_id"],
            showlegend=True,
            text=df_movies["show_id"],
            name="Movie",
            marker_color="Maroon"
        )
    )

    # Add the scatter plot for TV shows
    fig.add_trace(
        go.Scatter(
            x=df_tv_shows["month"],
            y=df_tv_shows["show_id"],
            showlegend=True,
            text=df_tv_shows["show_id"],
            name="TV Show",
            marker_color="Grey"
        )
    )

    # Update the traces to display lines and markers
    fig.update_traces(mode="lines+markers")

    # Update the layout with the title
    fig.update_layout(title_text=title)

    # Show the figure
    fig.show()

# Call the function with the desired year
trend_yearwise(2019)


In [126]:
import plotly.express as px

# Select the necessary columns from the DataFrame
df_genre_viz = df_genre[["title", "type", "listed_in"]]

# Group the data by genre and type, and count the number of titles in each group
df_genre_viz = df_genre_viz.groupby(['listed_in', 'type'])["title"].count().reset_index().sort_values('title')

# Rename the columns for better readability
df_genre_viz = df_genre_viz.rename(columns={"title": "movies_count", "listed_in": "Genre"})

# Filter the data for movies
df_genre_movie = df_genre_viz.query("type == 'Movie'")

# Filter the data for TV shows
df_genre_tvshow = df_genre_viz.query("type == 'TV Show'")

# Create a bar chart for movies
fig1 = px.bar(
    df_genre_movie,
    x='movies_count',
    y='Genre',
    color_discrete_sequence=px.colors.sequential.RdBu,
    title='Which genre has the most content (Movies)?',
    height=600
)

# Create a bar chart for TV shows
fig2 = px.bar(
    df_genre_tvshow,
    x='Genre',
    y='movies_count',
    color_discrete_sequence=['Grey'],
    title='Which genre has the most content (Shows)?'
)

# Display the bar charts
fig1.show() 
fig2.show()



It looks like the most common genres for both TV shows and movies are international titles and dramas.

In [127]:
# Filter the dataframe for TV shows
df_9 = df.query("type == 'TV Show'")

# Select the required columns
df_9 = df_9[["title", "duration"]]

# Group by duration and count the number of TV shows in each duration category
df_9 = df_9.groupby(['duration'])["title"].count().reset_index().sort_values('title', ascending=False)

# Rename the columns for clarity
df_9 = df_9.rename(columns={"title": "TV Shows", "duration": "Seasons"})

# Filter the dataframe for movies (copy so the column assignment below does not act on a slice of df)
df_10 = df.query("type == 'Movie'").copy()

# Clean and convert the duration values (e.g. "90 min") to integers
df_10['duration'] = df_10['duration'].fillna("0")
df_10['duration'] = df_10['duration'].str.split(" ").str[0].astype(int)

# Create a bar chart for TV show seasons
fig_show = px.bar(df_9, x='Seasons', y='TV Shows', color_discrete_sequence=['grey'],
                  title='TV Shows by Seasons')

# Create a histogram for movie durations
fig_Movie = px.histogram(df_10, x="duration", nbins=20, color_discrete_sequence=px.colors.sequential.RdBu,
                         title="Movie Duration")

# Display the figures
fig_Movie.show()
fig_show.show()




Most movies on Netflix run between 80 and 120 minutes, with very few longer than 150 minutes.
Most shows on Netflix have only one season.
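
The `duration` column mixes two units: minutes for movies and seasons for TV shows. A minimal sketch of splitting it into a number and a unit with `Series.str.extract` is shown below. The three values come from the first rows of the dataset shown earlier; the column names `duration_value` and `duration_unit` are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show"],
    "duration": ["90 min", "2 Seasons", "1 Season"],
})

# Capture the leading number and the unit word as two named groups
parts = toy["duration"].str.extract(r"^(?P<value>\d+)\s+(?P<unit>\w+)$")

toy["duration_value"] = parts["value"].astype(int)
# Normalize 'Season'/'Seasons' to 'season'; 'min' stays 'min'
toy["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")
print(toy)
```

With the unit in its own column, movies and TV shows can be filtered by `duration_unit` rather than by splitting the string separately for each chart.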


# Conclusion

We performed an exploratory data analysis of the Netflix titles dataset (movies and TV shows).

**Our observations are as follows:**

<li>We observe that there are `more movies than TV shows` on Netflix.</li>

<li>We see that the top 5 countries by amount of Netflix content are: ['United States', 'India', 'United Kingdom', 'Japan', 'South Korea']</li>

<li>It seems that most TV shows on Netflix cater to adults, followed by teens.</li>
<li>It seems that most movies on Netflix cater to adults, followed by teens.</li>
<li>In 2007, Netflix introduced streaming media and video on demand. We see a slow start, but releases picked up in 2014-2015 and increased rapidly until 2018.</li>

<li>By 2018, the content on Netflix was 13 times that of 2007. It has declined since 2019, around the beginning of Covid.</li>
<li>Another factor could be the launch of Disney Plus in 2019: films and television series produced by The Walt Disney Studios and Walt Disney Television, such as Marvel movies, moved to Disney Plus.</li>

<li>It seems that Netflix focused on movies, and the movie count increased significantly until 2018. Since 2018 the movie count has declined, while TV shows have grown steadily.</li>
<li>There appears to be no specific pattern of more movies being added in particular months.</li>
<li>It looks like the most common genres for both TV shows and movies are international titles and dramas.</li>
<li>Most movies on Netflix run between 80 and 120 minutes, with very few longer than 150 minutes.</li>
<li>Most shows on Netflix have only one season.</li>
