
In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"


In [None]:
netflix_data = pd.read_csv("/netflix_content_2023 new.csv", encoding='latin1')  # 'ISO-8859-1' is an alias of latin1; 'cp1252' is a close alternative

In [None]:
netflix_data.head()

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,Yes,23/03/23,812100000,English,Show
1,Ginny & Georgia: Season 2,Yes,05/01/23,665100000,English,Show
2,The Glory: Season 1 // _ ___: __ 1,Yes,30/12/22,622800000,Korean,Show
3,Wednesday: Season 1,Yes,23/11/22,507700000,English,Show
4,Queen Charlotte: A Bridgerton Story,Yes,04/05/23,503000000,English,Movie


Let me start by cleaning and preprocessing the “Hours Viewed” column to prepare it for analysis:

In [None]:
netflix_data['Hours Viewed'] = netflix_data['Hours Viewed'].replace(',', '', regex=True).astype(float)
netflix_data[['Title', 'Hours Viewed']].head()

Unnamed: 0,Title,Hours Viewed
0,The Night Agent: Season 1,812100000.0
1,Ginny & Georgia: Season 2,665100000.0
2,The Glory: Season 1 // _ ___: __ 1,622800000.0
3,Wednesday: Season 1,507700000.0
4,Queen Charlotte: A Bridgerton Story,503000000.0
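
The cleaning step can be tried on a small stand-in frame. The sketch below assumes the raw values are strings with thousands separators such as "812,100,000" (an assumption, since the raw CSV is not shown here); the `sample` frame and its "n/a" entry are hypothetical. It uses `pd.to_numeric` with `errors="coerce"` as a slightly more defensive alternative to `astype(float)`, so an unparseable entry becomes NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical sample; assumes raw values are strings with thousands separators.
sample = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Hours Viewed": ["812,100,000", "665,100,000", "n/a"],
})

# Strip the commas as plain text, then convert to numbers.
# errors="coerce" turns anything that still cannot be parsed into NaN.
sample["Hours Viewed"] = pd.to_numeric(
    sample["Hours Viewed"].str.replace(",", "", regex=False),
    errors="coerce",
)

print(sample)
```

Counting the NaN values afterwards (`sample["Hours Viewed"].isna().sum()`) is a quick way to see how many rows failed to convert.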


The “Hours Viewed” column has been successfully cleaned and converted to a numeric format. Now, I’ll analyze trends in content type to determine whether shows or movies dominate viewership. Let’s visualize the distribution of total viewership hours between Shows and Movies:




In [None]:
content_type_viewership = netflix_data.groupby('Content Type')['Hours Viewed'].sum()



In [None]:
fig = go.Figure(data=[
    go.Bar(
        x=content_type_viewership.index,
        y=content_type_viewership.values,
        marker_color=['skyblue', 'salmon']
    )
])
fig.show()


In [None]:
fig.update_layout(
    title='Total Viewership Hours by Content Type (2023)',
    xaxis_title='Content Type',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=0,
    height=500,
    width=500
)
fig.show()
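
The y-axis title mentions billions, while the values passed to the chart are raw hours. One way to make the numbers match the label is to rescale the totals before plotting. Here is a minimal sketch using only the five rows shown by `netflix_data.head()` above, so the totals are for illustration and not the full-dataset figures; the names `head`, `totals_billions` and `share_pct` are introduced here.

```python
import pandas as pd

# The five rows shown in the head() output above (not the full dataset).
head = pd.DataFrame({
    "Content Type": ["Show", "Show", "Show", "Show", "Movie"],
    "Hours Viewed": [812100000.0, 665100000.0, 622800000.0,
                     507700000.0, 503000000.0],
})

totals = head.groupby("Content Type")["Hours Viewed"].sum()

totals_billions = totals / 1e9            # raw hours -> billions of hours
share_pct = totals / totals.sum() * 100   # each type's share of the total

print(totals_billions.round(3))
print(share_pct.round(1))
```

Passing `totals_billions.values` as `y` to `go.Bar` would then give a chart whose tick values are already in billions.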

Next, let’s analyze the distribution of viewership across different languages to understand which languages are contributing the most to Netflix’s content consumption:

In [None]:
language_viewership = netflix_data.groupby('Language Indicator')['Hours Viewed'].sum().sort_values(ascending=False)

fig = go.Figure(data=[
    go.Bar(
        x=language_viewership.index,
        y=language_viewership.values,
        marker_color='lightcoral'
    )
])


In [None]:
fig.update_layout(
    title='Total Viewership Hours by Language (2023)',
    xaxis_title='Language',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=45,
    height=600,
    width=1000
)

fig.show()

Next, I’ll analyze how viewership varies based on release dates to identify any trends over time, such as seasonality or patterns around specific months:

In [None]:
netflix_data['Release Date'] = pd.to_datetime(netflix_data['Release Date'])
netflix_data['Release Month'] = netflix_data['Release Date'].dt.month


Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
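
The warning matters here because the raw dates look day-first ("23/03/23", "30/12/22"), and element-by-element parsing can read an ambiguous value such as "05/01/23" month-first. In the top 5 table further down, that value shows up as 2023-05-01. A minimal sketch of parsing with an explicit format, assuming every value follows day/month/two-digit-year (only the five dates from the `head()` output are checked here):

```python
import pandas as pd

# Release dates exactly as they appear in the head() output above.
raw = pd.Series(["23/03/23", "05/01/23", "30/12/22", "23/11/22", "04/05/23"])

# Assumption: all values are day/month/two-digit-year.
parsed = pd.to_datetime(raw, format="%d/%m/%y")
months = parsed.dt.month

print(parsed.dt.strftime("%Y-%m-%d").tolist())
print(months.tolist())
```

With an explicit format, "05/01/23" is read as 5 January 2023, and a value that does not match the pattern raises an error instead of being silently reinterpreted.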



In [None]:
monthly_viewership = netflix_data.groupby('Release Month')['Hours Viewed'].sum()

In [None]:
fig = go.Figure(data=[
    go.Scatter(
        x=monthly_viewership.index,
        y=monthly_viewership.values,
        mode='lines+markers',
        marker=dict(color='blue'),
        line=dict(color='blue')
    )
])


In [None]:
fig.update_layout(
    title='Total Viewership Hours by Release Month (2023)',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height=600,
    width=1000
)

fig.show()
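
The tick labels above assume all twelve months are present in `monthly_viewership`. If a month has no releases, `groupby` simply omits it and the line is drawn straight across the gap. A small sketch of filling such gaps with `reindex`, using hypothetical monthly totals (the numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical monthly totals with some months missing.
monthly = pd.Series({1: 5.0, 3: 2.5, 12: 4.0}, name="Hours Viewed")

# Reindex onto months 1..12 so missing months appear as 0 instead of being skipped.
full_year = monthly.reindex(range(1, 13), fill_value=0.0)

print(full_year)
```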

In [None]:
# extract the top 5 titles based on viewership hours
top_5_titles = netflix_data.nlargest(5, 'Hours Viewed')
top_5_titles

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type,Release Month
0,The Night Agent: Season 1,Yes,2023-03-23,812100000.0,English,Show,3
1,Ginny & Georgia: Season 2,Yes,2023-05-01,665100000.0,English,Show,5
18227,King the Land: Limited Series // ____: ____ ___,Yes,2023-06-17,630200000.0,Korean,Movie,6
2,The Glory: Season 1 // _ ___: __ 1,Yes,2022-12-30,622800000.0,Korean,Show,12
18214,ONE PIECE: Season 1,Yes,2023-08-31,541900000.0,English,Show,8
