<a href="https://colab.research.google.com/github/iamgirupashankar/Netflix-Content-Strategy-Analysis/blob/main/netflix_content_strategy_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Content Strategy Analysis means analyzing how content is created, released, distributed, and consumed to achieve specific goals, such as maximizing audience engagement, viewership, brand reach, or revenue.**

# Netflix Content Strategy Analysis: Getting Started  
For the task of Netflix Content Strategy Analysis, we need data on content titles, type (show or movie), genre, language, and release details (date, day of the week, season) to understand timing and content performance. Viewership metrics such as hours viewed are also crucial for measuring audience engagement.  

I found a suitable dataset for this task. It contains the title, release date, language, content type (show or movie), global availability status, and hours viewed for shows and movies on Netflix, covering viewership in 2023. Not every title was released in 2023 (some, such as The Glory: Season 1, came out in late 2022), and the dataset has no genre column.

# Data Loading
Now, let’s get started with the Netflix Content Strategy Analysis in Python by importing the necessary libraries and the dataset:

In [66]:
```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

# this path points to a Google Drive folder mounted in Colab; change it to wherever your copy of netflix_content_2023.csv lives
netflix_data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/thecleverprogrammer/Data Analytics Projects with Python/Market and Research Analytics/1. Netflix Content Strategy Analysis with Python/netflix_content_2023.csv")

netflix_data.head()
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie |


Let me start with cleaning and preprocessing the “Hours Viewed” column to prepare it for analysis:

In [52]:
```python
# strip thousands separators (if any) and convert the column to float
netflix_data['Hours Viewed'] = netflix_data['Hours Viewed'].replace(',', '', regex=True).astype(float)
netflix_data[['Title', 'Hours Viewed']].head()
```

| | Title | Hours Viewed |
|---|---|---|
| 0 | The Night Agent: Season 1 | 812100000.0 |
| 1 | Ginny & Georgia: Season 2 | 665100000.0 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 622800000.0 |
| 3 | Wednesday: Season 1 | 507700000.0 |
| 4 | Queen Charlotte: A Bridgerton Story | 503000000.0 |


The “Hours Viewed” column has been successfully cleaned and converted to a numeric format.  
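If some rows store hours as text with thousands separators (for example "812,100,000") or contain stray non-numeric values, `.astype(float)` will raise on the first bad cell. A slightly more defensive sketch, assuming such mixed input, strips commas and lets `pd.to_numeric(..., errors="coerce")` turn anything unparseable into NaN. The helper name `clean_hours` and the sample values are hypothetical, not taken from the dataset.

```python
import pandas as pd


def clean_hours(series: pd.Series) -> pd.Series:
    """Convert an 'Hours Viewed'-style column to float (hypothetical helper).

    Accepts numbers or strings with thousands separators such as "812,100,000".
    Values that still cannot be parsed become NaN instead of raising.
    """
    as_text = series.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(as_text, errors="coerce").astype(float)


# made-up values mixing formats; not rows from the Netflix CSV
raw = pd.Series(["812,100,000", 665100000, "  1,000 ", "n/a"])
cleaned = clean_hours(raw)
print(cleaned.tolist())
```

Coercing to NaN keeps the pipeline running, but it also hides bad rows, so it is worth checking `cleaned.isna().sum()` afterwards.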

# 🤔 Analyze:
Now, I’ll analyze trends in content type to determine whether shows or movies dominate viewership. Let’s visualize the distribution of total viewership hours between Shows and Movies:

In [53]:
```python
# aggregate viewership hours by content type
content_type_viewership = netflix_data.groupby('Content Type')['Hours Viewed'].sum().reset_index()

# create a pie chart
fig = px.pie(content_type_viewership, values='Hours Viewed', names='Content Type', title='Total Viewership by Content Type')
fig.show()
```

In [54]:
```python
display(content_type_viewership)
```

| | Content Type | Hours Viewed |
|---|---|---|
| 0 | Movie | 50637800000.0 |
| 1 | Show | 107764100000.0 |


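📒 Note: The visualization shows that shows dominate total viewership hours on Netflix in 2023: about 107.8 billion hours for shows versus 50.6 billion for movies, or roughly 68% of the total. This suggests that Netflix’s content strategy leans heavily toward shows, as they attract more watch hours overall. Part of the gap is structural, though: a season of a show contains many episodes and naturally accumulates more hours than a single film.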
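Totals alone mix two effects: how many titles of each type exist and how many hours each title draws. A minimal sketch on made-up numbers (the DataFrame below is a stand-in for `netflix_data`, not real data) separates title count, mean hours per title, and share of all hours:

```python
import pandas as pd

# hypothetical stand-in for netflix_data; the numbers are invented
df = pd.DataFrame({
    "Content Type": ["Show", "Show", "Show", "Movie", "Movie"],
    "Hours Viewed": [800.0, 600.0, 100.0, 300.0, 200.0],
})

# named aggregation: one output column per (name, function) pair
summary = df.groupby("Content Type")["Hours Viewed"].agg(
    total="sum", titles="count", mean_per_title="mean"
)
# each type's fraction of all hours viewed
summary["share_of_hours"] = summary["total"] / summary["total"].sum()
print(summary)
```

On the real data, the same `groupby(...).agg(...)` call would run on `netflix_data` directly.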

# 🤔 Analyze:
Next, let’s analyze the distribution of viewership across different languages to understand which languages are contributing the most to Netflix’s content consumption:

In [55]:
```python
# aggregate viewership hours by language
language_viewership = netflix_data.groupby('Language Indicator')['Hours Viewed'].sum().sort_values(ascending=False).reset_index()

# create a bar chart
fig = px.bar(language_viewership, x='Language Indicator', y='Hours Viewed', title='Total Viewership by Language')
fig.show()
```

📒 Note: The visualization reveals that English-language content dominates Netflix’s viewership by a wide margin, followed by other languages such as Korean. This indicates that most viewing hours go to English content, although non-English shows and movies still hold a considerable share of viewership, which reflects a diverse content strategy.
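To make "considerable share" concrete, one option is to compare each language’s share of hours with its share of titles; a language whose hours share exceeds its title share draws more viewing per title than average. The sketch below uses invented numbers as a stand-in for `netflix_data`:

```python
import pandas as pd

# hypothetical rows; real language labels, made-up hours
df = pd.DataFrame({
    "Language Indicator": ["English", "English", "Korean", "Korean", "Japanese"],
    "Hours Viewed": [700.0, 100.0, 150.0, 30.0, 20.0],
})

by_lang = df.groupby("Language Indicator").agg(
    hours=("Hours Viewed", "sum"),
    titles=("Hours Viewed", "count"),
)
# compare share of viewing with share of the catalogue
by_lang["hours_share"] = by_lang["hours"] / by_lang["hours"].sum()
by_lang["title_share"] = by_lang["titles"] / by_lang["titles"].sum()
by_lang = by_lang.sort_values("hours", ascending=False)
print(by_lang.round(3))
```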

# 🤔 Analyze:
Next, I’ll analyze how viewership varies based on release dates to identify any trends over time, such as seasonality or patterns around specific months:

In [56]:
```python
# convert 'Release Date' to datetime and extract the month number
netflix_data['Release Date'] = pd.to_datetime(netflix_data['Release Date'])
netflix_data['Month'] = netflix_data['Release Date'].dt.month
display(netflix_data.head())
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type | Month | Release Season | Release Day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000.0 | English | Show | 3.0 | Spring | Thursday |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000.0 | English | Show | 1.0 | Winter | Thursday |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000.0 | Korean | Show | 12.0 | Winter | Friday |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000.0 | English | Show | 11.0 | Fall | Wednesday |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000.0 | English | Movie | 5.0 | Spring | Thursday |


In [57]:
```python
# aggregate viewership hours by month
monthly_viewership = netflix_data.groupby('Month')['Hours Viewed'].sum().reset_index()
display(monthly_viewership.head())
# create a line chart
fig = px.line(monthly_viewership, x='Month', y='Hours Viewed', title='Monthly Viewership Trends')
fig.show()
```

| | Month | Hours Viewed |
|---|---|---|
| 0 | 1.0 | 7271600000.0 |
| 1 | 2.0 | 7103700000.0 |
| 2 | 3.0 | 7437100000.0 |
| 3 | 4.0 | 6865700000.0 |
| 4 | 5.0 | 7094600000.0 |


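📒 Note: The graph shows total viewership hours by release month: each title’s hours are attributed to the month it was released, not the month it was watched. It reveals a notable increase for titles released in June and a sharp rise in December, suggesting spikes in audience engagement around these periods, possibly due to strategic content releases, seasonal trends, or holidays, while the months in between show a steadier, lower pattern. Two caveats apply: `dt.month` ignores the year, so December also includes late-2022 releases such as The Glory: Season 1 (released 2022-12-30), and titles without a release date are dropped by the `groupby`.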
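Both caveats are easy to see on a few toy rows (not the real dataset). The sketch below groups once by `dt.month`, which merges December 2022 and December 2023, and once by `dt.to_period("M")`, which keeps year and month together; it also totals the hours that a month-based grouping silently leaves out:

```python
import pandas as pd

# made-up rows: one late-2022 release, two 2023 releases, one undated title
df = pd.DataFrame({
    "Release Date": ["2022-12-30", "2023-12-01", "2023-01-05", None],
    "Hours Viewed": [600.0, 50.0, 400.0, 900.0],
})
df["Release Date"] = pd.to_datetime(df["Release Date"])  # None becomes NaT

# dt.month keeps only the month number, so Dec 2022 and Dec 2023 share a bucket;
# NaT turns into NaN and groupby drops it by default
by_month = df.groupby(df["Release Date"].dt.month)["Hours Viewed"].sum()

# to_period("M") keeps the year; NaT rows are still dropped
by_year_month = df.groupby(df["Release Date"].dt.to_period("M"))["Hours Viewed"].sum()

# hours that neither grouping above accounts for
undated_hours = df.loc[df["Release Date"].isna(), "Hours Viewed"].sum()

print(by_month)
print(by_year_month)
print("hours with no release date:", undated_hours)
```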

# 🤔 Analyze:
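To dig deeper, we can look at the most successful content (both shows and movies) and at the characteristics available in the dataset, such as language, content type, and release date, that may be associated with high viewership. The dataset has no genre or theme column, so those factors cannot be analyzed directly here: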

In [58]:
```python
# extract the top 5 titles based on viewership hours
top_5_titles = netflix_data.nlargest(5, 'Hours Viewed')[['Title', 'Hours Viewed', 'Language Indicator', 'Content Type', 'Release Date']].reset_index(drop=True)
print("The top 5 most-viewed titles on Netflix in 2023 are:")
display(top_5_titles)
```

The top 5 most-viewed titles on Netflix in 2023 are:


| | Title | Hours Viewed | Language Indicator | Content Type | Release Date |
|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | 812100000.0 | English | Show | 2023-03-23 |
| 1 | Ginny & Georgia: Season 2 | 665100000.0 | English | Show | 2023-01-05 |
| 2 | King the Land: Limited Series // 킹더랜드: 리미티드 시리즈 | 630200000.0 | Korean | Movie | 2023-06-17 |
| 3 | The Glory: Season 1 // 더 글로리: 시즌 1 | 622800000.0 | Korean | Show | 2022-12-30 |
| 4 | ONE PIECE: Season 1 | 541900000.0 | English | Show | 2023-08-31 |


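📒 Note: English-language shows dominate the top viewership spots, taking three of the five places. But Korean content also has a notable presence in the top titles (The Glory and King the Land), which indicates its global popularity. Also note that King the Land: Limited Series is labeled as a Movie even though it is a series, as is Queen Charlotte: A Bridgerton Story in the first output, so the Content Type column may misclassify some titles and slightly blur the shows-versus-movies comparison.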
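Because shows take most of the top spots, a per-type ranking can make the strongest movies visible too. A sketch, assuming the same column names as the notebook and using placeholder titles A to E rather than real ones:

```python
import pandas as pd

# placeholder titles with invented hours
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Content Type": ["Show", "Show", "Show", "Movie", "Movie"],
    "Hours Viewed": [800.0, 650.0, 500.0, 300.0, 120.0],
})

top_n = 2
# sort once, then keep the first top_n rows of each content type;
# groupby().head() preserves the sorted row order
top_by_type = (
    df.sort_values("Hours Viewed", ascending=False)
      .groupby("Content Type")
      .head(top_n)
      .reset_index(drop=True)
)
print(top_by_type)
```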

# 🤔 Analyze:
Now, let’s have a look at the viewership trends by content type.

In [59]:
```python
# aggregate viewership hours by content type and release month
monthly_viewership_by_type = netflix_data.groupby(['Month', 'Content Type'])['Hours Viewed'].sum().reset_index()
fig = px.line(monthly_viewership_by_type, x='Month', y='Hours Viewed', color='Content Type', title='Monthly Viewership Trends by Content Type')
fig.show()
```

📒 Note: The graph compares viewership trends for movies and shows by release month. Shows consistently have higher viewership than movies, peaking in December, while movie viewership fluctuates more, with notable increases in June and October. This indicates that Netflix’s audience engages more with shows across the year, while movie viewership sees occasional spikes, possibly linked to specific releases or events.
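To quantify how much shows lead movies in each month instead of reading it off the chart, one approach is to pivot the aggregated data into one column per content type and take the ratio. The toy data below is hypothetical; months with no movie hours get NaN rather than a division-by-zero result:

```python
import pandas as pd

# invented monthly totals in long format, like monthly_viewership_by_type
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3],
    "Content Type": ["Show", "Movie", "Show", "Movie", "Show"],
    "Hours Viewed": [500.0, 250.0, 600.0, 200.0, 450.0],
})

# one row per month, one column per content type; missing combinations become 0
wide = df.pivot_table(index="Month", columns="Content Type",
                      values="Hours Viewed", aggfunc="sum", fill_value=0.0)

# mask zero movie hours as NaN so the ratio is undefined rather than infinite
wide["show_to_movie_ratio"] = wide["Show"] / wide["Movie"].where(wide["Movie"] > 0)
print(wide)
```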

# 🤔 Analyze:
Now, let’s explore the total viewership hours distributed across different release seasons:

In [60]:
```python
# define seasons based on release months (meteorological seasons, Northern Hemisphere)
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        # note: this branch also catches missing months (NaN), so undated titles are labeled 'Fall'
        return 'Fall'

# apply the season categorization to the dataset
netflix_data['Release Season'] = netflix_data['Month'].apply(get_season)
display(netflix_data.head())
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type | Month | Release Season | Release Day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000.0 | English | Show | 3.0 | Spring | Thursday |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000.0 | English | Show | 1.0 | Winter | Thursday |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000.0 | Korean | Show | 12.0 | Winter | Friday |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000.0 | English | Show | 11.0 | Fall | Wednesday |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000.0 | English | Movie | 5.0 | Spring | Thursday |


In [61]:
```python
# aggregate viewership hours by release season
seasonal_viewership = netflix_data.groupby('Release Season')['Hours Viewed'].sum()
seasonal_viewership = seasonal_viewership.reindex(['Winter', 'Spring', 'Summer', 'Fall'])
display(seasonal_viewership)

fig = px.bar(seasonal_viewership, x=seasonal_viewership.index, y=seasonal_viewership.values, title='Total Viewership by Release Season')
fig.show()
```


| Release Season | Hours Viewed |
|---|---|
| Winter | 24431100000.0 |
| Spring | 21397400000.0 |
| Summer | 21864600000.0 |
| Fall | 90708800000.0 |


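📒 Note: In this output, viewership hours appear to peak in the Fall season at about 90.7 billion hours, while Winter, Spring, and Summer each sit between roughly 21 and 25 billion. This Fall figure should not be read as a seasonal peak, however. The `Month` column contains missing values (it is stored as floats such as 3.0, which pandas does when the column holds NaN), and `get_season` sends any value not in its three month lists, including NaN, to the `else` branch, so every title without a release date is counted as Fall. The four season totals add up to exactly the overall total of about 158.4 billion hours, which confirms that undated titles are included, while the months shown in the monthly table (January through May) each hold only around 7 billion hours. The Fall bar is therefore likely made up mostly of undated content rather than evidence of higher engagement in the Fall.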
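A minimal NaN-safe variant of `get_season` could look like this. It keeps the notebook’s month-to-season mapping but returns a separate `'Unknown'` label (our own choice) for missing months and raises on out-of-range values. The DataFrame is made-up example data, so the printed totals say nothing about Netflix:

```python
import pandas as pd


def get_season(month):
    """Map a month number (1-12) to a meteorological season (Northern Hemisphere).

    Missing months (NaN/None) return 'Unknown' instead of falling through to 'Fall'.
    """
    if pd.isna(month):
        return "Unknown"
    month = int(month)  # Month is stored as float (e.g. 3.0) when NaN is present
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    raise ValueError(f"invalid month: {month}")


# invented rows, including one with a missing month
df = pd.DataFrame({"Month": [3.0, 12.0, float("nan"), 10.0],
                   "Hours Viewed": [100.0, 200.0, 900.0, 50.0]})
df["Release Season"] = df["Month"].apply(get_season)

# keep 'Unknown' visible instead of letting reindex drop it
seasonal = (df.groupby("Release Season")["Hours Viewed"].sum()
              .reindex(["Winter", "Spring", "Summer", "Fall", "Unknown"], fill_value=0.0))
print(seasonal)
```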

# 🤔 Analyze:
Now, let’s analyze the number of content releases and their viewership hours across months:

In [62]:
```python
monthly_releases = netflix_data['Month'].value_counts().sort_index()
monthly_viewership = netflix_data.groupby('Month')['Hours Viewed'].sum()
fig = go.Figure()
fig.add_trace(go.Bar(x=monthly_releases.index, y=monthly_releases.values, name='Number of Releases', yaxis='y1'))
fig.add_trace(go.Scatter(x=monthly_viewership.index, y=monthly_viewership.values, name='Viewership Hours', yaxis='y2'))
fig.update_layout(title='Monthly Content Releases and Viewership', xaxis_title='Month', yaxis_title='Number of Releases', yaxis2=dict(title= 'Viewership Hours', overlaying='y', side='right'))
fig.show()
```

📒 Note: While the number of releases is relatively steady throughout the year, viewership hours rise sharply for titles released in June and December. This suggests that viewership does not depend solely on the number of releases but is likely influenced by the timing and appeal of specific titles released in those months.
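One way to test whether a month’s total comes from many releases or from a few hits is to divide hours by the number of releases and to check how much of the month’s hours the single biggest title accounts for. A sketch on invented numbers:

```python
import pandas as pd

# made-up releases: several small titles in January, one hit each in June and December
df = pd.DataFrame({
    "Month": [1, 1, 1, 6, 6, 12],
    "Hours Viewed": [100.0, 50.0, 30.0, 400.0, 20.0, 600.0],
})

monthly = df.groupby("Month").agg(
    releases=("Hours Viewed", "count"),
    hours=("Hours Viewed", "sum"),
    top_title_hours=("Hours Viewed", "max"),
)
monthly["hours_per_release"] = monthly["hours"] / monthly["releases"]
# close to 1.0 means one title carries the whole month
monthly["top_title_share"] = monthly["top_title_hours"] / monthly["hours"]
print(monthly.round(3))
```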

# 🤔 Analyze:
Next, let’s explore whether Netflix has a preference for releasing content on specific weekdays and how this influences viewership patterns:

In [63]:
```python
netflix_data['Release Day'] = netflix_data['Release Date'].dt.day_name()
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# count releases per weekday
weekday_releases = netflix_data['Release Day'].value_counts().reindex(weekday_order)
# aggregate viewership hours by release weekday (hours of titles released on that day, not hours watched on that day)
weekday_viewership = netflix_data.groupby('Release Day')['Hours Viewed'].sum().reindex(weekday_order)

fig = go.Figure()
fig.add_trace(go.Bar(x=weekday_releases.index, y=weekday_releases.values, name='Number of Releases', yaxis='y1'))
fig.add_trace(go.Scatter(x=weekday_viewership.index, y=weekday_viewership.values, name='Viewership Hours', yaxis='y2'))
fig.update_layout(title='Weekly Content Releases and Viewership', xaxis_title='Day of the Week', yaxis_title='Number of Releases', yaxis2=dict(title='Viewership Hours', overlaying='y', side='right'))
fig.show()
```

📒 Note: The graph shows that most content is released on Fridays, and the viewership line also peaks on Friday. Keep in mind what the line measures: it sums the hours viewed of all titles released on each weekday, not the hours watched on that day. A high Friday total therefore largely reflects the larger number of Friday releases, and the low Saturday and Sunday values mostly reflect the smaller number of weekend releases rather than audiences watching less at the weekend. The pattern is consistent with Netflix releasing content just before the weekend to maximize engagement, but per-title comparisons are needed to tell whether Friday releases actually perform better.
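Dividing by the number of releases separates volume from per-title performance. The sketch below uses a few hypothetical titles on real 2023 calendar dates (2023-03-23 is a Thursday, for example); the median is included because a single hit can pull the mean up sharply:

```python
import pandas as pd

# invented hours on real dates: three Fridays, one Thursday, one Saturday
df = pd.DataFrame({
    "Release Date": pd.to_datetime(["2023-03-24", "2023-03-31", "2023-04-07",
                                    "2023-03-23", "2023-03-25"]),
    "Hours Viewed": [100.0, 50.0, 30.0, 800.0, 20.0],
})
df["Release Day"] = df["Release Date"].dt.day_name()

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
by_day = (
    df.groupby("Release Day")["Hours Viewed"]
      .agg(releases="count", total="sum", mean_per_title="mean", median_per_title="median")
      .reindex(weekday_order)  # weekdays with no releases become NaN rows
)
print(by_day)
```

In this toy data Friday has the largest total simply because it has the most releases, while the single Thursday title has the highest hours per title.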

# 🤔 Analyze:
To further understand the strategy, let’s look at specific high-impact dates, such as holidays or major events, and check which titles were released around them.

In [64]:
```python
# define significant holidays and events in 2023
important_dates = [
    '2023-01-01',  # new year's day
    '2023-02-14',  # valentine's day
    '2023-07-04',  # independence day (US)
    '2023-10-31',  # halloween
    '2023-12-25'   # christmas day
]

# convert to datetime
important_dates = pd.to_datetime(important_dates)

# keep releases within 3 days before or after any of these dates (range(-3, 4) covers -3 to 3)
holiday_releases = netflix_data[netflix_data['Release Date'].apply(
    lambda x: any((x - date).days in range(-3, 4) for date in important_dates)
)]

# aggregate viewership hours for releases near significant holidays
holiday_viewership = holiday_releases.groupby('Release Date')['Hours Viewed'].sum()

holiday_releases[['Title', 'Release Date', 'Hours Viewed']]
```

| | Title | Release Date | Hours Viewed |
|---|---|---|---|
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 2022-12-30 | 622800000.0 |
| 6 | La Reina del Sur: Season 3 | 2022-12-30 | 429600000.0 |
| 11 | Kaleidoscope: Limited Series | 2023-01-01 | 252500000.0 |
| 29 | Perfect Match: Season 1 | 2023-02-14 | 176800000.0 |
| 124 | Lady Voyeur: Limited Series // Olhar Indiscret... | 2022-12-31 | 86000000.0 |
| ... | ... | ... | ... |
| 22324 | The Romantics: Limited Series | 2023-02-14 | 1000000.0 |
| 22327 | Aggretsuko: Season 5 // アグレッシブ烈子: シーズン5 | 2023-02-16 | 900000.0 |
| 22966 | The Lying Life of Adults: Limited Series // La... | 2023-01-04 | 900000.0 |
| 22985 | Community Squad: Season 1 // División Palermo:... | 2023-02-17 | 800000.0 |


The output shows that Netflix released several notable titles around key holidays and events. This lists releases near those dates; on its own it does not show that the timing caused higher viewership. Some of the significant releases include:

- **New Year’s Period:** The Glory: Season 1, La Reina del Sur: Season 3, and Kaleidoscope: Limited Series were released close to New Year’s Day and went on to attract high viewership.
- **Valentine’s Day:** Perfect Match: Season 1 and The Romantics: Limited Series were released on February 14th; both have romantic themes that fit the holiday’s sentiment.
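The `apply`-based window check in the holiday cell loops over every row in Python. A vectorized sketch that uses the same inclusive window of 3 days before or after each date (equivalent to `range(-3, 4)` for date-only values) and also records which holiday each release matched could look like this; the titles are placeholders, and the `Near Holiday` and `Holiday` column names are hypothetical:

```python
import pandas as pd

holidays = pd.Series(
    pd.to_datetime(["2023-01-01", "2023-02-14", "2023-07-04", "2023-10-31", "2023-12-25"]),
    index=["New Year's Day", "Valentine's Day", "Independence Day (US)", "Halloween", "Christmas Day"],
)

# placeholder releases, including one with no release date
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Release Date": pd.to_datetime(["2022-12-30", "2023-02-20", "2023-10-28", None]),
})

window = pd.Timedelta(days=3)

# one boolean column per holiday: True when the release is at most 3 days away;
# comparisons involving NaT evaluate to False, so undated titles never match
near = pd.DataFrame({name: (df["Release Date"] - day).abs() <= window
                     for name, day in holidays.items()})

df["Near Holiday"] = near.any(axis=1)
# name of the first matching holiday, NaN where nothing matched
df["Holiday"] = near.astype(int).idxmax(axis=1).where(df["Near Holiday"])
print(df)
```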

# **Conclusion**  

So, Netflix’s content strategy appears to revolve around maximizing viewership through targeted release timing and content variety. Shows consistently outperform movies in viewership, and titles released in June and December drew notably more hours, suggesting strategic releases around these periods. The apparent Fall peak in the seasonal chart, however, is likely driven largely by titles with missing release dates being counted as Fall, so it should not be taken as evidence of higher Fall engagement. Most content is released on Fridays, likely to capture viewers right before the weekend. While the number of releases is steady throughout the year, viewership varies from month to month, which suggests a focus on high-impact titles and well-timed releases over sheer volume.