# Netflix Content Strategy Analysis

## Project overview
In this project, I analyze Netflix's 2023 content data to identify strategies for maximizing viewership through targeted release timing and content variety. The analysis follows a project from [The Clever Programmer](https://thecleverprogrammer.com/2024/09/30/netflix-content-strategy-analysis-with-python/).

## Import required libraries and dataset

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
data = pd.read_csv(r'D:\REGINA\DA DS\netflix_strategy\netflix_content_2023.csv')
data.head(5)

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie |


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24812 entries, 0 to 24811
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Title                24812 non-null  object
 1   Available Globally?  24812 non-null  object
 2   Release Date         8166 non-null   object
 3   Hours Viewed         24812 non-null  object
 4   Language Indicator   24812 non-null  object
 5   Content Type         24812 non-null  object
dtypes: object(6)
memory usage: 1.1+ MB


## Data cleaning and preprocessing
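The `data.info()` output shows that 'Hours Viewed' is stored as `object` (string) rather than as a number, because its values contain commas as thousands separators. To fix this, I remove the commas and convert the column to a numeric data type.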

In [4]:
# Strip the comma thousands separators, then cast the strings to float
data['Hours Viewed'] = data['Hours Viewed'].str.replace(',', '', regex=False).astype('float')
data.head(5)

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000.0 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000.0 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000.0 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000.0 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000.0 | English | Movie |


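The 'Hours Viewed' column is now numeric (`float`), so the dataset is ready for the viewership analysis. Note that 'Release Date' still has missing values: `data.info()` reports only 8,166 non-null entries out of 24,812.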
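
As a cross-check on the cleaning step, here is a minimal sketch of a more defensive variant. It is not the approach used in this notebook. It runs on a small made-up frame (`sample` and its values are hypothetical, not taken from the Netflix file) and uses `pd.to_numeric` with `errors="coerce"`, so any entry that is not a valid number becomes `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV; the values are made up.
sample = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Hours Viewed": ["1,200,000", "350,000", "n/a"],
})

# Remove the thousands separators, then convert to numbers.
# errors="coerce" turns unparseable strings (here "n/a") into NaN.
sample["Hours Viewed"] = pd.to_numeric(
    sample["Hours Viewed"].str.replace(",", "", regex=False),
    errors="coerce",
)

# Inspect the resulting dtype and how many values failed to parse.
print(sample["Hours Viewed"].dtype)
print(sample["Hours Viewed"].isna().sum())
```

If the real column contains only well-formed numbers, `.astype('float')` as used above gives the same result. The coercing variant is mainly useful when the source file may contain stray text.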

## Exploratory data analysis

In [5]:
data['Content Type'].unique()

array(['Show', 'Movie'], dtype=object)

First, I will analyze the distribution of total viewership hours between shows and movies.