# Movies on Netflix, Prime Video, Hulu and Disney+

The **"Movies on Netflix, Prime Video, Hulu, and Disney+"** dataset on Kaggle provides details on movies available across these four streaming platforms; the version loaded in this notebook contains 9,515 titles. It includes features such as *title*, *year*, *age rating*, *Rotten Tomatoes score*, and streaming platform availability, allowing users to analyze content diversity, exclusivity, and availability by service. This makes the dataset useful for comparing content strategies and audience targeting across these platforms.

For full details, visit the dataset on Kaggle: [Movies on Netflix, Prime Video, Hulu, and Disney+](https://www.kaggle.com/datasets/ruchi798/movies-on-netflix-prime-video-hulu-and-disney).

This notebook analyzes the dataset to answer two questions:

- Is the age restriction for movies on Disney+ lower than for movies on Netflix?
- Is there a difference in Rotten Tomatoes score for movies on those two platforms?

To answer them, it carries out the following tasks:

- Perform a detailed descriptive analysis of the dataset, using appropriate statistical measures and at least one statistical graphic. The descriptive analysis should be aimed at answering the two questions above.
- Perform appropriate statistical hypothesis tests to answer the two questions, and give reasons for the choice of test.
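
The hypothesis-testing task can be sketched on made-up data as follows. This is only one possible choice, not necessarily the test the notebook ends up using: because age ratings are ordinal, a rank-based test such as Mann-Whitney U is a natural candidate, and it is applied here to the scores as well since it does not assume normality. The `platform`, `age_rank`, and `rt_score` columns are hypothetical stand-ins; the real file stores platforms as 0/1 indicator columns and both variables as strings.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Made-up stand-in data: age ratings already encoded as ranks
# (assumed mapping: all=0, 7+=1, 13+=2, 16+=3, 18+=4) and scores as numbers.
toy = pd.DataFrame({
    "platform": ["Disney+"] * 4 + ["Netflix"] * 4,
    "age_rank": [0, 1, 1, 2, 4, 3, 4, 2],
    "rt_score": [60, 55, 70, 48, 90, 65, 80, 52],
})
disney = toy[toy["platform"] == "Disney+"]
netflix = toy[toy["platform"] == "Netflix"]

# One-sided test: do Disney+ titles tend to have lower age ranks than Netflix titles?
age_test = mannwhitneyu(disney["age_rank"], netflix["age_rank"], alternative="less")

# Two-sided test: do the Rotten Tomatoes scores differ between the platforms?
rt_test = mannwhitneyu(disney["rt_score"], netflix["rt_score"], alternative="two-sided")

print(f"age: U = {age_test.statistic}, p = {age_test.pvalue:.3f}")
print(f"RT:  U = {rt_test.statistic}, p = {rt_test.pvalue:.3f}")
```

On the real data, titles listed on both platforms would appear in both samples, so they may need to be excluded or handled separately before testing.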

## 0. Dataset Loading

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

file_path = 'MoviesOnStreamingPlatforms.csv'
df = pd.read_csv(file_path)

## 1. Data Overview

### 1.1 Check Head and Tail

In [13]:
display(df.head())

display(df.tail())


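|   | Unnamed: 0 | ID | Title | Year | Age | Rotten Tomatoes | Netflix | Hulu | Prime Video | Disney+ | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | The Irishman | 2019 | 18+ | 98/100 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2 | Dangal | 2016 | 7+ | 97/100 | 1 | 0 | 0 | 0 | 0 |
| 2 | 2 | 3 | David Attenborough: A Life on Our Planet | 2020 | 7+ | 95/100 | 1 | 0 | 0 | 0 | 0 |
| 3 | 3 | 4 | Lagaan: Once Upon a Time in India | 2001 | 7+ | 94/100 | 1 | 0 | 0 | 0 | 0 |
| 4 | 4 | 5 | Roma | 2018 | 18+ | 94/100 | 1 | 0 | 0 | 0 | 0 |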


|   | Unnamed: 0 | ID | Title | Year | Age | Rotten Tomatoes | Netflix | Hulu | Prime Video | Disney+ | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9510 | 9510 | 9511 | Most Wanted Sharks | 2020 | NaN | 14/100 | 0 | 0 | 0 | 1 | 0 |
| 9511 | 9511 | 9512 | Doc McStuffins: The Doc Is In | 2020 | NaN | 13/100 | 0 | 0 | 0 | 1 | 0 |
| 9512 | 9512 | 9513 | Ultimate Viking Sword | 2019 | NaN | 13/100 | 0 | 0 | 0 | 1 | 0 |
| 9513 | 9513 | 9514 | Hunt for the Abominable Snowman | 2011 | NaN | 10/100 | 0 | 0 | 0 | 1 | 0 |
| 9514 | 9514 | 9515 | Women of Impact: Changing the World | 2019 | 7+ | 10/100 | 0 | 0 | 0 | 1 | 0 |
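
The `Unnamed: 0` column duplicates the row index, which usually means the CSV was written with its index included. A minimal sketch of how `index_col=0` avoids the extra column, using two rows from the `head()` output. The empty first header field in `csv_text` is an assumption about how the file is stored; it is what normally produces a column called `Unnamed: 0`.

```python
import io
import pandas as pd

# Two rows from the head() output, written the way the CSV is assumed to
# store them: the first, unnamed field is a saved row index.
csv_text = (
    ",ID,Title,Year,Age,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type\n"
    "0,1,The Irishman,2019,18+,98/100,1,0,0,0,0\n"
    "1,2,Dangal,2016,7+,97/100,1,0,0,0,0\n"
)

# Default read: the unnamed field becomes a regular column.
default = pd.read_csv(io.StringIO(csv_text))

# Using the first field as the index removes the redundant column.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns)[:2])
print(list(indexed.columns)[:2])
```

The notebook keeps the default read, so the outputs in the following sections still include `Unnamed: 0`.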


### 1.2 Data Info Summary

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9515 entries, 0 to 9514
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Unnamed: 0       9515 non-null   int64 
 1   ID               9515 non-null   int64 
 2   Title            9515 non-null   object
 3   Year             9515 non-null   int64 
 4   Age              5338 non-null   object
 5   Rotten Tomatoes  9508 non-null   object
 6   Netflix          9515 non-null   int64 
 7   Hulu             9515 non-null   int64 
 8   Prime Video      9515 non-null   int64 
 9   Disney+          9515 non-null   int64 
 10  Type             9515 non-null   int64 
dtypes: int64(8), object(3)
memory usage: 817.8+ KB


## 2. Summary Statistics

### 2.1 Descriptive Statistics for Numerical Columns

In [15]:
df.describe()

|   | Unnamed: 0 | ID | Year | Netflix | Hulu | Prime Video | Disney+ | Type |
|---|---|---|---|---|---|---|---|---|
| count | 9515.0 | 9515.0 | 9515.0 | 9515.0 | 9515.0 | 9515.0 | 9515.0 | 9515.0 |
| mean | 4757.0 | 4758.0 | 2007.422386 | 0.388334 | 0.110037 | 0.432265 | 0.0969 | 0.0 |
| std | 2746.888239 | 2746.888239 | 19.130367 | 0.487397 | 0.312952 | 0.495417 | 0.295837 | 0.0 |
| min | 0.0 | 1.0 | 1914.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2378.5 | 2379.5 | 2006.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 4757.0 | 4758.0 | 2015.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 7135.5 | 7136.5 | 2018.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| max | 9514.0 | 9515.0 | 2021.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
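
Because the platform columns are 0/1 indicators, their means are the share of titles on each platform: the Netflix mean of 0.388334 corresponds to about 38.8% of the 9,515 titles, or roughly 3,695 movies. The four means sum to about 1.03, so some titles must be listed on more than one service. `Type` has a standard deviation of 0, so it is constant. A minimal sketch of these checks on a made-up frame with the same column layout:

```python
import pandas as pd

# Tiny stand-in frame with the same 0/1 indicator layout (values are made up).
toy = pd.DataFrame({
    "Netflix":     [1, 1, 0, 0, 1],
    "Hulu":        [0, 0, 1, 0, 0],
    "Prime Video": [0, 1, 1, 0, 0],
    "Disney+":     [0, 0, 0, 1, 0],
    "Type":        [0, 0, 0, 0, 0],
})
platforms = ["Netflix", "Hulu", "Prime Video", "Disney+"]

counts = toy[platforms].sum()          # number of titles per platform
shares = toy[platforms].mean() * 100   # percentage of all titles per platform

# Columns with a single distinct value carry no information for the analysis.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]

print(pd.DataFrame({"count": counts, "share_%": shares}))
print("constant columns:", constant_cols)
```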


### 2.2 Counts for Categorical Columns

In [17]:
categorical_columns = df.select_dtypes(include='object').columns
for col in categorical_columns:
    print(f"{col}:\n{df[col].value_counts()}\n")

Title:
Title
The Irishman                           1
Burden                                 1
Elon Musk: The Real Life Iron Man      1
Albion: The Enchanted Stallion         1
Songwriter                             1
                                      ..
Berlin Berlin                          1
Bollywood Calling                      1
Ujala                                  1
Babamın Ceketi                         1
Women of Impact: Changing the World    1
Name: count, Length: 9515, dtype: int64

Age:
Age
18+    2276
7+     1090
13+     998
all     698
16+     276
Name: count, dtype: int64

Rotten Tomatoes:
Rotten Tomatoes
44/100    311
46/100    298
47/100    291
49/100    290
43/100    289
         ... 
23/100      1
24/100      1
97/100      1
95/100      1
96/100      1
Name: count, Length: 85, dtype: int64
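
Both columns of interest are stored as strings: `Rotten Tomatoes` as `"98/100"` and `Age` as labels such as `"18+"`. Before computing statistics they need a numeric form. A minimal sketch of one way to do this, assuming every score has the form `score/100` and that the age labels are ordered `all < 7+ < 13+ < 16+ < 18+`. The `rt_score` and `age_rank` column names are illustrative, not taken from the notebook.

```python
import pandas as pd

# Sample values in the same formats as the real columns, including missing ones.
toy = pd.DataFrame({
    "Age": ["18+", "7+", None, "all", "16+", "13+"],
    "Rotten Tomatoes": ["98/100", "97/100", "14/100", None, "44/100", "10/100"],
})

# "98/100" -> 98.0; missing entries stay NaN.
toy["rt_score"] = pd.to_numeric(toy["Rotten Tomatoes"].str.split("/").str[0])

# Assumed ordering of the age labels, from least to most restrictive.
age_order = ["all", "7+", "13+", "16+", "18+"]
age_rank = {label: rank for rank, label in enumerate(age_order)}

# Map each label to its rank; missing ages stay NaN.
toy["age_rank"] = toy["Age"].map(age_rank)

print(toy[["Age", "age_rank", "Rotten Tomatoes", "rt_score"]])
```

The ranks only encode order, so medians and rank-based tests are more defensible summaries for `age_rank` than means.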



## 3. Missing Values

### 3.1 Missing Value Counts and Percentages

In [18]:
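# Count missing values per column and express them as a percentage of all rows
missing_values = df.isnull().sum()
missing_percentage = missing_values / len(df) * 100
missing_data = pd.DataFrame({'Missing Values': missing_values, 'Percentage': missing_percentage})
# Show only the columns that actually contain missing values
missing_data[missing_data['Missing Values'] > 0]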

|   | Missing Values | Percentage |
|---|---|---|
| Age | 4177 | 43.899107 |
| Rotten Tomatoes | 7 | 0.073568 |
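
With about 44% of `Age` values missing, it may be worth checking whether the missingness differs between the two platforms being compared, since that could bias the age comparison. A minimal sketch on made-up rows that reuse the real column names; the `missing_by_platform` name and its columns are illustrative only.

```python
import pandas as pd

# Made-up rows with the real column names; None marks a missing age rating.
toy = pd.DataFrame({
    "Age":     ["18+", None, "7+", None, None, "all"],
    "Netflix": [1, 1, 1, 0, 0, 0],
    "Disney+": [0, 0, 0, 1, 1, 1],
})

rows = []
for platform in ["Netflix", "Disney+"]:
    # Keep only the titles available on this platform.
    on_platform = toy[toy[platform] == 1]
    rows.append({
        "platform": platform,
        "titles": len(on_platform),
        "missing_age": int(on_platform["Age"].isna().sum()),
        "missing_age_%": on_platform["Age"].isna().mean() * 100,
    })

missing_by_platform = pd.DataFrame(rows).set_index("platform")
print(missing_by_platform)
```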
