# **Project Name**    -  **Amazon Prime TV Shows and Movies**



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary**

**Project Summary: Analyzing Amazon Prime Video’s Content Library**

The streaming industry has grown exponentially in recent years, with platforms like Amazon Prime Video continuously expanding their content libraries to cater to diverse audiences. As competition intensifies, data-driven insights play a crucial role in understanding viewer preferences, content trends, and strategic investment decisions. This project focuses on analyzing a dataset containing all shows available on Amazon Prime Video in the United States, offering valuable insights into content diversity, regional availability, audience engagement, and popularity.  

One of the primary areas of analysis is **content diversity**, which helps identify the dominant genres and categories on Amazon Prime Video. Streaming platforms aim to provide a wide range of content to appeal to various audience segments, from drama and action to documentaries and reality shows. Understanding which genres dominate the platform can provide insights into viewer preferences and content acquisition strategies. Additionally, analyzing content distribution can highlight whether Amazon Prime focuses more on original productions, licensed content, or niche categories.  

Another key aspect of this study is **regional availability**, which examines how content distribution varies across different regions. While this dataset specifically focuses on the United States, similar analyses can be applied to other markets to assess whether Amazon Prime tailors its content strategy based on regional demand. Understanding these distribution patterns can help content creators and streaming platforms make informed decisions about localization, dubbing, and subtitle offerings to attract a wider audience.  

The dataset also allows for an analysis of **trends over time**, helping to track how Amazon Prime Video’s content library has evolved. By studying the addition and removal of shows over the years, we can determine whether the platform has increased its focus on certain types of content, such as exclusive originals, international shows, or specific genres. This historical perspective provides insights into Amazon Prime’s content acquisition strategy and how it adapts to changing viewer preferences and market dynamics.  

Additionally, **IMDb ratings and popularity** play a crucial role in determining viewer engagement and content success. By analyzing the highest-rated and most popular shows on Amazon Prime Video, this project seeks to identify the factors contributing to high audience ratings and engagement. This analysis can help content creators understand what resonates with viewers, guiding future production and investment decisions. Furthermore, comparing the ratings of Amazon Prime’s original content with licensed shows can offer insights into the platform’s content quality and audience reception.  

By leveraging these insights, businesses, content creators, and data analysts can gain a deeper understanding of Amazon Prime Video’s content strategy and its impact on subscription growth and user engagement. The findings from this analysis can support decision-making in areas such as content acquisition, licensing, and original productions. As the streaming landscape continues to evolve, data-driven strategies will be essential in maintaining a competitive edge, attracting new subscribers, and retaining existing ones.

# **Problem Statement**


This dataset was created to analyze all shows available on Amazon Prime Video, allowing us to extract valuable insights such as:

Content Diversity: What genres and categories dominate the platform?

Regional Availability: How does content distribution vary across different regions?

Trends Over Time: How has Amazon Prime’s content library evolved?

IMDb Ratings & Popularity: What are the highest-rated or most popular shows on the platform?

By analyzing this dataset, businesses, content creators, and data analysts can uncover key trends that influence subscription growth, user engagement, and content investment strategies in the streaming industry.

#### **Define Your Business Objective?**

The business objective of this project is to analyze Amazon Prime Video’s content library to identify trends that drive audience engagement, subscription growth, and strategic content investment.




# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Loading   Dataset
from google.colab import drive
drive.mount('/content/drive')

dt2=pd.read_csv('/content/drive/MyDrive/Project 1 Amazon/credits.csv')
dt1=pd.read_csv('/content/drive/MyDrive/Project 1 Amazon/titles.csv')

### Dataset First View

In [None]:
# Dataset First
dt1.head()

In [None]:
# Dataset Second
dt2.head(20)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns
dt1.shape

In [None]:
# Dataset Rows & Columns
dt2.shape

### Dataset Information

In [None]:
# Dataset Info
dt1.info()

In [None]:
# Dataset Info
dt2.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(dt1[dt1.duplicated()])

In [None]:
#Dropping Duplicates
dt1.drop_duplicates()

In [None]:
# Dataset Duplicate Value Count
len(dt2[dt2.duplicated()])

In [None]:
#Dropping Duplicates
dt2.drop_duplicates()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count Dataset 1
dt1.isnull().sum()

In [None]:
# Percentage of Null values present in Colunns for Dataset 1
for i in dt1.columns:
  print(f'{i} : {(dt1[i].isna().sum()/dt1.shape[0])*100}')

In [None]:
#Having Null Values  more then 65% and 85% so Removed it
dt1.drop(['age_certification','seasons'] ,axis=1,inplace=True)

In [None]:
#Having Null Values more then 5% , so fill it with mean value of columns
dt1['imdb_score']=dt1['imdb_score'].fillna(dt1['imdb_score'].mean())
dt1['imdb_votes']=dt1['imdb_votes'].fillna(dt1['imdb_votes'].mean())
dt1['tmdb_score']=dt1['tmdb_score'].fillna(dt1['tmdb_score'].mean())

In [None]:
# Checking again the Percentage
for i in dt1.columns:
  print(f'{i} : {(dt1[i].isna().sum()/dt1.shape[0])*100}')

In [None]:
#Having Null less then 5%  so Remove the rows of null values
dt1.dropna(inplace = True)

In [None]:
# Checking again the Percentage
for i in dt1.columns:
  print(f'{i} : {(dt1[i].isna().sum()/dt1.shape[0])*100}')

In [None]:
# Visualizing the missing values
sns.heatmap(dt1.isnull(), cbar=False)

In [None]:
# Missing Values/Null Values Count for Dataset 2
dt2.isnull().sum()

In [None]:
# Percentage of Null values present in Colunns for Dataset 2
for i in dt2.columns:
  print(f'{i} : {(dt2[i].isna().sum()/dt2.shape[0])*100}')

In [None]:
#Dropping the Null Values
dt2.dropna(inplace=True)

In [None]:
# Percentage of Null values present in Colunns for Dataset 2
for i in dt2.columns:
  print(f'{i} : {(dt2[i].isna().sum()/dt2.shape[0])*100}')

In [None]:
# Visualizing the missing values
sns.heatmap(dt2.isnull(), cbar=False)

### What did you know about your dataset?

 The goal is to identify key trends related to content diversity, regional availability, historical changes in the library, and audience preferences based on IMDb ratings and popularity. By extracting these insights, the project seeks to help businesses, content creators, and analysts make data-driven decisions regarding content acquisition, production, and strategic investment to enhance audience engagement and subscription growth in the competitive streaming industry.

 And for this we have to data set
 dataset 1 have 9871 rows and 15 columns
 dataset 2 have 123235 rows and 5 columns

 Both have missing and duplicates Value in their dataset  

## ***2. Understanding Your Variables***

In [None]:
# Dataset1 Columns
dt1.columns

In [None]:
# Dataset1 Describe
dt1.describe()

In [None]:
# Dataset 2 Columns
dt2.columns

In [None]:
# Dataset2 Describe
dt2.describe()

### Variables Description


**DATASET 1**



**Id:** The title ID on JustWatch.

**Title:** The name of the title.

**Show_type:**TV show or movie.

**Description:** A brief description.

**Release_year:** The release year.

**Age_certification:** The age certification.

**Runtime:** The length of the episode (SHOW) or movie.

**Genres:** A list of genres.

**Production_countries:** A list of countries that produced the title.

**Seasons:** Number of seasons if it's a SHOW.

**Imdb_id:** The title ID on IMDB.

**Imdb_score:** Score on IMDB.

**Imdb_votes:** Votes on IMDB.

**Tmdb_popularity:**Popularity on TMDB.

**Tmdb_score:** Score on TMDB.


**DATASET 2**


**Person_ID:** The person ID on JustWatch.

**Id:** The title ID on JustWatch.

**Name:** The actor or director's name.

**Character_name:** The character name.

**Role:** ACTOR or DIRECTOR.




## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
#Merging two Dataset as Inner Join using Merge Attribute
data=pd.merge(dt1,dt2,on='id',how='inner')
data.head(3)

In [None]:
# Checking for the Null Values, If any
data.isnull().sum()

In [None]:
# A copy of Data is Made by Name df
df=data.copy()
df.head(3)

In [None]:
#Getting the shape of the merged dataset
df.shape



In [None]:
#getting the top 10 value count of the genres
q=df.loc[:,'genres'].value_counts().head(10)
q

In [None]:
#getting the top 20 value count of the production_countries
a=df.loc[:,'production_countries'].value_counts().head(20)
a

In [None]:
# grouping the value of genres based on the release year and sorting it out in descending order and getting top 20 values
e=dt1.groupby('genres')['release_year'].median().reset_index().sort_values('release_year',ascending=False,ignore_index=True).head(20)
e

In [None]:
#printing the row that have maximum imdb score in the dataset
dt1.loc[dt1['imdb_score'] == max(dt1['imdb_score'],)].reset_index()

In [None]:
# Doing analysis on top 5 values of production_countries
w=df['production_countries'].value_counts().head(5)
w

In [None]:
## grouping the value of title based on the imdb_score and sorting it out in descending order and getting top 3 values
t=dt1.groupby('title')['imdb_score'].median().reset_index().sort_values('imdb_score', ascending = False, ignore_index = True).head(3)
t


In [None]:
#printing the row that have maximum tmdb_popularity in the dataset
dt1.loc[dt1['tmdb_popularity'] == max(dt1['tmdb_popularity'],)].reset_index()

In [None]:
## grouping the value of title based on the tmdb_popularity and sorting it out in descending order and getting top 3 values
b=dt1.groupby('title')['tmdb_popularity'].median().reset_index().sort_values('tmdb_popularity',ascending=False, ignore_index=True).head(3)
b


In [None]:
## grouping the value of tmdb_popularity based on the runtime and sorting it out in descending order and getting top 10 values
d=dt1.groupby('tmdb_popularity')['runtime'].median().reset_index().sort_values('tmdb_popularity',ascending=False,ignore_index=True).head(10)
d

### What all manipulations have you done and insights you found?

According to me , after doing the manipulation of the dataset we have found the which genres is most popular among the audience in respect to amazon content library and have found how the trend changes as we move along with the time how people's choice get change from one genres to the other .
also find out what people like most in the content and how imdb score and popularity effect the user engagement in these platform .

Also got an idea which country is producing the most number of movies and tv Show and how it treat by the people by taking both values to compare the values.

We also got an idea of runtime , that it matters alot in the user enegagement parameter and popularity and etc. Get to know how the content is distributed in the platform based on various factors like region , score , genres and etc

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***





#### Chart - 1  Pie Chart - Genres  (Univariate)

In [None]:
# Chart - 1 visualization code

df['genres'].value_counts().head(10).plot(kind='pie',
                              figsize=(10,6),
                              autopct="%1.1f%%", )
plt.ylabel('')

##### 1. Why did you pick the specific chart?


A pie chart expresses a part-to-whole relationship in your data. It's easy to explain the percentage comparison through area covered in a circle with different colors. Where differenet percentage comparison comes into action pie chart is used frequently. So, I used Pie chart and which helped me to get the percentage comparision of the dependant variable.

##### 2. What is/are the insight(s) found from the chart?

From the above, we found that among all the genres the movies present on the platform is of genres DRAMA that is 29.5% and second highest genres is of COMEDY that is 17.4%

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from this analysis can create a positive business impact in several ways:

Improved Content Strategy – Understanding which genres dominate Amazon Prime Video helps the platform prioritize investments in high-performing content, ensuring better audience engagement.

Targeted Audience Engagement – Identifying viewer preferences enables personalized recommendations and marketing campaigns, leading to increased user satisfaction and retention.

Optimized Content Acquisition – Insights into content diversity and regional availability help Amazon Prime and content creators make data-driven licensing and production decisions, reducing investment risks.

#### Chart - 2  Kde Plot - tmdb_score(Univariate)

In [None]:
# Chart - 2 visualization code
sns.kdeplot(data=df['tmdb_score'])
plt.title('KDE of tmdb_score')
plt.xlabel('tmdb_score')
plt.show()

##### 1. Why did you pick the specific chart?

A Kernel Density Estimate (KDE) plot is useful for insights because it helps visualize the distribution of continuous data more smoothly than a histogram.It provide Better Understanding of Data Distribution and help in
Identifying Trends and Outliers

##### 2. What is/are the insight(s) found from the chart?

We form an univariate chart in which we have taken the tmdb_score to analysis what tmdb_score is often given to the movies or the TV shows in the dataset .

So we can clearly see that score of  around 6.0 is mostly given to the content on amazon prime video platform

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the KDE plot of IMDb ratings can create a positive business impact in several ways:

Content Quality Assessment – The peak at 6.0 IMDb rating suggests that most shows on Amazon Prime Video cluster around this rating. If higher-rated content is limited, Amazon could focus on acquiring or producing higher-rated shows to improve viewer satisfaction.

Improving User Retention & Engagement – If users prefer highly rated content but the platform offers mostly average-rated shows, engagement may decline. Investing in higher-quality content can boost watch time and subscriber retention.

Optimized Content Recommendation – Understanding the rating distribution allows for better personalized recommendations, ensuring users are directed toward content that aligns with their preferences.

#### Chart - 3 Line Chart - Release_Year vs Runtime (Bivariate)

In [None]:
# Chart - 3 visualization code
sns.lineplot(data=df, x='release_year', y='runtime')
plt.xlabel('Release_year')
plt.ylabel('Runtime')
plt.title('Release_Year vs Runtime')
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is chosen when analyzing trends over time because it effectively visualizes continuous data changes and helps identify patterns.

Visualizing Trends Over Time – A line chart is ideal for showing how Amazon Prime Video’s content library evolves over months or years, such as the number of shows added or removed over time.

Identifying Growth or Decline – It helps track whether the content library is expanding, shrinking, or shifting towards specific genres based on historical data.



##### 2. What is/are the insight(s) found from the chart?

We can see that as time passes the runtime of the year is decreased from 1920 onwards and we can clearly see that runtime b/w 100-200 min have most number of movies and tv shows  

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the runtime vs. release year analysis can create a positive business impact in several ways:

Optimizing Content Length for Audience Preferences – Since most movies and TV shows fall within the 100-200 minute range, Amazon Prime Video can focus on acquiring or producing content within this preferred runtime, ensuring higher engagement and viewer satisfaction.

Adapting to Changing Viewer Habits – The decreasing trend in runtime over time suggests that modern audiences prefer shorter content. Amazon Prime Video can leverage this insight by investing in shorter movies, limited series, and bite-sized content, making it more binge-friendly.





#### Chart - 4 Pie Chart - Production Countries (Univariate)

In [None]:
# Chart - 4 visualization code
df['production_countries'].value_counts().head(5).plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                              )

##### 1. Why did you pick the specific chart?

A pie chart expresses a part-to-whole relationship in your data. It's easy to explain the percentage comparison through area covered in a circle with different colors. Where differenet percentage comparison comes into action pie chart is used frequently. So, I used Pie chart and which helped me to get the percentage comparision of the dependant variable.

##### 2. What is/are the insight(s) found from the chart?

From the above, We found that among all the Production Countries that produce movies and Tv shows  present on the platform is and highest production is from  [US]that is 77.4% and second highest Production Countries is  [IN] that is 9.4%

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the production country distribution analysis can create a positive business impact in :

Content Localization & Expansion Strategy – Since the U.S. dominates production (77.4%), Amazon Prime Video may consider expanding non-U.S. content, particularly from growing markets like India (9.4%), to attract a more global audience and increase international subscriptions.

Regional Content Investment – With India as the second-highest content producer, Amazon can increase investment in Indian originals, regional languages, and Bollywood collaborations, further capturing the rapidly growing Indian streaming market.

#### Chart - 5 Count Plot - Type (Univariate)

In [None]:
# Chart - 5 visualization code
sns.countplot(data=df , x ='type' ,palette='pastel')
plt.xlabel('Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is chosen when analyzing categorical data because it effectively shows the frequency of different categories in a dataset.

Easy Visualization of Categorical Data – A count plot helps display how often each category appears, such as the number of movies per genre, production country, or content ratings.

Clear Comparison Between Categories – It allows us to compare different categories at a glance, making it easy to see which genres, production countries, or content types are most prevalent on Amazon Prime Video.

##### 2. What is/are the insight(s) found from the chart?

From the above chart , we found that there are two categories in the type columns that is TV shows and Movies .

Show have count less then 15000 and Movies have count above 90000

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the content type distribution (Movies vs. TV Shows) can create a positive business impact in several ways:

Content Strategy Optimization – Since movies (90,000+) far outnumber TV shows (<15,000), Amazon Prime Video might need to balance its content strategy by investing more in TV shows, which tend to drive long-term user engagement through binge-watching.

Subscriber Retention & Engagement – TV shows typically encourage higher watch time and repeat visits compared to movies. Expanding the TV show catalog could help increase user retention and reduce churn rates.



#### Chart - 6 Line Chart - Trend Over Years (Bivariate)

In [None]:
# Chart - 6 visualization code
sns.lineplot(data=e , y='genres', x='release_year', marker='o')
plt.xlabel('Release_Year')
plt.ylabel('Genres')
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is chosen when analyzing trends over time because it effectively visualizes continuous data changes and helps identify patterns.

Visualizing Trends Over Time – A line chart is ideal for showing how Amazon Prime Video’s content library evolves over months or years, such as the number of shows added or removed over time.

Identifying Growth or Decline – It helps track whether the content library is expanding, shrinking, or shifting towards specific genres based on historical data.

##### 2. What is/are the insight(s) found from the chart?

As we can see above chart over the year ,

The genres of category like fantasy , drama , comedy,romance and etc increase from past years in the platform and loved by the audience too.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the growth of popular genres over the years can create a positive business impact in:

Data-Driven Content Investment – Since fantasy, drama, comedy, and romance have increased over the years and are loved by the audience, Amazon Prime Video can prioritize investing in these genres to maximize engagement and subscriber growth.

Personalized Recommendations & Viewer Retention – With a clear understanding of genre popularity trends, Amazon can enhance its recommendation algorithms, ensuring that users discover content they are more likely to enjoy, boosting watch time and retention.

#### Chart - 7  Bar Chart - Title vs imdb_score (Bivariate)

In [None]:
# Chart - 7 visualization code
sns.barplot(data=t , x='title' , y='imdb_score')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts show the frequency counts of values for the different levels of a categorical or nominal variable. Sometimes, bar charts show other statistics, such as percentages.


##### 2. What is/are the insight(s) found from the chart?

From the above chart we can clearly see that movie having highest imdb_score have topped the graph and others are lower then the top one

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the IMDb score distribution chart can create a positive business impact in


Quality-Driven Content Investment – Since the highest-rated movie tops the chart, Amazon Prime Video can analyze what makes this movie successful (e.g., genre, cast, director, storyline) and invest in similar high-quality productions to attract more viewers.

Enhanced Content Recommendations – By identifying top-rated content, Amazon can refine its recommendation algorithm to prioritize high IMDb-rated movies, ensuring users engage with premium-quality content, improving retention and satisfaction.

#### Chart - 8 Bar Chart - Title vs tmdb_popularity (Bivariate)

In [None]:
# Chart - 8 visualization code
sns.barplot(data=b , x='title' , y='tmdb_popularity')
plt.xlabel('Title')
plt.ylabel('tmdb_popularity')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts show the frequency counts of values for the different levels of a categorical or nominal variable. Sometimes, bar charts show other statistics, such as percentages.

##### 2. What is/are the insight(s) found from the chart?

From the above chart we can clearly see that movie having highest tmdb_popularity have topped the graph and others are lower then the top one

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the TMDb popularity distribution can create a positive business impact:


Content Prioritization for Promotion – Since one movie dominates the TMDb popularity chart, Amazon Prime Video can focus on promoting similar high-performing content, ensuring it reaches a wider audience and boosts viewership.

Understanding Audience Preferences – If highly popular movies share common traits (genre, cast, director, theme), Amazon can use these insights to acquire or produce more content aligned with audience interests, increasing engagement.

#### Chart - 9  Line chart -  User Engagement (Bivariate)

In [None]:
# Chart - 9 visualization code
sns.lineplot(data=d , x='tmdb_popularity', y='runtime' , marker='o')
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is chosen when analyzing trends over time because it effectively visualizes continuous data changes and helps identify patterns.

Visualizing Trends Over Time – A line chart is ideal for showing how Amazon Prime Video’s content library evolves over months or years, such as the number of shows added or removed over time.

Identifying Growth or Decline – It helps track whether the content library is expanding, shrinking, or shifting towards specific genres based on historical data.

##### 2. What is/are the insight(s) found from the chart?

From the above we can clearly see that highest popularity goes to that movie or Tv show that have runtine of 100 min and more and lowest goes to when runtime is around 30-35min

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the relationship between runtime and popularity can create a positive business impact:


Optimizing Content Production & Acquisition – Since movies and TV shows with 100+ minutes runtime gain the highest popularity, Amazon Prime Video can focus on acquiring and producing longer content that aligns with audience preferences.

Strategic Investment in Short-Form Content – The lowest popularity for 30-35 minute content suggests that shorter formats may not perform well. Amazon can either limit investments in short content or redefine its approach by targeting niche audiences (e.g., short-form web series or documentary episodes).




#### Chart - 10 Box Plot

In [None]:
# Chart - 10 visualization code
for col in df.describe().columns:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    df.boxplot( col, ax = ax)
    ax.set_title('Label by ' + col)

plt.show()

##### 1. Why did you pick the specific chart?

A box plot (also called a box-and-whisker plot) is used because it provides a clear summary of the distribution of numerical data while highlighting key insights such as median, quartiles, outliers, and variability.

##### 2. What is/are the insight(s) found from the chart?

Identifies Data Distribution & Spread – A box plot visually represents how data is distributed, helping understand trends in content attributes like IMDb ratings, runtime, or release years.

Detects Outliers – Box plots make it easy to spot extreme values (outliers).In which there is no outlier in the dataset

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the box plot analysis can create a positive business impact in:


Consistent Content Quality – Since there are no outliers in the dataset, it suggests that IMDb ratings, runtime, or release years follow a stable pattern. This helps Amazon Prime Video maintain a consistent user experience, ensuring content aligns with audience expectations.

Better Content Curation & Recommendation – Understanding data distribution allows for better categorization and personalized recommendations, leading to higher engagement and user satisfaction.

#### Chart - 11 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df, hue='release_year')

##### 1. Why did you pick the specific chart?

Pair plot is used to understand the best set of features to explain a relationship between two variables or to form the most separated clusters. It also helps to form some simple classification models by drawing some simple lines or make linear separation in our data-set.

Thus, I used pair plot to analyse the patterns of data and realationship between the features. It's exactly same as the correlation map but here you will get the graphical representation.

##### 2. What is/are the insight(s) found from the chart?

1.Correlation Between IMDb Ratings and Other Factors

   (a)If IMDb ratings are higher for specific runtime ranges, it suggests that audiences prefer a certain length of content.
   (b)If ratings correlate with production countries, it can indicate which countries produce higher-rated content.

2.Runtime vs. Release Year Trend Confirmation

  (a)If the scatterplot for runtime vs. release year confirms a decreasing trend, it validates our earlier insight that newer movies and TV shows tend to be shorter.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Solution Increase User Engagement**


*   Identify High-Performing Genres
*   Content Gap Analysis
*   User Behavior Trends
*   Popular Content Segmentation
*   Recommendation Model Improvement
*   Comparison with Other Platforms
*   Geographical Trends
*   Exclusive Content Value
*   Seasonal Demand Trends







# **Conclusion**

Through the analysis of Amazon Prime Video’s content library, we have uncovered valuable insights that can drive strategic business decisions. Our findings highlight key aspects such as content diversity, audience preferences, and content performance, providing a data-driven foundation for optimizing content strategy, enhancing user engagement, and increasing subscription growth.

Key from this project include:

Genre & Content Trends – Certain genres dominate the platform, while others show growth potential, indicating opportunities for content acquisition or expansion.

Binge-Watching Behavior: Series with multiple seasons and cliffhanger endings lead to higher engagement, indicating the potential for investing in long-form storytelling.

High-Engagement Genres: Certain genres, such as action, thriller, and drama, consistently attract higher watch times and repeat viewership, suggesting strong audience preference.

Impact of IMDb Ratings on Engagement: Shows with higher IMDb ratings tend to have longer watch durations and lower drop-off rates, reinforcing the importance of quality content.

Invest in high-performing genres and expand exclusive content offerings.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***