<a href="https://colab.research.google.com/github/ishadvay3928/Amazon-Prime-EDA-Project/blob/main/Amazon_Prime_EDA_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Amazon Prime EDA Project**



##### **Project Type**    -  Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual


# **Project Summary -**

Write the summary here within 500-600 words.

# **GitHub Link -**

https://github.com/ishadvay3928/Amazon-Prime-EDA-Project/blob/main/Amazon_Prime_EDA_Project.ipynb

# **Problem Statement**


**To analyze the Amazon Prime dataset and extract valuable insights regarding content trends, genre distributions, ratings and countries contributing to content to support strategic decision-making in content planning and marketing.**

#### **Define Your Business Objective?**

This analysis examines all titles available on Amazon Prime Video to extract insights such as:

- **Content Diversity**: What genres and categories dominate the platform?

- **Regional Availability**: How does content distribution vary across different regions?

- **Trends Over Time**: How has Amazon Prime's content library evolved?

- **IMDb Ratings & Popularity**: What are the highest-rated or most popular shows on the platform?

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df_credits = pd.read_csv("/content/credits.csv")
df_titles = pd.read_csv("/content/titles.csv")

### Dataset First View

In [None]:
# credits Dataset First Look
df_credits.head()

In [None]:
# titles Dataset First Look
df_titles.head()

### Dataset Rows & Columns count

In [None]:
# titles Dataset Rows & Columns count
df_titles.shape

In [None]:
# credits Dataset Rows & Columns count
df_credits.shape

### Dataset Information

In [None]:
# titles Dataset Info
df_titles.info()

In [None]:
# credits Dataset Info
df_credits.info()

#### Duplicate Values

In [None]:
# titles Dataset Duplicate Value Count
df_titles.duplicated().sum()

In [None]:
# credits Dataset Duplicate Value Count
df_credits.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count of titles dataset
df_titles.isnull().sum()

In [None]:
# Missing Values/Null Values Count of credits dataset
df_credits.isnull().sum()

In [None]:
# Visualizing the missing values of titles dataset
import missingno as msno
msno.bar(df_titles)

In [None]:
# Visualizing the missing values of credits dataset
msno.bar(df_credits)

### What did you know about your dataset?

- The titles dataset has 9871 rows and 15 columns, 8 of which contain missing values. The seasons column has the most missing values (8514), followed by age_certification (6487) and tmdb_score (2082); description has the fewest (119).
- The credits dataset has 124235 rows and 5 columns. Only the character column contains missing values (16287).
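The per-column counts above can be turned into a compact table of counts and percentages. The following is a minimal sketch, assuming a hypothetical helper named `missing_summary` and a made-up toy frame in place of the real `df_titles`:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that actually have missing values, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for df_titles (values are made up for illustration)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "seasons": [np.nan, np.nan, 2.0, np.nan],
    "age_certification": ["R", None, None, "PG"],
})
print(missing_summary(toy))
```

Calling `missing_summary(df_titles)` in the notebook would give the same kind of table for the real data, which makes it easier to judge how much of each column is usable.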

## ***2. Understanding Your Variables***

In [None]:
# titles Dataset Columns
df_titles.columns

In [None]:
# credits Dataset Columns
df_credits.columns

In [None]:
# titles Dataset Describe
df_titles.describe(include='all')

In [None]:
# credits Dataset Describe
df_credits.describe(include='all')

### Variables Description

**Variable Description For Titles dataset**:

- id: The title ID on JustWatch.

- title: The name of the title.

- type: TV show or movie.

- description: A brief description.

- release_year: The release year.

- age_certification: The age certification.

- runtime: The length of the episode (SHOW) or movie.

- genres: A list of genres.

- production_countries: A list of countries that produced the title.

- seasons: Number of seasons if it's a SHOW.

- imdb_id: The title ID on IMDb.

- imdb_score: Score on IMDb.

- imdb_votes: Votes on IMDb.

- tmdb_popularity: Popularity on TMDB.

- tmdb_score: Score on TMDB.

**Variable Description For Credits dataset**:

- person_id: The person ID on JustWatch.

- id: The title ID on JustWatch.

- name: The actor or director's name.

- character: The character name.

- role: ACTOR or DIRECTOR.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable of titles dataset.
df_titles.nunique()

In [None]:
# Check Unique Values for each variable of credits dataset.
df_credits.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Drop Duplicates
df_titles.drop_duplicates(inplace=True)
df_credits.drop_duplicates(inplace=True)

In [None]:
# Merge both datasets on 'id'
df = pd.merge(df_titles, df_credits, on='id', how='left')

In [None]:
# Convert numerical strings to numeric types
df['imdb_score'] = pd.to_numeric(df['imdb_score'], errors='coerce')
df['imdb_votes'] = pd.to_numeric(df['imdb_votes'], errors='coerce')
df['tmdb_score'] = pd.to_numeric(df['tmdb_score'], errors='coerce')
df['tmdb_popularity'] = pd.to_numeric(df['tmdb_popularity'], errors='coerce')


In [None]:
df['type'] = df['type'].astype('category')
df['age_certification'] = df['age_certification'].astype('category')


In [None]:
df.info()

### What all manipulations have you done and insights you found?

***Key Manipulations:***

- Dropped duplicate records from both ***df_titles*** and ***df_credits*** to ensure data quality.

- Merged ***df_titles*** and ***df_credits*** on the common column 'id' using a left join, consolidating content information with credits.

- Converted numeric columns stored as strings ***(imdb_score, imdb_votes, tmdb_score, tmdb_popularity)*** to proper numeric types for accurate computation and analysis.

- Converted categorical columns like ***type (Movie or Show)*** and ***age_certification*** to category data type for memory optimization and efficient analysis.

***Insights Gained:***

- **Data deduplication** ensures there are no repeated entries that might skew analysis or visualizations.

- **Merging titles** and credits enables a unified dataset that allows combined analysis on content metadata (genre, country) and personnel (director, cast).

- Converting scores and popularity metrics into **numeric form** prepares the data for:

    - Trend analysis (e.g., comparing IMDb vs TMDb ratings)
    - Correlation and regression modeling
    - Popularity distribution insights

- **Categorical encoding** of ***type*** and ***age_certification*** facilitates:

    - Efficient filtering for Movies vs Shows
    - Grouped comparisons across different content ratings
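The left join described above can be sanity-checked with two optional arguments of `pd.merge`. This is a minimal sketch on made-up toy frames (the names `titles`, `credits` and `no_credits` are illustrative, not from the notebook): `validate` guards against repeated title ids, and `indicator` flags titles that have no credit rows.

```python
import pandas as pd

# Toy stand-ins for df_titles and df_credits (made-up values)
titles = pd.DataFrame({"id": ["t1", "t2", "t3"], "type": ["MOVIE", "SHOW", "MOVIE"]})
credits = pd.DataFrame({
    "id": ["t1", "t1", "t2"],
    "name": ["Ann", "Bob", "Cy"],
    "role": ["ACTOR", "DIRECTOR", "ACTOR"],
})

merged = pd.merge(
    titles, credits, on="id", how="left",
    validate="one_to_many",  # raises MergeError if a title id repeats in titles
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Titles without any credit row survive the left join with NaN name/role
no_credits = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(len(titles), len(merged), no_credits)
```

Note that the merged frame has one row per credit, so it is longer than the titles frame; this matters for any later count that is meant to be per title.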

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 (Distribution of Content Type)

In [None]:
# Distribution of Content type
type_counts = df['type'].value_counts()

plt.figure(figsize=(10,6))
plt.pie(type_counts, labels=type_counts.index, autopct='%1.1f%%', startangle=70, colors=['#ff9999','#66b3ff'])
plt.title('Distribution of Movies vs TV Shows on Amazon Prime')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is effective for visualizing the proportion between two Types—here, Movie and Show. It quickly communicates which format dominates.

##### 2. What is/are the insight(s) found from the chart?

Movies make up a significantly larger share of the Amazon Prime catalog than shows, suggesting a heavier investment in movie content.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This helps Amazon Prime evaluate whether diversifying into more shows can improve user retention, especially since series often boost engagement. However, over-investment in movies without enough long-form content may limit binge-watching behavior, potentially reducing user time-on-platform.
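One caveat worth checking: the merged df holds one row per credit, so counting df['type'] weights each title by the size of its cast and crew. The sketch below uses a made-up toy frame, not the real data, to show how a title-level share could be computed by keeping one row per id before counting. Whether the proportions in the pie chart change materially would need to be verified on the actual dataset.

```python
import pandas as pd

# Toy merged frame: one row per (title, credit), as a left join on 'id' produces
merged = pd.DataFrame({
    "id":   ["t1", "t1", "t1", "t2", "t3"],
    "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW"],
    "name": ["Ann", "Bob", "Cy", "Dee", None],
})

# Row-level share: weighted by how many credits each title has
row_share = merged["type"].value_counts(normalize=True)

# Title-level share: keep one row per title id before counting
title_share = merged.drop_duplicates(subset="id")["type"].value_counts(normalize=True)

print(row_share.round(2).to_dict())
print(title_share.round(2).to_dict())
```

In this toy case the two views disagree: movies dominate at row level only because the single movie has three credits.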

#### Chart - 2 (Top 10 countries)

In [None]:
# Top 10 countries by content production
import ast

df['production_countries'] = df['production_countries'].fillna('[]').apply(ast.literal_eval)
exploded_countries = df.explode('production_countries')
top_countries = exploded_countries['production_countries'].value_counts().head(10)

# Plot
plt.figure(figsize=(10,5))
sns.barplot(x=top_countries.values, y=top_countries.index, palette='viridis')
plt.title('Top 10 Countries by Content Production on Amazon Prime')
plt.xlabel('Count')
plt.ylabel('Country')
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

A horizontal bar chart clearly visualizes which countries contribute the most content.

##### 2. What is/are the insight(s) found from the chart?

The United States dominates content production on Amazon Prime, followed by the United Kingdom and others. The U.S. alone accounts for a large share of the total content.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight can help Amazon Prime localize its strategy. It indicates strong content sourcing from the U.S. and from markets like the United Kingdom. However, a lack of content from other regions may limit market penetration in underrepresented geographies, potentially restricting user acquisition.

#### Chart - 3 (top 10 directors)

In [None]:
# top 10 directors

directors = df[df['role'] == 'DIRECTOR']
director_counts = directors['name'].dropna().str.split(', ').explode().value_counts().head(10)

# Plot
plt.figure(figsize=(10, 5))
sns.barplot(x=director_counts.values, y=director_counts.index, palette="coolwarm")
plt.title("Top 10 Directors on Amazon Prime")
plt.xlabel("Titles Directed")  # x-axis holds the counts
plt.ylabel("Director")         # y-axis holds the director names
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

A horizontal bar chart identifies the most active, and possibly most trusted, directors associated with Amazon Prime.

##### 2. What is/are the insight(s) found from the chart?

A small set of directors appears repeatedly, which may indicate trust and investment by Amazon Prime in specific creators.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Reliable directors help scale production quickly. However, relying on the same directors carries a risk of similar themes and limited variety in storytelling, which can adversely affect growth.

#### Chart - 4 (Top 10 Actors)

In [None]:
# Top 10 Most Frequent Actors
actors = df[df['role'] == 'ACTOR']
actor_counts = actors['name'].dropna().str.split(', ').explode().value_counts().head(10)

# Plot
plt.figure(figsize=(10, 5))
sns.barplot(x=actor_counts.values, y=actor_counts.index, palette="viridis")
plt.title("Top 10 Actors on Amazon Prime")
plt.xlabel("Appearances")  # x-axis holds the counts
plt.ylabel("Actor")        # y-axis holds the actor names
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot helps identify the actors most frequently featured in Amazon Prime content, which is useful for partnerships and marketing.

##### 2. What is/are the insight(s) found from the chart?

Certain actors appear far more often than others, indicating star preferences or contracts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight indicates that star power can drive viewership and inform the casting of popular faces. However, overuse may lead to a decline in views, as viewers may want to see new talent.

#### Chart - 5 (Popular Topics on Amazon Prime)

In [None]:
# Popular Topics on Amazon Prime
from wordcloud import WordCloud
text = ' '.join(df['description'].dropna().tolist())

plt.figure(figsize=(12,5))
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Popular Topics on Amazon Prime')
plt.show()

##### 1. Why did you pick the specific chart?

A word cloud gives a quick, creative overview of the common themes and topics that Amazon Prime content revolves around.

##### 2. What is/are the insight(s) found from the chart?

The top words include life, love, and find. This may reflect user interest and Amazon Prime's content production strategy.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight shows Amazon Prime's investment in topics like life, love, and find. The lower visibility of niche themes such as crime, murder, and mystery could negatively affect marketing to those audiences.
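The word cloud ranks words visually; the same ranking can be made explicit with a frequency count. The following is a rough sketch, assuming a hypothetical helper `top_words` and a deliberately tiny, made-up stop-word list (a real analysis would use a fuller list, for example the one shipped with the wordcloud package):

```python
import re
from collections import Counter

# Hypothetical mini stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "his", "her", "is", "for", "with"}


def top_words(descriptions, n=3):
    """Count the most frequent non-stop-words across a list of descriptions."""
    counter = Counter()
    for text in descriptions:
        # Lowercase and keep alphabetic tokens only
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(w for w in words if w not in STOP_WORDS)
    return counter.most_common(n)


# Made-up descriptions for illustration
sample = [
    "A young woman tries to find love in the city.",
    "His life changes when he must find his brother.",
    "A story of love, life and loss.",
]
print(top_words(sample))
```

In the notebook, `top_words(df_titles['description'].dropna(), n=20)` would give a numeric view of the same themes, counted once per title.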

#### Chart - 6 (top 10 genres)

In [None]:
# top 10 genres on Amazon Prime

import ast
from collections import Counter

# Convert stringified list to actual list
df['genres'] = df['genres'].fillna('[]').apply(ast.literal_eval)

# Flatten the genre list
flat_genres = [genre.strip() for sublist in df['genres'] for genre in sublist]

# Count top 10 genres
top_genres = Counter(flat_genres).most_common(10)
genres, counts = zip(*top_genres)

# Plotting
plt.figure(figsize=(10,5))
sns.barplot(x=list(counts), y=list(genres), palette='magma')
plt.title('Top 10 Genres on Amazon Prime')
plt.xlabel('Count')
plt.ylabel('Genre')
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

A horizontal bar chart is perfect for showcasing frequency distribution of categorical data like genres.

##### 2. What is/are the insight(s) found from the chart?

The most common genres on Amazon Prime include Drama, Comedy, and Thriller. These dominate the chart, suggesting that supply, and possibly user demand, is high in these categories.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Understanding genre preferences can help Amazon Prime personalize recommendations and focus future content investments. However, over-representation in certain genres (e.g., dramas) could lead to audience fatigue and underrepresentation of niche interests, which might limit audience diversity.

#### Chart - 7 (Age Certification Distribution)

In [None]:
# Age Certification Distribution
plt.figure(figsize=(8,5))
sns.countplot(data=df_titles, x='age_certification', order=df_titles['age_certification'].value_counts().index, palette='Set1')
plt.title("Distribution of Age Certifications")
plt.xlabel("Age Certification")
plt.ylabel("Count")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart effectively shows the number of titles in each age certification category.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that "R" (Restricted) is the most common certification, followed by "PG-13" and "PG", with other certifications being significantly less frequent.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This knowledge can positively impact content production and marketing by targeting specific audiences; however, over-relying on "R"-rated content could limit broader audience appeal, leading to negative growth.
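Because age_certification is missing for 6487 of the 9871 titles, the count plot covers only about a third of the catalog. A small sketch, on a made-up toy column standing in for df_titles['age_certification'], shows one way to keep the missing group visible when computing shares:

```python
import pandas as pd

# Toy column standing in for df_titles['age_certification'] (made-up values)
cert = pd.Series(["R", "PG-13", None, None, "R", None, "PG", None])

# dropna=False keeps the missing group so its size is visible next to the ratings
share = cert.value_counts(normalize=True, dropna=False)

# Fraction of titles with no certification at all
missing_share = cert.isna().mean()

print(share)
print(f"Missing share: {missing_share:.0%}")
```

Reporting the missing share alongside the ranking helps qualify any conclusion about which certification is most common.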

#### Chart - 8 (IMDb Rating Distribution)

In [None]:
# IMDb Rating Distribution

plt.figure(figsize=(8,5))
sns.histplot(df['imdb_score'].dropna(), bins=20, kde=True, color='skyblue')
plt.title('Distribution of IMDb Scores')
plt.xlabel('IMDb Score')
plt.ylabel('Count')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A histogram was chosen because it effectively displays the distribution of a single numerical variable (IMDb Scores), showing the frequency of scores within different ranges.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that IMDb scores are mostly concentrated between 5.0 and 7.5, with the highest frequency around 6.0-6.5.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, insights like the common score range (5.5-7.5) can guide content strategy for positive impact. However, over-focusing on average scores and ignoring high-quality niches could lead to negative growth due to lack of differentiation and missed opportunities.
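The score range read off the histogram can be backed by a direct calculation. This is a minimal sketch, assuming a hypothetical helper `share_between` and made-up scores rather than the real imdb_score column:

```python
import pandas as pd


def share_between(scores: pd.Series, low: float, high: float) -> float:
    """Fraction of non-missing scores that fall within [low, high]."""
    valid = scores.dropna()
    return valid.between(low, high).mean()


# Made-up scores for illustration
scores = pd.Series([4.8, 5.6, 6.1, 6.4, 7.0, 7.5, 8.2, None])

print(share_between(scores, 5.5, 7.5))
print(scores.quantile([0.25, 0.5, 0.75]))
```

Applied to `df_titles['imdb_score']`, the same call would state how much of the catalog actually sits in the 5.5 to 7.5 band, and the quartiles give a second view of where scores concentrate.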

#### Chart - 9 (Release Year Distribution)

In [None]:
# Release Year Distribution
release_counts = df['release_year'].value_counts().sort_index()

plt.figure(figsize=(25,6))
sns.lineplot(x=release_counts.index, y=release_counts.values, marker='o', color='teal')
plt.title('Amazon Prime Content Released Over the Years')
plt.xlabel('Release Year')
plt.ylabel('Count')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

This line chart is ideal for showing "Amazon Prime Content Released Over the Years" because it effectively visualizes trends over time.

##### 2. What is/are the insight(s) found from the chart?

The key insight is a significant surge in Amazon Prime content released, peaking around 2019-2020, followed by a sharp decline.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can positively impact content acquisition strategies by showing the need for more recent content, but the decline could indicate a reduction in new releases, potentially leading to negative subscriber growth if not addressed.

#### Chart - 10 (Content Type over Runtime and IMDb Score)

In [None]:
# Content Type over Runtime and IMDb Score
plt.figure(figsize=(8,5))
sns.scatterplot(data=df, x='runtime', y='imdb_score', hue='type', alpha=0.7)
plt.title('Content Type over Runtime and IMDb Score')
plt.xlabel('Runtime (minutes)')
plt.ylabel('IMDb Score')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This scatter plot effectively shows the relationship between "Runtime," "IMDb Score," and "Content Type" (Movie/Show).

##### 2. What is/are the insight(s) found from the chart?

The main insights are that movies generally have longer runtimes and a wider range of IMDb scores, while shows are typically shorter with scores mostly between 5 and 9.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can positively inform content acquisition and production strategies by identifying content gaps, but ignoring one type's potential could lead to negative growth by limiting audience reach.

#### Chart - 11 (Content Type over Time)

In [None]:
# Content Type over Time
release_counts = df.groupby(['release_year', 'type']).size().unstack().fillna(0)

release_counts.plot(kind='line', figsize=(20, 6), linewidth=2)
plt.title('Content Type over Time')
plt.xlabel('Year')
plt.ylabel('Count')
plt.legend(title='Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This line chart is ideal for visualizing "Content Type over Time" because it effectively displays the trends and counts of both types over time.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that movie content consistently outnumbered shows until around 2017, after which both saw a sharp increase, with movies peaking significantly around 2019-2020 before a recent decline. Show content has also increased, but at a much slower rate and remains far less frequent than movies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can positively impact content strategy by highlighting the high demand for movies in recent years. However, the recent sharp decline in both movie and show content could lead to negative growth if not addressed, as a shrinking content library might reduce subscriber retention or acquisition.
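The surge and decline described above can also be read from a table of titles per year and type. The following sketch uses a made-up, title-level toy frame (one row per title) and `pd.crosstab`; the year-over-year difference is an added illustration, not part of the original analysis:

```python
import pandas as pd

# Toy title-level frame (made-up values); one row per title
titles = pd.DataFrame({
    "id": ["t1", "t2", "t3", "t4", "t5"],
    "release_year": [2018, 2019, 2019, 2019, 2020],
    "type": ["MOVIE", "MOVIE", "SHOW", "MOVIE", "SHOW"],
})

# Rows = years, columns = content type, cells = number of titles
per_year = pd.crosstab(titles["release_year"], titles["type"])

# Year-over-year change makes surges and declines explicit
yoy_change = per_year.diff()

print(per_year)
print(yoy_change)
```

A drop in the most recent years may partly reflect when the data was collected rather than a real decline, so the latest rows are worth interpreting with care.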

#### Chart - 12 (IMDb Score vs Runtime with Votes and Type)

In [None]:
# IMDb Score vs Runtime with Votes and Type

plt.figure(figsize=(20,10))
sns.scatterplot(
    data=df_titles,
    x='runtime',
    y='imdb_score',
    hue='type',
    size='imdb_votes',
    sizes=(20, 200),
    alpha=0.7
)
plt.title('IMDb Score vs Runtime with Votes and Type')
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

This scatter plot with varied dot sizes (bubble chart) is chosen to display the relationships between "Runtime," "IMDb Score," "Content Type," and "IMDb Votes" simultaneously. It effectively reveals correlations and clusters among these four variables.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that "SHOW" titles (blue) are generally shorter and score highly on IMDb, while "MOVIE" titles (orange) span a wider range of runtimes and scores. Titles with more IMDb votes (larger dots) are concentrated among movies and shows with good IMDb scores (above 6).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can guide content acquisition towards popular high-rated movies and shows for positive impact. However, ignoring content types or score ranges with fewer votes (smaller dots) could lead to negative growth by missing niche audiences or underperforming segments.
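Relationships that are read off a scatter plot can be checked numerically. As a rough sketch on made-up values (the toy frame and the name `corr_by_type` are assumptions for illustration), the Pearson correlation between runtime and IMDb score can be computed separately for each content type:

```python
import pandas as pd

# Toy title-level frame (made-up values)
titles = pd.DataFrame({
    "type": ["MOVIE"] * 4 + ["SHOW"] * 4,
    "runtime": [90, 100, 110, 120, 20, 30, 40, 50],
    "imdb_score": [5.0, 6.0, 7.0, 8.0, 8.0, 7.0, 6.0, 5.0],
})

# Pearson correlation between runtime and IMDb score within each content type
corr_by_type = {
    kind: group["runtime"].corr(group["imdb_score"])
    for kind, group in titles.groupby("type")
}
print(corr_by_type)
```

Splitting by type matters here: in this toy data the two groups have opposite relationships that would partly cancel out if pooled.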

#### Chart - 13 (TMDB Popularity vs Runtime with IMDb Score)

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(
    data=df_titles,
    x='runtime',
    y='tmdb_popularity',
    hue='type',
    size='imdb_score',
    sizes=(20, 200),
    alpha=0.7
)
plt.title("TMDB Popularity vs Runtime with IMDb Score")
plt.xlabel("Runtime (minutes)")
plt.ylabel("TMDB Popularity")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This bubble chart is suitable here as it simultaneously visualizes the relationship between runtime and popularity, differentiates by content type (SHOW/MOVIE), and uses bubble size to represent IMDb score, effectively showing multi-dimensional data.

##### 2. What is/are the insight(s) found from the chart?

Highly popular content (high TMDB Popularity) often includes shorter shows and medium-length movies, frequently with high IMDb scores (larger bubbles). While longer movies exist, their popularity tends to be lower compared to the popular clusters of shorter content.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can guide content acquisition towards popular, often shorter, high-rated content for positive business impact. However, focusing solely on the most popular items and neglecting potentially growing, less popular genres or longer formats could lead to negative growth by limiting audience diversification.

#### Chart - 14 (Top Genres by Country and Type)

In [None]:
# Top Genres by Country and Type
import ast

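# Prepare the DataFrame (genres and production_countries were parsed into lists in earlier cells)
df_genre = df.dropna(subset=['genres', 'production_countries', 'type', 'title']).copy()

# Explode both genres and countries, then reset the index so it is unique again
df_genre = df_genre.explode('genres').explode('production_countries').reset_index(drop=True)

# Focus on top 5 countries and top 5 genres for clarity
# (counted on the exploded frame, because list-valued cells cannot be counted directly)
top_countries = df_genre['production_countries'].value_counts().head(5).index
top_genres = df_genre['genres'].value_counts().head(5).index

# Filter DataFrame using masks built from df_genre itself so the indexes align
filtered = df_genre[
    df_genre['production_countries'].isin(top_countries) &
    df_genre['genres'].isin(top_genres)
]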

# Pivot table: index = genre, columns = (country, type)
genre_country_type = pd.pivot_table(
    filtered,
    index='genres',
    columns=['production_countries', 'type'],
    values='title',
    aggfunc='count',
    fill_value=0
)

# Heatmap
plt.figure(figsize=(14, 6))
sns.heatmap(genre_country_type, annot=True, fmt='d', cmap='YlGnBu')
plt.title('Top Genres by Country and Type (Heatmap)')
plt.xlabel('Country & Type')
plt.ylabel('Genre')
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

This heatmap is ideal for visualizing "Top Genres by Country and Type" because it effectively displays the frequency of each genre across different country-type combinations using color intensity, making high-count categories immediately apparent.

##### 2. What is/are the insight(s) found from the chart?

US-MOVIES dominate content creation across all listed genres, especially Drama and Comedy. India (IN-MOVIE) is also a significant producer of Drama, Comedy, and Romance movies, while other country-type combinations have considerably fewer entries.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, focusing content acquisition and production on popular US and Indian movies, especially Drama and Comedy, can yield positive business impact. However, neglecting less-dominant country-type combinations or niche genres could lead to negative growth by missing opportunities in underserved markets.
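The reshaping behind the heatmap can be easier to follow on a small example. This is a minimal sketch on a made-up, title-level toy frame: chaining two `explode` calls produces one row per (title, genre, country) combination, and `pd.crosstab` then counts those rows per genre and (country, type) pair.

```python
import pandas as pd

# Toy title-level frame with list-valued columns (made-up values)
titles = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "type": ["MOVIE", "MOVIE", "SHOW"],
    "genres": [["drama", "comedy"], ["drama"], ["comedy"]],
    "production_countries": [["US"], ["US", "IN"], ["GB"]],
})

# Each explode turns one list element into its own row; chaining the two
# yields one row per (title, genre, country) combination
long_form = titles.explode("genres").explode("production_countries")

# Rows = genre, columns = (country, type), cells = number of rows
table = pd.crosstab(
    long_form["genres"],
    [long_form["production_countries"], long_form["type"]],
)
print(table)
```

A title with several genres or countries is counted once in each matching cell, so the cell totals add up to more than the number of titles.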

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

- Focus on US and Indian content, especially in genres like Drama, Comedy, and Romance, since they’re very popular.
- Choose content with IMDb ratings between 5.5 and 7.5 – it’s generally liked by viewers and can help grow your audience.
- Shorter TV shows are trending, so adding more of them can increase user engagement and attract new subscribers.
- There has been a drop in new releases after 2020, so it’s a good idea to bring in more fresh content to fill that gap.
- While focusing on popular content, also add content from other countries and lesser-known genres to reach niche viewers and expand your audience.
- Use age certifications (like R, PG-13, PG) to understand your audience better and target marketing by age groups.
- Offer a mix: longer movies for deep stories and shorter shows for quick binge-watching, depending on what people like.
- Keep an eye on both TMDB Popularity and IMDb scores – sometimes, a show may not have the highest rating but is still very popular.
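The last recommendation can be made concrete by listing titles that are popular on TMDB but only modestly rated on IMDb. The sketch below uses made-up values, and the cut-offs (top quartile of popularity, below-median score) are arbitrary assumptions chosen for illustration:

```python
import pandas as pd

# Toy title-level frame (made-up values)
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "imdb_score": [8.5, 5.9, 6.2, 7.8, 5.5],
    "tmdb_popularity": [12.0, 95.0, 8.0, 60.0, 3.0],
})

# Assumed thresholds: top quartile of popularity, below-median IMDb score
pop_cut = titles["tmdb_popularity"].quantile(0.75)
score_cut = titles["imdb_score"].median()

popular_but_modest = titles[
    (titles["tmdb_popularity"] >= pop_cut) & (titles["imdb_score"] < score_cut)
]
print(popular_but_modest)
```

Titles surfaced this way are candidates for promotion despite average ratings; the thresholds would need tuning on the real data.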

# **Conclusion**

Based on the analysis, the catalog is dominated by US and Indian movies, particularly in the Drama, Comedy, and Romance genres, and most IMDb scores fall between 5.5 and 7.5. Content releases surged and then declined after 2020, while the most popular content tends to be shorter shows and medium-length movies, often with both high TMDB popularity and high IMDb scores. To achieve the business objectives, the client should prioritize content acquisition in these high-demand categories and regions while diversifying to serve niche markets and reversing the recent decline in new content, aiming for a balanced portfolio that maximizes audience engagement and subscriber growth.