In [1]:
import pandas as pd
import altair as alt
import altair.expr as expr

In [2]:
df = pd.read_csv("Movies_Full_With_Genres.csv")
df = df.dropna(subset=["Title", "Year"])


In [3]:

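if "Rotten Tomatoes" in df.columns:
    # Pull the leading number out of strings such as "87/100"; expand=False returns a Series
    df["Rotten Tomatoes"] = df["Rotten Tomatoes"].astype(str).str.extract(r"(\d+)", expand=False).astype(float)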


In [4]:
# Stacked Area Chart
# "How has the volume of content released by each major streaming platform changed over time (2000–2021)?"
platform_cols = ["Netflix", "Hulu", "Prime Video", "Disney+"]
melted_df = df.melt(id_vars=["Title", "Year"], value_vars=platform_cols, 
                   var_name="Platform", value_name="Available")
melted_df = melted_df[melted_df["Available"] == 1]

melted_df = melted_df[~((melted_df['Platform'] == 'Disney+') & (melted_df['Year'] < 2019))]

yearly_counts = melted_df.groupby(["Year", "Platform"]).size().reset_index(name="Count")
yearly_counts = yearly_counts[(yearly_counts["Year"] >= 2000) & (yearly_counts["Year"] <= 2021)]

area_chart = alt.Chart(yearly_counts).mark_area().encode(
    x=alt.X("Year:O", title="Year", axis=alt.Axis(labelAngle=0)),
    y=alt.Y("Count:Q", stack="normalize", title="Proportion of Total Content"),
    color=alt.Color("Platform:N", title="Platform"),
    tooltip=["Year", "Platform", "Count"]
).properties(
    width=700,
    height=400,
    title="Content Growth Over Time by Platform (2000–2021)"
).interactive()

area_chart


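The stacked area chart titled **"Content Growth Over Time by Platform (2000–2021)"** visualizes the proportion of total content available on each streaming platform year over year. Because the y-axis is normalized (`stack="normalize"`), each band shows a platform's share of that year's titles rather than an absolute count.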

**Netflix's** meteoric rise to dominance: Netflix grew from a minor share of content in the early 2000s to the dominant platform by 2021, holding roughly two-thirds of total content by the end of the period. It surpassed the previous leader (Amazon's Prime Video) in the mid-2010s, marking a major shift in platform dominance.
    
**Prime Video's** early lead and decline: Amazon Prime Video initially commanded the largest share of content (dominant in the early 2000s) but saw its proportion steadily erode over time. By 2021, Prime Video's share had fallen to around one-fifth of the total, as Netflix's expansion (and the arrival of new entrants) ate into its early lead.
    
**Hulu's** modest, steady presence: Hulu remained a relatively small player throughout the timeline. After its launch in the late 2000s, Hulu's content share rose modestly (peaking in the low-teens percentage of total content) but then stabilized and slightly declined by 2021, never approaching the levels of Netflix or Prime Video.
    
**Disney+'s** late entry and quick uptake: Disney+ entered the market in 2019 and captured a visible slice of the content share within two years. By 2021 it still contributed only a few percent of total content, a fast start for a new platform that nonetheless left it far behind the established giants.

**Overall Insight**: Netflix’s aggressive growth reshaped the streaming landscape, overtaking Prime Video’s early lead and establishing clear dominance by 2021. Hulu maintained a modest but stable presence, while Disney+ quickly captured market share after its 2019 launch, signaling ongoing disruption in the industry.
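
The share figures quoted above can be checked numerically instead of being read off the chart. The following is a minimal sketch of that calculation; the table mirrors the shape of `yearly_counts`, but its numbers are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical yearly counts in the same shape as `yearly_counts`
# (made-up numbers, not values from Movies_Full_With_Genres.csv).
yearly_counts = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Platform": ["Netflix", "Prime Video", "Netflix", "Prime Video"],
    "Count": [60, 40, 30, 10],
})

# Total titles per year, broadcast back onto every row of that year.
year_totals = yearly_counts.groupby("Year")["Count"].transform("sum")

# Share of each platform within its year: the quantity a normalized stack plots.
yearly_counts["Share"] = yearly_counts["Count"] / year_totals

print(yearly_counts)
```

Applied to the real `yearly_counts`, the `Share` column would give the exact proportions behind statements such as "roughly two-thirds" or "around one-fifth".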

In [5]:
# Heatmap
# Which platforms specialize in which genres?
df = df.dropna(subset=["Genres", "Rotten Tomatoes"])
df["Genres"] = df["Genres"].str.split(", ")
df = df.explode("Genres")

platform_df = df.melt(id_vars=["Title", "Year", "Genres", "Rotten Tomatoes"], 
                      value_vars=platform_cols,
                      var_name="Platform", value_name="Available")
platform_df = platform_df[platform_df["Available"] == 1]

heatmap_df = platform_df.groupby(["Genres", "Platform"])["Rotten Tomatoes"].mean().reset_index()

heatmap = alt.Chart(heatmap_df).mark_rect().encode(
    x=alt.X("Platform:N", title="Platform"),
    y=alt.Y("Genres:N", title="Genre"),
    color=alt.Color("Rotten Tomatoes:Q", scale=alt.Scale(scheme='blues'), title="Avg. RT Rating"),
    tooltip=["Platform", "Genres", "Rotten Tomatoes"]
).properties(
    width=600,
    height=400,
    title="Average Rotten Tomatoes Rating by Genre and Platform"
)

heatmap



The heatmap titled **"Average Rotten Tomatoes Rating by Genre and Platform"** visualizes the average critic scores across various genres for four major streaming platforms. **Darker blue shades** represent **higher average Rotten Tomatoes ratings**, indicating stronger critical reception, which this analysis reads as a sign of genre specialization.

**Disney+**:  
Disney+ performs strongly in **Action**, **Fantasy**, **Science Fiction**, and **Adventure**, with average ratings consistently in the **65–70 range**, highlighting its focus on high-quality, audience-friendly genres.

**Hulu**:  
Hulu’s standout genres are **Adventure**, **History**, and **War**, all exhibiting higher ratings (darker shades), suggesting Hulu’s strength in more mature and historically driven narratives.

**Netflix**:  
Netflix shows its strongest performance in **History**, **Western**, and **War** genres, with these categories receiving higher critic scores compared to others like **Thriller** and **Drama**.

**Prime Video**:  
Prime Video generally displays **lighter shades** across most genres, indicating comparatively **lower average ratings** overall. However, within its platform, **History** emerges as the highest-rated genre.

**Overall Insight**: Genre-Specific Strengths Drive Platform Differentiation

The heatmap highlights how each streaming platform leverages **genre-specific strengths** to build its critical reputation. These differentiated content strategies play a crucial role in **platform positioning** and **audience targeting** in an increasingly competitive streaming landscape.
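
Each heatmap cell is an average over title-genre pairs, so a title tagged with several genres contributes to every one of them. The sketch below reproduces that pipeline on a tiny table and also keeps the number of titles behind each cell, which helps judge whether a dark cell rests on many titles or only a few. The titles, scores, and availability flags are hypothetical.

```python
import pandas as pd

# Hypothetical rows; titles, flags, and scores are invented for illustration.
movies = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Action, War", "War", "Action"],
    "Rotten Tomatoes": [80.0, 60.0, 50.0],
    "Netflix": [1, 1, 0],
    "Hulu": [0, 1, 1],
})

# One row per (title, genre): a two-genre title is counted under both genres.
movies["Genres"] = movies["Genres"].str.split(", ")
exploded = movies.explode("Genres")

# One row per (title, genre, platform), keeping only platforms that carry the title.
long_df = exploded.melt(
    id_vars=["Title", "Genres", "Rotten Tomatoes"],
    value_vars=["Netflix", "Hulu"],
    var_name="Platform",
    value_name="Available",
)
long_df = long_df[long_df["Available"] == 1]

# Mean score and number of titles behind each (genre, platform) cell.
cells = (
    long_df.groupby(["Genres", "Platform"])["Rotten Tomatoes"]
    .agg(avg_rating="mean", n_titles="count")
    .reset_index()
)

print(cells)
```

In this toy table the War/Netflix cell averages titles A and B, while every other cell rests on a single title; the same check on the real data would show which genre strengths are backed by a meaningful sample.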

In [6]:
# Bubble chart
# How does genre popularity compare across different platforms?
melted_df = df.melt(
    id_vars=["Title", "Genres", "Rotten Tomatoes"],
    value_vars=["Netflix", "Hulu", "Prime Video", "Disney+"],
    var_name="Platform",
    value_name="Availability"
)

melted_df = melted_df[melted_df["Availability"] == 1]

bubble_df = melted_df.groupby(["Platform", "Genres"]).agg(
    avg_rating=("Rotten Tomatoes", "mean"),
    count=("Title", "count")
).reset_index()
platform_options = ["All"] + sorted(bubble_df["Platform"].dropna().unique().tolist())
genre_options = ["All"] + sorted(bubble_df["Genres"].dropna().unique().tolist())

platform_dropdown = alt.binding_select(options=platform_options, name="Platform: ")
genre_dropdown = alt.binding_select(options=genre_options, name="Genre: ")

platform_select = alt.selection_point(fields=["Platform"], bind=platform_dropdown, value="All", name="PlatformSelect")
genre_select = alt.selection_point(fields=["Genres"], bind=genre_dropdown, value="All", name="GenreSelect")

bubble_chart = alt.Chart(bubble_df).mark_circle().encode(
    x=alt.X("Genres:N", title="Genre"),
    y=alt.Y("avg_rating:Q", title="Average Rotten Tomatoes Rating"),
    size=alt.Size("count:Q", title="Number of Titles", scale=alt.Scale(range=[20, 1000])),
    color=alt.Color("Platform:N", title="Platform"),
    tooltip=["Genres", "Platform", "avg_rating", "count"]
).add_params(
    platform_select,
    genre_select
).transform_filter(
    "(PlatformSelect.Platform == 'All' || datum.Platform == PlatformSelect.Platform) && "
    "(GenreSelect.Genres == 'All' || datum.Genres == GenreSelect.Genres)"
).properties(
    width=800,
    height=450,
    title="Interactive Bubble Plot: Genre vs Rating vs Platform"
)

bubble_chart

The interactive bubble plot visualizes **genre popularity through a combination of three dimensions: average rating (Y-axis), genre (X-axis), and the number of titles (bubble size)**, with color distinguishing the platforms. This chart provides insight into both quantity and quality across platforms per genre.

**Disney+**: High Quality, Selective Quantity

Disney+ consistently achieves higher average Rotten Tomatoes ratings across genres such as **Action, Science Fiction, and Adventure**. However, the number of titles remains relatively limited, reflected by the smaller bubble sizes.

**Netflix**: High Volume, Moderate Ratings

Netflix dominates in terms of the number of titles across almost all genres, as seen by the large bubble sizes. However, its average Rotten Tomatoes ratings often fall into a moderate range, particularly in genres like **Drama, Crime, and Horror**.

**Hulu**: Strong in Niche Genres

Hulu occupies a middle-ground position, showing relatively strong critical reception in select genres such as **War, Western, and History**. Although it offers fewer titles compared to Netflix, its quality in specific genres is notable.

**Prime Video**: Lower Ratings with Historical Strength

Prime Video generally appears toward the lower end of the average Rotten Tomatoes ratings across most genres. However, the **History** genre emerges as a relative strength compared to its performance in other categories.

**Overall Insight**: Trade-Off Between Quantity and Quality

The bubble chart highlights that streaming platforms adopt different content strategies: Disney+ focuses on a critically acclaimed but limited catalog, Netflix prioritizes a large catalog, and Hulu and Prime Video develop niche strengths to compete.
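
One way to put the quantity-versus-quality comparison on a single line per platform is to combine the per-genre bubbles into a total title count and a title-weighted average rating. This is only a sketch: the table has the same columns as `bubble_df`, but the numbers are hypothetical, and `count` here counts title-genre pairs, so multi-genre titles are counted more than once.

```python
import pandas as pd

# Hypothetical per-genre summary in the same shape as `bubble_df`
# (made-up numbers, not values from the dataset).
bubble_df = pd.DataFrame({
    "Platform": ["Disney+", "Disney+", "Netflix", "Netflix"],
    "Genres": ["Action", "Drama", "Action", "Drama"],
    "avg_rating": [70.0, 60.0, 50.0, 60.0],
    "count": [10, 30, 100, 300],
})

# Weight each genre's average by how many title-genre pairs it covers.
bubble_df["rating_x_count"] = bubble_df["avg_rating"] * bubble_df["count"]

# Sum the weighted ratings and the counts per platform.
totals = bubble_df.groupby("Platform")[["rating_x_count", "count"]].sum()

# Title-weighted average rating per platform.
totals["weighted_rating"] = totals["rating_x_count"] / totals["count"]

print(totals[["count", "weighted_rating"]])
```

A weighted average keeps a small, highly rated genre from dominating the platform-level figure, which a plain mean of the per-genre averages would allow.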

In [7]:
# Pie_Chart
# What is the internal genre composition of each platform's content?
melted_df = df.melt(
    id_vars=["Title", "Genres"],
    value_vars=["Netflix", "Hulu", "Prime Video", "Disney+"],
    var_name="Platform",
    value_name="Availability"
)


melted_df = melted_df[melted_df["Availability"] == 1]


melted_df["Platform_Filter"] = melted_df["Platform"]


combined_genre_dist = (
    melted_df.groupby(["Platform", "Genres", "Platform_Filter"])
    .agg(Count=("Title", "count"))
    .reset_index()
)


platform_dropdown = alt.selection_point(
    fields=["Platform_Filter"],
    bind=alt.binding_select(
        options=["All"] + sorted(combined_genre_dist["Platform_Filter"].unique()),
        name="Platform: "
    ),
    name="PlatformSelect",
    value="All"
)


hover = alt.selection_point(
    on="mouseover",
    fields=["Genres"],
    empty="none"
)


pie_chart = alt.Chart(combined_genre_dist).mark_arc(innerRadius=50).encode(
    theta=alt.Theta("Count:Q"),
    color=alt.Color("Genres:N", title="Genre", scale=alt.Scale(scheme="category20")),
    opacity=alt.condition(hover, alt.value(1.0), alt.value(0.5)),
    stroke=alt.condition(hover, alt.value("white"), alt.value("transparent")),
    strokeWidth=alt.condition(hover, alt.value(3), alt.value(0)),
    tooltip=["Platform", "Genres", "Count"]
).add_params(
    platform_dropdown,
    hover
).transform_filter(
    "(PlatformSelect.Platform_Filter == 'All') || (datum.Platform_Filter == PlatformSelect.Platform_Filter)"
).properties(
    width=450,
    height=450,
    title="Genre Distribution by Platform (Including All)"
)

pie_chart

The donut-style pie chart **"Genre Distribution by Platform (Including All)"** offers a breakdown of genre share per platform. When the dropdown is set to "All", the chart reveals that Drama is the most dominant genre across the entire streaming landscape. It is the largest slice on every platform except Disney+, where Family leads, which points to its broad appeal.

**Disney+:**  
In terms of content volume, the **Family** genre is the most dominant, accounting for **552 titles** and occupying a significant portion of the pie chart. It is followed by **Comedy** with **425 titles** and **Animation** with **335 titles**.  
This distribution aligns with Disney+'s brand identity, focusing on family-friendly, lighthearted, and animated content to cater to a broad audience, particularly children and families. Drama, Adventure, and Fantasy are also present but account for smaller shares than Family, Comedy, and Animation.

**Netflix:**  
On Netflix, the genre distribution is skewed towards more mature and diverse storytelling. **Drama** dominates the catalog with **1,482 titles**, followed closely by **Comedy** with **1,428 titles**, and then **Romance** with **607 titles**.  
This reflects Netflix’s emphasis on a wide variety of content targeting adult audiences, offering a rich mix of emotional narratives, humor, and romantic stories.

**Hulu:**  
On Hulu, **Drama** is the most dominant genre, followed by **Comedy**.  
The platform also has strong representation in **Thriller**, **Horror**, **History**, and **Crime**, reflecting its focus on mature and suspenseful storytelling.  
**Family** and **Animation** genres are less prominent, indicating that Hulu primarily targets adult audiences with a diverse and serious content lineup.

**Prime Video:**  
On Prime Video, **Drama** is the most dominant genre, with **1,693 titles**, followed by **Comedy** with **1,142 titles**.  
**Thriller** comes next with **748 titles**, closely followed by **Action** (**643 titles**) and **Romance** (**631 titles**).  
This distribution shows Prime Video's strong focus on **mature, action-packed, and suspenseful content**, offering a wide variety across dramatic storytelling, thrillers, and romantic narratives.  
Compared to Disney+, Netflix, and Hulu, Prime Video stands out with a **more balanced mix across Drama, Comedy, Thriller, and Action**, appealing to a diverse, global audience.

Overall, while **Drama** remains the foundational genre across the streaming landscape, platforms differentiate themselves by strategically prioritizing other genres to carve out unique brand identities and cater to specific audience segments.
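
Raw title counts are hard to compare across platforms with catalogs of very different sizes, so it can help to convert them into within-platform shares and list the leading genres side by side. The sketch below assumes a table shaped like `combined_genre_dist`; the counts are invented for illustration and are not the figures quoted above.

```python
import pandas as pd

# Hypothetical genre counts per platform (made-up numbers).
genre_counts = pd.DataFrame({
    "Platform": ["Disney+", "Disney+", "Disney+", "Netflix", "Netflix", "Netflix"],
    "Genres": ["Family", "Comedy", "Drama", "Drama", "Comedy", "Romance"],
    "Count": [50, 30, 20, 50, 35, 15],
})

# Share of each genre within its own platform's catalog.
platform_totals = genre_counts.groupby("Platform")["Count"].transform("sum")
genre_counts["Share"] = genre_counts["Count"] / platform_totals

# Two largest genres per platform, ordered by platform and then by count.
top_genres = (
    genre_counts.sort_values(["Platform", "Count"], ascending=[True, False])
    .groupby("Platform")
    .head(2)
)

print(top_genres)
```

On the real data, the `Share` column corresponds to the slice sizes seen when a single platform is selected in the dropdown.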



In conclusion, Netflix wins on volume, Disney+ wins on quality in selected genres, Hulu carves out critical success in niche categories, and Prime Video posts lower but fairly consistent ratings, with History as its relative strength. This multidimensional analysis highlights that quantity alone isn't enough: platforms strategically position themselves through genre focus and critical acclaim.
