# Project: Netflix Content Trends Analysis

- **Course:** COMP 6934 - Intro to Data Visualization
- **Student Name:** Tofa Hossain
- **Student ID:** 202482200
- **Dataset:** Netflix Interactive
- **Dataset Link:** [Netflix Interactive](https://www.kaggle.com/datasets/willianoliveiragibin/netflix-interactive)

## 1. Introduction

**General Area of Study:**  
This project analyzes Netflix's content catalog, focusing on genre distributions, recommendation networks, and content diversity across languages and maturity ratings. It examines how movies and TV shows are categorized and interconnected through Netflix's recommendation system, and tracks temporal trends in content releases from 1962 to 2025. The analysis provides insights for content strategy, user experience design, and market research in the streaming entertainment industry.

**Data Sources:** 
I used one primary dataset, `Netflix_Data_new.csv`. To meet the needs of the visualizations, I also created a secondary dataset, `genre_audio_recommendations.csv`, which is an edited version of the primary dataset.

*What is the dataset about, and how was it gathered?*

The primary dataset contains metadata about movies and TV shows available on Netflix:
- Approximately 5,500 to 6,404 unique movies and TV shows
- Entertainment trends across six decades (1962 to 2025)
- Content in many original languages
- Content types:
  - A mix of films (the majority) and TV series
  - International and domestic productions

The exact collection method is not specified on the dataset's source page. It is a public dataset available on Kaggle:
[Netflix Interactive](https://www.kaggle.com/datasets/willianoliveiragibin/netflix-interactive)

The file is smaller than 2 MB and is stored locally in the project folder.

**The client (real or imaginary) and their interest in the data:**
The **hypothetical client** for this project is a **Netflix Content Analyst** who wants to better understand genre and recommendation trends over time. The goal is to design visualizations that help the team understand user behavior, identify popular genres and recommendation patterns, and potentially guide content acquisition and recommendation strategies. Specifically, the client wants to:
- Understand genre popularity trends over time
- Identify which genres receive the most recommendations
- Analyze which genres offer multilingual audio for global appeal

**Final Submission:**

The submitted project folder contains all necessary files for the project.

- `Bar_Chart_Race.html` | Animated Bar Chart Race *(Design 1)*
- `Donut_Chart_Recommendations.html` | Animated Donut Chart *(Design 2)*
- `Heatmap_Audio_Recommendations.html` | Genre Audio Recommendations Heatmap *(Design 3)*
- `genre_audio_recommendations.csv` | Prepared (secondary) dataset used for the Bar Chart Race
- `Netflix_Data_new.csv` | Main dataset
- `Documentation.ipynb` | Analysis and thorough documentation of the project
- `data_preparation3.ipynb` | Data preprocessing notebook used for the Heatmap
- `projectEvalution.ipynb` | Professor’s instructions on evaluation
- `projectInstructions` | Professor’s project instructions

*The viewer should expect:*

- 3 working visualizations: Bar Chart Race, Donut Chart, Heatmap
- All visuals follow Munzner’s principles and the professor’s instructions
- Clear Munzner-style analysis breakdowns in `Documentation.ipynb`
- Clean, clear, and well-commented code

No external dataset download, API, or web server is needed.

---

### To Run Visualizations:

1. Open the HTML files in any modern browser (**Chrome, Firefox, Edge**).

*If a visualization does not load fully, reload the page.*

---


## 2. Dataset Details

### Preliminary Examination:
- Handled missing values in Original Audio and Main Genre (cleaned where a visualization uses these attributes, ignored in visualizations that do not require them)
- Extracted Original Audio Count by splitting the language strings
- Extracted Recommendation Count by splitting the recommendation strings
- One-hot encoded Main Genre: each unique genre value became its own column
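The steps above could look roughly like this in plain JavaScript. The sketch assumes comma-separated `Original Audio` values (the separator is an assumption) and uses a hypothetical `genre_` prefix for the one-hot columns; it is not the project's actual preprocessing code.

```javascript
// Hypothetical sketch of the preliminary preparation steps (not the project's code).
// Assumes each row is a plain object keyed by CSV column name, and that
// 'Original Audio' and 'Recommendations' hold comma-separated strings.

// Count non-empty items in a comma-separated field; missing or blank gives 0.
function countItems(value) {
  if (!value || value.trim() === '') return 0;
  return value.split(',').filter(item => item.trim() !== '').length;
}

// Add the two derived count columns plus one 0/1 column per unique Main Genre.
// The `genre_` column prefix is a hypothetical naming choice.
function prepareRows(rows) {
  const genres = [...new Set(rows.map(r => r['Main Genre']).filter(Boolean))].sort();
  return rows.map(r => {
    const out = {
      ...r,
      'Original Audio Count': countItems(r['Original Audio']),
      'Recommendation Count': countItems(r['Recommendations']),
    };
    for (const g of genres) out[`genre_${g}`] = r['Main Genre'] === g ? 1 : 0;
    return out;
  });
}
```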

***Further description of the data if necessary***

The Netflix dataset's key characteristics were outlined in the Introduction section. To avoid redundancy, I will explore its specific applications when discussing the Munzner framework and visualizations.

### A complete Munzner WHAT analysis for all data items in the dataset

| Attribute        | Type                     | Description                                  |
|------------------|--------------------------|----------------------------------------------|
| N_id       | Nominal    | Movie/TV show unique id          |
| Main Genre       | Categorical, Nominal     | Movie/TV show genre          |
| Sub Genres       | Categorical, Nominal     | Movie/TV show sub genres          |
| Release Year     | Ordered, Sequential      | Temporal data                   |
| Original Audio   | Categorical, Nominal     | Language options (e.g., English)            |
| Recommendations  | Network                  | Linked titles (counted per genre)           |

*The `N_id` and `Sub Genres` attributes are not relevant to the implemented designs.*

#### Type of Dataset and Structure Identified    
- The datasets are **multivariate**.  
- The structure is **tabular**.
  

### Observations on Data Curation Techniques

### Techniques applied to prepare data for visualization, as evidenced in the code.

---

## a. Aggregation & Filtering

**File: `Donut_Chart_Recommendations.html`**
- **Grouping by Genre**: Recommendations are summed per Main Genre.
- **Threshold Filtering**: Genres with <3% of total recommendations are grouped into `"Others"`.
- **Normalization**: Percentages calculated for proportional representation.
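A minimal sketch of this aggregation in plain JavaScript, assuming rows with `Main Genre` and comma-separated `Recommendations` fields. The function name is hypothetical, and the actual implementation in `Donut_Chart_Recommendations.html` may differ.

```javascript
// Sketch of the donut-chart aggregation: sum recommendation IDs per genre,
// fold genres under a threshold share (3% here) into "Others", and compute
// percentages rounded to one decimal.
function aggregateForDonut(rows, threshold = 0.03) {
  const totals = new Map();
  for (const r of rows) {
    const genre = r['Main Genre'];
    const recs = r['Recommendations'];
    if (!genre || !recs || recs.trim() === '') continue; // skip missing data
    const n = recs.split(',').filter(id => id.trim() !== '').length;
    totals.set(genre, (totals.get(genre) || 0) + n);
  }
  const grand = [...totals.values()].reduce((a, b) => a + b, 0);
  if (grand === 0) return [];

  const kept = [];
  let others = 0;
  for (const [genre, count] of totals) {
    if (count / grand < threshold) others += count;
    else kept.push({ genre, count });
  }
  kept.sort((a, b) => b.count - a.count); // descending by count
  if (others > 0) kept.push({ genre: 'Others', count: others });
  return kept.map(d => ({ ...d, percent: Math.round((d.count / grand) * 1000) / 10 }));
}
```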

**File: `Bar_Chart_Race.html`**
- **Cumulative Counts**: Yearly genre counts are accumulated, so each year's value includes all earlier years.
- **Dynamic Filtering**: Genres can be toggled via checkboxes (e.g., `selectedGenres`).
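The cumulative aggregation and checkbox filtering can be sketched as follows, assuming rows keyed by `Release Year` and `Main Genre` and that `selectedGenres` is a `Set` of genre names (its type is an assumption). Each returned frame holds running totals up to and including that year; this is an illustration, not the code in `Bar_Chart_Race.html`.

```javascript
// Sketch: cumulative title counts per genre for each release year,
// limited to the genres in the selectedGenres Set.
function cumulativeByYear(rows, selectedGenres) {
  // Count titles per (year, genre) once.
  const perYear = new Map();
  for (const r of rows) {
    const year = parseInt(r['Release Year'], 10);
    const genre = r['Main Genre'];
    if (!selectedGenres.has(genre) || Number.isNaN(year)) continue;
    if (!perYear.has(year)) perYear.set(year, new Map());
    const counts = perYear.get(year);
    counts.set(genre, (counts.get(genre) || 0) + 1);
  }

  // Walk the years in order, adding each year's counts to a running total.
  const running = new Map([...selectedGenres].map(g => [g, 0]));
  return [...perYear.keys()]
    .sort((a, b) => a - b)
    .map(year => {
      for (const [genre, n] of perYear.get(year)) running.set(genre, running.get(genre) + n);
      return { year, counts: Object.fromEntries(running) };
    });
}
```

Unchecking a genre then amounts to removing it from the set and recomputing the frames.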

---

## b. Network Data Flattening

**File: `Donut_Chart_Recommendations.html`**
- The `Recommendations` column (network data) is flattened into counts per genre by splitting the comma-separated IDs with `row['Recommendations'].split(',')`.
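A minimal sketch of the flattening step, assuming `N_id` identifies the source title and `Recommendations` holds comma-separated IDs. It makes the network explicit as an edge list before counting edges per genre; both function names are hypothetical.

```javascript
// Sketch: turn the Recommendations network column into an explicit edge list
// (one {source, target, genre} object per recommended title).
function toEdges(rows) {
  const edges = [];
  for (const r of rows) {
    const recs = r['Recommendations'];
    if (!recs || recs.trim() === '') continue; // no outgoing links
    for (const id of recs.split(',')) {
      const target = id.trim();
      if (target !== '') edges.push({ source: r['N_id'], target, genre: r['Main Genre'] });
    }
  }
  return edges;
}

// Collapse the edge list into a recommendation count per source genre.
function edgeCountsByGenre(edges) {
  const counts = {};
  for (const e of edges) counts[e.genre] = (counts[e.genre] || 0) + 1;
  return counts;
}
```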
  
## c. Derived Attributes
**File: `Heatmap_Audio_Recommendations.html`**

- Relative Metrics: `Single Audio Recs` and `Multi Audio Recs` are compared via a heatmap.

- Color Scaling: Sequential blue scale (`d3.interpolateBlues`) encodes recommendation counts.
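The heatmap's two derived attributes could be computed roughly as below. This uses a hypothetical definition: a title counts as single-audio when `Original Audio` lists one language and as multi-audio when it lists more, and its recommendations are counted by splitting on commas. The final `t` value in [0, 1] is the kind of input a sequential scale such as `d3.scaleSequential(d3.interpolateBlues)` maps to a color; the sketch itself does not call D3.

```javascript
// Sketch (assumed definitions, not the project's code): per-genre recommendation
// totals for single-audio vs. multi-audio titles, plus a [0, 1] color parameter.
function audioRecCells(rows) {
  const count = s => (s && s.trim() !== '' ? s.split(',').filter(x => x.trim() !== '').length : 0);
  const byGenre = new Map();
  for (const r of rows) {
    const langs = count(r['Original Audio']);
    if (!r['Main Genre'] || langs === 0) continue; // skip missing genre/audio
    const entry = byGenre.get(r['Main Genre']) || { single: 0, multi: 0 };
    if (langs === 1) entry.single += count(r['Recommendations']);
    else entry.multi += count(r['Recommendations']);
    byGenre.set(r['Main Genre'], entry);
  }

  const cells = [];
  for (const [genre, { single, multi }] of byGenre) {
    cells.push({ genre, type: 'Single Audio Recs', value: single });
    cells.push({ genre, type: 'Multi Audio Recs', value: multi });
  }
  // Normalize by the maximum, like a sequential scale with domain [0, max].
  const max = Math.max(0, ...cells.map(c => c.value));
  return cells.map(c => ({ ...c, t: max === 0 ? 0 : c.value / max }));
}
```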

## d. Temporal Handling
**File: `Bar_Chart_Race.html`**

- Time-Series Animation: Data is partitioned by `Release Year` for animated transitions.

- Pseudo-3D Perspective: Uses `perspectiveAngle` and `barDepth` to give each bar a simulated depth.
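The exact geometry in `Bar_Chart_Race.html` is not described here, so the following is only a sketch of one common oblique-projection approach: the bar's top and side faces are offset by `barDepth` along `perspectiveAngle`. It assumes degrees for the angle, pixels for the depth, and SVG coordinates (y grows downward); the returned strings fit an SVG polygon `points` attribute.

```javascript
// Sketch: compute the two extra faces of an oblique (pseudo-3D) bar whose
// front face is the rectangle (x, y, width, height).
function pseudo3DFaces(x, y, width, height, perspectiveAngle, barDepth) {
  const rad = (perspectiveAngle * Math.PI) / 180;
  const dx = barDepth * Math.cos(rad);
  const dy = barDepth * Math.sin(rad); // subtracted from y: "up" in SVG
  const toPoints = pts => pts.map(([px, py]) => `${px.toFixed(1)},${py.toFixed(1)}`).join(' ');
  return {
    top: toPoints([[x, y], [x + width, y], [x + width + dx, y - dy], [x + dx, y - dy]]),
    side: toPoints([
      [x + width, y],
      [x + width + dx, y - dy],
      [x + width + dx, y + height - dy],
      [x + width, y + height],
    ]),
  };
}
```

A 2D/3D toggle could then simply show or hide these two polygons while leaving the front rectangles unchanged.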

## e. Missing Data Handling
**File: `Donut_Chart_Recommendations.html`**

- Explicit check that skips rows with missing or blank `Recommendations`: `if (row['Recommendations'] && row['Recommendations'].trim() !== '')`.
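The guard matters because splitting an empty string on `','` in JavaScript returns `['']`, an array of length 1, so a blank field would otherwise be counted as one recommendation (and a missing field would throw on `.trim()`). A small sketch; the function names are hypothetical:

```javascript
// Without a guard, an empty field still yields one "recommendation".
const naiveCount = value => value.split(',').length;

// Mirrors the documented guard: missing or whitespace-only fields count as 0.
function safeRecommendationCount(row) {
  if (row['Recommendations'] && row['Recommendations'].trim() !== '') {
    return row['Recommendations'].split(',').length;
  }
  return 0;
}
```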




## 3. Project Goals & Objectives


## Q1. Key Questions Posed for the Netflix Project  
**Client**: Netflix Content Analyst  
**Goal**: Optimize recommendation algorithms and content strategy by understanding genre-based patterns.  

### Question 1  
*Which genres receive the most recommendations, and how are they distributed?*  
**Client Motivation**: Identify high-value genres to prioritize in recommendation algorithms and content acquisition.  
**Visualization Link**: `Donut_Chart_Recommendations.html` (Donut Chart + Table)  

### Question 2  
*How do recommendations for single-audio vs. multi-audio content vary by genre?*  
**Client Motivation**: Allocate resources for dubbing/localization based on genre-specific audio preferences.  
**Visualization Link**: `Heatmap_Audio_Recommendations.html` (Heatmap)  

### Question 3  
*How has the cumulative popularity of genres evolved over time?*  
**Client Motivation**: Track long-term genre trends to guide content production and licensing.  
**Visualization Link**: `Bar_Chart_Race.html` (Animated Bar Race)  

---

## Q2. Munzner-Style WHY Analysis  

### For Question 1 (`Donut_Chart_Recommendations.html`)  
| **Action**   | **Target**         | **Level**  | **Explanation**                                                                 |  
|--------------|--------------------|------------|---------------------------------------------------------------------------------|  
| **Summarize** | Genre distribution | Aggregate  | Donut chart shows % of total recommendations per genre.                           |  
| **Compare**   | Genre performance  | Group      | Table ranks genres by absolute counts and percentages.                          |  
| **Query**     | Specific genre     | Cell       | Tooltip/hover reveals exact counts and % for focused comparison.                |  

### For Question 2 (`Heatmap_Audio_Recommendations.html`)  
| **Action**   | **Target**               | **Level**  | **Explanation**                                                                 |  
|--------------|--------------------------|------------|---------------------------------------------------------------------------------|  
| **Compare**   | Audio type (Single/Multi)| Group      | Heatmap cells compare recs for each audio type per genre.                      |  
| **Identify**  | Outlier genres           | Individual | Color intensity highlights genres with disproportionate audio preferences.     |  
| **Discover**  | Cross-genre patterns     | Grouped    | Reveals if certain genres favor multi-audio.                      |  

### For Question 3 (`Bar_Chart_Race.html`)  
| **Action**   | **Target**          | **Level**  | **Explanation**                                                                 |  
|--------------|---------------------|------------|---------------------------------------------------------------------------------|  
| **Track**     | Temporal trends     | Series     | Animation shows cumulative genre growth over years.                             |  
| **Filter**    | Genre subsets       | Group      | Checkbox filters enable focused analysis on selected genres.                    |  
| **Query**     | Year-genre pairs    | Cell       | Tooltip displays exact cumulative counts for any year-genre combination.        |  

---

## Q3. Semantics of the Action/Target pairs 

1. **Compare**  
   - Donut chart (`Donut_Chart_Recommendations.html`) contrasts genre percentages  
   - Heatmap (`Heatmap_Audio_Recommendations.html`) compares single vs. multi-audio by genre  

2. **Identify**  
   - Bar race (`Bar_Chart_Race.html`) highlights fastest-growing genres  
   - Heatmap (`Heatmap_Audio_Recommendations.html`) color saturation spots audio preference outliers  

3. **Query**  
   - All visualizations implement `mouseover` tooltips  
   - `Bar_Chart_Race.html` adds interactive genre filters  

4. **Summarize**  
   - "Others" category in `Donut_Chart_Recommendations.html` consolidates minor genres  
   - Cumulative counts in `Bar_Chart_Race.html` show macro trends  

---

### Q4. Additional insights into project objectives?

**Answer:**

- **Descriptive** (What happened?)  
  e.g., *What are the trends in Netflix genre popularity and recommendations?*
    - Bar Chart Race: Shows which genres grew most over time.
    - Donut Chart: Reveals the distribution of recommendations.
    - Heatmap: Highlights which genres have more multi-audio recommendations.

- **Diagnostic** (Why is it happening?)  
  e.g., *Why do some genres favor multi-audio?*
    - Heatmap: Shows single vs. multi-audio recommendations per genre, but this is insufficient for diagnosis: it does not explain *why* some genres favor multi-audio.

- **Prescriptive** (What to do next?)  
  e.g., *How can Netflix optimize content strategy based on this data?*
    - Heatmap: Suggests expanding multi-audio support for high-growth genres that have low multi-audio counts but growing popularity.
    - Donut Chart: Recommends diversifying recommendations to highlight niche genres in user feeds.
    - Bar Chart Race: Proposes investing in emerging genres showing steady growth.

---
This structure ensures the project provides insights that are both visually intuitive and strategically useful, enabling the client to transition from observing user behavior to making more informed business decisions.

---

## 4. Initial Data Analysis

## Overview
Before designing the final visualizations, an exploratory data analysis (EDA) was conducted to understand the Netflix dataset's structure and relationships. This guided the selection of visual idioms and aggregation techniques.

## Findings Shaping the Three Selected Visualizations

1. **Animated Bar Chart Race (`Bar_Chart_Race.html`)**  
   - Shows cumulative genre trends over time  
   - Reveals dominance and growth patterns of different genres  
   - Allows tracking of genre popularity evolution  

2. **Donut Chart of Recommendations (`Donut_Chart_Recommendations.html`)**  
   - Displays the distribution of recommendations by genre  
   - Highlights the relative share of recommendations across genres  
   - Groups minor genres into "Others" for clearer visualization  

3. **Heatmap of Audio Recommendations (`Heatmap_Audio_Recommendations.html`)**  
   - Compares single vs. multi-audio recommendations across genres  
   - Uses color intensity to show relative recommendation counts  
   - Reveals genres where multi-audio content is more prevalent  

### Key Insights from Each Visualization:
- **Bar Chart Race**: Comedy and Drama consistently lead in cumulative counts  
- **Donut Chart**: Comedy and Drama account for the largest share of recommendations  
- **Heatmap**: Documentary and Kids genres show strong multi-audio preference  


# Basic Statistical Insights

| **Attribute**           | **Observation**                                                                 |
|-------------------------|----------------------------------------------------------------------------------|
| **Main Genre**          | 20 unique genres (e.g., Comedy, Drama, Action, etc.)                             |
| **Release Year**        | Spans 1962 to 2025                                                               |
| **Single Audio Recs**   | Skewed distribution (some genres have very few single-audio recommendations)    |
| **Multi Audio Recs**    | Higher counts for genres like Comedy, Drama, Documentary                        |
| **Recommendations**     | Stored as comma-separated IDs in `Netflix_Data_new.csv`, aggregated per genre   |

---

###  Impact on Aggregation & Visualization Techniques

These statistics informed various design and aggregation decisions, including:

-  **Bar Chart Race**:  
  Cumulative counts were calculated over time to visualize genre growth trends.

-  **Donut Chart**:  
  Percentages were used to normalize recommendation counts for genre distribution clarity.

-  **Heatmap**:  
  Compared single vs. multi-audio recommendations across genres to highlight content accessibility trends.

--- 

#  Exploratory Aggregations and What They Revealed

1. **Cumulative Genre Counts Over Time (`Bar_Chart_Race.html`)**  
   - Grouped by Release Year and Main Genre  
   - Calculated cumulative counts to show growth trends.  
   - **Result:** Identified dominant genres (e.g., Comedy, Drama) and trends over time.

2. **Recommendation Distribution by Genre (`Donut_Chart_Recommendations.html`)**  
   - Aggregated Recommendations per genre  
   - Computed percentages and grouped minor genres (<3%) into "Others."  
   - **Result:** Highlighted genres with the highest recommendation influence (e.g., Comedy, Drama).

3. **Single vs. Multi-Audio Recommendations (`Heatmap_Audio_Recommendations.html`)**  
   - Compared Single Audio Recs vs. Multi Audio Recs per genre  
   - Used a heatmap to show relative performance.  
   - **Result:** Revealed genres where multi-audio content is more recommended (e.g., Kids, Documentary).


#  How This Supports Munzner’s Methodology

## WHAT — Understanding the Data Attributes

| **Attribute**           | **Type**               | **How It Was Used**                                                                 |
|-------------------------|------------------------|--------------------------------------------------------------------------------------|
| Main Genre              | Nominal                | X-axis in heatmap, Donut chart segments, bar chart race categories                    |
| Single Audio Recs       | Quantitative (Ratio)   | Heatmap color encoding                                                              |
| Multi Audio Recs        | Quantitative (Ratio)   | Heatmap color encoding                                                              |
| Recommendations         | Quantitative (Count)   | Donut chart arc angle, bar chart race bar length                                     |
| Release Year            | Ordered Temporal       | Bar chart race timeline                                                              |

---

## WHY — Clarifying Client-Centered Tasks

| **Task**     | **Applied Where**                  | **Purpose**                                                                       |
|--------------|------------------------------------|------------------------------------------------------------------------------------|
| Compare      | Heatmap, Donut Chart                 | Compare single vs. multi-audio recs, genre-wise recommendation share               |
| Discover     | Bar Chart Race                     | Identify genre trends over time                                                    |
| Query        | Interactive Filters (`Bar_Chart_Race.html`) | Allow genre selection and 2D/3D toggle                              |
| Summarize    | Donut Chart                          | Show high-level recommendation distribution                                       |

> **Example:** The heatmap revealed that Documentary and Kids genres have noticeably more multi-audio recommendations, suggesting where localization efforts could be focused.

---

## HOW — Visual Encodings Informed by Data Shape

| **Finding from EDA**                      | **Visual Design Choice**                                                          |
|------------------------------------------|------------------------------------------------------------------------------------|
| Skewed recommendation counts             | Used percentages in Donut chart                                                     |
| Temporal trends in genre popularity      | Animated bar chart race with cumulative counts                                    |
| Comparison of two audio types            | Heatmap with color gradient                                                       |
| Many genres with varying volumes         | Filtering in bar chart race                                                       |

> **Example:** The bar chart race was chosen after observing that some genres (e.g., Comedy, Drama) dominate over time, while others grow slowly.

---

##  Role of EDA in Project Methodology

This phase served as both technical preparation and a strategic design driver. EDA enabled:

- Informed visual encoding choices (e.g., heatmap for comparisons, Donut chart for distributions).
- Discovery of key trends (e.g., multi-audio preferences in certain genres).
- Effective use of interactivity (filters, tooltips, animations) to enhance insights.

By grounding visualization decisions in EDA, the final deliverables became not just charts but tools for actionable insights.

---

## 5. Visualization Design Choices

## Visualization 1: Animated Bar Chart Race with Genre Filtering

**Title:** Cumulative Netflix Genre Count Over Years  
**Type:** Interactive Bar Chart Race (D3.js)

**Visualization Link:** [Bar_Chart_Race](./Bar_Chart_Race.html)

The professor emphasized adding something new to make this design unique; as discussed, a `genre filter` and a `2D/3D toggle view` were implemented.

---

### a. HOW Analysis

This visualization was designed to show the evolution of genre popularity over time using an animated bar chart race. It implements a temporal idiom to display changing rankings and cumulative counts of Netflix genres across years.

**MARKS:**  
- **Rectangles:** Represent genres, with length encoding cumulative count  
- **Text labels:** Show genre names and exact values  
- **Emoji icons:** Enhance genre recognition  

**CHANNELS:**

| **Channel** | **Encoding**                    | **Why It Was Used**                          |
|-------------|----------------------------------|-----------------------------------------------|
| Position    | Genre (Y-axis), Count (X-axis)   | Accurate for quantitative comparison          |
| Length      | Bar length = cumulative count    | Intuitive magnitude representation            |
| Color hue   | Distinct color per genre         | Quick visual differentiation                  |
| Time        | Animation through years          | Shows temporal progression naturally          |
| Text        | Genre labels and values          | Provides precise reference points             |

This chart allows viewers to:

- Track genre popularity trends over time
- Compare growth rates between genres
- Focus on specific genres using filters

This visualization aligns with Munzner's principle of "Expressiveness and Effectiveness" by using position/length for quantitative data, color for categories, and animation for temporal trends while prioritizing accurate perception.

---

### b. Addressed Actions/Targets

*Referencing Question 3 in the WHY analysis: How has the cumulative popularity of genres evolved over time?*


| WHY Action | Target           | Used? | How It's Represented |
|------------|------------------|-------|----------------------|
| Track      | Temporal trends  | Yes   | Animation shows cumulative genre growth year-by-year (bars race over time) |
| Filter     | Genre subsets    | Yes   | Checkbox filters allow isolating specific genres (e.g., Comedy + Drama vs. niche genres) |
| Query      | Year-genre pairs | Yes   | Tooltips display exact cumulative counts for any genre at any year |
| Compare    | Genre growth rates | Yes  | Bar lengths and positions enable direct comparison of growth trajectories |
| Discover   | Emerging trends  | Yes   | Animation reveals genres gaining/losing dominance (e.g., Anime's rise over time) |

---

### c. HOW Methods Applied

| **Design Feature**         | **Method**                                                   |
|----------------------------|--------------------------------------------------------------|
| Bar chart race layout      | Cumulative aggregation of genre counts by year              |
| Color encoding             | Ordinal color scale for genre differentiation               |
| Animation system           | Year-by-year transitions with smooth interpolation          |
| 3D effect                  | Pseudo-3D rendering with perspective polygons                |
| Genre filtering            | Interactive checkbox system for dynamic data filtering      |
| Playback controls          | Custom player with play/pause and scrubbing functionality   |
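One common way to get smooth year-by-year transitions, used in well-known D3 bar chart race examples, is to interpolate extra keyframes between consecutive years and re-rank the genres in each one. The sketch below assumes frames shaped like `{ year, counts }` and is not necessarily how `Bar_Chart_Race.html` works; `k` (keyframes per year) is a hypothetical parameter.

```javascript
// Sketch: insert k keyframes per year step by linear interpolation, and rank
// genres (descending value) in every keyframe so bar positions can animate.
function interpolateFrames(frames, k) {
  const rank = counts =>
    Object.entries(counts)
      .sort((a, b) => b[1] - a[1])
      .map(([genre, value], i) => ({ genre, value, rank: i }));

  const out = [];
  for (let i = 0; i < frames.length - 1; i++) {
    const a = frames[i];
    const b = frames[i + 1];
    for (let j = 0; j < k; j++) {
      const t = j / k; // 0 at year a, approaching 1 at year b
      const counts = {};
      for (const g of Object.keys(b.counts)) {
        counts[g] = (a.counts[g] ?? 0) * (1 - t) + b.counts[g] * t;
      }
      out.push({ year: a.year * (1 - t) + b.year * t, ranking: rank(counts) });
    }
  }
  if (frames.length > 0) {
    const last = frames[frames.length - 1];
    out.push({ year: last.year, ranking: rank(last.counts) });
  }
  return out;
}
```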

---

### d. Idioms and Channel Justification

| **Element**         | **Justification (WHY + WHAT)**                                      |
|---------------------|----------------------------------------------------------------------|
| Bar chart race       | Ideal for showing ranking changes over time                        |
| Length encoding      | Accurate perception of quantitative values     |
| Color                | Helps distinguish between many competing genres                    |
| Animation            | Natural way to represent temporal progression                      |
| 3D toggle            | Provides alternative viewing perspective without losing core meaning |

---

### e. Design Inspiration and Personal Contribution

This visualization is inspired by existing bar chart race designs. Its key design points are:

- Temporal Encoding
    - Uses animation to show genre popularity evolution year-by-year
    - Cumulative counts highlight long-term trends (not just yearly fluctuations)

- Comparative Layout
    - Bars ordered by current rank (Y-axis)
    - Length (X-axis) encodes quantitative values for easy comparison

- Interactive Filtering
    - Checkbox system to isolate specific genres
    - Preserves context while focusing analysis

- Dual-View Display
    - Toggle between 2D and pseudo-3D perspectives
    - Maintains core readability in both views

- Annotated Tooltips
    - On-hover details show exact counts + year
    - Provides precision without visual clutter

- Playback Control
    - Play/pause/scrub timeline for user-directed exploration
    - Allows both overview and frame-by-frame analysis

- Visual Hierarchy
    - Rainbow color scale distinguishes genres
    - Emoji + text labels enhance genre recognition
    - Year display prominently anchored

- Data Aggregation
    - Cumulative counts avoid misleading spikes/dips
    - "Others" category manages long-tail genres 

---

### f. Originality and Extension

This visualization builds on bar chart race concepts, but significant enhancements have been added. All the code, logic, structure, and data preprocessing were developed by me. The enhancements that differentiate this design are:

- **Genre filtering system:** Fully custom interactive filter panel  
- **3D/2D toggle:** Unique perspective switching capability  
- **Emoji integration:** Creative use of symbols for better genre recognition  
- **Custom animation controls:** Beyond basic D3 transitions  
- **Responsive design:** Adapts to different screen sizes  


---

### g. Comments

**(1) Goal Alignment:**  
- Answers the question: *How has genre popularity evolved over time on Netflix?*  
- Supports both high-level trend analysis and detailed genre-specific investigation  
- Enables Compare, Discover, and Query actions effectively  

**(2) Pros and Cons:**

**Pros** 
-  Engaging, intuitive animation
-  Multiple viewing options (2D/3D)
-  Comprehensive filtering
-  Combines visual and numeric data

**Cons**    
- Performance impact with many genres
- 3D view may slightly distort perception
- The genre filter cannot be applied when all genres are deselected

**(3) Suggested Improvements:**

- Add a speed control for animation playback  
- Include a small multiples view for side-by-side genre comparison  
- Add trend lines or annotations for significant events  
- Implement genre highlighting across all views (for example, highlighting the top genre over time)


The design successfully achieved its goal of showing genre popularity trends over time. The animation effectively communicates how different genres grow in cumulative counts, while the filtering functionality allows users to focus on specific genres of interest. The 2D/3D toggle provides alternative viewing perspectives, enhancing the exploratory nature of the visualization.

---


## Visualization 2: Animated Donut Chart with Data Table

**Title:** Distribution of Netflix Recommendations by Genre  
**Type:** Interactive Donut Chart with Supplementary Table (D3.js)

**Visualization Link:** [Netflix Recommendations by Genre](./Donut_Chart_Recommendations.html)

---

### a. HOW Analysis

This visualization presents the proportional distribution of Netflix recommendations across genres using a Donut chart idiom enhanced with animations and a supporting data table. The donut chart variant was implemented instead of a standard pie chart to improve readability and make the visual more creative. 

**MARKS:**  
- **Arcs:** Represent genres, with angle encoding recommendation count  
- **Text labels:** Show genre names and percentages  
- **Table rows:** Provide exact numeric values for each genre  

**CHANNELS:**

| **Channel** | **Encoding**                     | **Why It Was Used**                            |
|-------------|----------------------------------|-------------------------------------------------|
| Angle       | Recommendation count proportion  | Traditional for part-to-whole relationships     |
| Color hue   | Distinct color per genre         | Quick visual differentiation                    |
| Area        | Arc size corresponds to value    | Supports proportional comparison                |
| Position    | Table rows for exact values      | Provides precise reference points               |
| Text        | Genre labels and percentages     | Enhances readability and precision              |
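The angle and area channels above are tied together: each arc's sweep is its share of the total times 2π, and for fixed inner and outer radii an annular sector's area is proportional to that sweep. A sketch in plain JavaScript, similar in spirit to `d3.pie()` but not the project's code; the radii are arbitrary inputs and the input order is kept as given.

```javascript
// Sketch: convert genre counts into donut-arc angles (radians) and areas.
function donutArcs(data, outerRadius, innerRadius) {
  const total = data.reduce((s, d) => s + d.count, 0);
  let angle = 0;
  return data.map(d => {
    const sweep = total === 0 ? 0 : (d.count / total) * 2 * Math.PI;
    const arc = {
      genre: d.genre,
      startAngle: angle,
      endAngle: angle + sweep,
      // Area of an annular sector: (sweep / 2) * (R^2 - r^2), proportional to sweep.
      area: (sweep / 2) * (outerRadius ** 2 - innerRadius ** 2),
    };
    angle += sweep;
    return arc;
  });
}
```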


This chart allows viewers to:

- Understand the proportional distribution of recommendations across genres
- Compare dominant vs. niche genres at a glance
- Identify exact percentages through interactive tooltips and data tables
- Focus on key genres by collapsing long-tail categories into "Others"

This Donut chart visualization aligns with Munzner's principle of "Expressiveness and Effectiveness" by using angle encoding for proportions, categorical colors for genres, and text labels for precise values while maintaining perceptual accuracy.


---

### b. Addressed Actions/Targets

*Referencing Question 1: Which genres receive the most recommendations, and how are they distributed?*


| WHY Action | Target           | Used? | How It's Represented |
|------------|------------------|-------|----------------------|
| Summarize      | Genre distribution  | Yes   | Donut chart slices show percentage share of total recommendations per genre |
| Compare     | Genre performance    | Yes   | Relative sizes of genre segments   |
| Identify      | Dominant/niche genres | Yes   | Large slices (Comedy/Drama) vs. "Others" category highlight high/low performers. |
| Query    | Specific genre stats | Yes  | Tooltips and table cells provide exact counts/percentages for focused analysis. |
| Discover   | Recommendation bias  | Yes   | Disproportionate slice sizes may indicate algorithmic preferences (e.g., Comedy dominance), although they can also reflect catalog composition. |

---

### c. HOW Methods Applied

| **Design Feature**       | **Method**                                                   |
|--------------------------|--------------------------------------------------------------|
| Donut chart layout         | Angle encoding of proportional distribution                 |
| Color encoding           | Spectral color scale for distinct genre differentiation     |
| Animation system         | Smooth arc growth transitions                               |
| Data aggregation         | Grouping of minor genres (<3%) into "Others" category       |
| Table integration        | Synchronized data presentation in multiple formats          |
| Interactive tooltips     | Detailed hover information without cluttering visualization |

---

### d. Idioms and Channel Justification

| **Element**    | **Justification (WHY + WHAT)**                                        |
|----------------|------------------------------------------------------------------------|
| Donut chart       | Conventional choice for proportional part-to-whole relationships     |
| Angle encoding  | Direct mapping of values to visual proportions                       |
| Color           | Helps distinguish between many competing genres                      |
| Table           | Provides exact numeric values that are hard to judge visually        |
| Animation       | Guides viewer through data construction and builds narrative         |

---

### e. Additional Design Inspiration and Personal Contribution

This visualization combines elements from ObservableHQ's animated donut chart examples and business dashboard best practices. Key design points:

- Proportional Encoding
    - Uses angle/area to show recommendation distribution percentages
    - Animated construction builds understanding of composition

- Dual-Representation
    - Donut chart for quick visual patterns
    - Data table for precise numeric comparison

- Smart Aggregation
    - Groups sub-3% genres into "Others" category
    - Preserves minor genres' collective impact without clutter

- Interactive Elements
    - Tooltips reveal exact counts/percentages on hover
    - Slice highlighting (opacity/stroke change) for focus

- Annotated Tooltips
    - On-hover details show exact counts + year
    - Provides precision without visual clutter

- Visual Hierarchy
    - Spectral color scale maximizes genre differentiation
    - Label prioritization: Only shows % for large enough slices

- Data Optimization
    - Pre-sorted by recommendation count (descending)
    - Percentage rounding to one decimal for readability

- Animation Technique
    - Staggered growth (each slice animates sequentially)
    - Smooth transitions via d3.interpolate

- Comparative Features
    - Table provides ranked view (genre performance)
    - Donut slices allow relative size comparison
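The angle encoding, label prioritization, and staggered growth listed above can be illustrated with a small Python sketch. D3's pie layout computes comparable start and end angles, and `d3.interpolate` between two numbers is a linear interpolation; this code only mirrors those ideas and is not the project's implementation. The function names, the 100 ms stagger, the 500 ms duration, and the 0.25 rad label threshold are all hypothetical. D3 transitions apply easing by default, whereas this sketch uses linear progress.

```python
import math
from typing import List, Tuple


def pie_angles(values: List[float]) -> List[Tuple[float, float]]:
    """Cumulative (start, end) angles in radians, one pair per slice,
    in input order (the data is assumed to be pre-sorted)."""
    total = sum(values)
    angles: List[Tuple[float, float]] = []
    start = 0.0
    for value in values:
        end = start + 2 * math.pi * value / total
        angles.append((start, end))
        start = end
    return angles


def end_angle_at(t: float, index: int, start: float, end: float,
                 stagger: float = 100.0, duration: float = 500.0) -> float:
    """End angle of slice `index` at time `t` (ms) during a staggered grow-in.

    Slice i waits i * stagger ms, then its end angle moves linearly from
    `start` to `end` over `duration` ms.
    """
    progress = (t - index * stagger) / duration
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * progress  # linear numeric interpolation


def show_percent_label(start: float, end: float, min_angle: float = 0.25) -> bool:
    """Only label slices wide enough to hold text (threshold is hypothetical)."""
    return end - start >= min_angle


angles = pie_angles([1, 1, 2])
# Slice 1 spans pi/2 to pi; at t = 350 ms it is halfway through its animation
mid = end_angle_at(350, 1, *angles[1])
```

Staggering by index is what makes the slices appear one after another instead of all at once.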

---

### f. Originality and Extension

This visualization builds on standard donut chart concepts, with significant enhancements added. All the code, logic, structure, and data preprocessing were developed by me. The enhancements that differentiate this design are:

- **Animated transitions:** Smooth growth animations for each segment  
- **"Others" category:** Genres below 3% of all recommendations are aggregated into one segment  
- **Dual presentation:** Combined chart and table for different reading styles  
- **Interactive elements:** Hover effects and detailed tooltips  
- **Responsive design:** Adapts to different screen sizes  


---

### g. Comments

**(1) Goal Alignment:**  
- Answers the question: *"How are Netflix recommendations distributed across genres?"*  
- Supports both quick proportional understanding and detailed value lookup  
- Enables Summarize, Compare, and Lookup actions effectively  

**(2) Pros and Cons:**

**Pros** 

- Clear proportional representation
- Dual presentation (chart + table)
- Engaging animations 
- Comprehensive data aggregation

**Cons**    
- Angle perception less precise than length
- The many colors may be hard to distinguish for colorblind users
- Limited to a single, static time period 


**(3) Suggested Improvements:**

- Add a time period selector to show trends  
- Implement click-to-explode segments  
- Include a color legend for reference  
- Add export functionality for the table data  


The visualization effectively communicates the proportional distribution of recommendations across genres. The combination of Donut chart and data table provides both quick overview and precise values. The "Others" category successfully handles long-tail genres without cluttering the visualization.

---



## Visualization 3: Genre Audio Recommendations Heatmap

**Title:** Audio Recommendations Heatmap: Single vs Multi-Audio by Genre  
**Type:** Interactive Heatmap with Legend (D3.js)

**Visualization Link:** [Genre Audio Recommendations](./Heatmap_Audio_Recommendations.html)

*The data preprocessing for this design is described in the "File & Data Pipeline Summary" section.*

---

### a. HOW Analysis

This visualization compares recommendation counts between single and multi-audio content across genres using a heatmap idiom with color intensity encoding.

**MARKS:**  
- **Rectangles:** Represent genre-audio type combinations  
- **Axis labels:** Show genres and audio types  
- **Legend:** Explains color scale  

**CHANNELS:**

| **Channel**       | **Encoding**                               | **Why It Was Used**                            |
|-------------------|---------------------------------------------|-------------------------------------------------|
| Position          | Genre (Y-axis), Audio Type (X-axis)        | Clear categorical organization                  |
| Color intensity   | Recommendation count                       | Effective for magnitude comparison              |
| Text              | Axis labels and values                     | Provides exact reference points                 |
| Size              | Consistent cell dimensions                 | Fair comparison across categories               |
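The color-intensity channel in the table above amounts to a sequential mapping from a count to a shade of blue. In D3 this kind of mapping is often built with `d3.scaleSequential` and `d3.interpolateBlues`, but the exact scale used in the project is not stated. The Python sketch below is an illustrative equivalent using linear RGB interpolation; the endpoint colors and the name `blue_scale` are assumptions.

```python
from typing import Tuple

# Hypothetical endpoints for a light-to-dark blue ramp (not the project's exact colors)
LIGHT_BLUE: Tuple[int, int, int] = (239, 243, 255)
DARK_BLUE: Tuple[int, int, int] = (8, 48, 107)


def blue_scale(value: float, vmin: float, vmax: float) -> str:
    """Map a count to a hex color by linear RGB interpolation.

    Higher counts give darker blues, matching "darker = more".
    """
    t = 1.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(LIGHT_BLUE, DARK_BLUE))
    return f"#{r:02x}{g:02x}{b:02x}"
```

Interpolating in RGB is the simplest choice; perceptual color spaces (or a ready-made scheme) usually give more even steps.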



This chart allows viewers to:

- Compare recommendation patterns between Single vs. Multi-Audio formats across genres
- Identify genre-specific biases (e.g., which genres favor one format over another)
- Spot outliers (unusually high/low recommendations for specific genre-format pairs)

This visualization follows Munzner's expressiveness and effectiveness principles: it encodes values with color intensity, compares formats via position, ensures clarity through sorting and contrast, and prioritizes accessibility.

---

### b. Addressed Actions/Targets

*Referencing Question 2: How do single-audio vs. multi-audio recommendations vary by genre?*


| WHY Action | Target           | Used? | How It's Represented |
|------------|------------------|-------|----------------------|
| Compare      | Audio type performance  | Yes   | Side-by-side heatmap cells show Single vs. Multi-Audio counts per genre. |
| Identify     | Localization opportunities    | Yes   | Darker blue cells highlight genres with strong multi-audio preference (e.g., Documentary). |
| Discover      | Cross-genre patterns | Yes   | Vertical comparison down each column reveals genres with strong multi-audio presence. |
| Query    | Exact recommendation counts | Yes  | Tooltips display precise numbers when hovering any cell. |
| Summarize   | Genre-level preferences  | Yes   | Row-wise color patterns show overall audio strategy per genre. |

---


### c. HOW Methods Applied

| **Design Feature**        | **Method**                                                   |
|---------------------------|--------------------------------------------------------------|
| Heatmap layout            | Matrix organization of genres × audio types                 |
| Color encoding            | Sequential blue gradient for quantitative values            |
| Interactive tooltips      | Detailed counts on hover                                    |
| Legend implementation     | Custom gradient bar with scale                              |
| Axis design               | Clear labeling for both dimensions                          |

---

### d. Idioms and Channel Justification

| **Element**       | **Justification (WHY + WHAT)**                                      |
|-------------------|---------------------------------------------------------------------|
| Heatmap           | Ideal for comparing two categorical dimensions with a quantitative measure |
| Color gradient    | Intuitive representation of value magnitude (darker = more)         |
| Fixed cell size   | Ensures fair visual comparison across categories                    |
| Tooltips          | Provides exact values without cluttering the visualization          |

---

### e. Additional Design Inspiration and Personal Contribution
The key design elements of the Audio Recommendations Heatmap by Genre focus on clarity, visual appeal, and effective data communication:

- Color as Primary Encoder
    - Use a single-hue sequential palette (e.g., light blue → dark blue) to represent recommendation counts.
    - Ensure high contrast between low and high values (avoid subtle gradients).
    - Add a white stroke around cells to separate them visually.

- Clear Axes & Labels
    - Y-axis (Genres): Sort genres by total recommendations (descending) for intuitive scanning.
    - X-axis (Audio Types): Label clearly ("Single" vs. "Multi") with centered headings.
    - Font hierarchy: Bold axis titles, lighter tick labels.

- Minimalist Grid
    - Separates cells with thin white strokes instead of heavy gridlines
    - Avoids decorative elements so the color encoding stays the focus

- Direct Labeling (When Helpful)
    - Add text labels inside cells only for extreme highs/lows (e.g., top 3 genres).
    - Ensure labels are high-contrast (white on dark cells, black on light cells).

- Interactive Highlights
    - Darken hovered cell + fade others slightly.
    - Show a tooltip with exact values and % of total.

- Legend as a Key
    - Place legend near the heatmap (right or top).
    - Label it clearly (e.g., "Recommendations Count").
    - Use gradient blocks (not just a line) for intuitive scaling.

- White Space & Balance
    - Pad margins to avoid crowding (especially left axis for long genre names).
    - Keep the title/subtitle concise but descriptive (e.g., "Audio Recommendations by Genre").
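Three of the elements above (sorting genres by total, a tooltip with exact values and % of total, and high-contrast direct labels) can be sketched as small Python helpers. All names are hypothetical, "% of total" is assumed to mean the share of all heatmap cells combined, and the brightness formula applies luminance weights to raw sRGB values without gamma correction, so it is only an approximation. The toy counts are invented.

```python
from typing import Dict, List, Tuple


def sort_genres_by_total(rows: Dict[str, Tuple[int, int]]) -> List[str]:
    """Order genres by single + multi recommendations, descending (ties by name)."""
    return sorted(rows, key=lambda genre: (-sum(rows[genre]), genre))


def tooltip_text(genre: str, audio_type: str, count: int, grand_total: int) -> str:
    """Tooltip with the exact value and its share of all heatmap cells."""
    share = 100 * count / grand_total if grand_total else 0.0
    return f"{genre} ({audio_type}): {count} recommendations, {share:.1f}% of total"


def label_color(hex_fill: str) -> str:
    """White text on dark cells, black text on light cells.

    Uses a weighted brightness of the sRGB channels (no gamma
    linearization), an approximation of perceived luminance.
    """
    r, g, b = (int(hex_fill[i:i + 2], 16) for i in (1, 3, 5))
    brightness = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255
    return "white" if brightness < 0.5 else "black"


# Toy (single, multi) counts, not taken from the dataset
order = sort_genres_by_total({"Drama": (10, 40), "Comedy": (30, 5), "Anime": (2, 3)})
```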

---

### f. Originality and Extension

This visualization builds on standard heatmap concepts, with significant enhancements added. All the code, logic, structure, and data preprocessing were developed by me. The enhancements that differentiate this design are:


- **Custom legend implementation:** Precise color-value mapping  
- **Clean axis labeling:** Clear genre and audio type identification  
- **Responsive tooltips:** Contextual information on demand  
- **Visual hierarchy:** Emphasizes data over decorative elements  
- **Color choice:** Blue gradient optimized for value perception  


---

### g. Comments

**(1) Goal Alignment:**  
- Answers the question: *"How do single and multi-audio recommendations compare across genres?"*  
- Supports quick pattern recognition and detailed value lookup  
- Enables Compare, Discover, and Lookup actions effectively  

**(2) Pros and Cons:**

**Pros**    
- Clear visual patterns
- Exact values available on demand via tooltips
- Uncluttered design
- Immediate insight generation 

**Cons**    
- Color perception varies among viewers
- Color intensity is less precise than position or length for exact value comparison

---

**(3) Suggested Improvements:**

- Add sorting functionality for genres  
- Include absolute/relative view toggle  
- Implement row highlighting on hover   
- Implement zoom for genres with small values  
- Consider small bar charts within cells for precise comparison 


The heatmap successfully compares single vs. multi-audio recommendations across genres. The color gradient effectively shows patterns, while tooltips provide exact values. The clean layout and legend make the visualization immediately understandable.

---



# File & Data Pipeline Summary (for the marker's ease of navigation)
## Overview of the design files' whole process and features

### Design 1 **`Bar_Chart_Race`**
- Loaded `Netflix_Data_new.csv` with D3

- Processed data by:
    - Aggregating counts by `Main Genre` and `Release Year`
    - Calculating cumulative totals over time
    - Implementing genre filters through checkboxes

- Built features:
    - Animated bar race with playback controls
    - 2D/3D view toggle
    - Interactive tooltips with yearly data
    - Genre filtering system
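The Design 1 processing steps above (counting titles by `Main Genre` and `Release Year`, accumulating totals over time, and filtering genres) can be sketched in Python. The project does this in D3, so this is only an illustrative equivalent; the function name `race_frames` and the `selected` parameter standing in for the checkbox filter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def race_frames(rows: Iterable[dict], years: List[int],
                selected: Optional[Set[str]] = None) -> Dict[int, List[Tuple[str, int]]]:
    """Cumulative title counts per genre for each year of a bar chart race.

    `rows` are CSV records with "Main Genre" and "Release Year" fields;
    `selected` mimics the checkbox filter (None keeps every genre).
    """
    per_year: Counter = Counter()
    for row in rows:
        genre = row["Main Genre"]
        if selected is None or genre in selected:
            per_year[(int(row["Release Year"]), genre)] += 1

    genres = sorted({genre for _, genre in per_year})
    running = {genre: 0 for genre in genres}
    frames: Dict[int, List[Tuple[str, int]]] = {}
    for year in years:
        for genre in genres:
            running[genre] += per_year[(year, genre)]  # Counter returns 0 if missing
        # Rank bars for this frame: highest cumulative count first, ties by name
        frames[year] = sorted(running.items(), key=lambda kv: (-kv[1], kv[0]))
    return frames
```

Each frame is a ranked snapshot, so an animation only has to step through the years and redraw the bars in the listed order.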

### Design 2 **`Donut_Chart_Recommendations`**
- Loaded `Netflix_Data_new.csv` with D3

- Processed data by:
    - Parsing the `Recommendations` column to count recommendations per genre
    - Calculating percentages of total
    - Grouping minor genres (<3%) into "Others"

- Created features:
    - Animated Donut chart with grow-in transition
    - Interactive tooltips with detailed metrics
    - Supporting data table with raw values
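The parsing step listed above depends on how the `Recommendations` column is stored, which the dataset page does not document. The sketch below assumes, hypothetically, that each cell holds a comma-separated list of recommended titles and sums them per `Main Genre`. The resulting per-genre counts are what the percentage and "Others" steps in the list above operate on.

```python
from collections import Counter
from typing import Iterable


def recommendation_counts(rows: Iterable[dict], sep: str = ",") -> Counter:
    """Total number of recommended titles per "Main Genre".

    Assumes each row's "Recommendations" field is a `sep`-separated list of
    titles; empty or missing fields contribute nothing.
    """
    counts: Counter = Counter()
    for row in rows:
        raw = (row.get("Recommendations") or "").strip()
        titles = [part.strip() for part in raw.split(sep) if part.strip()]
        counts[row["Main Genre"]] += len(titles)
    return counts
```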

### Design 3 **`Heatmap_Audio_Recommendations.html`**
- `data_preparation3.ipynb`:
    - Loaded raw `Netflix_Data_new.csv` using Pandas
    - Cleaned data by removing rows with missing audio/genre values

- Processed audio tracks:
    - Created an `Audio Count` column by splitting the `Original Audio` strings
    - Split data into single-audio (`Audio Count == 1`) and multi-audio (`Audio Count > 1`) groups

- Aggregated counts by genre:
    - Calculated single-audio recommendations per genre
    - Calculated multi-audio recommendations per genre
    - Merged and exported clean data as `genre_audio_recommendations.csv`



- `genre_audio_recommendations.csv`:
    - Structure:
        - `Main Genre`: Category labels (text).
        - `Single Audio Recs`: Integer counts.
        - `Multi Audio Recs`: Integer counts.
        - Contains no missing values (gaps were filled with zeros during the merge).

- `Heatmap_Audio_Recommendations.html`:
    - Loaded the preprocessed CSV (`genre_audio_recommendations.csv`) with D3.js

- Data flow:
    - Parse numeric columns (e.g., `+d["Single Audio Recs"]`)
    - Map genres → Y-axis, audio types → X-axis
    - Encode counts via sequential blue color scale

- Created features:
    - Hover tooltips showing raw counts
    - White gridlines for cell separation
    - Legend for value interpretation
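The `data_preparation3.ipynb` steps listed above (dropping incomplete rows, counting audio tracks, splitting into single and multi-audio, merging with zero fill, and exporting) can be sketched without pandas. This is an illustrative equivalent, not the notebook's code. Two assumptions are labeled in the comments: `Original Audio` is taken to be comma-separated, and each title is counted once, whereas the notebook may weight rows differently. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List

FIELDS = ["Main Genre", "Single Audio Recs", "Multi Audio Recs"]


def genre_audio_table(rows: Iterable[dict], sep: str = ",") -> List[Dict[str, object]]:
    """Per-genre single- vs multi-audio counts, shaped like
    genre_audio_recommendations.csv.

    Assumptions: "Original Audio" lists languages separated by `sep`, and
    each title counts once.
    """
    single: Counter = Counter()
    multi: Counter = Counter()
    for row in rows:
        genre = (row.get("Main Genre") or "").strip()
        audio = (row.get("Original Audio") or "").strip()
        if not genre or not audio:
            continue  # drop rows with missing audio or genre
        audio_count = len([lang for lang in audio.split(sep) if lang.strip()])
        if audio_count == 1:
            single[genre] += 1
        elif audio_count > 1:
            multi[genre] += 1

    # Outer merge on genre; a genre missing from one side gets 0
    genres = sorted(set(single) | set(multi))
    return [
        {"Main Genre": g, "Single Audio Recs": single[g], "Multi Audio Recs": multi[g]}
        for g in genres
    ]


def to_csv_text(table: List[Dict[str, object]]) -> str:
    """Serialize the merged table as CSV text (header plus one row per genre)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()
```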

Each visualization follows a clean load → process → render pipeline, using D3.js for the visualization layer. The CSV files serve as the data sources; only the heatmap needed a separate preprocessing step (`data_preparation3.ipynb`) before visualization.
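On the render side of the heatmap, the wide CSV (one row per genre) has to become one record per cell, with genre on the Y-axis and audio type on the X-axis, as in the data-flow list above. The Python sketch below mirrors that reshaping; `int()` plays the role of JavaScript's unary `+d[column]` for these integer counts. The function and key names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def to_heatmap_cells(csv_text: str) -> List[Dict[str, object]]:
    """Turn the wide per-genre CSV into one record per heatmap cell."""
    cells: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for audio_type, column in (("Single", "Single Audio Recs"), ("Multi", "Multi Audio Recs")):
            cells.append({
                "genre": row["Main Genre"],  # Y-axis category
                "audio": audio_type,         # X-axis category
                "count": int(row[column]),   # numeric value for the color scale
            })
    return cells
```

A long format like this lets every cell be drawn by one data join, with the two categorical fields feeding the axes and `count` feeding the color scale.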

---

## Extra Mention

Although my initial proposal focused on genre popularity trends and recommendation networks, my final project shifted toward analyzing genre distributions, recommendation patterns, and audio-language preferences in Netflix’s content catalog. This change was intentional and motivated by:

**Dataset Focus Shift**

- Proposal: Planned to analyze genre popularity, recommendation networks, and maturity ratings.

- Final Project: Narrowed focus to `genre trends`, `recommendation distributions`, and `audio-language analysis` (dropped maturity ratings).

- Why: The original maturity rating question lacked actionable insights for the client, while audio-language data directly supported Netflix’s global content strategy.

**Visualization Redesign**

- Proposal: Included vague ideas like a Netflix logo visual (deemed low-value in feedback).

- Final Project: Implemented three interactive, data-driven visualizations:

    - Bar Chart Race: Showed genre growth over time (adds temporal tracking).
    
    - Donut Chart: Quantified recommendation share by genre (simplifies comparison).
    
    - Heatmap: Compared single vs. multi-audio recommendations (reveals localization trends).

- Why: The professor's feedback noted the need for a clearer HOW methodology (marks/channels) and client-aligned questions. The final designs explicitly tie to Munzner’s framework.

**Methodology Rigor**

- Proposal: Lacked detailed WHAT-WHY-HOW analysis.

- Final Project: Full Munzner breakdowns for each visualization, including:

    - Action/Target pairs (e.g., Compare genre performance).
    
    - Channel justifications (e.g., color for categories, length for values).

- Why: Feedback emphasized the need to link design choices to data attributes (WHAT) and client tasks (WHY).

**Client-Centric Refinement**

- Proposal: Generic questions (e.g., "popular genres over time").

- Final Project: Focused questions like "How do single-audio vs. multi-audio recommendations vary by genre?"

- Why: Feedback highlighted the need for data-specific questions that drive actionable insights (e.g., optimizing dubbing resources).


The revisions resulted in a more cohesive, client-focused project with clear methodological grounding.

---

## 6. Conclusions

### Success:
Achieved all client objectives:  
1. Revealed Drama's dominance over time (V1)  
2. Showed Drama as most recommended (V2)  
3. Identified Comedy as most multilingual (V3)  

### Common Successes:
- All visualizations effectively address their core questions  
- Each includes appropriate interactivity for exploration  
- Consistent design language across visualizations

---

### Methodology:
- Munzner's WHAT-WHY-HOW framework was critical for design choices  
- Most useful:
    - Action/Target pairs clarified client needs
    - Data abstraction decisions
    - Idiom selection
- Revision: More iterative testing of the interactive features would have helped  

### Most Useful Elements:

- Expressiveness/Effectiveness: The designs align well here (e.g., heatmap uses sequential colors for quantitative data, bar race uses length for values).

- Prioritization of Channels: Position > Color > Text in all charts follows Munzner’s ranking.

- Interaction Costs: Tooltips provide details-on-demand, reducing clutter.

### Areas for Refinement:

- Abstraction/Agreement: The heatmap’s blue gradient assumes users intuitively associate darker cells with higher values; a diverging scale might better highlight the differences between Single and Multi audio.

- Tabular Data: The heatmap has no supporting table; a table of genres sorted by recommendation bias could supplement the visual.

### Revised Approach:

- Applying "overview first, zoom/filter, details-on-demand" more rigorously (e.g., heatmap could add a "Top 10" toggle).

- Using diverging scales for comparative views (e.g., red/blue for Single/Multi bias in heatmap).
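The proposed diverging view and sorted bias table above are not implemented in the project; the Python sketch below shows one way they could work. The bias formula `(multi - single) / (multi + single)`, which ranges from -1 (all single-audio) to +1 (all multi-audio), the ColorBrewer-style red/blue endpoints, and all function names are assumptions. The toy counts are invented.

```python
from typing import Dict, List, Tuple

# ColorBrewer-style RdBu endpoints (assumed; any diverging palette would do)
SINGLE_RED = (202, 0, 32)
NEUTRAL = (247, 247, 247)
MULTI_BLUE = (5, 113, 176)


def audio_bias(single: int, multi: int) -> float:
    """(multi - single) / (multi + single): -1 = all single-audio, +1 = all multi-audio."""
    total = single + multi
    return 0.0 if total == 0 else (multi - single) / total


def diverging_color(bias: float) -> str:
    """Red for single-audio bias, near-white for balance, blue for multi-audio bias."""
    b = min(max(bias, -1.0), 1.0)
    target, t = (MULTI_BLUE, b) if b >= 0 else (SINGLE_RED, -b)
    r, g, bl = (round(n + (c - n) * t) for n, c in zip(NEUTRAL, target))
    return f"#{r:02x}{g:02x}{bl:02x}"


def bias_table(rows: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Genres sorted from most multi-audio-leaning to most single-audio-leaning."""
    table = [(genre, round(audio_bias(s, m), 2)) for genre, (s, m) in rows.items()]
    return sorted(table, key=lambda item: (-item[1], item[0]))


# Toy (single, multi) counts, not taken from the dataset
ranked = bias_table({"Drama": (10, 30), "Comedy": (30, 10), "Anime": (0, 0)})
```

Normalizing by the genre total keeps small and large genres comparable, which the raw-count sequential scale cannot do.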

---

### Shared Improvement Areas:
- Enhanced accessibility considerations  
- More consistent interaction patterns  
- Additional view customization options

---


# Attributions

| Source | What it is | How it was used |
|--|--|--|
| https://www.kaggle.com/datasets/willianoliveiragibin/netflix-interactive | Public dataset listing | Main dataset |
| https://www.cs.ubc.ca/~tmm/vadbook/ | Textbook on visualization design principles | Guided the "what," "why," and "how" analysis for designing visualizations |
| Course Materials: COMP 6934 Data Visualization | Lecture slides and instructor guidance | Helped align project work with course expectations and Munzner methodology |
| chatgpt.com | AI assistant | Proofreading the report's grammar and refining its professional wording |
| https://www.linkedin.com/feed/update/urn:li:activity:7252238223265939456/ | Bar chart race example | Design inspiration |
| https://d3js.org/ | Official D3 website | D3 code reference |