Task 1: Trend Analysis: Interactive visualization can be used to analyze how the content added to Netflix has changed over the years. Users can explore the number of movies and TV shows added each year, the distribution by genre, and the change in content from different countries. This helps in understanding how Netflix's content strategy, and the audience preferences it responds to, have shifted over time.


Visualization Method: Line charts showing the number of movies and TV shows added each year.
Tool Used: Plotly, a Python library for interactive plotting.

Plotly Line Charts for Trend Analysis (Task 1):

Original Example: Plotly Line Charts
Adaptation: Modified the data source to the Netflix dataset and plotted the number of movies and TV shows added over the years.

In [2]:
import pandas as pd
import plotly.graph_objs as go


# Load the dataset
file_path = "./archive/netflix_titles.csv" 
netflix_data = pd.read_csv(file_path)


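# Convert 'date_added' to datetime and extract the year.
# Some values carry stray whitespace, so strip first; unparseable dates become NaT.
dates = pd.to_datetime(netflix_data['date_added'].str.strip(), errors='coerce')
netflix_data['date_added'] = dates
netflix_data['year_added'] = dates.dt.year

# Count the number of movies and TV shows added each year
# (groupby drops rows whose year is missing)
content_trend = netflix_data.groupby(['year_added', 'type']).size().unstack(fill_value=0).reset_index()
# Missing dates force the year to float; cast back to int so the axis shows whole years
content_trend['year_added'] = content_trend['year_added'].astype(int)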

# Create traces for the interactive plot
trace_movies = go.Scatter(
    x=content_trend['year_added'],
    y=content_trend['Movie'],
    mode='lines+markers',
    name='Movies',
    line=dict(color='royalblue')
)
trace_tvshows = go.Scatter(
    x=content_trend['year_added'],
    y=content_trend['TV Show'],
    mode='lines+markers',
    name='TV Shows',
    line=dict(color='firebrick')
)

# Layout for the plot
layout = go.Layout(
    title='Netflix Content Added Over the Years',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Number of Titles Added'),
    hovermode='closest'
)

# Combine traces and layout into a figure
fig = go.Figure(data=[trace_movies, trace_tvshows], layout=layout)

# Display the interactive plot
fig.show()
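
Task 1 also mentions the distribution by genre, which the cell above does not cover. The following is a minimal sketch of how such counts could be derived, assuming `listed_in` holds comma-separated genres as in the Netflix dataset. The toy rows and the helper name `genre_counts_by_year` are invented for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for netflix_titles.csv; these rows are invented for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "date_added": ["January 1, 2019", " March 5, 2019", "July 9, 2020", None],
    "listed_in": ["Dramas, Comedies", "Dramas", "Comedies, Thrillers", "Dramas"],
})

def genre_counts_by_year(df):
    """Count titles per (year added, genre), with one column per genre."""
    # Strip stray whitespace before parsing; unparseable or missing dates become NaT
    years = pd.to_datetime(df["date_added"].str.strip(), errors="coerce").dt.year
    out = df.assign(year_added=years, genre=df["listed_in"].str.split(", "))
    # Drop rows without a year, then produce one row per (title, genre) pair
    out = out.dropna(subset=["year_added"]).explode("genre")
    counts = out.groupby(["year_added", "genre"]).size().unstack(fill_value=0)
    counts.index = counts.index.astype(int)
    return counts

genre_trend = genre_counts_by_year(toy)
print(genre_trend)
```

Each genre column of the resulting table could then be passed to `go.Scatter` in the same way as the 'Movie' and 'TV Show' columns above. Note that a title with several genres is counted once per genre.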


Task 2: Interactive Content Recommendation: Interactive visualization can aid in recommending movies and TV shows to users based on their preferences. By allowing users to select their favorite genres, actors, or directors, the visualization can recommend content that matches their interests. This can enhance user engagement and satisfaction with the platform.

Visualization Method: Interactive dashboard with dropdown menus for user preferences and a section displaying recommended content.
Tool Used: Dash by Plotly for creating interactive web applications.

Dash Interactive Dashboard for Recommendations (Task 2):

Original Example: Dash Interactive Python Dashboard
Adaptation: Utilized dropdown components for selecting preferences and a callback function to filter and display Netflix titles based on user selections.
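
The filtering step behind the callback can be tried out without running a Dash server. The sketch below is one possible way to express it in plain pandas, assuming that a title should match when it lists any of the selected genres and features any of the selected actors. The toy rows and the function name `filter_titles` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the Netflix dataset; these rows are invented for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "listed_in": ["Dramas, Comedies", "Thrillers", "Comedies"],
    "cast": ["Ann Lee, Bo Chen", None, "Bo Chen"],
})

def filter_titles(df, selected_genres=None, selected_actors=None):
    """Return rows that match any selected genre and any selected actor."""
    def matches(cell, wanted):
        # A cell matches when one of its comma-separated items is in `wanted`
        if pd.isna(cell):
            return False
        return any(item.strip() in wanted for item in cell.split(","))

    result = df
    if selected_genres:
        wanted = set(selected_genres)
        mask = result["listed_in"].apply(lambda c: matches(c, wanted)).astype(bool)
        result = result[mask]
    if selected_actors:
        wanted = set(selected_actors)
        mask = result["cast"].apply(lambda c: matches(c, wanted)).astype(bool)
        result = result[mask]
    return result

print(filter_titles(toy, ["Comedies"], ["Ann Lee"])["title"].tolist())
```

Comparing whole names rather than substrings avoids accidental matches, for example an actor whose name is contained in another actor's name.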

In [3]:
import dash
from dash import dcc, html, Input, Output
import pandas as pd
import plotly.express as px

# Load the dataset
netflix_data = pd.read_csv('./archive/netflix_titles.csv')

# Initialize the Dash app
app = dash.Dash(__name__)

# Collect individual genres and actors; both columns hold comma-separated lists
genres = sorted(netflix_data['listed_in'].dropna().str.split(', ').explode().unique())
actors = sorted(netflix_data['cast'].dropna().str.split(', ').explode().unique())

# App layout
app.layout = html.Div([
    html.H1("Netflix Content Recommender"),
    dcc.Dropdown(
        id='genre-dropdown',
        options=[{'label': genre, 'value': genre} for genre in genres],
        multi=True,
        placeholder="Select a Genre"
    ),
    dcc.Dropdown(
        id='actor-dropdown',
        options=[{'label': actor, 'value': actor} for actor in actors],
        multi=True,
        placeholder="Select an Actor"
    ),
    html.Div(id='output-container')
])

# Callback to update the output container based on the selected inputs
@app.callback(
    Output('output-container', 'children'),
    [Input('genre-dropdown', 'value'),
     Input('actor-dropdown', 'value')]
)
def update_output(selected_genres, selected_actors):
    def matches(cell, selected):
        # True when any comma-separated item in the cell is one of the selected values
        return pd.notna(cell) and any(item in selected for item in cell.split(', '))

    filtered_data = netflix_data
    if selected_genres:
        # Keep titles that list at least one of the selected genres
        mask = filtered_data['listed_in'].apply(matches, args=(selected_genres,)).astype(bool)
        filtered_data = filtered_data[mask]
    if selected_actors:
        # Keep titles that feature at least one of the selected actors (whole-name match)
        mask = filtered_data['cast'].apply(matches, args=(selected_actors,)).astype(bool)
        filtered_data = filtered_data[mask]
    return [html.Div([html.H3(row['title']), html.P(row['description'])])
            for _, row in filtered_data.iterrows()]

# Run the app (recent Dash releases use app.run in place of the older app.run_server)
if __name__ == '__main__':
    app.run(debug=True)


ImportError: cannot import name 'escape' from 'jinja2' (/Users/rahulpaulgopireddy/opt/anaconda3/lib/python3.9/site-packages/jinja2/__init__.py)
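
An ImportError of this kind usually points to a version mismatch rather than a problem in the notebook code: Jinja2 3.1 removed `escape`, while older Flask releases (which Dash depends on) still import it from `jinja2`. The sketch below is one way to check which versions are installed, using only the standard library. The helper name `installed_versions` and the list of packages to inspect are assumptions.

```python
from importlib import metadata

def installed_versions(packages=("dash", "flask", "jinja2", "werkzeug")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print(installed_versions())
```

If the output shows an old Flask next to a recent Jinja2, upgrading Flask and Dash together is the usual remedy.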

Task 3: Geographical Analysis: Interactive maps can be used to visualize the distribution of Netflix content across different countries. Users can explore how many titles are associated with each region, which could help Netflix tailor its content offerings to geographical preferences. This can also provide insights into the global reach and impact of Netflix's content library.

Visualization Method: Interactive map showing the distribution of Netflix content across different countries.
Tool Used: Plotly for creating interactive maps.

Plotly Maps for Geographical Analysis (Task 3):

Original Example: Plotly Maps
Adaptation: Used the Netflix dataset to map the count of titles available from different countries, adjusting parameters to represent the data accurately.

For Task 3, the geographical distribution required custom aggregation to count titles by country. This involved preprocessing Netflix data to associate each title with its respective country and then plotting this data on a world map. The map's color scale represents the number of titles, providing an intuitive visual of Netflix's global reach.
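
The aggregation described here can be illustrated on a small example. This is a minimal sketch, assuming the `country` column holds comma-separated country names and may contain missing values. The toy rows and the function name `count_titles_by_country` are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the Netflix dataset; these rows are invented for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States, India", "India", None, "United States"],
})

def count_titles_by_country(df):
    """Return one row per country with the number of titles that list it."""
    # One entry per (title, country) pair, with surrounding whitespace removed
    countries = df["country"].dropna().str.split(",").explode().str.strip()
    # Guard against empty entries left behind by stray commas
    countries = countries[countries != ""]
    counts = countries.value_counts()
    return pd.DataFrame({"country": counts.index, "count": counts.values})

country_count = count_titles_by_country(toy)
print(country_count)
```

A title that lists several countries is counted once for each of them, so the counts can add up to more than the number of titles.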

In [4]:
import pandas as pd
import plotly.express as px

# Load the dataset
netflix_data = pd.read_csv('./archive/netflix_titles.csv')

# Preprocess the data
# Split the comma-separated 'country' column into one entry per country and count the occurrences
country_data = netflix_data['country'].dropna().str.split(',').explode().str.strip()
# Guard against empty entries left behind by stray commas
country_data = country_data[country_data != '']
country_count = country_data.value_counts().reset_index()
country_count.columns = ['country', 'count']

# Create the map
fig = px.choropleth(
    country_count,
    locations="country",
    locationmode='country names',
    color="count",
    hover_name="country",
    color_continuous_scale=px.colors.sequential.Plasma,
    title="Distribution of Netflix Content Across Different Countries"
)

# Show the figure
fig.show()


2c. Visualization Program Generation
For each task, the program involves data preprocessing and visualization:

a. For trend analysis, data was grouped by year and type, then plotted using Plotly's line chart functions.
b. For interactive recommendations, a Dash app was created with dropdowns for genres and actors, with the output updating based on user selections.
c. For geographical analysis, the dataset was aggregated by country, and a choropleth map was created using Plotly to visualize the number of titles per country.

Each program was written in Python, utilizing pandas for data manipulation and Plotly/Dash for visualization.

2d. Overall Evaluation

The chosen visualization methods are appropriate for the data features and the tasks at hand. Line charts effectively show trends over time, making them ideal for the trend analysis task. An interactive dashboard with filters allows users to tailor the recommendations to their preferences, engaging them more effectively. Lastly, choropleth maps are excellent for displaying geographical distributions, providing clear insights into Netflix's global content distribution.