# Netflix Content Analysis

## Final Project Part 3
### IS 445 (Data Visualization)
### Group Members - Akshat Gupta, Rohan Jain

### Introduction

Netflix is one of the most popular media and video streaming platforms. It has over 8,000 movies and TV shows available, and as of mid-2021 it had over 200M subscribers globally. This tabular dataset lists all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year, and duration.

Below are the attributes of the dataset:
1. show_id: Uniquely identifies the row
2. type: Whether the content is a TV Show or a Movie
3. title: Name of the TV Show or Movie
4. director: Name of the director of the TV Show or Movie
5. cast: List of cast members who worked in the TV Show/Movie
6. country: Name of the country in which the content was directed/produced
7. date_added: Date when the content was added to Netflix
8. release_year: The year in which the content was released
9. rating: Rating of the content
10. duration: Duration of the content, in minutes if it is a Movie or in number of seasons if it is a TV Show
11. listed_in: Categories (genres) under which the content is listed
12. description: Description of the content

Data type of each column:
1. show_id: string
2. type: string
3. title: string
4. director: string
5. cast: string
6. country: string
7. date_added: string, converted to a date object later
8. release_year: integer
9. rating: string
10. duration: string
11. listed_in: string
12. description: string
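
The `date_added` conversion mentioned above could look roughly like the sketch below. The sample frame is hypothetical and only mimics the "Month day, year" format visible in the data preview; the leading space in one value is an assumption about how messy the real column might be.

```python
import pandas as pd

# Hypothetical sample mimicking the date_added format seen in the dataset
sample = pd.DataFrame({
    "date_added": ["September 25, 2021", " July 1, 2019", None],
})

# Strip stray whitespace, then parse; missing or unparseable values become NaT
sample["date_added"] = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Once parsed, date parts such as the year are available through the .dt accessor
sample["year_added"] = sample["date_added"].dt.year
```

Using `errors="coerce"` keeps rows with a missing date in the table instead of raising, which matters here because the notebook fills missing values with a placeholder string elsewhere.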

### About Rows 
Each row describes one title (a movie or TV show) hosted on Netflix using the attributes above.

### About Data
1. What is the "name" of the dataset? <br>
Ans. Netflix Movies and TV Shows 

2. Where did you obtain it? <br>
Ans. Kaggle 

3. Where can we obtain it? (i.e., URL) <br>
Ans. https://www.kaggle.com/datasets/shivamb/netflix-shows?resource=download 

4. What is the license of the dataset? What are we allowed to do with it? <br>
Ans. License: CC0: Public Domain. We can modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.

5. How big is it in file size and in items? <br>
Ans. Size: 3.4 MB <br>
Number of rows: 8807

In [1]:
#Importing all the required libraries
%matplotlib inline
import numpy as np 
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import bqplot.pyplot
import ipywidgets
import ipywidgets as widgets
from ipywidgets import interactive
import matplotlib.colors as mcolors
from IPython.display import HTML, display

In [2]:
#Reading the Data
nfData = pd.read_csv('netflix_titles.csv')

#Displaying all the Data
nfData

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [3]:
# Performing Data cleaning

# 1. Dropping the duplicates
nfData = nfData.drop_duplicates()

# 2. Replacing all the null/Nan values with No Data
nfData = nfData.replace(np.nan, 'No Data')

# 3. Removing the "min" and "Season(s)" suffixes from duration. Later on the type column identifies the duration unit
nfData['duration'] = nfData['duration'].replace(regex=[r'\s*(min|Seasons?)$'], value='')
# Rows without a duration were filled with 'No Data' above; treat them as 0
nfData['duration'] = nfData['duration'].replace('No Data', 0)
nfData = nfData.infer_objects()
nfData[['duration']] = nfData[['duration']].apply(pd.to_numeric)
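
As an alternative to stripping the suffix, the number and the unit can be captured in one pass and kept side by side. This is only a sketch on a hypothetical sample; the column names `duration_value` and `duration_unit` are made up for illustration and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical sample with the duration formats seen in the dataset
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "1 Season", None],
})

# Capture the leading integer and the unit with named groups
parts = sample["duration"].str.extract(r"^(?P<value>\d+)\s*(?P<unit>min|Seasons?)$")

# Missing durations become 0, matching the convention used in the cleaning cell
sample["duration_value"] = pd.to_numeric(parts["value"]).fillna(0).astype(int)

# Normalise "Seasons" to "Season" so the unit column has only two labels
sample["duration_unit"] = parts["unit"].str.replace("Seasons", "Season", regex=False)
```

Keeping the unit in its own column avoids having to consult `type` every time the number is interpreted.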

In [4]:
# Creating two Datasets one for Movies and one for TV Shows
nfMData =  nfData[nfData['type']=='Movie']
nfTvData =  nfData[nfData['type']=='TV Show']

In [5]:
# Taking out List of rating from the given dataset
ratingList = nfData['rating'].unique()
ratingList

# '74 min', '84 min' and '66 min' are durations that ended up in the rating column, so mark them as missing
nfData['rating'] = nfData['rating'].replace(['74 min', '84 min', '66 min'], 'No Data')

# Data with some Rating only
nfDataWithRating = nfData[nfData['rating']!='No Data']
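
The three bad values above are hard-coded. A pattern-based check, sketched below on a hypothetical series, would catch any duration-like string that slipped into the rating column; the assumption is that such values always look like "<number> min".

```python
import pandas as pd

# Hypothetical ratings, including duration strings that landed in the wrong column
ratings = pd.Series(["PG-13", "74 min", "TV-MA", "No Data", "66 min"])

# Anything that fully matches "<number> min" is a duration, not a rating
is_duration = ratings.str.fullmatch(r"\d+ min")

# Replace the misplaced values with the same placeholder used for missing data
cleaned = ratings.mask(is_duration, "No Data")
```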

# Interactive Chart
## About the chart:

In [6]:
# Keep only titles with a known rating and country.
# nfDataWithRating is used so the misplaced '74 min'-style ratings are already excluded.
nfKnown = nfDataWithRating[nfDataWithRating['country'] != 'No Data']
nfMData1 = nfKnown[nfKnown['type'] == 'Movie']
nfTVData1 = nfKnown[nfKnown['type'] == 'TV Show']

def drawPlot2(contentType, xAxis):
    # Pick the dataset and title label that match the selected content type
    if contentType == 'Movie':
        data, label = nfMData1, 'Movies'
    else:
        data, label = nfTVData1, 'TV Shows'

    # Draw on our own axes so pandas does not open a second, empty figure
    figFinal, axes = plt.subplots(figsize=(15, 10))
    data[xAxis].value_counts()[:30].plot(kind='bar', ax=axes, color="#b20710")
    axes.set_title('No of ' + label + ' for each ' + xAxis, fontsize=16, fontfamily='serif')
    axes.set_ylabel('Counts', fontsize=14)
    axes.set_xlabel(xAxis, fontsize=14)

    plt.tight_layout()
    plt.show()

plot = interactive(drawPlot2, 
                   contentType = widgets.Dropdown(value='Movie', options=['Movie', 'TV Show'], description='Content Type'),
                   xAxis = widgets.Dropdown(value='country', options=['country', 'rating'], description='X-axis')
                 )
plot


In [7]:
out1 = widgets.Output()

with out1:
    n_rows = 2
    n_cols = 2
    figFinal, axes = plt.subplots(n_rows, n_cols)

    #fig.delaxes(axes.flatten()[-2])
    figFinal.delaxes(axes.flatten()[1])

    # Plot 1
    moviePercentCount = nfMData['show_id'].count()*100/nfData['show_id'].count()
    tvPercentCount = nfTvData['show_id'].count()*100/nfData['show_id'].count()

    y = np.array([moviePercentCount, tvPercentCount])
    mylabels = ["Movies", "TV Shows"]
    mycolors = ["#b20710", "#221f1f"]
    myexplode = [0.15, 0]
    percentage_format = "%1.1f%%"

    patches, texts, pcts = axes[0,0].pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'},
        textprops={'size': 'x-large'}, colors = mycolors, explode = myexplode)
    plt.setp(pcts, color='white', fontweight='bold')
    axes[0,0].set_title('Movie & TV Show distribution available on Netflix', fontsize=16, fontfamily='serif')


    # Plot 2
    axes[0,1] = nfData.release_year.value_counts()[:20].plot(kind = 'barh', figsize=(15, 15), color="#b20710")
    axes[0,1].set_title('Number of Movie & TV Show Released Each Year from 2002 - 2020', fontsize=16, fontfamily='serif')


    # Plot 3
    nfMData60orLess = nfMData.query('duration  <= 60')
    nfMData60toLess90 = nfMData.query('duration  > 60 & duration  <= 90')
    nfMData90toLess120 = nfMData.query('duration  > 90 & duration  <= 120')
    nfMData120More = nfMData.query('duration  > 120 ')

    nfMData60orLessProp = nfMData60orLess['show_id'].count()*100/nfMData['show_id'].count()
    nfMData60toLess90Prop = nfMData60toLess90['show_id'].count()*100/nfMData['show_id'].count()
    nfMData90toLess120Prop = nfMData90toLess120['show_id'].count()*100/nfMData['show_id'].count()
    nfMData120MoreProp = nfMData120More['show_id'].count()*100/nfMData['show_id'].count()
    mylabels = ["< 60 Mins ", "60-90 mins", "90-120 mins", "> 120 mins"]

    y = np.array([nfMData60orLessProp, nfMData60toLess90Prop, nfMData90toLess120Prop, nfMData120MoreProp])

    mycolors = ["#b20710", "#b20750", "#b20780", "#221f1f"]

    patches, texts, pcts = axes[1,0].pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 2.0, 'edgecolor': 'white'},
        textprops={'size': 'x-large'}, colors = mycolors)
    plt.setp(pcts, color='white', fontweight='bold')
    axes[1,0].set_title('Distribution based on Movie Durations', fontsize=16, fontfamily='serif')


    # Plot 4
    nfTvDataGroupedbySeason = nfTvData[['duration', 'show_id']].groupby(['duration']).count().sort_values('duration')

    axes[1,1] = nfTvDataGroupedbySeason.plot.barh(figsize=(6, 6), color="#b20710")
    axes[1,1].set_title('Distribution based on Number of TV Show Seasons', fontsize=16, fontfamily='serif')
    plt.xlabel('Number of TV Shows'); 
    plt.ylabel('Number of Seasons') 

    figFinal.suptitle('Netflix Dashboard', fontsize=24, fontfamily='serif')
    plt.tight_layout()
    plt.show(figFinal) 


In [8]:
out2 = widgets.Output()

with out2:
    
    # Movies with a known country and rating
    nfMData1 = nfDataWithRating[(nfDataWithRating['type'] == 'Movie') & (nfDataWithRating['country'] != 'No Data')]

    def drawPlot2(xAxis):
        # Draw on our own axes so pandas does not open a second, empty figure
        figFinal, axes = plt.subplots(figsize=(15, 10))

        # This tab is about movies, so plot the movie subset
        nfMData1[xAxis].value_counts()[:30].plot(kind='bar', ax=axes, color="#b20710")
        axes.set_title('No of Movies for each ' + xAxis, fontsize=16, fontfamily='serif')

        axes.set_ylabel('Counts', fontsize=14)
        axes.set_xlabel(xAxis, fontsize=14)

        plt.tight_layout()
        plt.show()
    
    plot = interactive(drawPlot2, xAxis = widgets.Dropdown(value='country', options=['country', 'rating'], description='X-axis'))
    display(plot)
    

In [9]:
nfTvData2 = nfTvData[((nfTvData['release_year'] >= 2000))]
nfTvDataPivot = nfTvData2.pivot_table(values='show_id', index = 'rating', columns= 'release_year', aggfunc= 'count', fill_value=0)

def drawHeatMap(color):
    # np.nanmin and np.nanmax ignore NaN values when computing the min and max of the pivot table
    # Initializing the color scale for the heatmap
    col_sc = bqplot.ColorScale(scheme=color, min=float(np.nanmin(nfTvDataPivot)), max=float(np.nanmax(nfTvDataPivot)))
    x_sc = bqplot.OrdinalScale()
    y_sc = bqplot.OrdinalScale() 

    # Initializing the color, X and Y axes for the heatmap
    col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
    x_ax = bqplot.Axis(scale=x_sc, label='Release Year', tick_rotate=90, label_offset='50', offset={'scale':x_sc, 'value':-40})
    y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Rating', label_offset='-50')

    heatmap = bqplot.GridHeatMap(color=nfTvDataPivot.values, 
                                     row = nfTvDataPivot.index,
                                     column = nfTvDataPivot.columns, 
                                     scales={'color':col_sc, 'row':y_sc, 'column':x_sc}, 
                                     style={'opacity': 0.5},
                                     interactions={'click':'select'}, # This will add the interactivity  
                                     anchor_style={'fill':'white'}    # This will highlight the selected data in the heatmap with the provided color
                                 )

    # Initializing the figure that holds the heatmap
    heatmap_figure = bqplot.Figure(marks=[heatmap], 
                                   axes=[col_ax, x_ax, y_ax], 
                                   fig_margin={'top':50, 'bottom':50, 'left':100, 'right':100},
                                   title='No of TV Shows for the set of Rating and Release Year',
                                   title_style={'fontsize':'14', 'fontfamily':'serif'})
    
    heatmap_figure.axes[0].tick_style = {'text-anchor': 'start'}
    # Setting min width and height of the heatmap
    heatmap_figure.layout.min_width  = '400px'
    heatmap_figure.layout.min_height = '400px'
    #heatmap_figure.title('Netflix Dashboard')
    

    return heatmap_figure


# FINAL DASHBOARD

In [10]:
# Dashboard initialization: wrap the heatmap in a vertical box
dashboard= ipywidgets.VBox([drawHeatMap('YlGnBu')])

# Initialization of the tabs: Overview, TV Shows and Movies
tab_nest = widgets.Tab()
tab_nest.children = [out1, dashboard, out2]

titles = ['Overview', 'TV Shows', 'Movies']

for i in range(len(titles)):
    tab_nest.set_title(i, str(titles[i]))
    
tab_nest


## Explanation

There are three tabs, each of which can be selected by clicking on it.

1. #### TAB 1 (Overview): 
The first tab shows multiple graphs with some basic statistics on the movies and TV shows available on Netflix.
It has the following components:
    1. #### Pie Chart (Movie & TV Show distribution available on Netflix):<br>
        This shows how Netflix content is split between movies and TV shows.
    2. #### Pie Chart (Distribution based on Movie Durations): <br>
        This shows the distribution of movies by duration. The ranges are 60 mins or less, 60 - 90 mins, 90 - 120 mins and more than 120 mins.
    3. #### Horizontal bar graph (Number of Movie & TV Show Released Each Year from 2002 - 2020): <br>
        This shows the number of movies and TV shows available on Netflix by release year between 2002 and 2020.
    4. #### Horizontal bar graph (Distribution based on number of TV shows seasons): <br>
        This shows the distribution of TV shows by the number of seasons available on Netflix for each show. The number of seasons ranges from 1 to 17.

2. #### TAB 2 (TV Shows): 
    This tab shows count statistics for the TV shows on Netflix. It has the following components:
    1. #### HeatMap (No of TV Shows for the set of Rating and Release Year): 
        This shows the number of TV shows for each combination of rating and release year.

    2. #### Bar Chart (No of TV Shows/Country [For the set of Value selected in the Heat Map]): 
        This shows the number of TV shows for each country based on the cell selected on the heatmap. E.g., if a user wants to know the number of TV shows available on Netflix per country that were released in 2015 and rated TV-MA, the user clicks the heatmap cell for year 2015 and rating TV-MA, and the bar chart updates automatically with the results.

3. #### TAB 3 (Movies): 
The third tab visualises the number of movies available on Netflix, either for each country of origin or for each rating. A drop-down lets the user choose which of the two to plot; the bar graph updates to show the counts for the selected value.
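
The filtering step behind the heatmap-driven bar chart described for TAB 2 can be sketched as a plain pandas helper. The function name `country_counts` and the demo frame are hypothetical. In the notebook, one would presumably call it from a callback observing the heatmap's `selected` trait and map the selected row/column indices to labels via `nfTvDataPivot.index` and `nfTvDataPivot.columns`; that wiring is an assumption and is not shown here.

```python
import pandas as pd

def country_counts(tv_data, rating, release_year, top_n=30):
    """Count TV shows per country for one (rating, release_year) heatmap cell."""
    mask = (
        (tv_data["rating"] == rating)
        & (tv_data["release_year"] == release_year)
        & (tv_data["country"] != "No Data")  # skip rows with no known country
    )
    return tv_data.loc[mask, "country"].value_counts().head(top_n)

# Hypothetical rows standing in for the TV show subset of the dataset
demo = pd.DataFrame({
    "rating": ["TV-MA", "TV-MA", "TV-14", "TV-MA"],
    "release_year": [2015, 2015, 2015, 2016],
    "country": ["India", "India", "Japan", "No Data"],
})

# Counts for the cell (rating = TV-MA, release year = 2015)
counts = country_counts(demo, "TV-MA", 2015)
```

The resulting series can be passed straight to a bar plot, in the same way the other tabs plot `value_counts()` results.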


# Additional interactive plot

In [11]:
#Reading the Data
primeData = pd.read_csv('amazon_prime_titles.csv')
huluData = pd.read_csv('hulu_titles.csv')
disneyData = pd.read_csv('disney_plus_titles.csv')

# 1. Dropping the duplicates
primeData = primeData.drop_duplicates()
huluData = huluData.drop_duplicates()
disneyData = disneyData.drop_duplicates()

# 2. Replacing all the null/Nan values with No Data
primeData = primeData.replace(np.nan, 'No Data')
huluData = huluData.replace(np.nan, 'No Data')
disneyData = disneyData.replace(np.nan, 'No Data')

# Creating two Datasets one for Movies and one for TV Shows
primeMData =  primeData[primeData['type']=='Movie']
primeTvData =  primeData[primeData['type']=='TV Show']

huluMData =  huluData[huluData['type']=='Movie']
huluTvData =  huluData[huluData['type']=='TV Show']

disneyMData =  disneyData[disneyData['type']=='Movie']
disneyTvData =  disneyData[disneyData['type']=='TV Show']

n_rows=1
n_cols=2
out1 = widgets.Output()
out2 = widgets.Output()
out3 = widgets.Output()
out4 = widgets.Output()

with out1:
    # Plot 1
    figFinal, axes = plt.subplots(figsize=(4, 4))
    # Share of each type = count of that type / total count
    moviePercentCount = primeMData['show_id'].count()*100/primeData['show_id'].count()
    tvPercentCount = primeTvData['show_id'].count()*100/primeData['show_id'].count()

    y = np.array([moviePercentCount, tvPercentCount])
    mylabels = ["Movies", "TV Shows"]
    mycolors = ["#FF9900", "#000000"]
    myexplode = [0.15, 0]
    percentage_format = "%1.1f%%"

    patches, texts, pcts = axes.pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'}, 
        textprops={'size': 'x-large'}, colors = mycolors, explode = myexplode, radius=2)
    plt.setp(pcts, color='white', fontweight='bold')
    plt.show(figFinal) 

with out2:
    # Plot 2
    figFinal, axes = plt.subplots(figsize=(4, 4))
    # Share of each type = count of that type / total count
    moviePercentCount = huluMData['show_id'].count()*100/huluData['show_id'].count()
    tvPercentCount = huluTvData['show_id'].count()*100/huluData['show_id'].count()

    y = np.array([moviePercentCount, tvPercentCount])
    mylabels = ["Movies", "TV Shows"]
    mycolors = ["#66aa33", "#221f1f"]
    myexplode = [0.15, 0]
    percentage_format = "%1.1f%%"

    patches, texts, pcts = axes.pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'},
        textprops={'size': 'x-large'}, colors = mycolors, explode = myexplode, radius=2)
    plt.setp(pcts, color='white', fontweight='bold')
    plt.show(figFinal) 

with out3:
    # Plot 3
    figFinal, axes = plt.subplots(figsize=(4, 4))
    moviePercentCount = disneyMData['show_id'].count()*100/disneyData['show_id'].count()
    tvPercentCount = disneyTvData['show_id'].count()*100/disneyData['show_id'].count()

    y = np.array([moviePercentCount, tvPercentCount])
    mylabels = ["Movies", "TV Shows"]
    mycolors = ["#006e99", "#221f1f"]
    myexplode = [0.15, 0]
    percentage_format = "%1.1f%%"

    patches, texts, pcts = axes.pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'},
        textprops={'size': 'x-large'}, colors = mycolors, explode = myexplode, radius=2)
    plt.setp(pcts, color='white', fontweight='bold')
    plt.show(figFinal) 
    
with out4:
    # Plot 4
    figFinal, axes = plt.subplots(figsize=(4, 4))
    moviePercentCount = nfMData['show_id'].count()*100/nfData['show_id'].count()
    tvPercentCount = nfTvData['show_id'].count()*100/nfData['show_id'].count()

    y = np.array([moviePercentCount, tvPercentCount])
    mylabels = ["Movies", "TV Shows"]
    mycolors = ["#b20710", "#221f1f"]
    myexplode = [0.15, 0]
    percentage_format = "%1.1f%%"

    patches, texts, pcts = axes.pie(
        y, labels=mylabels, autopct='%.1f%%',
        wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'},
        textprops={'size': 'x-large'}, colors = mycolors, explode = myexplode, radius=2)
    plt.setp(pcts, color='white', fontweight='bold')
    plt.show(figFinal) 

tab_nest = widgets.Tab()
tab_nest.children = [out1, out2, out3, out4]

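# Titles must follow the order of tab_nest.children: out1 (Amazon), out2 (Hulu), out3 (Disney), out4 (Netflix)
titles = ['Amazon', 'Hulu', 'Disney', 'Netflix']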

for i in range(len(titles)):
    tab_nest.set_title(i, str(titles[i]))
    
display(HTML('<h2>Content Distribution for streaming service</h2>'))
tab_nest

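
The four pie charts above repeat the same percentage calculation. A small helper along the following lines could compute the Movie/TV Show split for any of the titles tables; the name `type_share` and the demo frame are hypothetical, and the sketch only assumes a `type` column with the two labels used in this notebook.

```python
import pandas as pd

def type_share(df):
    """Return the percentage of Movies and TV Shows in a titles table."""
    share = df["type"].value_counts(normalize=True) * 100
    # Reindex so both labels are always present and in a fixed order
    return share.reindex(["Movie", "TV Show"], fill_value=0.0)

# Hypothetical table with three movies and one TV show
demo = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})
shares = type_share(demo)
```

Computing the split in one place makes it harder to swap numerator and denominator in one service's chart but not another's.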

# Contextual Visualizations

## Contextual Visualization 1

### About
Disney Plus users
Source: https://www.businessofapps.com/data/disney-plus-statistics/

In [12]:
display(HTML('<h2>Disney Plus Subscribers</h2><div class="infogram-embed" data-id="993c1c39-afa5-45a6-bf63-470d57bd4fef" data-type="interactive" data-title="Disney+ subscriber count"></div><script>!function(e,i,n,s){var t="InfogramEmbeds",d=e.getElementsByTagName("script")[0];if(window[t]&&window[t].initialized)window[t].process&&window[t].process();else if(!e.getElementById(n)){var o=e.createElement("script");o.async=1,o.id=n,o.src="https://e.infogram.com/js/dist/embed-loader-min.js",d.parentNode.insertBefore(o,d)}}(document,0,"infogram-async");</script>'))

## Contextual Visualization 2

### About
Netflix subscribers
Source: https://www.businessofapps.com/data/netflix-statistics/

In [13]:
display(HTML('<h2>Netflix Subscribers</h2><div class="infogram-embed" data-id="3b00a2cc-0942-4bcf-9102-5429fd83cf60" data-type="interactive" data-title="Netflix subscribers"></div><script>!function(e,i,n,s){var t="InfogramEmbeds",d=e.getElementsByTagName("script")[0];if(window[t]&&window[t].initialized)window[t].process&&window[t].process();else if(!e.getElementById(n)){var o=e.createElement("script");o.async=1,o.id=n,o.src="https://e.infogram.com/js/dist/embed-loader-min.js",d.parentNode.insertBefore(o,d)}}(document,0,"infogram-async");</script>'))

## Contextual Visualization 3

### About
US video streaming users
Source: https://www.businessofapps.com/data/hulu-statistics/

In [14]:
display(HTML('<h2>Streaming Service Subscribers</h2><div class="infogram-embed" data-id="e7a570fe-ddc8-43e9-889a-44452fd18f1f" data-type="interactive" data-title="US video streaming users"></div><script>!function(e,i,n,s){var t="InfogramEmbeds",d=e.getElementsByTagName("script")[0];if(window[t]&&window[t].initialized)window[t].process&&window[t].process();else if(!e.getElementById(n)){var o=e.createElement("script");o.async=1,o.id=n,o.src="https://e.infogram.com/js/dist/embed-loader-min.js",d.parentNode.insertBefore(o,d)}}(document,0,"infogram-async");</script>'))



### Learning from this assignment
Since we both sat together and worked on it, it was a great learning experience for both of us. Akshat had great suggestions for the story-telling part of the project, where he suggested how the analysis could give insights to the viewer. I learnt a lot about color selection and the interactivity we used in this project. Our focus was on making the dashboard easy and intuitive to use. I researched various techniques that are currently in use and learnt many interactivity features as well. As one can also see, I made many attempts to extract insights from the dataset to find the best way to tell the story we wanted to show; this trial-and-error phase of the project taught me a lot about this particular dataset. All of this also helped while we were finding the other datasets from Amazon Prime and Hulu, since I knew what to look for that would match and align with the project we are working on.

#### 1-2 paragraphs describing what things went according to your group work plan submitted in Part 2 and what things you'd like to do differently next time -- include in your text input