### Notebook info:
> **Movie Streaming Analysis** <br/>
> *Movies_Streaming_Analysis.ipynb* Version 1.0 <br/>
> Last updated on September 1st, 2021, by Luiz Gustavo Fagundes Malpele. <br/>

<br/>
<div class="alert alert-block alert-success">

### To-Do:

**High-priority:**
- [X] Organize the Data (9/8/2021)
- [X] Preprocessing (9/8/2021)
- [ ] Exploratory Data Analysis (EDA)


**Modeling:**
    
- [ ] Investigate Possibilities for the Dataset


**Streamlit:**
- [ ] Begin the User Interface


    
</div>
<br/><hr/>

<br/>

### Package/library dependencies:

- **matplotlib**, for plots and graphs (its import is currently commented out)
- **numpy**, for floating-point ranges
- **plotly**, for plotting aesthetics
- **pandas**, for reading CSV files into DataFrames
- **math**, for basic mathematical functions
- **datetime**, for time-related operations

In [46]:
#import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
from datetime import datetime, timedelta
import plotly.express as px 
import plotly.graph_objects as go
from plotly.subplots import make_subplots

<br/><hr/>
## **Initializations**

In [13]:
movies_data_path = '../data/movies_streaming_platforms.csv'
movies_cleaned_data_path = '../data/movies_streaming_platforms_cleaned.csv'

<br/>

### Importing **Functions** library:

In [12]:
%run -i ../libraries/Preprocessing_Library.ipynb

<br/><hr/>
## **Data Acquisition**

In [None]:
#Applies all preprocessing steps to the raw DataFrame
#movies_data = prepare_movies_dataframe(path = movies_data_path, to_csv = True)

In [22]:
#Reads the cleaned DataFrame directly
movies_data = read_cleaned_movies_dataframe(path = movies_cleaned_data_path)
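
`read_cleaned_movies_dataframe` is defined in `Preprocessing_Library.ipynb`, which is not shown here. The output below shows the list-valued columns (`directors`, `genres`, `country`, `language`) as Python list literals, and a plain `pd.read_csv` would return them as strings. A minimal sketch of one way such a reader could restore them, assuming the cleaned CSV keeps `index` as its first column and writes those four columns as list reprs; `read_cleaned_movies_sketch` is a hypothetical name, not the library's actual function:

```python
import ast
import io

import pandas as pd

# Assumption: these columns are stored as list literals such as "['Hindi', 'English']"
LIST_COLUMNS = ['directors', 'genres', 'country', 'language']


def read_cleaned_movies_sketch(path_or_buffer) -> pd.DataFrame:
    '''
    Hypothetical reader: loads the cleaned CSV and parses the list-literal
    columns back into Python lists with ast.literal_eval.
    Assumes those columns have no empty cells (literal_eval('') would fail).
    '''
    converters = {column: ast.literal_eval for column in LIST_COLUMNS}
    return pd.read_csv(path_or_buffer, index_col='index', converters=converters)


# Tiny in-memory sample in the assumed on-disk layout
sample_csv = io.StringIO(
    "index,title,directors,genres,country,language\n"
    "0,The Irishman,['Martin Scorsese'],\"['Biography', 'Crime', 'Drama']\","
    "['United States'],\"['English', 'Italian']\"\n"
)
movies_sample = read_cleaned_movies_sketch(sample_csv)
print(movies_sample.loc[0, 'genres'])  # ['Biography', 'Crime', 'Drama']
```

Parsing at read time keeps later steps, such as counting genres per platform, working on real lists instead of re-parsing strings in every cell.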

In [23]:
movies_data

| index | title | year | age | imdb | rotten_tomatoes | netflix | hulu | prime_video | disney | directors | genres | country | language | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Irishman | 2019 | 18+ | 7.8 | 98.0 | True | False | False | False | ['Martin Scorsese'] | ['Biography', 'Crime', 'Drama'] | ['United States'] | ['English', 'Italian', 'Latin', 'Spanish', 'Ge... | 209.0 |
| 1 | Dangal | 2016 | 7+ | 8.4 | 97.0 | True | False | False | False | ['Nitesh Tiwari'] | ['Action', 'Biography', 'Drama', 'Sport'] | ['India', 'United States', 'United Kingdom', '... | ['Hindi', 'English'] | 161.0 |
| 2 | David Attenborough: A Life on Our Planet | 2020 | 7+ | 9.0 | 95.0 | True | False | False | False | ['Alastair Fothergill', 'Jonathan Hughes', 'Ke... | ['Documentary', 'Biography'] | ['United Kingdom'] | ['English'] | 83.0 |
| 3 | Lagaan: Once Upon a Time in India | 2001 | 7+ | 8.1 | 94.0 | True | False | False | False | ['Ashutosh Gowariker'] | ['Drama', 'Musical', 'Sport'] | ['India', 'United Kingdom'] | ['Hindi', 'English'] | 224.0 |
| 4 | Roma | 2018 | 18+ | 7.7 | 94.0 | True | False | False | False | ['Other'] | ['Action', 'Drama', 'History', 'Romance', 'War'] | ['United Kingdom', 'United States'] | ['English'] | 52.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9510 | Most Wanted Sharks | 2020 | NaN | NaN | 14.0 | False | False | False | True | ['Other'] | ['Crime', 'Reality-TV'] | ['United States'] | ['Greek', 'English'] | NaN |
| 9511 | Doc McStuffins: The Doc Is In | 2020 | NaN | NaN | 13.0 | False | False | False | True | ['Chris Anthony Hamilton'] | ['Animation'] | ['United States'] | ['English'] | 23.0 |
| 9512 | Ultimate Viking Sword | 2019 | NaN | NaN | 13.0 | False | False | False | True | ['Other'] | ['Other'] | ['United States'] | ['Other'] | NaN |
| 9513 | Hunt for the Abominable Snowman | 2011 | NaN | NaN | 10.0 | False | False | False | True | ['Dan Oliver'] | ['Drama', 'History'] | ['Other'] | ['Other'] | NaN |


<br/><hr/>
## **Summary Statistics**

In [24]:
# Null Values
movies_data.isnull().sum()

title                 0
year                  0
age                4177
imdb                206
rotten_tomatoes       7
netflix               0
hulu                  0
prime_video           0
disney                0
directors             0
genres                0
country               0
language              0
runtime             319
dtype: int64
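
Raw counts are easier to compare as a share of the table. `year` has no nulls and the `count` row of `describe()` reports 9515 years, so the table has 9515 rows. In the notebook, `movies_data.isnull().mean() * 100` gives the share directly; the sketch below reproduces it from the counts printed above so it runs on its own:

```python
import pandas as pd

# Non-zero null counts copied from the isnull().sum() output above
null_counts = pd.Series({'age': 4177, 'imdb': 206, 'rotten_tomatoes': 7, 'runtime': 319})
n_rows = 9515  # 'year' has no nulls and describe() counts 9515 years

# Share of missing values per column, in percent
missing_pct = (null_counts / n_rows * 100).round(2)
print(missing_pct)
```

With these counts, `age` is missing for roughly 44% of titles, while `imdb`, `rotten_tomatoes` and `runtime` are each missing for under 4%.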

In [26]:
# Summary Statistics for continuous variables
movies_data.describe()

| | year | imdb | rotten_tomatoes | runtime |
|---|---|---|---|---|
| count | 9515.0 | 9309.0 | 9508.0 | 9196.0 |
| mean | 2007.422386 | 6.156311 | 53.545015 | 95.199435 |
| std | 19.130367 | 1.163573 | 13.197673 | 29.654047 |
| min | 1914.0 | 1.1 | 10.0 | 1.0 |
| 25% | 2006.0 | 5.5 | 44.0 | 85.0 |
| 50% | 2015.0 | 6.3 | 52.0 | 95.0 |
| 75% | 2018.0 | 7.0 | 62.0 | 109.0 |
| max | 2021.0 | 9.8 | 98.0 | 566.0 |
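
Box plots typically draw whiskers up to 1.5 times the interquartile range (IQR) beyond the quartiles and mark anything further out as an outlier, and `runtime` stands out here with a minimum of 1 and a maximum of 566 minutes. A small worked check of those fences, using the quartiles from the table above (a sketch: a plotting library may compute quartiles slightly differently from `describe()`):

```python
from typing import Tuple

# Quartiles (Q1, Q3) copied from the describe() output above
quartiles = {
    'imdb': (5.5, 7.0),
    'rotten_tomatoes': (44.0, 62.0),
    'runtime': (85.0, 109.0),
}


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    '''Returns the (lower, upper) fences k * IQR below Q1 and above Q3.'''
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


for column, (q1, q3) in quartiles.items():
    lower, upper = tukey_fences(q1, q3)
    print(f'{column}: {lower:g} to {upper:g}')
# imdb: 3.25 to 9.25
# rotten_tomatoes: 17 to 89
# runtime: 49 to 145
```

With these quartiles, both the 1-minute minimum and the 566-minute maximum of `runtime` fall outside the fences, so they would appear as individual outlier points in a runtime box plot.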


<br/><hr/>
## **Exploratory Data Analysis**

In [37]:
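# Rotten Tomatoes scores of the movies available on each platform.
# Only the last expression of a notebook cell is displayed, so the
# output below shows the Prime Video scores.
movies_data['rotten_tomatoes'][movies_data['netflix'] == True]
movies_data['rotten_tomatoes'][movies_data['disney'] == True]
movies_data['rotten_tomatoes'][movies_data['hulu'] == True]
movies_data['rotten_tomatoes'][movies_data['prime_video'] == True]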

index
116     82.0
155     80.0
158     80.0
184     79.0
185     79.0
        ... 
8610    13.0
8611    13.0
8612    12.0
8613    12.0
8614    12.0
Name: rotten_tomatoes, Length: 4113, dtype: float64
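
Each platform is a separate boolean flag and a title can be flagged on more than one platform, so one way to compare all four at once is to reshape to long format (one row per title and platform) and group by platform. A minimal sketch on a small made-up frame with the same column names as `movies_data`; `scores_by_platform` is a hypothetical helper, not part of the preprocessing library:

```python
import pandas as pd

PLATFORMS = ['netflix', 'hulu', 'prime_video', 'disney']


def scores_by_platform(movies: pd.DataFrame, score: str = 'rotten_tomatoes') -> pd.DataFrame:
    '''
    Hypothetical helper: summary statistics of one score column per platform.
    A title flagged on several platforms is counted once for each of them.
    '''
    # One row per (title, platform) pair with a boolean 'available' flag
    long = movies.melt(id_vars=['title', score], value_vars=PLATFORMS,
                       var_name='platform', value_name='available')
    long = long[long['available']]
    # describe() ignores missing scores, so 'count' is the number of scored titles
    return long.groupby('platform')[score].describe()


# Made-up sample with the same column layout as movies_data
sample = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'rotten_tomatoes': [90.0, 60.0, 45.0, None],
    'netflix': [True, True, False, False],
    'hulu': [False, True, True, False],
    'prime_video': [False, False, True, True],
    'disney': [False, False, False, True],
})
print(scores_by_platform(sample))
```

On the full data, `scores_by_platform(movies_data, 'imdb')` would give the IMDb counterpart, assuming the platform columns are read as booleans.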

In [68]:
# Brand colors used for each streaming platform in the plots
color_netflix = '#E50914'
color_hulu = '#3DBB3D'
color_prime_video = '#00A8E1'
color_disney = '#332765'

In [83]:
def plot_scores_per_platform(movies_data:pd.DataFrame):
    '''
    Plots box plots of the Rotten Tomatoes (row 1) and IMDb (row 2) score distributions for each streaming platform
    '''
    #Get color palette
    color_light_blue, color_dark_blue, color_red, color_gray = get_color_palette()
    
    fig_scores = make_subplots(rows=2, cols=1,
                               subplot_titles=('Boxplot of Rotten Tomatoes Scores', 'Boxplot of IMDB Scores'),
                               #shared_xaxes=True,
                               vertical_spacing = 0.2)
    
    #Creates box plots for the distribution of Rotten Tomatoes scores (row 1)
    fig_scores.add_trace(go.Box(x = movies_data['rotten_tomatoes'][movies_data['netflix'] == True],
                                marker_color = color_netflix,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Netflix'), row=1, col=1) # Row 1, Column 1
    
    fig_scores.add_trace(go.Box(x = movies_data['rotten_tomatoes'][movies_data['disney'] == True],
                                marker_color = color_disney,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Disney+'), row=1, col=1) # Row 1, Column 1
    
    fig_scores.add_trace(go.Box(x = movies_data['rotten_tomatoes'][movies_data['hulu'] == True],
                                marker_color = color_hulu,
                                opacity = 0.85,
                                showlegend = False,
                                name = 'Hulu'), row=1, col=1) # Row 1, Column 1

    fig_scores.add_trace(go.Box(x = movies_data['rotten_tomatoes'][movies_data['prime_video'] == True],
                                marker_color = color_prime_video,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Prime Video'), row=1, col=1) # Row 1, Column 1
        
    #Creates box plots for the distribution of IMDb scores (row 2)
    fig_scores.add_trace(go.Box(x = movies_data['imdb'][movies_data['netflix'] == True],
                                marker_color = color_netflix,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Netflix'), row=2, col=1) # Row 2, Column 1
    
    fig_scores.add_trace(go.Box(x = movies_data['imdb'][movies_data['disney'] == True],
                                marker_color = color_disney,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Disney+'), row=2, col=1) # Row 2, Column 1
    
    fig_scores.add_trace(go.Box(x = movies_data['imdb'][movies_data['hulu'] == True],
                                marker_color = color_hulu,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Hulu'), row=2, col=1) # Row 2, Column 1
    
    fig_scores.add_trace(go.Box(x = movies_data['imdb'][movies_data['prime_video'] == True],
                                marker_color = color_prime_video,
                                #opacity = 0.85,
                                showlegend = False,
                                name = 'Prime Video'), row=2, col=1) # Row 2, Column 1
    
    #Update X-axis label for subplot 1 (Rotten Tomatoes)
    fig_scores.update_xaxes(title_text='Rotten Tomatoes Score', row=1, col=1)
    
    #Update X-axis label for subplot 2 (IMDb user ratings, not critics' scores)
    fig_scores.update_xaxes(title_text='IMDb Score', row=2, col=1)
    
    #Standard Figure Layout for Data Visualization
    fig_scores.update_layout(
        dict(
            height=700, 
            width=1000,
            plot_bgcolor = "#F1F1F3",
            paper_bgcolor = 'white',
            #xaxis_tickformat = '%d %B <br>%Y',
            title = 'Boxplots of Movie Scores per Streaming Platform'))
    
    #Returns Fig Scores
    return fig_scores

In [61]:
def get_color_palette():
    '''
    Standard color palette used for visualization and UI
    '''
    #Standard Color Palette
    color_light_blue = '#0194ED'
    color_dark_blue = '#294D6E'
    color_red = '#FF494E'
    color_gray = '#F0F9FF'
    return color_light_blue, color_dark_blue, color_red, color_gray

In [84]:
plot_scores_per_platform(movies_data)