### Notebook info:
> **Movie Streaming Data Visualization** <br/>
> *Movies_Streaming_Analysis.ipynb* Version 1.0 <br/>
> Last updated on: September 15th, 2021; by Luiz Gustavo Fagundes Malpele. <br/>

<br/>
<div class="alert alert-block alert-success">

### To-Do:

**High-priority:**
- [ ] Generate a Data Visualization Template
- [ ] Generate Histograms for the main quantitative variables

**Streamlit:**
- [ ] Begin the User Interface


    
</div>
<br/><hr/>

<br/>

### Package/library dependencies:

- **matplotlib**, for plots and graphs
- **numpy**, for floating-point ranges
- **plotly**, for plotting aesthetics
- **pandas**, for reading CSV files into data frames
- **datetime**, for time related operations

In [1]:
#import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
from datetime import datetime, timedelta
import plotly.express as px 
import plotly.graph_objects as go
from plotly.subplots import make_subplots

<br/><hr/>
## **Initializations**

In [4]:
movies_data_path = '../data/movies_streaming_platforms.csv'
movies_cleaned_data_path = '../data/movies_streaming_platforms_cleaned.csv'

<br/>

### Importing **Functions** library:

In [5]:
%run -i ../libraries/Preprocessing_Library.ipynb

<br/><hr/>
## **Data Acquisition**

In [None]:
#movies_data = prepare_movies_dataframe(path = movies_data_path, to_csv = True)

In [6]:
movies_data = read_cleaned_movies_dataframe(path = movies_cleaned_data_path)

In [22]:
movies_data

index,title,year,age,imdb,rotten_tomatoes,netflix,hulu,prime_video,disney,directors,genres,country,language,runtime
0,The Irishman,2019,18+,7.8,7.8,True,False,False,False,['Martin Scorsese'],"['Biography', 'Crime', 'Drama']",['United States'],"['English', 'Italian', 'Latin', 'Spanish', 'Ge...",209.0
1,Dangal,2016,7+,8.4,8.4,True,False,False,False,['Nitesh Tiwari'],"['Action', 'Biography', 'Drama', 'Sport']","['India', 'United States', 'United Kingdom', '...","['Hindi', 'English']",161.0
2,David Attenborough: A Life on Our Planet,2020,7+,9.0,9.0,True,False,False,False,"['Alastair Fothergill', 'Jonathan Hughes', 'Ke...","['Documentary', 'Biography']",['United Kingdom'],['English'],83.0
3,Lagaan: Once Upon a Time in India,2001,7+,8.1,8.1,True,False,False,False,['Ashutosh Gowariker'],"['Drama', 'Musical', 'Sport']","['India', 'United Kingdom']","['Hindi', 'English']",224.0
4,Roma,2018,18+,7.7,7.7,True,False,False,False,['Other'],"['Action', 'Drama', 'History', 'Romance', 'War']","['United Kingdom', 'United States']",['English'],52.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9510,Most Wanted Sharks,2020,,,,False,False,False,True,['Other'],"['Crime', 'Reality-TV']",['United States'],"['Greek', 'English']",
9511,Doc McStuffins: The Doc Is In,2020,,,,False,False,False,True,['Chris Anthony Hamilton'],['Animation'],['United States'],['English'],23.0
9512,Ultimate Viking Sword,2019,,,,False,False,False,True,['Other'],['Other'],['United States'],['Other'],
9513,Hunt for the Abominable Snowman,2011,,,,False,False,False,True,['Dan Oliver'],"['Drama', 'History']",['Other'],['Other'],
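In the output above, `directors`, `genres`, `country` and `language` appear to hold list-like strings (for example `"['Hindi', 'English']"`) rather than Python lists. A minimal sketch of how such a cell could be converted, assuming that representation; `parse_list_cell` is a hypothetical helper, not part of `Preprocessing_Library.ipynb`:

```python
import ast

def parse_list_cell(cell):
    """Convert a string such as "['Drama', 'Sport']" into a Python list.

    Values that are already lists are returned unchanged; anything that
    cannot be parsed (missing values, malformed text) becomes an empty list.
    """
    if isinstance(cell, list):
        return cell
    if not isinstance(cell, str):
        return []
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return parsed if isinstance(parsed, list) else []

# Sample value copied from the 'genres' column shown above
sample_genres = "['Action', 'Biography', 'Drama', 'Sport']"
print(parse_list_cell(sample_genres))
```

If the assumption holds, the helper could be applied column-wise, e.g. `movies_data['genres'].apply(parse_list_cell)`.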


<br/><hr/>
## **Data Visualization**

In [29]:
def get_color_palette():
    '''
    Standard color palette used for visualization and UI
    '''
    #Standard Color Palette
    color_light_blue = '#0194ED'
    color_dark_blue = '#294D6E'
    color_red = '#FF494E'
    color_gray = '#F0F9FF'
    return color_light_blue, color_dark_blue, color_red, color_gray
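
Returning four colors as a plain tuple relies on every caller unpacking them in the right order. One possible alternative, sketched here as an assumption rather than the notebook's actual design, is a named tuple with the same hex values; `ColorPalette` is a hypothetical name:

```python
from typing import NamedTuple

class ColorPalette(NamedTuple):
    """Standard color palette, same hex values as get_color_palette()."""
    light_blue: str = '#0194ED'
    dark_blue: str = '#294D6E'
    red: str = '#FF494E'
    gray: str = '#F0F9FF'

palette = ColorPalette()

# Colors can be read by name...
print(palette.red)

# ...and the palette still unpacks like the original tuple
color_light_blue, color_dark_blue, color_red, color_gray = palette
```

Because a named tuple is still a tuple, existing code that unpacks four values would keep working.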

In [41]:
def plot_scores_distribution(movies_data:pd.DataFrame):
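    '''
    Plots side-by-side histograms of the IMDb and Rotten Tomatoes scores.
    Expects the 'imdb' and 'rotten_tomatoes' columns in movies_data and
    returns the Plotly figure.
    '''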
    #Get color palette
    color_light_blue, color_dark_blue, color_red, color_gray = get_color_palette()
    
    fig_scores = make_subplots(rows=1, cols=2,
                               subplot_titles=('Distribution of IMDb Scores', 'Distribution of Rotten Tomatoes Scores'),
                               #shared_xaxes=True,
                               vertical_spacing = 0.05)
    
    #Creates Histogram for the distribution of IMDB Scores
    fig_scores.add_trace(go.Histogram(x = movies_data['imdb'],
                                      marker_color= color_red,
                                      opacity=0.85), 
                         row=1, col=1) # Row 1, Column 1
    
    #Creates Histogram for the distribution of Rotten Tomatoes Scores
    fig_scores.add_trace(go.Histogram(x=movies_data['rotten_tomatoes'],
                                      marker_color= color_dark_blue,
                                      opacity=0.85), 
                         row=1, col=2) # Row 1, Column 2
    
    #Update Y-axis labels for both subplots
    fig_scores.update_yaxes(title_text='Frequency')
    
    #Standard Figure Layout for Data Visualization
    fig_scores.update_layout(
        dict(
            height=600, 
            width=1000,
            plot_bgcolor = "#F1F1F3",
            paper_bgcolor = 'white',
            #xaxis_tickformat = '%d %B <br>%Y',
            title = 'Frequency Distribution of Critics\' scores'))
    
    #Returns Fig Scores
    return fig_scores

In [42]:
plot_scores_distribution(movies_data = movies_data)
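
To sanity-check the histograms without rendering them, the counts per score bin can be computed directly. This is a rough sketch under stated assumptions: a bin width of 0.5 is an arbitrary choice, missing scores are represented as `NaN` or `None`, and `bin_scores` is a hypothetical helper. `go.Histogram` picks its own bins by default, so these counts will not necessarily match the bars in the figure.

```python
import math
from collections import Counter

def bin_scores(scores, bin_width=0.5):
    """Count scores per bin; each key is the lower edge of a bin."""
    counts = Counter()
    for score in scores:
        # Skip missing values
        if score is None or math.isnan(score):
            continue
        lower_edge = math.floor(score / bin_width) * bin_width
        counts[round(lower_edge, 2)] += 1
    return dict(sorted(counts.items()))

# IMDb scores of the first five rows shown above, plus one missing value
sample_scores = [7.8, 8.4, 9.0, 8.1, 7.7, float('nan')]
print(bin_scores(sample_scores))
```

With the real data, the input could be `movies_data['imdb']`, assuming that column is numeric.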