# <span style='color:White'><span style='background :Blue' > ALL TIME HIGHEST-GROSSING MOVIES WORLDWIDE ANALYSIS  </span></span> 
---

## <span style='color:White'><span style='background :Red' > Data Processing  </span></span>

For this part we make use of what was previously done in the *Data Collection* part. From that part we already have the variable top_movies, which contains the 1000 highest-grossing films. If a smaller sample is needed, we just need to use the function **movie_list( )**.

In [None]:
#%run Data_Collection.ipynb

In [None]:
#top_movies.head()

In [None]:
#top_movies.tail()

### <span style='color:White'><span style='background :Brown' > Data Transformation </span></span>  

The required data has already been downloaded with the function movie_list() and stored in the dataframe **top_movies**. In this part we transform the data to make it easier to work with. This involves not only changing the types of existing variables but also creating new ones.

### <span style='color:White'><span style='background :Black' > Function: </span></span>  gross_number()

Creates a new column with the gross income in millions of USD as a number, since the original values are strings.

In [None]:
def gross_number(table):
    """
    Transform the 'Lifetime Gross' column from str to float.
    The input should be in the form '$XX,XXX,XXX.XX'.
    The output is in the form XX.XX (millions of USD).
    
    Example
    ----------
    Input.-  '$780,237,950' | Type: str
    Output.- 780.24         | Type: float
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Copy of the input dataframe with an additional column containing the gross income in MM USD.
    
    """
    #Work on a copy and create the new column.
    table = table.copy(deep=True)
    table['Gross (MM USD)'] = None
    #Iterate through all rows; .at sets a single cell without chained indexing.
    for i in table.index:
        table.at[i, 'Gross (MM USD)'] = round(float(table.at[i, 'Lifetime Gross'][1:].replace(',', ''))/(10**6), 2)
    return table
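The loop above converts one cell at a time. As an alternative sketch, the same conversion can be written with vectorized pandas string methods. The sample values are hypothetical, and the sketch assumes every value starts with a single `$` followed by comma-separated digits, as in the docstring:

```python
import pandas as pd

# Hypothetical 'Lifetime Gross' strings in the '$XX,XXX,XXX' format.
lifetime_gross = pd.Series(['$780,237,950', '$1,234,567,890', '$1,000,000'])

# Drop the leading '$', remove the thousands separators, convert to float,
# then scale to millions of USD and round to two decimals.
gross_mm = (
    lifetime_gross.str[1:]
    .str.replace(',', '', regex=False)
    .astype(float)
    .div(10**6)
    .round(2)
)
print(gross_mm.tolist())
```

Vectorized operations avoid a Python-level loop over the rows, which usually matters more as the dataset grows.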

### <span style='color:White'><span style='background :Black' > Function: </span></span>  date_strptime()

Creates new columns with the year and month of release as numbers, since the original dates are strings.

In [None]:
from datetime import datetime

In [None]:
def date_strptime(table):
    """
    Extract the release year and month from the 'Released' column.
    The input must be in the format %d %b %Y; it is parsed with datetime.strptime()
    and only the year and month are kept. Missing dates ('N/A') use 01 Jan 1900 instead.
    
    Example
    ----------
    Input.-  '01 Jan 1900'           | Type: str
    Output.- 1900 (year), 1 (month)  | Type: int
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Copy of the input dataframe with additional columns containing the release year and month.
    
    """
    #Work on a copy and create the new columns.
    table = table.copy(deep=True)
    table['Release Year'] = None
    table['Release Month'] = None
    #Iterate through all rows, parsing each date only once.
    for i in table.index:
        released = table.at[i, 'Released']
        #Missing dates use 01 Jan 1900 as a placeholder.
        if released == 'N/A':
            released = '01 Jan 1900'
        parsed = datetime.strptime(released, '%d %b %Y')
        table.at[i, 'Release Year'] = parsed.year
        table.at[i, 'Release Month'] = parsed.month
    return table
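A vectorized sketch of the same parsing, assuming English month abbreviations (which is what `%b` expects in the default locale) and the same 01 Jan 1900 placeholder for missing dates. The sample dates are hypothetical:

```python
import pandas as pd

# Hypothetical 'Released' values; 'N/A' marks a missing date.
released = pd.Series(['18 Dec 2009', 'N/A', '26 Apr 2019'])

# Replace missing dates with the placeholder, then parse the whole column at once.
dates = pd.to_datetime(released.replace('N/A', '01 Jan 1900'), format='%d %b %Y')
release_year = dates.dt.year
release_month = dates.dt.month
print(release_year.tolist(), release_month.tolist())
```

One consequence of the placeholder: if any release date is missing, the minimum 'Release Year' becomes 1900, which is where the year sliders in search_engine() and recommendation_engine() will start.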

### <span style='color:White'><span style='background :Black' > Function: </span></span>  std_rating()

Creates a new column with an equivalent, standardized rating. The API returns ratings from both the MPAA rating system and the TV Parental Guidelines rating system, which complicates direct comparison.

In [None]:
import numpy as np
import pandas as pd

In [None]:
def std_rating(table):
    """
    Create a new column with an equivalent rating system.
    Allows all movies to be standardized under the same rating logic.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Input dataframe with an additional column containing the standardized rating.
    
    """
    #Create table of rating equivalences (MPAA and TV ratings mapped to a common scale).
    table = table.copy(deep=True)
    rating_table = pd.DataFrame([['G', 'General Audience'],
                                 ['PG', 'Parental Guidance Suggested'],
                                 ['PG-13', 'Parents Strongly Cautioned'],
                                 ['R', 'Restricted'],
                                 ['NC-17', 'No Children 17 or Under'],
                                 ['TV-Y', 'General Audience'],
                                 ['TV-Y7', 'Parental Guidance Suggested'],
                                 ['TV-G', 'General Audience'],
                                 ['TV-PG', 'Parental Guidance Suggested'],
                                 ['TV-14', 'Parents Strongly Cautioned'],
                                 ['TV-MA', 'Restricted'],
                                 ['Approved', 'General Audience'],
                                 ['Passed', 'Restricted'],
                                 ['Not Rated', 'Not Rated'],
                                 ['N/A', 'Not Rated']],
                                columns = ['Rated', 'Standarized Rating'])
    #Save movie index.
    index = table.index
    #Copy the rating equivalent into a new column, matching explicitly on 'Rated'.
    table = table.merge(rating_table, how = 'left', on = 'Rated')
    #Restore index, as merging resets it (a left merge keeps the original row order).
    table.index = index
    return table
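As an alternative sketch, `Series.map` with a dictionary produces the same lookup without a merge, so the original index is kept and does not need to be restored. Only a subset of the equivalence table is reproduced here, and the ratings below are hypothetical:

```python
import pandas as pd

# Subset of the rating equivalences defined in std_rating().
RATING_MAP = {
    'PG-13': 'Parents Strongly Cautioned',
    'TV-14': 'Parents Strongly Cautioned',
    'R': 'Restricted',
    'TV-MA': 'Restricted',
    'N/A': 'Not Rated',
}

# Hypothetical 'Rated' values with a non-default index.
rated = pd.Series(['PG-13', 'TV-14', 'TV-MA', 'N/A', 'X'], index=[10, 11, 12, 13, 14])

# map() looks up each value and keeps the original index; unknown ratings become NaN.
standardized = rated.map(RATING_MAP)
print(standardized)
```

Ratings missing from the table (such as the hypothetical 'X' above) end up as NaN with either approach, so they may need a fallback such as `fillna('Not Rated')` before filtering.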

### <span style='color:White'><span style='background :Black' > Function: </span></span>  runtime_number()

Creates a new column with the movie runtime as a number, since the original values are strings.

In [None]:
def runtime_number(table):
    """
    Transform the 'Runtime' column from str to int.
    The input should be in the form 'XXX min'.
    The output is in the form XXX (minutes).
    
    Example
    ----------
    Input.-  '190 min' | Type: str
    Output.- 190       | Type: int
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Copy of the input dataframe with an additional column containing the runtime in minutes.
    
    """
    #Work on a copy and create the new column.
    table = table.copy(deep=True)
    table['Runtime (min.)'] = None
    #Iterate through all rows; .at sets a single cell without chained indexing.
    for i in table.index:
        if table.at[i, 'Runtime'] == 'N/A':
            #Use 100 minutes as a proxy for an average movie runtime.
            table.at[i, 'Runtime (min.)'] = 100
        else:
            #Real runtime.
            table.at[i, 'Runtime (min.)'] = int(table.at[i, 'Runtime'].replace(' min', ''))
    return table
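A vectorized sketch of the runtime conversion, using `str.extract` to pull out the digits and the same 100-minute proxy for missing values. The sample runtimes are hypothetical:

```python
import pandas as pd

# Hypothetical 'Runtime' values; 'N/A' marks a missing runtime.
runtime = pd.Series(['190 min', 'N/A', '98 min'])

# Pull out the first run of digits (NaN when there is none), then apply the
# 100-minute proxy and convert to integers.
minutes = (
    runtime.str.extract(r'(\d+)', expand=False)
    .astype(float)
    .fillna(100)
    .astype(int)
)
print(minutes.tolist())
```

Unlike the loop, this treats any value without digits as missing, and it still relies on the 'XXX min' format from the docstring: a different layout (for example hours plus minutes) would be misread.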

### <span style='color:White'><span style='background :Black' > Function: </span></span>  imdbScore()

Creates a new column with the movie IMDb score as a number, since the original values are strings.

In [None]:
def imdbScore(table):
    """
    Transform the 'imdbRating' column from str to float.
    The input should be in the form 'X.X'.
    The output is in the form X.X.
    
    Example
    ----------
    Input.-  '5.8' | Type: str
    Output.- 5.8   | Type: float
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Copy of the input dataframe with an additional column containing the IMDb score.
    
    """    
    #Work on a copy and create the new column.
    table = table.copy(deep=True)
    table['IMDb Score'] = None
    #Iterate through all rows; .at sets a single cell without chained indexing.
    for i in table.index:
        if table.at[i, 'imdbRating'] == 'N/A':
            #Use 5 as a proxy for an average movie score.
            table.at[i, 'IMDb Score'] = 5
        else:
            #Real score.
            table.at[i, 'IMDb Score'] = float(table.at[i, 'imdbRating'])
    return table
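A vectorized sketch of the score conversion using `pd.to_numeric` with `errors='coerce'` and the same proxy score of 5. The sample ratings are hypothetical:

```python
import pandas as pd

# Hypothetical 'imdbRating' values; 'N/A' marks a missing score.
ratings = pd.Series(['5.8', 'N/A', '7.9'])

# errors='coerce' turns anything that is not a number into NaN,
# which is then replaced by the proxy score of 5.
scores = pd.to_numeric(ratings, errors='coerce').fillna(5.0)
print(scores.tolist())
```

This is slightly broader than the loop: any malformed value, not only 'N/A', receives the proxy score instead of raising an error.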

### <span style='color:White'><span style='background :Black' > Function: </span></span> dataTransformation()

Transforms the dataset by applying all the previous functions in sequence, generating the necessary columns in the process.

In [None]:
def dataTransformation(table):
    """
    Transform data by adding required columns.
    * Gross (MM USD)
    * Release Year
    * Release Month
    * Standarized Rating
    * Runtime (min.)
    * IMDb Score
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Dataframe with additional columns.
    
    """     
    table = table.copy(deep=True)
    #Apply all functions
    table = gross_number(table)
    table = date_strptime(table)
    table = std_rating(table)
    table = runtime_number(table)
    table = imdbScore(table)
    return table
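dataTransformation() applies each step in turn. As a sketch of an equivalent pattern, `DataFrame.pipe` chains functions that each take and return a dataframe. The two helpers below are hypothetical, simplified stand-ins for gross_number() and imdbScore(), not the notebook's own functions:

```python
import pandas as pd

def add_gross(table):
    # Hypothetical stand-in for gross_number(): returns a new dataframe with the extra column.
    gross = table['Lifetime Gross'].str[1:].str.replace(',', '', regex=False).astype(float)
    return table.assign(**{'Gross (MM USD)': (gross / 10**6).round(2)})

def add_score(table):
    # Hypothetical stand-in for imdbScore(), using 5 as the proxy for missing scores.
    score = pd.to_numeric(table['imdbRating'], errors='coerce').fillna(5.0)
    return table.assign(**{'IMDb Score': score})

raw = pd.DataFrame({'Lifetime Gross': ['$1,500,000', '$250,000,000'],
                    'imdbRating': ['N/A', '7.1']})

# Each step receives the previous step's output; raw itself is left unchanged.
result = raw.pipe(add_gross).pipe(add_score)
print(result)
```

Because `assign` returns a new dataframe, no explicit `copy(deep=True)` is needed inside each step.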

We can then check that the transformed dataframe contains the information we need to work with later.

In [None]:
#dataTransformation(top_movies).head()

In [None]:
#dataTransformation(top_movies).tail()

### <span style='color:White'><span style='background :Brown' > List Extraction </span></span>  

This part extracts information to check whether the same value is shared by multiple movies within a specific field (column).

### <span style='color:White'><span style='background :Black' > Function: </span></span>  criteria_list()

Takes a column whose cells may contain several comma-separated values and returns a list with each value as a separate item. Note that it does not eliminate duplicates, as this function is reused later on for counting.

In [None]:
def criteria_list(table, criteria):
    """
    Retrieve individual values for a specific criterion.
    
    Example
    ----------
    Input.-  ['Blue, White, Red']
    Output.- ['Blue', 'White', 'Red']
    
    Parameters
    ----------
    table: movie_list() dataframe output
    criteria: dataframe column ('Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country')
    
    Result
    ----------
    List of individual values for the given criterion (duplicates are kept).
    
    """     
    test_list = []
    #Define valid criteria.
    fields = ['Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country']
    #Verify that the criterion is valid.
    if criteria in fields:
        #Store individual values.
        for i in (table.index):
            test_list.extend(table[criteria][i].split(', '))
    else:
        test_list = 'Please enter valid criteria.'
    return test_list
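The same flattening can be sketched with pandas alone: `str.split` turns each cell into a list and `explode` gives each list element its own row. The sample genres are hypothetical:

```python
import pandas as pd

# Hypothetical 'Genre' cells holding comma-separated values.
genre = pd.Series(['Action, Adventure, Sci-Fi', 'Animation, Adventure'])

# split() turns each cell into a list; explode() gives each list element its own row.
items = genre.str.split(', ').explode().tolist()
print(items)
```

As with criteria_list(), duplicates such as 'Adventure' are kept.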

### <span style='color:White'><span style='background :Black' > Function: </span></span>  criteria_count()

We then not only remove duplicates but also count how many times each value appears. This will later let us see whether some trend or characteristic is featured in most of the highest-grossing films.

In [None]:
def criteria_count(table, criteria):
    """
    Count item appearances in a specific column.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    criteria: dataframe column ('Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country')
    
    Result
    ----------
    Dataframe containing each item and its number of appearances, sorted from most to least frequent.
    
    """    
    #Define valid criteria.
    fields = ['Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country']
    #Verify that the criterion is valid.
    if criteria in fields:
        #Count values; value_counts() also removes duplicates and sorts by frequency.
        test_list = pd.Series(criteria_list(table, criteria)).value_counts()
        #Name both columns explicitly so the layout does not depend on the pandas version.
        test_list = test_list.rename_axis(criteria).reset_index(name='Count')
    else:
        test_list = 'Please enter valid criteria.'
    return test_list
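A compact sketch that chains the splitting, the counting and the top-N cut used later by bar_plot() (25 items). The helper name `top_counts` and the sample data are hypothetical. Since pandas 2.0, `value_counts()` names its result 'count', so the sketch names both columns explicitly:

```python
import pandas as pd

def top_counts(column, name, n=25):
    """Split comma-separated cells, count each value and keep the n most frequent (hypothetical helper)."""
    counts = column.str.split(', ').explode().value_counts()
    # Name the index and the count column explicitly.
    return counts.rename_axis(name).reset_index(name='Count').head(n)

# Hypothetical 'Country' cells.
country = pd.Series(['United States', 'United States, United Kingdom',
                     'China', 'United Kingdom, United States'])
print(top_counts(country, 'Country', n=2))
```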

### <span style='color:White'><span style='background :Black' > Function: </span></span>  search_list()

Auxiliary function that creates the sorted list of unique values for a given criterion, plus an empty entry, used to search for specific strings in the dataset.

In [None]:
def search_list(table, criteria):
    """
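    List the unique items available in a specific column.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    criteria: dataframe column (e.g. 'Genre', 'Director', 'Actors')
    
    Result
    ----------
    Sorted list containing all categories available for the given criterion, plus an empty string.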
    
    """    
    test_list = []
    #Retrieve data individually.
    for i in (table.index):
        test_list.extend(table[criteria][i].split(', '))
    #Delete duplicates and add null string element.
    test_list = np.unique(test_list)
    test_list = list(test_list)
    test_list.append('')
    test_list = sorted(test_list)
    return test_list
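The same list can be sketched with a Python set, which removes duplicates without going through NumPy. The sample genres are hypothetical:

```python
import pandas as pd

# Hypothetical 'Genre' cells.
genre = pd.Series(['Comedy, Action', 'Action, Adventure'])

# Unique values plus an empty string; '' sorts before any non-empty string,
# so the "no filter" option ends up first in the list.
options = sorted(set(genre.str.split(', ').explode()) | {''})
print(options)
```

When ipywidgets' `interact()` receives a plain list, it builds a dropdown whose initial value is the first option, so placing '' first presumably means no filter is applied by default.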

### <span style='color:White'><span style='background :Brown' > Data Search </span></span>  

This part extracts the movies that contain specific information.

### <span style='color:White'><span style='background :Black' > Function: </span></span>  lcase_df()

Changes all strings in the dataframe to lowercase, except the 'Lifetime Gross' and 'imdbRating' columns.

In [None]:
def lcase_df(table):
    """
    Lowercase all data in dataframe.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Copy of the input dataframe with its text content in lowercase.
    
    """
    #Select the columns to lowercase ('Lifetime Gross' and 'imdbRating' are left as-is).
    table = table.copy(deep=True)
    fields = table.columns.drop(['Lifetime Gross', 'imdbRating'])
    #Change strings to lowercase.
    for field in fields:
        table[field] = table[field].str.lower()
    return table

### <span style='color:White'><span style='background :Black' > Function: </span></span>  string_search()

Subset dataframe to show only rows containing specific string input.

In [None]:
def string_search(table, string):
    """
    Show only rows that contain string.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    string: string to search in dataframe
    
    Result
    ----------
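    Subsetted dataframe (the search is case-insensitive).
    
    """
    #Lowercase the search string.
    string = str(string).lower()
    #Look for the string in a lowercased copy of the dataframe (lcase_df() already copies it).
    x = lcase_df(table)
    table = table.copy(deep=True)
    #regex=False treats the input literally, so characters such as '(' or '?' do not break the search.
    condition = x.apply(lambda row: row.astype(str).str.contains(string, regex=False).any(), axis=1)
    table = table[condition]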
    return table
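A self-contained sketch of the same row-wise, case-insensitive search. The helper name `rows_containing` and the sample rows are hypothetical. With the default `regex=True`, a search string such as '(' would raise a regular-expression error, which is why the sketch passes `regex=False`:

```python
import pandas as pd

def rows_containing(table, text):
    """Return the rows where any cell contains text, ignoring case (hypothetical helper)."""
    needle = str(text).lower()
    # For each row, lowercase every cell as a string and check for a literal substring match.
    mask = table.apply(
        lambda row: row.astype(str).str.lower().str.contains(needle, regex=False).any(),
        axis=1,
    )
    return table[mask]

# Hypothetical sample rows.
movies = pd.DataFrame({
    'Title': ['Avatar', 'Frozen II'],
    'Director': ['James Cameron', 'Chris Buck, Jennifer Lee'],
})
print(rows_containing(movies, 'CAMERON'))
print(rows_containing(movies, '('))  # no match, and no regex error
```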

### <span style='color:White'><span style='background :Black' > Function: </span></span>  multiple_search()

Search multiple fields and subset dataset according to selected criteria.

In [None]:
def multiple_search(table, min_gross, max_gross, min_year, max_year, min_month, max_month, std_rating, min_time, max_time, min_score, max_score, genre, director, writer, actor_1, actor_2, plot_keyword, language, country):
    """
    Show only rows that fit parameters.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    min_gross: Minimum gross income (MM USD)
    max_gross: Maximum gross income (MM USD)
    min_year: Minimum year of release
    max_year: Maximum year of release
    min_month: Minimum month of release
    max_month: Maximum month of release
    std_rating: Standardized rating
    min_time: Minimum running time (min.)
    max_time: Maximum running time (min.)
    min_score: Minimum IMDb Score
    max_score: Maximum IMDb Score
    genre: Genre it has to contain
    director: Director it has to contain
    writer: Writer it has to contain
    actor_1: Actor/Actress it has to contain
    actor_2: Actor/Actress it has to contain
    plot_keyword: Keyword to look for in Plot (case-insensitive)
    language: Language it has to contain
    country: Country it has to contain
    An empty string in any text filter matches every row.
    
    Result
    ----------
    Subsetted dataframe.
    
    """
    #Save column names to appear in output.
    table = table.copy(deep=True)
    columns = table.columns.drop(['Poster'])
    #Add the derived numeric columns used by the filters.
    table = dataTransformation(table)
    #Search conditions:
    #Text filters use regex=False so dropdown values such as 'Name (screenplay)' match literally,
    #and na=False so missing values count as non-matching instead of breaking the subset.
    #Gross (MM USD).-
    cond01 = table['Gross (MM USD)']>=min_gross
    cond02 = table['Gross (MM USD)']<=max_gross
    #Release.-
    cond03 = table['Release Year']>=min_year
    cond04 = table['Release Year']<=max_year
    cond05 = table['Release Month']>=min_month
    cond06 = table['Release Month']<=max_month
    #Standarized Rating.-
    cond07 = table['Standarized Rating'].str.contains(std_rating, regex=False, na=False)
    #Runtime.-
    cond08 = table['Runtime (min.)']>=min_time
    cond09 = table['Runtime (min.)']<=max_time
    #IMDb Score.-
    cond10 = table['IMDb Score']>=min_score
    cond11 = table['IMDb Score']<=max_score
    #Genre.-
    cond12 = table['Genre'].str.contains(genre, regex=False, na=False)
    #Director.-
    cond13 = table['Director'].str.contains(director, regex=False, na=False)
    #Writer.-
    cond14 = table['Writer'].str.contains(writer, regex=False, na=False)
    #Actors.-
    cond15 = table['Actors'].str.contains(actor_1, regex=False, na=False)
    cond16 = table['Actors'].str.contains(actor_2, regex=False, na=False)
    #Plot (case-insensitive).-
    plot_keyword = str(plot_keyword).lower()
    cond17 = table['Plot'].str.lower().str.contains(plot_keyword, regex=False, na=False)
    #Language.-
    cond18 = table['Language'].str.contains(language, regex=False, na=False)
    #Country.-
    cond19 = table['Country'].str.contains(country, regex=False, na=False)
    #Subset
    table=table.loc[(cond01)&(cond02)&(cond03)&(cond04)&(cond05)&(cond06)&(cond07)&(cond08)&(cond09)&(cond10)&(cond11)&(cond12)&(cond13)&(cond14)&(cond15)&(cond16)&(cond17)&(cond18)&(cond19),columns]
    return table
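The text filters rely on `str.contains`, whose default `regex=True` reads the dropdown value as a regular expression. A small sketch of why that matters when credits include parentheses, why the blank dropdown entry disables a filter, and what `na=False` does with missing values. The sample cells are hypothetical; the same points apply to recommendation_list(), which uses identical conditions:

```python
import pandas as pd

# Hypothetical 'Writer' cells with notes in parentheses.
writers = pd.Series(['Steve Kloves (screenplay), J.K. Rowling (novel)', 'Jennifer Lee'])
option = 'Steve Kloves (screenplay)'

# As a regular expression, '(screenplay)' is a group that matches 'screenplay' without
# the parentheses, so the cell the option came from does not match (pandas may also warn).
as_regex = writers.str.contains(option)
# As a literal substring, the dropdown value matches the cell it came from.
as_literal = writers.str.contains(option, regex=False)
# An empty string matches every non-missing cell: this is how '' disables a filter.
blank = writers.str.contains('', regex=False)

# na=False makes missing values count as non-matching instead of producing NaN in the mask.
ratings = pd.Series(['Restricted', None])
with_na = ratings.str.contains('', regex=False, na=False)

print(as_regex.tolist(), as_literal.tolist(), blank.tolist(), with_na.tolist())
```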

### <span style='color:White'><span style='background :Black' > Function: </span></span>  search_engine()

Creates an interactive search tool using the previous functions and ipywidgets.

In [None]:
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

In [None]:
def search_engine(dataframe):
    """
    Interactive search: show only the rows that fit the selected parameters.
    Select the blank entry in a dropdown list to leave that category unfiltered.
    The output may take some time to update, so please be patient.
    If no movie satisfies all the filters, nothing is shown :(
    
    Parameters
    ----------
    dataframe: movie_list() dataframe output
    
    Result
    ----------
    Interactive widgets for subsetting dataframe
    
    """
    #Get all data available.
    dataframe = dataframe.copy(deep=True)
    aux_table = dataTransformation(dataframe)
    #Set parameters.
    #Month (1-12) and score (0-10) ranges are fixed, so they are not computed from the data.
    gross1 = int(min(aux_table['Gross (MM USD)'])/10)*10
    gross2 = int(max(aux_table['Gross (MM USD)'])/10+1)*10
    year1 = min(aux_table['Release Year'])
    year2 = max(aux_table['Release Year'])
    time1 = min(aux_table['Runtime (min.)'])
    time2 = max(aux_table['Runtime (min.)'])
    std_rates = ['', 'General Audience', 'Parental Guidance Suggested', 'Parents Strongly Cautioned', 'Restricted', 'No Children 17 or Under', 'Not Rated']
    genres = search_list(aux_table,'Genre')
    directors = search_list(aux_table,'Director')
    writers = search_list(aux_table,'Writer')
    actors = search_list(aux_table,'Actors')
    languages = search_list(aux_table,'Language')
    countries = search_list(aux_table,'Country')
    #Interactive widgets.
    #Call multiple_search() function.
    output = interact(multiple_search, table=fixed(dataframe), min_gross=(gross1, gross2), max_gross=(gross1, gross2), min_year=(year1, year2), max_year=(year1, year2), min_month=(1, 12), max_month=(1, 12), std_rating=std_rates, min_time=(time1, time2), max_time=(time1, time2), min_score=(0, 10), max_score=(0, 10), genre=genres, director=directors, writer=writers, actor_1=actors, actor_2=actors, plot_keyword='', language=languages, country=countries)
    return output

### <span style='color:White'><span style='background :Brown' > Data Visualization </span></span>  

In this part we visualize the data through plots and charts.

### <span style='color:White'><span style='background :Black' > Function: </span></span>  path_to_image_html()

Converts a movie poster link into an HTML tag so the poster can be displayed.

In [None]:
from IPython.display import Image, HTML

In [None]:
def path_to_image_html(path):
    """
    Convert the given path/link of the image into an HTML <img> tag, 60 pixels wide.
    This is the standard tag used to render images on a webpage.
    """
    return '<img src="'+ path + '" width="60" >'
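A self-contained sketch of how such a function is used with `DataFrame.to_html`: `formatters` applies it to the 'Poster' column and `escape=False` keeps the tag as raw HTML. The helper `poster_tag`, the title and the URL are hypothetical placeholders:

```python
import pandas as pd

def poster_tag(url):
    # Same idea as path_to_image_html(): wrap the link in an <img> tag, 60 px wide.
    return '<img src="' + url + '" width="60" >'

# Hypothetical row with a placeholder poster URL.
movies = pd.DataFrame({'Title': ['Example Movie'],
                       'Poster': ['https://example.com/poster.jpg']})

# formatters applies poster_tag to the 'Poster' column; escape=False keeps the tag unescaped.
html = movies.to_html(escape=False, formatters={'Poster': poster_tag})
print(html)
```

Note that `escape=False` applies to every cell, so any HTML in other columns is rendered too; it is best reserved for trusted data.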

### <span style='color:White'><span style='background :Black' > Function: </span></span>  recommendation_list()

Gets movie recommendations (title, poster and IMDb rating) based on the selected parameters.

In [None]:
def recommendation_list(table, min_gross, max_gross, min_year, max_year, min_month, max_month, std_rating, min_time, max_time, min_score, max_score, genre, director, writer, actor_1, actor_2, plot_keyword, language, country):
    """
    Show recommendations that fit parameters.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    min_gross: Minimum gross income (MM USD)
    max_gross: Maximum gross income (MM USD)
    min_year: Minimum year of release
    max_year: Maximum year of release
    min_month: Minimum month of release
    max_month: Maximum month of release
    std_rating: Standardized rating
    min_time: Minimum running time (min.)
    max_time: Maximum running time (min.)
    min_score: Minimum IMDb Score
    max_score: Maximum IMDb Score
    genre: Genre it has to contain
    director: Director it has to contain
    writer: Writer it has to contain
    actor_1: Actor/Actress it has to contain
    actor_2: Actor/Actress it has to contain
    plot_keyword: Keyword to look for in Plot (case-insensitive)
    language: Language it has to contain
    country: Country it has to contain
    An empty string in any text filter matches every row.
    
    Result
    ----------
    HTML table with the title, poster and IMDb rating of the matching movies.
    
    """
    #Add the derived numeric columns used by the filters.
    table = table.copy(deep=True)
    table = dataTransformation(table)
    #Search conditions:
    #Text filters use regex=False so dropdown values such as 'Name (screenplay)' match literally,
    #and na=False so missing values count as non-matching instead of breaking the subset.
    #Gross (MM USD).-
    cond01 = table['Gross (MM USD)']>=min_gross
    cond02 = table['Gross (MM USD)']<=max_gross
    #Release.-
    cond03 = table['Release Year']>=min_year
    cond04 = table['Release Year']<=max_year
    cond05 = table['Release Month']>=min_month
    cond06 = table['Release Month']<=max_month
    #Standarized Rating.-
    cond07 = table['Standarized Rating'].str.contains(std_rating, regex=False, na=False)
    #Runtime.-
    cond08 = table['Runtime (min.)']>=min_time
    cond09 = table['Runtime (min.)']<=max_time
    #IMDb Score.-
    cond10 = table['IMDb Score']>=min_score
    cond11 = table['IMDb Score']<=max_score
    #Genre.-
    cond12 = table['Genre'].str.contains(genre, regex=False, na=False)
    #Director.-
    cond13 = table['Director'].str.contains(director, regex=False, na=False)
    #Writer.-
    cond14 = table['Writer'].str.contains(writer, regex=False, na=False)
    #Actors.-
    cond15 = table['Actors'].str.contains(actor_1, regex=False, na=False)
    cond16 = table['Actors'].str.contains(actor_2, regex=False, na=False)
    #Plot (case-insensitive).-
    plot_keyword = str(plot_keyword).lower()
    cond17 = table['Plot'].str.lower().str.contains(plot_keyword, regex=False, na=False)
    #Language.-
    cond18 = table['Language'].str.contains(language, regex=False, na=False)
    #Country.-
    cond19 = table['Country'].str.contains(country, regex=False, na=False)
    table=table.loc[(cond01)&(cond02)&(cond03)&(cond04)&(cond05)&(cond06)&(cond07)&(cond08)&(cond09)&(cond10)&(cond11)&(cond12)&(cond13)&(cond14)&(cond15)&(cond16)&(cond17)&(cond18)&(cond19), ['Title', 'Poster', 'imdbRating']]
    table = HTML(table.to_html(escape=False, formatters=dict(Poster=path_to_image_html)))
    return table

### <span style='color:White'><span style='background :Black' > Function: </span></span>  recommendation_engine()

Creates an interactive movie recommendation tool using recommendation_list() and widgets.

In [None]:
def recommendation_engine(dataframe):
    """
    Interactive recommendations: show the movies that fit the selected parameters.
    Select the blank entry in a dropdown list to leave that category unfiltered.
    The output may take some time to update, so please be patient.
    If no movie satisfies all the filters, nothing is shown :(
    
    Parameters
    ----------
    dataframe: movie_list() dataframe output
    
    Result
    ----------
    Interactive widgets for subsetting dataframe
    
    """
    #Get all data available.
    dataframe = dataframe.copy(deep=True)
    aux_table = dataTransformation(dataframe)
    #Set parameters.
    #Month (1-12) and score (0-10) ranges are fixed, so they are not computed from the data.
    gross1 = int(min(aux_table['Gross (MM USD)'])/10)*10
    gross2 = int(max(aux_table['Gross (MM USD)'])/10+1)*10
    year1 = min(aux_table['Release Year'])
    year2 = max(aux_table['Release Year'])
    time1 = min(aux_table['Runtime (min.)'])
    time2 = max(aux_table['Runtime (min.)'])
    std_rates = ['', 'General Audience', 'Parental Guidance Suggested', 'Parents Strongly Cautioned', 'Restricted', 'No Children 17 or Under', 'Not Rated']
    genres = search_list(aux_table,'Genre')
    directors = search_list(aux_table,'Director')
    writers = search_list(aux_table,'Writer')
    actors = search_list(aux_table,'Actors')
    languages = search_list(aux_table,'Language')
    countries = search_list(aux_table,'Country')
    #Interactive widgets.
    #Call recommendation_list() function.
    output = interact(recommendation_list, table=fixed(dataframe), min_gross=(gross1, gross2), max_gross=(gross1, gross2), min_year=(year1, year2), max_year=(year1, year2), min_month=(1, 12), max_month=(1, 12), std_rating=std_rates, min_time=(time1, time2), max_time=(time1, time2), min_score=(0, 10), max_score=(0, 10), genre=genres, director=directors, writer=writers, actor_1=actors, actor_2=actors, plot_keyword='', language=languages, country=countries)
    return output

### <span style='color:White'><span style='background :Black' > Function: </span></span>  bar_plot()

Generates a bar plot for a specific criterion. The total count may exceed the number of movies in the dataset, since some movie fields contain more than one item.

In [None]:
import matplotlib.pyplot as plt
import plotly.express as px

In [None]:
def bar_plot(table, criteria):
    """
    Bar plot based on a given criterion.
    It counts the appearances of each item under that criterion.
    The number of items shown is limited to the 25 most frequent.
    
    Criteria must be one of the following:
    
    ['Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country', 'Standarized Rating']
    
    Parameters
    ----------
    table: movie_list() dataframe output
    criteria: One of the following: 'Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country', 'Standarized Rating'
    
    Result
    ----------
    Bar plot
    
    """
    #Get relevant data.
    table = table.copy(deep=True)
    table = dataTransformation(table)
    #Count appearances (sorted from most to least frequent).
    if criteria == 'Standarized Rating':
        counts = table['Standarized Rating'].value_counts()
        labels, values = counts.index, counts.values
    else:
        #Limit number of items so only the 25 most frequent appear.
        counts = criteria_count(table, criteria)[:25]
        labels, values = counts[criteria], counts['Count']
    #Plot.
    fig, ax = plt.subplots(figsize =(30, 20))
    ax.barh(labels, values)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.xaxis.set_tick_params(pad = 5.0)
    ax.yaxis.set_tick_params(pad = 7.5)
    #Pass True positionally: newer Matplotlib versions renamed the 'b' keyword to 'visible'.
    ax.grid(True, color ='grey', linestyle ='-.', linewidth = 0.5, alpha = 0.5)
    #Show the most frequent item at the top.
    ax.invert_yaxis()
    #Annotate each bar with its count.
    for i in ax.patches:
        plt.text(i.get_width()+0.2, i.get_y()+0.5, str(round((i.get_width()), 2)), fontsize = 10, fontweight ='bold', color ='gray')
    ax.set_title(f'Top-Grossing Movies - {criteria}', loc ='center', fontsize = 15)
    return plt.show()

### <span style='color:White'><span style='background :Black' > Function: </span></span>  data_barplot()

Applies interactive widgets to display the bar plot for different criteria.

In [None]:
def data_barplot(table):
    """
    Bar plot based on a given criterion.
    It counts the appearances of each item under that criterion.
    The number of items shown is limited to the 25 most frequent.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Bar plot
    
    """
    table = table.copy(deep=True)
    #Criteria list:
    field = ['Genre', 'Director', 'Writer', 'Actors', 'Language', 'Country', 'Standarized Rating']
    #Apply widgets.
    output = interact(bar_plot, table=fixed(table), criteria=field)
    return output

### <span style='color:White'><span style='background :Black' > Function: </span></span> hist_plot()

Generate histogram based on specific criteria. 

In [None]:
import seaborn as sns

In [None]:
def hist_plot(table, genre, criteria):
    """
    Histogram based on certain criteria.
    Some items may repeat under more than one genre.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    genre: string describing one of the films genre
    criteria: One of the following ['Gross (MM USD)', 'Runtime (min.)', 'IMDb Score', 'Release Month', 'Release Year']
    
    Result
    ----------
    Histogram
    
    """
    #Get all data information.
    table = table.copy(deep=True)
    table = dataTransformation(table)
    #Subset according to genre.
    table = table[table['Genre'].str.contains(genre)]
    #Make plot (histplot replaces distplot, which is deprecated in recent seaborn releases).
    sns.set(rc={'figure.figsize':(15, 5)})
    plot = sns.histplot(x=table[criteria]).set(title=f'Top-Grossing Movies - Genre: {genre} - {criteria}')
    return plot

### <span style='color:White'><span style='background :Black' > Function: </span></span> data_histogram()

Applies interactive widgets to display the histogram for different genres and criteria.

In [None]:
def data_histogram(table):
    """
    Histogram based on certain criteria.
    Some items may repeat under more than one genre.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Histogram
    
    """
    table = table.copy(deep=True)
    #Criteria list:
    field = ['Gross (MM USD)', 'Runtime (min.)', 'IMDb Score', 'Release Month', 'Release Year']
    #Genres list:
    genres = search_list(table, 'Genre')
    #Apply widgets.
    output = interact(hist_plot, table=fixed(table), genre=genres, criteria=field)
    return output    

### <span style='color:White'><span style='background :Black' > Function: </span></span> geo_distribution()

Shows on a world map how many movies are associated with each country. The total may exceed the number of movies in the dataset, since some films list more than one country.

In [None]:
def geo_distribution(table):
    """
    Map with a counter based on the number of appearances in the dataset.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Result
    ----------
    Map with counter
    
    """
    table = table.copy(deep=True)
    #Get list of countries and count.
    movie_country_list = criteria_count(table, 'Country')
    movie_country_list.columns = ['country', 'Count']
    #Get a global country list with ISO codes (Gapminder data, 2007 rows only).
    gapminder = px.data.gapminder().query("year==2007")
    #Generate map plot.
    df=pd.merge(gapminder, movie_country_list, how='left', on='country')
    fig = px.choropleth(df.sort_values(by=['Count']), locations="iso_alpha",
                        color="Count", 
                        hover_name="country")
    return fig.show()
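Because the map merges on country names, a country spelled differently in the two sources silently disappears from the map. A sketch of one way to check for this using `indicator=True` in `DataFrame.merge`, with small hypothetical tables standing in for the Gapminder data and the movie counts:

```python
import pandas as pd

# Hypothetical reference table with ISO codes and hypothetical movie counts per country.
world = pd.DataFrame({'country': ['United States', 'France', 'Japan'],
                      'iso_alpha': ['USA', 'FRA', 'JPN']})
counts = pd.DataFrame({'country': ['United States', 'Japan', 'West Germany'],
                       'Count': [5, 2, 1]})

# Left merge on the reference table: countries without movies get NaN.
merged = world.merge(counts, how='left', on='country')

# indicator=True adds a '_merge' column; 'left_only' rows are count entries whose
# country name has no match in the reference table (they would not appear on the map).
check = counts.merge(world[['country']], how='left', on='country', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'country'].tolist()
print(merged)
print(unmatched)
```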

### <span style='color:White'><span style='background :Black' > Function: </span></span> boxplot()

Generate boxplot based on specific criteria. 

In [None]:
def boxplot(table, genre, criteriaX, criteriaY):
    """
    Boxplot based on certain criteria.
    Some items may repeat under more than one genre.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    genre: string describing one of the films genre
    criteriaX: One of the following ['Standarized Rating', 'Release Month', 'Release Year']
    criteriaY: One of the following ['Gross (MM USD)', 'Runtime (min.)', 'IMDb Score']
    
    Result
    ----------
    Boxplot based on selected criteria
    
    """
    #Get all data information.
    table = table.copy(deep=True)
    table = dataTransformation(table)
    #Subset according to genre.
    table = table[table['Genre'].str.contains(genre)]
    #Make plot.
    sns.set(rc={'figure.figsize':(30, 12)})
    plot = sns.boxplot(x=table[criteriaX], y=table[criteriaY]).set(title=f'Top-Grossing Movies - Genre: {genre} - {criteriaX}||{criteriaY}')
    return plot

### <span style='color:White'><span style='background :Black' > Function: </span></span> data_boxplot()

Applies interactive widgets to get boxplot based on specific criteria. 

In [None]:
def data_boxplot(table):
    """
    Boxplot based on certain criteria, selected through interactive widgets.
    Some items may repeat under more than one genre.
    
    Parameters
    ----------
    table: movie_list() dataframe output
    
    Widgets
    ----------
    genre: one of the films' genres
    criteriaX: One of the following ['Standarized Rating', 'Release Month', 'Release Year']
    criteriaY: One of the following ['Gross (MM USD)', 'Runtime (min.)', 'IMDb Score']
    
    Result
    ----------
    Boxplot based on selected criteria
    
    """
    table = table.copy(deep=True)
    #Criteria list:
    fieldX = ['Standarized Rating', 'Release Month', 'Release Year']
    fieldY = ['Gross (MM USD)', 'Runtime (min.)', 'IMDb Score']
    #Genres list:
    genres = search_list(table, 'Genre')
    #Apply widgets.
    output = interact(boxplot, table=fixed(table), genre=genres, criteriaX=fieldX, criteriaY=fieldY)
    return output    