# Final Project

# Title : Exploring Disney Movies Total Gross

# Introduction

Disney movies have a magical way of capturing our hearts and sparking our imagination. In this project, we're on a quest to answer a simple yet intriguing question: 

**What makes Disney movies successful at the box office, and how has this changed over time?**

We're diving into the world of Disney films to uncover what makes them click with audiences and become big hits. We're curious because Disney movies aren't just movies: they're a big part of our lives. They entertain us, they teach us, and they've become a big deal worldwide.

Our goal? To find patterns in this data and figure out why some Disney movies hit the jackpot while others didn't shine as brightly. By doing this, we hope to understand why people love these movies and how Disney's storytelling has evolved through the years.

# Dataset Description

The dataset includes Disney movies released through 2016.

# Available Features:

- **Movie Title**
- **Release Date**
- **Genre**
- **MPAA Rating**
- **Total Gross**
- **Inflation Adjusted Gross**

# Methods and Results

# Import Libraries

In [1]:
import pandas as pd
import numpy as np
from hashlib import sha1
import altair as alt
import inspect

# Read the Data

In [2]:
disney_movies_df = pd.read_csv('data/disney_movies_total_gross.csv')
disney_movies_df.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross
0,Snow White and the Seven Dwarfs,"Dec 21, 1937",Musical,G,"$184,925,485","$5,228,953,251"
1,Pinocchio,"Feb 9, 1940",Adventure,G,"$84,300,000","$2,188,229,052"
2,Fantasia,"Nov 13, 1940",Musical,G,"$83,320,000","$2,187,090,808"
3,Song of the South,"Nov 12, 1946",Adventure,G,"$65,000,000","$1,078,510,579"
4,Cinderella,"Feb 15, 1950",Drama,G,"$85,000,000","$920,608,730"


# Explore the Data

In [3]:
# Check the shape of dataframe
disney_movies_df.shape

(579, 6)

There are 579 movies and 6 features included in this dataset.

In [4]:
# Check the summary of dataframe
# The .info() method in pandas provides a concise summary of a DataFrame
# It includes information about the data types, the number of non-null values, and memory usage

disney_movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 579 entries, 0 to 578
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   movie_title               579 non-null    object
 1   release_date              579 non-null    object
 2   genre                     562 non-null    object
 3   MPAA_rating               523 non-null    object
 4   total_gross               579 non-null    object
 5   inflation_adjusted_gross  579 non-null    object
dtypes: object(6)
memory usage: 27.3+ KB


In [5]:
# Check if any column contains null values
disney_movies_df.isnull().sum()

movie_title                  0
release_date                 0
genre                       17
MPAA_rating                 56
total_gross                  0
inflation_adjusted_gross     0
dtype: int64

# Data Wrangling

In [6]:
# Check what genre of movies we have in this dataframe
disney_movies_df['genre'].unique()

array(['Musical', 'Adventure', 'Drama', 'Comedy', nan, 'Action', 'Horror',
       'Romantic Comedy', 'Thriller/Suspense', 'Western', 'Black Comedy',
       'Documentary', 'Concert/Performance'], dtype=object)

In [7]:
# Fill empty values with "Other" in genre column
disney_movies_df['genre'] = disney_movies_df['genre'].fillna('Other')

In [8]:
# Check what MPAA_rating of movies we have in this dataframe
disney_movies_df['MPAA_rating'].unique()

array(['G', nan, 'Not Rated', 'PG', 'R', 'PG-13'], dtype=object)

In [9]:
# Fill empty values with "Not Rated" in MPAA_rating column
disney_movies_df['MPAA_rating'] = disney_movies_df['MPAA_rating'].fillna('Not Rated')

In [10]:
disney_movies_df.isnull().sum()

movie_title                 0
release_date                0
genre                       0
MPAA_rating                 0
total_gross                 0
inflation_adjusted_gross    0
dtype: int64

**Awesome! There are no null values left in the dataset.**

In [11]:
# Check the oldest disney movie in the dataset
# We first need to convert 'release_date' column into 'Datetime64' dtype
disney_movies_df['release_date'] = pd.to_datetime(disney_movies_df['release_date'])

# Sort the dataframe by release date and get the movie title from the first row 
oldest_movie = disney_movies_df.sort_values(by='release_date')['movie_title'].head(1)
oldest_movie

0    Snow White and the Seven Dwarfs
Name: movie_title, dtype: object

In [12]:
# Check the most recent released disney movie in the dataset

most_recent_movie = disney_movies_df.sort_values(by='release_date', ascending = False)['movie_title'].head(1)
most_recent_movie

578    Rogue One: A Star Wars Story
Name: movie_title, dtype: object

In [13]:
print(f"The oldest movie was released on {disney_movies_df['release_date'].min().date()}.")

The oldest movie was released on 1937-12-21.


In [14]:
print(f"The newest movie was released on {disney_movies_df['release_date'].max().date()}.")

The newest movie was released on 2016-12-16.


In [15]:
# get the release year of movies from release date and add it as a new column in dataframe
disney_movies_df['release_year'] = disney_movies_df['release_date'].dt.year

In [16]:
#check the modified dataframe
disney_movies_df.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
0,Snow White and the Seven Dwarfs,1937-12-21,Musical,G,"$184,925,485","$5,228,953,251",1937
1,Pinocchio,1940-02-09,Adventure,G,"$84,300,000","$2,188,229,052",1940
2,Fantasia,1940-11-13,Musical,G,"$83,320,000","$2,187,090,808",1940
3,Song of the South,1946-11-12,Adventure,G,"$65,000,000","$1,078,510,579",1946
4,Cinderella,1950-02-15,Drama,G,"$85,000,000","$920,608,730",1950


# Let's Analyze the Categorical Variables

**Analyzing the variable Genre:**

In [17]:
genre_counts = disney_movies_df['genre'].value_counts().reset_index()
genre_counts.columns = ['Genre', 'Number of Movies']

# Create Altair bar chart
genre_chart = (
    alt.Chart(genre_counts, width=500, height=300)
    .mark_bar()
    .encode(
        x=alt.X("Genre:N", title = "Genre", sort = '-y'),
        y=alt.Y("Number of Movies:Q", title="Number of Movies"),
        tooltip=['Genre:N', 'Number of Movies:Q']
    )
    .properties(title='Figure 1: Number of Disney Movies by Genre')
)

# Display the chart
genre_chart

Here we see that Comedy is the most common genre in this dataframe (182 movies).

**Analyzing the Motion Picture Association film rating:**

In [18]:
rating_counts = disney_movies_df['MPAA_rating'].value_counts().reset_index()
rating_counts.columns = ['MPAA rating', 'Number of Movies']

# Create Altair bar chart
rating_chart = (
    alt.Chart(rating_counts, width=500, height=300)
    .mark_bar()
    .encode(
        x=alt.X("MPAA rating:N", title = "MPAA rating", sort = '-y'),
        y=alt.Y("Number of Movies:Q", title="Number of Movies"),
        tooltip=['MPAA rating:N', 'Number of Movies:Q']
    )
    .properties(title='Figure 2: Number of Disney Movies by MPAA rating')
)

# Display the chart
rating_chart

Here we can see that "PG" is the most common MPAA rating among the Disney movies.

# Analyzing the numeric variables:

**Analyzing the relationship between genre vs. inflation adjusted gross:**

In [19]:
# Remove dollar sign and commas from inflation_adjusted_gross column and convert to numeric
disney_movies_df['inflation_adjusted_gross'] = disney_movies_df['inflation_adjusted_gross'].replace(r'[\$,]', '', regex=True).astype(float)

# Create Altair scatter plot
scatter_genre_inflation_adjusted_gross = (
    alt.Chart(disney_movies_df, width=500, height=300)
    .mark_circle(color='slateblue', size=80, opacity=0.7)
    .encode(
        x=alt.X('genre:N', title='Genre'),
        y=alt.Y('inflation_adjusted_gross:Q', title='Inflation Adjusted Gross'),
        tooltip=['genre:N', 'inflation_adjusted_gross:Q']
    )
    .properties(title="Figure 3: Relationship between Genre and Inflation Adjusted Gross")
)

# Display the scatter plot
scatter_genre_inflation_adjusted_gross

Looking at the relationship between genre and gross, the Adventure genre contains several of the highest-grossing movies, although the single highest point belongs to a Musical ("Snow White and the Seven Dwarfs", about $5.2 billion).
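
The currency clean-up used in the cell above could also be wrapped in a small helper so the same conversion can be applied to both gross columns. This is only a minimal sketch: the helper name `currency_to_float` is an assumption for illustration and is not part of the project code, and the toy values are copied from the first rows of the dataset.

```python
import pandas as pd


def currency_to_float(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' from strings such as '$184,925,485' and return floats.

    Hypothetical helper for illustration; not part of the project scripts.
    """
    cleaned = series.astype(str).str.replace(r"[\$,]", "", regex=True)
    return cleaned.astype(float)


# Toy values in the same format as the raw total_gross column
toy = pd.Series(["$184,925,485", "$84,300,000", "$0"])
print(currency_to_float(toy).tolist())
```

Because the helper casts to string first, it can be called again on a column that is already numeric without raising an error.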

# Chaining Operations

**Create a chained operation to see the total inflation adjusted gross income by genre**

In [20]:
genre_gross_income = (
    disney_movies_df
    .assign(inflation_adjusted_gross_numeric=disney_movies_df['inflation_adjusted_gross'].replace(r'[\$,]', '', regex=True).astype(float))  # Make sure inflation_adjusted_gross is numeric
    .sort_values(by='release_year')  # Sort DataFrame by release_year
    .groupby('genre')['inflation_adjusted_gross_numeric'].sum()  # Group by genre and sum inflation_adjusted_gross_numeric
    .reset_index()  # Reset index to convert the result back to DataFrame
)

# Format the numeric values as currency
genre_gross_income['inflation_adjusted_gross_numeric'] = genre_gross_income['inflation_adjusted_gross_numeric'].map('${:,.2f}'.format)

genre_gross_income


Unnamed: 0,genre,inflation_adjusted_gross_numeric
0,Action,"$5,498,936,786.00"
1,Adventure,"$24,561,266,158.00"
2,Black Comedy,"$156,730,475.00"
3,Comedy,"$15,409,526,913.00"
4,Concert/Performance,"$114,821,678.00"
5,Documentary,"$203,488,418.00"
6,Drama,"$8,195,804,484.00"
7,Horror,"$140,483,092.00"
8,Musical,"$9,657,565,776.00"
9,Other,"$367,603,384.00"


Here we can see that Adventure movies have the highest total inflation adjusted gross, about $24.6 billion.
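
To read off the top genre without scanning the table by eye, the per-genre totals can be sorted before formatting them as currency. The sketch below uses a five-row toy subset (the first rows of the dataset, already converted to numbers), so its ranking is not the ranking of the full dataset; the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy subset: the first five rows of the dataset, already numeric
toy = pd.DataFrame(
    {
        "genre": ["Musical", "Adventure", "Musical", "Adventure", "Drama"],
        "inflation_adjusted_gross": [
            5228953251.0,
            2188229052.0,
            2187090808.0,
            1078510579.0,
            920608730.0,
        ],
    }
)

# Sum per genre, then sort so the largest total comes first
totals = (
    toy.groupby("genre")["inflation_adjusted_gross"]
    .sum()
    .sort_values(ascending=False)
)
top_genre = totals.idxmax()  # label of the genre with the largest total

print(totals)
print(top_genre)
```

Sorting while the values are still numeric matters: once they are formatted as strings such as "$24,561,266,158.00", sorting would be alphabetical rather than by amount.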

# Black Formatting

**Run black on our sampling.py.**

In [21]:
!black sampling.py

All done! ✨ 🍰 ✨
1 file left unchanged.


In [22]:
from sampling import sample_dataframe


sampled_df = sample_dataframe(disney_movies_df, 'genre', N=2)
# Format the numeric values as currency
sampled_df['inflation_adjusted_gross'] = sampled_df['inflation_adjusted_gross'].map('${:,.2f}'.format)
sampled_df.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
177,3 Ninjas Kick Back,1994-05-06,Action,PG,"$11,744,960","$24,267,154.00",1994
29,Condorman,1981-08-07,Action,Not Rated,$0,$0.00,1981
212,Operation Dumbo Drop,1995-07-28,Adventure,PG,"$24,670,346","$47,809,421.00",1995
41,The Black Cauldron,1985-07-24,Adventure,Not Rated,"$21,288,692","$50,553,142.00",1985
297,Rushmore,1998-12-11,Black Comedy,R,"$17,105,219","$28,392,518.00",1998


This samples 2 rows from each genre in our disney_movies_df DataFrame using the sample_dataframe function defined in the sampling.py script.
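
The sampling.py script is not reproduced in this notebook. As a rough idea of how per-group sampling can be done in pandas, here is a minimal sketch. The function name `sample_per_group`, its parameters, and the toy rows are assumptions for illustration and may differ from the actual `sample_dataframe` implementation.

```python
import pandas as pd


def sample_per_group(df: pd.DataFrame, column: str, n: int = 2, seed: int = 0) -> pd.DataFrame:
    """Return up to `n` randomly chosen rows for each distinct value in `column`.

    Hypothetical stand-in; not the project's sample_dataframe function.
    """
    parts = [
        # min() guards against groups that have fewer than n rows
        group.sample(n=min(len(group), n), random_state=seed)
        for _, group in df.groupby(column)
    ]
    return pd.concat(parts)


# Toy data: three Action rows, one Drama row, one Comedy row
toy = pd.DataFrame(
    {
        "movie_title": ["A", "B", "C", "D", "E"],
        "genre": ["Action", "Action", "Action", "Drama", "Comedy"],
    }
)
sampled = sample_per_group(toy, "genre", n=2)
print(sampled["genre"].value_counts().sort_index())
```

Fixing `random_state` makes the sample reproducible, which is convenient when the sampled rows are shown in a report.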

**Check distribution of gross income of disney movies**

In [23]:

# Remove dollar sign and commas from total_gross column and convert to numeric
disney_movies_df['total_gross'] = disney_movies_df['total_gross'].replace(r'[\$,]', '', regex=True).astype(float)

total_gross_histogram = alt.Chart(disney_movies_df).mark_bar().encode(
    alt.X("total_gross:Q", bin=alt.Bin(maxbins=30), title="Total Gross"),
    alt.Y("count()", title="Frequency"),
    tooltip=["count()"]
).properties(
    width=600,
    height=400,
    title="Figure 4: Distribution of Total Gross"
)

total_gross_histogram

In [24]:
inflation_adjusted_gross_histogram = alt.Chart(disney_movies_df).mark_bar().encode(
    alt.X("inflation_adjusted_gross:Q", bin=alt.Bin(maxbins=30), title="Inflation Adjusted Gross"),
    alt.Y("count()", title="Frequency"),
    tooltip=["count()"]
).properties(
    width=600,
    height=400,
    title="Figure 5: Distribution of Inflation Adjusted Gross"
)

inflation_adjusted_gross_histogram

From the histograms, we see that there are some movies whose gross income is 0 dollars, which is unusual.

In [25]:
#Check movies where total_gross or inflation_adjusted_gross is $0
disney_movies_df[(disney_movies_df['total_gross'] == 0) | (disney_movies_df['inflation_adjusted_gross'] == 0)]

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
20,The Many Adventures of Winnie the Pooh,1977-03-11,Other,Not Rated,0.0,0.0,1977
27,Amy,1981-03-20,Drama,Not Rated,0.0,0.0,1981
29,Condorman,1981-08-07,Action,Not Rated,0.0,0.0,1981
355,Frank McKlusky C.I.,2002-01-01,Other,Not Rated,0.0,0.0,2002


Here we have 4 movies whose gross income is 0. To fix these records, let's replace the zero values with the median of each variable.

In [26]:
# Calculate the median values for each column
median_total_gross = disney_movies_df['total_gross'].median()
median_inflation_adjusted_gross = disney_movies_df['inflation_adjusted_gross'].median()

# Use .loc to replace the records with 0 with median values
disney_movies_df.loc[(disney_movies_df['total_gross'] == 0), 'total_gross'] = median_total_gross
disney_movies_df.loc[(disney_movies_df['inflation_adjusted_gross'] == 0), 'inflation_adjusted_gross'] = median_inflation_adjusted_gross

Now check again to see if there are any movies with 0 gross income.

In [27]:
disney_movies_df[(disney_movies_df['total_gross'] == 0) | (disney_movies_df['inflation_adjusted_gross'] == 0)]

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year


Done! Our dataframe is clean now.
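
In the imputation step above, each median is computed over all rows, including the zeros that are being replaced. A possible variation, shown here on toy values and not the method used in this notebook, is to treat the zeros as missing first so they do not influence the median. The column name matches the dataset; everything else is an assumption for illustration.

```python
import pandas as pd

# Toy column with two placeholder zeros
toy = pd.DataFrame({"total_gross": [0.0, 10.0, 20.0, 30.0, 0.0]})

# Turn zeros into NaN so they are ignored when computing the median
as_missing = toy["total_gross"].mask(toy["total_gross"] == 0)
median_nonzero = as_missing.median()  # NaN values are skipped by default

# Fill the former zeros with the median of the non-zero values
toy["total_gross"] = as_missing.fillna(median_nonzero)
print(toy["total_gross"].tolist())
```

With only 4 zero rows out of 579, the two approaches should give nearly the same median on the real data, so this is a matter of preference rather than a correction.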

# Check top 10 movies based on the adjusted gross revenue

We will check which are the 10 Disney movies that have earned the most at the box office. We can do this by sorting movies by their inflation-adjusted gross.

In [28]:
top_10_movies = disney_movies_df.sort_values(by='inflation_adjusted_gross', ascending=False).head(10)
# Format the numeric values as currency
top_10_movies['inflation_adjusted_gross'] = top_10_movies['inflation_adjusted_gross'].map('${:,.2f}'.format)
top_10_movies

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
0,Snow White and the Seven Dwarfs,1937-12-21,Musical,G,184925485.0,"$5,228,953,251.00",1937
1,Pinocchio,1940-02-09,Adventure,G,84300000.0,"$2,188,229,052.00",1940
2,Fantasia,1940-11-13,Musical,G,83320000.0,"$2,187,090,808.00",1940
8,101 Dalmatians,1961-01-25,Comedy,G,153000000.0,"$1,362,870,985.00",1961
6,Lady and the Tramp,1955-06-22,Drama,G,93600000.0,"$1,236,035,515.00",1955
3,Song of the South,1946-11-12,Adventure,G,65000000.0,"$1,078,510,579.00",1946
564,Star Wars Ep. VII: The Force Awakens,2015-12-18,Adventure,PG-13,936662225.0,"$936,662,225.00",2015
4,Cinderella,1950-02-15,Drama,G,85000000.0,"$920,608,730.00",1950
13,The Jungle Book,1967-10-18,Musical,Not Rated,141843000.0,"$789,612,346.00",1967
179,The Lion King,1994-06-15,Adventure,G,422780140.0,"$761,640,898.00",1994


It is interesting that the oldest movie in the dataset, the musical "Snow White and the Seven Dwarfs", generated the highest inflation adjusted gross. We also see that 9 of the top 10 movies were released before 2000. The only movie released after 2000 that made the top 10 is "Star Wars Ep. VII: The Force Awakens", released in 2015.

# Python Script with Function for Data Wrangling

I created a Python script called top_grossing_movies_by_genre.py that returns the top N highest-grossing movies for any genre.
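
The script itself is not reproduced in this notebook. As a rough idea of what such a function could look like, here is a minimal sketch. The name `top_n_by_genre` and its body are assumptions for illustration and may differ from the actual script; the toy rows are taken from the dataset.

```python
import pandas as pd


def top_n_by_genre(df: pd.DataFrame, genre: str, top_n: int = 5) -> pd.DataFrame:
    """Return the `top_n` rows of `genre` with the largest inflation adjusted gross.

    Hypothetical stand-in; not the project's top_grossing_movies_by_genre function.
    """
    in_genre = df[df["genre"] == genre]
    return in_genre.nlargest(top_n, "inflation_adjusted_gross")


# Toy rows copied from the dataset, with gross values already numeric
toy = pd.DataFrame(
    {
        "movie_title": ["Pinocchio", "Song of the South", "Cinderella", "Lady and the Tramp"],
        "genre": ["Adventure", "Adventure", "Drama", "Drama"],
        "inflation_adjusted_gross": [2188229052.0, 1078510579.0, 920608730.0, 1236035515.0],
    }
)
print(top_n_by_genre(toy, "Drama", top_n=1)["movie_title"].tolist())
```

The sketch assumes the gross column is already numeric; on the raw "$..." strings `nlargest` would fail, so the conversion has to happen first.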

In [29]:
#Call the function from the python script
from top_grossing_movies_by_genre import top_grossing_movies_by_genre

# Check top 4 adventure movies
top_N_adventure_movies = top_grossing_movies_by_genre(disney_movies_df, 'Adventure', top_n=4)
top_N_adventure_movies['inflation_adjusted_gross'] = top_N_adventure_movies['inflation_adjusted_gross'].map('${:,.2f}'.format)

top_N_adventure_movies

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
1,Pinocchio,1940-02-09,Adventure,G,84300000.0,"$2,188,229,052.00",1940
3,Song of the South,1946-11-12,Adventure,G,65000000.0,"$1,078,510,579.00",1946
564,Star Wars Ep. VII: The Force Awakens,2015-12-18,Adventure,PG-13,936662225.0,"$936,662,225.00",2015
179,The Lion King,1994-06-15,Adventure,G,422780140.0,"$761,640,898.00",1994


In [30]:
# top 5 Drama genre disney movies
top_N_drama_movies = top_grossing_movies_by_genre(disney_movies_df, 'Drama', top_n=5)
top_N_drama_movies['inflation_adjusted_gross'] = top_N_drama_movies['inflation_adjusted_gross'].map('${:,.2f}'.format)
top_N_drama_movies

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
6,Lady and the Tramp,1955-06-22,Drama,G,93600000.0,"$1,236,035,515.00",1955
4,Cinderella,1950-02-15,Drama,G,85000000.0,"$920,608,730.00",1950
77,Dead Poets Society,1989-06-02,Drama,PG,95860116.0,"$202,531,517.00",1989
556,Cinderella,2015-03-13,Drama,PG,201151353.0,"$201,151,353.00",2015
243,Phenomenon,1996-07-05,Drama,PG,104636382.0,"$199,559,799.00",1996


In [31]:
!black top_grossing_movies_by_genre.py

All done! ✨ 🍰 ✨
1 file left unchanged.


In [32]:
!pytest unit_test_grossing_genre.py

platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/jupyter/prog-python-data-science-students/release/final_project
plugins: anyio-3.2.1, dash-1.20.0
collected 2 items

unit_test_grossing_genre.py ..                                           [100%]



**Both test cases passed with our function.**

In [33]:
!black unit_test_grossing_genre.py

All done! ✨ 🍰 ✨
1 file left unchanged.


# Monitor Trend of Disney Movies

**Check the number of movies released per year**

In [34]:
movie_counts = disney_movies_df['release_year'].value_counts().reset_index()
movie_counts.columns = ['release_year', 'Number of Movies']

# Create Altair line chart
movie_counts_chart = (
    alt.Chart(movie_counts, width=700, height=300)
    .mark_line()
    .encode(
        x=alt.X("release_year:N", title = "Release Year"),
        y=alt.Y("Number of Movies:Q", title="Number of Movies"),
        tooltip=['release_year:N', 'Number of Movies:Q']
    )
    .properties(title='Figure 6: Number of Disney Movies by Release Year')
)

# Display the chart
movie_counts_chart

From this line chart, we can see that the number of Disney releases increased sharply after 1990 and peaked in 1995 with a record 32 movies. It then declined gradually, and between 2010 and 2016 Disney released about 10 to 15 movies per year.
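
The peak year can also be read directly from the counts instead of from the chart. A minimal sketch on toy years (the values below are made up for illustration and are not the dataset's counts):

```python
import pandas as pd

# Toy release years; in the notebook this would be disney_movies_df['release_year']
toy_years = pd.Series([1994, 1995, 1995, 1995, 1996, 1996], name="release_year")

# value_counts() orders by frequency, so sort by year for a chronological view
per_year = toy_years.value_counts().sort_index()
peak_year = per_year.idxmax()  # year with the most releases

print(per_year.to_dict())
print(peak_year, per_year.max())
```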

**How does that relate to gross income?**

In [35]:
scatter_plot_total_gross_release_year = (
    alt.Chart(disney_movies_df, width=700, height=300)
    .mark_circle()
    .encode(
        x=alt.X("release_year:N", title="Release Year"),
        y=alt.Y("sum(total_gross):Q", title="Total Gross"),
        tooltip=['release_year:N', 'sum(total_gross):Q']
    )
    .properties(title='Figure 8: Total Gross by Release Year')
)


# Display the chart
scatter_plot_total_gross_release_year

Analyzing the graph, we can see that total gross starts to climb in the early 1990s, the same pattern we noticed for the number of movies in the previous chart. The highest-grossing decade is the 2010s, with the record yearly gross income in 2013.
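
A per-decade comparison like the one described here can be made explicit by deriving a decade column from the release year. The sketch below uses four rows taken from the dataset (The Lion King, Rushmore, Star Wars Ep. VII, and the 2015 Cinderella), so its totals are not the full-dataset totals; the `decade` column name is an assumption for illustration.

```python
import pandas as pd

# Four rows copied from the dataset, total_gross already numeric
toy = pd.DataFrame(
    {
        "release_year": [1994, 1998, 2015, 2015],
        "total_gross": [422780140.0, 17105219.0, 936662225.0, 201151353.0],
    }
)

# Integer-divide by 10 and multiply back to get the decade, e.g. 1994 -> 1990
toy["decade"] = (toy["release_year"] // 10) * 10

by_decade = toy.groupby("decade")["total_gross"].sum()
best_decade = by_decade.idxmax()

print(by_decade.to_dict())
print(best_decade)
```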

In [36]:
# Create a pivot table
genre_yearly = disney_movies_df.pivot_table(index=['genre', 'release_year'], values=['total_gross', 'inflation_adjusted_gross'], aggfunc='mean', fill_value=0).reset_index()

# Inspect genre_yearly
print(genre_yearly.head(10))

    genre  release_year  inflation_adjusted_gross  total_gross
0  Action          1981                55159783.0   30702446.0
1  Action          1982                77184895.0   26918576.0
2  Action          1988                36053517.0   17577696.0
3  Action          1990               118358772.0   59249588.5
4  Action          1991                57918572.5   28924936.5
5  Action          1992                58965304.0   29028000.0
6  Action          1993                44682157.0   21943553.5
7  Action          1994                39545796.0   19180582.0
8  Action          1995               122162426.5   63037553.5
9  Action          1996               257755262.5  135281096.0


In [37]:
genre_yearly['genre'] = genre_yearly['genre'].astype(str)

gross_genre_chart = (
    alt.Chart(genre_yearly, width=500, height=300)
    .mark_line()
    .encode(
        x=alt.X("release_year:N", title="Release Year"),
        y=alt.Y("inflation_adjusted_gross:Q", title="Inflation Adjusted Gross"),
        color='genre:N',
        tooltip=['release_year:N', 'inflation_adjusted_gross:Q']
    )
    .properties(title='Figure 9: Inflation Adjusted Gross Over Time by Genre')
)

# Display the chart
gross_genre_chart

The line plot illustrates the changing popularity of genres over time for Disney movies. Notably, it reveals that certain genres have experienced more rapid growth in popularity than others. In particular, the Action and Adventure genres stand out as the fastest-growing genres in terms of inflation-adjusted gross. This suggests a strong audience preference for Action and Adventure Disney movies, making them prominent contributors to the overall success and box office revenue growth over the years.

# Discussion:

This exploration into Disney movies provided valuable insights into their box office performance. The initial analysis of categorical variables revealed the prevalence of Comedy and a dominant "PG" rating, aligning with Disney's well-established family-friendly and entertaining image.

Transitioning to numeric variables, Adventure movies emerged as the strongest contributors to gross income, with a total inflation adjusted gross of about 24.6 billion dollars. This aligns with the anticipation of Adventure movies resonating with a wide audience. The unexpected success of the older classic, "Snow White and the Seven Dwarfs," underscored Disney's enduring appeal and timeless creations.

The introduction of a Python script for top-grossing movies by genre, validated through tests, showcased practical utility for users seeking specific genre insights. Examining Disney movie trends over time revealed a notable post-1990 surge in movie releases, corresponding with increased total gross income. The line plot emphasized the consistent outperformance of Action and Adventure genres, highlighting their sustained popularity.

**Discussion on Findings:**

The findings, to some extent, align with our expectations outlined in the introduction. The dominance of Comedy and "PG" ratings resonates with Disney's family-centric brand. The substantial success of Adventure genres in terms of gross income was anticipated, given their broad audience appeal. The surprising triumph of an older classic speaks to the timeless nature of Disney's creations and their lasting impact.

**Impact of Findings:**

Understanding the genres that resonate most with audiences provides valuable insights for strategic decision-making in content creation and marketing campaigns. The observed trends over time offer a roadmap for adapting to changing audience preferences, ensuring Disney's continued relevance in a dynamic entertainment landscape. The enduring popularity of classics suggests opportunities for leveraging these timeless stories in future endeavors.

**Future Questions:**

To further enhance our understanding, future exploration into the impact of critical acclaim, audience reviews, and promotional strategies on box office performance could provide a more comprehensive view. Investigating streaming platform dynamics and changing consumption patterns would offer insights into the evolving dynamics of the film industry. These inquiries directly contribute to addressing our initial questions about what makes Disney movies successful and how these factors have evolved over time.

**Conclusion:**

In conclusion, this project unraveled patterns in Disney movies' success, emphasizing the significance of genres, historical classics, and temporal trends. The findings not only contribute to understanding Disney's box office triumphs but also provide valuable insights for shaping future strategies in the ever-evolving world of entertainment.







# References


## Resources used: 

1. Disney movies total gross dataset: The data were obtained from [Data Source](https://data.world/kgarrett/disney-character-success-00-16), which follows a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/)
2. Python script for data wrangling function: https://kcds.student.stat.ubc.ca/jupyter/user/1085491/doc/tree/prog-python-data-science-students/release/final_project/top_grossing_movies_by_genre.py
3. Python script for unit tests: https://kcds.student.stat.ubc.ca/jupyter/user/1085491/doc/tree/prog-python-data-science-students/release/final_project/unit_test_grossing_genre.py
4. Python script for sampling function: https://kcds.student.stat.ubc.ca/jupyter/user/1085491/doc/tree/prog-python-data-science-students/release/final_project/sampling.py
5. Pandas documentation : https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
6. Altair visualization library : https://altair-viz.github.io/user_guide/data.html#user-guide-data