# When to Release? The Impact of Release Timing on Disney Movie Revenues
Author: Kiara McCartan

## Introduction

### Question(s) of Interest
This project examines the relationship between release season and Disney movie revenues in order to determine the best time of year to release a Disney film. Comparing revenues across seasons shows whether release timing is associated with stronger box office performance.

### Dataset Description
The dataset used in this analysis, disney_movies_total_gross.csv, is sourced from data.world and compiled by FiveThirtyEight. It contains box office data for movies released under the Disney brand from 1937 through 2016.

Each row corresponds to a film and includes data on:

* **movie_title:** The name of the movie.

* **release_date:** The U.S. theatrical release date.

* **genre:** The genre classification (e.g., Adventure, Drama, Musical).

* **MPAA_rating:** The movie’s official rating (e.g., G, PG, PG-13).

* **total_gross:** The domestic box office gross in U.S. dollars (not adjusted for inflation).

* **inflation_adjusted_gross:** The total gross revenue adjusted for inflation to current dollars.

For this project, additional columns were derived from the original data:

* **release_month:** The month (1–12) the movie was released.

* **season:** A new categorical variable based on the release month, grouping each film into a season: Winter (Dec–Feb), Spring (Mar–May), Summer (Jun–Aug), or Fall (Sep–Nov).
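The season rule above amounts to a fixed month-to-season lookup. The sketch below is one way to express it in Python; `SEASON_BY_MONTH` and `month_to_season` are hypothetical names and not necessarily what `function_script.py` uses.

```python
# Hypothetical lookup for the scheme above:
# Winter (Dec-Feb), Spring (Mar-May), Summer (Jun-Aug), Fall (Sep-Nov).
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def month_to_season(month: int) -> str:
    """Return the season for a month number, rejecting values outside 1-12."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be in 1-12, got {month!r}")
    return SEASON_BY_MONTH[month]


print(month_to_season(12))  # Winter
print(month_to_season(7))   # Summer
```

A dictionary keeps the grouping explicit and makes it easy to check that every month is covered exactly once.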

This dataset provides a solid foundation to explore how the release season may influence box office performance, revealing whether certain times of year tend to yield higher-grossing films.

## Methods & Results
To explore whether the release season of a Disney movie is associated with its box office performance, I used a dataset containing Disney film release dates and revenue figures. The analysis focused on cleaning and preparing the data, creating seasonal categories, and then comparing mean and median revenue across these seasons.

First, I load the necessary libraries and read the CSV file with pandas.

In [1]:
# Import the libraries required for this analysis
import pandas as pd
import altair as alt

disney_movies = pd.read_csv('data/disney_movies_total_gross.csv')


Here is what the original dataframe looks like:

In [2]:
disney_movies

| | movie_title | release_date | genre | MPAA_rating | total_gross | inflation_adjusted_gross |
|---|---|---|---|---|---|---|
| 0 | Snow White and the Seven Dwarfs | Dec 21, 1937 | Musical | G | $184,925,485 | $5,228,953,251 |
| 1 | Pinocchio | Feb 9, 1940 | Adventure | G | $84,300,000 | $2,188,229,052 |
| 2 | Fantasia | Nov 13, 1940 | Musical | G | $83,320,000 | $2,187,090,808 |
| 3 | Song of the South | Nov 12, 1946 | Adventure | G | $65,000,000 | $1,078,510,579 |
| 4 | Cinderella | Feb 15, 1950 | Drama | G | $85,000,000 | $920,608,730 |
| ... | ... | ... | ... | ... | ... | ... |
| 574 | The Light Between Oceans | Sep 2, 2016 | Drama | PG-13 | $12,545,979 | $12,545,979 |
| 575 | Queen of Katwe | Sep 23, 2016 | Drama | PG | $8,874,389 | $8,874,389 |
| 576 | Doctor Strange | Nov 4, 2016 | Adventure | PG-13 | $232,532,923 | $232,532,923 |
| 577 | Moana | Nov 23, 2016 | Adventure | PG | $246,082,029 | $246,082,029 |


This is a lot of information. For better readability, let's limit it to the first five rows.

In [3]:
disney_movies.head()

| | movie_title | release_date | genre | MPAA_rating | total_gross | inflation_adjusted_gross |
|---|---|---|---|---|---|---|
| 0 | Snow White and the Seven Dwarfs | Dec 21, 1937 | Musical | G | $184,925,485 | $5,228,953,251 |
| 1 | Pinocchio | Feb 9, 1940 | Adventure | G | $84,300,000 | $2,188,229,052 |
| 2 | Fantasia | Nov 13, 1940 | Musical | G | $83,320,000 | $2,187,090,808 |
| 3 | Song of the South | Nov 12, 1946 | Adventure | G | $65,000,000 | $1,078,510,579 |
| 4 | Cinderella | Feb 15, 1950 | Drama | G | $85,000,000 | $920,608,730 |


In order to prepare the data to reflect the question we want answered, we will need to create a function that:
* Converts the release_date column to datetime format.

* Extracts the release month from the date.

* Categorizes each month into one of the four seasons (Winter, Spring, Summer, Fall).

* Cleans the revenue columns by removing dollar signs and commas and converting them to numeric types.
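A minimal pandas sketch of these four steps follows. It assumes dates are stored as strings like `Dec 21, 1937` and revenues as strings like `$184,925,485`, as in the output above; `sketch_wrangle` and `SEASONS` are hypothetical names, and this is an illustrative stand-in rather than the author's `wrangle_disney_movies`. The two sample rows are copied from the original data.

```python
import pandas as pd

# Hypothetical season lookup (Dec-Feb / Mar-May / Jun-Aug / Sep-Nov).
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Fall", 10: "Fall", 11: "Fall"}


def sketch_wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative version of the four wrangling steps; not the author's code."""
    out = df.copy()
    # 1. Parse strings such as "Dec 21, 1937"; unparseable values become NaT.
    out["release_date"] = pd.to_datetime(
        out["release_date"], format="%b %d, %Y", errors="coerce"
    )
    # 2. Extract the numeric month.
    out["release_month"] = out["release_date"].dt.month
    # 3. Map each month to its season.
    out["season"] = out["release_month"].map(SEASONS)
    # 4. Strip "$" and "," from the revenue strings, then convert to numbers.
    for col in ["total_gross", "inflation_adjusted_gross"]:
        out[col] = pd.to_numeric(
            out[col].str.replace(r"[$,]", "", regex=True), errors="coerce"
        )
    return out


raw = pd.DataFrame({
    "movie_title": ["Snow White and the Seven Dwarfs", "Moana"],
    "release_date": ["Dec 21, 1937", "Nov 23, 2016"],
    "total_gross": ["$184,925,485", "$246,082,029"],
    "inflation_adjusted_gross": ["$5,228,953,251", "$246,082,029"],
})
print(sketch_wrangle(raw)[["movie_title", "release_month", "season", "total_gross"]])
```

Using `errors="coerce"` turns malformed dates or amounts into missing values instead of raising, so a few bad rows do not stop the whole cleaning step; they can be inspected afterwards with `isna()`.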

Let's start by importing the function from the `function_script.py` file.

In [4]:
from function_script import wrangle_disney_movies

Now that our function has been imported, we can go ahead and run the wrangling code.

In [5]:
# Wrangle the data
cleaned_movies_df = wrangle_disney_movies(disney_movies)
cleaned_movies_df

| | movie_title | release_date | genre | MPAA_rating | total_gross | inflation_adjusted_gross | release_month | season |
|---|---|---|---|---|---|---|---|---|
| 0 | Snow White and the Seven Dwarfs | 1937-12-21 | Musical | G | 184925485.0 | 5.228953e+09 | 12 | Winter |
| 1 | Pinocchio | 1940-02-09 | Adventure | G | 84300000.0 | 2.188229e+09 | 2 | Winter |
| 2 | Fantasia | 1940-11-13 | Musical | G | 83320000.0 | 2.187091e+09 | 11 | Fall |
| 3 | Song of the South | 1946-11-12 | Adventure | G | 65000000.0 | 1.078511e+09 | 11 | Fall |
| 4 | Cinderella | 1950-02-15 | Drama | G | 85000000.0 | 9.206087e+08 | 2 | Winter |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 574 | The Light Between Oceans | 2016-09-02 | Drama | PG-13 | 12545979.0 | 1.254598e+07 | 9 | Fall |
| 575 | Queen of Katwe | 2016-09-23 | Drama | PG | 8874389.0 | 8.874389e+06 | 9 | Fall |
| 576 | Doctor Strange | 2016-11-04 | Adventure | PG-13 | 232532923.0 | 2.325329e+08 | 11 | Fall |
| 577 | Moana | 2016-11-23 | Adventure | PG | 246082029.0 | 2.460820e+08 | 11 | Fall |


With the cleaned data prepared, we now calculate summary statistics that describe box office revenue by season: the number of films, the mean total gross, and the median total gross. Because the mean is sensitive to unusually high or low values, the median is the main measure used for comparison.
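A small worked example shows why the median is the more robust summary here. The numbers are made up for illustration (they are not from the dataset): one blockbuster pulls the mean far above what a typical release earns, while the median stays at the middle value.

```python
from statistics import mean, median

# Toy grosses in millions of dollars (illustrative only):
# four modest releases and one blockbuster.
grosses = [10, 12, 14, 16, 500]

print(mean(grosses))    # 110.4 -> pulled up by the single 500M outlier
print(median(grosses))  # 14    -> middle value, unaffected by the outlier
```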

In [6]:
# Count movies per season
season_counts = cleaned_movies_df['season'].value_counts()

# Mean total gross by season
season_avg_revenue = cleaned_movies_df.groupby('season')['total_gross'].mean()

# Median total gross by season
season_median_revenue = (
    cleaned_movies_df.groupby('season')['total_gross'].median().reset_index()
)

# Jupyter displays only the last expression in a cell, so the median table is shown below
season_median_revenue

| | season | total_gross |
|---|---|---|
| 0 | Fall | 27800112.5 |
| 1 | Spring | 24242464.0 |
| 2 | Summer | 46573027.0 |
| 3 | Winter | 32349264.0 |
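The three summaries in the cell above can also be computed in one pass with pandas named aggregation (keyword arguments to `SeriesGroupBy.agg`). The sketch uses toy data rather than the Disney dataset, so its numbers are illustrative only.

```python
import pandas as pd

# Toy data (illustrative only): a few films with a season and a total gross.
toy = pd.DataFrame({
    "season": ["Summer", "Summer", "Summer", "Winter", "Winter"],
    "total_gross": [40.0, 50.0, 300.0, 20.0, 30.0],
})

# Count, mean, and median per season in a single groupby via named aggregation.
summary = toy.groupby("season")["total_gross"].agg(
    count="count", mean="mean", median="median"
)
print(summary)
```

Placing the mean and median side by side in one table makes skew visible at a glance: in the toy Summer group, one 300 value lifts the mean to 130 while the median stays at 50.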


To clearly illustrate these findings, we employ Altair for visualization. The initial chart compares median total revenue across seasons, making it easier to discern seasonal trends.

In [7]:
alt.Chart(season_median_revenue).mark_bar().encode(
    x=alt.X('season:N', sort=['Winter', 'Spring', 'Summer', 'Fall'], title='Season'),
    y=alt.Y('total_gross:Q', title='Median Total Gross ($)'),
    color=alt.Color('season:N', legend=None)
).properties(
    title='Median Disney Box Office Revenue by Season',
    width=500,
    height=300
)

**Figure 1:** Median Disney Box Office Revenue by Season.

To put the revenue data into perspective, we also look at how many movies were released in each season. The following bar chart visualizes the seasonal distribution of movie releases.

In [8]:
season_counts = (
    cleaned_movies_df['season']
    .value_counts()
    .rename_axis('season')
    .reset_index(name='count')
)

alt.Chart(season_counts).mark_bar().encode(
    x=alt.X('season:N', sort=['Winter', 'Spring', 'Summer', 'Fall'], title='Season'),
    y=alt.Y('count:Q', title='Number of Movies Released'),
    color='season:N'
).properties(
    title='Number of Disney Movies Released by Season',
    width=500,
    height=300
)

**Figure 2:** Number of Disney Movies Released by Season.

The first bar chart shows that summer releases had the highest median total gross, followed by winter, fall, and spring, suggesting higher earnings during the summer months. The second chart adds context by showing how release volume varies by season, which may also influence revenue patterns.
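Note that `value_counts()` orders seasons by frequency rather than by calendar and leaves out any season with no releases. If a table in calendar order is wanted, one option is to reindex against a fixed season list, as in this sketch on toy labels (`SEASON_ORDER` is a hypothetical name).

```python
import pandas as pd

SEASON_ORDER = ["Winter", "Spring", "Summer", "Fall"]

# Toy season labels (illustrative only); there is no Spring release here.
seasons = pd.Series(["Fall", "Summer", "Fall", "Winter"])

# reindex puts seasons in calendar order and fills absent ones with 0.
counts = (
    seasons.value_counts()
    .reindex(SEASON_ORDER, fill_value=0)
    .rename_axis("season")
    .reset_index(name="count")
)
print(counts)
```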

How do these revenues change when December is treated as its own season? I created a separate function script that groups December releases into a `Christmas` season, so we can see how the Christmas holiday period influences median gross revenue. Let's import the `december_disney_movies` function from `december_function_script.py`.

In [9]:
from december_function_script import december_disney_movies
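The contents of `december_function_script.py` are not shown here. Assuming it differs from the original wrangling only in how months are labeled, the modified rule might look like this hypothetical helper, where December becomes `Christmas` and Winter shrinks to January and February:

```python
def month_to_season_with_christmas(month: int) -> str:
    """Hypothetical five-season rule: December is split out as "Christmas"."""
    if month == 12:
        return "Christmas"
    if month in (1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    raise ValueError(f"month must be in 1-12, got {month!r}")


print(month_to_season_with_christmas(12))  # Christmas
print(month_to_season_with_christmas(1))   # Winter
```

Only the December branch changes relative to the four-season rule, which is why the Fall, Spring, and Summer medians in the next table match the earlier ones.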

Now let's run the new wrangling code and calculate the median gross earnings by season.

In [10]:
# Wrangle the data with December treated as its own season
cleaned_december_df = december_disney_movies(disney_movies)

# Count movies per season
december_season_counts = cleaned_december_df['season'].value_counts()

# Mean total gross by season
december_season_avg_revenue = (
    cleaned_december_df.groupby('season')['total_gross'].mean()
)

# Median total gross by season
december_season_median_revenue = (
    cleaned_december_df.groupby('season')['total_gross'].median().reset_index()
)

# Jupyter displays only the last expression in a cell, so the median table is shown below
december_season_median_revenue

| | season | total_gross |
|---|---|---|
| 0 | Christmas | 49074987.0 |
| 1 | Fall | 27800112.5 |
| 2 | Spring | 24242464.0 |
| 3 | Summer | 46573027.0 |
| 4 | Winter | 27688592.5 |


Plotting the new medians with Altair makes the `Christmas` season easier to compare with the other four.

In [12]:
alt.Chart(december_season_median_revenue).mark_bar().encode(
    x=alt.X('season:N', sort=['Winter', 'Spring', 'Summer', 'Fall', 'Christmas'], title='Season'),
    y=alt.Y('total_gross:Q', title='Median Total Gross ($)'),
    color=alt.Color('season:N', legend=None)
).properties(
    title='Median Disney Box Office Revenue Including Christmas',
    width=500,
    height=300
)

**Figure 3:** Median Disney Box Office Revenue Including Christmas.

These results tell a different story: December releases (the `Christmas` season) have a higher median box office revenue than summer releases.
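To put a number on the gap, the medians printed in the two summary tables can be compared as percentage changes. The values below are copied from those tables; `percent_change` is a hypothetical helper written for this comparison.

```python
# Medians copied from the summary tables above (total_gross, nominal US dollars).
christmas = 49_074_987.0
summer = 46_573_027.0
winter_with_dec = 32_349_264.0     # four-season grouping (Dec-Feb)
winter_without_dec = 27_688_592.5  # Christmas grouping (Jan-Feb only)


def percent_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


print(f"Christmas vs Summer: {percent_change(christmas, summer):+.1f}%")
# Christmas vs Summer: +5.4%
print(f"Winter after removing December: {percent_change(winter_without_dec, winter_with_dec):+.1f}%")
# Winter after removing December: -14.4%
```

The Christmas median is about 5% above summer, while removing December lowers the winter median by roughly 14%, which suggests much of winter's earlier strength came from December releases.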

## Discussion
In this analysis, I explored how release season relates to Disney movies' box office revenue. Among the four seasons of summer, winter, fall, and spring, movies released in summer earn the highest median total gross, while spring releases tend to perform the worst. Interestingly, more movies are released during the seasons with lower median revenue, such as fall and spring. This could suggest that Disney schedules more releases in these seasons to compensate for lower individual movie earnings; it could also work the other way, with a larger number of smaller releases pulling those seasons' medians down.

When December is separated out as its own season, the results shift: the Christmas median (about $49.1 million) edges past summer (about $46.6 million), and the winter median falls from about $32.3 million to $27.7 million once December releases are removed. This may suggest that Disney saves some of its bigger box office hits for the Christmas holidays.

These findings raise further questions about Disney’s release strategy: whether the studio deliberately spaces out big hits and smaller releases by season, or whether other factors are at play. Future analyses could explore how genre or marketing budget varies by season to better understand these patterns.
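For the genre question, which the dataset can already support through its `genre` column, a cross-tabulation of season against genre would be a natural starting point. This sketch uses toy rows with the same column names as the cleaned data; marketing budgets are not among the dataset's columns and would need another source.

```python
import pandas as pd

# Toy rows (illustrative only) using the cleaned data's column names.
films = pd.DataFrame({
    "season": ["Summer", "Summer", "Fall", "Fall", "Winter"],
    "genre": ["Adventure", "Comedy", "Drama", "Adventure", "Musical"],
})

# Share of each season's releases in each genre (each row sums to 1).
genre_mix = pd.crosstab(films["season"], films["genre"], normalize="index")
print(genre_mix.round(2))
```

Normalizing by row compares genre mix across seasons even when seasons have very different numbers of releases.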

## References
Not all of the work in this notebook is original. Some techniques and ideas were informed by publicly available resources and course materials. These elements were used solely for educational purposes.

### Resources Used
* Data Source
    * The Disney movies dataset was curated by FiveThirtyEight, which provided historical box office revenue and release information.
* Data Wrangling Approach
    * Much of the data cleaning and transformation logic, especially around date formatting, feature extraction, and string manipulation, was informed by techniques learned in the "Programming in Python" course from the Key Capabilities in Data Science certificate.
* Data Visualization
    * Altair was used for all visualizations. Chart design and syntax were guided by the Altair documentation and various examples found on Towards Data Science.
* Code Formatting 
    * All Python scripts and notebook code cells were formatted using Black (https://github.com/psf/black) to ensure consistent and clean style.
* Attribution
    * Parts of the data wrangling function were developed with guidance from ChatGPT to improve structure, docstring formatting, and error handling. All logic and implementation were reviewed and modified by me.