In [2]:
import pickle

# RESULTS

## Does the analysis matter?

![](visuals/1.png)

<!-- ![](visuals/2.png) -->

![](visuals/3.png)

![](visuals/4.png)

<!-- ![](visuals/5.png) -->

## Hypothesis 1: Higher-budget movies are a higher risk and are less likely to be profitable.

![](visuals/6.png)

<!-- ![](visuals/7.png) -->

![](visuals/8.png)

![](visuals/9.png)



## Hypothesis 2: Does movie runtime influence revenue?



![](visuals/10.png)

This plot shows the number of movies within each range of runtimes. A small group of much shorter films falls in the 0-30 min range; these come mostly from the earlier years, when technological and financial considerations limited movie runtimes. Some films stretch well over two hours, but they are relatively rare compared to the cluster in the middle. This suggests that “feature-length” typically centers on an hour-and-a-half sweet spot, with few productions choosing to deviate too far from it.

![](visuals/11.png)

Early films in the 1910s and 1920s tended to be shorter, but by the mid-20th century, typical runtimes settled into a stable range around 90–100 minutes. Over the decades, the median runtime hasn’t shifted dramatically, suggesting that the “feature-length” standard reached a sweet spot and stayed there. Even so, there are always outliers—exceptionally long films appear in every era—but they remain the exception rather than the rule.

![](visuals/12.png)

These boxplots show that movie runtimes don’t change much by release month. The medians stay around 90–100 minutes year-round, and every month has its share of short and long films. While December’s median might be a bit higher, the overall pattern is pretty steady. In other words, a film’s release month doesn’t seem to influence how long it is.

![](visuals/13.png)

The plot on the left suggests that, across all movies, there is no strong relationship between runtime and median profitability: the line is basically flat. However, the plot on the right, which focuses on the last 20 years, shows a positive trend, with longer movies tending to be more profitable on average. One possible explanation is that, over time, higher budgets and marketing efforts have increasingly gone to longer “event” films. Making a movie longer does not guarantee bigger profits, but the data suggests a recent pattern in which movies that run a bit longer may benefit from greater audience interest and investment.
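A minimal sketch of how such a runtime-versus-median-profitability curve could be computed. Everything here is an assumption for illustration: the report does not define profitability, so the sketch uses revenue divided by budget, and the 30-minute bin width, the function name, and the toy movies are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_profitability_by_runtime(movies, bin_width=30):
    """Group movies into runtime bins and return the median profitability per bin.

    `movies` is an iterable of (runtime_minutes, revenue, budget) tuples.
    Profitability is ASSUMED to be revenue / budget; the report may use
    a different definition.
    """
    bins = defaultdict(list)
    for runtime, revenue, budget in movies:
        if budget <= 0:
            continue  # skip rows where the ratio is undefined
        bin_start = int(runtime // bin_width) * bin_width
        bins[bin_start].append(revenue / budget)
    return {start: median(values) for start, values in sorted(bins.items())}


# Hypothetical toy data, not taken from the dataset.
toy_movies = [
    (85, 10.0, 5.0),
    (95, 30.0, 10.0),
    (100, 8.0, 8.0),
    (110, 20.0, 5.0),
    (130, 90.0, 30.0),
    (145, 100.0, 20.0),
]
runtime_profile = median_profitability_by_runtime(toy_movies)
print(runtime_profile)  # {60: 2.0, 90: 3.0, 120: 4.0}
```

Using the median rather than the mean keeps a handful of blockbusters from dominating each bin, which matters because profitability ratios tend to be heavily skewed.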

## Hypothesis 3: What is the most profitable release period?

![](visuals/14.png)

![](visuals/15.png)

![](visuals/16.png)

<!-- ![](visuals/17.png) -->

<!-- ![](visuals/18.png) -->

![](visuals/19.png)




## Hypothesis 4: Some genres or themes are more popular and might draw more spectators

![](visuals/22.png)

These plots show how certain genres have changed in both profitability and market share over time. For example, “Family”, “Action”, and “Romantic” films generally became less profitable even as their representation increased over the decades. “War” films declined in both representation and profitability, perhaps because of the world's traumatic experience of war in the last century. “LGBT” films, on the other hand, saw both their representation and profitability tick upwards. The “Superhero” genre does not show as clear a pattern, but it also shifts over time. Overall, this suggests that audience tastes and industry focus move over time, giving different genres periods of growth or decline.

![](visuals/23.png)

Over the last two decades, "Family" and "Romantic" films, unlike the trend over the last century, seem to have become slightly more profitable, even as their share of the market declined. "War" and "Action" films have seen a gradual drop in profitability without gaining much presence. The "Superhero" genre shows a mild upward trend in profitability while also slowly gaining representation. Meanwhile, "LGBT" films show no clear profitability trend over time, and their representation remains fairly low and flat. Overall, the changes are subtle, but it is clear that some genres are slowly drifting in terms of both profitability and how frequently they are produced.

![](visuals/24.png)

This heatmap shows which combinations of genres and themes tend to bring in higher profits. Darker blues mark higher profitability. For example, “LGBT” movies with “Love” or “Identity” themes are very profitable, as are “Action” films touching on “Friendship”. Some of these profitable cells have few samples, so it might be luck rather than a broad trend. Overall, it suggests that certain thematic choices within a genre can pay off, but you need to be cautious when the sample size is small.
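One way to act on the small-sample caveat is to hide cells that rest on too few movies before reading the heatmap. The sketch below is only an illustration under stated assumptions: the aggregation (a mean), the threshold of 3 movies, the function name, and the toy records are all hypothetical and not taken from our pipeline.

```python
from collections import defaultdict
from statistics import mean


def cell_profitability(records, min_count=3):
    """Return mean profitability per (genre, theme) cell.

    `records` is an iterable of (genre, theme, profitability) tuples.
    Cells with fewer than `min_count` movies are mapped to None so they
    can be greyed out instead of being read as a trend.
    """
    cells = defaultdict(list)
    for genre, theme, profitability in records:
        cells[(genre, theme)].append(profitability)
    return {
        key: (mean(values) if len(values) >= min_count else None)
        for key, values in cells.items()
    }


# Hypothetical toy records, not taken from the dataset.
toy_records = [
    ("Action", "Friendship", 1.0),
    ("Action", "Friendship", 2.0),
    ("Action", "Friendship", 3.0),
    ("LGBT", "Love", 9.0),
]
cells = cell_profitability(toy_records)
print(cells[("Action", "Friendship")])  # 2.0
print(cells[("LGBT", "Love")])          # None (only one movie)
```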

![](visuals/27.png)

These plots show how certain themes have evolved in both profitability and how often they appear. For “Friendship,” profitability hasn’t changed much, but these films have grown more common. Themes like “Resilience” and “Identity” started off profitable but have trended downward over time, even though their representation shifted only a bit. “Deception” and “Love” themes saw their profitability fall as their prevalence decreased. “Family” films have stayed pretty stable on both fronts. Overall, it suggests that audience tastes and industry strategies around these themes have shifted, affecting both the financial returns and how frequently filmmakers choose them.

![](visuals/28.png)

In the past two decades, most themes have gradually lost some profitability and become a bit less common. “Friendship” and “Deception,” for example, both saw their returns and share shrink over time. “Love” also followed this downward path in profitability. “Resilience” and “Identity” don’t show strong trends, but they aren’t notably improving either. The one standout is “Family,” which shows a mild upward trend in profitability and presence, suggesting that as other themes waned, audiences and filmmakers turned more toward family-centric stories.


<!-- ## Combined results
# HOW TO ADD AN ANIMATED HTML PLOT? -->

## Case study: war-related movies
In our search for the dominant factors behind movies' commercial success over time, we were interested in the relationship between real-world events and cultural movements on the one hand and the cinema industry on the other.
To explore this, we decided to analyze a subset of the CMU dataset composed of movies that have war-related genres and are from the USA. We also filtered out the movies for which we did not have the box office revenue and release year data, and ended up with 391 war-related movies.

In the following figure, we plot the number of war-related movie releases and of all movie releases over time.

![](visuals/war_movies_over_time.png)

As we can see, the number of war-related movie releases per year fluctuates a lot, so we applied a rolling average with a 3-year window in order to observe the long-term trend. We also annotated the plot with the major conflicts in which the USA was involved over the last century.

![](visuals/war_movies_over_time_smoothed.png)
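For reference, a 3-year rolling average can be sketched in a few lines. This is an illustration, not the code used for the figure: it assumes a trailing window (each year averaged with the two preceding years), which is also what pandas' `Series.rolling(3).mean()` computes by default, and the yearly counts are made up.

```python
def rolling_mean(values, window=3):
    """Trailing rolling mean; positions without a full window yield None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical yearly counts of war-related releases (not from the dataset).
war_counts = [2, 8, 5, 11, 5]
smoothed_counts = rolling_mean(war_counts, window=3)
print(smoothed_counts)  # [None, None, 5.0, 8.0, 7.0]
```

A trailing window shifts peaks slightly to the right of the years that caused them; a centered window would avoid that shift at the cost of losing one year at each end.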

From this, it seems that conflicts such as WWII, the Vietnam War, and the Cold War did induce peaks in war-related movie releases. Furthermore, when plotting the proportion of war-related movies over the years, we see that in the 1940s these movies accounted for a large percentage of movie releases, culminating at 50% in 1943.

![](visuals/war_movies_proportion_over_time.png)

For further analysis, we calculated the Pearson correlation between the number of war-related movie releases and the number of all movie releases per year:

In [7]:
# Load variables from the file
with open('results/saved_variables.pkl', 'rb') as file:
    data = pickle.load(file)

# Access the variables
corr = data['corr']
p_value = data['p_value']

print(f"Correlation: {corr}, P-value: {p_value}")

Correlation: 0.617434365120252, P-value: 6.479400208508737e-10


We obtain a correlation coefficient of 0.62, which indicates a moderate positive linear relationship between the number of war movies per year and the total number of movies per year. The p-value is nearly 0, which suggests that the observed correlation is highly statistically significant.
This implies that the variation in war-related movie releases is closely related to the variation in all movie releases, i.e. the general increase in movie production is likely a driving factor of war movie production, although a correlation alone cannot establish this.
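The notebook only loads the saved coefficient, so here is a minimal sketch of how a Pearson correlation is computed from two yearly series. The function name and the toy counts are hypothetical; in practice a library routine such as `scipy.stats.pearsonr` would also return the p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly counts (not from the dataset).
all_movies_per_year = [100, 120, 150, 170, 200]
war_movies_per_year = [5, 9, 7, 12, 11]
r_toy = pearson_r(all_movies_per_year, war_movies_per_year)
print(f"Correlation on toy data: {r_toy:.3f}")
```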

To complement this result, we also performed an OLS regression.

![](visuals/OSL_war_movies.png)


In [10]:
# Load and print the OLS regression model summary
with open('results/OSL_summary.txt', 'r') as file:
    summary = file.read()

print(summary)

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.381
Model:                            OLS   Adj. R-squared:                  0.373
Method:                 Least Squares   F-statistic:                     49.29
Date:                Fri, 20 Dec 2024   Prob (F-statistic):           6.48e-10
Time:                        19:02:30   Log-Likelihood:                -200.13
No. Observations:                  82   AIC:                             404.3
Df Residuals:                      80   BIC:                             409.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.2988      0.469      4.899      0.0

We obtain an R-squared of 0.381, which means that 38.1% of the variance in the number of war-related movie releases per year is explained by the total number of movie releases per year; this suggests that other factors play a role. The residuals plot above also shows that the relationship between the number of war movies per year and the total number of movies per year is not well explained by this linear model, which is consistent with other factors, such as actual wars, influencing war-related movie releases.
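As a sanity check, the figures in the OLS summary can be recovered from the Pearson correlation alone, because in a simple linear regression with one predictor and an intercept, R-squared equals the squared correlation. The short sketch below uses only the correlation printed earlier and the number of observations from the summary table; the variable names are ours.

```python
# Values copied from the notebook outputs above.
r = 0.617434365120252  # Pearson correlation printed by the notebook
n = 82                 # "No. Observations" in the OLS summary

# Identities for simple linear regression (one predictor plus intercept).
r_squared = r ** 2
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
f_statistic = r_squared * (n - 2) / (1 - r_squared)

print(f"R-squared:      {r_squared:.3f}")      # 0.381
print(f"Adj. R-squared: {adj_r_squared:.3f}")  # 0.373
print(f"F-statistic:    {f_statistic:.2f}")    # 49.29
```

These match the R-squared, Adj. R-squared, and F-statistic rows of the summary table, so the regression adds no information beyond the correlation about how much variance is explained; its value lies in the residuals plot.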

We then explored the average box office revenue of war-related movies vs all movies and, as earlier, we smoothed the data with a 3-year window rolling average. We also computed the proportion 

![](visuals/war_movies_BO.png)

We observe that, although it fluctuates a lot, there is a large peak during the Vietnam War/Cold War, and that over the last 30 years the genre has performed better on average than the rest of the industry.



## Shortcomings of our analysis

There are more shortcomings than actual useful results :-)