# Box Office visual investigation across time

This notebook investigates box office revenue across time by creating interactive, animated plots of accumulated box office revenue with `plotly`.

We take inspiration from [this article](https://towardsdatascience.com/how-to-produce-an-animated-bar-plot-in-plotly-using-python-2b5b360492f8), which uses Plotly Express to generate animated bar plots that change by year. We adapt it to our setting and wrap the approach in auxiliary functions, which are imported from the auxiliary Python script `plotly_aux.py`.

In [1]:
import os
import pandas as pd

from plotly_aux import *
from raceplotly.plots import barplot

First we load the box office data frame.

In [2]:
data_dir = os.getcwd() + os.sep + 'data'

df_boxOffice = pd.read_pickle(rf"{data_dir}{os.sep}boxOffice.pkl")    
df_boxOffice.head()

| Unnamed: 0 | days | dow | rank | daily | theaters | special events | movie |
|---|---|---|---|---|---|---|---|
| 0 | 2019-05-24 | Friday | 1 | 31358935.0 | 4476 | | Aladdin |
| 1 | 2019-05-25 | Saturday | 1 | 30013295.0 | 4476 | | Aladdin |
| 2 | 2019-05-26 | Sunday | 1 | 30128699.0 | 4476 | | Aladdin |
| 3 | 2019-05-27 | Monday | 1 | 25305033.0 | 4476 | Memorial Day | Aladdin |
| 4 | 2019-05-28 | Tuesday | 1 | 12014982.0 | 4476 | | Aladdin |


We extract the year and month of each date to use as the time axis of the animated plot, and display the first value to see what the time index looks like.

In [3]:
df_boxOffice['year-month'] = [("-").join(date.split("-")[:2]) for date in df_boxOffice['days']]
df_boxOffice['year-month'].unique()[0]

'2019-05'
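Since the `days` column holds ISO-formatted strings (`YYYY-MM-DD`), the same index could also be derived with a vectorized string slice. A small sketch on toy dates (the series below is made up for illustration):

```python
import pandas as pd

# Toy ISO date strings in the same format as the `days` column.
days = pd.Series(["2019-05-24", "2019-05-25", "2019-06-01"], name="days")

# List-comprehension form: join the first two "-"-separated parts.
year_month_loop = ["-".join(d.split("-")[:2]) for d in days]

# Vectorized alternative: the first seven characters of "YYYY-MM-DD" are "YYYY-MM".
year_month_vec = days.str[:7]

print(year_month_vec.tolist())
```

Both forms rely on the dates being zero-padded strings; for real datetime values, `dt.strftime('%Y-%m')` would be the analogous call.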

And now we can plot it! We start by visualizing a subsample of 10 random movies.

In [4]:
# specifying parameters
N_samples = 10
attribute = 'daily'
y_label = 'Box Office revenue'
time_attribute = 'year-month'
title = f'Top 10 movies in terms of {y_label}'

save_fig = False

# creating dataframes
df_boxOffice_wrangled = wrangleData(df_boxOffice, time_attribute=time_attribute)
df_plot, df_sample= createPlotDF(df_boxOffice_wrangled, attribute=attribute, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.4, title=title, save_fig=save_fig)

  0%|          | 0/9 [00:00<?, ?it/s]

In [5]:
df_sample.groupby('movie')[attribute].sum()

movie
Bohemian Rhapsody                          216303339.0
Fantastic Beasts and Where to Find Them    234037575.0
Fast & Furious 7                           351032910.0
Incredibles 2                              608581744.0
Inside Out                                 353612437.0
The Hunger Games: Mockingjay - Part 2      281723902.0
The Jungle Book                            364001123.0
Thor: Ragnarok                             315058289.0
Wonder Woman                               412563408.0
Name: daily, dtype: float64
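The totals above are what the accumulated revenue of each movie should reach in the final animation frame. How `wrangleData` and `createPlotDF` compute the accumulation is not shown in this notebook; the following is only a sketch of one way to build a running total per movie and month with pandas, using made-up data and hypothetical names (`toy`, `monthly`, `accumulated`, `accumulated_long`).

```python
import pandas as pd

# Toy daily revenue (made-up numbers) for two movies.
toy = pd.DataFrame({
    "movie": ["A", "A", "A", "B", "B"],
    "year-month": ["2019-05", "2019-05", "2019-06", "2019-06", "2019-07"],
    "daily": [5.0, 3.0, 2.0, 7.0, 1.0],
})

# Revenue per movie and month, laid out as a movie x month grid.
# Months without any revenue for a movie are filled with 0.
monthly = (
    toy.groupby(["movie", "year-month"])["daily"].sum()
       .unstack("year-month", fill_value=0.0)
       .sort_index(axis=1)
)

# Running total along the time axis.
accumulated = monthly.cumsum(axis=1)

# Back to long format: one row per (movie, month), as an animation frame needs.
accumulated_long = accumulated.stack().rename("accumulated").reset_index()
print(accumulated_long)
```

Filling the grid before accumulating means every movie has a value in every frame, so a bar does not disappear in months where the movie has no rows.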

And then we visualize it for all movies!

In [7]:
# specifying parameters
N_samples = None
attribute = 'daily'
y_label = 'Box Office revenue'
time_attribute = 'year-month'
title = ''  # f'Top 10 movies in terms of {y_label}'

save_fig = True

# creating dataframes
df_boxOffice_wrangled = wrangleData(df_boxOffice, time_attribute=time_attribute)
df_plot, df_sample= createPlotDF(df_boxOffice_wrangled, attribute=attribute, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.2, title=title, save_fig=save_fig)

  0%|          | 0/56 [00:00<?, ?it/s]

saved html version of plot to: C:\Users\Albert Kjøller\Documents\EPFL\Courses\CS-401_ADA\ada-2021-project-f-jab\exploratory\plotlyplots


In [7]:
df_sample.groupby('movie')[attribute].sum()

movie
Aladdin                                                                355559216.0
Aquaman                                                                335061807.0
Avengers: Age of Ultron                                                455530367.0
Avengers: Endgame                                                      858373000.0
Avengers: Infinity War                                                 678815482.0
Bad Boys for Life                                                      204417855.0
Batman v Superman: Dawn of Justice                                     330360194.0
Beauty and the Beast                                                   504014165.0
Birds of Prey: And the Fantabulous Emancipation of One Harley Quinn     84158461.0
Black Panther                                                          700059566.0
Bohemian Rhapsody                                                      216303339.0
Captain America: Civil War                                             408084349.

The approach also works with daily time indexing, but it takes longer to run and the animation looks less smooth.

In [8]:
df_boxOffice.days.unique()

array(['2019-05-24', '2019-05-25', '2019-05-26', ..., '2021-04-13',
       '2021-04-14', '2021-04-15'], dtype=object)
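One likely contributor to the longer running time is the number of distinct time values, since a daily index has far more of them than a monthly one. The sketch below counts both for a hypothetical one-year span of dates; the span is an assumption for illustration and is not the range of `df_boxOffice`.

```python
import pandas as pd

# Hypothetical one-year span of ISO date strings (not the notebook's data).
days = pd.Series(pd.date_range("2019-01-01", "2019-12-31", freq="D").strftime("%Y-%m-%d"))

n_daily = days.nunique()              # distinct values with time_attribute='days'
n_monthly = days.str[:7].nunique()    # distinct values with time_attribute='year-month'

print(n_daily, n_monthly)
```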

In [9]:
# specifying parameters
N_samples = 10
attribute = 'daily'
y_label = 'Box Office revenue'
time_attribute = 'days'
title = f'Accumulated {y_label} across {time_attribute}'

save_fig = False

# creating dataframes
df_boxOffice_wrangled = wrangleData(df_boxOffice, time_attribute=time_attribute)
df_plot, df_sample= createPlotDF(df_boxOffice_wrangled, attribute=attribute, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.9, title=title, save_fig=save_fig)

  0%|          | 0/9 [00:00<?, ?it/s]

In [10]:
df_sample.groupby('movie')[attribute].sum()

movie
Avengers: Infinity War                                                 678815482.0
Birds of Prey: And the Fantabulous Emancipation of One Harley Quinn     84158461.0
Bohemian Rhapsody                                                      216303339.0
Fantastic Beasts and Where to Find Them                                234037575.0
Fast & Furious 7                                                       351032910.0
Jumanji: Welcome to the Jungle                                         404515480.0
Spectre                                                                200074609.0
The Hunger Games: Mockingjay - Part 2                                  281723902.0
The Martian                                                            228395947.0
Name: daily, dtype: float64