# Adjusting budget data for inflation -> Creating dataframe and pickle file

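### Budgets are adjusted for inflation using the consumer price index (CPI) as of 2022-12-31
Each budget is converted with the equation `adjusted_value = (old_value * cpi_current) / cpi_old`, where `cpi_old` is the CPI for the movie's release month and `cpi_current` is the CPI on 2022-12-31.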
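As a quick sanity check of the formula, here is a minimal Python sketch. The function name `adjust_for_inflation` and the CPI numbers are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def adjust_for_inflation(old_value, cpi_old, cpi_current):
    """Re-express old_value at the price level given by cpi_current (hypothetical helper)."""
    return old_value * cpi_current / cpi_old

# Made-up numbers: a $100 budget when the CPI was 200, re-expressed at a CPI of 300.
print(adjust_for_inflation(100, 200, 300))  # 150.0
```

When prices have risen by 50% (CPI 200 to 300), the same budget is worth 1.5 times as many "current" dollars.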

Method adapted from: https://medium.com/analytics-vidhya/adjusting-for-inflation-when-analysing-historical-data-with-python-9d69a8dcbc27

#### Oldest and newest movies

Oldest release date: 2010-01-08

Newest release date: 2020-02-21

Note: Do not use the learn-env environment. This notebook needs a recent version of Python 3.

In [1]:
import os

import pandas as pd
import quandl

# Read the Quandl API key from the QUANDL_KEY environment variable instead of hard-coding it in the notebook
quandl.ApiConfig.api_key = os.environ.get('QUANDL_KEY')

In [2]:
# If quandl is not installed, install it into the current kernel with %pip, then re-run the import cell
# %pip install quandl

In [3]:
master_df = pd.read_pickle('../plotting_data/movie_master_dataset.pkl')

### Creating the CPI dataframe used to look up CPI values in the merge below

In [4]:
# Fetch US CPI values from Quandl (dataset RATEINF/CPI_USA) for building the inflation-adjusted budget column
df_cpi = quandl.get('RATEINF/CPI_USA', start_date='2009-01-01', end_date='2023-12-22')
df_cpi.rename(columns={'Value': 'cpi'}, inplace=True)
df_cpi.reset_index(inplace=True)    # move the 'Date' index into a regular column
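Since the Quandl call needs a network connection and an API key, this is a small offline sketch of what the rename and reset_index steps do. It assumes the `quandl.get` result has a DatetimeIndex named `Date` and a single `Value` column; the dates and CPI numbers below are made up.

```python
import pandas as pd

# Assumed shape of the quandl.get('RATEINF/CPI_USA', ...) result:
# a DatetimeIndex named 'Date' and one 'Value' column. Values are invented.
raw = pd.DataFrame(
    {'Value': [100.0, 101.0, 102.0]},
    index=pd.DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31'], name='Date'),
)

# Same two steps as the cell above, written without inplace=True.
df_cpi = raw.rename(columns={'Value': 'cpi'}).reset_index()
print(df_cpi.columns.tolist())  # ['Date', 'cpi']
```

After `reset_index`, `Date` is an ordinary datetime column, which is what lets the next cells use `df_cpi['Date'].dt.year` and `.dt.month`.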

In [5]:
# Add the release year and month as integer columns (join keys for the CPI merge)
master_df['y'], master_df['m'] = master_df['release_date'].dt.year, master_df['release_date'].dt.month

In [6]:
# Add the CPI year and month as integer columns (join keys matching master_df)
df_cpi['y'], df_cpi['m'] = df_cpi['Date'].dt.year, df_cpi['Date'].dt.month

In [7]:
# Left-join the CPI onto each movie by matching its release year and month to the CPI year and month
master_df = master_df.merge(df_cpi, how='left', on=['y', 'm'])
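The year/month join can be checked on toy data. In this sketch `movies` and `cpi` are hypothetical stand-ins for `master_df` and `df_cpi`, and the titles and CPI values are invented; the two release dates are the oldest and newest ones listed above.

```python
import pandas as pd

# Toy stand-ins for master_df and df_cpi; titles and CPI values are invented.
movies = pd.DataFrame({
    'title': ['A', 'B'],
    'release_date': pd.to_datetime(['2010-01-08', '2020-02-21']),
})
cpi = pd.DataFrame({
    'Date': pd.to_datetime(['2010-01-31', '2020-02-29']),
    'cpi': [100.0, 120.0],
})

# Build integer year/month keys on both frames, then left-join on them so
# each movie picks up the CPI reading for its release month.
for df, col in ((movies, 'release_date'), (cpi, 'Date')):
    df['y'], df['m'] = df[col].dt.year, df[col].dt.month

merged = movies.merge(cpi, how='left', on=['y', 'm'])
print(merged[['title', 'cpi']])
```

A release on 2010-01-08 matches a CPI dated 2010-01-31 even though the days differ, because only year and month are compared. A single `dt.to_period('M')` column on each side would serve the same purpose as the two integer keys.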

In [8]:
# Add the inflation-adjusted budget column, using cur_cpi as the current CPI value (2022-12-31)
cur_cpi = df_cpi['cpi'].iloc[-1]    # the most recent CPI in df_cpi, i.e. the value for 2022-12-31
master_df['inf_adj_production_budget'] = ((master_df['production_budget'] * cur_cpi) / master_df['cpi']).astype(int)
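`.astype(int)` raises a `ValueError` if any movie found no CPI match in the left join, because the missing `cpi` becomes NaN. With release dates between 2010-01 and 2020-02 that should not happen here, but a defensive variant may be useful if the date range changes. This sketch uses invented budgets, invented CPI values, and a hypothetical `cur_cpi`. It rounds to the nearest dollar instead of truncating, and it stores the result in pandas' nullable `Int64` dtype.

```python
import pandas as pd

# Invented budgets and CPI readings; the last row has no matching CPI month.
df = pd.DataFrame({
    'production_budget': [1_000_000, 2_000_000, 3_000_000],
    'cpi': [200.0, 250.0, float('nan')],
})
cur_cpi = 300.0  # hypothetical "current" CPI

adjusted = df['production_budget'] * cur_cpi / df['cpi']

# Round to whole dollars, then use the nullable Int64 dtype so unmatched
# rows become <NA> instead of making the conversion fail.
df['inf_adj_production_budget'] = adjusted.round().astype('Int64')
print(df)
```

Checking `master_df['cpi'].isna().sum()` before the conversion is a cheap way to confirm that every movie matched a CPI month.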

In [9]:
# Drop the helper join keys and the raw CPI columns, which are no longer needed
master_df.drop(columns=['y', 'm', 'Date', 'cpi'], inplace=True)

In [10]:
# Reorder the columns so the inflation-adjusted budget sits directly after production_budget
corrected_col_order = master_df.columns.tolist()
new_col = corrected_col_order.pop(-1)    # 'inf_adj_production_budget' was appended as the last column
corrected_col_order.insert(corrected_col_order.index('production_budget') + 1, new_col)
master_df = master_df[corrected_col_order]

#### Saving the dataset with the inflation-adjusted budget to a pickle file

In [11]:
master_df.to_pickle('../plotting_data/movie_master_dataset_with_inflation.pkl')
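To confirm that the pickle step preserves the data, a round trip can be tested on a small stand-in frame written to a temporary directory. The frame and its values here are hypothetical; in the notebook the frame would be `master_df`.

```python
import os
import tempfile

import pandas as pd

# Small hypothetical stand-in for master_df.
df = pd.DataFrame({'production_budget': [1_000_000],
                   'inf_adj_production_budget': [1_500_000]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'movie_master_dataset_with_inflation.pkl')
    df.to_pickle(path)
    reloaded = pd.read_pickle(path)

# Raises AssertionError if values, column order, or dtypes changed.
pd.testing.assert_frame_equal(df, reloaded)
```

Only load pickle files from trusted sources, since unpickling can execute arbitrary code.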