Key:
* O = Original 
* A = Adaptation 
* S = Series (Original) 
* R = Reboot/Remake 
* N = Non-Fiction

Notes:
* Musical anthology features like Make Mine Music and Melody Time are labeled N (there’s no story for them to be based on), but Fantasia is labeled A since it is meant to be The Sorcerer’s Apprentice
* Noelle classified as ???? due to its being based on Santa
* Coco classified as original despite being based on Day of the Dead (original story and characters)
* Some cite a connection between A Bug’s Life and the Aesop fable “The Ant and the Grasshopper”. The inspiration seems loose to me. Admittedly, this is a much deeper look than I would have taken for a less well-known movie
* The Lion King is often cited as being based on Hamlet, but the filmmakers have also said that during production they wanted to lean more in the direction of Joseph in the Bible, among other figures. This reads more as “common tropes from classic stories” than as an adaptation of one story, so I’ve marked it as a very tentative (possibly controversial) original story
* Christmas movies that use Santa? Currently A* because they use a preexisting character, but I could see an argument for them being original based on other rulings I’ve made (use of folktale figures usually doesn’t make something an adaptation when it’s compiling many different unrelated ones, e.g. Tall Tale (1995))

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

In [None]:
df = pd.read_csv('../Disney/disney_movies_halfannot.csv')
df['Branch'].unique()

array(['Standard', 'Touchstone', 'Hollywood Pictures',
       'Disney Movietoons', 'Hollywood', 'Pixar', 'Walt Disney Studios',
       'Intl', 'Disneynature', 'Miramax', 'Touchstone/Miramax',
       'Touchstone/DreamWorks', 'Marvel', 'Touchstone/Lucasfilm',
       'Lucasfilm', '20th Century Animation', '20th Century Studios'],
      dtype=object)

## Pixar/Standard Originality

Currently, Hollywood Pictures and Touchstone movies aren't completely labeled. 
These visualizations rely on Standard (Walt Disney Pictures) and Pixar, which together represent what a "Disney movie" is usually thought of as. This also excludes recently bought labels like Marvel and Lucasfilm.
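To see how complete the labels are for each branch, a check along these lines could help. This is only a sketch on a made-up toy frame (`toy_df` and `label_coverage` are hypothetical names, and the rows are not from the real CSV); it assumes, as in the cells below, that unlabeled movies have a missing value in the `Original?` column.

```python
import pandas as pd

# Toy stand-in for the real CSV; these rows are invented for illustration only.
toy_df = pd.DataFrame({
    'Branch': ['Standard', 'Standard', 'Pixar', 'Touchstone', 'Touchstone'],
    'Original?': ['O', 'A', 'O', None, 'S'],
})

# Share of rows per branch that have a non-missing 'Original?' label.
label_coverage = toy_df.groupby('Branch')['Original?'].apply(lambda s: s.notna().mean())
print(label_coverage)
```

Running the same `groupby` on `df` would show which branches are labeled well enough to include in the charts.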

In [None]:
# Pixar + Standard dataframe: drop the index column, keep only labeled rows
px_st_df = df.drop(columns='i')
px_st_df = px_st_df[px_st_df['Original?'].notna()]
px_st_df = px_st_df[px_st_df['Branch'].isin(['Standard', 'Pixar'])]

# Collapse the detailed labels into four broad categories.
# Non-fiction (N) is counted as Original; A* and AS fold into Adapt and Sequel.
simplified_labels = {'O': 'Original', 'S': 'Sequel', 'N': 'Original', 'A': 'Adapt', 'A*': 'Adapt', 'AS': 'Sequel', 'R': 'Reboot'}
px_st_df['OG?'] = px_st_df['Original?'].map(simplified_labels)

In [None]:
# Count movies per (Year, label), then pivot so each label becomes a column
px_st_originality_by_year = px_st_df.groupby(['Year','Original?']).size().reset_index(name='count')
px_st_originality_by_year = px_st_originality_by_year.pivot(index='Year', columns='Original?', values='count')

# Fix the column order so the stacked bars and legend stay consistent
og_order = ['O', 'S', 'N', 'A', 'A*', 'AS', 'R']
px_st_originality_by_year = px_st_originality_by_year[og_order]

In [None]:
# Same counts as above, but using the simplified 'OG?' categories
px_st_simplified_by_year = px_st_df.groupby(['Year','OG?']).size().reset_index(name='count')
px_st_simplified_by_year = px_st_simplified_by_year.pivot(index='Year', columns='OG?', values='count')

# Fix the column order so the stacked bars and legend stay consistent
simplified_og_order = ['Original', 'Adapt', 'Reboot', 'Sequel']
px_st_simplified_by_year = px_st_simplified_by_year[simplified_og_order]

In [None]:
fig_originality_by_year = px.bar(px_st_originality_by_year, title="Originality by Year", labels={'value':'Movie Count'})
fig_originality_by_year.show()

In [None]:
fig_simp_originality_by_year = px.bar(px_st_simplified_by_year, title="Originality Over Time (Simplified)", labels={'value':"Movies Produced"})
fig_simp_originality_by_year.show()

In [None]:
simplified_freq_by_year = px_st_df.groupby(['Year','OG?']).size().reset_index(name='count')

total_counts = simplified_freq_by_year.groupby('Year')['count'].transform('sum')
simplified_freq_by_year['relative_frequency'] = simplified_freq_by_year['count'] / total_counts

simplified_freq_by_year = simplified_freq_by_year.pivot(index='Year', columns='OG?', values='relative_frequency')

simplified_og_order = ['Original', 'Adapt', 'Sequel', 'Reboot']
simplified_freq_by_year = simplified_freq_by_year[simplified_og_order]
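
The per-year proportions computed above can also be produced in one step with `pd.crosstab` and `normalize='index'`. The sketch below uses a made-up toy frame (`toy_df`, `category_order`, and `toy_freq` are hypothetical names, not part of this notebook). One difference from the pivot approach: a category with no movies in a given year comes out as 0 here rather than NaN.

```python
import pandas as pd

# Toy stand-in for px_st_df; these rows are invented for illustration only.
toy_df = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2001, 2001],
    'OG?': ['Original', 'Adapt', 'Adapt', 'Sequel', 'Original'],
})

category_order = ['Original', 'Adapt', 'Sequel', 'Reboot']

# normalize='index' divides each row (year) by its total, giving proportions.
# reindex adds any category missing from the data as a column of zeros.
toy_freq = (
    pd.crosstab(toy_df['Year'], toy_df['OG?'], normalize='index')
    .reindex(columns=category_order, fill_value=0.0)
)
print(toy_freq)
```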

In [None]:
# Show hover values as percentages with two decimal places
hovertemplate = '%{value:.2%}'
fig_simp_freq_by_year = px.bar(simplified_freq_by_year, color_discrete_sequence=px.colors.qualitative.T10, 
             title="Originality Proportions Over Time", labels={'value':'Proportion of Movies Produced'})
fig_simp_freq_by_year.update_traces(hovertemplate=hovertemplate)
fig_simp_freq_by_year.show()

## Branch Investment

In [None]:
branch_df = df.drop(columns='i')
#branch_df = branch_df[branch_df['Original?'].notna()]
branch_df = branch_df[branch_df['Branch'].isin(['Standard', 'Pixar', 'Lucasfilm', 'Marvel', 'Touchstone', 'Hollywood Pictures'])]

branch_by_year = branch_df.groupby(['Year','Branch']).size().reset_index(name='count')
branch_by_year = branch_by_year.pivot(index='Year', columns='Branch', values='count')

branch_order = ['Standard', 'Pixar', 'Marvel', 'Lucasfilm', 'Touchstone', 'Hollywood Pictures']
branch_by_year = branch_by_year[branch_order]

In [None]:
fig_branch_by_year = px.bar(branch_by_year, title="Movies Produced Per Label Over Time", labels={'value':"Movie Count"})
fig_branch_by_year.show()

In [None]:
# Export charts (uncomment to write standalone HTML files)

# fig_simp_freq_by_year.write_html("simp_freq_chart.html")
# fig_originality_by_year.write_html("origin_chart.html")
# fig_simp_originality_by_year.write_html("simp_origin_chart.html")
# fig_branch_by_year.write_html("branch_chart.html")