# Milestone 2: Project proposal and initial analysis

<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #E8F5E9; color: #333;">
    <strong>Project:</strong> Decoding Box-Office Bombs 💣
    <br>
    <strong>Team:</strong> ADAdventurers2024
</div>

To reproduce our dataset, download the data listed in the [README](https://github.com/epfl-ada/ada-2024-project-adaventurers2024/blob/main/README.md). Then, from the `scripts` folder, run the following script:

```bash
python preprocess_data.py
```

The script generates five files in the `data` folder:

- `cmu_tmdb.csv`: A merged dataset from CMU and TMDB, containing movie information such as revenue, budget, and other details.
- `movie_tropes.csv`: Tropes associated with each movie in the IMDb dataset, which serves as an intermediary file for merging tropes with the CMU dataset.
- `cmu_tropes.csv`: Tropes associated with each movie in the CMU dataset.
- `movie_actors.csv`: Actors linked to each movie in the CMU dataset.
- `movie_directors_actors.csv`: Directors and actors linked to each movie in the IMDb dataset.
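
Before moving on, it can help to confirm that the script produced every file. The following is a minimal sketch, assuming the notebook runs from the repository root so that `data` resolves correctly; `missing_outputs` is a hypothetical helper and not part of the project code.

```python
from pathlib import Path

# File names come from the list above; the `data` folder location is an assumption.
EXPECTED_FILES = [
    "cmu_tmdb.csv",
    "movie_tropes.csv",
    "cmu_tropes.csv",
    "movie_actors.csv",
    "movie_directors_actors.csv",
]


def missing_outputs(data_dir="data"):
    """Return the expected file names that are absent from `data_dir`."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


missing = missing_outputs("data")
print("All files present" if not missing else f"Missing: {missing}")
```
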

You can now proceed with exploratory data analysis and initial assessments.

------

## Exploratory data analysis

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
DATA_PATH = 'data'

#### CMU revenue and other metrics

In [None]:
df_cmu_tmdb = pd.read_csv(f'{DATA_PATH}/cmu_tmdb.csv')
df_cmu_tmdb.head()

In [None]:
df_cmu_tmdb.info()

#### CMU cast and crew

In [None]:
df_movie_actors = pd.read_csv(f'{DATA_PATH}/movie_actors.csv')
df_movie_actors.head()

In [None]:
df_movie_actors.info()

In [None]:
df_movie_directors_actors = pd.read_csv(f'{DATA_PATH}/movie_directors_actors.csv')
df_movie_directors_actors.head()

In [None]:
df_movie_directors_actors.info()

#### CMU tropes

In [None]:
df_cmu_tropes = pd.read_csv(f"{DATA_PATH}/cmu_tropes.csv")
df_cmu_tropes.head()

In [None]:
df_cmu_tropes.info()

--------

## Research questions

In [None]:
import ast
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Render plots inline in the notebook
%matplotlib inline

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# Set visualization style
sns.set(style='whitegrid', palette='muted', font_scale=1.2)

<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #E8F5E9; color: #333;">
    <strong>📊 Metrics & Performance</strong> 
</div>


1. What metrics (e.g., low ratings, limited number of ratings, revenue vs budget) best indicate movie failure?
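
As an illustration of the candidate metrics named in this question, the sketch below derives a return on investment and three simple failure flags. It runs on toy rows only; the column names (`revenue`, `budget`, `vote_average`, `vote_count`), the thresholds, and the helper `add_failure_metrics` are assumptions for illustration, not the project's actual implementation.

```python
import pandas as pd

# Toy rows for illustration only; the column names are assumptions
# about what cmu_tmdb.csv contains.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [100.0, 50.0, 0.0, 20.0],
    "revenue": [40.0, 150.0, 10.0, 20.0],
    "vote_average": [4.2, 7.5, 3.0, 6.1],
    "vote_count": [1200, 5000, 15, 300],
})


def add_failure_metrics(df, rating_cutoff=5.0, min_votes=100):
    """Add ROI and boolean failure flags (hypothetical thresholds)."""
    out = df.copy()
    # A zero budget usually means "unknown", so treat it as missing.
    budget = out["budget"].where(out["budget"] > 0)
    out["roi"] = (out["revenue"] - budget) / budget
    out["lost_money"] = out["roi"] < 0  # a missing ROI compares as False
    out["low_rating"] = out["vote_average"] < rating_cutoff
    out["few_votes"] = out["vote_count"] < min_votes
    return out


flagged = add_failure_metrics(movies)
print(flagged[["title", "roi", "lost_money", "low_rating", "few_votes"]])
```

Keeping the flags separate, rather than collapsing them into one score, makes it possible to check how often they agree before picking a definition of failure.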


In [None]:
# Run the metric analysis on the merged CMU/TMDB dataset
from src.utils.metric_analysis import metric_analysis

metric_analysis('data/cmu_tmdb.csv')

1.1 What we have done for the initial analysis:


1.2 Key observations:


<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #E8F5E9; color: #333;">
    <strong>👥 Cast & Crew Analysis</strong> 
</div>


2. How do actor demographics and lack of diversity impact audience disengagement and contribute to box office underperformance?
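
One way to quantify cast diversity is the Shannon entropy of a demographic attribute within each movie. The sketch below is a minimal example on toy data; the columns `movie_id` and `ethnicity` and the helper `shannon_entropy` are assumptions, and entropy is only one of several possible diversity measures.

```python
import numpy as np
import pandas as pd

# Toy cast table; the column names are assumptions about movie_actors.csv.
cast = pd.DataFrame({
    "movie_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "ethnicity": ["a", "a", "a", "a", "a", "b", "c", "d"],
})


def shannon_entropy(labels):
    """Shannon entropy in bits; 0 means every cast member shares one label."""
    # value_counts drops missing labels, so every probability is > 0.
    p = labels.value_counts(normalize=True).to_numpy()
    return float((p * np.log2(1.0 / p)).sum())


diversity = cast.groupby("movie_id")["ethnicity"].apply(shannon_entropy)
print(diversity)
```

A per-movie score like this can then be joined to a failure metric to test whether low diversity goes along with underperformance.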

In [None]:
# Run the actor demographics analysis
from src.utils.actor_analysis import actor_analysis

actor_analysis('data/movie_actors.csv', 'data/wikidata_ethnicities.csv')

2.1 What we have done for the initial analysis:


2.2 Key observations:

3. Is thematic consistency in director filmographies a predictor of movie failure?
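
A simple proxy for thematic consistency is the share of a director's films that fall in their most frequent genre. The sketch below uses toy data; the columns `director` and `genre` and the helper `dominant_genre_share` are assumptions, and the project may define consistency differently.

```python
import pandas as pd

# Toy filmography with one row per (film, genre); column names are assumptions.
films = pd.DataFrame({
    "director": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "genre": ["Horror", "Horror", "Horror", "Comedy",
              "Drama", "Comedy", "Action", "Horror"],
})


def dominant_genre_share(genres):
    """Fraction of a director's films in their most frequent genre."""
    return float(genres.value_counts(normalize=True).max())


consistency = films.groupby("director")["genre"].apply(dominant_genre_share)
print(consistency)
```

Values near 1 indicate a director who stays within one genre, while values near `1 / number_of_genres` indicate a varied filmography.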

In [None]:
# Run the director filmography analysis
from src.utils.director_analysis import director_analysis

director_analysis('data/movie_directors_actors.csv')

3.1 What we have done for the initial analysis:


3.2 Key observations:

<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #E8F5E9; color: #333;">
    <strong>🎬 Genre & Market Factors</strong> 
</div>

4. How does genre choice influence a movie's failure, particularly in different cultural contexts?
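
A possible starting point is a failure rate per genre and country. The sketch below assumes that genres are stored as stringified lists and that a boolean failure flag already exists; the columns `country`, `genres`, and `is_failure` are hypothetical and the rows are toy data.

```python
import ast

import pandas as pd

# Toy rows; column names and the stringified-list format are assumptions.
movies = pd.DataFrame({
    "country": ["US", "US", "FR", "FR"],
    "genres": ["['Action', 'Comedy']", "['Comedy']",
               "['Comedy']", "['Drama', 'Comedy']"],
    "is_failure": [True, False, True, True],
})

# Parse the stringified lists, then give each (movie, genre) pair its own row.
movies["genres"] = movies["genres"].apply(ast.literal_eval)
per_genre = movies.explode("genres")

# Mean of a boolean flag is the failure rate; countries become columns.
failure_rate = (
    per_genre.groupby(["country", "genres"])["is_failure"]
    .mean()
    .unstack("country")
)
print(failure_rate)
```

Missing cells in the resulting table mark genre and country combinations with no movies, which matters when comparing rates across small markets.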

In [None]:
## code

4.1 What we have done for the initial analysis:


4.2 Key observations:

5. How does poor release timing (e.g., season, holiday periods) affect a movie's likelihood of failing?
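
Release timing can be explored by mapping each release date to a season and comparing failure rates. The sketch below uses toy data; the columns `release_date` and `is_failure` are assumptions, and the season mapping follows Northern Hemisphere meteorological seasons, which may not suit every market.

```python
import pandas as pd

# Toy rows; column names are assumptions about cmu_tmdb.csv.
movies = pd.DataFrame({
    "release_date": ["2010-07-16", "2012-01-06", "2015-12-18", "2011-10-07", None],
    "is_failure": [False, True, False, True, True],
})

# Northern Hemisphere meteorological seasons keyed by month number.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Unparseable or missing dates become NaT and end up without a season.
dates = pd.to_datetime(movies["release_date"], errors="coerce")
movies["season"] = dates.dt.month.map(SEASON_BY_MONTH.get)

# Rows without a season are dropped by groupby.
failure_by_season = movies.groupby("season")["is_failure"].mean()
print(failure_by_season)
```
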

In [None]:
## code

5.1 What we have done for the initial analysis:


5.2 Key observations:

<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #E8F5E9; color: #333;">
    <strong>📖 Narrative & Thematic Elements</strong> 
</div>

6. Which tropes consistently lead to negative reception by genre?
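
To look for tropes tied to poor reception within a genre, one option is to average ratings per genre and trope pair and ignore pairs seen too rarely. The sketch below uses toy data; the columns `genre`, `trope`, and `vote_average`, the `min_movies` threshold, and the helper `worst_tropes` are assumptions.

```python
import pandas as pd

# Toy rows with one row per (movie, genre, trope); column names are assumptions.
trope_rows = pd.DataFrame({
    "genre": ["Horror", "Horror", "Horror", "Horror", "Comedy", "Comedy"],
    "trope": ["JumpScare", "JumpScare", "FinalGirl", "FinalGirl",
              "JumpScare", "Slapstick"],
    "vote_average": [4.0, 5.0, 7.0, 6.0, 6.5, 5.5],
})


def worst_tropes(df, min_movies=2):
    """Mean rating per (genre, trope), lowest first within each genre."""
    stats = (
        df.groupby(["genre", "trope"])["vote_average"]
        .agg(mean_rating="mean", n_movies="count")
        .reset_index()
    )
    # Drop rare pairs whose mean rating would be too noisy.
    stats = stats[stats["n_movies"] >= min_movies]
    return stats.sort_values(["genre", "mean_rating"]).reset_index(drop=True)


print(worst_tropes(trope_rows))
```
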

In [None]:
## code

6.1 What we have done for the initial analysis:


6.2 Key observations:

7. What recurring plot patterns appear most frequently in critically panned films?
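
One way to approach this question is to compare how often each pattern appears in panned films versus all other films. The sketch below uses toy data and treats a pattern as a label attached to a movie; the columns `movie_id`, `pattern`, and `vote_average`, the rating cutoff, and the helper `pattern_prevalence` are assumptions.

```python
import numpy as np
import pandas as pd

# Toy rows with one row per (movie, pattern); column names are assumptions.
rows = pd.DataFrame({
    "movie_id": [1, 1, 2, 2, 3, 3, 4],
    "pattern": ["AmnesiaPlot", "LoveTriangle", "AmnesiaPlot", "ChosenOne",
                "LoveTriangle", "ChosenOne", "ChosenOne"],
    "vote_average": [3.5, 3.5, 4.0, 4.0, 7.2, 7.2, 8.0],
})


def pattern_prevalence(df, panned_below=5.0):
    """Share of panned and other movies that contain each pattern."""
    group = np.where(df["vote_average"] < panned_below, "panned", "other")
    df = df.assign(group=group)
    # Number of distinct movies in each group (the denominators).
    movies_per_group = df.groupby("group")["movie_id"].nunique()
    # Number of distinct movies per pattern in each group (the numerators).
    counts = (
        df.groupby(["pattern", "group"])["movie_id"]
        .nunique()
        .unstack("group", fill_value=0)
    )
    return counts.div(movies_per_group, axis="columns")


share = pattern_prevalence(rows)
print(share)
```

Patterns whose `panned` share clearly exceeds their `other` share are candidates for closer inspection; with real data, a minimum movie count per pattern would help avoid reading too much into rare cases.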

In [None]:
# code

7.1 What we have done for the initial analysis:


7.2 Key observations: