# TMDB Movie Data Analysis

This notebook documents how movie data from the TMDB API is fetched, cleaned, and analyzed. Helper functions in the `src` directory keep the notebook itself short and modular.

## Objectives
1. **Data Extraction**: Fetch specific movies from TMDB.
2. **Data Cleaning**: Process the raw JSON data into a structured DataFrame.
3. **KPI Implementation**: Calculate and rank movies by various metrics.
4. **Advanced Analysis**: Analyze franchises, directors, and specific queries.
5. **Visualization**: Visualize trends and comparisons.

In [None]:
import sys
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display

# Add src directory to path to import modules
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

from analysis import load_processed_data, analyze_movies, plot_data

## 1. Load Processed Data

The data was fetched with `src/fetch_data.py` and cleaned with `src/process_data.py`. The cell below loads the resulting CSV.

In [None]:
df = load_processed_data("../data/processed/movies_cleaned.csv")
display(df.head())
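
Before ranking anything, it can help to confirm that the loaded frame has the expected types and to see where values are missing. The sketch below runs on a tiny stand-in frame rather than the real file; the column names (`title`, `budget`, `revenue`, `release_date`) are assumptions and may not match `movies_cleaned.csv`.

```python
import pandas as pd

# Stand-in for the loaded data; these column names are hypothetical.
sample = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "budget": [100.0, None, 50.0],
        "revenue": [300.0, 120.0, None],
        "release_date": ["2019-04-24", "2015-12-15", "not a date"],
    }
)

# Parse dates; unparseable values become NaT instead of raising.
sample["release_date"] = pd.to_datetime(sample["release_date"], errors="coerce")

# Count missing values per column to see where cleaning left gaps.
missing_counts = sample.isna().sum()

print(sample.dtypes)
print(missing_counts)
```

The same two checks (`dtypes` and `isna().sum()`) can be applied directly to `df` once the real column names are known.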

## 2. Analysis and Rankings

`analyze_movies` runs the rankings, filtered queries, and franchise and director analysis in a single call. It returns three objects: `franchise_stats`, `franchise_df`, and `director_df`.

In [None]:
franchise_stats, franchise_df, director_df = analyze_movies(df)
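
As a rough illustration of the kind of ranking and franchise comparison described above, the following sketch computes ROI and compares franchise films with standalone films on a small made-up frame. It does not reproduce `analyze_movies`, whose internals are not shown in this notebook; the column names (`budget_musd`, `revenue_musd`, `belongs_to_collection`), the ROI definition, and the sample values are all assumptions.

```python
import pandas as pd

# Made-up sample rows; column names are hypothetical.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "budget_musd": [100.0, 200.0, 50.0, 10.0],
        "revenue_musd": [400.0, 300.0, 50.0, 45.0],
        "belongs_to_collection": ["Saga X", "Saga X", None, None],
    }
)

# ROI is defined here as revenue divided by budget (an assumption).
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

# Rank movies by ROI, highest first.
top_by_roi = movies.sort_values("roi", ascending=False)[["title", "roi"]]

# A movie counts as a franchise entry when it belongs to a collection.
movies["group"] = movies["belongs_to_collection"].notna().map(
    {True: "franchise", False: "standalone"}
)

# Compare mean revenue and mean ROI between the two groups.
comparison = movies.groupby("group")[["revenue_musd", "roi"]].mean()

print(top_by_roi)
print(comparison)
```

Using the mean keeps the sketch short; with skewed box office data, the median may be a more robust choice.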

## 3. Visualizations

The plots below are loaded from PNG files saved in `../data/processed/`.

### Revenue vs Budget

In [None]:
display(Image(filename='../data/processed/revenue_vs_budget.png'))
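
The cell above only loads a saved PNG. For readers who want to see how a scatter plot of this kind could be drawn, here is a minimal matplotlib sketch on made-up values. It is not the code that produced `revenue_vs_budget.png`; the column names and the break-even reference line are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values in millions of USD; column names are hypothetical.
movies = pd.DataFrame(
    {
        "budget_musd": [10.0, 50.0, 100.0, 200.0],
        "revenue_musd": [45.0, 50.0, 400.0, 300.0],
    }
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(movies["budget_musd"], movies["revenue_musd"], alpha=0.7)

# Break-even reference: points above this line earned more than they cost.
limit = movies[["budget_musd", "revenue_musd"]].max().max()
ax.plot([0, limit], [0, limit], linestyle="--", color="gray", label="Break-even")

ax.set_xlabel("Budget (million USD)")
ax.set_ylabel("Revenue (million USD)")
ax.set_title("Revenue vs Budget")
ax.legend()
fig.tight_layout()
```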

### ROI Distribution by Genre

In [None]:
display(Image(filename='../data/processed/roi_by_genre.png'))
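
A per-genre ROI summary needs one row per movie and genre pair, because a movie can carry several genres. The sketch below shows one way to get there with `explode`. It assumes a hypothetical `genres` column holding pipe-separated names and a precomputed `roi` column; the actual layout of the cleaned data may differ.

```python
import pandas as pd

# Made-up rows; the pipe-separated "genres" format is an assumption.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "genres": ["Action|Adventure", "Action", "Drama"],
        "roi": [4.0, 2.0, 1.0],
    }
)

# Split the genre string into a list, then emit one row per genre.
per_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Median ROI per genre, highest first.
roi_by_genre = per_genre.groupby("genre")["roi"].median().sort_values(ascending=False)

print(roi_by_genre)
```

Note that a multi-genre movie contributes to every genre it lists, so the per-genre groups overlap.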

### Popularity vs Rating

In [None]:
display(Image(filename='../data/processed/popularity_vs_rating.png'))

### Franchise vs Standalone Comparison

In [None]:
display(Image(filename='../data/processed/franchise_vs_standalone.png'))

### Yearly Trends in Box Office Revenue

In [None]:
display(Image(filename='../data/processed/yearly_trends.png'))
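
A yearly revenue trend of this kind can be derived by grouping on the release year. The following is a minimal sketch on made-up rows, assuming hypothetical `release_date` and `revenue_musd` columns; it is not the code behind `yearly_trends.png`.

```python
import pandas as pd

# Made-up rows; column names are hypothetical.
movies = pd.DataFrame(
    {
        "release_date": pd.to_datetime(["2018-05-01", "2018-11-01", "2019-03-01"]),
        "revenue_musd": [100.0, 50.0, 200.0],
    }
)

# Total revenue per release year.
yearly = movies.groupby(movies["release_date"].dt.year)["revenue_musd"].sum()

print(yearly)
```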