# TMDB Movie KPI Analysis & Performance Metrics

This notebook computes key performance indicators (KPIs) on the cleaned TMDB movie data to identify the best and worst performers across financial, popularity, and rating metrics.

## Objectives
1. Identify best/worst performing movies by revenue, budget, profit, ROI
2. Find most voted, highest/lowest rated, and most popular movies
3. Execute advanced search queries (Sci-Fi Action movies with Bruce Willis; Uma Thurman movies directed by Quentin Tarantino)
4. Compare franchise vs standalone movie performance
5. Identify most successful franchises and directors

## Setup

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path
import pandas as pd
import numpy as np

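# Assumes the notebook runs one directory below the project root: add the root
# to sys.path so `src` is importable, and chdir there so relative paths such as
# 'config/config.yaml' resolve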
project_root = Path.cwd().parent
sys.path.append(str(project_root))
os.chdir(str(project_root))

from src.analytics.kpi_calculator import *
from src.analytics.filters import *
from src.analytics.aggregators import *
from src.utils.helpers import load_config, setup_logging

# Setup logger for notebook
logger = setup_logging(module_name='kpi_notebook')
logger.info("✓ Imports successful")

## 1. Load Configuration and Data

Load the cleaned data from Step 2 (Data Cleaning & Preprocessing).

In [None]:
# Load configuration
config = load_config('config/config.yaml')
processed_path = Path(config['paths']['processed_data'])

# Load cleaned data from Step 2
df = pd.read_parquet(processed_path / 'movies_cleaned.parquet')

logger.info(f"Loaded {len(df)} movies")
logger.info(f"Columns: {list(df.columns)}")
logger.info(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

# Display first few rows
df.head()

## 2. Best/Worst Performing Movies - Revenue

Identify movies with the highest and lowest revenue.
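The ranking helpers imported from `src.analytics.kpi_calculator` are not shown in this notebook. As a minimal sketch of how such a top/bottom ranking could work, assuming a numeric `revenue` column, the hypothetical helper `rank_movies` drops missing values and then uses pandas `nlargest`/`nsmallest`:

```python
import pandas as pd


def rank_movies(df: pd.DataFrame, column: str, top_n: int = 10,
                ascending: bool = False) -> pd.DataFrame:
    """Hypothetical helper: return the top_n rows ranked by `column`.

    ascending=False gives the highest values, ascending=True the lowest.
    Rows with a missing value in `column` are dropped before ranking.
    """
    ranked = df.dropna(subset=[column])
    if ascending:
        return ranked.nsmallest(top_n, column)
    return ranked.nlargest(top_n, column)


# Toy data for illustration only; the revenue of "B" is unknown.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "revenue": [500.0, None, 120.0, 900.0],
})
print(rank_movies(movies, "revenue", top_n=2)["title"].tolist())                  # ['D', 'A']
print(rank_movies(movies, "revenue", top_n=2, ascending=True)["title"].tolist())  # ['C', 'A']
```

Dropping missing values explicitly makes the intent visible for the bottom rankings. If unknown revenue were instead left as `0` (TMDB's placeholder for unknown figures), those rows would also need to be filtered out, or they would appear as the worst performers.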

In [None]:
# Highest revenue movies
logger.info("="*60)
logger.info("TOP 10 MOVIES BY REVENUE")
logger.info("="*60)

top_revenue = get_top_by_revenue(df, top_n=10)
top_revenue

In [None]:
# Lowest revenue movies
logger.info("\n" + "="*60)
logger.info("BOTTOM 10 MOVIES BY REVENUE")
logger.info("="*60)

bottom_revenue = get_bottom_by_revenue(df, top_n=10)
bottom_revenue

## 3. Best/Worst Performing Movies - Budget

Identify movies with the highest and lowest production budgets.

In [None]:
# Highest budget movies
logger.info("="*60)
logger.info("TOP 10 MOVIES BY BUDGET")
logger.info("="*60)

top_budget = get_top_by_budget(df, top_n=10)
top_budget

In [None]:
# Lowest budget movies
logger.info("\n" + "="*60)
logger.info("BOTTOM 10 MOVIES BY BUDGET")
logger.info("="*60)

bottom_budget = get_bottom_by_budget(df, top_n=10)
bottom_budget

## 4. Best/Worst Performing Movies - Profit

Identify movies with the highest and lowest profit (Revenue - Budget).
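A minimal sketch of the profit calculation on toy data, assuming hypothetical `budget` and `revenue` columns in USD (the actual column names used by `get_top_by_profit` are not shown here):

```python
import pandas as pd

# Toy data; `budget` and `revenue` are hypothetical column names (USD).
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100.0, 50.0, None],
    "revenue": [250.0, 30.0, 80.0],
})

# Profit = Revenue - Budget. Arithmetic with NaN yields NaN, so a movie
# with an unknown budget (here "C") gets an unknown profit, not a fake one.
movies["profit"] = movies["revenue"] - movies["budget"]

known = movies.dropna(subset=["profit"])
top_profit = known.nlargest(10, "profit")      # biggest earners first
bottom_profit = known.nsmallest(10, "profit")  # biggest losses first
print(top_profit[["title", "profit"]])
```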

In [None]:
# Highest profit movies
logger.info("="*60)
logger.info("TOP 10 MOVIES BY PROFIT")
logger.info("="*60)

top_profit = get_top_by_profit(df, top_n=10)
top_profit

In [None]:
# Lowest profit movies
logger.info("\n" + "="*60)
logger.info("BOTTOM 10 MOVIES BY PROFIT")
logger.info("="*60)

bottom_profit = get_bottom_by_profit(df, top_n=10)
bottom_profit

## 5. Best/Worst Performing Movies - ROI

Calculate Return on Investment (ROI), restricted to movies with a budget of at least $10M so that very small budgets do not produce extreme ratios.

ROI = (Revenue - Budget) / Budget × 100
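A worked sketch of this formula on toy data shows why the budget floor matters. The `budget`/`revenue` column names and the `MIN_BUDGET` constant are assumptions for illustration, not taken from `src.analytics`. Without the floor, the "Small" movie (1M budget, 30M revenue) would report an ROI of (30 - 1) / 1 × 100 = 2,900% and dominate the ranking:

```python
import pandas as pd

MIN_BUDGET = 10_000_000  # 10M USD floor from the text (hypothetical constant)

# Toy data; figures in USD.
movies = pd.DataFrame({
    "title": ["Small", "Hit", "Flop"],
    "budget": [1_000_000, 20_000_000, 50_000_000],
    "revenue": [30_000_000, 100_000_000, 25_000_000],
})

# Apply the budget floor first, then ROI = (Revenue - Budget) / Budget * 100.
eligible = movies[movies["budget"] >= MIN_BUDGET].copy()
eligible["roi"] = (eligible["revenue"] - eligible["budget"]) / eligible["budget"] * 100

print(eligible.sort_values("roi", ascending=False)[["title", "roi"]])
# Hit: (100M - 20M) / 20M * 100 = 400.0; Flop: (25M - 50M) / 50M * 100 = -50.0
```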

In [None]:
# Highest ROI (budget >= $10M)
logger.info("="*60)
logger.info("TOP 10 MOVIES BY ROI (Budget >= $10M)")
logger.info("="*60)

top_roi = get_top_by_roi(df, top_n=10)
top_roi

In [None]:
# Lowest ROI (budget >= $10M)
logger.info("\n" + "="*60)
logger.info("BOTTOM 10 MOVIES BY ROI (Budget >= $10M)")
logger.info("="*60)

bottom_roi = get_bottom_by_roi(df, top_n=10)
bottom_roi

## 6. Most Voted and Most Popular Movies

Identify movies with the highest number of votes and highest popularity scores.

In [None]:
# Most voted movies
logger.info("="*60)
logger.info("TOP 10 MOST VOTED MOVIES")
logger.info("="*60)

most_voted = get_most_voted(df, top_n=10)
most_voted

In [None]:
# Most popular movies
logger.info("\n" + "="*60)
logger.info("TOP 10 MOST POPULAR MOVIES")
logger.info("="*60)

most_popular = get_most_popular(df, top_n=10)
most_popular

## 7. Highest/Lowest Rated Movies

Only movies with at least 10 votes are considered, so that averages based on a handful of votes do not dominate the rankings.
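A minimal sketch of the vote-count filter on toy data, assuming hypothetical `vote_average` and `vote_count` columns (TMDB's field names; the cleaned data may differ). The "Obscure" movie's perfect 10.0 from two votes is excluded before ranking:

```python
import pandas as pd

MIN_VOTES = 10  # threshold stated in the text

# Toy data for illustration only.
movies = pd.DataFrame({
    "title": ["Obscure", "Classic", "Dud"],
    "vote_average": [10.0, 8.5, 3.1],
    "vote_count": [2, 15000, 40],
})

# Filter first, then rank, so low-vote outliers never enter either list.
reliable = movies[movies["vote_count"] >= MIN_VOTES]
top_rated = reliable.nlargest(10, "vote_average")
bottom_rated = reliable.nsmallest(10, "vote_average")
print(top_rated[["title", "vote_average", "vote_count"]])
```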

In [None]:
# Highest rated movies (vote_count >= 10)
logger.info("="*60)
logger.info("TOP 10 HIGHEST RATED MOVIES (Votes >= 10)")
logger.info("="*60)

top_rated = get_top_rated(df, top_n=10)
top_rated

In [None]:
# Lowest rated movies (vote_count >= 10)
logger.info("\n" + "="*60)
logger.info("BOTTOM 10 LOWEST RATED MOVIES (Votes >= 10)")
logger.info("="*60)

bottom_rated = get_bottom_rated(df, top_n=10)
bottom_rated

## 8. Advanced Movie Search Queries

Run searches that combine several criteria (genre, cast, director) and sort the matches.

### Search 1: Best-Rated Science Fiction Action Movies with Bruce Willis

Find Sci-Fi Action movies starring Bruce Willis, sorted by rating (highest to lowest).
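The implementation of `search_scifi_action_bruce_willis` is not shown. A minimal sketch of such a multi-criteria filter, assuming `genres` and `cast` are stored as "|"-separated strings (an assumption about the cleaned data) and using a hypothetical helper `has_all`:

```python
import pandas as pd


def has_all(cell, required) -> bool:
    """True if every name in `required` is an element of the pipe-separated `cell`."""
    if not isinstance(cell, str):
        return False  # a missing genre or cast list never matches
    return set(required).issubset(cell.split("|"))


# Toy data; "|"-separated `genres` and `cast` strings are an assumption.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "genres": ["Action|Science Fiction", "Action|Thriller",
               "Science Fiction|Action|Adventure", "Science Fiction|Action", None],
    "cast": ["Bruce Willis|Actor X", "Bruce Willis", "Actor Y|Bruce Willis",
             "Actor Z", "Bruce Willis"],
    "vote_average": [6.9, 7.8, 7.3, 8.0, 9.0],
})

# Both genres AND the actor must be present; then sort by rating, best first.
mask = (
    movies["genres"].apply(lambda g: has_all(g, {"Science Fiction", "Action"}))
    & movies["cast"].apply(lambda c: has_all(c, {"Bruce Willis"}))
)
search1 = movies[mask].sort_values("vote_average", ascending=False)
print(search1["title"].tolist())  # ['Movie C', 'Movie A']
```

Splitting on the separator matches whole names only, whereas a plain substring search such as `str.contains("Bruce Willis")` would also match any longer name that happens to contain that text.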

In [None]:
# Search 1: Bruce Willis in Sci-Fi Action
logger.info("="*60)
logger.info("SEARCH 1: Sci-Fi Action Movies with Bruce Willis")
logger.info("="*60)

search1_results = search_scifi_action_bruce_willis(df)
logger.info(f"Found {len(search1_results)} movies")
search1_results

### Search 2: Uma Thurman + Quentin Tarantino Movies

Find movies starring Uma Thurman and directed by Quentin Tarantino, sorted by runtime (shortest to longest).
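A minimal sketch of this query on toy data, assuming a "|"-separated `cast` column, one name per row in a `director` column, and a numeric `runtime` column in minutes (all assumptions; `search_uma_tarantino` itself is not shown):

```python
import pandas as pd

# Toy data for illustration only.
movies = pd.DataFrame({
    "title": ["Film 1", "Film 2", "Film 3", "Film 4"],
    "cast": ["Uma Thurman|Actor A", "Uma Thurman", "Actor B|Uma Thurman", "Actor C"],
    "director": ["Quentin Tarantino", "Quentin Tarantino", "Someone Else", "Quentin Tarantino"],
    "runtime": [150, 110, 98, 120],
})

# Exact cast-member match (missing cast becomes "" and never matches).
in_cast = movies["cast"].fillna("").str.split("|").apply(lambda names: "Uma Thurman" in names)
by_director = movies["director"] == "Quentin Tarantino"

search2 = movies[in_cast & by_director].sort_values("runtime")  # shortest first
print(search2[["title", "runtime"]])
```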

In [None]:
# Search 2: Uma Thurman directed by Quentin Tarantino
logger.info("\n" + "="*60)
logger.info("SEARCH 2: Uma Thurman + Quentin Tarantino Movies")
logger.info("="*60)

search2_results = search_uma_tarantino(df)
logger.info(f"Found {len(search2_results)} movies")
search2_results

## 9. Franchise vs Standalone Movie Performance

Compare movies that belong to a franchise (a TMDB collection) with standalone films.

**Metrics:**
- Mean Revenue
- Median ROI
- Mean Budget
- Mean Popularity
- Mean Rating
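In the TMDB API, `belongs_to_collection` is null for films that are not part of a collection. A minimal sketch of the comparison on toy data, assuming the cleaned data keeps the collection name in a `belongs_to_collection` column (missing for standalone films) and using hypothetical `revenue`, `budget`, `popularity`, and `vote_average` columns; the actual `compare_franchise_vs_standalone` may differ:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
movies = pd.DataFrame({
    "belongs_to_collection": ["Saga X", None, "Saga X", None, "Saga Y"],
    "revenue": [300.0, 50.0, 500.0, 70.0, 100.0],
    "budget": [100.0, 25.0, 100.0, 35.0, 50.0],
    "popularity": [40.0, 10.0, 60.0, 12.0, 20.0],
    "vote_average": [7.0, 6.0, 7.5, 6.4, 6.8],
})
movies["roi"] = (movies["revenue"] - movies["budget"]) / movies["budget"] * 100

# Label each movie by whether it has a collection, then aggregate per label.
movies["movie_type"] = np.where(movies["belongs_to_collection"].notna(), "Franchise", "Standalone")
comparison = movies.groupby("movie_type").agg(
    mean_revenue=("revenue", "mean"),
    median_roi=("roi", "median"),
    mean_budget=("budget", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(comparison)
```

Using the median for ROI keeps a few extreme ratios from dominating the comparison, while the other metrics use means.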

In [None]:
# Compare franchise vs standalone
logger.info("="*60)
logger.info("FRANCHISE VS STANDALONE COMPARISON")
logger.info("="*60)

comparison = compare_franchise_vs_standalone(df)
comparison

## 10. Most Successful Franchises

Rank movie franchises by total revenue, mean rating, and number of movies.
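The `sort_by` values used below (`total_revenue`, `mean_rating`, `movie_count`) suggest a per-franchise summary table. A minimal sketch of one, using a hypothetical helper `top_groups` and assumed column names; `get_top_franchises` itself is not shown:

```python
import pandas as pd


def top_groups(df: pd.DataFrame, group_col: str, sort_by: str = "total_revenue",
               top_n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: aggregate per group and return the top_n groups by `sort_by`."""
    summary = (
        df.dropna(subset=[group_col])  # standalone films have no collection
        .groupby(group_col)
        .agg(
            movie_count=("title", "count"),
            total_revenue=("revenue", "sum"),
            mean_rating=("vote_average", "mean"),
        )
    )
    if sort_by not in summary.columns:
        raise ValueError(f"sort_by must be one of {list(summary.columns)}")
    return summary.nlargest(top_n, sort_by)


# Toy data; "M4" is standalone and is excluded despite the highest revenue.
movies = pd.DataFrame({
    "title": ["M1", "M2", "M3", "M4"],
    "belongs_to_collection": ["Saga X", "Saga X", "Saga Y", None],
    "revenue": [300.0, 500.0, 900.0, 1000.0],
    "vote_average": [7.0, 7.5, 6.0, 8.0],
})
for metric in ("total_revenue", "mean_rating", "movie_count"):
    print(metric, top_groups(movies, "belongs_to_collection", sort_by=metric).index.tolist())
```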

In [None]:
# Top franchises by total revenue
logger.info("="*60)
logger.info("TOP 10 FRANCHISES BY TOTAL REVENUE")
logger.info("="*60)

top_franchises_revenue = get_top_franchises(df, sort_by='total_revenue', top_n=10)
top_franchises_revenue

In [None]:
# Top franchises by mean rating
logger.info("\n" + "="*60)
logger.info("TOP 10 FRANCHISES BY MEAN RATING")
logger.info("="*60)

top_franchises_rating = get_top_franchises(df, sort_by='mean_rating', top_n=10)
top_franchises_rating

In [None]:
# Top franchises by movie count
logger.info("\n" + "="*60)
logger.info("TOP 10 FRANCHISES BY MOVIE COUNT")
logger.info("="*60)

top_franchises_count = get_top_franchises(df, sort_by='movie_count', top_n=10)
top_franchises_count

## 11. Most Successful Directors

Rank directors by total revenue, mean rating, and number of movies.
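Directors differ from franchises in one respect: a film can have more than one director. A minimal sketch that credits a co-directed film to each of its directors, assuming a "|"-separated `director` column (an assumption; `get_top_directors` may handle this differently):

```python
import pandas as pd

# Toy data for illustration only; "M2" is co-directed.
movies = pd.DataFrame({
    "title": ["M1", "M2", "M3"],
    "director": ["Director A", "Director A|Director B", "Director B"],
    "revenue": [100.0, 300.0, 50.0],
    "vote_average": [7.0, 8.0, 6.0],
})

# One row per (movie, director) pair, so each co-director is credited.
per_director = movies.assign(director=movies["director"].str.split("|")).explode("director")

directors = per_director.groupby("director").agg(
    movie_count=("title", "count"),
    total_revenue=("revenue", "sum"),
    mean_rating=("vote_average", "mean"),
)
print(directors.sort_values("total_revenue", ascending=False))
```

The trade-off is that a co-directed film's revenue is counted once per director, so director totals no longer sum to the overall revenue.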

In [None]:
# Top directors by total revenue
logger.info("="*60)
logger.info("TOP 10 DIRECTORS BY TOTAL REVENUE")
logger.info("="*60)

top_directors_revenue = get_top_directors(df, sort_by='total_revenue', top_n=10)
top_directors_revenue

In [None]:
# Top directors by mean rating
logger.info("\n" + "="*60)
logger.info("TOP 10 DIRECTORS BY MEAN RATING")
logger.info("="*60)

top_directors_rating = get_top_directors(df, sort_by='mean_rating', top_n=10)
top_directors_rating

In [None]:
# Top directors by movie count
logger.info("\n" + "="*60)
logger.info("TOP 10 DIRECTORS BY MOVIE COUNT")
logger.info("="*60)

top_directors_count = get_top_directors(df, sort_by='movie_count', top_n=10)
top_directors_count

## Summary

KPI analysis completed successfully.

### Performed Analyses:
1. ✓ Identified best/worst movies by revenue, budget, profit, and ROI
2. ✓ Found most voted and most popular movies
3. ✓ Analyzed highest/lowest rated movies (vote_count >= 10)
4. ✓ Executed advanced search queries for specific actor/director/genre combinations
5. ✓ Compared franchise vs standalone movie performance
6. ✓ Ranked most successful franchises by revenue, rating, and size
7. ✓ Ranked most successful directors by revenue, rating, and output

