Student Name: Gerald Wanjala
Student Pace: DSF-FT12-Hybrid
Instructor Name: Samuel Karu

# Data Science Project: Analyzing Box Office Film Success

## Business Understanding

### Objective
To help a new movie studio make informed decisions about the types of movies to produce in order to maximize box office returns.

### Business Problem
The company needs insights into what characteristics contribute most to a film's box office success. By analyzing historical data on box office earnings and production budgets, we aim to identify patterns that can guide production and investment strategies.

### Main Questions
- Do higher production budgets result in higher box office returns?
- Which genres or studios consistently generate high-grossing films?
- Is there a correlation between budget and return on investment (ROI)?

### Hypothesis
> Movies with higher production budgets tend to earn more at the box office and achieve a better ROI.
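
The hypothesis makes two separate claims: that budget rises with gross, and that budget rises with ROI. These can be tested independently. The sketch below shows one way to compute ROI and the two Pearson correlations. The column names `production_budget` and `worldwide_gross` and all figures are assumptions invented for illustration; they are not taken from the project data.

```python
import pandas as pd

# Hypothetical films with invented figures (in dollars), for illustration only.
films = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "production_budget": [10_000_000, 50_000_000, 100_000_000, 200_000_000],
    "worldwide_gross": [40_000_000, 120_000_000, 250_000_000, 300_000_000],
})

# ROI expressed as profit per dollar of budget: (gross - budget) / budget
films["roi"] = (
    films["worldwide_gross"] - films["production_budget"]
) / films["production_budget"]

# Pearson correlation of budget with gross, and of budget with ROI
budget_vs_gross = films["production_budget"].corr(films["worldwide_gross"])
budget_vs_roi = films["production_budget"].corr(films["roi"])

print(films[["title", "roi"]])
print(f"budget vs gross: {budget_vs_gross:.2f}")
print(f"budget vs ROI:   {budget_vs_roi:.2f}")
```

In this invented sample the budget correlates positively with gross but negatively with ROI, which is why the two halves of the hypothesis are worth checking separately on the real data.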


## Data Understanding

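We will use the following datasets:
- `bom.movie_gross.csv.gz`: domestic and foreign gross by movie and studio.
- `tn.movie_budgets.csv.gz`: production budgets and box office data by movie.
- `im.db`: an IMDb SQLite database with tables such as `movie_basics`, `movie_ratings`, `directors`, `writers`, and `persons`.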
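
As a rough illustration of how the Box Office Mojo data could address the studio question, the sketch below rebuilds the five rows shown in the `bom_df` preview further down and ranks studios by median total gross. It assumes `foreign_gross` is stored as text, possibly with thousands separators, so it is coerced to numbers first. The `total_gross` column and the `studio_summary` name are introduced here for illustration only.

```python
import pandas as pd

# Five rows copied from the bom_df preview; foreign_gross is kept as text
# on the assumption that the raw column is not numeric.
sample = pd.DataFrame({
    "title": [
        "Toy Story 3",
        "Alice in Wonderland (2010)",
        "Harry Potter and the Deathly Hallows Part 1",
        "Inception",
        "Shrek Forever After",
    ],
    "studio": ["BV", "BV", "WB", "WB", "P/DW"],
    "domestic_gross": [415000000.0, 334200000.0, 296000000.0,
                       292600000.0, 238700000.0],
    "foreign_gross": ["652000000", "691300000", "664300000",
                      "535700000", "513900000"],
    "year": [2010] * 5,
})

# Strip any thousands separators, then convert; unparseable values become NaN.
sample["foreign_gross"] = pd.to_numeric(
    sample["foreign_gross"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Hypothetical helper column: worldwide total per film.
sample["total_gross"] = sample["domestic_gross"] + sample["foreign_gross"]

# Film count and median total gross per studio, highest median first.
studio_summary = (
    sample.groupby("studio")["total_gross"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)
print(studio_summary)
```

The median is used rather than the mean so that a single blockbuster does not dominate a studio's ranking. With only five rows the result is not meaningful; the same steps would be applied to the full `bom_df`.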


In [1]:
import pandas as pd
import sqlite3

# Load BOM dataset
bom_df = pd.read_csv('../zipped-data/bom.movie_gross.csv.gz')
print(bom_df.head())

# Connect to the IMDb SQLite database
conn = sqlite3.connect('../zipped-data/im.db')

# Load key tables
movie_basics = pd.read_sql_query("SELECT * FROM movie_basics", conn)
movie_ratings = pd.read_sql_query("SELECT * FROM movie_ratings", conn)
directors = pd.read_sql_query("SELECT * FROM directors", conn)
writers = pd.read_sql_query("SELECT * FROM writers", conn)
persons = pd.read_sql_query("SELECT * FROM persons", conn)

# Preview the first rows of movie_basics and movie_ratings
print(movie_basics.head())
print(movie_ratings.head())


                                         title studio  domestic_gross  \
0                                  Toy Story 3     BV     415000000.0   
1                   Alice in Wonderland (2010)     BV     334200000.0   
2  Harry Potter and the Deathly Hallows Part 1     WB     296000000.0   
3                                    Inception     WB     292600000.0   
4                          Shrek Forever After   P/DW     238700000.0   

  foreign_gross  year  
0     652000000  2010  
1     691300000  2010  
2     664300000  2010  
3     535700000  2010  
4     513900000  2010  
    movie_id                    primary_title              original_title  \
0  tt0063540                        Sunghursh                   Sunghursh   
1  tt0066787  One Day Before the Rainy Season             Ashad Ka Ek Din   
2  tt0069049       The Other Side of the Wind  The Other Side of the Wind   
3  tt0069204                  Sabse Bada Sukh             Sabse Bada Sukh   
4  tt0100275         The Wanderi