# Movie Studio Market Analysis

## Introduction

## Stakeholder
Head of the New Movie Studio

## Problem Statement
Our company is launching a new movie studio and needs to decide which types of films to produce to maximize box office success. The company currently has no historical evidence of its own about which movie characteristics are associated with strong financial performance.

## Business Understanding

***Key Questions for Movie Studio Strategy***

1. **Which movie genres generate the highest revenue?**  
   By comparing total box office earnings across genres, we can determine which types of films earn the most and are most likely to attract large audiences.

2. **Does movie rating or critic score affect performance?**  
   Using ratings from Rotten Tomatoes and IMDb, we can measure whether higher-rated movies tend to earn more revenue.

3. **Does a longer runtime affect ratings or revenue?**  
   By analyzing movie runtimes, we can identify whether certain runtime ranges are associated with higher earnings or better ratings across different genres.

4. **Does release timing affect earnings?**  
   By examining release months and their associated revenues, we can identify the optimal times of the year to launch films for maximum success.



In [8]:
import sqlite3
import pandas as pd
import zipfile
import os

# Extract the IMDb database from its zip archive and check the extracted file size.
# The data directory is defined once and the other paths are derived from it.
extract_path = r"C:\Users\HP\Documents\Project_Phase2\Movie-Studio-EDA-Project\zippedData"
zip_path = os.path.join(extract_path, "im.db.zip")
db_path = os.path.join(extract_path, "im.db")

with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_path)

# Size of the extracted database in bytes
print(os.path.getsize(db_path))

# Connect to the database
imdb_conn = sqlite3.connect(db_path)

# Confirm by listing tables
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table';",
    imdb_conn
)

tables

    

169443328


Unnamed: 0,name
0,movie_basics
1,directors
2,known_for
3,movie_akas
4,movie_ratings
5,persons
6,principals
7,writers
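
The first cell re-extracts the archive on every run. A minimal sketch of one way to skip the extraction when the file is already present, assuming a hypothetical helper `extract_if_missing` that is not part of the original notebook:

```python
import zipfile
from pathlib import Path


def extract_if_missing(zip_path, member, dest_dir):
    """Extract `member` from `zip_path` into `dest_dir` unless it already exists.

    Hypothetical helper for illustration; returns the path of the extracted file.
    """
    target = Path(dest_dir) / member
    if not target.exists():
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            # Extract only the requested member rather than the whole archive
            zip_ref.extract(member, dest_dir)
    return target


# Example call (paths taken from the notebook's own variables):
# db_path = extract_if_missing(zip_path, "im.db", extract_path)
```

Checking for the file first keeps repeated notebook runs fast, at the cost of not noticing when the archive itself has changed.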


In [2]:
# Inspect the columns of the movie_basics table
pd.read_sql("PRAGMA table_info(movie_basics);", imdb_conn)



Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,movie_id,TEXT,0,,0
1,1,primary_title,TEXT,0,,0
2,2,original_title,TEXT,0,,0
3,3,start_year,INTEGER,0,,0
4,4,runtime_minutes,REAL,0,,0
5,5,genres,TEXT,0,,0
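
The same `PRAGMA table_info` query can be run for every table rather than one at a time. The sketch below uses only `sqlite3`; the function name `list_columns` is a hypothetical helper introduced for illustration:

```python
import sqlite3


def list_columns(conn):
    """Return {table_name: [(column_name, column_type), ...]} for every table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
        )
    ]
    schema = {}
    for table in tables:
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
        rows = conn.execute(f'PRAGMA table_info("{table}");').fetchall()
        schema[table] = [(row[1], row[2]) for row in rows]
    return schema


# Example call with the notebook's connection:
# for table, columns in list_columns(imdb_conn).items():
#     print(table, columns)
```

Seeing all schemas side by side makes it easier to spot shared keys such as `movie_id` before writing joins.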


In [3]:
# Load the box office dataset
boxoffice_df = pd.read_csv(
    r"C:/Users/HP/Documents/Project_Phase2/Movie-Studio-EDA-Project/zippedData/bom.movie_gross.csv.gz"
)
# Only the last expression in a cell is displayed, so show the column names
boxoffice_df.columns



Index(['title', 'studio', 'domestic_gross', 'foreign_gross', 'year'], dtype='object')

In [4]:
# Load IMDb titles and release years into a pandas DataFrame
imdb_movies = pd.read_sql(
    """
    SELECT primary_title, start_year
    FROM movie_basics
    """,
    imdb_conn
)

imdb_movies.head()


Unnamed: 0,primary_title,start_year
0,Sunghursh,2013
1,One Day Before the Rainy Season,2019
2,The Other Side of the Wind,2018
3,Sabse Bada Sukh,2018
4,The Wandering Soap Opera,2017


In [5]:
# Join IMDb (primary_title, start_year) to box office (title, year)
merged_df = imdb_movies.merge(
    boxoffice_df,
    left_on=["primary_title", "start_year"],
    right_on=["title", "year"],
    how="inner"
)

merged_df.head()


Unnamed: 0,primary_title,start_year,title,studio,domestic_gross,foreign_gross,year
0,Wazir,2016,Wazir,Relbig.,1100000.0,,2016
1,On the Road,2012,On the Road,IFC,744000.0,8000000.0,2012
2,The Secret Life of Walter Mitty,2013,The Secret Life of Walter Mitty,Fox,58200000.0,129900000.0,2013
3,A Walk Among the Tombstones,2014,A Walk Among the Tombstones,Uni.,26300000.0,26900000.0,2014
4,Jurassic World,2015,Jurassic World,Uni.,652300000.0,1019.4,2015
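
In the preview, the `foreign_gross` value for Jurassic World (1019.4) is several orders of magnitude smaller than the other rows, which suggests, as an assumption not verified against the source, that some entries use a different unit. A hedged sketch for flagging such values before any revenue analysis; the helper name and the `threshold` cutoff are hypothetical:

```python
import pandas as pd


def flag_suspect_gross(df, column="foreign_gross", threshold=10_000):
    """Add a numeric copy of `column` and a flag for implausibly small values.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    out = df.copy()
    # Strip thousands separators in case the column was read as text, then
    # coerce to numbers; entries that cannot be parsed become NaN.
    as_text = out[column].astype(str).str.replace(",", "", regex=False)
    out[column + "_num"] = pd.to_numeric(as_text, errors="coerce")
    # Missing values compare as False, so only real small numbers are flagged
    out[column + "_suspect"] = out[column + "_num"] < threshold
    return out


# Example call with the notebook's merged frame:
# flagged = flag_suspect_gross(merged_df)
# flagged[flagged["foreign_gross_suspect"]]
```

The sketch only flags rows; whether to rescale or drop them is a separate decision that depends on confirming what unit those entries use.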


In [7]:
# Validate the join by comparing row counts
print("IMDb rows:", imdb_movies.shape[0])
print("Box Office rows:", boxoffice_df.shape[0])
print("Merged rows:", merged_df.shape[0])


IMDb rows: 146144
Box Office rows: 3387
Merged rows: 1873
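
Row counts alone do not show how many box office rows found a match, or whether repeated (title, year) pairs on the IMDb side inflated the merged total. One way to check both is a left merge with `indicator=True`; `audit_join` is a hypothetical helper, and the column names are those used in the notebook:

```python
import pandas as pd


def audit_join(imdb_movies, boxoffice_df):
    """Summarise match coverage for the (title, year) join.

    Returns (n_matched, n_unmatched, n_duplicate_imdb_keys), where matched and
    unmatched count box office rows.
    """
    imdb_keys = imdb_movies[["primary_title", "start_year"]]
    # IMDb rows sharing a (title, year) key would fan one box office row out
    # into several merged rows.
    n_dupes = int(imdb_keys.duplicated().sum())

    audit = boxoffice_df.merge(
        imdb_keys.drop_duplicates(),
        left_on=["title", "year"],
        right_on=["primary_title", "start_year"],
        how="left",
        indicator=True,  # adds a _merge column: "both" or "left_only"
    )
    n_matched = int((audit["_merge"] == "both").sum())
    n_unmatched = int((audit["_merge"] == "left_only").sum())
    return n_matched, n_unmatched, n_dupes


# Example call with the notebook's frames:
# matched, unmatched, dupes = audit_join(imdb_movies, boxoffice_df)
```

Unmatched box office rows are worth inspecting by hand, since small differences in title spelling or release year would be enough to prevent an exact-key match.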
