## Business Problem
Your company now sees all the big companies creating original video content and they want to get in on the fun. They have decided to create a new movie studio, but they don’t know anything about creating movies. You are charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of your company's new movie studio can use to help decide what type of films to create.

## Data
The data was obtained from the following sources:

* [Box Office Mojo](https://www.boxofficemojo.com/)
* [IMDB](https://www.imdb.com/)
* [Rotten Tomatoes](https://www.rottentomatoes.com/)
* [TheMovieDB](https://www.themoviedb.org/)
* [The Numbers](https://www.the-numbers.com/)

![movie data erd](movie_data_erd.jpeg)

### Goal
The goal is to investigate movie performance to identify patterns, genres, or attributes of successful films.

### Objectives
1. Analyze the Movie Data:
    * Identify characteristics of high-performing movies in terms of genre, budget, runtime, release date, and lead actors or directors.
    * Look for trends in movie popularity and revenue growth over recent years.
2. Understand Market Demand:
    * Examine the types of movies audiences prefer, for example by genre.
    * Identify seasonal trends, such as holiday and summer releases, to determine the best time to release a movie.
3. Identify Key Success Factors:
    * Explore factors like movie rating, production budget, etc.
    * Evaluate the impact of sequels or franchise films on box office revenue.
4. Translate Findings into Recommendations:
    * Provide specific, actionable insights so the company can make data-informed decisions about the types of movies to create.
    * Recommend strategies around genre, budget, target demographics, and release timing.


In [1]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import sqlite3
import zipfile
import os

# Import libraries for plotting and visualization
import matplotlib.pyplot as plt
import seaborn as sns

Load Relevant Data

In [None]:
# Extract the zipped database (the extraction code is commented out; uncomment it if im.db has not been extracted yet)
# zip_path = 'im.db.zip'  # path to the zipped database file
extracted_db_path = 'im.db'  # path where the extracted database is saved

# # Extract the .db file from the zip archive
# with zipfile.ZipFile(zip_path, 'r') as zip_ref:
#     zip_ref.extractall(os.path.dirname(extracted_db_path))  # extracts into the directory of extracted_db_path

# Connect to the extracted SQLite database
conn = sqlite3.connect(extracted_db_path)
cursor = conn.cursor()
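
The extraction step above can be made safe to re-run by skipping it when the database file already exists. The following is a minimal sketch, not the notebook's own method: the helper name `extract_if_missing` is hypothetical, and it assumes the archive contains the database file at its top level. Note that `os.path.dirname('im.db')` is an empty string, so the sketch falls back to the current directory in that case.

```python
import os
import zipfile


def extract_if_missing(zip_path, db_path):
    """Extract zip_path into the directory of db_path unless db_path already exists.

    Hypothetical helper for illustration; returns True if extraction ran.
    """
    if os.path.exists(db_path):
        return False  # the database is already unpacked, nothing to do
    # dirname('im.db') is '', so fall back to the current directory
    target_dir = os.path.dirname(db_path) or "."
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    return True
```

With this helper, a call such as `extract_if_missing('im.db.zip', 'im.db')` could precede `sqlite3.connect`, so the cell behaves the same on the first and on later runs.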

Inspect the Data

In [None]:
# Get the list of tables in the database
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", conn)

In [None]:
# Print the column names of each table of interest (a notebook cell only displays its last expression)
for table in ("movie_basics", "movie_ratings", "directors"):
    print(table, list(pd.read_sql(f"SELECT * FROM {table} LIMIT 0", conn).columns))
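
Column names and declared types can also be read from SQLite's schema metadata without selecting any rows. The sketch below uses `PRAGMA table_info` on a small in-memory database that stands in for `im.db`; the `demo_conn` name and the `movie_id` column are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Toy in-memory database standing in for im.db (illustrative table definition)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE movie_basics (movie_id TEXT, original_title TEXT, runtime_minutes REAL)"
)

# PRAGMA table_info returns one row per column: cid, name, type, notnull, dflt_value, pk
schema = pd.read_sql("PRAGMA table_info(movie_basics);", demo_conn)
column_names = schema["name"].tolist()
print(column_names)
```

Against the real database, the same query would be run with `conn` in place of `demo_conn`.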

In [None]:
# pd.read_sql already returns a DataFrame, so no extra wrapping is needed
movie_basics_data = pd.read_sql("SELECT * FROM movie_basics", conn)
movie_basics_data.head()

Check for Null values

In [None]:
# Show null counts for the columns of interest in a single displayed result
movie_basics_data[['original_title', 'runtime_minutes']].isnull().sum()
# original_title: 21 missing records; runtime_minutes: 31739 missing records
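
Once the missing values are counted, they need to be handled before analysis. The sketch below shows two common options on made-up rows: filling a missing title from another title column and imputing missing runtimes with the median. The `primary_title` column and the choice of median imputation are assumptions for illustration; dropping the affected rows is an equally valid choice depending on the analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; primary_title is an assumed column name
demo = pd.DataFrame({
    "primary_title": ["A", "B", "C", "D"],
    "original_title": ["A", None, "C", "D"],
    "runtime_minutes": [90.0, np.nan, 110.0, 100.0],
})

# Share of missing values per column, as a percentage
missing_pct = demo.isnull().mean() * 100

# Option 1: fall back to the primary title when the original title is missing
demo["original_title"] = demo["original_title"].fillna(demo["primary_title"])

# Option 2: impute missing runtimes with the median of the observed runtimes
median_runtime = demo["runtime_minutes"].median()
demo["runtime_minutes"] = demo["runtime_minutes"].fillna(median_runtime)
```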

## EDA
### Univariate Analysis
Revenue, Budget, Genres, Original Language, Runtime, etc.
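
As a rough sketch of what a univariate look at these variables could involve, the example below summarizes one numeric column and counts genres. The data is made up, and it assumes genres are stored as a comma-separated string per movie, which may not match the actual tables.

```python
import pandas as pd

# Made-up movies for illustration; column names and values are assumptions
movies = pd.DataFrame({
    "title": ["M1", "M2", "M3", "M4"],
    "genres": ["Action,Drama", "Drama", "Comedy,Drama", "Action"],
    "runtime_minutes": [120, 95, 100, 130],
})

# Summary statistics (count, mean, std, quartiles) for a single numeric variable
runtime_summary = movies["runtime_minutes"].describe()

# Split the comma-separated genre string so each genre of a movie is counted once
genre_counts = movies["genres"].str.split(",").explode().value_counts()
print(genre_counts)
```

The same two steps, `describe()` for numeric columns and `value_counts()` for categorical ones, extend to revenue, budget, and original language.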

### Bivariate Analysis
Budget vs. revenue, genre vs. revenue, production cost vs. revenue, etc.
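
A bivariate comparison of this kind might be sketched as follows, using made-up figures (say, in millions of dollars). The column names and the return-on-investment definition, (revenue - budget) / budget, are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up films for illustration; values are not taken from the datasets
films = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Drama", "Comedy"],
    "budget": [100.0, 150.0, 20.0, 30.0, 40.0],
    "revenue": [300.0, 350.0, 40.0, 90.0, 100.0],
})

# Budget vs. revenue: Pearson correlation between the two numeric columns
budget_revenue_corr = films["budget"].corr(films["revenue"])

# Genre vs. revenue: return on investment per film, then the median per genre
films["roi"] = (films["revenue"] - films["budget"]) / films["budget"]
median_by_genre = films.groupby("genre")[["revenue", "roi"]].median()
print(median_by_genre)
```

Medians are used here rather than means because box office figures tend to be skewed by a few very large releases; that is a design choice, not a requirement.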

## Linear Regression / ML
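
As a starting point for this section, a simple linear regression of revenue on budget could look like the sketch below. It fits an ordinary least squares line with NumPy on made-up numbers (say, in millions of dollars) and reports the coefficient of determination; the variable names and data are assumptions, and a fuller model would likely include more predictors.

```python
import numpy as np

# Made-up budgets and revenues, for illustration only
budget = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
revenue = np.array([25.0, 45.0, 90.0, 150.0, 330.0])

# Ordinary least squares fit of revenue = slope * budget + intercept
slope, intercept = np.polyfit(budget, revenue, deg=1)

# Coefficient of determination (R^2) of the fitted line
predicted = slope * budget + intercept
ss_res = np.sum((revenue - predicted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The slope can be read as the expected change in revenue per additional unit of budget, under the assumption that the relationship is roughly linear.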