![example](images/director_shot.jpeg)

# Movie Analysis

**Authors:** Justin Hue
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [3]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
! ls

DS_Project_Presentation.pdf
Movie-Analysis.ipynb
README.md
TEMPLATE_README.md
data
images


In [24]:
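# Load IMDb title metadata; the first CSV column (tconst) becomes the index
df1 = pd.read_csv('data/zippedData/imdb.title.basics.csv.gz', index_col=0, header=0, encoding='latin-1')
# Load IMDb ratings and box-office gross data; both keep the default integer index
df2 = pd.read_csv('data/zippedData/imdb.title.ratings.csv.gz', header=0, encoding='latin-1')
df3 = pd.read_csv('data/zippedData/bom.movie_gross.csv.gz', header=0, encoding='latin-1')
# Preview the ratings table
df2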

Unnamed: 0,tconst,averagerating,numvotes
0,tt10356526,8.3,31
1,tt10384606,8.9,559
2,tt1042974,6.4,20
3,tt1043726,4.2,50352
4,tt1060240,6.5,21
...,...,...,...
73851,tt9805820,8.1,25
73852,tt9844256,7.5,24
73853,tt9851050,4.7,14
73854,tt9886934,7.0,5
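A note on the `encoding='latin-1'` argument used in the loading cell: Latin-1 maps every byte to some character, so decoding never fails, but any multi-byte UTF-8 character comes out garbled. The sketch below reproduces the effect on a single title in plain Python. It assumes the source files are UTF-8 encoded, which this notebook does not confirm.

```python
# Title containing a non-ASCII character, encoded as UTF-8
# (assumption: the CSV files on disk are UTF-8)
raw = "Raggarjävlar (Swedish Greasers)".encode("utf-8")

# Decoding the same bytes as Latin-1 succeeds but splits 'ä' into two characters
garbled = raw.decode("latin-1")
correct = raw.decode("utf-8")

# Text that was already mis-decoded can be repaired by reversing the wrong step
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(correct)
print(repaired == correct)
```

If titles with accented characters matter for the analysis, rereading the files with `encoding='utf-8'` may be worth trying first.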


In [29]:
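# Left-join the ratings onto the titles. DataFrame.join aligns on the index:
# df1 is indexed by tconst, while df2 still has its default integer index.
joined_df = df1.join(df2, how='left')
# Build a view of every fully duplicated row, sorted by rating
# (not displayed, because it is not the last expression in the cell)
joined_df[joined_df.duplicated(keep=False)].sort_values(by='averagerating')
# Keep only the repeated rows (second and later occurrences) for inspection
joined_df = joined_df[joined_df.duplicated()]
# Temporarily lift the display limits so every row and column is shown
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(joined_df)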

tconst,primary_title,original_title,start_year,runtime_minutes,genres,tconst,averagerating,numvotes
tt10064536,Untitled Disney Marvel Film,Untitled Disney Marvel Film,2022,,Action,,,
tt10064558,Untitled Marvel Film,Untitled Marvel Film,2021,,Action,,,
tt10127292,Plushtubers: The Apocalypse,Plushtubers: The Apocalypse,2019,,"Action,Adventure",,,
tt10224422,Olanda,Olanda,2019,154.0,Documentary,,,
tt10230042,Rok Sako To Rok Lo,Rok Sako To Rok Lo,2018,,Comedy,,,
tt10230622,Aitebaar,Aitebaar,2017,80.0,Comedy,,,
tt10230624,Huway Hum Jin Kay Liye Barbaad,Huway Hum Jin Kay Liye Barbaad,2017,,Comedy,,,
tt10268532,Sapo: Live at the Avalon... Ritmo del Corazon,Sapo: Live at the Avalon... Ritmo del Corazon,2019,,Music,,,
tt10275936,Raggarjävlar (Swedish Greasers),Raggarjävlar (Swedish Greasers),2019,70.0,Documentary,,,
tt10294034,Cinema of Sleep,Cinema of Sleep,2020,,Thriller,,,
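Every ratings column in the output above is empty. A likely cause is that `DataFrame.join` aligns rows on the index: `df1` is indexed by `tconst`, whereas `df2` keeps its default integer index, so no labels match. The sketch below reproduces this on two small hypothetical frames (the titles, IDs, and ratings are made up) and shows one way to join on `tconst` instead.

```python
import pandas as pd

# Hypothetical stand-ins for df1 (indexed by tconst) and df2 (default integer index)
titles = pd.DataFrame(
    {"primary_title": ["Film A", "Film B"]},
    index=pd.Index(["tt001", "tt002"], name="tconst"),
)
ratings = pd.DataFrame(
    {"tconst": ["tt001", "tt002"], "averagerating": [7.1, 5.4]}
)

# Index-on-index join: the string labels never match 0, 1, ... so ratings are all NaN
misaligned = titles.join(ratings, how="left")

# Move tconst onto the index of the right-hand frame so both sides share the key
aligned = titles.join(ratings.set_index("tconst"), how="left")

print(misaligned["averagerating"].isna().all())
print(aligned["averagerating"].tolist())
```

Applied to the real data, the equivalent call would be `df1.join(df2.set_index('tconst'), how='left')`, assuming `tconst` values are unique in `df2`.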


## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [6]:
# Here you run your code to clean the data
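
As a starting point for the questions above, here is a minimal sketch of one possible cleaning pass. It is not necessarily the approach this project will adopt. The sample rows are hypothetical, the column names are taken from the joined table shown earlier, and the specific choices (median imputation, an `"Unknown"` genre label) are assumptions that should be justified against the business problem.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows using column names from the joined table
movies = pd.DataFrame(
    {
        "primary_title": ["Film A", "Film A", "Film B", "Film C", "Film D"],
        "runtime_minutes": [90.0, 90.0, 100.0, np.nan, 120.0],
        "genres": ["Action", "Action", "Comedy", None, "Drama"],
        "averagerating": [7.1, 7.1, np.nan, 6.0, 8.0],
    }
)

cleaned = (
    movies.drop_duplicates()               # keep the first copy of each repeated row
    .dropna(subset=["averagerating"])      # drop rows that have no rating
    .assign(
        # fill missing runtimes with the median of the remaining rows
        runtime_minutes=lambda d: d["runtime_minutes"].fillna(d["runtime_minutes"].median()),
        # label missing genres explicitly instead of dropping the row
        genres=lambda d: d["genres"].fillna("Unknown"),
    )
    .reset_index(drop=True)
)

print(cleaned)
```

Dropping unrated rows only makes sense if rating is the variable of interest; for a revenue-focused question the rule would differ.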

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***