![example](images/director_shot.jpeg)

# Project Title

**Authors:** Nancy Ho
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve them.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

- As we enter the digital age, more and more companies are venturing into movie production. To keep pace with its competitors, a tech giant like Microsoft may need to consider producing movies as well. Using data on successful movies from several sources, I describe the attributes of those movies to give corporate heads and investors an idea of what kinds of movies Microsoft should produce to improve its chances of success.
- I believe two main factors contribute to a movie's success: the profit it makes and its ratings from critics and moviegoers.

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [2]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3

%matplotlib inline

In [3]:
# Here you run your code to explore the data
conn = sqlite3.connect('data/movies.db')  # forward slash avoids the invalid '\m' escape sequence
cur = conn.cursor()
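
Before querying individual tables, it can help to list what the database contains. The following is a minimal sketch, not part of the original notebook: it builds a small in-memory stand-in database (`demo_conn` and its two table definitions are assumptions for illustration) and reads SQLite's built-in `sqlite_master` catalog. In the notebook itself, the same queries could be run against the existing `conn`.

```python
import sqlite3
import pandas as pd

# Stand-in for the project database so the sketch runs on its own;
# in the notebook, reuse the existing `conn` instead.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE bom_movie_gross (title TEXT, studio TEXT, domestic_gross REAL)"
)
demo_conn.execute(
    "CREATE TABLE imdb_title_ratings (tconst TEXT, averagerating REAL, numvotes INTEGER)"
)

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;",
    demo_conn,
)
table_names = tables["name"].tolist()

# PRAGMA table_info returns one row per column of the named table.
columns = pd.read_sql_query("PRAGMA table_info(bom_movie_gross);", demo_conn)
column_names = columns["name"].tolist()

print(table_names)
print(column_names)
```

`pd.read_sql_query` also returns a data frame with column names already set, which is an alternative to building one from `cur.fetchall()` and `cur.description`.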

In [4]:
cur.execute("""SELECT * 
               FROM bom_movie_gross
               ORDER BY domestic_gross DESC;""")
test_df = pd.DataFrame(cur.fetchall())
test_df.columns = [x[0] for x in cur.description]
test_df.head(5)

Unnamed: 0,idx,title,studio,domestic_gross,foreign_gross,year
0,1872,Star Wars: The Force Awakens,BV,936700000.0,1131.6,2015
1,3080,Black Panther,BV,700100000.0,646900000.0,2018
2,3079,Avengers: Infinity War,BV,678800000.0,1369.5,2018
3,1873,Jurassic World,Uni.,652300000.0,1019.4,2015
4,727,Marvel's The Avengers,BV,623400000.0,895500000.0,2012
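
In the preview above, some `foreign_gross` values (1131.6, 1369.5, 1019.4) are several orders of magnitude smaller than their neighbours, which suggests those rows may be recorded in millions of dollars rather than in dollars. The sketch below assumes that interpretation is correct; the cutoff `MILLIONS_CUTOFF` and the new column `foreign_gross_usd` are hypothetical names chosen for illustration, and the cutoff should be checked against the full table before being relied on.

```python
import pandas as pd

# Rows copied from the preview above.
sample = pd.DataFrame({
    "title": ["Star Wars: The Force Awakens", "Black Panther", "Avengers: Infinity War"],
    "foreign_gross": [1131.6, 646900000.0, 1369.5],
})

# ASSUMPTION: values under this hypothetical cutoff are in millions of dollars.
MILLIONS_CUTOFF = 10_000

suspect = sample["foreign_gross"] < MILLIONS_CUTOFF

# Keep values that already look like dollars; rescale the suspect ones.
sample["foreign_gross_usd"] = sample["foreign_gross"].where(
    ~suspect, sample["foreign_gross"] * 1_000_000
)

print(sample)
```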


## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [6]:
# Here you run your code to clean the data

First, joining `tn_movie_budgets` with `imdb_title_basics`

In [6]:
cur.execute("""SELECT b.release_date, b.movie, b.production_budget, b.domestic_gross, b.worldwide_gross, m.genres 
               FROM tn_movie_budgets b
               JOIN imdb_title_basics m
               ON b.movie = m.primary_title
               GROUP BY movie
               ORDER BY worldwide_gross DESC;""")
gross_df = pd.DataFrame(cur.fetchall())
gross_df.columns = [x[0] for x in cur.description]
gross_df.head(20)

Unnamed: 0,release_date,movie,production_budget,domestic_gross,worldwide_gross,genres
0,"Sep 30, 2005",Duma,"$12,000,000","$870,067","$994,790","Biography,Crime,Documentary"
1,"Apr 1, 2011",Insidious,"$1,500,000","$54,009,150","$99,870,886","Horror,Mystery,Thriller"
2,"Apr 2, 2004",Hellboy,"$60,000,000","$59,623,958","$99,823,958","Action,Adventure,Fantasy"
3,"Aug 17, 2018",Alpha,"$51,000,000","$35,851,379","$99,624,873","Adventure,Drama,Family"
4,"Nov 21, 2007",Hitman,"$24,000,000","$39,687,694","$99,135,571",Action
5,"Feb 11, 2011",Justin Bieber: Never Say Never,"$13,000,000","$73,013,910","$99,034,125","Documentary,Music"
6,"Apr 14, 2006",The Wild,"$80,000,000","$37,384,046","$99,010,667",Documentary
7,"Jun 15, 1994",The Lion King,"$79,300,000","$421,785,283","$986,214,868","Adventure,Animation,Drama"
8,"Nov 22, 2013",Philomena,"$12,000,000","$37,709,979","$98,963,392","Biography,Comedy,Drama"
9,"Sep 18, 2015",Black Mass,"$53,000,000","$62,575,678","$98,837,872","Biography,Crime,Drama"
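
The ordering in the output above is lexicographic ("$994,790" sorts ahead of "$99,870,886") because the dollar columns are stored as text. The sketch below converts them to integers so that sorting and arithmetic behave numerically. It assumes every value follows the "$1,234,567" pattern seen in the output; the `profit` column is a hypothetical addition, included because profit is one of the two success factors named in the business problem.

```python
import pandas as pd

# Rows copied from the output above.
sample = pd.DataFrame({
    "movie": ["Duma", "Insidious", "The Lion King"],
    "production_budget": ["$12,000,000", "$1,500,000", "$79,300,000"],
    "worldwide_gross": ["$994,790", "$99,870,886", "$986,214,868"],
})

money_cols = ["production_budget", "worldwide_gross"]
for col in money_cols:
    # Strip "$" and "," characters, then convert the digits to integers.
    sample[col] = sample[col].str.replace(r"[$,]", "", regex=True).astype("int64")

# Hypothetical profit measure: worldwide gross minus production budget.
sample["profit"] = sample["worldwide_gross"] - sample["production_budget"]

# Sorting now follows numeric order rather than string order.
ranked = sample.sort_values("worldwide_gross", ascending=False).reset_index(drop=True)
print(ranked)
```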


In [7]:
gross_df.shape

(2312, 6)
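
Joining on title alone can pair a budget row with a different film that shares the same name, and `GROUP BY movie` then keeps one arbitrary match per title. This may explain why "The Wild" (released in 2006) appears above with the genre "Documentary". One possible refinement, sketched here under assumptions, is to also require the release year to match `start_year`. The rows below are made-up placeholders ("Film A", "Film B"), not project data, and `demo_conn` is a hypothetical in-memory stand-in for the real database.

```python
import sqlite3
import pandas as pd

# In-memory stand-in with made-up rows; column names follow the notebook's tables.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE tn_movie_budgets (release_date TEXT, movie TEXT);
    CREATE TABLE imdb_title_basics (primary_title TEXT, start_year INTEGER, genres TEXT);
    INSERT INTO tn_movie_budgets VALUES
        ('Mar 3, 2017', 'Film A'),
        ('Jun 9, 2006', 'Film B');
    INSERT INTO imdb_title_basics VALUES
        ('Film A', 2017, 'Drama'),
        ('Film B', 2018, 'Documentary');
""")

# substr(x, -4) takes the last four characters, i.e. the year in 'Mar 3, 2017'.
query = """
    SELECT b.movie, b.release_date, m.start_year, m.genres
    FROM tn_movie_budgets b
    JOIN imdb_title_basics m
      ON b.movie = m.primary_title
     AND CAST(substr(b.release_date, -4) AS INTEGER) = m.start_year
    ORDER BY b.movie;
"""
matched = pd.read_sql_query(query, demo_conn)
print(matched)
```

Only "Film A" survives, because the two "Film B" rows are twelve years apart. Release years can differ slightly between sources, so a tolerance of one year may be worth testing before settling on an exact match.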

Creating the IMDb ratings data frame

In [10]:
cur.execute("""SELECT *
               FROM imdb_title_basics m
               JOIN imdb_title_ratings r
               USING(tconst);""")
ratings_df = pd.DataFrame(cur.fetchall())
ratings_df.columns = [x[0] for x in cur.description]
ratings_df.head(10)

Unnamed: 0,idx,tconst,primary_title,original_title,start_year,runtime_minutes,genres,idx.1,averagerating,numvotes
0,0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama",36049,7.0,77
1,1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama",725,7.2,43
2,2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama,18429,6.9,4517
3,3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama",2223,6.1,13
4,4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy",1143,6.5,119
5,6,tt0112502,Bigfoot,Bigfoot,2017,,"Horror,Thriller",55676,4.1,32
6,7,tt0137204,Joe Finds Grace,Joe Finds Grace,2017,83.0,"Adventure,Animation,Comedy",30166,8.1,263
7,10,tt0146592,Pál Adrienn,Pál Adrienn,2010,136.0,Drama,55677,6.8,451
8,11,tt0154039,So Much for Justice!,Oda az igazság,2010,100.0,History,28904,4.6,64
9,12,tt0159369,Cooper and Hemingway: The True Gen,Cooper and Hemingway: The True Gen,2013,180.0,Documentary,68822,7.6,53


Then inspecting a single genre, as a first step toward averaging ratings by genre

In [27]:
ratings_df.loc[ratings_df.genres == 'Action']

Unnamed: 0,idx,tconst,primary_title,original_title,start_year,runtime_minutes,genres,idx.1,averagerating,numvotes
49,56,tt0364201,Aman Ke Farishtey,Aman Ke Farishtey,2016,137.0,Action,2225,6.4,16
96,111,tt0439801,Segurança Nacional,Segurança Nacional,2010,86.0,Action,7455,3.3,260
263,294,tt0810815,Cross the Line,Cross the Line,2010,87.0,Action,66318,3.9,39
319,350,tt0846004,Gangster Exchange,Gangster Exchange,2010,95.0,Action,9826,4.6,436
424,461,tt0929742,Deep Gold 3D,Deep Gold,2011,86.0,Action,48227,3.3,315
...,...,...,...,...,...,...,...,...,...,...
73624,144820,tt9724114,Kamen Rider Build New World: Kamen Rider Cross-Z,Kamen Raidâ Birudo Nyû Warudo Kamen Raidâ Kurôzu,2019,60.0,Action,19623,6.5,6
73638,144911,tt9737984,Striker,Striker,2019,122.0,Action,45634,8.0,21
73664,145028,tt9760512,D/O Parvathamma,D/O Parvathamma,2019,,Action,65037,9.6,427
73730,145415,tt9815714,The Hard Way,The Hard Way,2019,92.0,Action,62118,4.7,1214
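
The filter above matches only films whose `genres` value is exactly "Action", so a film tagged "Action,Crime,Drama" is left out. One way to average ratings across every genre a film belongs to is to split the comma-separated string and give each genre its own row. This is a minimal sketch on a few rows copied from the previews above, not the notebook's final method; the `genre` column and the `by_genre` name are assumptions for illustration.

```python
import pandas as pd

# Rows copied from the ratings_df previews above.
sample = pd.DataFrame({
    "primary_title": [
        "Sunghursh",
        "One Day Before the Rainy Season",
        "The Other Side of the Wind",
        "Aman Ke Farishtey",
    ],
    "genres": ["Action,Crime,Drama", "Biography,Drama", "Drama", "Action"],
    "averagerating": [7.0, 7.2, 6.9, 6.4],
})

# Split the comma-separated string into a list, then expand to one row per genre.
by_genre = (
    sample.assign(genre=sample["genres"].str.split(","))
          .explode("genre")
          .groupby("genre")["averagerating"]
          .agg(["mean", "count"])
          .sort_values("mean", ascending=False)
)
print(by_genre)
```

Keeping the `count` column next to the mean makes it easier to spot genres whose average rests on only a handful of films.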


Creating data frame for Rotten Tomatoes data

In [23]:
cur.execute("""SELECT movie_title, genres, tomatometer_status, tomatometer_rating, tomatometer_count, audience_status, audience_rating, audience_count
               FROM rotten_tomatoes_movies;""")
rotten_tomatoes_df = pd.DataFrame(cur.fetchall())
rotten_tomatoes_df.columns = [x[0] for x in cur.description]
rotten_tomatoes_df.head(5)

Unnamed: 0,movie_title,genres,tomatometer_status,tomatometer_rating,tomatometer_count,audience_status,audience_rating,audience_count
0,Percy Jackson & the Olympians: The Lightning T...,"Action & Adventure, Comedy, Drama, Science Fic...",Rotten,49.0,149.0,Spilled,53.0,254421.0
1,Please Give,Comedy,Certified-Fresh,87.0,142.0,Upright,64.0,11574.0
2,10,"Comedy, Romance",Fresh,67.0,24.0,Spilled,53.0,14684.0
3,12 Angry Men (Twelve Angry Men),"Classics, Drama",Certified-Fresh,100.0,54.0,Upright,97.0,105386.0
4,"20,000 Leagues Under The Sea","Action & Adventure, Drama, Kids & Family",Fresh,89.0,27.0,Upright,74.0,68918.0


In [26]:
rotten_tomatoes_df.loc[rotten_tomatoes_df.genres == 'Action & Adventure']

Unnamed: 0,movie_title,genres,tomatometer_status,tomatometer_rating,tomatometer_count,audience_status,audience_rating,audience_count
197,Bitch Slap,Action & Adventure,Rotten,29.0,17.0,Spilled,29.0,10344.0
336,Breakout,Action & Adventure,Rotten,40.0,10.0,Spilled,41.0,1949.0
380,Commando,Action & Adventure,Fresh,71.0,34.0,Upright,67.0,138190.0
604,Missing in Action,Action & Adventure,Rotten,19.0,16.0,Spilled,42.0,12330.0
631,Nighthawks,Action & Adventure,Fresh,70.0,20.0,Spilled,54.0,7370.0
...,...,...,...,...,...,...,...,...
17237,When Time Ran Out,Action & Adventure,Rotten,0.0,6.0,Spilled,20.0,273.0
17468,Wolf Totem,Action & Adventure,Fresh,67.0,30.0,Spilled,50.0,496.0
17474,Wolves,Action & Adventure,Rotten,25.0,20.0,Spilled,32.0,677.0
17572,XXX,Action & Adventure,Rotten,48.0,180.0,Spilled,58.0,464274.0
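
As with the IMDb data, the exact-match filter above keeps only films whose sole genre is "Action & Adventure". The Rotten Tomatoes `genres` column appears to separate genres with a comma and a space, so the same split-and-expand idea can be used to compare critic and audience scores per genre. The sketch below assumes no genre name itself contains ", "; the `genre` and `critic_minus_audience` columns are hypothetical names, and the rows are copied from the previews above.

```python
import pandas as pd

# Rows copied from the rotten_tomatoes_df previews above (untruncated ones only).
sample = pd.DataFrame({
    "movie_title": [
        "Please Give",
        "10",
        "12 Angry Men (Twelve Angry Men)",
        "20,000 Leagues Under The Sea",
        "Commando",
    ],
    "genres": [
        "Comedy",
        "Comedy, Romance",
        "Classics, Drama",
        "Action & Adventure, Drama, Kids & Family",
        "Action & Adventure",
    ],
    "tomatometer_rating": [87.0, 67.0, 100.0, 89.0, 71.0],
    "audience_rating": [64.0, 53.0, 97.0, 74.0, 67.0],
})

# ASSUMPTION: genres are separated by ", " and no genre name contains that sequence.
by_genre = (
    sample.assign(genre=sample["genres"].str.split(", "))
          .explode("genre")
          .groupby("genre")[["tomatometer_rating", "audience_rating"]]
          .mean()
)

# Positive values mean critics rated the genre higher than audiences did.
by_genre["critic_minus_audience"] = (
    by_genre["tomatometer_rating"] - by_genre["audience_rating"]
)
print(by_genre)
```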


## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to model the data


Average ratings by genre

## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***

- You may want to start by offering positions to up-and-coming directors and screenwriters, preferably those with experience on successful movies, to gain leverage in the movie production process.
- Keep in mind that a large part of some movies' success can be attributed to expectations built on factors such as belonging to an established series or featuring A-list actors. If you want to make your mark in the movie industry, keep that limitation in mind as you work your way up.