- As we enter the digital age, more and more companies are venturing into movie production. To stay on par with its competitors, a tech giant like Microsoft may need to consider producing movies as well. Using data on successful movies from several sources, I describe the attributes those movies share, to give executives and investors an idea of what kind of movies Microsoft should produce to improve its chances of success.
- I believe two main factors contribute to a movie's success: the profit it makes and its ratings from critics and moviegoers.

## Data Understanding

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3

%matplotlib inline

In [2]:
from src.make_db import create_movies_db
create_movies_db()

imdb_title_principals table created successfully....
imdb_name_basic table created successfully....
imdb_title_crew table created successfully....
imdb_title_ratings table created successfully....
imdb_title_basics table created successfully....
imdb_title_akas table created successfully....
tn_movie_budgets table created successfully....
tmdb_movies table created successfully....
bom_movie_gross table created successfully....
rotten_tomatoes_critic_reviews table created successfully....
rotten_tomatoes_movies table created successfully....
Inserting data into the imdb_title_crew table....
Inserting data into the tmdb_movies table....
Inserting data into the imdb_title_akas table....
Inserting data into the imdb_title_ratings table....
Inserting data into the imdb_name_basics table....
Inserting data into the rotten_tomatoes_movies table....
Inserting data into the rotten_tomatoes_critic_reviews table....
Inserting data into the imdb_title_basics table....
Inserting data into the tn_mo

In [3]:
# Use a forward slash: it works on every OS and avoids the invalid "\m" escape sequence
conn = sqlite3.connect('data/movies.db')
cur = conn.cursor()

In [4]:
cur.execute("""SELECT * 
               FROM bom_movie_gross
               ORDER BY domestic_gross DESC;""")
test_df = pd.DataFrame(cur.fetchall())
test_df.columns = [x[0] for x in cur.description]
test_df.head(5)

| | idx | title | studio | domestic_gross | foreign_gross | year |
|---|---|---|---|---|---|---|
| 0 | 1872 | Star Wars: The Force Awakens | BV | 936700000.0 | 1131.6 | 2015 |
| 1 | 3080 | Black Panther | BV | 700100000.0 | 646900000.0 | 2018 |
| 2 | 3079 | Avengers: Infinity War | BV | 678800000.0 | 1369.5 | 2018 |
| 3 | 1873 | Jurassic World | Uni. | 652300000.0 | 1019.4 | 2015 |
| 4 | 727 | Marvel's The Avengers | BV | 623400000.0 | 895500000.0 | 2012 |
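
In the preview above, some `foreign_gross` values (1131.6, 1369.5, 1019.4) are several orders of magnitude smaller than the others, which suggests they were recorded in millions of dollars. The sketch below shows one way to put the column on a single scale. The cutoff value and the helper name `normalize_foreign_gross` are my own assumptions, not part of the original notebook, and the rule should be checked against the full table before it is relied on.

```python
import pandas as pd

# Three rows copied from the preview above.
bom = pd.DataFrame({
    "title": ["Star Wars: The Force Awakens", "Black Panther", "Avengers: Infinity War"],
    "domestic_gross": [936700000.0, 700100000.0, 678800000.0],
    "foreign_gross": [1131.6, 646900000.0, 1369.5],
})

# ASSUMPTION: any value below this cutoff was recorded in millions of dollars.
MILLIONS_CUTOFF = 10_000  # hypothetical threshold, chosen for illustration only


def normalize_foreign_gross(series, cutoff=MILLIONS_CUTOFF):
    """Multiply values below the cutoff by 1e6; keep all other values as they are."""
    return series.where(series >= cutoff, series * 1_000_000)


bom["foreign_gross_usd"] = normalize_foreign_gross(bom["foreign_gross"])
print(bom[["title", "foreign_gross", "foreign_gross_usd"]])
```

Missing values stay missing under this rule, because a `NaN` multiplied by 1e6 is still `NaN`.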


## Data Preparation

Joining `tn_movie_budgets` and `imdb_title_basics`

In [5]:
cur.execute("""SELECT b.release_date, b.movie, b.production_budget, b.domestic_gross, b.worldwide_gross, m.genres 
               FROM tn_movie_budgets b
               JOIN imdb_title_basics m
               ON b.movie = m.primary_title
               ORDER BY b.movie ASC;""")
gross_df = pd.DataFrame(cur.fetchall())
gross_df.columns = [x[0] for x in cur.description]
gross_df

| | release_date | movie | production_budget | domestic_gross | worldwide_gross | genres |
|---|---|---|---|---|---|---|
| 0 | Nov 20, 2015 | #Horror | $1,500,000 | $0 | $0 | Crime,Drama,Horror |
| 1 | Mar 11, 2016 | 10 Cloverfield Lane | $5,000,000 | $72,082,999 | $108,286,422 | Drama,Horror,Mystery |
| 2 | Nov 11, 2015 | 10 Days in a Madhouse | $12,000,000 | $14,616 | $14,616 | Drama |
| 3 | Mar 27, 2009 | 12 Rounds | $20,000,000 | $12,234,694 | $17,306,648 | Action,Drama,Romance |
| 4 | Jan 19, 2018 | 12 Strong | $35,000,000 | $45,819,713 | $71,118,378 | Action,Drama,History |
| ... | ... | ... | ... | ... | ... | ... |
| 3810 | Aug 11, 2006 | Zoom | $35,000,000 | $11,989,328 | $12,506,188 | Crime,Thriller |
| 3811 | Aug 11, 2006 | Zoom | $35,000,000 | $11,989,328 | $12,506,188 | Horror |
| 3812 | Mar 4, 2016 | Zootopia | $150,000,000 | $341,268,248 | $1,019,429,616 | Adventure,Animation,Comedy |
| 3813 | Dec 31, 2013 | Zulu | $16,000,000 | $0 | $1,844,228 | Crime,Drama,Thriller |
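
The joined table stores the money columns as strings such as `$5,000,000`, and joining on the title alone can match one budget row to several IMDb entries (see the two `Zoom` rows). The following is a minimal sketch of how both issues might be handled before any analysis. The frame name `gross_sample` and the `ambiguous_title` flag are hypothetical names I introduce for illustration.

```python
import pandas as pd

# A few rows copied from the joined table above.
gross_sample = pd.DataFrame({
    "release_date": ["Mar 11, 2016", "Aug 11, 2006", "Aug 11, 2006"],
    "movie": ["10 Cloverfield Lane", "Zoom", "Zoom"],
    "production_budget": ["$5,000,000", "$35,000,000", "$35,000,000"],
    "domestic_gross": ["$72,082,999", "$11,989,328", "$11,989,328"],
    "worldwide_gross": ["$108,286,422", "$12,506,188", "$12,506,188"],
    "genres": ["Drama,Horror,Mystery", "Crime,Thriller", "Horror"],
})

# Strip "$" and "," so the money columns can be treated as integers.
money_cols = ["production_budget", "domestic_gross", "worldwide_gross"]
for col in money_cols:
    gross_sample[col] = (
        gross_sample[col].str.replace(r"[$,]", "", regex=True).astype("int64")
    )

# Parse dates written like "Mar 11, 2016".
gross_sample["release_date"] = pd.to_datetime(
    gross_sample["release_date"], format="%b %d, %Y"
)

# Flag budget rows that matched more than one IMDb title in the join.
gross_sample["ambiguous_title"] = gross_sample.duplicated(
    subset=["movie", "release_date"], keep=False
)
print(gross_sample)
```

Flagged rows can then be reviewed by hand or dropped, depending on how much the duplicated genres would distort the per-genre averages.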


Creating IMDB ratings data frame

In [6]:
cur.execute("""SELECT m.primary_title, r.averagerating, r.numvotes, m.genres
               FROM imdb_title_basics m
               JOIN imdb_title_ratings r
               USING(tconst);""")
ratings_df = pd.DataFrame(cur.fetchall())
ratings_df.columns = [x[0] for x in cur.description]
ratings_df

| | primary_title | averagerating | numvotes | genres |
|---|---|---|---|---|
| 0 | Sunghursh | 7.0 | 77 | Action,Crime,Drama |
| 1 | One Day Before the Rainy Season | 7.2 | 43 | Biography,Drama |
| 2 | The Other Side of the Wind | 6.9 | 4517 | Drama |
| 3 | Sabse Bada Sukh | 6.1 | 13 | Comedy,Drama |
| 4 | The Wandering Soap Opera | 6.5 | 119 | Comedy,Drama,Fantasy |
| ... | ... | ... | ... | ... |
| 73851 | Diabolik sono io | 6.2 | 6 | Documentary |
| 73852 | Sokagin Çocuklari | 8.7 | 136 | Drama,Family |
| 73853 | Albatross | 8.5 | 8 | Documentary |
| 73854 | La vida sense la Sara Amat | 6.6 | 5 | |


Creating data frame for Rotten Tomatoes data

In [7]:
cur.execute("""SELECT movie_title, genres, tomatometer_status, tomatometer_rating, tomatometer_count, audience_status, audience_rating, audience_count
               FROM rotten_tomatoes_movies;""")
rotten_tomatoes_df = pd.DataFrame(cur.fetchall())
rotten_tomatoes_df.columns = [x[0] for x in cur.description]
rotten_tomatoes_df.head(5)

| | movie_title | genres | tomatometer_status | tomatometer_rating | tomatometer_count | audience_status | audience_rating | audience_count |
|---|---|---|---|---|---|---|---|---|
| 0 | Percy Jackson & the Olympians: The Lightning T... | Action & Adventure, Comedy, Drama, Science Fic... | Rotten | 49.0 | 149.0 | Spilled | 53.0 | 254421.0 |
| 1 | Please Give | Comedy | Certified-Fresh | 87.0 | 142.0 | Upright | 64.0 | 11574.0 |
| 2 | 10 | Comedy, Romance | Fresh | 67.0 | 24.0 | Spilled | 53.0 | 14684.0 |
| 3 | 12 Angry Men (Twelve Angry Men) | Classics, Drama | Certified-Fresh | 100.0 | 54.0 | Upright | 97.0 | 105386.0 |
| 4 | 20,000 Leagues Under The Sea | Action & Adventure, Drama, Kids & Family | Fresh | 89.0 | 27.0 | Upright | 74.0 | 68918.0 |
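
The query cells in this notebook all repeat the same three steps: execute, fetch, then copy the column names from `cur.description`. As a possible simplification, `pd.read_sql_query` does all three in one call. The helper name `query_to_df`, the in-memory demo table, and the `tt_demo_*` keys below are hypothetical and only stand in for `data/movies.db`.

```python
import sqlite3

import pandas as pd


def query_to_df(conn, sql, params=None):
    """Run a SQL query and return the result as a DataFrame with named columns."""
    return pd.read_sql_query(sql, conn, params=params)


# Hypothetical in-memory stand-in for the real database file.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE imdb_title_ratings (tconst TEXT, averagerating REAL, numvotes INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO imdb_title_ratings VALUES (?, ?, ?)",
    [("tt_demo_1", 7.0, 77), ("tt_demo_2", 6.9, 4517)],
)

# Parameters are passed separately rather than formatted into the SQL string.
popular = query_to_df(
    demo_conn,
    "SELECT * FROM imdb_title_ratings WHERE numvotes >= ?",
    params=(100,),
)
demo_conn.close()
print(popular)
```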


## Data Modeling

Average ratings by genre
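
Because IMDb stores several genres in one comma-separated string, a title has to be counted once per genre before averaging. The sketch below shows one way to do that with `str.split` and `explode` on a handful of rows from the ratings table. It is an illustration on sample data, not the notebook's original cell, and the names `ratings_sample` and `by_genre` are my own.

```python
import pandas as pd

# Rows copied from the IMDb ratings preview; the last title has no genre.
ratings_sample = pd.DataFrame({
    "primary_title": [
        "Sunghursh",
        "One Day Before the Rainy Season",
        "The Other Side of the Wind",
        "Sabse Bada Sukh",
        "La vida sense la Sara Amat",
    ],
    "averagerating": [7.0, 7.2, 6.9, 6.1, 6.6],
    "numvotes": [77, 43, 4517, 13, 5],
    "genres": ["Action,Crime,Drama", "Biography,Drama", "Drama", "Comedy,Drama", None],
})

by_genre = (
    ratings_sample.dropna(subset=["genres"])            # drop titles without a genre
    .assign(genre=lambda d: d["genres"].str.split(","))  # "A,B" -> ["A", "B"]
    .explode("genre")                                    # one row per (title, genre)
    .groupby("genre")
    .agg(mean_rating=("averagerating", "mean"), n_titles=("primary_title", "count"))
    .sort_values("mean_rating", ascending=False)
)
print(by_genre)
```

Keeping `n_titles` next to the mean makes it easier to spot genres whose average rests on very few titles. A minimum `numvotes` filter may also be worth applying first, since several titles in the preview have fewer than 20 votes.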

Average gross by genre
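
Gross figures are heavily skewed by a few blockbusters, so it may help to report the median alongside the mean for each genre. This is a rough sketch on four rows from the budget table, assuming the money columns have already been converted to integers. The names `gross_rows` and `gross_by_genre` are hypothetical.

```python
import pandas as pd

# Rows from the joined budget table, with worldwide gross already numeric.
gross_rows = pd.DataFrame({
    "movie": ["10 Cloverfield Lane", "12 Rounds", "12 Strong", "Zootopia"],
    "worldwide_gross": [108286422, 17306648, 71118378, 1019429616],
    "genres": [
        "Drama,Horror,Mystery",
        "Action,Drama,Romance",
        "Action,Drama,History",
        "Adventure,Animation,Comedy",
    ],
})

gross_by_genre = (
    gross_rows.assign(genre=gross_rows["genres"].str.split(","))
    .explode("genre")                       # one row per (movie, genre)
    .groupby("genre")["worldwide_gross"]
    .agg(["mean", "median", "count"])       # median is less sensitive to outliers
    .sort_values("mean", ascending=False)
)
print(gross_by_genre)
```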

Critics vs audience scores for each genre
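
Rotten Tomatoes separates genres with a comma followed by a space, and some genre names contain `&`, so the split pattern differs from the IMDb one. The sketch below compares mean critic and audience scores per genre on a few preview rows. The column `critic_minus_audience` and the frame names are my own for illustration.

```python
import pandas as pd

# Rows copied from the Rotten Tomatoes preview (rows with truncated genres omitted).
rt_sample = pd.DataFrame({
    "movie_title": [
        "Please Give",
        "10",
        "12 Angry Men (Twelve Angry Men)",
        "20,000 Leagues Under The Sea",
    ],
    "genres": [
        "Comedy",
        "Comedy, Romance",
        "Classics, Drama",
        "Action & Adventure, Drama, Kids & Family",
    ],
    "tomatometer_rating": [87.0, 67.0, 100.0, 89.0],
    "audience_rating": [64.0, 53.0, 97.0, 74.0],
})

scores = (
    rt_sample.assign(genre=rt_sample["genres"].str.split(", "))  # note the ", " separator
    .explode("genre")
    .groupby("genre")[["tomatometer_rating", "audience_rating"]]
    .mean()
)

# Positive values mean critics rated the genre higher than audiences did.
scores["critic_minus_audience"] = (
    scores["tomatometer_rating"] - scores["audience_rating"]
)
print(scores.sort_values("critic_minus_audience", ascending=False))
```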

Production budget vs domestic/worldwide gross
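
One simple way to relate budget to gross is a correlation matrix plus a return-on-investment column. The sketch below uses four rows from the budget table with the money columns already numeric. The ROI definition, (worldwide gross - budget) / budget, is my assumption about how profit could be measured here, and a correlation over four rows is only illustrative.

```python
import pandas as pd

# Rows from the joined budget table, with money columns already numeric.
budget_sample = pd.DataFrame({
    "movie": ["10 Cloverfield Lane", "12 Rounds", "12 Strong", "Zootopia"],
    "production_budget": [5_000_000, 20_000_000, 35_000_000, 150_000_000],
    "domestic_gross": [72_082_999, 12_234_694, 45_819_713, 341_268_248],
    "worldwide_gross": [108_286_422, 17_306_648, 71_118_378, 1_019_429_616],
})

# Pearson correlation between budget and each gross figure.
corr = budget_sample[
    ["production_budget", "domestic_gross", "worldwide_gross"]
].corr()

# ASSUMPTION: ROI is measured against worldwide gross and ignores marketing costs.
budget_sample["roi"] = (
    budget_sample["worldwide_gross"] - budget_sample["production_budget"]
) / budget_sample["production_budget"]

print(corr)
print(budget_sample[["movie", "roi"]].sort_values("roi", ascending=False))
```

Looking at ROI next to raw gross matters because a low-budget title can return many times its cost while a far larger release earns more in absolute terms.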

## Conclusions

- You may want to start by offering positions to up-and-coming directors and screenwriters, preferably those who have already worked on successful movies, to gain leverage in the movie production process.
- For some movies, a large part of their success can be attributed to expectations attached to them, such as being part of an established series or featuring certain A-list actors. If you want to make a mark in the movie industry, keep those limitations in mind as you work your way up.