# Movie Analysis for Microsoft Studios

**Author:** Ronald Lodetti Jr.
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

Microsoft would like to create a movie studio but needs to better understand the movie landscape to determine which kinds of movies to produce. The data used in this project come from The Numbers and IMDB.


## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve them.

***
Questions to consider:
* What are the business's pain points related to this project? 
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
*** 

As a new competitor in movie production, Microsoft has no track record to inform its business decisions. It needs an analysis of current trends to guide its first steps.

My first instinct was to use the Box Office Mojo data to determine the most and least successful movie studios, identify patterns, and make recommendations to help Microsoft succeed. After thorough analysis, the results were inconclusive for several reasons, including the size of the datasets. I then pivoted to the most and least successful producers and looked for similar patterns.

I initially measured a movie's success by its profit: the difference between its gross box office and its production budget. However, this metric led to conclusions that did not suit Microsoft's needs. For example, the most profitable movies tend to have very large budgets, but it does not seem prudent to recommend that Microsoft start out with very large budgets, especially since I found only a correlation, not a causal relationship, between budget and profit. I therefore chose return on investment (ROI), the ratio of profit to budget, as the metric for box office success.
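
To make the difference between the two metrics concrete, the sketch below computes profit and ROI for two made-up films. The titles and dollar figures are hypothetical, and it assumes worldwide gross as the measure of "gross box office" (domestic gross would work the same way). The larger film earns more profit, but the smaller film returns far more per dollar spent.

```python
import pandas as pd

# Hypothetical films with illustrative numbers only (not taken from the datasets)
films = pd.DataFrame({
    "movie": ["Big Budget Film", "Small Budget Film"],
    "production_budget": [200_000_000, 5_000_000],
    "worldwide_gross": [500_000_000, 40_000_000],
})

# Profit: gross box office (assumed worldwide) minus production budget
films["profit"] = films["worldwide_gross"] - films["production_budget"]

# ROI: profit as a ratio of the production budget
films["roi"] = films["profit"] / films["production_budget"]

print(films[["movie", "profit", "roi"]])
```

Here the big-budget film has a profit of $300,000,000 and an ROI of 1.5, while the small film has a profit of $35,000,000 and an ROI of 7.0. Ranking by profit puts the big film first, and ranking by ROI puts the small film first.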

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***
The Numbers dataset comes from <a href="https://www.the-numbers.com/">The Numbers</a> website. This dataset is used to calculate the return on investment ("ROI") for each film.

The IMDB dataset comes from IMDB's website as a database with many tables. For this project, I used the movie_basics, principals, and persons tables. This dataset is used to identify the producers, genres, and runtime of each movie.
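
A minimal sketch of how the three tables could be joined to list each movie's producers alongside its genres and runtime. It assumes the column names shown (`movie_id`, `person_id`, `category`, `primary_name`, `runtime_minutes`, `genres`) and that producers are marked with `category = 'producer'` in `principals`. The in-memory database and its rows are toy stand-ins, not the real data. Against the project's IMDB database file, you would pass a connection to that file instead.

```python
import sqlite3

import pandas as pd

# Toy in-memory stand-in for the IMDB database (assumed schema, made-up rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT, runtime_minutes REAL, genres TEXT);
CREATE TABLE principals (movie_id TEXT, person_id TEXT, category TEXT);
CREATE TABLE persons (person_id TEXT, primary_name TEXT);
INSERT INTO movie_basics VALUES ('tt01', 'Example Movie', 95, 'Comedy,Drama');
INSERT INTO principals VALUES ('tt01', 'nm01', 'producer'), ('tt01', 'nm02', 'actor');
INSERT INTO persons VALUES ('nm01', 'Jane Producer'), ('nm02', 'John Actor');
""")

# Join movies to their credited people and keep only producers
query = """
SELECT mb.movie_id, mb.primary_title, mb.runtime_minutes, mb.genres,
       p.primary_name AS producer
FROM movie_basics AS mb
JOIN principals AS pr ON mb.movie_id = pr.movie_id
JOIN persons AS p ON pr.person_id = p.person_id
WHERE pr.category = 'producer';
"""
producers_df = pd.read_sql(query, conn)
print(producers_df)
conn.close()
```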



In [1]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# Load The Numbers budget data (pandas infers gzip compression from the .gz extension);
# the path is relative to the project root
tn_df = pd.read_csv('data/imported/tn.movie_budgets.csv.gz')

In [3]:
tn_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 5782 non-null   int64 
 1   release_date       5782 non-null   object
 2   movie              5782 non-null   object
 3   production_budget  5782 non-null   object
 4   domestic_gross     5782 non-null   object
 5   worldwide_gross    5782 non-null   object
dtypes: int64(1), object(5)
memory usage: 271.2+ KB


In [4]:
tn_df.head()

|   | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
|---|----|--------------|-------|-------------------|----------------|-----------------|
| 0 | 1 | Dec 18, 2009 | Avatar | $425,000,000 | $760,507,625 | $2,776,345,279 |
| 1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 |
| 2 | 3 | Jun 7, 2019 | Dark Phoenix | $350,000,000 | $42,762,350 | $149,762,350 |
| 3 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 |
| 4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | $317,000,000 | $620,181,382 | $1,316,721,747 |
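
The `info()` and `head()` output show that the three money columns and `release_date` are stored as strings (`object` dtype), so they need converting before ROI can be calculated. Below is a minimal sketch of one way to do this, run on two rows copied from the output. Applying the same steps to `tn_df` is an assumption about how the cleaning could be done, not the project's actual cleaning code. It also assumes every money value follows the `$1,234` pattern.

```python
import pandas as pd

# Two rows copied from the tn_df.head() output; money columns are strings like "$425,000,000"
tn_sample = pd.DataFrame({
    "release_date": ["Dec 18, 2009", "Jun 7, 2019"],
    "movie": ["Avatar", "Dark Phoenix"],
    "production_budget": ["$425,000,000", "$350,000,000"],
    "domestic_gross": ["$760,507,625", "$42,762,350"],
    "worldwide_gross": ["$2,776,345,279", "$149,762,350"],
})

# Strip "$" and "," from each money column, then convert to 64-bit integers
money_cols = ["production_budget", "domestic_gross", "worldwide_gross"]
for col in money_cols:
    tn_sample[col] = tn_sample[col].str.replace(r"[$,]", "", regex=True).astype("int64")

# Parse dates written like "Dec 18, 2009" into datetime64 values
tn_sample["release_date"] = pd.to_datetime(tn_sample["release_date"], format="%b %d, %Y")

print(tn_sample.dtypes)
```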


In [None]:
# Import the visualization package you created

import code.visualizations as viz

In [None]:
# This example function takes no arguments currently, but you would pass the full dataset to it for your project

viz.sample_plot_1()

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# here you run your code to clean the data
import code.data_cleaning as dc

full_dataset = dc.full_clean()

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***

# Business Understanding

Microsoft sees all the big companies creating original video content and wants to get in on the fun. It has decided to create a new movie studio, but it doesn't know anything about creating movies. You are charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

# Data Understanding

## Data Preparation

# Exploratory Data Analysis

# Conclusions

## Limitations

# Recommendations

## Next Steps