In [1]:
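import pandas as pd
import pickle
import warnings
warnings.filterwarnings('ignore')  # keep notebook output free of library warnings
# Load the merged movie and market data prepared in the cleaning step
with open("merged_file.pickle", "rb") as f:
    movie_data = pickle.load(f)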

# New Movie Studio #
We have been tasked with developing a strategy for how Microsoft should start up a new movie studio to compete in the streaming space. We will look at movie financial data and market condition data to arrive at strategic recommendations for Microsoft.


## Steps in Our Process ##


1. ### Import and clean our data ###
    * Movie Budget data is from https://www.the-numbers.com/movie/budgets/all
    * Market data is from https://finance.yahoo.com/quote/SPY?p=SPY&.tsrc=fin-srch  
      
2. ### Explore data with charts ###
    * Looking at summaries is a great way to get a feel for a data set.
    * We identified profit percentage as a metric to investigate
3. ### Analyze the data ###
    * For this data set we analyzed our data for insights mainly by creating and looking at graphs
4. ### Use analysis to make recommendations ###
    * From our graphs we came up with some concrete recommendations based on budget, competition, market conditions, and season.

# Exploring the Data #

One of the first things we do when exploring new data is to look at some summary statistics to get a feel for what is going on. Below is a summary table that gives a selection of statistics for each piece of information we have about the movies in our database.  
(All financial numbers are in dollars and may be recorded in scientific notation, where 4.610445e+08 = $4.610445 \times 10^8$. The profit margin is a percentage.)

In [2]:
movie_data.iloc[:,[2,3,4,5,13,14,15]].describe()

| | production_budget | domestic_gross | worldwide_gross | domestic_profit | foreign_profit | profit_margin |
|---|---|---|---|---|---|---|
| count | 4319.0 | 4319.0 | 4319.0 | 4319.0 | 4319.0 | 4319.0 |
| mean | 38294180.0 | 46880420.0 | 106955700.0 | 8586241.0 | 60075260.0 | 237.705147 |
| std | 49264750.0 | 78248670.0 | 200464300.0 | 53825970.0 | 129602400.0 | 1272.2552 |
| min | 1341.496 | 0.0 | 0.0 | -201941300.0 | 0.0 | -100.0 |
| 25% | 5517160.0 | 652711.3 | 3188324.0 | -11414860.0 | 203937.8 | -58.897805 |
| 50% | 20480890.0 | 17739630.0 | 33349940.0 | -1090830.0 | 11182740.0 | 54.997616 |
| 75% | 49959030.0 | 57816220.0 | 114459300.0 | 15362550.0 | 55980650.0 | 239.418791 |
| max | 461044500.0 | 977135200.0 | 3011808000.0 | 657913000.0 | 2186802000.0 | 43051.785333 |
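
If the scientific notation is hard to read, pandas can render the same kind of summary with thousands separators. The sketch below uses a small hypothetical frame (`sample`, with made-up values apart from the maximum budget shown above) rather than the real `movie_data`, and relies on the `display.float_format` display option.

```python
import pandas as pd

# Hypothetical stand-in for two financial columns of movie_data
sample = pd.DataFrame({
    "production_budget": [500_000.0, 1_000_000.0, 461_044_500.0],
    "worldwide_gross": [2_500_000.0, 750_000.0, 900_000_000.0],
})

# Temporarily render floats with thousands separators instead of e-notation
with pd.option_context("display.float_format", "{:,.0f}".format):
    summary_text = sample.describe().to_string()

print(summary_text)
```

Using `option_context` keeps the change local to this one table, so other cells in the notebook keep their default formatting.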


Take a look at the profit margin column. The maximum is over 43,000%, which seems crazy! What is going on here?  
Let's take a look at the ten movies with the highest profit margins.

In [3]:
movie_data.iloc[:,[1,2,3,13,14,15]].sort_values(by = ['profit_margin'], ascending=False).head(10)

| | release_date | title | production_budget | domestic_profit | foreign_profit | profit_margin |
|---|---|---|---|---|---|---|
| 2281 | 2009-09-25 | Paranormal Activity | 502978.751942 | 120121200.0 | 96420160.0 | 43051.785333 |
| 708 | 2015-07-10 | The Gallows | 103012.073015 | 23347080.0 | 19461110.0 | 41556.474 |
| 3156 | 2004-05-07 | Super Size Me | 82309.737875 | 14517370.0 | 13555070.0 | 34105.858462 |
| 3883 | 2005-08-05 | My Date With Drew | 1341.496134 | 219445.6 | 0.0 | 16358.272727 |
| 1263 | 2007-05-16 | Once | 172332.5789 | 10679860.0 | 15943950.0 | 15449.087333 |
| 3266 | 2004-10-08 | Primer | 8864.125617 | 529011.0 | 528258.8 | 11927.514286 |
| 3128 | 2004-06-11 | Napoleon Dynamite | 506521.463847 | 55895850.0 | 2002985.0 | 11430.67825 |
| 3245 | 2004-08-06 | Open Water | 633151.829809 | 37990230.0 | 31680080.0 | 11003.7282 |
| 2755 | 2006-09-29 | Facing the Giants | 118269.014701 | 11919540.0 | 76671.44 | 10143.159 |
| 1823 | 2012-01-06 | The Devil Inside | 1000000.0 | 52262940.0 | 48496550.0 | 10075.949 |


Wow! It looks like there are movies with extremely high profitability, on the order of 10,000 percent. This is something that we want to investigate in order to come up with our movie studio strategy.
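
The notebook does not spell out how `profit_margin` is computed. The figures in the tables above are consistent with a return on budget, (worldwide gross minus production budget) divided by production budget, times 100, so the sketch below assumes that definition. The function name `profit_margin` and the reconstruction of worldwide gross as budget plus domestic and foreign profit are illustrative assumptions, not the author's code.

```python
def profit_margin(production_budget: float, worldwide_gross: float) -> float:
    """Return on budget in percent (assumed definition, see the text above)."""
    return (worldwide_gross - production_budget) / production_budget * 100


# The Devil Inside, using the budget and profit figures from the table above.
# Assumption: worldwide gross = budget + domestic profit + foreign profit.
budget = 1_000_000.0
worldwide_gross = budget + 52_262_940.0 + 48_496_550.0

margin = profit_margin(budget, worldwide_gross)
print(round(margin, 3))

# A film that grosses nothing loses its whole budget, which is the -100 floor
worst_case = profit_margin(budget, 0.0)
print(worst_case)
```

Under this definition the result for The Devil Inside lands on the 10075.949 reported in the table, and the worst case matches the minimum of -100 in the summary statistics.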

# Exploring Profitability with Graphs #  


The first thing we decided to do was restrict our search to movies from the last ten years and then look at the distribution of profitability for each year.


![Distribution Of Movie Profitability](images/profitability_distribution_year.png) 

Just by looking at the distribution we can learn a great deal about our data. The vast majority of movies perform somewhere between being slightly unprofitable and making about two and a half times their budget, while there is always a small percentage of movies each year that make ten, twenty, or even hundreds of times their budget. This huge variability in profitability also explains the asymmetrical appearance of the chart: you can only lose what you put into a movie production, but the potential profit doesn't have a hard upper limit. We will center our strategy around trying to take advantage of this asymmetry and variability.
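
A per-year view like this can be reproduced numerically. The following is a minimal sketch on a hypothetical miniature of `movie_data` (the rows and the helper `yearly_profit_summary` are illustrative, not the code behind the chart); it keeps the last ten years of releases and reports the minimum, median, and maximum profit margin for each year.

```python
import pandas as pd

# Hypothetical miniature of movie_data; values are illustrative only
movies = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2004-05-07", "2012-01-06", "2012-06-15", "2012-11-20",
        "2015-07-10", "2015-09-04", "2015-12-18",
    ]),
    "profit_margin": [40.0, 10075.9, -80.0, 55.0, 41556.5, -100.0, 120.0],
})


def yearly_profit_summary(df, last_year, n_years=10):
    """Keep the last n_years of releases and summarize profit margin per year."""
    years = df["release_date"].dt.year
    recent = df[(years > last_year - n_years) & (years <= last_year)]
    grouped = recent.groupby(recent["release_date"].dt.year)["profit_margin"]
    return grouped.agg(["min", "median", "max"])


summary = yearly_profit_summary(movies, last_year=2015)
print(summary)
```

The asymmetry shows up directly in the numbers: the `min` column can never fall below -100, while `max` sits far above the median.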

# How can we use our new insight? #

Our strategy to try and take advantage of exceptionally profitable movies will be to focus on lower-budget films and hope that we can have one of these exceptionally profitable movies. There are a few reasons for this:  
1. Since we are looking at the return on our investment, it is simply not possible to make expensive movies that make hundreds of times their budget. 
  * For example, a movie that costs \$200M to make would need to make \$20 billion to return 100 times the initial investment.
2. Focusing on lower-budget films gives us granular control over our risk and spending. Because the distribution is so asymmetric and a film can lose at most its budget, the potential loss from any one film is very small. 
  * Instead of choosing whether to make one or two \$200M movies, we can choose more specifically how much to spend.
3. In order to take maximum advantage of the high variability of profitability we should make as many lower-budget films as possible to increase our chances of success.
  * The more swings we take, the greater our chances of a home run
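
The "more swings" argument can be made concrete with a little arithmetic. This is a sketch under stated assumptions: each film is treated as an independent attempt, and the 2% chance of a breakout hit and the 5M per-film budget are made-up illustrative numbers, not estimates from the data.

```python
def chance_of_at_least_one_hit(hit_probability: float, n_films: int) -> float:
    """Probability that at least one of n independent films is a breakout hit."""
    return 1 - (1 - hit_probability) ** n_films


total_budget = 200_000_000   # the cost of one big-budget film
film_budget = 5_000_000      # assumed budget per low-budget film
assumed_hit_rate = 0.02      # hypothetical chance that any one film is a hit

n_films = total_budget // film_budget
chance = chance_of_at_least_one_hit(assumed_hit_rate, n_films)
max_loss_per_film = film_budget  # a film can lose at most what it cost

print(n_films, round(chance, 3), max_loss_per_film)
```

With these assumed numbers, the same money buys 40 attempts and a better than even chance of at least one hit, while the most that any single film can lose is its own 5M budget.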
  
Now, let's dig into the data and try to come up with some more specific recommendations.


# Effects of Competition #

![Competition and Profitability](images/competition.png) 

Our first avenue of investigation was to look at the level of competition movies faced and how that affected their profitability. We also broke this down into Mass Production and Independent movies, because more expensive movies are more often large studio productions, while independent movies tend to have smaller budgets. What this graph shows us is very useful. As the amount of competition increases, the profitability of independent movies decreases severely, while big-budget movies are far less affected. Importantly, the *variability* of independent movie profitability also decreases, which matters a great deal for our strategy. We suspect this is due to the economies of scale and large advertising budgets enjoyed by large, expensive studio productions.  
### Since our goal is to hit a home run with a lower budget film, it will be especially important for us to try for less competitive release times. ###
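
The notebook does not show how competition or the two movie categories were defined, so the sketch below makes its own assumptions: competition is the number of other films released in the same calendar month, and a hypothetical budget cutoff (`BUDGET_CUTOFF`) separates independent from mass production films. The data is a toy example, and the median and interquartile range stand in for the level and variability of profitability.

```python
import pandas as pd

# Hypothetical toy data; the real analysis would use movie_data
movies = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2015-01-09", "2015-01-23", "2015-07-03",
        "2015-07-10", "2015-07-17", "2015-07-24",
    ]),
    "production_budget": [2e6, 150e6, 1e6, 5e6, 180e6, 200e6],
    "profit_margin": [900.0, 110.0, 60.0, -40.0, 95.0, 130.0],
})

BUDGET_CUTOFF = 20e6  # assumed boundary between independent and mass production


def iqr(values):
    """Interquartile range, used here as a simple measure of variability."""
    return values.quantile(0.75) - values.quantile(0.25)


def add_competition_features(df, cutoff=BUDGET_CUTOFF):
    out = df.copy()
    month = out["release_date"].dt.strftime("%Y-%m")
    # Competition = number of other films released in the same calendar month
    out["competitors"] = out.groupby(month)["release_date"].transform("count") - 1
    out["category"] = (out["production_budget"] >= cutoff).map(
        {True: "mass production", False: "independent"}
    )
    return out


features = add_competition_features(movies)
spread = features.groupby(["category", "competitors"])["profit_margin"].agg(
    ["median", iqr]
)
print(spread)
```

Comparing rows of `spread` within each category shows how both the median and the spread of profit margin change as the number of competitors grows.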

# Market Conditions #

We also wanted to see if we could use historical S&P 500 data to look at the overall consumer sentiment, and see how that will affect our strategy. 

![Markets and Profitability](images/market.png) 

As with the competition graph, we see different behavior here for mass production and independent movies. The profitability of mass production movies is almost unaffected by market conditions, while both the median and the variability of profitability decrease severely for independent movies when consumer sentiment is low. So independent movies are also much more sensitive to consumer sentiment than mass production movies. We suspect this is because when consumer sentiment is low and people have less spending money, they are more likely to choose large studio movies that are more familiar to them from advertising and brand loyalty.  
### It is important to only try this strategy when consumer sentiment is not low ###
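
One way to attach a market signal to each film is sketched below. The sentiment rule is an assumption made for illustration (sentiment is "low" when the month-end close fell from the previous month), the SPY closes are made-up numbers, and `pd.merge_asof` is used to give each release the most recent market reading on or before its release date. The notebook's actual definition of consumer sentiment may differ.

```python
import pandas as pd

# Hypothetical month-end SPY closes; real values would come from Yahoo Finance
spy = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-30", "2015-02-27", "2015-03-31", "2015-04-30"]),
    "close": [100.0, 104.0, 98.0, 110.0],
})

# Assumed sentiment proxy: month-over-month change in the closing price
spy["monthly_return"] = spy["close"] / spy["close"].shift(1) - 1
spy = spy.dropna(subset=["monthly_return"])
spy["sentiment"] = spy["monthly_return"].lt(0).map({True: "low", False: "not low"})

# Hypothetical releases
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "release_date": pd.to_datetime(["2015-03-13", "2015-04-10", "2015-05-08"]),
})

# Match each release to the latest market observation on or before its date
merged = pd.merge_asof(
    movies.sort_values("release_date"),
    spy.sort_values("date"),
    left_on="release_date",
    right_on="date",
)
print(merged[["title", "release_date", "sentiment"]])
```

Matching backward in time means a film is only ever labeled with market information that was available before it was released.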

# Combining our Insights #

Once we considered our insights, we realized that there are strong seasonal components to both movie competition and consumer sentiment, so we decided to look into seasonal effects on profitability.

![Seasonality and Profitability](images/bar_spider.png) 

This graph shows the seasonality of competition with the red radial bars, and the seasonality of average movie profitability with the orange line and blue shaded area. We can see that competition and bad consumer sentiment combine to make the holiday season a bad time for our strategy, while the best times seem to be the beginning and end of the summer season, when there is less competition than in the middle of the season but consumer sentiment is still high.
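
The two quantities in the chart can be tabulated by calendar month. The sketch below uses hypothetical releases and a helper named `monthly_seasonality` (both illustrative), counts releases per month as the competition measure, and uses the median profit margin rather than the mean so that a single breakout film does not dominate a month.

```python
import pandas as pd

# Hypothetical releases across several years; values are illustrative only
movies = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2014-05-02", "2015-05-08", "2014-07-04", "2015-07-10",
        "2015-07-17", "2014-12-19", "2015-12-18", "2015-12-25",
    ]),
    "profit_margin": [300.0, 500.0, 80.0, 120.0, 40.0, 20.0, -30.0, 60.0],
})


def monthly_seasonality(df):
    """Releases and median profit margin for each calendar month."""
    month = df["release_date"].dt.month.rename("month")
    return df.groupby(month)["profit_margin"].agg(
        releases="count", median_margin="median"
    )


seasonality = monthly_seasonality(movies)

# Rank months: fewest competing releases first, then highest median margin
ranked = seasonality.sort_values(
    ["releases", "median_margin"], ascending=[True, False]
)
print(ranked)
```

Months near the top of `ranked` combine low competition with strong typical profitability, which is the kind of release window the strategy looks for.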

# Recommendations #  

1. ### Focus on independent films ###
    * Low downside risk with very large upside potential  
2. ### Time release to avoid competition ###  
3. ### Hold off when the market is bad ###  
4. ### Pay attention to seasonality of supply and demand ###
    * Identify times of year that have low competition and good market conditions  