# What is the most successful genre?

We were tasked with analysing which genre would be the most profitable for Microsoft to pursue as it enters the movie-making industry. To find the best strategy for Microsoft, we set out to answer the following questions about movie genres.

  1. What is the top overall movie genre?
  
  2. Is there a correlation between release month and higher profitability in that genre?
  
  3. Is there a correlation between production budget and net profits in that genre?

We imported our cleaned dataframe that we saved as a .csv file and ranked the domestic_gross column from highest to lowest to determine what genres were the most successful at the box office.

In [16]:
import pandas as pd
%matplotlib notebook
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
df=pd.read_csv('Data/group_data.csv')
df2 = df.sort_values('domestic_gross', ascending=False).dropna()
df2

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,genres,tconst,runtime,popularity,year_released,release_day,release_month,domestic_gross_in_mill,production_budget_in_mill,domestic_net_in_mill,Return_on_Investment,release_day_num,release_month_num
0,1,2009-12-18,Avatar,425000000,760507625,Horror,tt1775309,93.0,26.526,2009,Friday,December,760.507625,425.0,335.507625,78.942971,18,12
40,42,2018-02-16,Black Panther,200000000,700059566,"Action,Adventure,Sci-Fi",tt1825683,134.0,2.058,2018,Friday,February,700.059566,200.0,500.059566,250.029783,16,2
5,7,2018-04-27,Avengers: Infinity War,300000000,678815482,"Action,Adventure,Sci-Fi",tt4154756,149.0,80.773,2018,Friday,April,678.815482,300.0,378.815482,126.271827,27,4
32,34,2015-06-12,Jurassic World,215000000,652270625,"Action,Adventure,Sci-Fi",tt0369610,124.0,20.709,2015,Friday,June,652.270625,215.0,437.270625,203.381686,12,6
25,27,2012-05-04,The Avengers,225000000,623279547,"Action,Adventure,Sci-Fi",tt0848228,143.0,50.289,2012,Friday,May,623.279547,225.0,398.279547,177.013132,4,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3216,95,2011-05-10,The Hit List,6000000,0,"Action,Thriller",tt1575694,90.0,6.241,2011,Tuesday,May,0.000000,6.0,-6.000000,-100.000000,10,5
2252,54,2016-02-19,Forsaken,18000000,0,"Action,Drama,Western",tt2271563,90.0,9.472,2016,Friday,February,0.000000,18.0,-18.000000,-100.000000,19,2
2251,53,2014-08-22,The Prince,18000000,0,"Action,Thriller",tt1085492,93.0,12.324,2014,Friday,August,0.000000,18.0,-18.000000,-100.000000,22,8
2249,51,2014-11-14,Wolves,18000000,0,"Action,Fantasy,Horror",tt1403241,91.0,8.627,2014,Friday,November,0.000000,18.0,-18.000000,-100.000000,14,11


Next, we extracted the genres of the movies that historically performed best at the box office to gain a better understanding of the patterns in the data. We determined that the top 100 movies was the best sample size for our data.

In [17]:
top_movie_genres = df2.loc[:,['movie', 'genres']].dropna() #drops the NaN values
final_movie_genres = top_movie_genres.head(100)
final_movie_genres

Unnamed: 0,movie,genres
0,Avatar,Horror
40,Black Panther,"Action,Adventure,Sci-Fi"
5,Avengers: Infinity War,"Action,Adventure,Sci-Fi"
32,Jurassic World,"Action,Adventure,Sci-Fi"
25,The Avengers,"Action,Adventure,Sci-Fi"
...,...,...
126,Wreck-It Ralph,"Adventure,Animation,Comedy"
2263,A Quiet Place,Documentary
127,Interstellar,"Adventure,Drama,Sci-Fi"
217,The Croods,"Action,Adventure,Animation"


Because our dataset contained so many different genre combinations, we counted how often each one appeared to determine which type of movie was most common among the top performers. As the counts show, 'Action, Adventure, Sci-Fi' was the top genre within our 100-movie sample.

In [18]:
total_genres_count = final_movie_genres['genres'].value_counts()
total_genres_count

Action,Adventure,Sci-Fi       25
Adventure,Animation,Comedy    15
Action,Adventure,Fantasy       8
Action,Adventure,Comedy        7
Action,Adventure,Animation     5
Adventure,Family,Fantasy       4
Action,Crime,Thriller          3
Adventure,Fantasy              2
Animation,Comedy,Family        2
Action,Adventure,Thriller      2
Adventure,Drama,Sci-Fi         2
Action,Adventure,Family        1
Drama,Fantasy,Romance          1
Documentary                    1
Crime,Drama                    1
Comedy,Mystery                 1
Action,Drama,History           1
Action,Comedy,Crime            1
Action,Adventure,Drama         1
Drama,Sci-Fi,Thriller          1
Horror,Thriller                1
Action,Adventure,Horror        1
Adventure,Drama,Sport          1
Action,Thriller                1
Action,Biography,Drama         1
Action,Sci-Fi,Thriller         1
Adventure                      1
Adventure,Drama,Fantasy        1
Sci-Fi                         1
Horror                         1
Animation 
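
Because each `genres` value is a comma-separated combination, `value_counts()` counts combinations rather than individual genres. As a complementary view, the sketch below splits the strings and counts each genre on its own. The five-row `sample` frame is copied from the head of the table above purely for illustration, and the names `sample` and `single_genre_counts` are assumptions; on the real data the same expression would presumably be applied to `final_movie_genres`.

```python
import pandas as pd

# Five rows copied from the top of final_movie_genres, for illustration only
sample = pd.DataFrame({
    "movie": ["Avatar", "Black Panther", "Avengers: Infinity War",
              "Jurassic World", "The Avengers"],
    "genres": ["Horror", "Action,Adventure,Sci-Fi", "Action,Adventure,Sci-Fi",
               "Action,Adventure,Sci-Fi", "Action,Adventure,Sci-Fi"],
})

# Split each combination into a list, give every genre its own row, then count
single_genre_counts = (
    sample["genres"]
    .str.split(",")
    .explode()
    .value_counts()
)
print(single_genre_counts)
```

Counting this way answers a slightly different question (how often a genre appears in any combination), so it complements the combination counts rather than replacing them.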

Once the genres were ranked in order from highest to lowest, visualizing that data was key. We determined that a bar chart was the best way to show our findings.

In [19]:
plt.figure(figsize=(13,8))
ax=sns.countplot(y="genres", data=final_movie_genres, palette= "ch:r=-.5,l=.75")


Our next sub-question was whether a genre's success correlates with the time of year it is released. Did certain movies perform better when released in certain months? We extracted the top movies by box office sales and compared their release months and genres to look for relationships. We worked with a smaller sample of 20 movies to get a clearer picture of how the very top performers behaved.

In [20]:
domestic_gross_month = df2.loc[:,['production_budget_in_mill', 'domestic_gross_in_mill', 'genres', 'release_month_num', 'movie']].dropna() #drops the NaN values
highest_month = domestic_gross_month.head(20)
highest_month

Unnamed: 0,production_budget_in_mill,domestic_gross_in_mill,genres,release_month_num,movie
0,425.0,760.507625,Horror,12,Avatar
40,200.0,700.059566,"Action,Adventure,Sci-Fi",2,Black Panther
5,300.0,678.815482,"Action,Adventure,Sci-Fi",4,Avengers: Infinity War
32,215.0,652.270625,"Action,Adventure,Sci-Fi",6,Jurassic World
25,225.0,623.279547,"Action,Adventure,Sci-Fi",5,The Avengers
41,200.0,608.581744,"Action,Adventure,Animation",6,Incredibles 2
42,200.0,532.177324,"Action,Adventure,Sci-Fi",12,Rogue One: A Star Wars Story
130,160.0,504.014165,"Drama,Fantasy,Romance",3,Beauty and the Beast
43,200.0,486.295561,"Adventure,Animation,Comedy",6,Finding Dory
2,330.6,459.005868,"Action,Adventure,Sci-Fi",5,Avengers: Age of Ultron


The pattern was hard to see in the table above, so we visualized it with a regression plot, which surfaced several interesting findings.
1. 60% of the top-grossing movies in this sample fell in the 'Action, Adventure, Sci-Fi' genre category.


2. The majority of the 'Action, Adventure, Sci-Fi' movies were released in late spring or early to mid summer. The other popular release windows were spring break and the holidays.


3. One outlier was 'Black Panther', an 'Action, Adventure, Sci-Fi' movie released in February. It was the second-highest-grossing movie in our data and the most profitable by domestic net. It was released during Black History Month, in keeping with its cultural significance as a Marvel movie. This suggests that a culturally focused 'Action, Adventure, Sci-Fi' movie may perform better when released during that culture's heritage month.

In [21]:
sns.lmplot(x="release_month_num", y="domestic_gross_in_mill", hue="genres", palette="GnBu_d", data=highest_month);
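
To put numbers next to the plot, the rows can also be grouped by release month. The sketch below is an illustration only: it uses the ten rows displayed in the table above (not the full 20-row `highest_month` frame), and the names `shown` and `by_month` are assumptions.

```python
import pandas as pd

# release_month_num and domestic_gross_in_mill for the ten rows shown above
shown = pd.DataFrame({
    "release_month_num": [12, 2, 4, 6, 5, 6, 12, 3, 6, 5],
    "domestic_gross_in_mill": [760.507625, 700.059566, 678.815482, 652.270625,
                               623.279547, 608.581744, 532.177324, 504.014165,
                               486.295561, 459.005868],
})

# Number of movies and mean gross per release month, busiest months first
by_month = (
    shown.groupby("release_month_num")["domestic_gross_in_mill"]
    .agg(movies="count", mean_gross_in_mill="mean")
    .sort_values("movies", ascending=False)
)
print(by_month)
```

In these ten rows June accounts for three titles and May and December for two each, which is in line with the seasonal pattern described in the findings.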


The last question we wanted to answer about genre was how costly the most successful genre is to produce, and whether the profits justify that cost.

The table below sorts the movies by return on investment and then filters them to the 'Action, Adventure, Sci-Fi' genre to isolate the most profitable movies in that genre.

In [23]:
df3 = df2.sort_values('Return_on_Investment', ascending=False).dropna()
dfgen = df3.loc[df3['genres'] == 'Action,Adventure,Sci-Fi']  # build the mask from df3 itself, not the unsorted df
dfgen

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,genres,tconst,runtime,popularity,year_released,release_day,release_month,domestic_gross_in_mill,production_budget_in_mill,domestic_net_in_mill,Return_on_Investment,release_day_num,release_month_num
494,38,2012-03-23,The Hunger Games,80000000,408010692,"Action,Adventure,Sci-Fi",tt1392170,142.0,14.212,2012,Friday,March,408.010692,80.0,328.010692,410.013365,23,3
3699,100,2016-04-08,Hardcore Henry,2000000,9252038,"Action,Adventure,Sci-Fi",tt3072482,96.0,10.459,2016,Friday,April,9.252038,2.0,7.252038,362.6019,8,4
40,42,2018-02-16,Black Panther,200000000,700059566,"Action,Adventure,Sci-Fi",tt1825683,134.0,2.058,2018,Friday,February,700.059566,200.0,500.059566,250.029783,16,2
228,38,2013-11-22,The Hunger Games: Catching Fire,130000000,424668047,"Action,Adventure,Sci-Fi",tt1951264,146.0,20.187,2013,Friday,November,424.668047,130.0,294.668047,226.667728,22,11
32,34,2015-06-12,Jurassic World,215000000,652270625,"Action,Adventure,Sci-Fi",tt0369610,124.0,20.709,2015,Friday,June,652.270625,215.0,437.270625,203.381686,12,6
25,27,2012-05-04,The Avengers,225000000,623279547,"Action,Adventure,Sci-Fi",tt0848228,143.0,50.289,2012,Friday,May,623.279547,225.0,398.279547,177.013132,4,5
252,62,2014-11-21,The Hunger Games: Mockingjay - Part 1,125000000,337135885,"Action,Adventure,Sci-Fi",tt1951265,123.0,33.837,2014,Friday,November,337.135885,125.0,212.135885,169.708708,21,11
42,45,2016-12-16,Rogue One: A Star Wars Story,200000000,532177324,"Action,Adventure,Sci-Fi",tt3748528,133.0,21.401,2016,Friday,December,532.177324,200.0,332.177324,166.088662,16,12
108,13,2018-06-22,Jurassic World: Fallen Kingdom,170000000,417719760,"Action,Adventure,Sci-Fi",tt4881806,128.0,34.958,2018,Friday,June,417.71976,170.0,247.71976,145.717506,22,6
5,7,2018-04-27,Avengers: Infinity War,300000000,678815482,"Action,Adventure,Sci-Fi",tt4154756,149.0,80.773,2018,Friday,April,678.815482,300.0,378.815482,126.271827,27,4
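
The `domestic_net_in_mill` and `Return_on_Investment` values in this table are consistent with net = gross - budget and ROI = net / budget × 100. The notebook does not show how those columns were built, so the sketch below assumes that definition; `roi_percent` is a hypothetical helper, checked here against the Black Panther row.

```python
def roi_percent(gross_in_mill: float, budget_in_mill: float) -> float:
    """Return on investment as a percentage of the production budget.

    Assumed definition: (gross - budget) / budget * 100.
    """
    if budget_in_mill <= 0:
        raise ValueError("budget must be positive")
    net_in_mill = gross_in_mill - budget_in_mill
    return net_in_mill / budget_in_mill * 100


# Black Panther row from the table: 700.059566 gross, 200.0 budget
black_panther_roi = roi_percent(700.059566, 200.0)
print(round(black_panther_roi, 6))
```

Under this definition a movie with zero domestic gross has an ROI of -100, which matches the rows at the bottom of the first table.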


Next, we generate a table of the most profitable movies in this category, keeping only the columns relevant to budget and profit.

In [24]:
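# dfftest is not defined in the cells shown; dfgen (the genre-filtered frame above) matches the output below
top_domestic_month = dfgen.loc[:,['Return_on_Investment', 'domestic_gross_in_mill', 'production_budget_in_mill','domestic_net_in_mill', 'genres', 'movie']].dropna() #drops the NaN values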
highest_net_month = top_domestic_month.head(20)

highest_net_month

Unnamed: 0,Return_on_Investment,domestic_gross_in_mill,production_budget_in_mill,domestic_net_in_mill,genres,movie
494,410.013365,408.010692,80.0,328.010692,"Action,Adventure,Sci-Fi",The Hunger Games
3699,362.6019,9.252038,2.0,7.252038,"Action,Adventure,Sci-Fi",Hardcore Henry
40,250.029783,700.059566,200.0,500.059566,"Action,Adventure,Sci-Fi",Black Panther
228,226.667728,424.668047,130.0,294.668047,"Action,Adventure,Sci-Fi",The Hunger Games: Catching Fire
32,203.381686,652.270625,215.0,437.270625,"Action,Adventure,Sci-Fi",Jurassic World
25,177.013132,623.279547,225.0,398.279547,"Action,Adventure,Sci-Fi",The Avengers
252,169.708708,337.135885,125.0,212.135885,"Action,Adventure,Sci-Fi",The Hunger Games: Mockingjay - Part 1
42,166.088662,532.177324,200.0,332.177324,"Action,Adventure,Sci-Fi",Rogue One: A Star Wars Story
108,145.717506,417.71976,170.0,247.71976,"Action,Adventure,Sci-Fi",Jurassic World: Fallen Kingdom
5,126.271827,678.815482,300.0,378.815482,"Action,Adventure,Sci-Fi",Avengers: Infinity War


We then created a graph to show the correlation between production budget and domestic net in the 'Action, Adventure, Sci-Fi' genre. We used a joint plot with kernel density estimate (KDE) contours and marginal distributions to show that correlation. It shows that the most successful 'Action, Adventure, Sci-Fi' movies had a production budget of around 200 million dollars and a domestic net between 200 and 500 million dollars.

In [26]:
sns.jointplot(x=highest_net_month["production_budget_in_mill"], y=highest_net_month["domestic_net_in_mill"], kind='kde')
plt.show()
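
The plot shows the shape of the relationship; a correlation coefficient summarizes it in one number. The sketch below computes Pearson's r with `Series.corr` on the ten rows displayed in the table above. It is an illustration on a small subset, not a result for the full genre, and `shown_net` is a hypothetical name.

```python
import pandas as pd

# Budget and domestic net (in millions) for the ten rows shown above
shown_net = pd.DataFrame({
    "production_budget_in_mill": [80.0, 2.0, 200.0, 130.0, 215.0,
                                  225.0, 125.0, 200.0, 170.0, 300.0],
    "domestic_net_in_mill": [328.010692, 7.252038, 500.059566, 294.668047,
                             437.270625, 398.279547, 212.135885, 332.177324,
                             247.71976, 378.815482],
})

# Pearson correlation between budget and domestic net (the default method)
r = shown_net["production_budget_in_mill"].corr(shown_net["domestic_net_in_mill"])
print(f"Pearson r = {r:.2f}")
```

With only ten points, a single low-budget title such as Hardcore Henry can move the coefficient noticeably, so the value is best read alongside the plot.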


## Conclusion 

Our final recommendation is that Microsoft make movies in the 'Action, Adventure, Sci-Fi' genre, which our analysis found to be the most profitable. We based this conclusion on the following findings:
  1. Out of the top 100 domestic gross movies over the past 30 years, 'Action, Adventure, Sci-Fi' was the most common genre, accounting for 25 of the 100 movies in the sample.
    
    
  2. The most profitable times of year to release an 'Action, Adventure, Sci-Fi' movie were late spring to early or mid summer, spring break week, and the holidays. A culturally focused movie may also benefit from being released during that culture's heritage month.
    
    
  3. A production budget of around 200 million dollars was typical of the most successful 'Action, Adventure, Sci-Fi' movies, which earned a domestic net of roughly 200-500 million dollars, so we recommend budgeting at about that level.