# Investigate a TMDb Movie Database

## Introduction

In this project we investigate a TMDb movie database file that holds key details for more than 10,000 movies, including budget, revenue, and release date.

Let's take a glimpse at the TMDb movie database CSV file...

In [5]:
import pandas as pd

#reading tmdb csv file and storing that to a variable
glimpse_tmdb = pd.read_csv('data.csv')

#calling out first 5 rows (excluding headers) of tmdb database
glimpse_tmdb.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,tagline,...,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,...,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/2015,5562,6.5,2015,137999939.3,1392446000.0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,...,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/2015,6185,7.1,2015,137999939.3,348161300.0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,...,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/2015,2480,6.3,2015,101199955.5,271619000.0
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,...,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/2015,5292,7.5,2015,183999919.0,1902723000.0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,...,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/2015,2947,7.3,2015,174799923.1,1385749000.0


### What can we say about the dataset provided?
<ul>
    <li>The columns *'budget', 'revenue', 'budget_adj', 'revenue_adj'* do not state a currency; for this dataset we will assume the values are in US dollars.</li>
    <li>The vote count varies widely between movies: for example, *'Mad Max: Fury Road'* has *6k+* votes (as seen above), while *Sinister 2* has only *331 votes*. Because the vote counts differ so much, the *vote_average* column is affected as well, so we cannot assume that the movie with the most votes or the highest rating was the most successful.</li>
</ul>  

### What Questions can be brainstormed?
Looking at this database...
<ul>
<li>The first question that comes to mind is which movie made the most profit or, put loosely, which movie has been the people's favourite?</li>

<li>The glimpse above only shows movies released in 2015, but the database also covers other years, so the second question is: in which year did movies make the most profit?</li>

<li>Finally, what characteristics do the movies with the highest profits have in common?</li>
</ul>


### Questions to be Answered
<ol>
    <li>General questions about the dataset.</li>
        <ol type = 'a'>
            <li>Which movie earns the most and least profit?</li>
            <li>Which movie had the greatest and least runtime?</li>
            <li>Which movie had the greatest and least budget?</li>
            <li>Which movie had the greatest and least revenue?</li>
            <li>What is the average runtime of all movies?</li>
            <li>In which year did the most movies make a profit?</li>
        </ol>
    <li>What characteristics do the most profitable movies share?</li>
        <ol type = 'a'>
            <li>Average duration of movies.</li>
            <li>Average Budget.</li>
            <li>Average revenue.</li>
            <li>Average profits.</li>
            <li>Which director directed most films?</li>
            <li>Which cast member has appeared the most?</li>
            <li>Which genres were more successful?</li>
        </ol>
</ol>


-----


## Data Cleaning

**Before answering the above questions we need a clean dataset which has columns and rows we need for calculations.**

First, let's clean up the columns.
We will keep only the columns we need and remove the rest.

Columns to delete -  `id, imdb_id, popularity, budget_adj, revenue_adj, homepage, keywords, overview, production_companies, vote_count and vote_average.`

**We have already cleaned the dataset for you**
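For readers who want to reproduce the cleaning step, the following is a minimal sketch of what it might look like. It is not the code that produced `movie_data_clean.csv`: the helper name `clean_movies`, the plain `profit` column name, and the choice to drop rows with a zero budget or revenue are all assumptions made for illustration, and the toy rows stand in for `data.csv`.

```python
import pandas as pd

# Columns the text above lists for removal
COLUMNS_TO_DROP = [
    'id', 'imdb_id', 'popularity', 'budget_adj', 'revenue_adj', 'homepage',
    'keywords', 'overview', 'production_companies', 'vote_count', 'vote_average',
]

def clean_movies(raw):
    """Hypothetical cleaning helper (an illustration, not the original code)."""
    # errors='ignore' skips any listed column that is absent from the frame
    df = raw.drop(columns=COLUMNS_TO_DROP, errors='ignore')
    # Assumption: a budget or revenue of 0 means "unknown", so drop those rows
    df = df[(df['budget'] > 0) & (df['revenue'] > 0)].copy()
    # Profit is revenue minus budget
    df['profit'] = df['revenue'] - df['budget']
    # Parse dates such as 6/9/2015 into datetime values
    df['release_date'] = pd.to_datetime(df['release_date'], format='%m/%d/%Y')
    return df.reset_index(drop=True)

# Toy rows standing in for data.csv
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'popularity': [1.0, 2.0, 3.0],
    'budget': [100, 0, 50],
    'revenue': [250, 40, 30],
    'original_title': ['A', 'B', 'C'],
    'release_date': ['6/9/2015', '5/13/2015', '3/18/2015'],
})
cleaned = clean_movies(toy)
print(cleaned)
```

Renaming the columns to the `..._(in_US-Dollars)` style used in the cleaned file is left out of the sketch.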

In [6]:
#importing all the necessary libraries we need for our analysis
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

#this variable will store the database of tmdb movies into a dataframe
movie_data = pd.read_csv('movie_data_clean.csv')
movie_data.head(3)

Unnamed: 0,budget_(in_US-Dollars),revenue_(in_US-Dollars),profit_(in_US_Dollars),original_title,cast,director,tagline,runtime,genres,release_date,release_year
0,150000000,1513528810,1363528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,124,Action|Adventure|Science Fiction|Thriller,2015-06-09,2015
1,150000000,378436354,228436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,George Miller,What a Lovely Day.,120,Action|Adventure|Science Fiction|Thriller,2015-05-13,2015
2,110000000,295238201,185238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,Robert Schwentke,One Choice Can Destroy You,119,Adventure|Science Fiction|Thriller,2015-03-18,2015


In [7]:
movie_data.shape

(3854, 11)

In [4]:
movie_data[['original_title','profit_(in_US_Dollars)']].set_index('original_title').shape

(3854, 1)

In [5]:
'''
a = set(movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title').min())
for i in range(len(movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title'))):
    if movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title')[movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title') == a]:
        print('hi')'''

In [6]:
#a = movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title')
#pd.value_counts(a['revenue_(in_US-Dollars)'].astype(int) == 2)

In [7]:

#for i in range(len(movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title'))):
 #   movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title').min()

In [8]:
#a = movie_data[['original_title','revenue_(in_US-Dollars)']].set_index('original_title').min().tolist()[0]
#movie_data[movie_data['revenue_(in_US-Dollars)'] == a]
#a = movie_data[['original_title','profit_(in_US_Dollars)']].set_index('original_title')
#c = a[a.values > 0].min().tolist()[0]
#movie_data[movie_data['profit_(in_US_Dollars)'] == c][['original_title','profit_(in_US_Dollars)']]

In [9]:
def maxmin(col):
    #largest value in the column, then every movie that attains it
    max_value = movie_data[col].max()
    print('MOVIE WITH MAX',col,'IS:\n')
    print(movie_data[movie_data[col] == max_value][['original_title',col]].set_index('original_title'),' \n-------------------  ')
    #smallest strictly positive value: zeros and negatives are ignored,
    #so for profit this is the smallest gain, not the largest loss
    min_value = movie_data.loc[movie_data[col] > 0, col].min()
    print('MOVIE WITH MIN',col,' IS:\n')
    print(movie_data[movie_data[col] == min_value][['original_title',col]])

**Now let's dig deep and answer the questions!**

### Q1. 1A Which movie earns the most and least profit?

In [10]:
maxmin('profit_(in_US_Dollars)')

MOVIE WITH MAX profit_(in_US_Dollars) IS:

                profit_(in_US_Dollars)
original_title                        
Avatar                      2544505847  
-------------------  
MOVIE WITH MIN profit_(in_US_Dollars)  IS:

     original_title  profit_(in_US_Dollars)
2028    Hross í oss                       1
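Because `maxmin` only looks at values greater than zero, the "MIN" result above is the smallest positive profit rather than the biggest loss. If the biggest loss is what is wanted, one possible approach is `idxmax` and `idxmin` on a title-indexed Series. The sketch below uses a hypothetical three-row frame with the same column names as `movie_data`; the values are made up.

```python
import pandas as pd

# Toy frame using the same column names as movie_data (values are invented)
toy = pd.DataFrame({
    'original_title': ['A', 'B', 'C'],
    'profit_(in_US_Dollars)': [500, -300, 1],
})

# Index by title so idxmax / idxmin return the movie title directly
profit = toy.set_index('original_title')['profit_(in_US_Dollars)']
most_profitable = profit.idxmax()
biggest_loss = profit.idxmin()

print(most_profitable, profit.max())
print(biggest_loss, profit.min())
```

Note that `idxmax` and `idxmin` return only the first matching label, so ties would need the boolean-mask approach that `maxmin` uses.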


### 1B Which movie had the greatest and least runtime?

In [11]:
maxmin('runtime')

MOVIE WITH MAX runtime IS:

                runtime
original_title         
Carlos              338  
-------------------  
MOVIE WITH MIN runtime  IS:

     original_title  runtime
1758    Kid's Story       15


### 1C Which movie had the greatest and least budget?

In [12]:
maxmin('budget_(in_US-Dollars)')

MOVIE WITH MAX budget_(in_US-Dollars) IS:

                   budget_(in_US-Dollars)
original_title                           
The Warrior's Way               425000000  
-------------------  
MOVIE WITH MIN budget_(in_US-Dollars)  IS:

               original_title  budget_(in_US-Dollars)
810              Lost & Found                       1
1251  Love, Wedding, Marriage                       1


### 1D Which movie had the greatest and least revenue?

In [13]:
maxmin('revenue_(in_US-Dollars)')

MOVIE WITH MAX revenue_(in_US-Dollars) IS:

                revenue_(in_US-Dollars)
original_title                         
Avatar                       2781505847  
-------------------  
MOVIE WITH MIN revenue_(in_US-Dollars)  IS:

       original_title  revenue_(in_US-Dollars)
1732  Shattered Glass                        2
2897         Mallrats                        2


### 1E What is the average runtime of all movies?

In [14]:
movie_data['runtime'].mean()

109.22029060716139

### 1F In which year did the most movies make a profit?

In [15]:
#movie_data['profit_(in_US_Dollars)'] = movie_data[movie_data['profit_(in_US_Dollars)'] > 0]

In [16]:
#movie_data.groupby('release_year')['profit_(in_US_Dollars)'].sum()#.filter(lambda x: x > 0)

In [17]:
#movie_data['profit_(in_US_Dollars)'] = movie_data[movie_data['profit_(in_US_Dollars)'] > 0]
#a = movie_data.groupby('release_year')['profit_(in_US_Dollars)']
#a[a['profit_(in_US_Dollars)'] > 0]

In [18]:
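#keep only the movies that made a profit
movie = movie_data[movie_data['profit_(in_US_Dollars)'] > 0]
#note: this counts releases per year in movie_data (all movies), not in the profitable subset `movie`
movie_data['release_year'].value_counts().head(1)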

2011    199
Name: release_year, dtype: int64
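The count above is taken over every movie in `movie_data`, so it reports the year with the most releases. To count only the movies that made a profit, the filter has to be applied before counting. A minimal sketch on invented toy data (the frame and its values are hypothetical) shows that the two answers can differ:

```python
import pandas as pd

# Toy data: 2011 has the most releases, but 2012 has the most profitable ones
toy = pd.DataFrame({
    'release_year': [2010, 2011, 2011, 2011, 2012, 2012],
    'profit_(in_US_Dollars)': [5, -1, -2, 3, 4, 6],
})

# Filter first, then count releases per year within the profitable subset
profitable = toy[toy['profit_(in_US_Dollars)'] > 0]
profitable_per_year = profitable['release_year'].value_counts()
best_year = profitable_per_year.idxmax()

print(profitable_per_year)
print('Year with the most profitable movies:', best_year)
```

On the real data the equivalent call would be `movie['release_year'].value_counts().head(1)`, using the `movie` frame defined in the cell above.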

In [19]:
movie_data.shape

(3854, 11)

In [20]:
movie.shape

(2778, 11)

### Q2. 2A Average runtime of movies

In [21]:
movie_data['runtime'].mean()

109.22029060716139

### 2B Average Budget of Movies

In [22]:
movie_data['budget_(in_US-Dollars)'].mean()

37203696.954852104

### 2C Average Revenue of Movies

In [23]:
movie_data['revenue_(in_US-Dollars)'].mean()

107686616.09807992

### 2D Average Profit of Movies

In [24]:
movie['profit_(in_US_Dollars)'].mean()

103244454.42584594
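Question 2 asks about the profitable movies, yet the runtime, budget, and revenue averages above are taken over all of `movie_data`, while only the profit average uses the profitable subset `movie`. One way to keep the four figures comparable is to compute them all from the same filtered frame. The sketch below does this on a hypothetical toy frame with invented values.

```python
import pandas as pd

# Toy frame with the same column names as movie_data (values are invented)
toy = pd.DataFrame({
    'runtime': [100, 120, 90],
    'budget_(in_US-Dollars)': [10, 20, 30],
    'revenue_(in_US-Dollars)': [30, 60, 15],
    'profit_(in_US_Dollars)': [20, 40, -15],
})

columns = [
    'runtime',
    'budget_(in_US-Dollars)',
    'revenue_(in_US-Dollars)',
    'profit_(in_US_Dollars)',
]

# Restrict to profitable movies once, then average every column together
profitable = toy[toy['profit_(in_US_Dollars)'] > 0]
averages = profitable[columns].mean()
print(averages)
```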

### 2E Which director directed the most films?

In [8]:
#count how often each value of the director column occurs and keep the top one
movie_data['director'].value_counts().head(1)

Steven Spielberg    27
Name: director, dtype: int64
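Some rows list several directors in one pipe-separated string (for example `Kyle Balda|Pierre Coffin`), and counting the raw column treats each such string as its own "director". If individual names should be counted instead, the strings can be split and exploded first. This is a sketch on hypothetical names, not a result from the dataset.

```python
import pandas as pd

# Toy column with invented names; one row is co-directed and one is missing
toy = pd.DataFrame({
    'director': ['Ann Lee', 'Ann Lee|Bo Kim', 'Bo Kim', 'Ann Lee', None],
})

director_counts = (
    toy['director']
    .dropna()           # ignore movies without a listed director
    .str.split('|')     # one list of names per movie
    .explode()          # one row per (movie, director) pair
    .value_counts()     # films per individual director, sorted descending
)
print(director_counts.head(1))
```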

### 2F Which cast member has appeared the most?

In [27]:
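#join every movie's pipe-separated cast into one string, then split it into individual names
a = movie_data['cast'].str.cat(sep='|').split('|')
b = pd.Series(a)
#value_counts already sorts in descending order, so head(1) is the most frequent name
b.value_counts().head(1)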

Robert De Niro    52
dtype: int64

### 2G Which genres were more successful?

In [40]:
movie_data.groupby('genres')['profit_(in_US_Dollars)'].sum()


genres
Action                                                 1108505513
Action|Adventure                                        161160439
Action|Adventure|Animation|Family|Fantasy               424987477
Action|Adventure|Animation|Family|Science Fiction       539442092
Action|Adventure|Animation|Science Fiction                1731128
Action|Adventure|Animation|Science Fiction|Thriller      18428063
Action|Adventure|Comedy                                 227434058
Action|Adventure|Comedy|Crime                           209167176
Action|Adventure|Comedy|Crime|Drama                     158701578
Action|Adventure|Comedy|Crime|Thriller                  365612056
Action|Adventure|Comedy|Drama                            15754284
Action|Adventure|Comedy|Drama|Family                     49141030
Action|Adventure|Comedy|Drama|Mystery                   -10730514
Action|Adventure|Comedy|Drama|Science Fiction            35479424
Action|Adventure|Comedy|Drama|Western                    98033791
Act
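Grouping on the raw `genres` column treats every combination such as `Action|Adventure|Comedy` as a separate genre, which is why the output above has one row per combination. To compare individual genres, one option is to split the string and explode it so that each movie contributes one row per genre. The sketch below uses an invented toy frame; the new `genre` column name is an assumption.

```python
import pandas as pd

# Toy frame with the same column names as movie_data (values are invented)
toy = pd.DataFrame({
    'genres': ['Action|Adventure', 'Action', 'Drama', 'Adventure|Drama'],
    'profit_(in_US_Dollars)': [100, 50, -20, 10],
})

# One row per (movie, genre) pair
per_genre = toy.assign(genre=toy['genres'].str.split('|')).explode('genre')

# Total profit per individual genre, highest first
profit_by_genre = (
    per_genre.groupby('genre')['profit_(in_US_Dollars)']
    .sum()
    .sort_values(ascending=False)
)
print(profit_by_genre)
```

With this layout a movie's profit is counted once under each of its genres, so the per-genre totals add up to more than the overall profit. Using `.mean()` instead of `.sum()` would compare genres by typical profit rather than by volume.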

In [31]:
movie_data

Unnamed: 0,budget_(in_US-Dollars),revenue_(in_US-Dollars),profit_(in_US_Dollars),original_title,cast,director,tagline,runtime,genres,release_date,release_year
0,150000000,1513528810,1363528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,Colin Trevorrow,The park is open.,124,Action|Adventure|Science Fiction|Thriller,2015-06-09,2015
1,150000000,378436354,228436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,George Miller,What a Lovely Day.,120,Action|Adventure|Science Fiction|Thriller,2015-05-13,2015
2,110000000,295238201,185238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,Robert Schwentke,One Choice Can Destroy You,119,Adventure|Science Fiction|Thriller,2015-03-18,2015
3,200000000,2068178225,1868178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,J.J. Abrams,Every generation has a story.,136,Action|Adventure|Science Fiction|Fantasy,2015-12-15,2015
4,190000000,1506249360,1316249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,James Wan,Vengeance Hits Home,137,Action|Crime|Thriller,2015-04-01,2015
5,135000000,532950503,397950503,The Revenant,Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...,Alejandro González Iñárritu,"(n. One who has returned, as if from the dead.)",156,Western|Drama|Adventure|Thriller,2015-12-25,2015
6,155000000,440603537,285603537,Terminator Genisys,Arnold Schwarzenegger|Jason Clarke|Emilia Clar...,Alan Taylor,Reset the future,125,Science Fiction|Action|Thriller|Adventure,2015-06-23,2015
7,108000000,595380321,487380321,The Martian,Matt Damon|Jessica Chastain|Kristen Wiig|Jeff ...,Ridley Scott,Bring Him Home,141,Drama|Adventure|Science Fiction,2015-09-30,2015
8,74000000,1156730962,1082730962,Minions,Sandra Bullock|Jon Hamm|Michael Keaton|Allison...,Kyle Balda|Pierre Coffin,"Before Gru, they had a history of bad bosses",91,Family|Animation|Adventure|Comedy,2015-06-17,2015
9,175000000,853708609,678708609,Inside Out,Amy Poehler|Phyllis Smith|Richard Kind|Bill Ha...,Pete Docter,Meet the little voices inside your head.,94,Comedy|Animation|Family,2015-06-09,2015


