## Data Science Questions

- Which movies have a Metascore greater than or equal to 95?
- Which movie has the longest runtime?
- Which movie has the highest Metascore?
- Which genre has the most movies?

## Step 1: Data Importation

In [40]:
import pandas as pd
from pandas import Series, DataFrame

In [41]:
data = pd.read_csv('IMDB-Movie-Data.csv')

In [42]:
data.head()

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diag... | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richar... | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling thea... | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth Ma... | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of th... | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola D... | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |


## Step 2: Data Cleaning

In [43]:
data.isnull().sum()

Rank                    0
Title                   0
Genre                   0
Description             0
Director                0
Actors                  0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64

In [44]:
data.dropna(how='any', inplace=True)
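
`dropna(how='any')` removes a row when any column is missing, so a movie that has a Metascore but no revenue figure is dropped as well. A minimal sketch of the difference between that and restricting the check with the `subset` parameter; the toy frame below is hypothetical and its values are not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy data: titles and values are illustrative only
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Revenue (Millions)': [10.0, None, 30.0, None],
    'Metascore': [70.0, 80.0, None, None],
})

# Drop a row if ANY column is missing (the approach used in this notebook)
strict = toy.dropna(how='any')

# Drop a row only if the column needed for the question is missing
metascore_only = toy.dropna(subset=['Metascore'])

print(len(strict))          # rows with no missing values at all
print(len(metascore_only))  # rows that have a Metascore
```

With the strict version, every answer further down applies only to rows that have both a revenue figure and a Metascore.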

In [45]:
data.describe()

| | Rank | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|
| count | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 |
| mean | 485.247017 | 2012.50716 | 114.638425 | 6.81432 | 193230.3 | 84.564558 | 59.575179 |
| std | 286.572065 | 3.17236 | 18.470922 | 0.877754 | 193099.0 | 104.520227 | 16.952416 |
| min | 1.0 | 2006.0 | 66.0 | 1.9 | 178.0 | 0.0 | 11.0 |
| 25% | 238.25 | 2010.0 | 101.0 | 6.3 | 61276.5 | 13.9675 | 47.0 |
| 50% | 475.5 | 2013.0 | 112.0 | 6.9 | 136879.5 | 48.15 | 60.0 |
| 75% | 729.75 | 2015.0 | 124.0 | 7.5 | 271083.0 | 116.8 | 72.0 |
| max | 1000.0 | 2016.0 | 187.0 | 9.0 | 1791916.0 | 936.63 | 100.0 |


In [46]:
data[['Description', 'Rank']]

| | Description | Rank |
|---|---|---|
| 0 | A group of intergalactic criminals are forced ... | 1 |
| 1 | Following clues to the origin of mankind, a te... | 2 |
| 2 | Three girls are kidnapped by a man with a diag... | 3 |
| 3 | In a city of humanoid animals, a hustling thea... | 4 |
| 4 | A secret government agency recruits some of th... | 5 |
| ... | ... | ... |
| 993 | While still out to destroy the evil Umbrella C... | 994 |
| 994 | 3 high school seniors throw a birthday party t... | 995 |
| 996 | Three American college students studying abroa... | 997 |
| 997 | Romantic sparks occur between two dance studen... | 998 |


In [47]:
data.columns

Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')

## Step 3: Data Analysis

In [48]:
data.isnull().sum()

Rank                  0
Title                 0
Genre                 0
Description           0
Director              0
Actors                0
Year                  0
Runtime (Minutes)     0
Rating                0
Votes                 0
Revenue (Millions)    0
Metascore             0
dtype: int64

## Question 1: Which movies have a Metascore greater than or equal to 95?

In [73]:
n = 95
small_df = data[['Title', 'Metascore']]
result = small_df[small_df['Metascore'] >= n]
print(result)

                     Title  Metascore
21   Manchester by the Sea       96.0
41               Moonlight       99.0
111       12 Years a Slave       96.0
230        Pan's Labyrinth       98.0
324     The Social Network       95.0
406       Zero Dark Thirty       95.0
489            Ratatouille       96.0
501                  Carol       95.0
509                Gravity       96.0
656                Boyhood      100.0


Ten movies in the cleaned data have a Metascore of 95 or higher. Boyhood scores highest at 100, followed by Moonlight at 99 and Pan's Labyrinth at 98.
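
The same filter can be written in one step with `.loc` and then sorted so the highest scores come first. A small sketch, assuming a frame with the same `Title` and `Metascore` columns; the five rows are copied from outputs shown in this notebook, and `sample`, `threshold`, and `top` are illustrative names:

```python
import pandas as pd

# A few rows copied from the outputs shown in this notebook
sample = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Moonlight', 'Boyhood', 'Carol', 'Split'],
    'Metascore': [76.0, 99.0, 100.0, 95.0, 62.0],
})

threshold = 95

# Select rows at or above the threshold, keep two columns, sort descending
top = (
    sample.loc[sample['Metascore'] >= threshold, ['Title', 'Metascore']]
    .sort_values('Metascore', ascending=False)
)
print(top)
```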

## Question 2: Which movie has the longest runtime?

In [86]:
small_df = data[['Title', 'Runtime (Minutes)']]

result = small_df.sort_values('Runtime (Minutes)', ascending=False)
print(result)

                                                 Title  Runtime (Minutes)
88                                   The Hateful Eight                187
82                             The Wolf of Wall Street                180
311                                     La vie d'Adèle                180
267                                        Cloud Atlas                172
430                                           3 Idiots                170
..                                                 ...                ...
258                                         Lights Out                 81
862  Alexander and the Terrible, Horrible, No Good,...                 81
949                                              Kicks                 80
711                                    La tortue rouge                 80
793                                Ma vie de Courgette                 66

[838 rows x 2 columns]


The Hateful Eight has the longest runtime in the cleaned data, at 187 minutes.
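
Sorting every row works, but when only the single longest movie is needed, `idxmax` returns the index label of the maximum directly. A sketch on four rows copied from the output above, with `sample` and `longest` as illustrative names:

```python
import pandas as pd

# Rows and index labels copied from the sorted output above
sample = pd.DataFrame(
    {
        'Title': ['The Hateful Eight', 'The Wolf of Wall Street', 'Cloud Atlas', 'Kicks'],
        'Runtime (Minutes)': [187, 180, 172, 80],
    },
    index=[88, 82, 267, 949],
)

# idxmax gives the index label of the largest runtime; .loc fetches that row
longest = sample.loc[sample['Runtime (Minutes)'].idxmax()]
print(longest['Title'], longest['Runtime (Minutes)'])
```

Note that `idxmax` returns only the first match when several movies tie for the maximum; `nlargest(1, 'Runtime (Minutes)', keep='all')` would keep all tied rows.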

## Question 3: Which movie has the highest Metascore?

In [53]:
data.nlargest(5, 'Metascore')

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 656 | 657 | Boyhood | Drama | The life of Mason, from early childhood to his... | Richard Linklater | Ellar Coltrane, Patricia Arquette, Ethan Hawke... | 2014 | 165 | 7.9 | 286722 | 25.36 | 100.0 |
| 41 | 42 | Moonlight | Drama | A chronicle of the childhood, adolescence and ... | Barry Jenkins | Mahershala Ali, Shariff Earp, Duan Sanderson, ... | 2016 | 111 | 7.5 | 135095 | 27.85 | 99.0 |
| 230 | 231 | Pan's Labyrinth | Drama,Fantasy,War | In the falangist Spain of 1944, the bookish yo... | Guillermo del Toro | Ivana Baquero, Ariadna Gil, Sergi López,Maribe... | 2006 | 118 | 8.2 | 498879 | 37.62 | 98.0 |
| 21 | 22 | Manchester by the Sea | Drama | A depressed uncle is asked to take care of his... | Kenneth Lonergan | Casey Affleck, Michelle Williams, Kyle Chandle... | 2016 | 137 | 7.9 | 134213 | 47.7 | 96.0 |
| 111 | 112 | 12 Years a Slave | Biography,Drama,History | In the antebellum United States, Solomon North... | Steve McQueen | Chiwetel Ejiofor, Michael Kenneth Williams, Mi... | 2013 | 134 | 8.1 | 486338 | 56.67 | 96.0 |


Boyhood has the highest Metascore in the cleaned data, with a perfect 100.
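
The question of which genre has the most movies is not worked through in the cells above. One possible approach, assuming each comma-separated entry in `Genre` counts as a separate genre for that movie, is to split the column, explode it to one genre per row, and count the values. The sketch uses only the five `Genre` values shown in the `data.head()` output, so its counts describe those five rows and not the full dataset:

```python
import pandas as pd

# Titles and genres copied from the data.head() output
sample = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Prometheus', 'Split', 'Sing', 'Suicide Squad'],
    'Genre': [
        'Action,Adventure,Sci-Fi',
        'Adventure,Mystery,Sci-Fi',
        'Horror,Thriller',
        'Animation,Comedy,Family',
        'Action,Adventure,Fantasy',
    ],
})

# Split each string into a list, give each genre its own row, then count
genre_counts = sample['Genre'].str.split(',').explode().value_counts()
print(genre_counts)
```

Applying the same chain to `data['Genre']` would give the counts for the cleaned dataset, and `genre_counts.idxmax()` would name the most frequent genre.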