## Data Analysis with Pandas on IMDB Movie Dataset

Objective: To find some interesting insights into a few movies released between 1916 and 2016, using Python pandas.

In [178]:
# Import the numpy and pandas packages

import numpy as np
import pandas as pd

### Task 1: Reading and Inspection

**Subtask 1.1: Import and read**

Import and read the movie database. Store it in a variable called `movies`.

In [179]:
# Import csv dataset
movies = pd.read_csv('Movies.csv')
movies

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3848,Color,Shane Carruth,143.0,77.0,291.0,8.0,David Sullivan,291.0,424760.0,Drama|Sci-Fi|Thriller,...,371.0,English,USA,PG-13,7000.0,2004.0,45.0,7.0,1.85,19000
3849,Color,Neill Dela Llana,35.0,80.0,0.0,0.0,Edgar Tancangco,0.0,70071.0,Thriller,...,35.0,English,Philippines,Not Rated,7000.0,2005.0,0.0,6.3,,74
3850,Color,Robert Rodriguez,56.0,81.0,0.0,6.0,Peter Marquardt,121.0,2040920.0,Action|Crime|Drama|Romance|Thriller,...,130.0,Spanish,USA,R,7000.0,1992.0,20.0,6.9,1.37,0
3851,Color,Edward Burns,14.0,95.0,0.0,133.0,Caitlin FitzGerald,296.0,4584.0,Comedy|Drama,...,14.0,English,USA,Not Rated,9000.0,2011.0,205.0,6.4,,413


**Subtask 1.2: Inspect the dataframe**

Inspect the dataframe's columns, shape, variable types, etc.

In [180]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3853 entries, 0 to 3852
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      3851 non-null   object 
 1   director_name              3853 non-null   object 
 2   num_critic_for_reviews     3852 non-null   float64
 3   duration                   3852 non-null   float64
 4   director_facebook_likes    3853 non-null   float64
 5   actor_3_facebook_likes     3847 non-null   float64
 6   actor_2_name               3852 non-null   object 
 7   actor_1_facebook_likes     3853 non-null   float64
 8   gross                      3853 non-null   float64
 9   genres                     3853 non-null   object 
 10  actor_1_name               3853 non-null   object 
 11  movie_title                3853 non-null   object 
 12  num_voted_users            3853 non-null   int64  
 13  cast_total_facebook_likes  3853 non-null   int64

In [181]:
movies.describe()

Unnamed: 0,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_1_facebook_likes,gross,num_voted_users,cast_total_facebook_likes,facenumber_in_poster,num_user_for_reviews,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
count,3852.0,3852.0,3853.0,3847.0,3853.0,3853.0,3853.0,3853.0,3847.0,3853.0,3853.0,3853.0,3852.0,3853.0,3781.0,3853.0
mean,163.046989,109.92108,784.507397,747.290096,7575.558526,50965470.0,102424.0,11239.480145,1.378217,326.720478,45248170.0,2003.061511,1959.018432,6.462886,2.109569,9089.546068
std,123.927258,22.741728,3027.395958,1841.623945,15405.383114,69320330.0,150294.2,18922.948757,2.056071,407.920904,223420800.0,10.007168,4472.17129,1.053843,0.353236,21277.375223
min,1.0,34.0,0.0,0.0,0.0,162.0,5.0,0.0,0.0,1.0,218.0,1920.0,0.0,1.6,1.18,0.0
25%,72.0,95.0,10.0,183.0,721.0,6830957.0,17309.0,1817.0,0.0,102.0,10000000.0,1999.0,362.0,5.9,1.85,0.0
50%,134.0,106.0,58.0,427.0,1000.0,27900000.0,50523.0,3876.0,1.0,203.0,24000000.0,2005.0,664.0,6.6,2.35,206.0
75%,221.0,120.0,222.0,685.0,12000.0,65500000.0,124185.0,15972.0,2.0,391.0,50000000.0,2010.0,971.0,7.2,2.35,11000.0
max,813.0,330.0,23000.0,23000.0,640000.0,760505800.0,1689764.0,656730.0,43.0,5060.0,12215500000.0,2016.0,137000.0,9.3,16.0,349000.0


In [182]:
movies.shape

(3853, 28)

In [183]:
movies.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [184]:
movies.head(10)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
5,Color,Sam Raimi,392.0,156.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
6,Color,Nathan Greno,324.0,100.0,15.0,284.0,Donna Murphy,799.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,...,387.0,English,USA,PG,260000000.0,2010.0,553.0,7.8,1.85,29000
7,Color,Joss Whedon,635.0,141.0,0.0,19000.0,Robert Downey Jr.,26000.0,458991599.0,Action|Adventure|Sci-Fi,...,1117.0,English,USA,PG-13,250000000.0,2015.0,21000.0,7.5,2.35,118000
8,Color,David Yates,375.0,153.0,282.0,10000.0,Daniel Radcliffe,25000.0,301956980.0,Adventure|Family|Fantasy|Mystery,...,973.0,English,UK,PG,250000000.0,2009.0,11000.0,7.5,2.35,10000
9,Color,Zack Snyder,673.0,183.0,0.0,2000.0,Lauren Cohan,15000.0,330249062.0,Action|Adventure|Sci-Fi,...,3018.0,English,USA,PG-13,250000000.0,2016.0,4000.0,6.9,2.35,197000


In [185]:
movies.tail(10)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
3843,Color,Daryl Wein,22.0,88.0,38.0,211.0,Heather Burns,331.0,76382.0,Romance,...,8.0,English,USA,,15000.0,2009.0,212.0,6.2,2.35,324
3844,Color,John Waters,73.0,108.0,0.0,105.0,Mink Stole,462.0,180483.0,Comedy|Crime|Horror,...,183.0,English,USA,NC-17,10000.0,1972.0,143.0,6.1,1.37,0
3845,Color,Olivier Assayas,81.0,110.0,107.0,45.0,Béatrice Dalle,576.0,136007.0,Drama|Music|Romance,...,39.0,French,France,R,4500.0,2004.0,133.0,6.9,2.35,171
3846,Color,Jafar Panahi,64.0,90.0,397.0,0.0,Nargess Mamizadeh,5.0,673780.0,Drama,...,26.0,Persian,Iran,Not Rated,10000.0,2000.0,0.0,7.5,1.85,697
3847,Color,Kiyoshi Kurosawa,78.0,111.0,62.0,6.0,Anna Nakagawa,89.0,94596.0,Crime|Horror|Mystery|Thriller,...,50.0,Japanese,Japan,,1000000.0,1997.0,13.0,7.4,1.85,817
3848,Color,Shane Carruth,143.0,77.0,291.0,8.0,David Sullivan,291.0,424760.0,Drama|Sci-Fi|Thriller,...,371.0,English,USA,PG-13,7000.0,2004.0,45.0,7.0,1.85,19000
3849,Color,Neill Dela Llana,35.0,80.0,0.0,0.0,Edgar Tancangco,0.0,70071.0,Thriller,...,35.0,English,Philippines,Not Rated,7000.0,2005.0,0.0,6.3,,74
3850,Color,Robert Rodriguez,56.0,81.0,0.0,6.0,Peter Marquardt,121.0,2040920.0,Action|Crime|Drama|Romance|Thriller,...,130.0,Spanish,USA,R,7000.0,1992.0,20.0,6.9,1.37,0
3851,Color,Edward Burns,14.0,95.0,0.0,133.0,Caitlin FitzGerald,296.0,4584.0,Comedy|Drama,...,14.0,English,USA,Not Rated,9000.0,2011.0,205.0,6.4,,413
3852,Color,Jon Gunn,43.0,90.0,16.0,16.0,Brian Herzlinger,86.0,85222.0,Documentary,...,84.0,English,USA,PG,1100.0,2004.0,23.0,6.6,1.85,456


How many rows and columns are present in the dataframe?

In [186]:
movies.shape

(3853, 28)

3853 rows, 28 columns

How many columns have null values present in them?

In [187]:
movies.isnull().any()

color                         True
director_name                False
num_critic_for_reviews        True
duration                      True
director_facebook_likes      False
actor_3_facebook_likes        True
actor_2_name                  True
actor_1_facebook_likes       False
gross                        False
genres                       False
actor_1_name                 False
movie_title                  False
num_voted_users              False
cast_total_facebook_likes    False
actor_3_name                  True
facenumber_in_poster          True
plot_keywords                 True
movie_imdb_link              False
num_user_for_reviews         False
language                      True
country                      False
content_rating                True
budget                       False
title_year                   False
actor_2_facebook_likes        True
imdb_score                   False
aspect_ratio                  True
movie_facebook_likes         False
dtype: bool

In [188]:
columns_with_null = movies.columns[movies.isnull().any()]
print(columns_with_null)

Index(['color', 'num_critic_for_reviews', 'duration', 'actor_3_facebook_likes',
       'actor_2_name', 'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'language', 'content_rating', 'actor_2_facebook_likes', 'aspect_ratio'],
      dtype='object')


In [189]:
len(columns_with_null)

12

There are 12 columns with null values.
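
The same count can be obtained in a single expression. This is a minimal sketch on a toy frame (the `toy` dataframe and its values are invented for illustration, not taken from `Movies.csv`): `isnull().any()` gives one boolean per column, and summing the booleans counts the columns that contain at least one null.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `movies`; the values are made up for illustration
toy = pd.DataFrame({
    "color": ["Color", None, "Color"],
    "duration": [178.0, np.nan, 148.0],
    "gross": [760.5, 309.4, 200.1],
})

# One boolean per column, then count the True values
n_null_columns = int(toy.isnull().any().sum())
print(n_null_columns)
```

Applied to `movies`, the equivalent expression would be `movies.isnull().any().sum()`.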

### Task 2: Cleaning the Data

**Subtask 2.1: Drop unnecessary columns**

For this project, I will mostly analyze the movies with respect to ratings, gross collection, and popularity. Several columns in this dataframe are not needed for that, so I will drop the following:
-  color
-  director_facebook_likes
-  actor_1_facebook_likes
-  actor_2_facebook_likes
-  actor_3_facebook_likes
-  actor_2_name
-  cast_total_facebook_likes
-  actor_3_name
-  duration
-  facenumber_in_poster
-  content_rating
-  country
-  movie_imdb_link
-  aspect_ratio
-  plot_keywords
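
With a list this long, a misspelled name is easy to miss. The sketch below is one possible safeguard, shown on a toy frame with invented values: it reports any requested names that are absent before dropping, and passes `errors="ignore"` so the drop itself does not raise. The names `toy`, `missing` and `trimmed` are hypothetical.

```python
import pandas as pd

# Toy stand-in for `movies`; the values are made up for illustration
toy = pd.DataFrame({
    "color": ["Color"],
    "duration": [178.0],
    "gross": [760.5],
    "imdb_score": [7.9],
})

# "aspect_ratio" is deliberately absent from the toy frame
drop_columns = ["color", "duration", "aspect_ratio"]

# Names requested for dropping that the frame does not contain
missing = [c for c in drop_columns if c not in toy.columns]
print("Not found:", missing)

# errors="ignore" skips absent labels instead of raising KeyError
trimmed = toy.drop(columns=drop_columns, errors="ignore")
print(list(trimmed.columns))
```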

In [190]:
movies.head()

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000


In [191]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3853 entries, 0 to 3852
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      3851 non-null   object 
 1   director_name              3853 non-null   object 
 2   num_critic_for_reviews     3852 non-null   float64
 3   duration                   3852 non-null   float64
 4   director_facebook_likes    3853 non-null   float64
 5   actor_3_facebook_likes     3847 non-null   float64
 6   actor_2_name               3852 non-null   object 
 7   actor_1_facebook_likes     3853 non-null   float64
 8   gross                      3853 non-null   float64
 9   genres                     3853 non-null   object 
 10  actor_1_name               3853 non-null   object 
 11  movie_title                3853 non-null   object 
 12  num_voted_users            3853 non-null   int64  
 13  cast_total_facebook_likes  3853 non-null   int64

In [192]:
# Drop the columns that are not necessary
drop_columns = ['color', 'director_facebook_likes','actor_1_facebook_likes', 'actor_2_facebook_likes', 
                'actor_3_facebook_likes', 'actor_2_name', 'cast_total_facebook_likes', 'actor_3_name',
                'duration', 'facenumber_in_poster', 'content_rating', 'country', 'movie_imdb_link',
                'aspect_ratio', 'plot_keywords']
new_movies = movies.drop(columns = drop_columns)
new_movies.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes
0,James Cameron,723.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237000000.0,2009.0,7.9,33000
1,Gore Verbinski,302.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,Pirates of the Caribbean: At World's End,471220,1238.0,English,300000000.0,2007.0,7.1,0
2,Sam Mendes,602.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,Spectre,275868,994.0,English,245000000.0,2015.0,6.8,85000
3,Christopher Nolan,813.0,448130642.0,Action|Thriller,Tom Hardy,The Dark Knight Rises,1144337,2701.0,English,250000000.0,2012.0,8.5,164000
4,Andrew Stanton,462.0,73058679.0,Action|Adventure|Sci-Fi,Daryl Sabara,John Carter,212204,738.0,English,263700000.0,2012.0,6.6,24000


In [193]:
new_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3853 entries, 0 to 3852
Data columns (total 13 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   director_name           3853 non-null   object 
 1   num_critic_for_reviews  3852 non-null   float64
 2   gross                   3853 non-null   float64
 3   genres                  3853 non-null   object 
 4   actor_1_name            3853 non-null   object 
 5   movie_title             3853 non-null   object 
 6   num_voted_users         3853 non-null   int64  
 7   num_user_for_reviews    3853 non-null   float64
 8   language                3849 non-null   object 
 9   budget                  3853 non-null   float64
 10  title_year              3853 non-null   float64
 11  imdb_score              3853 non-null   float64
 12  movie_facebook_likes    3853 non-null   int64  
dtypes: float64(6), int64(2), object(5)
memory usage: 391.4+ KB


What is the count of columns in the new dataframe?

In [194]:
new_movies.shape

(3853, 13)

New dataframe has 13 columns.

**Subtask 2.2: Inspect Null values**

Several columns of the dataframe contain null values. Find the percentage of null values in each affected column of `new_movies`.
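
One compact way to get these percentages is to take the mean of the boolean null mask, since the mean of `True`/`False` values is the fraction of `True`. The sketch below uses a toy frame with invented values; `toy` and `null_pct` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `new_movies`; the values are made up for illustration
toy = pd.DataFrame({
    "num_critic_for_reviews": [723.0, np.nan, 602.0, 813.0],
    "language": ["English", "English", None, None],
    "gross": [1.0, 2.0, 3.0, 4.0],
})

# Fraction of nulls per column, expressed as a percentage
null_pct = (toy.isnull().mean() * 100).round(2)

# Keep only columns that actually contain nulls, largest first
null_pct = null_pct[null_pct > 0].sort_values(ascending=False)
print(null_pct)
```

This avoids writing the row count by hand, so the result stays correct if rows are added or removed.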

In [195]:
movies_null = new_movies[new_movies.columns[new_movies.isnull().any()]]
movies_null

Unnamed: 0,num_critic_for_reviews,language
0,723.0,English
1,302.0,English
2,602.0,English
3,813.0,English
4,462.0,English
...,...,...
3848,143.0,English
3849,35.0,English
3850,56.0,Spanish
3851,14.0,English


In [196]:
movies_null.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3853 entries, 0 to 3852
Data columns (total 2 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   num_critic_for_reviews  3852 non-null   float64
 1   language                3849 non-null   object 
dtypes: float64(1), object(1)
memory usage: 60.3+ KB


In [197]:
# percentage of null values per column (row count taken from the dataframe rather than hard-coded)
null_percentage = round((movies_null.isnull().sum() / len(movies_null))*100,2) 
null_percentage

num_critic_for_reviews    0.03
language                  0.10
dtype: float64

In [198]:
percentage_null_df = pd.DataFrame(null_percentage, columns=["Percentage of Null Values (%)"])
percentage_null_df.sort_values("Percentage of Null Values (%)")

Unnamed: 0,Percentage of Null Values (%)
num_critic_for_reviews,0.03
language,0.1


Which column has the highest percentage of null values?

In [199]:
percentage_null_df['Percentage of Null Values (%)'].max()

0.1

Language has the highest percentage of null values.
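
Note that `.max()` returns only the largest value, not the column it belongs to. A small sketch, using the two percentages from the output above, shows how `idxmax()` returns the label directly; the names `percentage_null` and `worst_column` are illustrative.

```python
import pandas as pd

# Percentages copied from the output above
percentage_null = pd.Series(
    {"num_critic_for_reviews": 0.03, "language": 0.10},
    name="Percentage of Null Values (%)",
)

# idxmax() gives the index label of the largest value
worst_column = percentage_null.idxmax()
print(worst_column, percentage_null.max())
```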

In [200]:
new_movies['language'].groupby(new_movies['language']).count().sort_values(ascending=False)

language
English       3671
French          37
Spanish         26
Mandarin        14
German          13
Japanese        12
Hindi           10
Cantonese        8
Italian          7
Portuguese       5
Korean           5
Norwegian        4
Dutch            3
Persian          3
Danish           3
Thai             3
Aboriginal       2
Indonesian       2
Dari             2
Hebrew           2
Czech            1
Vietnamese       1
Aramaic          1
Telugu           1
Swedish          1
Bosnian          1
Russian          1
Romanian         1
Arabic           1
Icelandic        1
Dzongkha         1
Mongolian        1
Maya             1
Filipino         1
Kazakh           1
Hungarian        1
Zulu             1
Name: language, dtype: int64

**Subtask 2.3: Fill NaN values**

The `language` column has some NaN values. Since English is by far the most common language in this dataset (3,671 of the 3,849 non-null entries), I will replace the null values with `'English'`.
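
Instead of typing the fill value by hand, it could also be derived from the data. The following is a sketch under the assumption that filling with the most frequent value is acceptable; it runs on a toy frame with invented values, and `toy` and `most_common` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the `language` column; the values are made up
toy = pd.DataFrame({"language": ["English", "French", None, "English", None]})

# mode() skips NaN by default and returns a Series; take its first entry
most_common = toy["language"].mode()[0]

# Fill the missing entries with the most frequent language
toy["language"] = toy["language"].fillna(most_common)
print(most_common, toy["language"].isnull().sum())
```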

In [201]:
# replace the NaN values in the 'language' column with English

new_movies['language'] = new_movies['language'].fillna("English")
new_movies.isnull().any()

director_name             False
num_critic_for_reviews     True
gross                     False
genres                    False
actor_1_name              False
movie_title               False
num_voted_users           False
num_user_for_reviews      False
language                  False
budget                    False
title_year                False
imdb_score                False
movie_facebook_likes      False
dtype: bool

How many movies are in English after replacing the NaN values with English?

In [202]:
new_movies.shape

(3853, 13)

In [203]:
print("Count of movies in English: ",new_movies[new_movies['language'] == 'English'].shape[0])

Count of movies in English:  3675


### Task 3: Data Analysis

**Subtask 3.1: Change the unit of columns**

Convert the unit of the `budget` and `gross` columns from `$` to `million $`.
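
The conversion can be sketched on a toy frame as below, scaling both columns in one step. The two rows reuse the `budget` and `gross` values of the first and one of the last rows shown earlier; `toy` and `money_columns` are illustrative names.

```python
import pandas as pd

# Two rows of budget/gross values (in dollars) taken from the tables above
toy = pd.DataFrame({
    "budget": [237000000.0, 7000.0],
    "gross": [760505847.0, 424760.0],
})

# Scale both money columns from dollars to millions of dollars at once
money_columns = ["budget", "gross"]
toy[money_columns] = toy[money_columns] / 1_000_000
print(toy)
```

Because the assignment overwrites the columns in place, running such a cell twice divides by a million twice, so it is worth re-running the notebook from the top if the cell is executed again.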

In [204]:
new_movies.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes
0,James Cameron,723.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237000000.0,2009.0,7.9,33000
1,Gore Verbinski,302.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,Pirates of the Caribbean: At World's End,471220,1238.0,English,300000000.0,2007.0,7.1,0
2,Sam Mendes,602.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,Spectre,275868,994.0,English,245000000.0,2015.0,6.8,85000
3,Christopher Nolan,813.0,448130642.0,Action|Thriller,Tom Hardy,The Dark Knight Rises,1144337,2701.0,English,250000000.0,2012.0,8.5,164000
4,Andrew Stanton,462.0,73058679.0,Action|Adventure|Sci-Fi,Daryl Sabara,John Carter,212204,738.0,English,263700000.0,2012.0,6.6,24000


In [205]:
# divide budget and gross by 1 million
new_movies['budget'] = new_movies['budget'] / 1000000  
new_movies['gross'] = new_movies['gross'] / 1000000
new_movies.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes
0,James Cameron,723.0,760.505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237.0,2009.0,7.9,33000
1,Gore Verbinski,302.0,309.404152,Action|Adventure|Fantasy,Johnny Depp,Pirates of the Caribbean: At World's End,471220,1238.0,English,300.0,2007.0,7.1,0
2,Sam Mendes,602.0,200.074175,Action|Adventure|Thriller,Christoph Waltz,Spectre,275868,994.0,English,245.0,2015.0,6.8,85000
3,Christopher Nolan,813.0,448.130642,Action|Thriller,Tom Hardy,The Dark Knight Rises,1144337,2701.0,English,250.0,2012.0,8.5,164000
4,Andrew Stanton,462.0,73.058679,Action|Adventure|Sci-Fi,Daryl Sabara,John Carter,212204,738.0,English,263.7,2012.0,6.6,24000


**Subtask 3.2: Find the movies with highest profit**

   1. Create a new column called `profit` which contains the difference of the two columns: `gross` and `budget`.
   2. Sort the dataframe using the `profit` column as reference. 
   3. Extract the top ten profiting movies in descending order and store them in a new dataframe - `top10_movies`
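
The three steps above can be sketched on a toy frame as follows. As an alternative to sorting and then taking the head, the sketch uses `DataFrame.nlargest`, which returns the top rows already in descending order. The titles and figures are invented, and `toy` and `top2` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for `new_movies`; titles and figures are made up
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "gross": [760.5, 200.1, 73.1, 460.9],
    "budget": [237.0, 245.0, 263.7, 11.0],
})

# Step 1: profit is gross minus budget
toy["profit"] = toy["gross"] - toy["budget"]

# Steps 2 and 3: the n rows with the largest profit, in descending order
top2 = toy.nlargest(2, "profit")
print(top2[["movie_title", "profit"]])
```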

In [206]:
# assign new column profit
new_movies['profit'] = new_movies['gross'] - new_movies['budget']
new_movies.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
0,James Cameron,723.0,760.505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237.0,2009.0,7.9,33000,523.505847
1,Gore Verbinski,302.0,309.404152,Action|Adventure|Fantasy,Johnny Depp,Pirates of the Caribbean: At World's End,471220,1238.0,English,300.0,2007.0,7.1,0,9.404152
2,Sam Mendes,602.0,200.074175,Action|Adventure|Thriller,Christoph Waltz,Spectre,275868,994.0,English,245.0,2015.0,6.8,85000,-44.925825
3,Christopher Nolan,813.0,448.130642,Action|Thriller,Tom Hardy,The Dark Knight Rises,1144337,2701.0,English,250.0,2012.0,8.5,164000,198.130642
4,Andrew Stanton,462.0,73.058679,Action|Adventure|Sci-Fi,Daryl Sabara,John Carter,212204,738.0,English,263.7,2012.0,6.6,24000,-190.641321


In [207]:
# sort
new_movies.sort_values('profit', ascending=False).head(10)

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
0,James Cameron,723.0,760.505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237.0,2009.0,7.9,33000,523.505847
28,Colin Trevorrow,644.0,652.177271,Action|Adventure|Sci-Fi|Thriller,Bryce Dallas Howard,Jurassic World,418214,1290.0,English,150.0,2015.0,7.0,150000,502.177271
25,James Cameron,315.0,658.672302,Drama|Romance,Leonardo DiCaprio,Titanic,793059,2528.0,English,200.0,1997.0,7.7,26000,458.672302
2704,George Lucas,282.0,460.935665,Action|Adventure|Fantasy|Sci-Fi,Harrison Ford,Star Wars: Episode IV - A New Hope,911097,1470.0,English,11.0,1977.0,8.7,33000,449.935665
2748,Steven Spielberg,215.0,434.949459,Family|Sci-Fi,Henry Thomas,E.T. the Extra-Terrestrial,281842,515.0,English,10.5,1982.0,7.9,34000,424.449459
16,Joss Whedon,703.0,623.279547,Action|Adventure|Sci-Fi,Chris Hemsworth,The Avengers,995415,1722.0,English,220.0,2012.0,8.1,123000,403.279547
482,Roger Allers,186.0,422.783777,Adventure|Animation|Drama|Family|Musical,Matthew Broderick,The Lion King,644348,656.0,English,45.0,1994.0,8.5,17000,377.783777
230,George Lucas,320.0,474.544677,Action|Adventure|Fantasy|Sci-Fi,Natalie Portman,Star Wars: Episode I - The Phantom Menace,534658,3597.0,English,115.0,1999.0,6.5,13000,359.544677
64,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
419,Gary Ross,673.0,407.999255,Adventure|Drama|Sci-Fi|Thriller,Jennifer Lawrence,The Hunger Games,701607,1959.0,English,78.0,2012.0,7.3,140000,329.999255


In [208]:
# get the top 10 profiting movies

top10_movies = new_movies.sort_values('profit', ascending=False).head(10)
top10_movies

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
0,James Cameron,723.0,760.505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237.0,2009.0,7.9,33000,523.505847
28,Colin Trevorrow,644.0,652.177271,Action|Adventure|Sci-Fi|Thriller,Bryce Dallas Howard,Jurassic World,418214,1290.0,English,150.0,2015.0,7.0,150000,502.177271
25,James Cameron,315.0,658.672302,Drama|Romance,Leonardo DiCaprio,Titanic,793059,2528.0,English,200.0,1997.0,7.7,26000,458.672302
2704,George Lucas,282.0,460.935665,Action|Adventure|Fantasy|Sci-Fi,Harrison Ford,Star Wars: Episode IV - A New Hope,911097,1470.0,English,11.0,1977.0,8.7,33000,449.935665
2748,Steven Spielberg,215.0,434.949459,Family|Sci-Fi,Henry Thomas,E.T. the Extra-Terrestrial,281842,515.0,English,10.5,1982.0,7.9,34000,424.449459
16,Joss Whedon,703.0,623.279547,Action|Adventure|Sci-Fi,Chris Hemsworth,The Avengers,995415,1722.0,English,220.0,2012.0,8.1,123000,403.279547
482,Roger Allers,186.0,422.783777,Adventure|Animation|Drama|Family|Musical,Matthew Broderick,The Lion King,644348,656.0,English,45.0,1994.0,8.5,17000,377.783777
230,George Lucas,320.0,474.544677,Action|Adventure|Fantasy|Sci-Fi,Natalie Portman,Star Wars: Episode I - The Phantom Menace,534658,3597.0,English,115.0,1999.0,6.5,13000,359.544677
64,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
419,Gary Ross,673.0,407.999255,Adventure|Drama|Sci-Fi|Thriller,Jennifer Lawrence,The Hunger Games,701607,1959.0,English,78.0,2012.0,7.3,140000,329.999255


- Six of the top 10 most profitable films belong to the Action genre.
- The movie with the highest profit is Avatar, directed by James Cameron.
- Two of the top 10 most profitable movies were directed by James Cameron.
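
The Action count can be checked programmatically rather than by eye. In this sketch the genre strings are copied from the top-10 table above; splitting on `|` and testing membership avoids accidental substring matches. The names `top10_genres` and `n_action` are illustrative.

```python
import pandas as pd

# Genre strings of the top-10 films, copied from the table above
top10_genres = pd.Series([
    "Action|Adventure|Fantasy|Sci-Fi",
    "Action|Adventure|Sci-Fi|Thriller",
    "Drama|Romance",
    "Action|Adventure|Fantasy|Sci-Fi",
    "Family|Sci-Fi",
    "Action|Adventure|Sci-Fi",
    "Adventure|Animation|Drama|Family|Musical",
    "Action|Adventure|Fantasy|Sci-Fi",
    "Action|Crime|Drama|Thriller",
    "Adventure|Drama|Sci-Fi|Thriller",
])

# Split each string into a list of genres and test for exact membership
is_action = top10_genres.str.split("|").apply(lambda genres: "Action" in genres)
n_action = int(is_action.sum())
print(n_action)
```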

Which movie is ranked 5th from the top in the list obtained?

In [209]:
top10_movies[4:5]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
2748,Steven Spielberg,215.0,434.949459,Family|Sci-Fi,Henry Thomas,E.T. the Extra-Terrestrial,281842,515.0,English,10.5,1982.0,7.9,34000,424.449459


Movie ranked 5th from the top is E.T. the Extra-Terrestrial.
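
Because the sorted dataframe keeps its original index labels, position and label differ here. A short sketch, with the first five titles and index labels copied from the table above, shows position-based access with `iloc`; `toy` and `fifth` are illustrative names.

```python
import pandas as pd

# First five rows of the top-10 table: original index labels are preserved
toy = pd.DataFrame(
    {
        "movie_title": [
            "Avatar",
            "Jurassic World",
            "Titanic",
            "Star Wars: Episode IV - A New Hope",
            "E.T. the Extra-Terrestrial",
        ]
    },
    index=[0, 28, 25, 2704, 2748],
)

# iloc is positional: position 4 is the fifth row regardless of its label
fifth = toy.iloc[4]["movie_title"]
print(fifth)
```

By contrast, `toy.loc[4]` would look for the index label 4, which does not exist in this frame.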

**Subtask 3.3: Find IMDb Top 200**

Create a new dataframe `IMDb_Top_200` that stores the 200 movies with the highest IMDb rating (column: `imdb_score`), considering only movies whose `num_voted_users` is greater than 25,000.

Add a `Rank` column containing the values 1 to 200 to indicate each film's rank.
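
The whole procedure (filter by votes, sort by score, keep the top rows, add a rank) can be sketched as a single chain. This is a minimal illustration on a toy frame with invented titles and a hypothetical `top_n` of 3 instead of 200; `toy` and `ranked` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `new_movies`; titles are made up for illustration
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D", "E"],
    "imdb_score": [9.3, 9.5, 8.0, 8.9, 7.0],
    "num_voted_users": [1_689_764, 1_200, 503_631, 503_509, 30_000],
})

top_n = 3  # hypothetical; the task above uses 200

ranked = (
    toy[toy["num_voted_users"] > 25_000]         # keep well-voted movies only
    .sort_values("imdb_score", ascending=False)  # best score first
    .head(top_n)                                 # keep the top rows
    .copy()                                      # independent frame, safe to modify
)

# Insert Rank as the first column, numbered from 1
ranked.insert(0, "Rank", np.arange(1, len(ranked) + 1))
print(ranked)
```

Movies with equal scores receive consecutive ranks in whatever order the sort leaves them, so ties near the cutoff are broken arbitrarily unless a secondary sort key is added.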

In [210]:
# New df 'IMDb_Top_200', sort values in descending order

IMDb_Top_200 = new_movies.sort_values('imdb_score', ascending=False)
IMDb_Top_200.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
1795,Frank Darabont,199.0,28.341469,Crime|Drama,Morgan Freeman,The Shawshank Redemption,1689764,4144.0,English,25.0,1994.0,9.3,108000,3.341469
3016,Francis Ford Coppola,208.0,134.821952,Crime|Drama,Al Pacino,The Godfather,1155770,2238.0,English,6.0,1972.0,9.2,43000,128.821952
2543,Francis Ford Coppola,149.0,57.3,Crime|Drama,Robert De Niro,The Godfather: Part II,790926,650.0,English,13.0,1974.0,9.0,14000,44.3
64,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
325,Peter Jackson,328.0,377.019252,Action|Adventure|Drama|Fantasy,Orlando Bloom,The Lord of the Rings: The Return of the King,1215718,3189.0,English,94.0,2003.0,8.9,16000,283.019252


In [211]:
# only get rows with num_voted_users > 25000
IMDb_Top_200 = IMDb_Top_200[IMDb_Top_200.num_voted_users > 25000]
IMDb_Top_200

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
1795,Frank Darabont,199.0,28.341469,Crime|Drama,Morgan Freeman,The Shawshank Redemption,1689764,4144.0,English,25.0,1994.0,9.3,108000,3.341469
3016,Francis Ford Coppola,208.0,134.821952,Crime|Drama,Al Pacino,The Godfather,1155770,2238.0,English,6.0,1972.0,9.2,43000,128.821952
2543,Francis Ford Coppola,149.0,57.300000,Crime|Drama,Robert De Niro,The Godfather: Part II,790926,650.0,English,13.0,1974.0,9.0,14000,44.300000
64,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
325,Peter Jackson,328.0,377.019252,Action|Adventure|Drama|Fantasy,Orlando Bloom,The Lord of the Rings: The Return of the King,1215718,3189.0,English,94.0,2003.0,8.9,16000,283.019252
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019,Jason Friedberg,112.0,39.737645,Adventure|Comedy,David Carradine,Epic Movie,89687,666.0,English,20.0,2007.0,2.3,0,19.737645
305,Lawrence Guterman,78.0,17.010646,Comedy|Family|Fantasy,Jamie Kennedy,Son of the Mask,40751,239.0,English,84.0,2005.0,2.2,881,-66.989354
2085,Jason Friedberg,111.0,14.174654,Comedy,Carmen Electra,Disaster Movie,74945,359.0,English,25.0,2008.0,1.9,0,-10.825346
2111,Bob Clark,32.0,9.109322,Comedy|Family|Sci-Fi,Scott Baio,Superbabies: Baby Geniuses 2,25371,129.0,English,20.0,2004.0,1.9,0,-10.890678


In [212]:
# return only the first 200 rows
IMDb_Top_200 = IMDb_Top_200[0:200]
IMDb_Top_200

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
1795,Frank Darabont,199.0,28.341469,Crime|Drama,Morgan Freeman,The Shawshank Redemption,1689764,4144.0,English,25.0,1994.0,9.3,108000,3.341469
3016,Francis Ford Coppola,208.0,134.821952,Crime|Drama,Al Pacino,The Godfather,1155770,2238.0,English,6.0,1972.0,9.2,43000,128.821952
2543,Francis Ford Coppola,149.0,57.300000,Crime|Drama,Robert De Niro,The Godfather: Part II,790926,650.0,English,13.0,1974.0,9.0,14000,44.300000
64,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
325,Peter Jackson,328.0,377.019252,Action|Adventure|Drama|Fantasy,Orlando Bloom,The Lord of the Rings: The Return of the King,1215718,3189.0,English,94.0,2003.0,8.9,16000,283.019252
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1622,Ron Clements,124.0,217.350219,Adventure|Animation|Comedy|Family|Fantasy|Musi...,Robin Williams,Aladdin,260939,244.0,English,28.0,1992.0,8.0,0,189.350219
293,Edward Zwick,166.0,57.366262,Adventure|Drama|Thriller,Leonardo DiCaprio,Blood Diamond,400292,657.0,English,100.0,2006.0,8.0,14000,-42.633738
2304,Peter Weir,96.0,95.860116,Comedy|Drama,Robin Williams,Dead Poets Society,277451,491.0,English,16.4,1989.0,8.0,23000,79.460116
2360,Tom Hooper,479.0,138.795342,Biography|Drama|History|Romance,Colin Firth,The King's Speech,503631,636.0,English,15.0,2010.0,8.0,64000,123.795342


In [213]:
# add a 1-based Rank column as the first column

IMDb_Top_200 = IMDb_Top_200.copy()  # work on a copy to avoid SettingWithCopyWarning
IMDb_Top_200.insert(0, 'Rank', np.arange(1, len(IMDb_Top_200) + 1))
IMDb_Top_200.head(10)

Unnamed: 0,Rank,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
1795,1,Frank Darabont,199.0,28.341469,Crime|Drama,Morgan Freeman,The Shawshank Redemption,1689764,4144.0,English,25.0,1994.0,9.3,108000,3.341469
3016,2,Francis Ford Coppola,208.0,134.821952,Crime|Drama,Al Pacino,The Godfather,1155770,2238.0,English,6.0,1972.0,9.2,43000,128.821952
2543,3,Francis Ford Coppola,149.0,57.3,Crime|Drama,Robert De Niro,The Godfather: Part II,790926,650.0,English,13.0,1974.0,9.0,14000,44.3
64,4,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
325,5,Peter Jackson,328.0,377.019252,Action|Adventure|Drama|Fantasy,Orlando Bloom,The Lord of the Rings: The Return of the King,1215718,3189.0,English,94.0,2003.0,8.9,16000,283.019252
3607,6,Sergio Leone,181.0,6.1,Western,Clint Eastwood,"The Good, the Bad and the Ugly",503509,780.0,Italian,1.2,1966.0,8.9,20000,4.9
2938,7,Quentin Tarantino,215.0,107.93,Crime|Drama,Bruce Willis,Pulp Fiction,1324680,2195.0,English,8.0,1994.0,8.9,45000,99.93
1737,8,Steven Spielberg,174.0,96.067179,Biography|Drama|History,Liam Neeson,Schindler's List,865020,1273.0,English,22.0,1993.0,8.9,41000,74.067179
94,9,Christopher Nolan,642.0,292.568851,Action|Adventure|Sci-Fi|Thriller,Leonardo DiCaprio,Inception,1468200,2803.0,English,160.0,2010.0,8.8,175000,132.568851
646,10,David Fincher,315.0,37.023395,Drama,Brad Pitt,Fight Club,1347461,2968.0,English,63.0,1999.0,8.8,48000,-25.976605


In [214]:
# return top 10 profits from the imdb top 200
IMDb_Top_200.sort_values('profit', ascending=False).head(10)

Unnamed: 0,Rank,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
2704,18,George Lucas,282.0,460.935665,Action|Adventure|Fantasy|Sci-Fi,Harrison Ford,Star Wars: Episode IV - A New Hope,911097,1470.0,English,11.0,1977.0,8.7,33000,449.935665
16,119,Joss Whedon,703.0,623.279547,Action|Adventure|Sci-Fi,Chris Hemsworth,The Avengers,995415,1722.0,English,220.0,2012.0,8.1,123000,403.279547
482,46,Roger Allers,186.0,422.783777,Adventure|Animation|Drama|Family|Musical,Matthew Broderick,The Lion King,644348,656.0,English,45.0,1994.0,8.5,17000,377.783777
64,4,Christopher Nolan,645.0,533.316061,Action|Crime|Drama|Thriller,Christian Bale,The Dark Knight,1676169,4667.0,English,185.0,2008.0,9.0,37000,348.316061
767,108,Tim Miller,579.0,363.024263,Action|Adventure|Comedy|Romance|Sci-Fi,Ryan Reynolds,Deadpool,479047,1058.0,English,58.0,2016.0,8.1,117000,305.024263
659,109,Steven Spielberg,308.0,356.784,Adventure|Sci-Fi|Thriller,Wayne Knight,Jurassic Park,613473,895.0,English,63.0,1993.0,8.1,19000,293.784
324,102,Andrew Stanton,301.0,380.83887,Adventure|Animation|Comedy|Family,Alexander Gould,Finding Nemo,692482,866.0,English,94.0,2003.0,8.2,11000,286.83887
325,5,Peter Jackson,328.0,377.019252,Action|Adventure|Drama|Fantasy,Orlando Bloom,The Lord of the Rings: The Return of the King,1215718,3189.0,English,94.0,2003.0,8.9,16000,283.019252
1435,49,Richard Marquand,197.0,309.125409,Action|Adventure|Fantasy|Sci-Fi,Harrison Ford,Star Wars: Episode VI - Return of the Jedi,681857,647.0,English,32.5,1983.0,8.4,14000,276.625409
787,12,Robert Zemeckis,149.0,329.691196,Comedy|Drama,Tom Hanks,Forrest Gump,1251222,1398.0,English,55.0,1994.0,8.8,59000,274.691196


- All five of the highest-rated movies are tagged Drama; four are also tagged Crime and two are also tagged Action.
- Although it ranks only 18th by IMDb score, Star Wars: Episode IV - A New Hope by George Lucas takes first place in profit, likely because of the large fanbase of the Star Wars series.
- Most of the 10 most profitable movies in the top 200 belong to the action or adventure genres (9 of 10; Forrest Gump is the exception).
- A movie's profitability is likely influenced by factors such as belonging to the action genre and being part of a franchise with a devoted fanbase, such as Marvel or Star Wars.
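
The genre observations above can be checked by counting tags rather than reading the table by eye. The following is a minimal sketch on a toy frame with made-up titles and profits; `genre_counts` is a hypothetical helper, and only the column names `genres` and `profit` are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for IMDb_Top_200 (hypothetical titles and values).
toy = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D"],
    "genres": [
        "Action|Adventure|Sci-Fi",
        "Crime|Drama",
        "Action|Crime|Drama",
        "Adventure|Animation",
    ],
    "profit": [449.9, 128.8, 348.3, 377.8],
})


def genre_counts(df, sort_col, n):
    """Count genre tags among the n rows with the largest sort_col."""
    top = df.nlargest(n, sort_col)
    # Split the pipe-delimited string into a list, then emit one row per tag.
    tags = top["genres"].str.split("|").explode()
    return tags.value_counts()


counts = genre_counts(toy, "profit", 3)
print(counts)
```

Calling the same helper with `sort_col="imdb_score"` and `n=5` on the real frame would give the tag counts for the five highest-rated movies.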

Suppose movies are divided into 5 buckets based on the IMDb ratings: 
-  7.5 to 8
-  8 to 8.5
-  8.5 to 9
-  9 to 9.5
-  9.5 to 10

Which bucket holds the maximum number of movies from 'IMDb_Top_200'?

In [215]:
# filter values by IMDb rating buckets
# note: both bounds of every bucket are inclusive, so a score exactly on a
# boundary (8.0, 8.5 or 9.0) is counted in two adjacent buckets
print("Number of movies by IMDb rating")
print("7.5 to 8 = " , IMDb_Top_200[(IMDb_Top_200['imdb_score'] >= 7.5) & (IMDb_Top_200['imdb_score'] <= 8)].shape[0])
print("8 to 8.5 = " , IMDb_Top_200[(IMDb_Top_200['imdb_score'] >= 8) & (IMDb_Top_200['imdb_score'] <= 8.5)].shape[0])
print("8.5 to 9 = " , IMDb_Top_200[(IMDb_Top_200['imdb_score'] >= 8.5) & (IMDb_Top_200['imdb_score'] <= 9)].shape[0])
print("9 to 9.5 = " , IMDb_Top_200[(IMDb_Top_200['imdb_score'] >= 9) & (IMDb_Top_200['imdb_score'] <= 9.5)].shape[0])
print("9.5 to 10 = " , IMDb_Top_200[(IMDb_Top_200['imdb_score'] >= 9.5) & (IMDb_Top_200['imdb_score'] <= 10)].shape[0])

print("Bucket with Max number of movies is 8 to 8.5.")

Number of movies by IMDb rating
7.5 to 8 =  48
8 to 8.5 =  172
8.5 to 9 =  44
9 to 9.5 =  4
9.5 to 10 =  0
Bucket with Max number of movies is 8 to 8.5.
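
The counts above add up to 268 although the frame holds 200 rows, because each filter includes both of its endpoints. One way to get non-overlapping buckets is `pd.cut`. The sketch below uses a small made-up series of scores and assumes half-open buckets of the form [low, high); the task statement does not say which side should be inclusive, so that choice is an assumption.

```python
import pandas as pd

# Made-up scores standing in for IMDb_Top_200['imdb_score'].
scores = pd.Series([8.0, 8.0, 8.3, 8.5, 8.9, 9.0, 9.3], name="imdb_score")

edges = [7.5, 8.0, 8.5, 9.0, 9.5, 10.0]
labels = ["7.5 to 8", "8 to 8.5", "8.5 to 9", "9 to 9.5", "9.5 to 10"]

# right=False makes every bin closed on the left and open on the right,
# so a score of exactly 8.0 lands only in "8 to 8.5".
# Caveat: with this choice a score of exactly 10.0 would fall outside all bins.
buckets = pd.cut(scores, bins=edges, labels=labels, right=False)

# sort=False keeps the buckets in their natural order, including empty ones.
bucket_counts = buckets.value_counts(sort=False)
print(bucket_counts)
print("Largest bucket:", bucket_counts.idxmax())
```

With this approach every movie is counted exactly once, so the bucket counts sum to the number of rows.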


**Subtask 3.4: Find the critic-favorite and audience-favorite actors**

   1. From the `new_movies` dataframe, create three new dataframes, `Meryl_Streep`, `Leo_Caprio`, and `Brad_Pitt`, containing the movies in which 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' respectively are the lead actors.
   2. Append the rows of all these dataframes and store them in a new dataframe named `Combined`.
   3. Group the combined dataframe using the `actor_1_name` column.
   4. Find the mean of `num_critic_for_reviews` and `num_user_for_reviews` for each actor and identify the actor with the highest mean in each.
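
As a compact alternative to the steps above, the filtering and appending can be collapsed into a single `isin` filter. This is only a sketch on a toy frame with made-up review counts; it skips the three intermediate dataframes that the task asks for, and the names `toy`, `actors`, `combined`, `means` and `best` are illustrative.

```python
import pandas as pd

# Toy stand-in for new_movies (made-up review counts).
toy = pd.DataFrame({
    "actor_1_name": [
        "Meryl Streep", "Brad Pitt", "Leonardo DiCaprio", "Brad Pitt", "Someone Else",
    ],
    "num_critic_for_reviews": [100.0, 200.0, 400.0, 300.0, 50.0],
    "num_user_for_reviews": [300.0, 900.0, 800.0, 500.0, 10.0],
})

actors = ["Meryl Streep", "Leonardo DiCaprio", "Brad Pitt"]

# Steps 1 and 2 in one go: keep only rows whose lead actor is one of the three.
combined = toy[toy["actor_1_name"].isin(actors)]

# Steps 3 and 4: group by lead actor and average the two review-count columns.
review_cols = ["num_critic_for_reviews", "num_user_for_reviews"]
means = combined.groupby("actor_1_name")[review_cols].mean()

# idxmax returns, for each column, the actor with the highest mean.
best = means.idxmax()
print(means)
print(best)
```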

In [216]:
# Meryl Streep
Meryl_Streep = new_movies[new_movies.actor_1_name == "Meryl Streep"]
Meryl_Streep.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
392,Nancy Meyers,187.0,112.70347,Comedy|Drama|Romance,Meryl Streep,It's Complicated,69860,214.0,English,85.0,2009.0,6.6,0,27.70347
1038,Curtis Hanson,42.0,46.815748,Action|Adventure|Crime|Thriller,Meryl Streep,The River Wild,32544,69.0,English,45.0,1994.0,6.3,0,1.815748
1132,Nora Ephron,252.0,94.125426,Biography|Drama|Romance,Meryl Streep,Julie & Julia,79264,277.0,English,40.0,2009.0,7.0,13000,54.125426
1322,David Frankel,208.0,124.732962,Comedy|Drama|Romance,Meryl Streep,The Devil Wears Prada,286178,631.0,English,35.0,2006.0,6.8,0,89.732962
1390,Robert Redford,227.0,14.99807,Drama|Thriller|War,Meryl Streep,Lions for Lambs,41170,298.0,English,35.0,2007.0,6.2,0,-20.00193


In [217]:
# highest imdb score for movies with meryl streep

Meryl_Streep.sort_values('imdb_score', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
1784,Stephen Daldry,174.0,41.59783,Drama|Romance,Meryl Streep,The Hours,102123,660.0,English,25.0,2002.0,7.6,0,16.59783


- The Hours is the highest-rated movie with Meryl Streep, at an IMDb score of 7.6.

In [218]:
#Leo_Caprio 
Leo_Caprio = new_movies[new_movies.actor_1_name == "Leonardo DiCaprio"]
Leo_Caprio.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
25,James Cameron,315.0,658.672302,Drama|Romance,Leonardo DiCaprio,Titanic,793059,2528.0,English,200.0,1997.0,7.7,26000,458.672302
49,Baz Luhrmann,490.0,144.812796,Drama|Romance,Leonardo DiCaprio,The Great Gatsby,362912,753.0,English,105.0,2013.0,7.3,115000,39.812796
94,Christopher Nolan,642.0,292.568851,Action|Adventure|Sci-Fi|Thriller,Leonardo DiCaprio,Inception,1468200,2803.0,English,160.0,2010.0,8.8,175000,132.568851
173,Alejandro G. Iñárritu,556.0,183.635922,Adventure|Drama|Thriller|Western,Leonardo DiCaprio,The Revenant,406020,1188.0,English,135.0,2015.0,8.1,190000,48.635922
246,Martin Scorsese,267.0,102.608827,Biography|Drama,Leonardo DiCaprio,The Aviator,264318,799.0,English,110.0,2004.0,7.5,0,-7.391173


In [219]:
# highest imdb score for movies with leo

Leo_Caprio.sort_values('imdb_score', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
94,Christopher Nolan,642.0,292.568851,Action|Adventure|Sci-Fi|Thriller,Leonardo DiCaprio,Inception,1468200,2803.0,English,160.0,2010.0,8.8,175000,132.568851


- Inception is the highest-rated movie with Leonardo DiCaprio, at an IMDb score of 8.8.

In [220]:
#Brad_Pitt
Brad_Pitt = new_movies[new_movies.actor_1_name == "Brad Pitt"]
Brad_Pitt.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
97,David Fincher,362.0,127.490802,Drama|Fantasy|Romance,Brad Pitt,The Curious Case of Benjamin Button,459346,822.0,English,150.0,2008.0,7.8,23000,-22.509198
142,Wolfgang Petersen,220.0,133.228348,Adventure,Brad Pitt,Troy,381672,1694.0,English,175.0,2004.0,7.2,0,-41.771652
243,Steven Soderbergh,198.0,125.531634,Crime|Thriller,Brad Pitt,Ocean's Twelve,284852,627.0,English,110.0,2004.0,6.4,0,15.531634
244,Doug Liman,233.0,186.336103,Action|Comedy|Crime|Romance|Thriller,Brad Pitt,Mr. & Mrs. Smith,348861,798.0,English,120.0,2005.0,6.5,0,66.336103
367,Tony Scott,142.0,0.026871,Action|Crime|Thriller,Brad Pitt,Spy Game,121259,361.0,English,92.0,2001.0,7.0,0,-91.973129


In [221]:
# highest imdb score for movies with brad pitt

Brad_Pitt.sort_values('imdb_score', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
646,David Fincher,315.0,37.023395,Drama,Brad Pitt,Fight Club,1347461,2968.0,English,63.0,1999.0,8.8,48000,-25.976605


- Fight Club is the highest-rated movie with Brad Pitt, at an IMDb score of 8.8.

In [222]:
# append all three dataframes of the actors

Combined = pd.concat([Meryl_Streep, Leo_Caprio, Brad_Pitt])
Combined

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
392,Nancy Meyers,187.0,112.70347,Comedy|Drama|Romance,Meryl Streep,It's Complicated,69860,214.0,English,85.0,2009.0,6.6,0,27.70347
1038,Curtis Hanson,42.0,46.815748,Action|Adventure|Crime|Thriller,Meryl Streep,The River Wild,32544,69.0,English,45.0,1994.0,6.3,0,1.815748
1132,Nora Ephron,252.0,94.125426,Biography|Drama|Romance,Meryl Streep,Julie & Julia,79264,277.0,English,40.0,2009.0,7.0,13000,54.125426
1322,David Frankel,208.0,124.732962,Comedy|Drama|Romance,Meryl Streep,The Devil Wears Prada,286178,631.0,English,35.0,2006.0,6.8,0,89.732962
1390,Robert Redford,227.0,14.99807,Drama|Thriller|War,Meryl Streep,Lions for Lambs,41170,298.0,English,35.0,2007.0,6.2,0,-20.00193
1471,Sydney Pollack,66.0,87.1,Biography|Drama|Romance,Meryl Streep,Out of Africa,52339,200.0,English,31.0,1985.0,7.2,0,56.1
1514,David Frankel,234.0,63.536011,Comedy|Drama|Romance,Meryl Streep,Hope Springs,34258,178.0,English,30.0,2012.0,6.3,0,33.536011
1563,Carl Franklin,64.0,23.20944,Drama,Meryl Streep,One True Thing,9283,112.0,English,30.0,1998.0,7.0,592,-6.79056
1784,Stephen Daldry,174.0,41.59783,Drama|Romance,Meryl Streep,The Hours,102123,660.0,English,25.0,2002.0,7.6,0,16.59783
2500,Phyllida Lloyd,331.0,29.959436,Biography|Drama|History,Meryl Streep,The Iron Lady,82327,350.0,English,13.0,2011.0,6.4,18000,16.959436


In [223]:
# Find mean of critic reviews and audience reviews 

Combined.groupby('actor_1_name')[['num_critic_for_reviews','num_user_for_reviews']].mean(numeric_only=True)

Unnamed: 0_level_0,num_critic_for_reviews,num_user_for_reviews
actor_1_name,Unnamed: 1_level_1,Unnamed: 2_level_1
Brad Pitt,245.0,742.352941
Leonardo DiCaprio,330.190476,914.47619
Meryl Streep,181.454545,297.181818


- Actor with the highest mean number of critic reviews: Leonardo DiCaprio
- Actor with the highest mean number of audience reviews: Leonardo DiCaprio

Which actor is highest rated among the three actors according to the user reviews?

In [224]:
# sort descending by user reviews and get top row

Combined.sort_values('num_user_for_reviews', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
646,David Fincher,315.0,37.023395,Drama,Brad Pitt,Fight Club,1347461,2968.0,English,63.0,1999.0,8.8,48000,-25.976605


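Fight Club, with Brad Pitt in the lead, has the most user reviews (2,968) of any single movie among the three actors, so by this measure the answer is Brad Pitt. By mean user reviews per movie, however, Leonardo DiCaprio leads (about 914 versus about 742 for Brad Pitt).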

Which actor is highest rated among the three actors according to the critics?

In [225]:
# sort descending by critic reviews

Combined.sort_values('num_critic_for_reviews', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
283,Quentin Tarantino,765.0,162.804648,Drama|Western,Leonardo DiCaprio,Django Unchained,955174,1193.0,English,100.0,2012.0,8.5,199000,62.804648


Django Unchained, with Leonardo DiCaprio in the lead, has the most critic reviews (765) of any single movie among the three actors, so by this measure the answer is Leonardo DiCaprio.
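
Sorting by a single movie's review count and averaging per actor answer different questions and can name different actors. The sketch below illustrates the difference on a toy frame with hypothetical actors and made-up counts; only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical actors and review counts, chosen so the two measures disagree.
toy = pd.DataFrame({
    "actor_1_name": ["Actor X", "Actor X", "Actor Y", "Actor Y"],
    "movie_title": ["X1", "X2", "Y1", "Y2"],
    "num_user_for_reviews": [3000.0, 100.0, 1600.0, 1700.0],
})

# Measure 1: lead actor of the single most-reviewed movie.
top_movie_actor = toy.loc[toy["num_user_for_reviews"].idxmax(), "actor_1_name"]

# Measure 2: actor with the highest mean review count per movie.
by_actor = toy.groupby("actor_1_name")["num_user_for_reviews"]
top_mean_actor = by_actor.mean().idxmax()

# Most-reviewed movie of each actor, one row per actor.
per_actor_top = toy.loc[by_actor.idxmax(), ["actor_1_name", "movie_title"]]

print(top_movie_actor, top_mean_actor)
print(per_actor_top)
```

Here the single most-reviewed movie belongs to Actor X, while Actor Y has the higher mean, which mirrors the Brad Pitt versus Leonardo DiCaprio split seen for user reviews.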

Which movie starting with the letter A has the 3rd highest profit?

In [226]:
# from new_movies, create a new dataframe containing only movies whose title starts with 'A'
A_movie = new_movies[new_movies['movie_title'].str.startswith('A')]
A_movie.head()

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
0,James Cameron,723.0,760.505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar,886204,3054.0,English,237.0,2009.0,7.9,33000,523.505847
7,Joss Whedon,635.0,458.991599,Action|Adventure|Sci-Fi,Chris Hemsworth,Avengers: Age of Ultron,462669,1117.0,English,250.0,2015.0,7.5,118000,208.991599
32,Tim Burton,451.0,334.185206,Adventure|Family|Fantasy,Johnny Depp,Alice in Wonderland,306320,736.0,English,200.0,2010.0,6.5,24000,134.185206
59,Robert Zemeckis,240.0,137.850096,Animation|Drama|Family|Fantasy,Robin Wright,A Christmas Carol,72809,249.0,English,200.0,2009.0,6.8,0,-62.149904
102,James Bobin,218.0,76.846624,Adventure|Family|Fantasy,Johnny Depp,Alice Through the Looking Glass,21352,131.0,English,170.0,2016.0,6.4,30000,-93.153376


In [227]:
# sort descending and return movie with the 3rd highest profit
A_movie.sort_values('profit', ascending=False)[2:3]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
7,Joss Whedon,635.0,458.991599,Action|Adventure|Sci-Fi,Chris Hemsworth,Avengers: Age of Ultron,462669,1117.0,English,250.0,2015.0,7.5,118000,208.991599


The movie starting with the letter A with the 3rd highest profit is Avengers: Age of Ultron.
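
A positional slice such as `[2:3]` silently returns an empty frame when fewer than three rows match. The following sketch shows a guarded alternative using `nlargest`; `nth_highest` is a hypothetical helper and the titles and profits are made up.

```python
import pandas as pd

# Made-up titles and profits standing in for new_movies.
toy = pd.DataFrame({
    "movie_title": ["Alpha", "Amber", "Bravo", "Atlas", "Arrow"],
    "profit": [300.0, 500.0, 900.0, 200.0, 100.0],
})


def nth_highest(df, col, n):
    """Return the row holding the n-th largest value of col (1-based)."""
    top = df.nlargest(n, col)
    if len(top) < n:
        raise ValueError(f"only {len(top)} rows available, cannot take rank {n}")
    return top.iloc[-1]


# na=False treats missing titles as non-matching instead of propagating NaN.
a_titles = toy[toy["movie_title"].str.startswith("A", na=False)]
third = nth_highest(a_titles, "profit", 3)
print(third["movie_title"])
```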

Among the movies starting with the letter A, which has the highest IMDb score?

In [228]:
# sort descending and return the top row

A_movie.sort_values('imdb_score', ascending=False)[0:1]

Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
2826,Tony Kaye,162.0,6.712241,Crime|Drama,Ethan Suplee,American History X,782437,1420.0,English,7.5,1998.0,8.6,35000,-0.787759


The movie with the highest imdb score among the movies starting with the letter A is American History X.

Among the movies that '*Joss Whedon*' has directed, which of them made the highest profit?

In [229]:
# get rows where director name is Joss Whedon and sort descending and return the top row

new_movies[new_movies['director_name'] == 'Joss Whedon'].sort_values('profit', ascending=False)[0:1]


Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
16,Joss Whedon,703.0,623.279547,Action|Adventure|Sci-Fi,Chris Hemsworth,The Avengers,995415,1722.0,English,220.0,2012.0,8.1,123000,403.279547


The movie directed by Joss Whedon with the highest profit is The Avengers.

Among the movies that were released in the year '*2007*' & '*2008*', which of them had the 2nd highest imdb score?

In [230]:
# get rows with movies released in 2007 and 2008, sort by imdb_score in descending order and display the 2nd row

new_movies[(new_movies['title_year'] == 2007.0) | (new_movies['title_year'] == 2008.0)].sort_values('imdb_score', ascending=False)[1:2]


Unnamed: 0,director_name,num_critic_for_reviews,gross,genres,actor_1_name,movie_title,num_voted_users,num_user_for_reviews,language,budget,title_year,imdb_score,movie_facebook_likes,profit
56,Andrew Stanton,421.0,223.806889,Adventure|Animation|Family|Sci-Fi,John Ratzenberger,WALL·E,718837,1043.0,English,180.0,2008.0,8.4,16000,43.806889


Among the movies that were released in the year '2007' & '2008', the movie with the 2nd highest imdb score is WALL·E.
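
Taking the second row of a sorted frame gives the second-highest score only if the top score is not tied. A tie-aware variant can be sketched with `isin` for the year filter and a dense rank on the score. The frame below uses hypothetical titles and made-up scores, and treating "2nd highest" as the second distinct score is an assumption about the intended meaning.

```python
import pandas as pd

# Hypothetical titles; two movies share the top score within 2007-2008.
toy = pd.DataFrame({
    "movie_title": ["P", "Q", "R", "S", "T"],
    "title_year": [2007.0, 2008.0, 2008.0, 2007.0, 2010.0],
    "imdb_score": [9.0, 9.0, 8.4, 8.1, 9.5],
})

# isin replaces the chained (== 2007) | (== 2008) comparison.
subset = toy[toy["title_year"].isin([2007, 2008])]

# Dense ranking gives tied scores the same rank and leaves no gaps after a tie.
dense = subset["imdb_score"].rank(method="dense", ascending=False)
second = subset[dense == 2]

# For comparison: the positional second row still carries the tied top score.
positional = subset.sort_values("imdb_score", ascending=False).iloc[1]

print(second)
print(positional["imdb_score"])
```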

### Conclusion

- **Action Dominance in Top Profits and IMDb Scores:**
  - Six out of the top 10 highest-profit movies are in the action genre.
  - The movie with the highest profit is "Avatar" by James Cameron.
  - There are two movies directed by James Cameron in the top profit list.
  - All five of the movies with the highest IMDb scores are dramas; four are also crime films and two are also action films.
  - "Star Wars: Episode IV - A New Hope" by George Lucas, ranks 18th but leads in profit, likely due to its large fanbase.
  - The majority of the top 10 highest profits come from action and adventure genres. 
  <br>
  <br>
- **Distribution of Top 200 Movies by IMDb Ratings:**
  - 48 movies have IMDb ratings between 7.5 and 8.
  - 172 movies have IMDb ratings between 8 and 8.5.
  - 44 movies have IMDb ratings between 8.5 and 9.
  - 4 movies have IMDb ratings between 9 and 9.5.
  - No movies have IMDb ratings between 9.5 and 10.
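  - The bucket with the most movies is 8 to 8.5.
  - Both endpoints of each bucket are inclusive, so movies scoring exactly 8.0, 8.5, or 9.0 are counted in two buckets and the counts total 268 rather than 200.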
<br>
  <br>
- **Top Rated Movies by Specific Actors:**
  - "The Hours" is the highest-rated movie featuring Meryl Streep, with an IMDb score of 7.6.
  - "Fight Club" is the highest-rated movie featuring Brad Pitt, with an IMDb score of 8.8.
  - "Inception" is the highest-rated movie featuring Leonardo DiCaprio, also with an IMDb score of 8.8.
<br>
  <br>
- **Analysis of Actor Reviews:**
  - Leonardo DiCaprio has the highest mean number of both critic and audience reviews per movie.
  - By number of user reviews, "Fight Club" featuring Brad Pitt is the most-reviewed single movie among the three actors.
  - By number of critic reviews, "Django Unchained" featuring Leonardo DiCaprio is the most-reviewed single movie among the three actors.