# Exploratory Data Analysis of IMDb's Top Global Movies (1950-2020)

This notebook contains the exploratory data analysis (EDA) for the dataset of IMDb's top global movies from 1950 to 2020. The analysis includes data loading, preprocessing, statistical analysis, and visualizations.

In [61]:
import pandas as pd

## Load Data

In this section, we will load the processed data from the `data/` directory.

In [62]:
# Load the processed data
data_path = '../data/imdb_top_movies.csv'
df = pd.read_csv(data_path)

# Display the first few rows of the dataset
df.head()

Unnamed: 0,Title,Year,Rating,Genre,Director(s),Box Office Revenue,Lead Actors
0,1. The Shawshank Redemption,1994,9.3 (3M),"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...","Gross worldwide$29,332,133","Bob Gunton, Tim Robbins, Morgan Freeman"
1,2. The Godfather,1972,9.2 (2.1M),"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...","Gross worldwide$250,342,198","Al Pacino, Marlon Brando, James Caan"
2,3. The Dark Knight,2008,9.0 (3M),"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...","Gross worldwide$1,009,057,329","Christian Bale, Aaron Eckhart, Heath Ledger"
3,4. The Godfather Part II,1974,9.0 (1.4M),"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...","Gross worldwide$47,964,222","Al Pacino, Robert De Niro, Robert Duvall"
4,5. 12 Angry Men,1957,9.0 (917K),"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...","Gross worldwide$2,945","Henry Fonda, Martin Balsam, Lee J. Cobb"


## Data Overview

Let's take a look at the basic statistics and structure of the dataset.

In [63]:
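# Basic statistics. Only a cell's last expression is rendered automatically,
# so this result is not shown below unless it is wrapped in print().
df.describe(include='all')

# Display the data types and non-null counts
df.info()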

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Title               250 non-null    object
 1   Year                250 non-null    int64 
 2   Rating              250 non-null    object
 3   Genre               250 non-null    object
 4   Director(s)         250 non-null    object
 5   Box Office Revenue  250 non-null    object
 6   Lead Actors         250 non-null    object
dtypes: int64(1), object(6)
memory usage: 13.8+ KB


In [64]:
df.iloc[0]

Title                                       1. The Shawshank Redemption
Year                                                               1994
Rating                                                         9.3 (3M)
Genre                           Epic, Period Drama, Prison Drama, Drama
Director(s)           Bob Gunton, Frank Darabont, Morgan Freeman, Ti...
Box Office Revenue                           Gross worldwide$29,332,133
Lead Actors                     Bob Gunton, Tim Robbins, Morgan Freeman
Name: 0, dtype: object

In [65]:
# Check for duplicates
df.duplicated().sum()

0

In [66]:
print(df.isnull().sum())

Title                 0
Year                  0
Rating                0
Genre                 0
Director(s)           0
Box Office Revenue    0
Lead Actors           0
dtype: int64


In [67]:
# Rename all columns
df.columns = ['title', 'year', 'rating', 'genre', 'directors', 'revenue', 'lead_actors']
df

Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
0,1. The Shawshank Redemption,1994,9.3 (3M),"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...","Gross worldwide$29,332,133","Bob Gunton, Tim Robbins, Morgan Freeman"
1,2. The Godfather,1972,9.2 (2.1M),"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...","Gross worldwide$250,342,198","Al Pacino, Marlon Brando, James Caan"
2,3. The Dark Knight,2008,9.0 (3M),"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...","Gross worldwide$1,009,057,329","Christian Bale, Aaron Eckhart, Heath Ledger"
3,4. The Godfather Part II,1974,9.0 (1.4M),"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...","Gross worldwide$47,964,222","Al Pacino, Robert De Niro, Robert Duvall"
4,5. 12 Angry Men,1957,9.0 (917K),"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...","Gross worldwide$2,945","Henry Fonda, Martin Balsam, Lee J. Cobb"
...,...,...,...,...,...,...,...
245,246. A Silent Voice: The Movie,2016,8.1 (117K),"Anime, Coming-of-Age, Psychological Drama, Shō...","Reiko Yoshida, Pete Townshend, Lexi Marman, Mi...","Gross worldwide$30,819,442","Saori Hayami, Miyu Irino, Aoi Yûki"
246,247. The Help,2011,8.1 (510K),"Period Drama, Drama","Emma Stone, Octavia Spencer, Johnny Cash, Hill...","Gross worldwide$221,802,186","Octavia Spencer, Emma Stone, Viola Davis"
247,248. Amores Perros,2000,8.0 (261K),"Tragedy, Drama, Thriller","Emilio Echevarría, Goya Toledo, Guillermo Arri...","Gross worldwide$20,908,467","Emilio Echevarría, Gael García Bernal, Goya To..."
248,249. Rebecca,1940,8.1 (153K),"Dark Romance, Psychological Drama, Psychologic...","Laurence Olivier, The Second Mrs. de Winter, J...","Gross worldwide$113,328","Laurence Olivier, Joan Fontaine, George Sanders"


In [68]:
df.columns

Index(['title', 'year', 'rating', 'genre', 'directors', 'revenue',
       'lead_actors'],
      dtype='object')

### Rearranging and Filtering Dataset Values

To ensure the dataset is clean and meets the requirements for analysis, we will perform the following steps:

1. **Clean the `title` column**: Remove the leading rank prefix (for example `1. `) so that only the movie title remains.
2. **Clean the `rating` column**: Separate the score from the vote count (for example `9.3 (3M)`) and convert the score to a numeric type.
3. **Clean the `revenue` column**: Strip the `Gross worldwide` label, the dollar sign, and the commas, then convert the value to a numeric type.
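
The three cleaning steps can be sketched on a two-row sample as follows. This is a minimal illustration rather than the notebook's exact cells: the `sample` frame and the `clean_movies` helper are hypothetical, and the regular expressions assume the raw formats seen in the loaded data (`1. Title`, `9.3 (3M)`, `Gross worldwide$29,332,133`).

```python
import pandas as pd

# Hypothetical two-row sample in the same raw format as the loaded data.
sample = pd.DataFrame({
    "title": ["1. The Shawshank Redemption", "5. 12 Angry Men"],
    "rating": ["9.3 (3M)", "9.0 (917K)"],
    "revenue": ["Gross worldwide$29,332,133", "Gross worldwide$2,945"],
})


def clean_movies(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    out = raw.copy()
    # 1. Strip only the leading rank ("5. "), so digits inside a title survive.
    out["title"] = out["title"].str.replace(r"^\d+\.\s*", "", regex=True)
    # 2. Keep the score that precedes the parenthesised vote count.
    score = out["rating"].str.extract(r"^\s*(\d+(?:\.\d+)?)", expand=False)
    out["rating"] = score.astype(float)
    # 3. Drop every non-digit character, then convert to a numeric dtype.
    digits = out["revenue"].str.replace(r"\D", "", regex=True)
    out["revenue"] = pd.to_numeric(digits)
    return out


cleaned = clean_movies(sample)
print(cleaned)
```

Working on a copy inside the helper keeps the raw frame available for comparison while the cleaning rules are still being adjusted.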

In [69]:
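# Split the combined "score (votes)" string into separate rating and votes columns.
# Splitting on "(" leaves a trailing space on rating and a trailing ")" on votes.
df[['rating', 'votes']] = df['rating'].str.split("(", expand=True)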
print(df.head())

                         title  year rating  \
0  1. The Shawshank Redemption  1994   9.3    
1             2. The Godfather  1972   9.2    
2           3. The Dark Knight  2008   9.0    
3     4. The Godfather Part II  1974   9.0    
4              5. 12 Angry Men  1957   9.0    

                                               genre  \
0            Epic, Period Drama, Prison Drama, Drama   
1              Epic, Gangster, Tragedy, Crime, Drama   
2  Action Epic, Epic, Superhero, Tragedy, Action,...   
3              Epic, Gangster, Tragedy, Crime, Drama   
4     Legal Drama, Psychological Drama, Crime, Drama   

                                           directors  \
0  Bob Gunton, Frank Darabont, Morgan Freeman, Ti...   
1  Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...   
2  Salvatore Maroni, Michael Caine, Christian Bal...   
3  Livio Giorgi, Al Pacino, Mario Puzo, Francis F...   
4  Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...   

                         revenue           

In [70]:
df.columns

Index(['title', 'year', 'rating', 'genre', 'directors', 'revenue',
       'lead_actors', 'votes'],
      dtype='object')

In [71]:
df.head()

Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors,votes
0,1. The Shawshank Redemption,1994,9.3,"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...","Gross worldwide$29,332,133","Bob Gunton, Tim Robbins, Morgan Freeman",3M)
1,2. The Godfather,1972,9.2,"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...","Gross worldwide$250,342,198","Al Pacino, Marlon Brando, James Caan",2.1M)
2,3. The Dark Knight,2008,9.0,"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...","Gross worldwide$1,009,057,329","Christian Bale, Aaron Eckhart, Heath Ledger",3M)
3,4. The Godfather Part II,1974,9.0,"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...","Gross worldwide$47,964,222","Al Pacino, Robert De Niro, Robert Duvall",1.4M)
4,5. 12 Angry Men,1957,9.0,"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...","Gross worldwide$2,945","Henry Fonda, Martin Balsam, Lee J. Cobb",917K)
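
Splitting on `(` leaves a closing parenthesis on `votes` (for example `3M)`). As an alternative sketch, a single regular expression can capture both parts cleanly, and the `K`/`M` suffixes can be expanded to integers if the vote count is wanted later. The `parse_votes` helper and the sample values are illustrative assumptions, not part of the notebook, which drops the `votes` column instead.

```python
import pandas as pd

raw = pd.Series(["9.3 (3M)", "9.2 (2.1M)", "9.0 (917K)"], name="rating")

# Named groups become the column names of the extracted frame.
pattern = r"^(?P<rating>\d+(?:\.\d+)?)\s*\((?P<votes>[\d.]+[KM]?)\)$"
parts = raw.str.extract(pattern)


def parse_votes(text: str) -> int:
    """Convert strings such as '917K' or '2.1M' to an integer count."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    if text[-1] in multipliers:
        return round(float(text[:-1]) * multipliers[text[-1]])
    return round(float(text))


parts["rating"] = parts["rating"].astype(float)
parts["votes"] = parts["votes"].map(parse_votes)
print(parts)
```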


In [72]:
#remove the votes column from dataframe
df = df.drop('votes', axis=1)
df

Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
0,1. The Shawshank Redemption,1994,9.3,"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...","Gross worldwide$29,332,133","Bob Gunton, Tim Robbins, Morgan Freeman"
1,2. The Godfather,1972,9.2,"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...","Gross worldwide$250,342,198","Al Pacino, Marlon Brando, James Caan"
2,3. The Dark Knight,2008,9.0,"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...","Gross worldwide$1,009,057,329","Christian Bale, Aaron Eckhart, Heath Ledger"
3,4. The Godfather Part II,1974,9.0,"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...","Gross worldwide$47,964,222","Al Pacino, Robert De Niro, Robert Duvall"
4,5. 12 Angry Men,1957,9.0,"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...","Gross worldwide$2,945","Henry Fonda, Martin Balsam, Lee J. Cobb"
...,...,...,...,...,...,...,...
245,246. A Silent Voice: The Movie,2016,8.1,"Anime, Coming-of-Age, Psychological Drama, Shō...","Reiko Yoshida, Pete Townshend, Lexi Marman, Mi...","Gross worldwide$30,819,442","Saori Hayami, Miyu Irino, Aoi Yûki"
246,247. The Help,2011,8.1,"Period Drama, Drama","Emma Stone, Octavia Spencer, Johnny Cash, Hill...","Gross worldwide$221,802,186","Octavia Spencer, Emma Stone, Viola Davis"
247,248. Amores Perros,2000,8.0,"Tragedy, Drama, Thriller","Emilio Echevarría, Goya Toledo, Guillermo Arri...","Gross worldwide$20,908,467","Emilio Echevarría, Gael García Bernal, Goya To..."
248,249. Rebecca,1940,8.1,"Dark Romance, Psychological Drama, Psychologic...","Laurence Olivier, The Second Mrs. de Winter, J...","Gross worldwide$113,328","Laurence Olivier, Joan Fontaine, George Sanders"


In [73]:
#datatypes of columns
df.dtypes

title          object
year            int64
rating         object
genre          object
directors      object
revenue        object
lead_actors    object
dtype: object

In [74]:
#convert the rating column to float
df['rating'] = df['rating'].astype(float)
df.dtypes

title           object
year             int64
rating         float64
genre           object
directors       object
revenue         object
lead_actors     object
dtype: object

In [75]:
# Count placeholder strings such as 'Unknown' or 'NA' that stand in for missing values
df.isin(['Unknown', 'NA']).sum()

title          0
year           0
rating         0
genre          0
directors      0
revenue        4
lead_actors    1
dtype: int64

In [76]:
# Show the rows whose revenue is 'Unknown'
df[df['revenue'] == 'Unknown']

Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
127,128. Hamilton,2020,8.3,"Epic, Biography, Drama, History, Musical","Okieriete Onaodowan, Phillipa Soo, Thomas Kail...",Unknown,"Leslie Odom Jr., Phillipa Soo, Lin-Manuel Miranda"
163,164. Klaus,2019,8.2,"Hand-Drawn Animation, Holiday Animation, Holid...","Zara Larsson, Klaus, Justin Tranter, Sergio Pa...",Unknown,"Jason Schwartzman, Rashida Jones, J.K. Simmons"
222,223. Jai Bhim,2021,8.7,"Legal Drama, Crime, Drama","T.J. Gnanavel, Rajendra Sapre, Suriya, Lijo Mo...",Unknown,"Suriya, Lijo Mol Jose, Manikandan K."
249,250. Drishyam,2015,8.2,"Crime, Drama, Mystery, Thriller","Upendra Sidhaye, Ash King, Gulzar, Meera Deshm...",Unknown,"Shriya Saran, Ajay Devgn, Tabu"
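
`isnull()` reported no missing values earlier because these gaps are stored as the string `Unknown`. One option, sketched here under the assumption that `Unknown` and `NA` are the only placeholders in use, is to convert them to real missing values so that `isna()` can see them. The small frame below is made up for illustration.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Hamilton", "The Help"],
    "revenue": ["Unknown", "Gross worldwide$221,802,186"],
})

# The placeholder is an ordinary string, so nothing is reported as missing yet.
print(movies.isna().sum())

# mask() replaces every cell where the condition is True with NaN.
marked = movies.mask(movies.isin(["Unknown", "NA"]))
print(marked.isna().sum())
```

The same effect can be had at load time by passing `na_values=["Unknown"]` to `pd.read_csv`.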


In [77]:
# Revenue figures looked up online to fill in the 'Unknown' entries
# Replace the 'Unknown' values for specific movies, matched by title

df.loc[df['title'].str.contains('128. Hamilton', regex=False), 'revenue'] = 'Gross worldwide$1,000,000,000'  # $1 billion
df.loc[df['title'].str.contains('164. Klaus', regex=False), 'revenue'] = 'Gross worldwide$40,000,000'  # $40 million
df.loc[df['title'].str.contains('250. Drishyam', regex=False), 'revenue'] = 'Gross worldwide$1,100,000,000'  # $1.1 billion

In [78]:
# No revenue figure was filled in for Jai Bhim above, so drop that row from the analysis.
# .copy() makes the filtered frame independent, so later column assignments
# do not raise SettingWithCopyWarning.
df = df[~df['title'].str.contains('223. Jai Bhim', regex=False)].copy()

# Verify the updates: no row should still carry the 'Unknown' placeholder
unknown_revenue_rows = df[df['revenue'] == 'Unknown']

# Print the rows
print(unknown_revenue_rows)

Empty DataFrame
Columns: [title, year, rating, genre, directors, revenue, lead_actors]
Index: []
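
Filtering a DataFrame with a boolean mask and then assigning to a column of the result is the pattern that pandas flags with `SettingWithCopyWarning` in versions that do not use copy-on-write. A minimal sketch of the usual remedy, calling `.copy()` on the filtered result, is shown below. The frame and its values are made up.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Klaus", "Jai Bhim", "Drishyam"],
    "revenue": ["Gross worldwide$1", "Unknown", "Gross worldwide$2"],
})

# Take an explicit copy so the filtered frame owns its data.
kept = movies[movies["title"] != "Jai Bhim"].copy()

# Assigning a column on the copy leaves the original frame untouched.
kept["revenue"] = kept["revenue"].str.replace("Gross worldwide$", "", regex=False)

print(kept)
print(movies)
```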


In [79]:
df.dtypes

title           object
year             int64
rating         float64
genre           object
directors       object
revenue         object
lead_actors     object
dtype: object

In [80]:
print(df[['revenue']])

                           revenue
0       Gross worldwide$29,332,133
1      Gross worldwide$250,342,198
2    Gross worldwide$1,009,057,329
3       Gross worldwide$47,964,222
4            Gross worldwide$2,945
..                             ...
245     Gross worldwide$30,819,442
246    Gross worldwide$221,802,186
247     Gross worldwide$20,908,467
248        Gross worldwide$113,328
249  Gross worldwide$1,100,000,000

[249 rows x 1 columns]


In [81]:
# Remove non-numeric characters from the 'revenue' column
df['revenue'] = (
    df['revenue']
    .str.replace('Gross worldwide', '', regex=False)  # Remove the 'Gross worldwide' text
    .str.replace(r'[\$,]', '', regex=True)            # Remove dollar signs and commas
)
df


Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
0,1. The Shawshank Redemption,1994,9.3,"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...",29332133,"Bob Gunton, Tim Robbins, Morgan Freeman"
1,2. The Godfather,1972,9.2,"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...",250342198,"Al Pacino, Marlon Brando, James Caan"
2,3. The Dark Knight,2008,9.0,"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...",1009057329,"Christian Bale, Aaron Eckhart, Heath Ledger"
3,4. The Godfather Part II,1974,9.0,"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...",47964222,"Al Pacino, Robert De Niro, Robert Duvall"
4,5. 12 Angry Men,1957,9.0,"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...",2945,"Henry Fonda, Martin Balsam, Lee J. Cobb"
...,...,...,...,...,...,...,...
245,246. A Silent Voice: The Movie,2016,8.1,"Anime, Coming-of-Age, Psychological Drama, Shō...","Reiko Yoshida, Pete Townshend, Lexi Marman, Mi...",30819442,"Saori Hayami, Miyu Irino, Aoi Yûki"
246,247. The Help,2011,8.1,"Period Drama, Drama","Emma Stone, Octavia Spencer, Johnny Cash, Hill...",221802186,"Octavia Spencer, Emma Stone, Viola Davis"
247,248. Amores Perros,2000,8.0,"Tragedy, Drama, Thriller","Emilio Echevarría, Goya Toledo, Guillermo Arri...",20908467,"Emilio Echevarría, Gael García Bernal, Goya To..."
248,249. Rebecca,1940,8.1,"Dark Romance, Psychological Drama, Psychologic...","Laurence Olivier, The Second Mrs. de Winter, J...",113328,"Laurence Olivier, Joan Fontaine, George Sanders"


In [82]:
# Convert the 'revenue' column to numeric
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce')
print(df[['revenue']])

        revenue
0      29332133
1     250342198
2    1009057329
3      47964222
4          2945
..          ...
245    30819442
246   221802186
247    20908467
248      113328
249  1100000000

[249 rows x 1 columns]




In [83]:
df.dtypes

title           object
year             int64
rating         float64
genre           object
directors       object
revenue          int64
lead_actors     object
dtype: object
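
`errors='coerce'` turns any string that cannot be parsed into `NaN` instead of raising an error, and a column containing `NaN` would come back as a float dtype. The `int64` dtype reported for `revenue` therefore suggests that every value parsed. A small sketch of checking this explicitly, using invented values:

```python
import pandas as pd

clean = pd.Series(["29332133", "2945"])
dirty = pd.Series(["29332133", "Unknown"])

clean_num = pd.to_numeric(clean, errors="coerce")
dirty_num = pd.to_numeric(dirty, errors="coerce")

print(clean_num.dtype)
print(dirty_num.dtype)

# Values that failed to parse: missing after conversion but present before it.
failed = dirty[dirty_num.isna() & dirty.notna()]
print(failed.tolist())
```

Listing the failed values before overwriting the column makes silent data loss visible.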

In [84]:
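# First attempt at removing the rank prefix. In pandas 2.x, str.replace treats the
# pattern as a literal string unless regex=True is passed, so this call changes nothing.
# A pattern matching every digit would also damage titles such as "12 Angry Men".
df['title'] = df['title'].str.replace(r'[0-9]', '')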
df



Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
0,1. The Shawshank Redemption,1994,9.3,"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...",29332133,"Bob Gunton, Tim Robbins, Morgan Freeman"
1,2. The Godfather,1972,9.2,"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...",250342198,"Al Pacino, Marlon Brando, James Caan"
2,3. The Dark Knight,2008,9.0,"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...",1009057329,"Christian Bale, Aaron Eckhart, Heath Ledger"
3,4. The Godfather Part II,1974,9.0,"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...",47964222,"Al Pacino, Robert De Niro, Robert Duvall"
4,5. 12 Angry Men,1957,9.0,"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...",2945,"Henry Fonda, Martin Balsam, Lee J. Cobb"
...,...,...,...,...,...,...,...
245,246. A Silent Voice: The Movie,2016,8.1,"Anime, Coming-of-Age, Psychological Drama, Shō...","Reiko Yoshida, Pete Townshend, Lexi Marman, Mi...",30819442,"Saori Hayami, Miyu Irino, Aoi Yûki"
246,247. The Help,2011,8.1,"Period Drama, Drama","Emma Stone, Octavia Spencer, Johnny Cash, Hill...",221802186,"Octavia Spencer, Emma Stone, Viola Davis"
247,248. Amores Perros,2000,8.0,"Tragedy, Drama, Thriller","Emilio Echevarría, Goya Toledo, Guillermo Arri...",20908467,"Emilio Echevarría, Gael García Bernal, Goya To..."
248,249. Rebecca,1940,8.1,"Dark Romance, Psychological Drama, Psychologic...","Laurence Olivier, The Second Mrs. de Winter, J...",113328,"Laurence Olivier, Joan Fontaine, George Sanders"


In [85]:
# Remove the leading rank number (e.g. "1. ") from the 'title' column
df["title"] = df["title"].str.replace(r"^\d+\.\s*", "", regex=True)
df



Unnamed: 0,title,year,rating,genre,directors,revenue,lead_actors
0,The Shawshank Redemption,1994,9.3,"Epic, Period Drama, Prison Drama, Drama","Bob Gunton, Frank Darabont, Morgan Freeman, Ti...",29332133,"Bob Gunton, Tim Robbins, Morgan Freeman"
1,The Godfather,1972,9.2,"Epic, Gangster, Tragedy, Crime, Drama","Al Pacino, Marlon Brando, Mario Puzo, Peter Cl...",250342198,"Al Pacino, Marlon Brando, James Caan"
2,The Dark Knight,2008,9.0,"Action Epic, Epic, Superhero, Tragedy, Action,...","Salvatore Maroni, Michael Caine, Christian Bal...",1009057329,"Christian Bale, Aaron Eckhart, Heath Ledger"
3,The Godfather Part II,1974,9.0,"Epic, Gangster, Tragedy, Crime, Drama","Livio Giorgi, Al Pacino, Mario Puzo, Francis F...",47964222,"Al Pacino, Robert De Niro, Robert Duvall"
4,12 Angry Men,1957,9.0,"Legal Drama, Psychological Drama, Crime, Drama","Jack Warden, Lee J. Cobb, Sidney Lumet, Regina...",2945,"Henry Fonda, Martin Balsam, Lee J. Cobb"
...,...,...,...,...,...,...,...
245,A Silent Voice: The Movie,2016,8.1,"Anime, Coming-of-Age, Psychological Drama, Shō...","Reiko Yoshida, Pete Townshend, Lexi Marman, Mi...",30819442,"Saori Hayami, Miyu Irino, Aoi Yûki"
246,The Help,2011,8.1,"Period Drama, Drama","Emma Stone, Octavia Spencer, Johnny Cash, Hill...",221802186,"Octavia Spencer, Emma Stone, Viola Davis"
247,Amores Perros,2000,8.0,"Tragedy, Drama, Thriller","Emilio Echevarría, Goya Toledo, Guillermo Arri...",20908467,"Emilio Echevarría, Gael García Bernal, Goya To..."
248,Rebecca,1940,8.1,"Dark Romance, Psychological Drama, Psychologic...","Laurence Olivier, The Second Mrs. de Winter, J...",113328,"Laurence Olivier, Joan Fontaine, George Sanders"
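
The anchored pattern `^\d+\.\s*` matters because some titles contain digits of their own. The comparison below is a small sketch on two titles taken from the table. The names `strip_rank` and `strip_digits` are illustrative.

```python
import pandas as pd

titles = pd.Series(["5. 12 Angry Men", "1. The Shawshank Redemption"])

# Anchored: removes only a leading "<rank>. " prefix.
strip_rank = titles.str.replace(r"^\d+\.\s*", "", regex=True)

# Unanchored: removes every digit, including those that belong to the title.
strip_digits = titles.str.replace(r"[0-9]", "", regex=True)

print(strip_rank.tolist())
print(strip_digits.tolist())
```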


In [86]:
# Save the cleaned data to a new CSV file
df.to_csv('../data/imdb_top_movies_cleaned.csv', index=False)
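
After saving, it can be worth reading the file back to confirm that the numeric columns survive the round trip. This sketch writes a made-up two-row frame to a temporary directory rather than to `../data/`, so the path and the data are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

cleaned = pd.DataFrame({
    "title": ["The Godfather", "12 Angry Men"],
    "year": [1972, 1957],
    "rating": [9.2, 9.0],
    "revenue": [250342198, 2945],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "imdb_top_movies_cleaned.csv"
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the CSV is read back.
    cleaned.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.dtypes)
print(list(reloaded.columns))
```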