# Module 5 Challenge Question: Finding the Top Rated Movie Directors

#### **data source(s):**
- https://pybit.es/codechallenge13.html
- https://nycdatascience.com/blog/student-works/web-scraping/movie-rating-prediction/

Using the IMDB movie data at the URL loaded below, find the 20 highest-rated movie directors, ranked by the average IMDB rating of their movies.

 - Only consider directors with at least 4 movies; averages over fewer movies are unrepresentative.
 - Only include movies released in 1960 or later.
 - Print the top 20 directors, each followed by their movies in descending order of rating.


In [None]:
import pandas as pd
dataUrl = 'https://raw.githubusercontent.com/sundeepblue/movie_rating_prediction/master/movie_metadata.csv'
data = pd.read_csv(dataUrl)

In [None]:
data

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5038,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
5039,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
5040,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
5041,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [None]:
data.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [None]:
columns_needed = ['director_name', 'movie_title', 'title_year', 'imdb_score']
data_needed = data[columns_needed]
data_needed

Unnamed: 0,director_name,movie_title,title_year,imdb_score
0,James Cameron,Avatar,2009.0,7.9
1,Gore Verbinski,Pirates of the Caribbean: At World's End,2007.0,7.1
2,Sam Mendes,Spectre,2015.0,6.8
3,Christopher Nolan,The Dark Knight Rises,2012.0,8.5
4,Doug Walker,Star Wars: Episode VII - The Force Awakens ...,,7.1
...,...,...,...,...
5038,Scott Smith,Signed Sealed Delivered,2013.0,7.7
5039,,The Following,,7.5
5040,Benjamin Roberds,A Plague So Pleasant,2013.0,6.3
5041,Daniel Hsia,Shanghai Calling,2012.0,6.3


In [None]:
movies_made = data_needed['director_name'].value_counts()
movies_made

Steven Spielberg    26
Woody Allen         22
Clint Eastwood      20
Martin Scorsese     20
Ridley Scott        17
                    ..
John Crowley         1
Rob Pritts           1
David S. Ward        1
R.J. Cutler          1
Daniel Hsia          1
Name: director_name, Length: 2398, dtype: int64

In [None]:
directors_more_4 = movies_made[movies_made >= 4]
directors_more_4

Steven Spielberg    26
Woody Allen         22
Clint Eastwood      20
Martin Scorsese     20
Ridley Scott        17
                    ..
Spike Jonze          4
John Milius          4
Ron Shelton          4
Anthony Russo        4
Rick Famuyiwa        4
Name: director_name, Length: 354, dtype: int64

In [None]:
directors_needed = directors_more_4.keys()
directors_needed

Index(['Steven Spielberg', 'Woody Allen', 'Clint Eastwood', 'Martin Scorsese',
       'Ridley Scott', 'Tim Burton', 'Steven Soderbergh', 'Spike Lee',
       'Renny Harlin', 'Oliver Stone',
       ...
       'Steven Brill', 'John Moore', 'Mark L. Lester', 'Phil Lord',
       'Albert Hughes', 'Spike Jonze', 'John Milius', 'Ron Shelton',
       'Anthony Russo', 'Rick Famuyiwa'],
      dtype='object', length=354)

In [None]:
# Keep movies released in 1960 or later whose director has at least 4 movies.
# Note: the 4-movie minimum in directors_needed was counted over all years, before the year filter.
is_recent = data_needed['title_year'] >= 1960
has_enough_movies = data_needed['director_name'].isin(directors_needed)
data_filtered = data_needed.loc[is_recent & has_enough_movies].sort_values(by='imdb_score', ascending=False)
data_filtered

Unnamed: 0,director_name,movie_title,title_year,imdb_score
1937,Frank Darabont,The Shawshank Redemption,1994.0,9.3
3466,Francis Ford Coppola,The Godfather,1972.0,9.2
4409,John Stockwell,Kickboxer: Vengeance,2016.0,9.1
66,Christopher Nolan,The Dark Knight,2008.0,9.0
2837,Francis Ford Coppola,The Godfather: Part II,1974.0,9.0
...,...,...,...,...
2192,Jason Friedberg,Epic Movie,2007.0,2.3
2313,Uwe Boll,Alone in the Dark,2005.0,2.3
2295,Bob Clark,Superbabies: Baby Geniuses 2,2004.0,1.9
2268,Jason Friedberg,Disaster Movie,2008.0,1.9
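
The filter above counts each director's movies across all years and only then drops pre-1960 titles, so a director can pass the 4-movie rule with fewer than 4 movies left in the filtered data. If the intended reading of the challenge is "at least 4 movies from 1960 onward", one possible alternative is to filter by year first and count afterwards. The sketch below illustrates that ordering on a small made-up frame (the directors `Old` and `New` and all values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: "Old" has 4 movies in total but only 1 from 1960 onward.
toy = pd.DataFrame({
    "director_name": ["Old"] * 4 + ["New"] * 4,
    "title_year": [1950, 1955, 1958, 1965, 1990, 1995, 2000, 2005],
    "imdb_score": [8.0, 8.0, 8.0, 9.0, 7.0, 7.0, 7.0, 7.0],
})

# Step 1: apply the year filter first.
recent = toy[toy["title_year"] >= 1960]

# Step 2: count the remaining movies per director, aligned to each row.
movies_per_director = recent.groupby("director_name")["imdb_score"].transform("size")

# Step 3: keep only directors with at least 4 movies after the year filter.
eligible = recent[movies_per_director >= 4]

print(sorted(eligible["director_name"].unique()))
```

With this ordering only `New` survives, whereas counting before the year filter would also keep `Old` with a single rated movie. Which reading is correct depends on how the challenge is interpreted.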


In [None]:
director_score = data_filtered.groupby('director_name')['imdb_score'].mean().sort_values(ascending=False).head(20)
director_score = dict(director_score)
director_score

{'Sergio Leone': 8.475,
 'Christopher Nolan': 8.425,
 'Hayao Miyazaki': 8.225,
 'Quentin Tarantino': 8.2,
 'Stanley Kubrick': 8.0,
 'Frank Darabont': 7.9750000000000005,
 'Richard Brooks': 7.95,
 'David Lean': 7.933333333333334,
 'Joss Whedon': 7.925,
 'James Cameron': 7.914285714285714,
 'Billy Wilder': 7.800000000000001,
 'Alfonso Cuarón': 7.8,
 'Alejandro G. Iñárritu': 7.783333333333334,
 'David Fincher': 7.75,
 'Peter Weir': 7.725,
 'Peter Jackson': 7.675000000000001,
 'Martin Scorsese': 7.659999999999999,
 'Fred Zinnemann': 7.65,
 'Matthew Vaughn': 7.65,
 'David Lynch': 7.640000000000001}
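
Values such as `7.9750000000000005` in the output above are ordinary floating-point artifacts of the mean. As a minimal sketch, assuming the scores only need two decimals for display, the means can be rounded before converting to a dictionary. The frame below is made up for illustration:

```python
import pandas as pd

# Hypothetical toy scores, not taken from the dataset.
scores = pd.DataFrame({
    "director_name": ["A", "A", "A", "B", "B", "C"],
    "imdb_score": [8.1, 8.2, 8.3, 7.0, 7.4, 9.0],
})

top = (
    scores.groupby("director_name")["imdb_score"]
    .mean()                          # average rating per director
    .round(2)                        # trim floating-point noise for display
    .sort_values(ascending=False)    # best directors first
    .head(2)                         # keep the top 2 in this toy example
    .to_dict()                       # insertion order follows the sorted Series
)

print(top)
```

Rounding before ranking can change the order of near-ties, so if exact ranking matters it may be safer to sort on the unrounded means and round only when printing.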

In [None]:
data_movies = data_filtered[['director_name', 'movie_title']]
data_movies = dict(data_movies)
data_movies

{'director_name': 0           James Cameron
 1          Gore Verbinski
 2              Sam Mendes
 3       Christopher Nolan
 6               Sam Raimi
               ...        
 5015    Richard Linklater
 5018        Alex Kendrick
 5021          Jay Duplass
 5035     Robert Rodriguez
 5037         Edward Burns
 Name: director_name, Length: 2163, dtype: object,
 'movie_title': 0                                         Avatar 
 1       Pirates of the Caribbean: At World's End 
 2                                        Spectre 
 3                          The Dark Knight Rises 
 6                                   Spider-Man 3 
                           ...                    
 5015                                     Slacker 
 5018                                    Flywheel 
 5021                             The Puffy Chair 
 5035                                 El Mariachi 
 5037                                   Newlyweds 
 Name: movie_title, Length: 2163, dtype: object}
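
Calling `dict()` on a DataFrame yields one entry per column, as the output above shows, rather than a lookup from director to movies. If a director-to-titles mapping is what is wanted, a grouped aggregation is one way to build it. The sketch below uses made-up rows and assumes, based on the printed titles, that `movie_title` values carry trailing whitespace worth stripping:

```python
import pandas as pd

# Hypothetical toy rows; the trailing "\xa0" mimics the padding assumed in movie_title.
toy = pd.DataFrame({
    "director_name": ["A", "B", "A"],
    "movie_title": ["First\xa0", "Second\xa0", "Third\xa0"],
})

# Remove surrounding whitespace (including non-breaking spaces) from the titles.
titles = toy["movie_title"].str.strip()

# Collect the cleaned titles into one list per director.
titles_by_director = titles.groupby(toy["director_name"]).agg(list).to_dict()

print(titles_by_director)
```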

In [None]:
for director, mean_score in director_score.items():

    # Director and mean score
    print(director, '%.1f' % mean_score)
    print("--------------------------------")

    # Movies by this director; data_filtered is already sorted by rating (descending)
    movies = data_filtered[data_filtered['director_name'] == director]
    for _, movie in movies.iterrows():
        print('%.0f' % movie['title_year'], movie['movie_title'], movie['imdb_score'])

    print('\n')


Sergio Leone 8.5
--------------------------------
1966 The Good, the Bad and the Ugly  8.9
1968 Once Upon a Time in the West  8.6
1984 Once Upon a Time in America  8.4
1964 A Fistful of Dollars  8.0


Christopher Nolan 8.4
--------------------------------
2008 The Dark Knight  9.0
2010 Inception  8.8
2014 Interstellar  8.6
2006 The Prestige  8.5
2000 Memento  8.5
2012 The Dark Knight Rises  8.5
2005 Batman Begins  8.3
2002 Insomnia  7.2


Hayao Miyazaki 8.2
--------------------------------
2001 Spirited Away  8.6
1997 Princess Mononoke  8.4
2004 Howl's Moving Castle  8.2
2008 Ponyo  7.7


Quentin Tarantino 8.2
--------------------------------
1994 Pulp Fiction  8.9
2012 Django Unchained  8.5
1992 Reservoir Dogs  8.4
2009 Inglourious Basterds  8.3
2003 Kill Bill: Vol. 1  8.1
2004 Kill Bill: Vol. 2  8.0
2015 The Hateful Eight  7.9
1997 Jackie Brown  7.5


Stanley Kubrick 8.0
--------------------------------
1964 Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb  8.5
19