# Exploring Movie Data with the collections Datatypes

This project, part of #100daysofcode, explores datatypes beyond the standard built-in ones such as strings, lists, and dictionaries.

As a data scientist I'm most comfortable with Pandas, but I took on the challenge of using the csv module anyway. It was really fun and, to be honest, very challenging.

### First, I import the modules I am going to need.

In [1]:
import csv
from collections import defaultdict, namedtuple
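To show what these two imports bring to the table, here is a minimal sketch with made-up data. The `Film` type and the director names are hypothetical and are not part of the challenge. A `defaultdict(list)` creates an empty list the first time a key is touched, and a `namedtuple` lets fields be read by name instead of by index.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type, for illustration only
Film = namedtuple('Film', 'title year score')

# Missing keys start out as an empty list, so no "if key in dict" check is needed
films_by_director = defaultdict(list)
films_by_director['Director A'].append(Film('First Film', 2001, 7.5))
films_by_director['Director A'].append(Film('Second Film', 2005, 8.0))
films_by_director['Director B'].append(Film('Only Film', 2010, 6.9))

first = films_by_director['Director A'][0]
print(first.title, first.score)               # fields are read by name
print(len(films_by_director['Director A']))   # number of films stored for Director A
```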


All the variables I will use in the functions below are initialized at this point.

In [2]:
MOVIE_CSV = 'movie_metadata.csv'
NUM_TOP_DIRECTORS = 20
MIN_MOVIES = 4
MIN_YEAR = 1960
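`MIN_YEAR` is declared here, but the functions in this notebook do not seem to reference it. If a year cutoff is wanted, one possible filter is sketched below. The helper name `is_recent_enough` and the sample rows are my own assumptions, and so is the idea that some rows may have a blank year.

```python
MIN_YEAR = 1960  # same value as the constant above

# Hypothetical (title, year) pairs; years are strings, as the csv module delivers them
sample_rows = [
    ('Old Film', '1955'),
    ('Newer Film', '1975'),
    ('Undated Film', ''),
]


def is_recent_enough(year_text, min_year=MIN_YEAR):
    """Return True when year_text is a year >= min_year; blank years are rejected."""
    return year_text.isdigit() and int(year_text) >= min_year


kept = [title for title, year in sample_rows if is_recent_enough(year)]
print(kept)
```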

In [15]:
def get_movies_by_director():
    '''Extracts all movies from the csv file and stores them in a dictionary
    where keys are directors and values are lists of movies (named tuples)'''
    directors = defaultdict(list)
    Movie = namedtuple('Movie', ('movie_title', 'title_year', 'imdb_score'))
    # newline='' is what the csv module docs recommend when opening files
    with open(MOVIE_CSV, encoding='utf-8', newline='') as f:
        movies = csv.DictReader(f)
        for movie in movies:
            directors[movie['director_name']].append(
                Movie(movie['movie_title'], movie['title_year'], movie['imdb_score']))
     
    return directors
     
     
def get_average_scores(directors):
    '''Keep directors with at least MIN_MOVIES movies and calculate their average score'''
    nominated_directors = {}
    for director, movies in directors.items():
        if len(movies) >= MIN_MOVIES:
            nominated_directors[(director, _calc_mean(movies))] = movies
     
    return nominated_directors
     
     
     
def _calc_mean(movies):
    '''Helper function to calculate the mean score of a list of Movie namedtuples'''
    # Scores come out of the csv as strings, so convert before averaging
    scores = [float(movie.imdb_score) for movie in movies]

    return round(sum(scores) / len(scores), 1)
     
     
def print_results(directors):
    '''Print directors ordered by highest average rating. For each director,
    print their movies in the order they appear in the CSV.
    See http://pybit.es/codechallenge13.html for example output'''
    fmt_director_entry = '{counter}. {director:<52} {avg}'
    fmt_movie_entry = '{year}] {title:<50} {score}'
    sep_line = '-' * 60
    # Keys are (director, average) tuples, so sort on the average
    report = sorted(directors.items(), key=lambda x: x[0][1], reverse=True)
    # Slicing avoids an IndexError when fewer than NUM_TOP_DIRECTORS qualify
    for i, ((director, avg), movies) in enumerate(report[:NUM_TOP_DIRECTORS], 1):
        print(fmt_director_entry.format(counter=i, director=director, avg=avg))
        print(sep_line)
        for movie in movies:
            print(fmt_movie_entry.format(year=movie.title_year, title=movie.movie_title, score=movie.imdb_score))
        print()
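To try the grouping and averaging steps without the real file, the same idea can be run against a small in-memory CSV. This is only a sketch: the sample rows are made up, and `group_by_director` and `mean_score` are hypothetical stand-ins that mirror the functions above while taking a file object instead of reading `MOVIE_CSV`.

```python
import csv
import io
from collections import defaultdict, namedtuple

# Made-up sample data; only the column names match the ones used above
SAMPLE_CSV = """director_name,movie_title,title_year,imdb_score
Director A,First Film,2001,7.5
Director A,Second Film,2005,8.5
Director B,Only Film,2010,6.0
"""

Movie = namedtuple('Movie', 'movie_title title_year imdb_score')


def group_by_director(csv_file):
    """Group rows from an open CSV file object by director name."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row['director_name']].append(
            Movie(row['movie_title'], row['title_year'], row['imdb_score'])
        )
    return grouped


def mean_score(movies):
    """Average the imdb_score strings of a list of Movie tuples, to one decimal."""
    scores = [float(m.imdb_score) for m in movies]
    return round(sum(scores) / len(scores), 1)


grouped = group_by_director(io.StringIO(SAMPLE_CSV))
averages = {name: mean_score(movies) for name, movies in grouped.items()}
print(averages)
```

Passing a file object rather than a path is a design choice for this sketch only. It makes the function easy to exercise with `io.StringIO` in a notebook cell.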
     
     


## The main function
Below is where I run the main function, which calls all the others to do the work. This is a coding convention I learnt from Talk Python FM. It's normally used in .py files, but I decided to use it in a notebook as well.

In [16]:
def main():
    '''Collect the movies per director, compute the average scores
    and print the report.'''
    directors = get_movies_by_director()
    directors = get_average_scores(directors)
    print_results(directors)
     
     
if __name__ == '__main__':
    main()

1. Sergio Leone                                         8.5
------------------------------------------------------------
1984] Once Upon a Time in America                        8.4
1968] Once Upon a Time in the West                       8.6
1966] The Good, the Bad and the Ugly                     8.9
1964] A Fistful of Dollars                               8.0

2. Christopher Nolan                                    8.4
------------------------------------------------------------
2012] The Dark Knight Rises                              8.5
2008] The Dark Knight                                    9.0
2014] Interstellar                                       8.6
2010] Inception                                          8.8
2005] Batman Begins                                      8.3
2002] Insomnia                                           7.2
2006] The Prestige                                       8.5
2000] Memento                                            8.5

3. Quentin Tarantino    
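In the output above, each director's movies appear in file order rather than by score. If a score ordering is wanted, one option is to sort every movie list just before printing. The sketch below uses hypothetical report entries with the same `((director, average), movies)` shape; `TOP_N` stands in for `NUM_TOP_DIRECTORS`, and the simplified line format is my own.

```python
from collections import namedtuple

Movie = namedtuple('Movie', 'movie_title title_year imdb_score')

# Hypothetical report entries: ((director, average), [Movie, ...])
report = [
    (('Director B', 6.0), [Movie('Only Film', '2010', '6.0')]),
    (('Director A', 8.0), [Movie('First Film', '2001', '7.5'),
                           Movie('Second Film', '2005', '8.5')]),
]

TOP_N = 20  # deliberately larger than the sample, like NUM_TOP_DIRECTORS

# Directors first, ordered by their average score
ranked = sorted(report, key=lambda entry: entry[0][1], reverse=True)

lines = []
# Slicing is safe even when fewer than TOP_N entries exist
for position, ((director, avg), movies) in enumerate(ranked[:TOP_N], start=1):
    lines.append(f'{position}. {director} {avg}')
    # Scores are strings, so convert to float for a numeric sort
    by_score = sorted(movies, key=lambda m: float(m.imdb_score), reverse=True)
    for movie in by_score:
        lines.append(f'{movie.title_year}] {movie.movie_title} {movie.imdb_score}')

print('\n'.join(lines))
```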

## Conclusion
Above we see the top movie directors ordered by their average movie rating.