## Day 5 puts what we previously learned into practice...and in a fun way!

---
We will use [pybites challenge #13](https://codechalleng.es/challenges/13/) for today.

We will use defaultdict, namedtuple and Counter from the [collections](https://docs.python.org/3/library/collections.html) module.



In [1]:
from collections import defaultdict, namedtuple, Counter
import csv
import random
from urllib.request import urlretrieve

---
Download our csv data and save it locally as movies.csv.

In [2]:
movie_data = 'https://raw.githubusercontent.com/pybites/challenges/solutions/13/movie_metadata.csv'
movies_csv = 'movies.csv'
urlretrieve(movie_data, movies_csv)

('movies.csv', <http.client.HTTPMessage at 0x10579a160>)

---
By using a namedtuple we can access each field by name (movie.title, movie.score) instead of by index, which is much more readable.

In [3]:
Movie = namedtuple('Movie', 'title year score')
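A quick illustration of what the namedtuple gives us. The Memento values come from the Nolan output further down; the score of 9.0 passed to `_replace` is purely hypothetical:

```python
from collections import namedtuple

# Same definition as in the notebook: field names as a space-separated string
Movie = namedtuple('Movie', 'title year score')

m = Movie(title='Memento', year=2000, score=8.5)

# Fields can be read by name...
print(m.title, m.score)   # Memento 8.5
# ...or by position, because a namedtuple is still a tuple
print(m[0], m[2])         # Memento 8.5

# namedtuples are immutable; _replace returns a NEW instance with a field changed
updated = m._replace(score=9.0)   # hypothetical score, for illustration only
print(updated)            # Movie(title='Memento', year=2000, score=9.0)
```

Because instances are immutable, the original `m` keeps its score of 8.5.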

---
We will use DictReader from the [csv module](https://docs.python.org/3/library/csv.html) to read our csv data, store each row in our Movie namedtuple and append it to a defaultdict keyed by director.

csv.DictReader works slightly differently from the standard csv.reader. From the documentation:
>Create an object which operates like a regular reader but maps the information read into a dict whose keys are given by the optional fieldnames parameter. The fieldnames parameter is a sequence whose elements are associated with the fields of the input data in order. These elements become the keys of the resulting dictionary. If the fieldnames parameter is omitted, the values in the first row of the file f will be used as the fieldnames. If the row read has more fields than the fieldnames sequence, the remaining data is added as a sequence keyed by the value of restkey. If the row read has fewer fields than the fieldnames sequence, the remaining keys take the value of the optional restval parameter. Any other optional or keyword arguments are passed to the underlying reader instance.
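To see the difference in practice, here is a small sketch that reads the same data with both readers. It uses an in-memory `io.StringIO` with made-up rows as a stand-in for movie_metadata.csv; only the column names match the ones used in the notebook:

```python
import csv
import io

# Hypothetical in-memory stand-in for movie_metadata.csv
sample = io.StringIO(
    "director_name,movie_title,title_year,imdb_score\n"
    "Christopher Nolan,Memento,2000,8.5\n"
    "Clint Eastwood,Unforgiven,1992,8.3\n"
)

# csv.reader yields every row, header included, as a list of strings
rows = list(csv.reader(sample))
print(rows[0])   # ['director_name', 'movie_title', 'title_year', 'imdb_score']
print(rows[1])   # ['Christopher Nolan', 'Memento', '2000', '8.5']

sample.seek(0)   # rewind so the same data can be read a second time

# csv.DictReader uses the first row as fieldnames and yields one dict per data row
records = list(csv.DictReader(sample))
print(records[0]['movie_title'], records[0]['imdb_score'])   # Memento 8.5
```

Note that every value is still a string (`'8.5'`, not `8.5`), which is why the function below converts the year and score with int() and float().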

In [4]:
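def get_movies_by_director(data=movies_csv):
    """Extract all movies from the csv and store them in a dictionary
       where keys are directors and values are lists of movies (named tuples)"""
    directors = defaultdict(list)
    # newline='' is what the csv docs recommend when opening files for csv
    with open(data, encoding='utf-8', newline='') as f:
        for line in csv.DictReader(f):
            try:
                director = line['director_name']
                # strip the non-breaking spaces (\xa0) found in the titles
                movie = line['movie_title'].replace('\xa0', '')
                # int('') / float('') raise ValueError when a field is empty
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                # skip rows whose year or score can't be converted
                continue

            m = Movie(title=movie, year=year, score=score)
            # no need to create the list first: defaultdict(list) does it for us
            directors[director].append(m)

    return directors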
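Why defaultdict(list)? This sketch, using a few hypothetical (director, title) pairs, compares the grouping pattern with a plain dict against the same pattern with a defaultdict:

```python
from collections import defaultdict

# Hypothetical (director, title) pairs, just to show the grouping pattern
pairs = [
    ('Christopher Nolan', 'Memento'),
    ('Clint Eastwood', 'Unforgiven'),
    ('Christopher Nolan', 'Inception'),
]

# With a plain dict we have to create the list the first time we see a key
plain = {}
for director, title in pairs:
    if director not in plain:
        plain[director] = []
    plain[director].append(title)

# defaultdict(list) calls list() to create an empty list for any missing key,
# so we can append straight away
grouped = defaultdict(list)
for director, title in pairs:
    grouped[director].append(title)

print(dict(grouped))
# {'Christopher Nolan': ['Memento', 'Inception'], 'Clint Eastwood': ['Unforgiven']}
```

Both produce the same grouping; the defaultdict version simply drops the "does this key exist yet?" check.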

In [5]:
directors = get_movies_by_director()

---

Now we can look up a director and see the movies they directed.

In [6]:
directors['Christopher Nolan']

[Movie(title='The Dark Knight Rises', year=2012, score=8.5),
 Movie(title='The Dark Knight', year=2008, score=9.0),
 Movie(title='Interstellar', year=2014, score=8.6),
 Movie(title='Inception', year=2010, score=8.8),
 Movie(title='Batman Begins', year=2005, score=8.3),
 Movie(title='Insomnia', year=2002, score=7.2),
 Movie(title='The Prestige', year=2006, score=8.5),
 Movie(title='Memento', year=2000, score=8.5)]
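One thing to watch when looking up directors: since directors is a defaultdict, asking for a name that isn't in the data does not raise a KeyError. Instead it silently adds that name with an empty list. A small sketch (the director names here are hypothetical):

```python
from collections import defaultdict

lookup = defaultdict(list)
lookup['Christopher Nolan'].append('Memento')

# Indexing a missing key with [] inserts it with an empty list...
print(lookup['Some Unknown Director'])   # []
print(len(lookup))                       # 2: the lookup itself added a key

# ...whereas .get() and the `in` operator leave the dictionary unchanged
print(lookup.get('Another Unknown'))     # None
print('Another Unknown' in lookup)       # False
print(len(lookup))                       # 2
```

In this notebook that means a mistyped name would end up in directors with zero movies (and therefore also in the Counter below with a count of 0), so .get() or `in` is the safer choice for exploratory lookups.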

---
And, out of curiosity, Clint Eastwood, just because.

In [10]:
directors['Clint Eastwood']

[Movie(title='Space Cowboys', year=2000, score=6.4),
 Movie(title='Invictus', year=2009, score=7.4),
 Movie(title='American Sniper', year=2014, score=7.3),
 Movie(title='Changeling', year=2008, score=7.8),
 Movie(title='Flags of Our Fathers', year=2006, score=7.1),
 Movie(title='Absolute Power', year=1997, score=6.7),
 Movie(title='Hereafter', year=2010, score=6.5),
 Movie(title='Blood Work', year=2002, score=6.4),
 Movie(title='Jersey Boys', year=2014, score=6.9),
 Movie(title='J. Edgar', year=2011, score=6.6),
 Movie(title='Midnight in the Garden of Good and Evil', year=1997, score=6.6),
 Movie(title='Mystic River', year=2003, score=8.0),
 Movie(title='Million Dollar Baby', year=2004, score=8.1),
 Movie(title='Gran Torino', year=2008, score=8.2),
 Movie(title='The Bridges of Madison County', year=1995, score=7.5),
 Movie(title='Firefox', year=1982, score=5.9),
 Movie(title='Unforgiven', year=1992, score=8.3),
 Movie(title='Letters from Iwo Jima', year=2006, score=7.9),
 Movie(title='

---
Now let's use Counter to see which directors have the most movies!

In [9]:
cnt = Counter()
for director, movies in directors.items():
    cnt[director] += len(movies)
    
cnt.most_common(5)

[('Steven Spielberg', 26),
 ('Woody Allen', 22),
 ('Martin Scorsese', 20),
 ('Clint Eastwood', 20),
 ('Ridley Scott', 17)]
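Counter can also be built in one step: it accepts a mapping of counts directly, or it can count the items of any iterable. This sketch uses a hypothetical three-movie version of the directors mapping (the titles, years and scores are taken from the outputs above):

```python
from collections import Counter, namedtuple

Movie = namedtuple('Movie', 'title year score')

# Hypothetical miniature version of the directors mapping
directors = {
    'Christopher Nolan': [Movie('Memento', 2000, 8.5), Movie('Inception', 2010, 8.8)],
    'Clint Eastwood': [Movie('Unforgiven', 1992, 8.3)],
}

# Counter accepts a mapping of counts, so the loop can become a dict comprehension
cnt = Counter({director: len(movies) for director, movies in directors.items()})
print(cnt.most_common(1))   # [('Christopher Nolan', 2)]

# Counter also tallies items from any iterable, e.g. how many movies per year
per_year = Counter(m.year for movies in directors.values() for m in movies)
print(per_year)
```

The loop version in the cell above works just as well; the comprehension is mainly a more compact way of saying the same thing.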