<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Python Review With Movie Data

_Authors: Kiefer Katovich and Dave Yerrington (San Francisco)_

---

In this lab, you'll be using the [IMDb](http://www.imdb.com/) `movies` list below as your data set. 

This lab is designed to help you practice iteration and functions. The standard questions are gentler; the challenge questions are intended for advanced Python students or those with prior programming experience.

Every question requires writing a function that uses iteration. Print a test call for each function you write.


### 1) Load the provided list of `movies` dictionaries.

In [1]:
# List of movies dictionaries:

movies = [
{
"name": "Usual Suspects", 
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
},
{
"name": "Joking muck",
"imdb": 7.2,
"category": "Comedy"
},
{
"name": "What is the name",
"imdb": 9.2,
"category": "Suspense"
},
{
"name": "Detective",
"imdb": 7.0,
"category": "Suspense"
},
{
"name": "Exam",
"imdb": 4.2,
"category": "Thriller"
},
{
"name": "We Two",
"imdb": 7.2,
"category": "Romance"
}
]
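
Each entry in `movies` is a dictionary, so a score or category is read by key while looping over the list. The sketch below uses a three-movie excerpt of the list; `sample_movies` and `categories` are names introduced here for illustration only.

```python
# Excerpt of the movies list, repeated so this block runs on its own.
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]

# Index into the list to get one dictionary, then read values by key.
first = sample_movies[0]
print(first["name"], first["imdb"])  # Usual Suspects 7.0

# Iterate over the list to collect each category once, in order of appearance.
categories = []
for movie in sample_movies:
    if movie["category"] not in categories:
        categories.append(movie["category"])

print(categories)  # ['Thriller', 'Action']
```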

---

### 2) Filtering data by IMDb score.

#### 2.1)

Write a function that:

1) Accepts a single movie dictionary from the `movies` list as an argument.
2) Returns `True` if the IMDb score is greater than 5.5.

#### 2.2) [Challenge]

Write a function that:

1) Accepts the `movies` list and a specified category.
2) Returns `True` if the average score of the category is higher than the average score of all movies.

In [2]:
# 2.1:

def imdb_score_over_bad(movie):
    # The comparison itself evaluates to True or False:
    return movie['imdb'] > 5.5

print(movies[0])
print(imdb_score_over_bad(movies[0]))

{'name': 'Usual Suspects', 'imdb': 7.0, 'category': 'Thriller'}
True
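
To apply the 2.1 check to more than one movie, the same function can be called inside a loop. This is a minimal sketch on a three-movie excerpt of the list; `sample_movies` and `passing_names` are illustrative names, and the function is redefined so the block runs on its own.

```python
# Excerpt of the movies list, repeated so this block is self-contained.
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
]

def imdb_score_over_bad(movie):
    # True only when the score is strictly greater than 5.5.
    return movie["imdb"] > 5.5

# Collect the names of the movies that pass the check.
passing_names = []
for movie in sample_movies:
    if imdb_score_over_bad(movie):
        passing_names.append(movie["name"])

print(passing_names)  # ['Usual Suspects']
```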


In [3]:
# 2.2:

def movies_category_over_avg(movies, category):
    all_scores = []
    category_scores = []

    for movie in movies:
        # Collect every IMDb score:
        all_scores.append(movie['imdb'])
        # Collect the IMDb scores of movies that match the category argument:
        if movie['category'] == category:
            category_scores.append(movie['imdb'])

    # Calculate the mean of the whole data set manually:
    overall_average = sum(all_scores) / len(all_scores)

    # Guard against categories with no movies (this also avoids dividing by zero):
    if len(category_scores) == 0:
        print('no movies in specified category:', category)
        return False

    # Valid category, so calculate its mean:
    category_average = sum(category_scores) / len(category_scores)

    # Compare category and overall means:
    return category_average > overall_average

print(movies_category_over_avg(movies, 'Thriller'))
print(movies_category_over_avg(movies, 'Suspense'))

False
True
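
As an alternative to computing the means by hand, the standard library's `statistics.mean` can be used. The sketch below is one possible variant, not the lab's reference solution; `category_over_avg` and `sample_movies` are names introduced here, and the data is a four-movie excerpt of the list.

```python
from statistics import mean

# Excerpt of the movies list, repeated so this block is self-contained.
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]

def category_over_avg(movies, category):
    # Build both score lists with list comprehensions.
    all_scores = [m["imdb"] for m in movies]
    category_scores = [m["imdb"] for m in movies if m["category"] == category]
    # mean() raises an error on an empty list, so check for that first.
    if not category_scores:
        return False
    return mean(category_scores) > mean(all_scores)

print(category_over_avg(sample_movies, "Thriller"))  # False
print(category_over_avg(sample_movies, "Suspense"))  # True
print(category_over_avg(sample_movies, "Western"))   # False (no such category)
```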


---

### 3) Creating subsets by numeric condition.

#### 3.1)

Write a function that:

1) Accepts the list of movies and a specified IMDb score.
2) Returns the sublist of movies that have scores greater than the one specified.

#### 3.2) [Expert]

Write a function that:

1) Accepts the `movies` list as an argument.
2) Returns the `movies` list sorted first by category average score and then, within each category, by individual IMDb score (both from highest to lowest).

In [4]:
# 3.1:

def score_greater_subset(movies, score):
    subset = []
    for movie in movies:
        if movie['imdb'] > score:
            subset.append(movie)
    return subset

print(score_greater_subset(movies, 8.5))

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'}, {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'}]
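
The loop-and-append pattern in 3.1 can also be written as a list comprehension. This is a sketch of the same filter on a three-movie excerpt; `sample_movies` and `top_names` are illustrative names, and the function is redefined so the block runs on its own.

```python
# Excerpt of the movies list, repeated so this block is self-contained.
sample_movies = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
]

def score_greater_subset(movies, score):
    # Keep each movie whose score is strictly greater than the threshold.
    return [movie for movie in movies if movie["imdb"] > score]

top_names = [movie["name"] for movie in score_greater_subset(sample_movies, 8.5)]
print(top_names)  # ['Dark Knight', 'What is the name']
```

The comprehension preserves the original order of the list, just as the explicit loop does.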


In [5]:
# 3.2:
# See these Stack Overflow questions and answers for more examples and explanations of sorting with a lambda:
# http://stackoverflow.com/questions/3766633/how-to-sort-with-lambda-in-python
# http://stackoverflow.com/questions/14299448/sorting-by-multiple-conditions-in-python

def category_score_sorted(movies):
    category_scores = {}
    for movie in movies:
        # If the category key does not exist in the category_scores dict:
        if movie['category'] not in category_scores:
            # Add the category key with its first value being the IMDb score:
            category_scores[movie['category']] = [movie['imdb']]
        else:
            # Otherwise, append the score to the existing category values list:
            category_scores[movie['category']].append(movie['imdb'])
    
    # Use the category-to-scores dict to build a new dict in which the values are the means:
    category_averages = {}
    for cat, vals in category_scores.items():
        category_averages[cat] = sum(vals)/len(vals)
    
    
    # The "key" argument of sorted() is a function that returns the value to sort by.
    # Lambda functions are small, anonymous, single-expression functions;
    # here "x" refers to each individual entry in the movies list.
    # The key is a tuple, so movies are ordered by category average first
    # and by individual IMDb score second.
    # reverse=True because we want high to low instead of low to high.
    movies_sorted = sorted(movies, key=lambda x: (category_averages[x['category']],
                                                  x['imdb']), reverse=True)
    
    return movies_sorted

category_score_sorted(movies)

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'},
 {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'},
 {'name': 'Detective', 'imdb': 7.0, 'category': 'Suspense'},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
 {'name': 'Joking muck', 'imdb': 7.2, 'category': 'Comedy'},
 {'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
 {'name': 'We Two', 'imdb': 7.2, 'category': 'Romance'},
 {'name': 'The Choice', 'imdb': 6.2, 'category': 'Romance'},
 {'name': 'Love', 'imdb': 6.0, 'category': 'Romance'},
 {'name': 'Bride Wars', 'imdb': 5.4, 'category': 'Romance'},
 {'name': 'Hitman', 'imdb': 6.3, 'category': 'Action'},
 {'name': 'Usual Suspects', 'imdb': 7.0, 'category': 'Thriller'},
 {'name': 'Exam', 'imdb': 4.2, 'category': 'Thriller'},
 {'name': 'Ringing Crime', 'imdb': 4.0, 'category': 'Crime'},
 {'name': 'AlphaJet', 'imdb': 3.2, 'category': 'War'}]
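
Note that "Dark Knight" (9.0) is listed before "What is the name" (9.2) because the category average is compared first. The sketch below illustrates this on a four-movie excerpt, grouping scores with `collections.defaultdict` instead of the `if`/`else` check; `sample_movies`, `scores_by_category`, and `ranked_names` are names introduced here for illustration.

```python
from collections import defaultdict

# Excerpt of the movies list, deliberately out of order.
sample_movies = [
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
]

# defaultdict(list) creates an empty list the first time a category is seen.
scores_by_category = defaultdict(list)
for movie in sample_movies:
    scores_by_category[movie["category"]].append(movie["imdb"])

# Mean score per category.
category_averages = {
    category: sum(scores) / len(scores)
    for category, scores in scores_by_category.items()
}

# Tuples compare element by element: category average first, then movie score.
ranked = sorted(
    sample_movies,
    key=lambda m: (category_averages[m["category"]], m["imdb"]),
    reverse=True,
)
ranked_names = [m["name"] for m in ranked]
print(ranked_names)
# ['Dark Knight', 'What is the name', 'Detective', 'The Help']
```

Here the Adventure average (9.0) beats the Suspense average (about 8.1), so the single Adventure movie outranks both Suspense movies even though one of them has a higher individual score.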