# Iteration practice with movies

In this lab you'll use the IMDb `movies` list provided below as your dataset.

This lab is designed to practice iteration. The regular questions are gentler, while the challenge questions are aimed at students with advanced Python or general programming experience.

Every question requires writing a function that uses iteration. Print a test of each function you write.

---

### 1. Load the provided list of movie dictionaries.

In [1]:
# List of movie dictionaries:

movies = [
{
"name": "Usual Suspects", 
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
},
{
"name": "Joking muck",
"imdb": 7.2,
"category": "Comedy"
},
{
"name": "What is the name",
"imdb": 9.2,
"category": "Suspense"
},
{
"name": "Detective",
"imdb": 7.0,
"category": "Suspense"
},
{
"name": "Exam",
"imdb": 4.2,
"category": "Thriller"
},
{
"name": "We Two",
"imdb": 7.2,
"category": "Romance"
}
]

---

### 2. Filtering data by IMDB score

#### 2.1 

Write a function that:

1. Accepts a single movie dictionary from the `movies` list as an argument.
2. Returns True if the IMDB score is above 5.5.

#### 2.2 [Challenge] 

Write a function that:

1. Accepts the movies list and a specified category.
2. Returns True if the average score of the category is higher than the average score of all movies.

In [2]:
# Your code here.
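def filter_imdb(movie, score=5.5):
    # True if the movie's score is strictly above the threshold.
    return movie['imdb'] > score

def filter_cat(movie, category):
    # True if the movie belongs to exactly this category (case-sensitive).
    return movie['category'] == category

def category_average(movies, category):
    # Average score of a category, or None if no movie belongs to it.
    cat_scores = [movie['imdb'] for movie in movies if filter_cat(movie, category)]
    if not cat_scores:
        return None
    return sum(cat_scores) / len(cat_scores)

def good_category(movies, category):
    total_avg = sum(movie['imdb'] for movie in movies) / len(movies)
    cat_avg = category_average(movies, category)
    # An unknown category has no average, so it cannot beat the overall one.
    return cat_avg is not None and cat_avg > total_avg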
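When averages are needed for several categories, rescanning the whole list for each one becomes repetitive. The sketch below shows one possible single-pass alternative; it is not the lab's reference solution, and the helper name `category_averages` and the three-movie `sample` list are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical three-movie sample in the same shape as the lab's `movies` list.
sample = [
    {"name": "A", "imdb": 8.0, "category": "Drama"},
    {"name": "B", "imdb": 6.0, "category": "Romance"},
    {"name": "C", "imdb": 4.0, "category": "Romance"},
]

def category_averages(movies):
    """Return a dict mapping each category to its average score (one pass)."""
    totals = defaultdict(float)   # running sum of scores per category
    counts = defaultdict(int)     # number of movies per category
    for movie in movies:
        totals[movie["category"]] += movie["imdb"]
        counts[movie["category"]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

averages = category_averages(sample)
overall = sum(m["imdb"] for m in sample) / len(sample)
print(averages)                      # {'Drama': 8.0, 'Romance': 5.0}
print(averages["Drama"] > overall)   # True, since 8.0 > 6.0
```

Looking up a category in the returned dictionary then replaces a full scan of the list.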

---

### 3. Creating subsets by numeric condition

#### 3.1

Write a function that:

1. Accepts the list of movies and a specified imdb score.
2. Returns the sublist of movies that have a score greater than the specified score.

#### 3.2 [Expert] 

Write a function that:

1. Accepts the movies list as an argument.
2. Returns the movies list sorted first by the average score of each movie's category and then, within a category, by the movie's individual score.

In [3]:
# Your code here.
def good_movies(movies, score):
    return [movie for movie in movies if filter_imdb(movie, score)]


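def sorted_cat(movies):
    # Compute each category's average once, rather than once per movie.
    cat_avgs = {}
    for movie in movies:
        category = movie['category']
        if category not in cat_avgs:
            cat_avgs[category] = category_average(movies, category)

    # sorted() returns a new list, so the input dictionaries are not modified.
    # Highest category average first; within a category, highest score first.
    # The category name breaks ties between categories with equal averages.
    return sorted(movies,
                  key=lambda m: (cat_avgs[m['category']], m['category'], m['imdb']),
                  reverse=True)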
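The sort in `sorted_cat` relies on Python comparing tuples element by element: the first element decides the order, and later elements only break ties. A minimal sketch of that behaviour, using made-up `(category average, score, name)` records rather than the lab data:

```python
# Hypothetical (category average, individual score, name) records.
records = [
    (6.5, 7.0, "B"),
    (8.0, 7.5, "C"),
    (6.5, 6.0, "A"),
    (8.0, 9.0, "D"),
]

# The key is a tuple, so records are ordered by category average first
# and by individual score second; reverse=True puts the highest first.
ordered = sorted(records, key=lambda r: (r[0], r[1]), reverse=True)
names = [r[2] for r in ordered]
print(names)  # ['D', 'C', 'B', 'A']
```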

---

### 4. Creating subsets by string condition

#### 4.1

Write a function that:

1. Accepts the movies list and a category name.
2. Returns the movie names within that category (case-insensitive!)
3. If the category is not in the data, print a message that it does not exist and return None.

Recall that to convert a string to lowercase, you can use:

```python
mystring = 'Dumb and Dumber'
lowercase_mystring = mystring.lower()
print(lowercase_mystring)
# dumb and dumber
```
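A case-insensitive comparison then amounts to lowercasing both sides before testing equality or containment. The variable names below are illustrative assumptions, not part of the lab:

```python
stored = "Romance"   # category as it might appear in the data
query = "ROMANCE"    # what a user might type

# Lowercase both strings so that differences in case are ignored.
same_category = stored.lower() == query.lower()
title_hit = "sus" in "Usual Suspects".lower()

print(same_category)  # True
print(title_hit)      # True
```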

#### 4.2 [Challenge]

Write a function that:

1. Accepts the movies list and a "search string".
2. Returns a dictionary with keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!)

In [4]:
# Your code here.
def get_cat(movies, category):
    # Compare lowercased strings so that the match is case-insensitive.
    category = category.lower()
    filtered_movies = [movie['name'] for movie in movies
                       if movie['category'].lower() == category]
    if filtered_movies:
        return filtered_movies
    else:
        print("This category doesn't exist")
        return None

In [5]:
def search_cat_title(movies, substr):
    substr = substr.lower()
    # Collect matches in sets so that a repeated category is only kept once.
    categories, titles = set(), set()
    
    for movie in movies:
        if substr in movie['name'].lower():
            titles.add(movie['name'])
        
        if substr in movie['category'].lower():
            categories.add(movie['category'])
    
    # The exercise asks for lists, so convert the sets (sorted for a stable order).
    return {'category': sorted(categories), 'title': sorted(titles)}

In [6]:
search_cat_title(movies, 'SUS')

{'category': ['Suspense'], 'title': ['Usual Suspects']}

---

### 5. Multiple conditions

#### 5.1

Write a function that:

1. Accepts the movies list and a "search criteria" variable.
2. If the criteria variable is numeric, return a list of movie titles with a score greater than or equal to the criteria.
3. If the criteria variable is a string, return a list of movie titles that match that category (case-insensitive!). If there is no match, return an empty list and print an informative message.
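One way to tell a numeric criteria apart from a string is `isinstance`, which also accepts a tuple of types. The sketch below is only an illustration of that check; the function name `kind_of_criteria` is hypothetical, and the separate `bool` test is an added precaution because `bool` is a subclass of `int` in Python.

```python
def kind_of_criteria(criteria):
    """Classify a search criteria as 'numeric', 'string', or 'unsupported'."""
    # bool is a subclass of int, so rule it out before the numeric check.
    if isinstance(criteria, bool):
        return "unsupported"
    if isinstance(criteria, (int, float)):
        return "numeric"
    if isinstance(criteria, str):
        return "string"
    return "unsupported"

print(kind_of_criteria(6.9))          # numeric
print(kind_of_criteria("suspense"))   # string
print(kind_of_criteria({"name": 1}))  # unsupported
```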

#### 5.2 [Expert]

Write a function that:

1. Accepts the movies list and a string search criteria variable.
2. The search criteria variable can contain within it:
  - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (these may also be lowercase; they are capitalized here for clarity).
  - Search criteria specified with syntax `score=...`, `category=...`, and/or `title=...`, where the `...` indicates what to look for.
    - If `score` is present, it means scores greater than or equal to the value.
    - For `category` and `title`, the string indicates that the category or title must _contain_ the search string (case-insensitive).
3. Return the matches for the search criteria specified.
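A first step toward such a search is splitting the string into operators and `key=value` criteria. The sketch below is one possible tokenizer under simplifying assumptions: terms are separated by whitespace, so a value cannot contain a space, and the function name `tokenize` and the tuple layout are hypothetical choices rather than part of the exercise.

```python
def tokenize(search):
    """Split a search string into ('op', word) and ('crit', key, value) tuples."""
    tokens = []
    for item in search.lower().split():
        if item in ("and", "or", "not"):
            tokens.append(("op", item))
        elif "=" in item:
            # Split on the first '=' only, so the value may itself contain '='.
            key, value = item.split("=", 1)
            tokens.append(("crit", key, value))
        else:
            raise ValueError("unrecognised term: %r" % item)
    return tokens

print(tokenize("score=7.0 AND NOT category=suspense"))
# [('crit', 'score', '7.0'), ('op', 'and'), ('op', 'not'), ('crit', 'category', 'suspense')]
```

Each criterion can then be evaluated on its own and the results combined according to the operators that precede it.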

In [7]:
# Your code here.
# Code for section 5.1
def filter_movies(movies, search_crit):
    if isinstance(search_crit, (int, float)):
        # The exercise asks for scores greater than or equal to the criteria.
        return [movie['name'] for movie in movies if movie['imdb'] >= search_crit]
    elif isinstance(search_crit, str):
        # Case-insensitive category match.
        filtered_movies = [movie['name'] for movie in movies
                           if movie['category'].lower() == search_crit.lower()]
        if not filtered_movies:
            print("No movies found in category '%s'" % search_crit)
        return filtered_movies
    else:
        return "Input either a numeric or string search criteria"
            

print(filter_movies(movies, 6.9))
print(filter_movies(movies, 'suspense'))
print(filter_movies(movies, 'horror'))
print(filter_movies(movies, {'name':'the godfather'}))

['Usual Suspects', 'Dark Knight', 'The Help', 'Colonia', 'Joking muck', 'What is the name', 'Detective', 'We Two']
['What is the name', 'Detective']
No movies found in category 'horror'
[]
Input either a numeric or string search criteria


In [97]:
# Code for section 5.2
def movie_matches_subparser(movies, movie_key, value):
    # Map the search syntax (`title=`, `score=`) onto the dictionary keys.
    if movie_key == 'title':
        movie_key = 'name'
    elif movie_key == 'score':
        movie_key = 'imdb'
    elif movie_key not in ['category','imdb']:
        print('movie lookup key %s incorrect' % movie_key)
        return []
        
    if movie_key == 'imdb':
        try:
            value = float(value)
        except ValueError:
            print('imdb %s cannot become float' % value)
            return []
        
    # Collect the indices of the movies that satisfy the criterion.
    subset = []
    for movie_ind, movie in enumerate(movies):
        if movie_key == 'imdb':
            if movie[movie_key] >= value:
                subset.append(movie_ind)
        else:
            if value in movie[movie_key].lower():
                subset.append(movie_ind)
    
    return subset



def meets_boolean_criteria(movies, criteria_info):
    
    movie_inds = range(len(movies))
    
    full_set = set(movie_inds)
    return_set = set(movie_inds)
    
    for boolean, movie_subset in criteria_info:
        
        movie_subset = set(movie_subset)
        
        if boolean == 'and':
            return_set = return_set & movie_subset
        elif boolean == 'or':
            return_set = return_set | movie_subset
        elif boolean == 'not':
            return_set = return_set - movie_subset
        elif boolean == 'ornot':
            return_set = return_set | (full_set - movie_subset)
            
    # Sort the indices so the matches come back in their original list order.
    return_list = []
    for ind in sorted(return_set):
        return_list.append(movies[ind])
        
    return return_list
            
                

def boolean_search(movies, search):
    
    search = search.lower()
    # split() with no argument also tolerates repeated spaces.
    search = search.split()
    
    criteria_info = []
    current_boolean = 'and'
    
    while len(search) > 0:
        item = search.pop(0)
        if item in ['and','or','not']:
            if (current_boolean == 'or') and (item == 'not'):
                current_boolean = 'ornot'
            else:
                current_boolean = item
            continue
        else:
            if '=' in item:
                # Split on the first '=' only, so the value may itself contain '='.
                item = item.split('=', 1)
            else:
                print('%s syntax incorrect' % item)
                return []
                            
            movie_match_inds = movie_matches_subparser(movies, item[0], item[1])
            criteria_info.append([current_boolean, movie_match_inds])

    matches = meets_boolean_criteria(movies, criteria_info)
    return matches
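The boolean handling in `meets_boolean_criteria` maps each operator onto a set operation over movie indices. A minimal sketch of that mapping, using made-up index sets rather than the lab data:

```python
# Hypothetical index sets for a four-movie list.
all_inds = {0, 1, 2, 3}      # every movie
high_score = {0, 2}          # movies matching a score criterion
suspense = {2, 3}            # movies matching a category criterion

both = high_score & suspense                   # AND: intersection
either = high_score | suspense                 # OR: union
without = high_score - suspense                # NOT: difference
or_not = high_score | (all_inds - suspense)    # OR NOT: union with complement

print(sorted(both))     # [2]
print(sorted(either))   # [0, 2, 3]
print(sorted(without))  # [0]
print(sorted(or_not))   # [0, 1, 2]
```

Sorting the resulting indices before looking the movies up keeps the matches in their original order.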

In [98]:
print(boolean_search(movies, 'imdb=7.0 NOT category=suspense OR NOT title=love'))

print(boolean_search(movies, 'imdb=8.9 AND NOT category=suspense'))

boolean_search(movies, 'imdb=notafloat')

boolean_search(movies, 'category=1')

boolean_search(movies, 'category=suspense WHEN imdb=5.5')

[{'category': 'Thriller', 'imdb': 7.0, 'name': 'Usual Suspects'}, {'category': 'Action', 'imdb': 6.3, 'name': 'Hitman'}, {'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'}, {'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'}, {'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'}, {'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'}, {'category': 'Romance', 'imdb': 5.4, 'name': 'Bride Wars'}, {'category': 'War', 'imdb': 3.2, 'name': 'AlphaJet'}, {'category': 'Crime', 'imdb': 4.0, 'name': 'Ringing Crime'}, {'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'}, {'category': 'Suspense', 'imdb': 9.2, 'name': 'What is the name'}, {'category': 'Suspense', 'imdb': 7.0, 'name': 'Detective'}, {'category': 'Thriller', 'imdb': 4.2, 'name': 'Exam'}, {'category': 'Romance', 'imdb': 7.2, 'name': 'We Two'}]
[{'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'}]
imdb notafloat cannot become float
when syntax incorrect


[]