# Reviewing List Comprehensions

### Introduction

In this lesson, we'll review list comprehensions.  Let's work with the movies dataset to do so.

### Loading the data

In [2]:
import pandas as pd
movies_df = pd.read_csv("https://raw.githubusercontent.com/jigsawlabs-student/tech-interview/main/movies.csv")

movies = movies_df.to_dict('records')

And now let's look at our data.

In [7]:
movies[:1]

[{'title': 'Oliver Twist',
  'genre': 'Crime',
  'budget': 50000000,
  'runtime': 130.0,
  'year': 2005,
  'month': 9,
  'revenue': 42093706}]

Ok, so movies is a list of dictionaries with attributes describing each movie.

### Loops to List Comprehension

Now, as we know, we can select an attribute from each dictionary in our list with a loop like the following:

> Below we select each movie's title.

In [9]:
titles = []
# block variable, input_list
for movie in movies:
    title = movie['title'] # each_output
    titles.append(title)
titles[:3]

['Oliver Twist', 'X-Men: Apocalypse', 'Man on the Moon']

And now, if we switch to a list comprehension, we can do the following:

In [17]:
titles = [movie['title'] for movie in movies]
#         each_output        block_var, input_list

titles[:4]

['Oliver Twist', 'X-Men: Apocalypse', 'Man on the Moon', 'The Tree of Life']

So this is very useful, because we can just look at the very beginning of the list comprehension to see the logic in our for loop.
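To convince ourselves that the two forms are interchangeable, here is a small self-contained sketch. The `sample_movies` list is a trimmed stand-in for the first three records (titles only), not the full dataset.

```python
# Trimmed stand-in for the first three records of `movies` (titles only)
sample_movies = [
    {'title': 'Oliver Twist'},
    {'title': 'X-Men: Apocalypse'},
    {'title': 'Man on the Moon'},
]

# for loop version
loop_titles = []
for movie in sample_movies:
    loop_titles.append(movie['title'])

# list comprehension version: each_output first, then the loop
comp_titles = [movie['title'] for movie in sample_movies]

print(loop_titles == comp_titles)  # True
```

Both versions build the same list; the comprehension just moves the output expression to the front.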

> Tip: when writing the list comprehension, it's often easier if you leave the logic for the end. 

For example, below see how we select the revenue of each movie.

In [None]:
# 1. write the loop component
# revenues = [... for movie in movies]

# 2. Fill in the logic
# revenues = [movie['revenue'] for movie in movies]

Now it's your turn.  

Use a list comprehension to coerce our data.  Some of our titles are a mix of uppercase and lowercase values.  Below, use a list comprehension to call the string method `title` on each movie's title.

In [18]:
upper_titles = [movie['title'].title() for movie in movies]

upper_titles[:4]

# ['Oliver Twist', 'X-Men: Apocalypse', 'Man On The Moon', 'The Tree Of Life']

['Oliver Twist', 'X-Men: Apocalypse', 'Man On The Moon', 'The Tree Of Life']

When performing a list comprehension, it's still useful to try implementing the logic for a single record before performing for all of the records.

For example, let's say that we want to round each movie's revenue to the nearest thousand.  We can first try to accomplish it for one movie.

In [32]:
first_movie = movies[1] # note: index 1 is the second movie in the list
first_movie['revenue']

543934787

In [27]:
first_movie = movies[1]

first_movie['revenue']/1000 # divide by 1000

543934.787

* Round to zero decimals

In [29]:
round(first_movie['revenue']/1000, 0)

543935.0

* And then multiply back by 1000, and coerce to an integer

In [33]:
int(round(first_movie['revenue']/1000, 0)*1000)

# previously was 543934787, and now is 543935000

543935000

Ok, so now below, return a list of all of our movie revenues, rounded to the nearest thousand.

In [36]:
rounded_revenues = [int(round(movie['revenue']/1000, 0)*1000) for movie in movies]
rounded_revenues[:5]

# [42094000, 543935000, 47434000, 54674000, 527069000]

[42094000, 543935000, 47434000, 54674000, 527069000]
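As an aside, Python's built-in `round` also accepts a negative number of digits, which may be a shorter way to get the same result when the revenues are integers. The sketch below compares the two approaches on the two unrounded revenues we saw above. Here `round_to_thousand` is a hypothetical helper name, not something defined elsewhere in the lesson.

```python
def round_to_thousand(revenue):
    # divide by 1000, round to zero decimals, multiply back, coerce to int
    return int(round(revenue / 1000, 0) * 1000)

# the two unrounded revenues shown earlier in the lesson
revenues = [42093706, 543934787]

step_by_step = [round_to_thousand(revenue) for revenue in revenues]

# a negative ndigits rounds to the left of the decimal point
with_ndigits = [round(revenue, -3) for revenue in revenues]

print(step_by_step)  # [42094000, 543935000]
print(with_ndigits)  # [42094000, 543935000]
```

Wrapping the single-record logic in a function first can also keep the comprehension itself short and readable.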

> Note that these operations do not change the original data, but rather return a new list of data.  Our revenues in the dictionary are unchanged.

In [38]:
# original unrounded revenue is still there
movies[0]['revenue']

42093706

### Filtering with List Comprehensions

Another thing we can do with a list comprehension is add an if condition.  Let's see how we can do this.

Below let's only select the movies from the year 2000.

In [43]:
movies_2000 = [movie for movie in movies if movie['year'] == 2000]

movies_2000[:1]

[{'title': 'X-Men',
  'genre': 'Adventure',
  'budget': 75000000,
  'runtime': 104.0,
  'year': 2000,
  'month': 7,
  'revenue': 296339527}]

Let's make sure that we understand the format.

In [44]:
#              return_val, loop         if condition 
movies_2000 = [movie for movie in movies if movie['year'] == 2000]

Again, we can write this out step by step, working from the easier components to the more difficult ones.

In [45]:
# 1. For loop
# movies_2000 = [ ...for movie in movies ...]

# 2. if statement
# movies_2000 = [movie for movie in movies if movie['year'] == 2000]

# 3. update return value
# movies_2000 = [movie['title'] for movie in movies if movie['year'] == 2000] 
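To connect this back to loops, the filtered comprehension corresponds to a for loop with an `if` inside it. Below is a minimal sketch on a trimmed `sample_movies` list (only the `title` and `year` keys are kept), so it runs without loading the dataset.

```python
# Trimmed stand-in records, keeping only the keys we need
sample_movies = [
    {'title': 'Oliver Twist', 'year': 2005},
    {'title': 'X-Men', 'year': 2000},
    {'title': 'Winnie the Pooh', 'year': 2011},
]

# for loop version: the if sits inside the loop body
titles_2000_loop = []
for movie in sample_movies:
    if movie['year'] == 2000:
        titles_2000_loop.append(movie['title'])

# list comprehension version: return value, loop, then condition
titles_2000 = [movie['title'] for movie in sample_movies if movie['year'] == 2000]

print(titles_2000)  # ['X-Men']
```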

Now it's your turn.  Write out a list comprehension that selects the movies that have a runtime less than 80 minutes.

In [52]:
short_movies = [movie for movie in movies if movie['runtime'] < 80]
short_movies[:1]

[{'title': 'Winnie the Pooh',
  'genre': 'Animation',
  'budget': 30000000,
  'runtime': 63.0,
  'year': 2011,
  'month': 4,
  'revenue': 14460000}]

Finally, let's alter the return value as well.  This time, we'll just return the title of movies that are less than 80 minutes.

In [55]:
short_movie_titles = [movie['title'] for movie in movies if movie['runtime'] < 80]
short_movie_titles[:3]

['Winnie the Pooh', 'Fantasia 2000', 'The Thief and the Cobbler']

> So as you can see, we just had to update the code at the very beginning of our list comprehension.

### A Challenge

Now let's try to select the genre of all of our movies and make sure they are `title`ized.

In [57]:
movies[0]['genre']

'Crime'

> Uh oh.

In [62]:
[movie['genre'].title() for movie in movies]

AttributeError: 'float' object has no attribute 'title'

Ok, so we see an error message saying that the `float` object has no attribute `title`.  The real problem is that we are apparently working with a float, where we thought we were working with a string.

And this is likely because, somewhere in our list of dictionaries, genre is a float -- perhaps a `nan` (not a number)  value.
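One way to test this guess is to filter for the records whose genre is not a string. The sketch below uses a made-up `sample_movies` list, and it assumes the float is a `nan`, which is what pandas produces for a missing value. The second record is invented for illustration.

```python
# Made-up sample: the second record is hypothetical and has a missing genre
sample_movies = [
    {'title': 'Oliver Twist', 'genre': 'Crime'},
    {'title': 'Hypothetical Movie', 'genre': float('nan')},
]

# select the records where genre is anything other than a string
non_string = [movie for movie in sample_movies
              if not isinstance(movie['genre'], str)]

print([movie['title'] for movie in non_string])  # ['Hypothetical Movie']
print(type(non_string[0]['genre']))              # <class 'float'>
```

Running the same comprehension against the real `movies` list should show which records are causing the error.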

Ok, so we can work around this by adding an if condition to only call `title` where the genre is of type string.  Try to do this with a list comprehension below.

In [65]:

# inside the brackets, no \ is needed to continue onto the next line
cap_movies = [movie['genre'].title() for movie in movies
              if isinstance(movie['genre'], str)]
cap_movies[:3]

['Crime', 'Science Fiction', 'Comedy']
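Note that filtering drops the records that fail the condition, so the resulting list can be shorter than `movies`. If we wanted one entry per movie instead, one option is a conditional expression in the return value. This is a sketch on made-up data, and using `None` as the placeholder is just an assumption for illustration.

```python
# Made-up sample: lowercase genres and one missing value
sample_movies = [
    {'title': 'Movie A', 'genre': 'crime'},
    {'title': 'Movie B', 'genre': float('nan')},
    {'title': 'Movie C', 'genre': 'science fiction'},
]

# if at the end: filters, so the non-string record is dropped
filtered = [movie['genre'].title() for movie in sample_movies
            if isinstance(movie['genre'], str)]

# if/else at the start: keeps every record, with None as a placeholder
kept = [movie['genre'].title() if isinstance(movie['genre'], str) else None
        for movie in sample_movies]

print(filtered)  # ['Crime', 'Science Fiction']
print(kept)      # ['Crime', None, 'Science Fiction']
```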

### Summary

In this lesson, we saw how to use list comprehensions to coerce our data. 

In [71]:
titles = [movie['title'] for movie in movies]
titles[:3]

['Oliver Twist', 'X-Men: Apocalypse', 'Man on the Moon']

For example, above we start with a list of dictionaries, and then use a list comprehension to coerce our data to a list of strings.

We then saw how we can add an if condition to filter our list.

In [73]:
# inside the brackets, the statement can span multiple lines without a \
short_movie_titles = [movie['title'] for movie in movies
                      if movie['runtime'] < 80]
short_movie_titles[:3]

['Winnie the Pooh', 'Fantasia 2000', 'The Thief and the Cobbler']