<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Created by [Nathan Kelber](http://nkelber.com) and Zhuo Chen under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email nathan.kelber@ithaka.org or zhuo.chen@ithaka.org.<br />
___

# Python Comprehensions

**Description:** This notebook describes:
* What a Python comprehension is
* How to write and use list comprehensions
* How to write and use dictionary comprehensions
* How to write and use set comprehensions
* How to write and use generator comprehensions

**Use Case:** For Learners (Detailed explanation, not ideal for researchers)

**Difficulty:** Intermediate

**Completion Time:** 90 minutes

**Knowledge Required:** 
* Python Basics Series ([Start Python Basics 1](./python-basics-1.ipynb))

**Knowledge Recommended:** None

**Data Format:** None

**Libraries Used:** None

**Research Pipeline:** None
___

## List Comprehensions

A Python comprehension is a helpful shortcut for creating a list, dictionary, or set from an existing list, dictionary, or set. The same task can usually be accomplished with a for loop, map, or filter, but comprehensions have the benefit of being shorter.
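
As a rough sketch of that comparison, the cell below builds the same list three ways: with a for loop, with `map` and `filter`, and with a list comprehension. The list `values` and the variable names are made up for illustration and are not part of this notebook's data.

```python
# Hypothetical example data (not from the notebook)
values = [1, 2, 3, 4, 5, 6]

# 1. A for loop: keep the even numbers and square them
squares_loop = []
for value in values:
    if value % 2 == 0:
        squares_loop.append(value ** 2)

# 2. map and filter: the same result, using two lambda functions
squares_map = list(map(lambda v: v ** 2, filter(lambda v: v % 2 == 0, values)))

# 3. A list comprehension: the same result on one line
squares_comp = [value ** 2 for value in values if value % 2 == 0]

print(squares_loop)  # [4, 16, 36]
print(squares_map)   # [4, 16, 36]
print(squares_comp)  # [4, 16, 36]
```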

### List Comprehensions (Numbers)

In this first example, we will use a list with numbers.

In [None]:
# Create a list of numbers

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
# Create a new list using a for loop

new_list = [] # An empty list we will add to

for item in numbers:
    if item > 5:
        new_list.append(item)

print(new_list)

Take a look again at the for loop.
```
for item in numbers:
    if item > 5:
        new_list.append(item)
```

We can read this as: for each ```item```, if ```item``` is bigger than 5, append ```item``` to ```new_list```. 

<!-- If we rearrange this slightly, we have the form for a list comprehension:

"Append item, for item in numbers" -->

<!-- We write this as list comprehension like so:

`new_list = [item for item in numbers]`

The brackets `[]` indicate we are creating a list. -->

In [None]:
# Now, let's do the same thing using a list comprehension

new_list = [item for item in numbers if item > 5]
print(new_list)

The order of the comprehension seems confusing. How do we understand it?
<!-- If the order of the comprehension is confusing, it may help to skip the first variable name and start with:
`for item in numbers`
then return to the beginning of the comprehension to see what will be appended:
`item`. -->

### An excursion: using set-builder to help us understand list comprehensions

We can use sets in mathematics to help us understand the syntax of a list comprehension. As I explain the notation of sets, you will soon see the similarity between the list comprehension and the set-builder notation. 

A set is essentially a collection of objects. Recall that in mathematics we notate a set in the following way: within a pair of curly brackets, we put all the elements of that set. 

$$\{5,6,7,8,9\}$$

If a set is very big, containing, say, hundreds of thousands of elements, listing every element would take far too long. Suppose you want a set that contains all the natural numbers from 1 to 1000. You would not want to write out every one of those numbers!

Fortunately, we have another way of writing a set, which we call the set-builder notation. 

The set-builder notation can be simply put in the following way. 
$$\{y~|~\text{conditions that}~y~\text{must satisfy to be a member of the set}\}$$

or, equivalently,

$$\{y~:~\text{conditions that}~y~\text{must satisfy to be a member of the set}\}$$

Back to the previous scenario where you want to create a set that contains all the natural numbers from 1 to 1000, you can easily write the set using the set-builder notation. 

$$\{y~|~y\in \mathbb{N} \wedge 1\leq y\leq 1000 \}$$
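
For comparison, here is one possible way to express the same membership rule in Python. This is only a sketch: the name `candidates` and the choice of `range(2000)` as the pool of numbers to test are assumptions made for illustration.

```python
# Hypothetical pool of natural numbers to test for membership
candidates = range(2000)   # 0, 1, 2, ..., 1999

# "y such that 1 <= y <= 1000", written as a list comprehension
one_to_thousand = [y for y in candidates if 1 <= y <= 1000]

print(len(one_to_thousand))                     # 1000
print(one_to_thousand[0], one_to_thousand[-1])  # 1 1000
```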

### Back to the list comprehension ###
Going back to the list comprehension we write to create ```new_list```, we can easily see an analogy. 
```
new_list = [item for item in numbers if item > 5]
```


The list comprehension also consists of two parts. To the left is the variable name standing for the elements we want to include in ```new_list```. To the right are the conditions we want those elements to satisfy! 

The only differences are:

(i) we use the list comprehension to create a **list**, not a set; 

(ii) the conditions start with a for loop, because we are looping over each element in ```numbers``` and checking whether that element satisfies the specified if-condition;

(iii) there is no visible symbol like "$|$" or "$:$" that separates the two parts in list comprehensions.  


**Exercise one**

Create a list ```odd_num``` which contains all the odd numbers from the list ```numbers```. 

|Operator| Operation| Example | Evaluation |
|---|----|---|---|
|%| Modulus/Remainder| 5 % 2 | 1 |
    

```
odd_num=[]
for number in numbers:
    if number % 2 == 1:
        odd_num.append(number) 
```

Create a list ```new_list2``` which contains those numbers from the list ```numbers``` that, after being multiplied by 5, are smaller than 35. 

|Operator| Operation| Example | Evaluation |
|---|----|---|---|
|*| Multiplication | 7 * 8 | 56 |


<!-- Here's an example that changes what will be appended. -->
<!-- Again, if you find the order confusing, it may help to skip ahead to the `for`. You can also optionally include a parentheses that may help clarify which part will be appended: -->
<!-- # Create a new list where each number is doubled

new_list = [item * 2 for item in numbers]
print(new_list) -->
<!-- # # Create a new list where each number is doubled
# # Parentheses for clarity

# new_list = [(item * 2) for item in numbers]
# print(new_list) -->


### List Comprehensions (Strings)

A list comprehension also works on a list containing other data types, such as a string.

In [None]:
# Create a list of people
people = ['Aaron Aston',
         'Brianna Barton',
         'Carla Cameron',
         'Delia Darcy',
         'Evelyn Elgin',
         'Frederick Federov',
         'Gaston Garbo']

In the previous section, we saw that in a list comprehension, the membership if-condition is nested in a for loop. However, not every list comprehension has an if-condition. In some cases, every element of the original list becomes a member of the new list, so no if-condition is needed. Take a look at a simple example. 

**Question:** Can you take a guess what the resulting list ```names``` looks like?

In [None]:
# Sometimes a list comprehension has no if-condition
from pprint import pprint # import pprint to print out the list in a prettier way

names=[name for name in people]
pprint(names)

In many cases, though, we do not just want to loop over the elements in an existing list and append each element to a new list unchanged. We want to go through each element in the existing list, use a function to operate on that element and then append the output to the new list. 

In [None]:
# Create a new list that only includes first names
# Using a for loop

friends = []

for name in people:
    first_name = name.split()[0] # Split the name on whitespace, then grab the first name/item
    friends.append(first_name)
    
print(friends)



In [None]:
"John Doe".split()

In [None]:
"John Doe".split()[0]

In [None]:
# Create a new list that only includes first names
# Using a list comprehension

friends = [name.split()[0] for name in people]

print(friends)

In [None]:
# Create a new list that only includes the upper case form of the last names
friends =[]
for name in people:
    last_name_upper=name.split()[1].upper()
    friends.append(last_name_upper)
print(friends)

In [None]:
friends=[name.split()[1].upper() for name in people]

In [None]:
print(friends)

### List Comprehensions (Multiple Lists)

We can also create a list comprehension that pulls from multiple lists by using two for loops within a single list comprehension.
<!-- # # Create a first names list

# first_names = ['Aaron', 'Brianna', 'Carla', 'Delia', 'Evelyn', 'Frederick', 'Gaston']

# # Create a last names list

# last_names = ['Aston', 'Barton', 'Cameron', 'Darcy', 'Elgin', 'Federov', 'Garbo'] -->

Scenario: Suppose you are running a restaurant. For the lunch special, you provide different varieties of rice and different protein choices that go with the rice. 

In [None]:
rices=["white rice", "brown rice", "yellow rice"]

proteins=["beef", "pork", "chicken", "shrimp", "lamb","tofu"] 

In [None]:
# Create a list of all possible combinations of rice and protein 

all_lunch_special_choices = [rice+" with "+protein for rice in rices for protein in proteins]

pprint(all_lunch_special_choices)

In the previous example, the two lists we pull from are independent of each other. You can see that even if we switch the two for loops, the result is still a valid list comprehension. 

In [None]:
all_lunch_special_choices = [rice+" with "+protein for protein in proteins for rice in rices]

pprint(all_lunch_special_choices)
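
To see how the two for clauses are read, it may help to write the comprehension out as a nested for loop. The sketch below assumes the same `rices` and `proteins` lists as above; the names `choices_loop`, `choices_comp`, and `choices_swapped` are introduced only for this illustration.

```python
rices = ["white rice", "brown rice", "yellow rice"]
proteins = ["beef", "pork", "chicken", "shrimp", "lamb", "tofu"]

# The first for clause becomes the outer loop, the second the inner loop
choices_loop = []
for rice in rices:
    for protein in proteins:
        choices_loop.append(rice + " with " + protein)

# The same thing written as a comprehension
choices_comp = [rice + " with " + protein for rice in rices for protein in proteins]

# Swapping the for clauses gives the same combinations in a different order
choices_swapped = [rice + " with " + protein for protein in proteins for rice in rices]

print(choices_loop == choices_comp)                     # True
print(len(choices_comp))                                # 18
print(sorted(choices_comp) == sorted(choices_swapped))  # True
print(choices_comp == choices_swapped)                  # False
```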

What if the lists we pull from are not independent of one another? What if one is nested in another? 
```
names=[['Abby', 'Bella','Cecilia'],['Alex','Beatrice','Cynthia','David']]
```
Suppose we want to go through each sub-list in the list ```names```, grab the names starting with the letter A, and put them in a new list ```A_names```.

In [None]:
names=[['Abby', 'Bella','Cecilia'],['Alex','Beatrice','Cynthia','David']]

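# Loop over each sub-list, then over each name in it, keeping only names that start with 'A'
A_names=[A_name for sub_list in names for A_name in sub_list if A_name.startswith('A')]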
A_names

In [None]:
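# Swapping the two for clauses does not work here: sub_list is used before it is defined,
# so this cell raises a NameError (unless a variable named sub_list already exists)
A_names=[A_name for A_name in sub_list for sub_list in names ]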
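
Writing the comprehension as a nested for loop shows why the order of the for clauses matters when one list is nested in another. The sketch below is one way to spell it out; the name `A_names_loop` is introduced only for this illustration.

```python
names = [['Abby', 'Bella', 'Cecilia'], ['Alex', 'Beatrice', 'Cynthia', 'David']]

A_names_loop = []
for sub_list in names:           # the outer loop must come first: it defines sub_list
    for name in sub_list:        # the inner loop can only run once sub_list exists
        if name.startswith('A'):
            A_names_loop.append(name)

# The comprehension keeps the for clauses in the same outer-to-inner order
A_names_comp = [name for sub_list in names for name in sub_list if name.startswith('A')]

print(A_names_loop)  # ['Abby', 'Alex']
print(A_names_comp)  # ['Abby', 'Alex']
```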

**Exercise two:** Define the variable ```saying``` to contain the list ```['After', 'all', 'is', 'said', 'and', 'done', ',', 'more', 'is', 'said', 'than', 'done', '.']```. Process this list using a for loop, and store the length of each word in a new list ```lengths```. Hint: begin by assigning the empty list to ```lengths```, using ```lengths = []```. Then each time through the loop, use ```append()``` to add another length value to the list. Now do the same thing using a list comprehension.

## Dictionary Comprehension
The form of a dictionary comprehension is the same as for a list. Since a dictionary comprehension may deal with keys, values, or both, we need to be prepared to use `.keys()`, `.values()`, or `.items()` (for both).

In [None]:
# Create a dictionary of contacts and occupations

contacts ={
 'Amanda Bennett': 'Engineer, electrical',
 'Bryan Miller': 'Radiation protection practitioner',
 'Christopher Garrison': 'Planning and development surveyor',
 'Debra Allen': 'Intelligence analyst',
 'Donna Decker': 'Architect',
 'Heather Bullock': 'Media planner',
 'Jason Brown': 'Energy manager',
 'Jason Soto': 'Lighting technician, broadcasting/film/video',
 'Marissa Munoz': 'Further education lecturer',
 'Matthew Mccall': 'Chief Technology Officer',
 'Michael Norman': 'Translator',
 'Nicole Leblanc': 'Financial controller',
 'Noah Delgado': 'Engineer, land',
 'Rachel Charles': 'Physicist, medical',
 'Stephanie Petty': 'Architect'}

When we loop over a dictionary, we will only loop over the keys of the dictionary. 

In [None]:
for item in contacts:
    print(item)

In [None]:
for key in contacts.keys():
    print(key)

To loop over both the keys and the values, we will need to use ```dict.items()```.

In [None]:
for item in contacts.items():
    print(item)

In [None]:
# Create a dictionary that only contains architects
architects={}
for item in contacts.items():
    if item[1]=='Architect':
        architects[item[0]]='Architect'
pprint(architects)       

In [None]:
# create the same dictionary using index to get the keys and values from each tuple
architects = {item[0]:item[1] for item in contacts.items() if item[1]=='Architect'}
print(architects)
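
Each item produced by `.items()` is a (key, value) tuple, so instead of indexing with `item[0]` and `item[1]` we can unpack the tuple into two named variables. The sketch below uses a small made-up dictionary, `sample_contacts`, to show both styles side by side.

```python
# Hypothetical miniature version of the contacts dictionary
sample_contacts = {'Donna Decker': 'Architect', 'Jason Brown': 'Energy manager'}

# Style 1: treat each item as a tuple and index into it
for item in sample_contacts.items():
    print(item[0], '->', item[1])

# Style 2: unpack each tuple into two variables in the for statement
for name, occupation in sample_contacts.items():
    print(name, '->', occupation)

# The same unpacking works inside a comprehension
swapped = {occupation: name for name, occupation in sample_contacts.items()}
print(swapped)  # {'Architect': 'Donna Decker', 'Energy manager': 'Jason Brown'}
```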

In [None]:
# Create a dictionary that only contains architects

architects={name:occupation for (name,occupation) in contacts.items() if occupation == 'Architect'}
print(architects)

Note that our dictionary comprehension uses braces `{}` instead of brackets `[]`, since it is a dictionary. 

In the section on list comprehensions, we created a new list by filtering an old list (e.g. appending a number to a new list only if that number is bigger than 5). We also created a new list by transforming the elements of an old list with a function (e.g. splitting each name and keeping only the first name). 

We can do the same things in dictionary comprehensions. Just now, we created a new dictionary by filtering an old dictionary. The next examples also transform the values. 

In [None]:
# Create a dictionary only containing engineers
# Change longer title to just 'Engineer'

engineers = {name: occupation.split(',')[0] for (name, occupation) in contacts.items() if 'Engineer' in occupation}
print(engineers)

In [None]:
# Create a dictionary that indicates whether a person is an engineer

from pprint import pprint # import pprint for easier to read dictionary prints

engineers = {name: ('Engineer' if 'Engineer' in occupation else 'Not Engineer') for (name, occupation) in contacts.items()}
pprint(engineers)
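
The two cells above use `if` in two different positions, and the difference is easy to miss. An `if` at the end of a comprehension filters which items are kept, while an `if ... else` in the value position keeps every item and chooses what value to store. The following sketch contrasts the two using a made-up dictionary, `sample`.

```python
# Hypothetical example data
sample = {'Ann': 'Engineer, land', 'Bob': 'Translator'}

# Trailing if: a filter, so Bob is dropped entirely
filtered = {name: job for name, job in sample.items() if 'Engineer' in job}

# if ... else in the value position: every name is kept, only the value changes
labeled = {name: ('Engineer' if 'Engineer' in job else 'Not Engineer')
           for name, job in sample.items()}

print(filtered)  # {'Ann': 'Engineer, land'}
print(labeled)   # {'Ann': 'Engineer', 'Bob': 'Not Engineer'}
```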

**Exercise three** 
Suppose you are a grocery store owner. Due to inflation, you have to raise prices by 15%. The dictionary ```dict1``` contains the commodities and their original prices. You want to create a new dictionary with the new prices.

```
dict1={"milk":3.49, "egg": 5.29, "bread": 2.99, "spinach": 1.99, "lettuce": '2.35', "banana":0.99}
```

## Set comprehension
Curly braces are used for both dictionaries and sets in Python. Which one is created depends on whether we supply the associated value or not. 

In [None]:
{1,2,3}

In [None]:
{1:'apple',2:'banana',3:'cherry'}

In [None]:
set1={5,6,7,8,9}
set2=set() ## note how we initialize an empty set
for num in set1:
    if num>5:
        set2.add(num)
print(set2)

In [None]:
# {} creates an empty dictionary, not an empty set, so .add() below raises an AttributeError
set2={}
for num in set1:
    if num>5:
        set2.add(num)
print(set2)
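
The second version fails because empty braces create a dictionary rather than a set. A quick way to check this, sketched below with made-up variable names, is to ask Python for the type of each object.

```python
empty_braces = {}      # empty braces give a dictionary
empty_set = set()      # set() is how to create an empty set
non_empty = {1, 2, 3}  # braces with bare values (no key: value pairs) give a set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
print(type(non_empty))     # <class 'set'>

# Dictionaries have no .add() method, which is why calling it on {} fails
print(hasattr(empty_braces, 'add'))  # False
print(hasattr(empty_set, 'add'))     # True
```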

In [None]:
set2={num for num in set1 if num>5}
set2

A set is an unordered collection of distinct objects. If you change the order of the elements or list an element more than once, that does not change the set. 

In [None]:
{1,2}=={2,1}

In [None]:
{1,1,2}=={1,2}
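
Because a set keeps only distinct objects, a set comprehension automatically drops duplicates that a list comprehension would keep. Here is a small sketch of that difference; the list `fruits` is made-up example data.

```python
# Hypothetical example data
fruits = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# A list comprehension keeps every result, including repeats
first_letters_list = [fruit[0] for fruit in fruits]

# A set comprehension keeps each distinct result only once
first_letters_set = {fruit[0] for fruit in fruits}

print(first_letters_list)         # ['a', 'a', 'b', 'b', 'c']
print(sorted(first_letters_set))  # ['a', 'b', 'c']
```

The set is passed through `sorted()` before printing because a set has no guaranteed order.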

Suppose you run a book store and you record how many copies of each book were sold in the past month. Now, you want to pick out from the dictionary all the books that sold more than 50 copies.

In [None]:
sale={
    'Great Expectations': 60, 
    'Lolita': 30, 
    'Scarlet Letter': 55, 
    'The Screw that Turns': 36, 
    'The Great Gatsby': 58,
    'The Beloved': 52
}

In [None]:
popular_books={book for (book,copy) in sale.items() if copy > 50}
print(popular_books)

<font size="5">$\color{blue}{\bf Coding~Challenge!}$</font>

Rewrite the following nested loop into a set comprehension:

 	
```
words = ['attribution', 'confabulation', 'elocution', 'sequoia', 'tenacious', 'unidirectional']
vsequences = set()
for word in words:
    vowels = []
    for char in word:
        if char in 'aeiou':
            vowels.append(char)
    vsequences.add(''.join(vowels))
sorted(vsequences)
```

In [None]:
words = ['attribution', 'confabulation', 'elocution', 'sequoia', 'tenacious', 'unidirectional']


## Generator comprehension
A generator comprehension (also called a generator expression) is like a list comprehension, but instead of finding all the items you're interested in and packing them into a list, it waits and yields each item one by one. Because a generator expression only has to yield one item at a time, it can lead to big savings in memory usage. 

Generator expressions make the most sense in scenarios where you need to take one item at a time, do a lot of calculations based on that item, and then move on to the next item.

Generator comprehensions have essentially the same syntax as list comprehensions, except that they use parentheses ( ) instead of brackets [ ].
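
One rough way to see the memory difference is to compare the size of a list with the size of the equivalent generator object using `sys.getsizeof`. This is only a sketch: the exact numbers vary by Python version and platform, and `sys.getsizeof` measures just the container itself, not the integers it refers to.

```python
import sys

# A list comprehension builds all 100,000 results immediately
squares_list = [n * n for n in range(100_000)]

# A generator expression builds nothing until it is asked for a value
squares_gen = (n * n for n in range(100_000))

print(sys.getsizeof(squares_list))  # large: grows with the number of items
print(sys.getsizeof(squares_gen))   # small: does not depend on the number of items

# Functions such as sum() can consume a generator one item at a time
total = sum(squares_gen)
print(total == sum(squares_list))   # True
```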

In [None]:
numbers=[5,6,7,8,9]
new_list=[num for num in numbers if num>5]
print(new_list)

In [None]:
len(new_list)

In [None]:
new_gen=(num for num in numbers if num>5)
type(new_gen) ### new_gen is a generator object

In [None]:
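# A generator has no length because its items do not exist until they are requested,
# so this cell raises a TypeError
len(new_gen)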

In [None]:
next(new_gen)

In [None]:
next(new_gen)

In [None]:
next(new_gen)

In [None]:
next(new_gen)

In [None]:
# new_gen only had four items (6, 7, 8, 9), so this fifth call raises StopIteration
next(new_gen)

In [None]:
# new_gen is already exhausted, so the resulting list is empty
gen_to_list=list(new_gen)
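
A generator can only be consumed once: every value taken with `next()` is gone, and once the generator is exhausted it stays empty. The sketch below walks through this with a fresh generator named `gen` (a name chosen for this illustration), and also shows the optional default argument of `next()`, which avoids the `StopIteration` error.

```python
gen = (num for num in [5, 6, 7, 8, 9] if num > 5)

first = next(gen)          # takes the first item out of the generator
print(first)               # 6

remaining = list(gen)      # collects whatever is left
print(remaining)           # [7, 8, 9]

leftover = list(gen)       # the generator is now exhausted
print(leftover)            # []

# Passing a default to next() returns it instead of raising StopIteration
print(next(gen, 'no more items'))  # no more items
```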

**Exercise Four** 

* Create a generator object that will produce values from 0 to 30. Assign the result to ```result``` and use ```num``` as the iterator variable in the generator expression.

* Print the first 5 values by using ```next()``` appropriately in ```print()```.

* Print the rest of the values by using a for loop to iterate over the generator object.