# Comprehensions and generators

by Koenraad De Smedt at UiB


---
How can we collect the results of an iteration? Python has a compact way of doing this, called a *comprehension* (Norwegian: *inklusjon*). It works on all data types that are *iterable*, i.e. whose elements can be iterated over, such as strings, lists, tuples and sets.

This notebook shows how to write:

1.  List comprehensions
2.  Set comprehensions
3.  Generator expressions

---

## List comprehension

We first introduce *list comprehension*, which collects selected items in a list. The following example selects every city name that is six letters long from a list of city names and returns a list containing those items. The names of the variables are arbitrary.

In [None]:
cities = ['Bergen', 'Bodø', 'Kristiansand', 'Tromsø', 'Gvarv']
[city for city in cities if len(city) == 6]
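For comparison, the comprehension above can be read as shorthand for an explicit loop that appends to a list. The following sketch shows the equivalence; the names `selected` and `by_comprehension` are introduced here only for illustration.

```python
cities = ['Bergen', 'Bodø', 'Kristiansand', 'Tromsø', 'Gvarv']

# Explicit loop: start with an empty list and append each matching item.
selected = []
for city in cities:
    if len(city) == 6:
        selected.append(city)

# The list comprehension expresses the same selection in one line.
by_comprehension = [city for city in cities if len(city) == 6]

print(selected)
print(selected == by_comprehension)
```

Both versions give the same list; the comprehension is simply more compact.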

The first expression is the result that is included at each iteration. The following includes the first two letters of every city name that is six letters long.

In [None]:
[city[0:2] for city in cities if len(city) == 6]

Conditions can be connected with `and` or `or`. The following includes the first two letters of every name that is six letters long or ends in *ø*.

In [None]:
[city[0:2] for city in cities if len(city) == 6 or city[-1] == 'ø']

This works for all kinds of sequences. The following example collects all vowels from a string in a list.

In [None]:
sentence = 'Annabel moved from Bergen to Oslo.'
vowels = 'aeiouy'
[char for char in sentence if char.lower() in vowels]

The selected elements can also be processed before they are included. Here each vowel is converted to uppercase.

In [None]:
[char.upper() for char in sentence if char.lower() in vowels]

Here is an alternative. It is often the case that a result can be obtained in different ways.

In [None]:
[char.upper() for char in sentence.lower() if char in vowels]

## Set comprehension

With curly braces, we do a *set comprehension*. Remember that a set does not have duplicate members.

In [None]:
{char.upper() for char in sentence.lower() if char in vowels}

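Let's try `not in`, which selects every character that is *not* a vowel. Note that this also includes the space and the full stop.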

In [None]:
{char.upper() for char in sentence.lower() if char not in vowels}
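To see the effect of removing duplicates, one can compare a list comprehension and a set comprehension over the same data. This is a small sketch using the city names from earlier; the names `length_list` and `length_set` are illustrative only.

```python
cities = ['Bergen', 'Bodø', 'Kristiansand', 'Tromsø', 'Gvarv']

# A list comprehension keeps one result per item, duplicates included.
length_list = [len(city) for city in cities]

# A set comprehension keeps each distinct result only once.
length_set = {len(city) for city in cities}

print(length_list)
print(sorted(length_set))  # sets are unordered, so sort for a stable display
```

The length 6 occurs twice in the list but only once in the set.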

---
## Generator expressions

If we use parentheses instead, we write a *generator expression*. It results in an iterator that computes each element only when it is needed.

In [None]:
vowels_generator = (char.upper() for char in sentence.lower() if char in vowels)
vowels_generator

The generator produces the next element every time we ask for one. This is like a ticket machine that prints a number only when a button is pushed.

<img src="https://www.timeaccessinc.com/sites/default/files/imagecache/product_original/next_please_ticket_printer_l.jpg" alt = "ticket printer" width = 250px>

If we try to take the next element and none is left, Python raises a `StopIteration` exception. Generators can be efficient when the sequence could be very long or even infinite and we need only a few elements.
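As a minimal sketch of the infinite case, the following assumes the standard library functions `itertools.count` and `itertools.islice`, which are not otherwise used in this notebook. The names `squares` and `first_five` are illustrative.

```python
from itertools import count, islice

# count(1) yields 1, 2, 3, ... without end, so the squares are never all computed.
squares = (n * n for n in count(1))

# islice takes only the first five elements from the infinite generator.
first_five = list(islice(squares, 5))
print(first_five)
```

A list comprehension over `count(1)` would never finish, whereas the generator expression computes only the five squares that are requested.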

Execute the following cell several times.

In [None]:
next(vowels_generator)
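The error raised by an exhausted generator is a `StopIteration` exception. As a minimal sketch, the built-in `next` also accepts a second argument that is returned instead of raising the exception. The short string and the names `gen` and `exhausted` are chosen here only for illustration.

```python
gen = (char.upper() for char in 'oslo' if char in 'aeiouy')

print(next(gen))  # first vowel
print(next(gen))  # second vowel

# The generator is now exhausted, so a further next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# With a default value, next() returns the default instead of raising.
print(next(gen, None))
```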

A generator is an iterable type, which means we can iterate over its elements, for instance with `for`. The following simply prints out all elements in the generator.

In [None]:
vowels_generator = (char.upper() for char in sentence.lower() if char in vowels)
for nxt in vowels_generator:
  print(nxt, 'is a vowel')

When we check with `in` whether something is in the generator, it produces only as many elements as are needed to find what we are looking for.

In [None]:
vowels_generator = (char.upper() for char in sentence.lower() if char in vowels)
'O' in vowels_generator

Now take all remaining elements out. These are the elements that had *not* yet been computed by the above expression.

In [None]:
[*vowels_generator]
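A related point worth keeping in mind is that a generator can be traversed only once. The sketch below (with the illustrative names `gen`, `first_pass` and `second_pass`) shows that a second pass yields nothing, which is why the cells above create `vowels_generator` anew each time.

```python
sentence = 'Annabel moved from Bergen to Oslo.'
vowels = 'aeiouy'

gen = (char.upper() for char in sentence.lower() if char in vowels)

first_pass = list(gen)   # consumes every element
second_pass = list(gen)  # nothing is left to produce

print(len(first_pass))
print(second_pass)
```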

---
## Comparison of iteration on a list and on a generator

The following is a generic function that checks whether something occurs in any type of *iterable* (sequence, set or generator). It prints each element it inspects. As soon as one instance is found, the iteration stops and `True` is returned. If the iteration ends without the element having been found, `False` is returned.

In [None]:
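def find_something(thing, gen):
  # Print each element as it is consumed, so we can see how far the search gets.
  for nxt in gen:
    print(nxt)
    if nxt == thing:
      return True  # found: stop iterating immediately
  return False  # the iterable was exhausted without a match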

Here we test on a list produced by a list comprehension. The example looks for the first occurrence of an *O*. All elements of the list are computed first, and then the whole list is passed to the function.

In [None]:
find_something('O', [char.upper() for char in sentence if char.lower() in vowels])

In the following we test the same on a generator. The result is the same, but this way is potentially more efficient, because only four elements are computed before the function returns; the rest of the generator is never evaluated.

In [None]:
find_something('O', (char.upper() for char in sentence if char.lower() in vowels))
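One way to make the difference visible is to count how often the element expression is evaluated. The following sketch uses a hypothetical helper `shout`, which is not part of the notebook, and the built-in `in` test instead of `find_something` so that the block is self-contained.

```python
sentence = 'Annabel moved from Bergen to Oslo.'
vowels = 'aeiouy'
calls = []  # records every character that is actually processed

def shout(char):
    # Hypothetical helper: remember the call, then return the uppercase form.
    calls.append(char)
    return char.upper()

# List comprehension: all elements are computed before the search starts.
found_in_list = 'O' in [shout(char) for char in sentence if char.lower() in vowels]
list_calls = len(calls)

calls.clear()

# Generator expression: computation stops as soon as 'O' is found.
found_in_gen = 'O' in (shout(char) for char in sentence if char.lower() in vowels)
gen_calls = len(calls)

print(found_in_list, list_calls)
print(found_in_gen, gen_calls)
```

Both searches succeed, but the list version processes every vowel in the sentence, while the generator version stops at the fourth.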

### Exercises


1.  Try replacing the `'O'` with `'U'` in the previous two function calls.
2.  Using `len` and a set comprehension, count how many *different* vowels there are in the `sentence`.
3.  Given the following variable, use a list comprehension to select all philosopher names longer than 5 characters and join them with the string `' & '`.

In [None]:
philosophers = ['Socrates', 'Plato', 'Aristoteles', 'Hume', 'Berkeley']