List Comprehensions and Functional Tools:

Python supports the procedural, object-oriented, and
functional programming paradigms. In fact, Python has a host of tools that most would
consider functional in nature: closures,
generators, lambdas, comprehensions, maps, decorators, function objects, and
more. These tools allow us to apply and combine functions in powerful ways, and often
offer state retention and coding solutions that are alternatives to classes and OOP.

In short, **list
comprehensions apply an arbitrary expression to items in an iterable, rather than applying a function.** Accordingly, they can be more general tools. In later releases, the
comprehension was extended to other roles: sets, dictionaries, and even
generator expressions, which produce values on demand. It’s not just for lists anymore.

List Comprehensions Versus map:

Python’s built-in **ord function** returns the integer code point of a single character
(the **chr** built-in is the converse—it returns the character for an integer code point).
These happen to be ASCII codes if your characters fall into the ASCII character set’s
7-bit code point range:

In [1]:
ord('h')

104
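As a quick check of the converse relationship described above, the following sketch round-trips a character through ord and chr. The variable names code and char are illustrative only.

```python
# Round-trip a single character through ord() and chr()
code = ord('h')          # Character -> integer code point
char = chr(code)         # Integer code point -> character
print(code, char)

# The two built-ins invert each other for a single character
assert chr(ord('h')) == 'h'
```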

Now, suppose we wish to collect the ASCII codes of all characters in an entire string.
Perhaps the most straightforward approach is to use a simple for loop and append the
results to a list:

In [2]:
res = []
for x in 'spam':
    res.append(ord(x))  # Manual results collection

In [3]:
res

[115, 112, 97, 109]

In [5]:
#using map function

res = list(map(ord, 'spam')) # Apply function to sequence (or other iterable)

res

[115, 112, 97, 109]

List comprehensions collect the results of applying an arbitrary expression to an iterable
of values and return them in a new list. Syntactically, list comprehensions are enclosed
in square brackets—to remind you that they construct lists. In their simple form, within
the brackets you code an expression that names a variable followed by what looks like
a for loop header that names the same variable. Python then collects the expression’s
results for each iteration of the implied loop.

The effect of the following example is similar to that of the manual for loop and the
map call. List comprehensions become more convenient, though, when we wish to apply
an arbitrary expression to an iterable instead of a function:

In [8]:
[ord(x) for x in 'spam']

[115, 112, 97, 109]

In [9]:
#another example

[x**2 for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

To do similar work with a map call, we would probably need to invent a little
function to implement the square operation. Because we won’t need this function elsewhere,
we’d typically (but not necessarily) code it inline, with a lambda, instead of using
a def statement elsewhere:

In [10]:
list(map((lambda x: x**2), range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Adding Tests and Nested Loops: filter

In [11]:
[x for x in range(5) if x % 2 == 0]

[0, 2, 4]

In [12]:
list(filter((lambda x: x % 2 == 0), range(5)))

[0, 2, 4]

In [13]:
res = []
for x in range(5):
    if x % 2 == 0:
        res.append(x)

In [14]:
res

[0, 2, 4]

We can combine
an if clause and an arbitrary expression in our list comprehension, to give it the effect
of a filter and a map, in a single expression:

In [18]:
[x ** 2 for x in range(10) if x % 2 == 0]

[0, 4, 16, 36, 64]

In [25]:
list(map((lambda x: x**2),filter((lambda x: x % 2 == 0), range(10))))

[0, 4, 16, 36, 64]

In [4]:
res = []

for x in range(10):
    if x % 2 == 0:
        res.append((x**2))       

In [5]:
res

[0, 4, 16, 36, 64]

Formal comprehension syntax

In fact, list comprehensions are more general still. In their simplest form, you must
always code an accumulation expression and a single for clause:
    
[ expression for target in iterable ]

Though all other parts are optional, they allow richer iterations to be expressed—you
can code any number of nested for loops in a list comprehension, and each may have
an optional associated if test to act as a filter. The general structure of list comprehensions
looks like this:
    

In [None]:
[ expression for target1 in iterable1 if condition1
            for target2 in iterable2 if condition2 ...
            for targetN in iterableN if conditionN ]

This same syntax is inherited by set and dictionary comprehensions as well as the
generator expressions coming up, though these use different enclosing characters (curly
braces or often-optional parentheses), and the dictionary comprehension begins with
two expressions separated by a colon (for key and value).
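To make the shared syntax concrete, here is a minimal sketch of the three other forms, each using the same for and if clauses as a list comprehension. The names evens, squares_by_n, and pairs are illustrative assumptions, not part of the original text.

```python
# Set comprehension: curly braces around a single expression
evens = {x for x in range(10) if x % 2 == 0}

# Dictionary comprehension: curly braces, key and value separated by a colon
squares_by_n = {x: x ** 2 for x in range(5)}

# Generator expression: parentheses, results produced on demand
pairs = (x + y for x in 'ab' for y in 'XY' if y != 'X')
pair_list = list(pairs)       # Force the generator to produce all results

print(evens)
print(squares_by_n)
print(pair_list)
```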

We experimented with the if filter clause in the previous section. When for clauses
are nested within a list comprehension, they work like equivalent nested for loop statements.
For example:

In [27]:
res = [x + y for x in [0, 1, 2] for y in [100, 200, 300]]

In [28]:
res

[100, 200, 300, 101, 201, 301, 102, 202, 302]

This has the same effect as this substantially more verbose equivalent:

In [29]:
res = []

for x in [0, 1, 2]:
    for y in [100, 200, 300]:
        res.append(x+y)

In [30]:
res

[100, 200, 300, 101, 201, 301, 102, 202, 302]

Although list comprehensions construct list results, remember that they can iterate over
any sequence or other iterable type. Here’s a similar bit of code that traverses strings
instead of lists of numbers, and so collects concatenation results:

In [31]:
[ x + y for x in 'spam' for y in 'SPAM']

['sS',
 'sP',
 'sA',
 'sM',
 'pS',
 'pP',
 'pA',
 'pM',
 'aS',
 'aP',
 'aA',
 'aM',
 'mS',
 'mP',
 'mA',
 'mM']

Each for clause can have an associated if filter, no matter how deeply the loops are
nested—though use cases for the following sort of code, apart from perhaps multidimensional
arrays, start to become more and more difficult to imagine at this level:

In [32]:
[x + y for x in 'spam' if x in 'sm' for y in 'SPAM' if y in ('P', "A")]

['sP', 'sA', 'mP', 'mA']

In [33]:
[x + y + z for x in 'spam' if x in 'sm'
            for y in 'SPAM' if y in ('P', 'A')
            for z in '123' if z > '1']

['sP2', 'sP3', 'sA2', 'sA3', 'mP2', 'mP3', 'mA2', 'mA3']

Finally, here is a similar list comprehension that illustrates the effect of attached if
selections on nested for clauses applied to numeric objects rather than strings:

In [34]:
[(x,y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1]

[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]

This expression combines even numbers from 0 through 4 with odd numbers from 0
through 4. The if clauses filter out items in each iteration. Here is the equivalent
statement-based code:

In [36]:
res = []

for x in range(5):
    if x % 2 == 0:
        for y in range(5):
            if y % 2 == 1:
                res.append((x,y))

In [37]:
res

[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]

List Comprehensions and Matrixes:

One basic way
to code matrixes (a.k.a. multidimensional arrays) in Python is with nested list structures.
The following, for example, defines two 3 × 3 matrixes as lists of nested lists:

In [6]:
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

In [8]:
N = [[2, 2, 2],
     [3, 3, 3],
     [4, 4, 4]]

Given this structure, we can always index rows, and columns within rows, using normal
index operations:

In [9]:
M[1] # Row 2 -- because index starts from zero

[4, 5, 6]

In [11]:
M[1][2] # Row 2, item 3

6

List comprehensions are powerful tools for processing such structures, though, because
they automatically scan rows and columns for us. For instance, although this structure
stores the matrix by rows, to collect the second column we can simply iterate across the
rows and pull out the desired column, or iterate through positions in the rows and
index as we go:

In [12]:
[row[1] for row in M] # Column 2

[2, 5, 8]

In [14]:
[M[row][1] for row in (0,1,2)] # Using offsets

[2, 5, 8]

Given positions, we can also easily perform tasks such as pulling out a diagonal. The
first of the following expressions uses range to generate the list of offsets and then
indexes with the row and column the same, picking out M[0][0], then M[1][1], and so
on. The second scales the column index to fetch M[0][2], M[1][1], etc. (we assume the
matrix has the same number of rows and columns):

In [15]:
[M[i][i] for i in range(len(M))] # Diagonals

[1, 5, 9]

In [18]:
[M[i][len(M)-1-i] for i in range(len(M))]

[3, 5, 7]

In [19]:
[M[len(M)-1-i][i] for i in range(len(M))]

[7, 5, 3]

Changing such a matrix in place requires assignment to offsets (use range twice if shapes
differ):

In [23]:
L = [[1, 2, 3], [4, 5, 6]] 

for i in range(len(L)):
    for j in range(len(L[i])):     # Update in place
        L[i][j] +=10
        

In [24]:
L

[[11, 12, 13], [14, 15, 16]]

We can’t really do the same with list comprehensions, as they make new lists, but we
could always assign their results to the original name for a similar effect. For example,
we can apply an operation to every item in a matrix, producing results in either a simple
vector or a matrix of the same shape:

In [28]:
[col + 10 for row in M for col in row] # Assign to M to retain new value

[11, 12, 13, 14, 15, 16, 17, 18, 19]

In [31]:
[[col + 10 for col in row] for row in M]

[[11, 12, 13], [14, 15, 16], [17, 18, 19]]

To understand these, translate to their simple statement form equivalents that follow
—indent parts that are further to the right in the expression (as in the first loop in the
following), and make a new list when comprehensions are nested on the left (like the
second loop in the following). As its statement equivalent makes clearer, the second
expression in the preceding works because the row iteration is an outer loop: for each
row, it runs the nested column iteration to build up one row of the result matrix:

In [32]:
res = []

for row in M: # Statement equivalents
    for col in row: # Indent parts further right
        res.append(col + 10)

In [33]:
res

[11, 12, 13, 14, 15, 16, 17, 18, 19]

In [34]:
res = []

for row in M:
    tmp = [] # Left-nesting starts new list
    for col in row:
        tmp.append(col + 10)
    res.append(tmp)

In [35]:
res

[[11, 12, 13], [14, 15, 16], [17, 18, 19]]

Finally, with a bit of creativity, we can also use list comprehensions to combine values
of multiple matrixes. The following first builds a flat list that contains the result of
multiplying the matrixes pairwise, and then builds a nested list structure having the
same values by nesting list comprehensions again:

In [36]:
M

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [37]:
N

[[2, 2, 2], [3, 3, 3], [4, 4, 4]]

In [38]:
[M[row][col] * N[row][col] for row in range(3) for col in range(3)]

[2, 4, 6, 12, 15, 18, 28, 32, 36]

In [39]:
[[M[row][col] * N[row][col] for col in range(3)] for row in range(3)]

[[2, 4, 6], [12, 15, 18], [28, 32, 36]]

This last expression works because the row iteration is an outer loop again; it’s equivalent
to this statement-based code:

In [40]:
res = []

for row in range(3):
    tmp = []
    for col in range(3):
        tmp.append(M[row][col] * N[row][col])
    res.append(tmp)

In [41]:
res

[[2, 4, 6], [12, 15, 18], [28, 32, 36]]

And for more fun, we can use zip to pair items to be multiplied—the following comprehension
and loop statement forms both produce the same list-of-lists pairwise multiplication
result as the last preceding example (and because zip is a generator of values
in 3.X, this isn’t as inefficient as it may seem):

In [42]:
[[col1 * col2 for (col1, col2) in zip(row1, row2)] for (row1, row2) in zip(M, N)]

[[2, 4, 6], [12, 15, 18], [28, 32, 36]]

In [45]:
res = []

for (row1, row2) in zip(M, N):
    tmp = []
    for (col1, col2) in zip(row1, row2):
        tmp.append(col1 * col2)
    res.append(tmp)

In [46]:
res

[[2, 4, 6], [12, 15, 18], [28, 32, 36]]

On the other hand: performance, conciseness, expressiveness
    
Comprehensions and map calls can repay their extra complexity with a substantial
performance advantage: based on tests run under Python at the time of writing, map calls can be twice as fast
as equivalent for loops, and list comprehensions are often faster than map calls. This
speed difference can vary per usage pattern and Python version, but is generally due to the fact
that map and list comprehensions run at C language speed inside the interpreter, which
is often much faster than stepping through Python for loop bytecode within the PVM.
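Because relative speeds depend on the Python version and the operation applied, it is worth measuring rather than assuming. The following is a minimal timing sketch using the standard timeit module. The function names, the input size of 1000, and the repeat count of 200 are arbitrary choices for illustration, and no particular ranking is implied.

```python
import timeit

def with_loop():
    res = []
    for x in range(1000):
        res.append(abs(x))
    return res

def with_map():
    return list(map(abs, range(1000)))

def with_comprehension():
    return [abs(x) for x in range(1000)]

# All three build the same list; only the timing may differ
assert with_loop() == with_map() == with_comprehension()

for func in (with_loop, with_map, with_comprehension):
    seconds = timeit.timeit(func, number=200)   # Total time for 200 calls
    print('%-20s %.4f' % (func.__name__, seconds))
```

Results will differ between machines and interpreters, so treat any single run as a rough indication only.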

In addition, list comprehensions offer a code conciseness that’s compelling and even
warranted when that reduction in size doesn’t also imply a reduction in meaning for
the next programmer. Moreover, many find the expressiveness of comprehensions to
be a powerful ally. Because map and list comprehensions are both expressions, they also
can show up syntactically in places that for loop statements cannot, such as in the
bodies of lambda functions, within list and dictionary literals, and more.
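As a small illustration of that last point, the sketch below places a comprehension in a lambda body and inside a dictionary literal, two places where a for statement could not appear. The names squares and table are hypothetical.

```python
# A comprehension as the body of a lambda
squares = lambda n: [x ** 2 for x in range(n)]

# Comprehension and map results embedded in a dictionary literal
table = {
    'evens': [x for x in range(6) if x % 2 == 0],
    'codes': list(map(ord, 'abc')),
}

print(squares(4))
print(table)
```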

**Generator Functions and Expressions:**

• Generator functions (available since 2.3) are coded as normal def statements, but
use yield statements to return results one at a time, suspending and resuming their
state between each.

• Generator expressions (available since 2.4) are similar to the list comprehensions
of the prior section, but they return an object that produces results on demand
instead of building a result list.

Generator Functions: yield Versus return:

Generator functions are like normal functions in most respects, and in fact are coded
with normal def statements. However, when created, they are compiled specially into
an object that supports the iteration protocol. And when called, they don’t return a
result: they return a result generator that can appear in any iteration context.

State suspension

Unlike normal functions that return a value and exit, generator functions automatically
suspend and resume their execution and state around the point of value generation.
Because of that, they are often a useful alternative to both computing an entire series
of values up front and manually saving and restoring state in classes. The state that
generator functions retain when they are suspended includes both their code location,
and their entire local scope. Hence, their local variables retain information between
results, and make it available when the functions are resumed.

The chief code difference between generator and normal functions is that a generator
yields a value, rather than returning one—the yield statement suspends the function
and sends a value back to the caller, but retains enough state to enable the function to
resume from where it left off. When resumed, the function continues execution immediately
after the last yield run. From the function’s perspective, this allows its code
to produce a series of values over time, rather than computing them all at once and
sending them back in something like a list.
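The state retention described above can be sketched with a generator whose local variable survives between results. The running_total function here is a hypothetical example, not taken from the original text; it uses the next built-in to request one value at a time.

```python
def running_total(values):
    total = 0                  # Local state, retained between yields
    for value in values:
        total += value
        yield total            # Suspend here; resume on the next request

totals = running_total([1, 2, 3, 4])
first = next(totals)           # Runs the function up to its first yield
second = next(totals)          # Resumes with total still set to 1
rest = list(totals)            # Drains the remaining values
print(first, second, rest)
```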

Iteration protocol integration:

To truly understand generator functions, you need to know that they are closely bound
up with the notion of the iteration protocol in Python. As we’ve seen, iterator objects
define a __next__ method (next in 2.X), which either returns the next item in the iteration,
or raises the special StopIteration exception to end the iteration. An iterable
object’s iterator is fetched initially with the iter built-in function, though this step is a
no-op for objects that are their own iterator.
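To see the protocol without a for loop hiding it, here is a minimal sketch that steps through a list manually with iter and next, catching StopIteration at the end. This is roughly what a for loop does on our behalf; the variable names are illustrative.

```python
L = [1, 2, 3]
it = iter(L)                   # Fetch an iterator from the iterable
assert it is not L             # A list is not its own iterator
assert iter(it) is it          # iter() on an iterator is a no-op

collected = []
while True:
    try:
        item = next(it)        # Calls it.__next__()
    except StopIteration:      # Raised at the end of the series
        break
    collected.append(item)

print(collected)
```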

In [7]:
def gensquares(n):
    for i in range(n):
        yield i ** 2  # Resume here later

This function yields a value, and so returns to its caller, each time through the loop;
when it is resumed, its prior state is restored, including the last values of its variables
i and n, and control picks up again immediately after the yield statement. For example,
when it’s used in the body of a for loop, the first iteration starts the function and gets
its first result; thereafter, control returns to the function after its yield statement each
time through the loop:

In [8]:
for i in gensquares(5):     # Resume the function
    print(i, end=' : ')     # Print last yielded value

0 : 1 : 4 : 9 : 16 : 

In [9]:
x = gensquares(5)

In [10]:
x

<generator object gensquares at 0x0000000004DA8830>

You get back a generator object that supports the iteration protocol; the generator function was compiled to return this automatically. The returned
generator object in turn has a __next__ method that starts the function or resumes it
from where it last yielded a value, and raises a StopIteration exception when the end
of the series of values is reached and the function returns. For convenience, the
next(X) built-in calls an object’s X.__next__() method for us in 3.X (and X.next() in
2.X):

In [11]:
next(x)

0

In [12]:
next(x)

1

In [13]:
next(x)

4

In [14]:
next(x)

9

In [15]:
next(x)

16

In [16]:
next(x)

StopIteration: 

In [17]:
y = gensquares(5) # Returns a generator which is its own iterator

In [18]:
iter(y) is y # iter() is not required: a no-op here

True

In [19]:
next(y)   # Can run next() immediately

0

Why generator functions?

In [20]:
def buildsquares(n):
    res = []
    for i in range(n): res.append(i ** 2)
    return res

In [22]:
for x in buildsquares(5): print(x, end=' : ')

0 : 1 : 4 : 9 : 16 : 

In [23]:
for x in [n**2 for n in range(5)]:
    print(x, end=' : ')

0 : 1 : 4 : 9 : 16 : 

In [24]:
for x in map((lambda n: n**2), range(5)):
    print(x, end=' : ')

0 : 1 : 4 : 9 : 16 : 

However, generators can be better in terms of both memory use and performance in
larger programs. They allow functions to avoid doing all the work up front, which is
especially useful when the result lists are large or when it takes a lot of computation to
produce each value. Generators distribute the time required to produce the series of
values among loop iterations.
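One way to observe this deferred work is to record when each value is actually computed. In the sketch below, slow_squares and the calls list are hypothetical stand-ins for an expensive computation, and itertools.islice is used to request only the first three results.

```python
import itertools

calls = []                         # Records when work actually happens

def slow_squares(n):
    for i in range(n):
        calls.append(i)            # Stand-in for an expensive computation
        yield i ** 2

gen = slow_squares(1000000)
assert calls == []                 # Creating the generator runs nothing

first_three = list(itertools.islice(gen, 3))
print(first_three, calls)          # Only three values were ever computed
```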

In [27]:
def ups(line):
    for sub in line.split(','):    # Substring generator
        yield sub.upper()

In [30]:
tuple(ups('aaa,bbb,ccc')) # All iteration contexts

('AAA', 'BBB', 'CCC')

In [32]:
{i: s for (i, s) in enumerate(ups('aaa,bbb,ccc'))}

{0: 'AAA', 1: 'BBB', 2: 'CCC'}

Extended generator function protocol: send versus next

In Python 2.5, a send method was added to the generator function protocol. The send
method advances to the next item in the series of results, just like __next__, but also
provides a way for the caller to communicate with the generator, to affect its operation.

When this extra protocol is used, values are sent into a generator G by calling
G.send(value). The generator’s code is then resumed, and the yield expression in the
generator returns the value passed to send. If the regular G.__next__() method (or its
next(G) equivalent) is called to advance, the yield simply returns None. For example:

In [33]:
def gen():
    for i in range(10):
        X = yield i
        print(X)

In [34]:
G = gen()

In [35]:
next(G) # Must call next() first, to start generator

0

In [36]:
G.send(77) # Advance, and send value to yield expression

77


1

In [37]:
G.send(88)

88


2

In [38]:
next(G) # next() and X.__next__() send None

None


3

In [39]:
next(G)

None


4

The send method can be used to code a generator that its caller can terminate
by sending a termination code, or redirect by passing a new position in data
being processed inside the generator.
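A minimal sketch of both ideas follows, assuming a hypothetical countfrom generator: sending an integer redirects its position, and sending the string 'stop' (an arbitrary termination code chosen for this example) ends it.

```python
def countfrom(start):
    pos = start
    while True:
        sent = yield pos
        if sent == 'stop':         # Termination code from the caller
            return
        elif sent is not None:
            pos = sent             # Redirect to a new position
        else:
            pos += 1               # Plain next(): keep counting

G = countfrom(0)
results = [next(G), next(G)]       # Start, then advance once
results.append(G.send(10))         # Jump to position 10
results.append(next(G))            # Continue from the new position
try:
    G.send('stop')                 # Generator returns, so StopIteration is raised
except StopIteration:
    results.append('done')
print(results)
```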

In addition, generators in 2.5 and later also support a throw(type) method to raise an
exception inside the generator at the latest yield, and a close method that raises a
special GeneratorExit exception inside the generator to terminate the iteration entirely.
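The following sketch shows both methods on a hypothetical worker generator. The exception thrown in is handled inside the generator, which then yields again, and close triggers the finally block on the way out. The log list exists only to make the sequence of events visible.

```python
log = []

def worker():
    try:
        while True:
            try:
                yield 'working'
            except ValueError:          # Raised at the yield by G.throw()
                log.append('caught ValueError')
    finally:
        log.append('cleanup')           # Runs when close() raises GeneratorExit

G = worker()
next(G)                                 # Start: run to the first yield
reply = G.throw(ValueError)             # Handled inside; returns the next yielded value
G.close()                               # Raises GeneratorExit at the suspended yield
print(reply, log)
```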


Note that while Python 3.X provides a next(X) convenience built-in that calls the
X.__next__() method of an object, other generator methods, like send, must be called
as methods of generator objects directly (e.g., G.send(X)). This makes sense if you realize
that these extra methods are implemented on built-in generator objects only,
whereas the __next__ method applies to all iterator objects, both built-in types and
user-defined classes.

Generator Expressions: Iterables Meet Comprehensions

Because the delayed evaluation of generator functions was so useful, it eventually
spread to other tools. In both Python 2.X and 3.X, the notions of iterables and list
comprehensions are combined in a new tool: generator expressions. Syntactically, generator
expressions are just like normal list comprehensions, and support all their syntax
—including if filters and loop nesting—but they are enclosed in parentheses instead
of square brackets (like tuples, their enclosing parentheses are often optional):

In [40]:
[x**2 for x in range(5)] # List comprehension: build a list

[0, 1, 4, 9, 16]

In [41]:
(x**2 for x in range(5)) # Generator expression: make an iterable

<generator object <genexpr> at 0x0000000004DA84C0>

In fact, at least on a functionality basis, coding a list comprehension is essentially the
same as wrapping a generator expression in a list built-in call to force it to produce
all its results in a list at once:

In [42]:
list(x ** 2 for x in range(4)) # List comprehension equivalence

[0, 1, 4, 9]

Operationally, however, generator expressions are very different: instead of building
the result list in memory, they return a generator object—an automatically created
iterable. This iterable object in turn supports the iteration protocol to yield one piece
of the result list at a time in any iteration context. The iterable object also retains generator state while active—the variable x in the preceding expressions, along with the
generator’s code location.

In [1]:
G = (x ** 2 for x in range(4))

In [2]:
iter(G) is G     # iter(G) optional: __iter__ returns self

True

In [3]:
next(G)     # Generator objects: automatic methods

0

In [4]:
next(G)

1

In [5]:
next(G)

4

In [6]:
next(G)

9

In [7]:
next(G)

StopIteration: 

In [8]:
G

<generator object <genexpr> at 0x0000000004C6DC50>

Again, we don’t typically see the next iterator machinery under the hood of a generator
expression like this because for loops trigger it for us automatically:

In [9]:
for num in (x ** 2 for x in range(4)): # Calls next() automatically
    print('%s, %s' % (num, num / 2.0))

0, 0.0
1, 0.5
4, 2.0
9, 4.5


For example, the following deploys generator expressions in a string join method
call and a tuple assignment, both of which are iteration contexts. In the first test here, join runs the
generator and joins the substrings it produces with nothing between, simply concatenating them:

In [10]:
''.join(x.upper() for x in 'aaa,bbb,ccc'.split(','))

'AAABBBCCC'

In [11]:
a, b, c = (x + '\n' for x in 'aaa,bbb,ccc'.split(','))

In [12]:
a, c

('aaa\n', 'ccc\n')

Notice how the join call in the preceding doesn’t require extra parentheses around the
generator. Syntactically, parentheses are not required around a generator expression
that is the sole item already enclosed in parentheses used for other purposes—like those
of a function call. Parentheses are required in all other cases, however, even if they seem
extra, as in the second call to sorted that follows:

In [18]:
sum(x ** 2 for x in range(4))    # Parens optional

14

In [19]:
sorted(x ** 2 for x in range(4)) # Parens optional

[0, 1, 4, 9]

In [20]:
sorted((x ** 2 for x in range(4)), reverse=True) # Parens required

[9, 4, 1, 0]

Why generator expressions?

Just like generator functions, generator expressions are a memory-space optimization:
they do not require the entire result list to be constructed all at once, as the square-bracketed
list comprehension does. Also like generator functions, they divide the work
of results production into smaller time slices—they yield results in piecemeal fashion,
instead of making the caller wait for the full set to be created in a single call.

On the other hand, generator expressions may also run slightly slower than list comprehensions
in practice, so they are probably best used only for very large result sets,
or applications that cannot wait for full results generation.
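The space difference can be glimpsed with sys.getsizeof, which reports the size of the container object itself (not the items it refers to). This is only a rough sketch; the size n and the variable names are arbitrary, and the exact byte counts vary by Python version and platform.

```python
import sys

n = 100000
as_list = [x ** 2 for x in range(n)]    # All results built up front
as_gen = (x ** 2 for x in range(n))     # Results produced on request

print(sys.getsizeof(as_list))           # Grows with n
print(sys.getsizeof(as_gen))            # Small, and independent of n

assert sum(as_list) == sum(as_gen)      # Same values either way
```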

Generator expressions versus map:

Generator expressions
often are equivalent to 3.X map calls, because both generate result items on request. Like
list comprehensions, though, generator expressions may be simpler to code when the
operation applied is not a function call. In 2.X, map makes temporary lists and generator
expressions do not, but the same coding comparisons apply:

In [26]:
list(map(abs, (-1, -2, 3, 4))) # Map function on tuple

[1, 2, 3, 4]

In [27]:
list(abs(x) for x in (-1, -2, 3, 4)) # Generator expression

[1, 2, 3, 4]

In [28]:
list(map(lambda x: x * 2, (1, 2, 3, 4))) # Nonfunction case

[2, 4, 6, 8]

In [31]:
list(x * 2 for x in (1, 2, 3, 4)) # Simpler as generator?

[2, 4, 6, 8]

The same holds true for text-processing use cases like the join call we saw earlier—a
list comprehension makes an extra temporary list of results, which is completely pointless
in this context because the list is not retained, and map loses simplicity points compared
to generator expression syntax when the operation being applied is not a call:

In [32]:
line = 'aaa,bbb,ccc'

In [33]:
''.join([x.upper() for x in line.split(',')]) # Makes a pointless list

'AAABBBCCC'

In [34]:
''.join(x.upper() for x in line.split(',')) # Generates results

'AAABBBCCC'

In [35]:
''.join(map(str.upper, line.split(','))) # Generates results

'AAABBBCCC'

In [36]:
''.join(x * 2 for x in line.split(',')) # Simpler as generator?

'aaaaaabbbbbbcccccc'

In [37]:
''.join(map(lambda x: x * 2, line.split(',')))

'aaaaaabbbbbbcccccc'

Both map and generator expressions can also be arbitrarily nested, which supports general
use in programs, and requires a list call or other iteration context to start the
process of producing results. For example, the list comprehension in the following
produces the same result as the 3.X map and generator equivalents that follow it, but
makes two physical lists; the others generate just one integer at a time with nested
generators, and the generator expression form may more clearly reflect its intent:

In [39]:
[x * 2 for x in [abs(x) for x in (-1, -2, 3, 4)]] # Nested comprehensions

[2, 4, 6, 8]

In [40]:
list(map(lambda x: x * 2, map(abs, (-1, -2, 3, 4)))) # Nested maps

[2, 4, 6, 8]

In [42]:
list(x * 2 for x in (abs(x) for x in (-1, -2, 3, 4))) # Nested generators

[2, 4, 6, 8]

Although the effect of all three of these is to combine operations, the generators do so
without making multiple temporary lists. In 3.X, the next example both nests and
combines generators—the nested generator expression is activated by map, which in
turn is only activated by list.

In [43]:
import math

list(map(math.sqrt, (x ** 2 for x in range(4)))) # Nested combinations

[0.0, 1.0, 2.0, 3.0]

In [44]:
list(map(abs, map(abs, map(abs, (-1, 0, 1))))) # Nesting gone bad?

[1, 0, 1]

In [45]:
list(abs(x) for x in (abs(x) for x in (abs(x) for x in (-1, 0, 1))))

[1, 0, 1]

When used well, though, generator expressions combine the expressiveness of list
comprehensions with the space and time benefits of other iterables. Here, for example,
nonnested approaches provide simpler solutions but still leverage generators’ strengths
—per a Python motto, flat is generally better than nested:

In [46]:
list(abs(x) * 2 for x in (-1, -2, 3, 4)) # Unnested equivalents

[2, 4, 6, 8]

In [47]:
list(math.sqrt(x ** 2) for x in range(4)) # Flat is often better

[0.0, 1.0, 2.0, 3.0]

In [49]:
list(abs(x) for x in (-1, 0, 1))

[1, 0, 1]

Generator expressions versus filter:

Generator expressions also support all the usual list comprehension syntax—including
if clauses, which work like the filter call we met earlier. Because filter is an iterable
in 3.X that generates its results on request, a generator expression with an if clause is
operationally equivalent (in 2.X, filter produces a temporary list that the generator
does not, but the code comparisons again apply). Again, the join in the following
suffices to force all forms to produce their results:

In [50]:
line = 'aa bbb c'

In [51]:
''.join(x for x in line.split() if len(x) > 1) # Generator with 'if'

'aabbb'

In [52]:
''.join(filter(lambda x: len(x) > 1, line.split())) # Similar to filter

'aabbb'

The generator seems marginally simpler than the filter here. As for list comprehensions,
though, adding processing steps to filter results requires a map too, which makes
filter noticeably more complex than a generator expression:

In [53]:
''.join(x.upper() for x in line.split() if len(x) > 1)

'AABBB'

In [54]:
''.join(map(str.upper, filter(lambda x: len(x) > 1, line.split())))

'AABBB'

In [56]:
res = ''
for x in line.split():      # Statement equivalent?
    if len(x) > 1:           # This is also a join
        res += x.upper()

In [57]:
res

'AABBB'

The equivalent generator function requires slightly more code, but as a multiple-statement
function it will be able to code more logic and use more state information if
needed.

In [58]:
def timesfour(s):  # Generator function
    for c in s:
        yield c * 4

In [59]:
g = timesfour('spam')

In [60]:
list(g)   # Iterate automatically

['ssss', 'pppp', 'aaaa', 'mmmm']

To clients, the two are more similar than different. Both expressions and functions
support both automatic and manual iteration—the prior list call iterates automatically,
and the following iterate manually:

In [61]:
g = (c * 4 for c in 'spam')

In [62]:
i = iter(g)   # Iterate manually (expression)

In [63]:
next(i)

'ssss'

In [64]:
next(i)

'pppp'

In [65]:
h = timesfour('spam')

In [66]:
i = iter(h)   # Iterate manually (function)

In [67]:
next(i)

'ssss'

In [68]:
next(i)

'pppp'

In [72]:
line = 'aa bbb c'

In [73]:
''.join(x.upper() for x in line.split() if len(x) > 1) # Expression

'AABBB'

In [74]:
def gensub(line):                            # Function
    for x in line.split():
        if len(x) > 1:
            yield x.upper()

In [75]:
''.join(gensub(line)) # But why generate?

'AABBB'

Generators Are Single-Iteration Objects

A subtle but important point: both generator functions and generator expressions are
their own iterators and thus support just one active iteration—unlike some built-in
types, you can’t have multiple iterators of either positioned at different locations in the
set of results. Because of this, a generator’s iterator is the generator itself; in fact, as
suggested earlier, calling iter on a generator expression or function is an optional no-op:

In [76]:
G = (c * 4 for c in 'SPAM')

In [77]:
iter(G) is G # My iterator is myself: G has __next__

True

If you iterate over the results stream manually with multiple iterators, they will all point
to the same position.

Moreover, once any iteration runs to completion, all are exhausted—we have to make
a new generator to start again.

In [83]:
G = (c * 4 for c in 'SPAM') # Make a new generator    

In [84]:
I1 = iter(G) # Iterate manually

In [85]:
next(I1)

'SSSS'

In [86]:
next(I1)

'PPPP'

In [87]:
I2 = iter(G) # Second iterator at same position!

In [88]:
next(I2)

'AAAA'
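Exhaustion can be sketched the same way: once one pass has run to completion, a second pass over the same generator produces nothing, and a new generator must be created to start over. The variable names here are illustrative.

```python
G = (c * 4 for c in 'SPAM')
first_pass = list(G)           # Runs the generator to completion
second_pass = list(G)          # Already exhausted: nothing left
print(first_pass, second_pass)

G = (c * 4 for c in 'SPAM')    # A new generator starts from the beginning
third_pass = list(G)
print(third_pass)
```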

This is different from the behavior of some built-in types, which support multiple iterators
and passes and reflect their in-place changes in active iterators:

In [99]:
L = [1, 2, 3, 4]

In [100]:
I1, I2 = iter(L), iter(L)

In [101]:
next(I1)

1

In [102]:
next(I1)

2

In [103]:
next(I2)   # Lists support multiple iterators

1

In [104]:
del L[2:] # Changes reflected in iterators

In [105]:
next(I1)

StopIteration: 

Generation in Built-in Types, Tools, and Classes:

Dictionaries are iterables with iterators that produce keys on each
iteration:

In [106]:
D = {'a':1, 'b':2, 'c':3}

In [107]:
x = iter(D)

In [108]:
next(x)

'a'

In [109]:
next(x)

'b'

Generators and library tools: Directory walkers
    
Many Python standard library tools generate values
today too, including email parsers, and the standard directory walker—which at each
level of a tree yields a tuple of the current directory, its subdirectories, and its files:

In [None]:
import os

for (root, subs, files) in os.walk('.'): # Directory walk generator
    for name in files: # A Python 'find' operation
        if name.startswith('call'):
            print(root, name)

User-defined iterables in classes

It is also possible to implement arbitrary
user-defined generator objects with classes that conform to the iteration protocol. Such
classes define a special __iter__ method run by the iter built-in function, which in
turn returns an object having a __next__ method (next in 2.X) run by the next built-in
function:


In [None]:
class SomeIterable:
    def __iter__(self): ... # On iter(): return self or supplemental object
    def __next__(self): ... # On next(): coded here, or in another class
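As one possible way to fill in that skeleton, the following hypothetical Squares class is its own iterator: __iter__ returns self and __next__ produces one value per call. Like a generator, an instance coded this way supports only a single pass.

```python
class Squares:
    def __init__(self, n):
        self.n = n
        self.i = 0
    def __iter__(self):        # On iter(): this object is its own iterator
        return self
    def __next__(self):        # On next(): produce one value or stop
        if self.i >= self.n:
            raise StopIteration
        value = self.i ** 2
        self.i += 1
        return value

squares = list(Squares(5))     # list() calls iter() and then next() repeatedly
print(squares)
```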

Generating Scrambled Sequences:

Scrambling sequences:

We can reorder a sequence with slicing and concatenation,
moving the front item to the end on each loop; slicing instead of indexing the item
allows + to work for arbitrary sequence types:
    
    

In [112]:
L, S = [1, 2, 3], 'spam'

In [113]:
for i in range(len(S)): # For repeat counts 0..3
    S = S[1:] + S[:1] # Move front item to the end
    print(S, end=' ')

pams amsp mspa spam 

In [114]:
for i in range(len(L)):
    L = L[1:] + L[:1] # Slice so any sequence type works
    print(L, end=' ')

[2, 3, 1] [3, 1, 2] [1, 2, 3] 

In [115]:
def scramble(seq):
    res = []
    for i in range(len(seq)):
        res.append(seq[i:] + seq[:i])
    return res

In [116]:
scramble('spam')

['spam', 'pams', 'amsp', 'mspa']

In [117]:
def scramble(seq):
    return [seq[i:] + seq[:i] for i in range(len(seq))]

In [118]:
scramble('spam')

['spam', 'pams', 'amsp', 'mspa']

Generator functions

The preceding section’s simple approach works, but must build an entire result list in
memory all at once (not great on memory usage if it’s massive), and requires the caller
to wait until the entire list is complete (less than ideal if this takes a substantial amount
of time). We can do better on both fronts by translating this to a generator function that
yields one result at a time, using either coding scheme:

In [119]:
def scramble(seq):
    for i in range(len(seq)):
        seq = seq[1:] + seq[:1] # Generator function
        yield seq # Assignments work here

In [120]:
def scramble(seq):
    for i in range(len(seq)): # Generator function
        yield seq[i:] + seq[:i] # Yield one item per iteration

In [121]:
list(scramble('spam')) # list() generates all results

['spam', 'pams', 'amsp', 'mspa']

In [122]:
list(scramble((1, 2, 3))) # Any sequence type works

[(1, 2, 3), (2, 3, 1), (3, 1, 2)]

In [123]:
for x in scramble((1, 2, 3)): # for loops generate results
    print(x, end=' ')

(1, 2, 3) (2, 3, 1) (3, 1, 2) 
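As a variation, the same on-demand behavior can likely be had without yield by returning a generator expression from a normal function. This sketch redefines scramble for illustration; it is an alternative coding, not a replacement for the versions above.

```python
def scramble(seq):
    # Return a generator expression: results are still produced on demand
    return (seq[i:] + seq[:i] for i in range(len(seq)))

results = list(scramble('spam'))
print(results)
```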