### Generators

We'll look at generator expressions first.

Generator expressions use essentially the same comprehension syntax as list comprehensions, but are written with parentheses instead of square brackets.

Let's see an example of a list comprehension:

We want to get a list of numbers whose square root and cube root are both integers.

In [1]:
from math import sqrt, cbrt
# cbrt is only available in Python 3.11 and higher

lst = [
    (i, int(sq), int(cb))
    for i in range(2, 100)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
]

In [2]:
lst

[(64, 8, 4)]

Now, I want the first 5 numbers, but I have no idea what range I should use, so I'm going to try some large number:

In [3]:
lst = [
    (i, int(sq), int(cb))
    for i in range(2, 10_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
]
lst

Still not enough, let's go higher:

In [4]:
lst = [
    (i, int(sq), int(cb))
    for i in range(2, 100_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
]
lst

[(64, 8, 4), (729, 27, 9), (4096, 64, 16), (15625, 125, 25), (46656, 216, 36)]

Ok, so I have 5 numbers. But notice how I ended up doing more calculations than I actually needed (I ran the loop up to `100_000`, but I only really needed to go as far as `46_656`).

Now, what if I wanted 10? We'll have to go higher yet:

In [5]:
lst = [
    (i, int(sq), int(cb))
    for i in range(2, 5_000_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
]
lst

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121),
 (2985984, 1728, 144),
 (4826809, 2197, 169)]

Ok, so I have 12, more than I actually needed, but that will work to get the first 10:

In [6]:
lst = [
    (i, int(sq), int(cb))
    for i in range(2, 5_000_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
]
results = lst[:10]
results

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121)]

There are a number of problems with this approach:
- If I want the first 10, or 20, or 30, or N in general, I have no idea what that range should be
- I may have a program where I don't know ahead of time how many numbers I will need from that result - how to get around that?
- I could just pick a huge number for the range, and hope that it will cover all the cases I need - but that's wasteful.

So, instead of using a list comprehension, we could try a generator expression. The main advantage here is that generator expressions only calculate one "iteration" at a time, so we don't waste time or memory producing more results than we need.

But, we still need to specify some range - however that has no upfront cost:

In [7]:
results = (
    (i, int(sq), int(cb))
    for i in range(2, 50_000_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
)

Now, we want the first 10 elements - we can iterate over `results` but we cannot slice it:

In [8]:
extract = [next(results) for _ in range(10)]
extract

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121)]
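To make the "iterate but not slice" point concrete, here is a minimal sketch using a simpler generator expression (the names `squares`, `first_three` and `next_value` are just for illustration). Slicing raises a `TypeError`, while `next` pulls one value at a time and the generator remembers where it left off:

```python
# A lazy stream of squares - nothing is calculated until we ask for a value
squares = (n * n for n in range(1, 1_000_000))

# Generators do not support indexing or slicing
try:
    squares[:3]
except TypeError as ex:
    error = str(ex)

# But we can request values one at a time
first_three = [next(squares) for _ in range(3)]

# The generator continues from where it stopped, it does not start over
next_value = next(squares)

print(error)
print(first_three, next_value)
```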

Of course, the generator has now been partially consumed, so if we want to "restart" from the beginning we just need to redefine it.

We'll do that, and use the `itertools` module's `islice` to get the first 10 items:

In [9]:
from itertools import islice

In [10]:
results = (
    (i, int(sq), int(cb))
    for i in range(2, 50_000_000)
    if (sq := sqrt(i)) == int(sq) and (cb := cbrt(i)) == int(cb)
)
list(islice(results, 10))

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121)]

We still have the issue of specifying that huge range - it works, but maybe I need to get the first 50 numbers, in which case that range may not be large enough - so we do not have a general solution.

So, in this case, a generator expression gives us a partially better solution than a list comprehension, but an even better solution can be achieved using a generator function.

In [11]:
def numbers():
    number = 2
    while True:
        sq = sqrt(number)
        cb = cbrt(number)
        if sq == int(sq) and cb == int(cb):
            yield number, int(sq), int(cb)
        number += 1

You'll notice that `numbers` will basically produce an **infinite** number of results.

We can now use `islice` like before:

In [12]:
list(islice(numbers(), 10))

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121)]

And of course, this is a completely general solution:

In [13]:
list(islice(numbers(), 3))

[(64, 8, 4), (729, 27, 9), (4096, 64, 16)]

In [14]:
list(islice(numbers(), 15))

[(64, 8, 4),
 (729, 27, 9),
 (4096, 64, 16),
 (15625, 125, 25),
 (46656, 216, 36),
 (117649, 343, 49),
 (262144, 512, 64),
 (531441, 729, 81),
 (1000000, 1000, 100),
 (1771561, 1331, 121),
 (2985984, 1728, 144),
 (4826809, 2197, 169),
 (7529536, 2744, 196),
 (11390625, 3375, 225),
 (16777216, 4096, 256)]

So, although generators and generator expressions are "single-use" objects - they are cheap to create, so if you need to iterate through the same generator multiple times, you simply re-create it.
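Here is a small sketch of what "single-use" means in practice, using a hypothetical `countdown` generator function that is not part of the examples above. Once a generator is exhausted it stays empty, and calling the function again gives us a fresh one:

```python
def countdown(start):
    # Yields start, start - 1, ..., 1
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
first_pass = list(gen)       # consumes the generator
second_pass = list(gen)      # nothing left to produce
fresh = list(countdown(3))   # a new generator starts from the beginning

print(first_pass, second_pass, fresh)
```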

The advantage is that generators do not hog memory to store all the results, and do not incur upfront costs calculating more values than we actually need.
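As a rough illustration of the memory point, we can compare a list comprehension and a generator expression over the same range. Keep in mind that `sys.getsizeof` only reports the size of the container object itself (not the integers it references), so treat this as an indication rather than a precise measurement:

```python
import sys

as_list = [n * n for n in range(100_000)]  # all 100,000 results are stored
as_gen = (n * n for n in range(100_000))   # no results calculated yet

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

print(f"list: {list_size:,} bytes, generator: {gen_size:,} bytes")
```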

You will notice that in Python 3, most of the built-in and standard library functions used for iteration return lazy iterators that behave much like generators, rather than lists - functions like `zip`, `enumerate`, `itertools.islice`, and many many more.

There's a good reason for that, and you should make use of generators and generator expressions a lot more!
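Because these functions are lazy, they can even be combined with infinite generators. The sketch below assumes a hypothetical `evens` generator function that never terminates. `zip` and `enumerate` do no work until `islice` asks for values:

```python
from itertools import count, islice

def evens():
    # Infinite generator: 0, 2, 4, ...
    n = 0
    while True:
        yield n
        n += 2

# Both inputs are infinite, but nothing has been calculated yet
pairs = zip(count(1), evens())
labelled = enumerate(evens(), start=1)

first_pairs = list(islice(pairs, 3))
first_labelled = list(islice(labelled, 3))

print(first_pairs)
print(first_labelled)
```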

But, there's even more you can do with generator functions.

You're probably aware of context managers - the most common one is the context manager we use to open (and automatically close) files.

In [15]:
with open("planets.txt", "w") as f:
    f.writelines(["Mercury\n", "Venus\n", "Earth\n", "Mars\n"])

In [16]:
with open("planets.txt") as f:
    for line in f:
        print(line, end="")

Mercury
Venus
Earth
Mars


You'll notice how we did not need to close the file - that happened automatically once the context manager was exited.

We can write our own context manager using a generator function decorated with the `contextmanager` decorator from the `contextlib` module:

In [17]:
from contextlib import contextmanager

Let's start by writing a context manager skeleton:

In [18]:
@contextmanager
def echo():
    try:
        print("Entering context manager")
        yield lambda x: f"echo says: {x}"
    except Exception as ex:
        print(f"An exception occurred - we may want to handle it, or not: {ex}")
    finally:
        print("Exiting context manager - runs whether an exception occurred or not")

In [19]:
with echo() as func:
    print(func("hello"))
    print(func("bye"))

Entering context manager
echo says: hello
echo says: bye
Exiting context manager - runs whether an exception occurred or not
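For comparison, here is roughly the same idea written as a class with `__enter__` and `__exit__` methods. This is only a sketch (the `Echo` class and its `log` attribute are made up for illustration, and it records events in a list instead of printing), but it shows what the decorator saves us from writing: the code before the `yield` plays the role of `__enter__`, the yielded value is what `as` binds to, and the code after the `yield` plays the role of `__exit__`:

```python
class Echo:
    def __init__(self):
        self.log = []

    def __enter__(self):
        # Corresponds to the code before the yield
        self.log.append("enter")
        return lambda x: f"echo says: {x}"

    def __exit__(self, exc_type, exc_value, traceback):
        # Corresponds to the code after the yield
        self.log.append("exit")
        return False  # False means: do not suppress exceptions

cm = Echo()
with cm as func:
    message = func("hello")

print(message)
print(cm.log)
```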


And the exit happens whether an exception occurs inside the context manager or not:

In [20]:
with echo() as func:
    print(func("hello"))
    raise ValueError("Wrong value")

Entering context manager
echo says: hello
An exception occurred - we may want to handle it, or not: Wrong value
Exiting context manager - runs whether an exception occurred or not
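Notice that no traceback was shown: because the `except` block handles the exception and does not re-raise it, the exception is suppressed and the program continues after the `with` block. Whether that is what you want depends on the situation. The sketch below (with made-up names `swallowing` and `reraising`, and an `events` list instead of printing) contrasts the two behaviors:

```python
from contextlib import contextmanager

events = []

@contextmanager
def swallowing():
    try:
        yield
    except Exception as ex:
        # Handled here and not re-raised, so the caller never sees it
        events.append(f"swallowed: {ex}")

@contextmanager
def reraising():
    try:
        yield
    except Exception as ex:
        events.append(f"logged: {ex}")
        raise  # let the caller deal with it
    finally:
        events.append("cleanup")

with swallowing():
    raise ValueError("first")
events.append("continued after swallowing")

try:
    with reraising():
        raise ValueError("second")
except ValueError:
    events.append("caller saw the exception")

print(events)
```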


The exception could even be caused by the yielded function itself:

In [21]:
@contextmanager
def square():
    try:
        print("Entering context manager")
        yield lambda x: x * x
    except Exception as ex:
        print(f"An exception occurred - we may want to handle it, or not: {ex}")
    finally:
        print("Exiting context manager - runs whether an exception occurred or not")

In [22]:
with square() as func:
    print(func(10))

Entering context manager
100
Exiting context manager - runs whether an exception occurred or not


In [None]:
with square() as func:
    print(func('a'))

Now let's apply this to something more practical.

When we read/write CSV files we can use the `csv` module.

Let's start by creating a CSV file:

In [None]:
import csv

In [None]:
headers = ["first_name", "last_name"]
data = [
    ("Isaac", "Newton"),
    ("Gottfried", "Leibniz"),
    ("Joseph", "Fourier"),
    ("John", "von Neumann"),
]

In [None]:
# newline="" lets the csv module control line endings (recommended by the csv docs)
with open("test.csv", "w", newline="") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(headers)
    for row in data:
        csv_writer.writerow(row)

We can check the contents of that file:

In [None]:
with open("test.csv") as f:
    print(f.readlines())

And of course we can open that file using a CSV reader:

In [None]:
with open("test.csv", newline="") as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        print(row)

So, you'll notice that every time we want to write a CSV file, we need to open the file, set up the csv writer, do whatever we want, then close the file. Same goes for the reader.

We could combine these different steps using a custom context manager.

In [None]:
@contextmanager
def csv_reader(file_path):
    with open(file_path, newline="") as f:
        yield csv.reader(f)

Now we can use it this way:

In [None]:
with csv_reader("test.csv") as reader:
    for row in reader:
        print(row)

We could do the same with the writer:

In [None]:
@contextmanager
def csv_writer(file_path):
    with open(file_path, "w", newline="") as f:
        yield csv.writer(f)

In [None]:
with csv_writer("test2.csv") as writer:
    writer.writerow(['a', 'b'])

In [None]:
with csv_reader("test2.csv") as reader:
    for row in reader:
        print(row)

We could even combine both of those into a single context manager:

In [None]:
@contextmanager
def csv_file(file_path, *, mode="r"):
    if mode == "r":
        with open(file_path, newline="") as f:
            yield csv.reader(f)
    elif mode in {'w', 'a'}:
        with open(file_path, mode, newline="") as f:
            yield csv.writer(f)
    else:
        raise ValueError("Unsupported mode - must be one of 'r', 'w', 'a'")

In [None]:
with csv_file("test3.csv", mode="w") as writer:
    writer.writerow(list('abc'))
    writer.writerow(list('def'))

In [None]:
with csv_file("test3.csv") as reader:
    for row in reader:
        print(row)
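Here is a self-contained sketch that also exercises the append mode and the unsupported-mode branch. It repeats the `csv_file` definition so it can run on its own, opens files with `newline=""` as the `csv` documentation recommends, and writes to a temporary directory (the `demo.csv` file name is just an assumption for this example):

```python
import csv
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def csv_file(file_path, *, mode="r"):
    if mode == "r":
        with open(file_path, newline="") as f:
            yield csv.reader(f)
    elif mode in {"w", "a"}:
        with open(file_path, mode, newline="") as f:
            yield csv.writer(f)
    else:
        raise ValueError("Unsupported mode - must be one of 'r', 'w', 'a'")

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with csv_file(path, mode="w") as writer:
    writer.writerow(["a", "b"])

# Append mode adds rows to the existing file
with csv_file(path, mode="a") as writer:
    writer.writerow(["c", "d"])

with csv_file(path) as reader:
    rows = list(reader)

# An unsupported mode fails as soon as we enter the with block
try:
    with csv_file(path, mode="x"):
        pass
except ValueError as ex:
    error = str(ex)

print(rows)
print(error)
```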

Of course, this is just one example of using a context manager. Context managers are also often used with databases, for example to start a transaction and then commit it (or roll it back if something goes wrong).
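As a sketch of the database case, here is one possible way to wrap a transaction using the standard library's `sqlite3` module and an in-memory database. The `transaction` function and the `people` table are assumptions made for this example, not something defined earlier, and real database libraries often ship their own transaction helpers:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except Exception:
        connection.rollback()  # undo everything done inside the with block
        raise
    else:
        connection.commit()    # only reached if no exception occurred
    finally:
        cursor.close()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")

with transaction(conn) as cur:
    cur.execute("INSERT INTO people VALUES (?, ?)", ("Isaac", "Newton"))

# This insert is rolled back because the block raises an exception
try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO people VALUES (?, ?)", ("Joseph", "Fourier"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = conn.execute("SELECT first_name, last_name FROM people").fetchall()
print(rows)
```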

In general, when you find yourself writing code that:

1. sets up some resource(s)
2. does something with those resources
3. cleans up the resource(s) after you're done

you should immediately think of context managers - and generator functions can be used to very quickly and simply create context managers.
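The set up / use / clean up pattern above can be sketched with a small, hypothetical `temporary_env` context manager that sets an environment variable and restores the previous state afterwards (the `DEMO_SETTING` variable name is made up for this example). The `try`/`finally` around the `yield` is what guarantees the clean up step runs:

```python
import os
from contextlib import contextmanager

@contextmanager
def temporary_env(name, value):
    # 1. set up: remember the old value and install the new one
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        # 2. hand control to the body of the with block
        yield
    finally:
        # 3. clean up: restore the previous state, even if an exception occurred
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous

os.environ.pop("DEMO_SETTING", None)  # make sure we start without it

with temporary_env("DEMO_SETTING", "on"):
    inside = os.environ["DEMO_SETTING"]

after = os.environ.get("DEMO_SETTING")
print(inside, after)
```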

I will cover context managers in more detail in upcoming videos in this channel, stay tuned!

In conclusion, generator functions and expressions are fundamental to Python, and you should learn how to use them appropriately and effectively for more Pythonic code.