# Generators - Overview

What is a generator? A generator is a special kind of function in Python whose result can be used as an iterator (i.e. in a `for` loop). Let me quote from the nice reference https://realpython.com/introduction-to-python-generators/ here:

"
Introduced with PEP 255, generator functions are a special kind of function that return a lazy iterator. These are objects that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory.
"

Why would you need this?

"
Have you ever had to work with a dataset so large that it overwhelmed your machine’s memory? Or maybe you have a complex function that needs to maintain an internal state every time it’s called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.
"

Still not making sense? Let's try to explain with some examples.

As an example of this (following closely https://realpython.com/introduction-to-python-generators/), let's look at reading in our `stm.txt` file. This is on Moodle for those following the course, but we'll also generate a similar file here to illustrate the same points and make this more portable.

In [2]:
# The with-block closes the file automatically once writing is done.
with open('stm.txt', 'w') as fpointer:
    for i in range(1000):
        fpointer.write('2 2 4 5 6\n')

Let's start by writing a function that reads the whole file and splits it into lines, using tools we've already seen:

In [4]:
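def txt_reader(file_name):
    # Read the whole file into memory at once; the with-block closes it afterwards.
    with open(file_name) as file:
        lines = file.read().split("\n")
    # The file ends with a newline, so the last element of lines is an empty string.
    result = [line.split(' ') for line in lines]
    # result is a list of lists: result[i] is the ith row and result[i][j] is
    # the jth column in the ith row.
    return result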

txt_gen = txt_reader("stm.txt")
row_count = 0

for row in txt_gen:
    row_count += 1

print(f"Row count is {row_count}")

# Note that the entire file is stored in memory, so we can quickly get a specific value:
print(txt_gen[2][4])


Row count is 1001
6
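
The count above is 1001 rather than 1000 because the file ends with a newline, so `split("\n")` produces one extra, empty string after the last line. A small sketch of the difference between `split("\n")` and `splitlines()` (the sample string is made up for illustration):

```python
# A made-up three-line string that, like stm.txt, ends with a newline.
text = "2 2 4 5 6\n2 2 4 5 6\n2 2 4 5 6\n"

rows_split = text.split("\n")        # keeps an empty string after the final "\n"
rows_splitlines = text.splitlines()  # does not produce that trailing empty string

print(len(rows_split))       # 4
print(len(rows_splitlines))  # 3
print(repr(rows_split[-1]))  # ''
```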


However, if this file were extremely long, it might not be possible to hold the whole thing in memory (or you might simply not want to be so wasteful with memory). So if we just want to do something that reads the file in linear order (for example counting the number of lines in the file, or counting how often 123 occurs in it), we can use a generator instead. This looks something like:

In [5]:
def txt_reader(file_name):
    # Yield one line at a time; only the current line is held in memory.
    with open(file_name, "r") as file:
        for row in file:
            yield row

txt_gen = txt_reader("stm.txt")
row_count = 0

for row in txt_gen:
    row_count += 1

print(f"Row count is {row_count}")

# Note that the entire file is NOT stored in memory, and a generator cannot be indexed, so we CANNOT do:
# print(txt_gen[2][4])
# to quickly get a specific value (it would raise a TypeError).


Row count is 1000
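
As a sketch of the kind of linear pass mentioned earlier, the generator can feed a running count without the whole file ever being held in memory. The file name `demo_stm.txt`, the helper `count_token` and the token being counted are illustrative assumptions, not part of the course material. The last few lines also show that a generator can only be consumed once.

```python
import os
import tempfile

def txt_reader(file_name):
    # Yield one line at a time; the with-block closes the file when done.
    with open(file_name, "r") as file:
        for row in file:
            yield row

def count_token(file_name, token):
    # Hypothetical helper: count how often `token` appears as a
    # space-separated entry, reading the file one line at a time.
    total = 0
    for row in txt_reader(file_name):
        total += row.split().count(token)
    return total

# Build a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo_stm.txt")
with open(path, "w") as file:
    for _ in range(1000):
        file.write("2 2 4 5 6\n")

twos = count_token(path, "2")
print(twos)  # 2000: two matching entries per line, 1000 lines

# A generator is exhausted after one full pass; a second loop sees nothing.
gen = txt_reader(path)
first_pass = 0
for _ in gen:
    first_pass += 1
second_pass = 0
for _ in gen:
    second_pass += 1
print(first_pass, second_pass)  # 1000 0
```

To go through the file again, simply call `txt_reader(path)` again to get a fresh generator.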


So let's try and break down how this works. The magic here is the `yield` statement. Any function whose body contains `yield` is treated as a *generator function*: calling it does not run the body, but instead returns a generator object. When that object is used as an iterator (i.e. in a `for` loop), the body runs until it reaches the first `yield`, and the value after `yield` becomes the first value of the iteration. The function then pauses, keeping its local variables, and resumes from that point when the next value is requested, running until `yield` is reached again to give the second value, and so on. When the function returns without reaching another `yield`, the iteration stops. So as a simple example of generating integers between 1 and 100 you can do:

In [7]:
def integer_generator():
    i = 1
    while i <= 100:
        yield i
        i += 1

gen = integer_generator()

for curr_int in gen:
    print(curr_int)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
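
To make the way a generator pauses at each `yield` more concrete, here is a small sketch that records when each part of a generator function's body runs. The function name `traced` and the `log` list are illustrative only.

```python
log = []

def traced():
    log.append("body started")
    yield "a"
    log.append("resumed after first yield")
    yield "b"
    log.append("body finished")

gen = traced()
# Calling the function only creates the generator object;
# none of the body has run yet, so the log is still empty.
log_before_loop = list(log)
print(log_before_loop)  # []

for value in gen:
    log.append(f"loop received {value}")

for entry in log:
    print(entry)
# body started
# loop received a
# resumed after first yield
# loop received b
# body finished
```

The interleaved log entries show that the body and the loop take turns: the body runs up to a `yield`, the loop handles that value, and only then does the body continue.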


I emphasize that this is just a simple example; for such a case the built-in `range` should be used. (`range` is lazy too, but it is not a generator: it is a sequence type that also supports `len`, indexing and repeated iteration.) But you can also make this an infinite integer generator. The following will just keep generating integers until you terminate the process (Kernel -> Interrupt will stop this):

In [8]:
def integer_generator():
    i = 1
    while True:
        yield i
        i += 1

gen = integer_generator()

for curr_int in gen:
    print(curr_int)
    # I have to include a break here or the HTML generation code gets stuck!!
    if curr_int > 67:
        break

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
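
Instead of breaking out of the loop by hand, a bounded number of values can be taken from an infinite generator with `itertools.islice` from the standard library. A minimal sketch, repeating the same `integer_generator` definition so that it runs on its own:

```python
from itertools import islice

def integer_generator():
    i = 1
    while True:
        yield i
        i += 1

# islice stops after 5 items, so the infinite loop inside the generator
# is never asked for a sixth value.
first_five = list(islice(integer_generator(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```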


We can also use the built-in `next` function, which simply obtains the next value from the generator. So in the previous case we can do:

In [9]:
def integer_generator():
    i = 1
    while True:
        yield i
        i += 1

gen = integer_generator()

print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))

1
2
3
4
5
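
With a finite generator, `next` raises `StopIteration` once the values run out; this is the same signal a `for` loop uses internally to know when to stop. Passing a second argument to `next` returns that default instead of raising. A short sketch (the generator `two_values` is made up for illustration):

```python
def two_values():
    yield "first"
    yield "second"

gen = two_values()
print(next(gen))  # first
print(next(gen))  # second

# A third call has nothing left to yield, so StopIteration is raised.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# With a default, next() returns it instead of raising.
fallback = next(gen, None)
print(fallback)  # None
```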
