# Pagination with iterators and generators

We'll simulate fetching data from a server that limits each page to 5 records.

First we need some data.

In [1]:
# characters in the movie versions of "Dune", and their respective actors
dune_casts = [
{"character": "Alia Atreides", "actor_1984": "Alicia Witt", "actor_2000": "Laura Burton", "actor_2021": "Anya Taylor-Joy"},
{"character": "Baron Vladimir Harkonnen", "actor_1984": "Kenneth McMillan", "actor_2000": "Ian McNeice", "actor_2021": "Stellan Skarsg\u00e5rd"},
{"character": "Chani", "actor_1984": "Sean Young", "actor_2000": "Barbora Kodetov\u00e1", "actor_2021": "Zendaya"},
{"character": "Dr. Liet-Kynes", "actor_1984": "Max von Sydow", "actor_2000": "Karel Dobr\u00fd", "actor_2021": "Sharon Duncan-Brewster"},
{"character": "Dr. Yueh", "actor_1984": "Dean Stockwell", "actor_2000": "Robert Russell", "actor_2021": "Chen Chang"},
{"character": "Duke Leto Atreides", "actor_1984": "J\u00fcrgen Prochnow", "actor_2000": "William Hurt", "actor_2021": "Oscar Isaac"},
{"character": "Duncan Idaho", "actor_1984": "Richard Jordan", "actor_2000": "James Watson", "actor_2021": "Jason Momoa"},
{"character": "Feyd-Rautha Harkonnen", "actor_1984": "Sting", "actor_2000": "Matt Keeslar", "actor_2021": "Austin Butler"},
{"character": "Glossu Beast Rabban", "actor_1984": "Paul L. Smith", "actor_2000": "L\u00e1szl\u00f3 I. Kish", "actor_2021": "Dave Bautista"},
{"character": "Gurney Halleck", "actor_1984": "Patrick Stewart", "actor_2000": "P.H. Moriarty", "actor_2021": "Josh Brolin"},
{"character": "Harah", "actor_1984": "Molly Wryn", "actor_2000": "...", "actor_2021": "Gloria Obianyo"},
{"character": "Jamis", "actor_1984": "Judd Omen", "actor_2000": "Christopher Lee Brown", "actor_2021": "Babs Olusanmokun"},
{"character": "Lady Jessica Atreides", "actor_1984": "Francesca Annis", "actor_2000": "Saskia Reeves", "actor_2021": "Rebecca Ferguson"},
{"character": "Otheym", "actor_1984": "Honorato Magaloni", "actor_2000": "Jakob Schwarz", "actor_2021": "..."},
{"character": "Padishah Emperor Shaddam IV", "actor_1984": "Jos\u00e9 Ferrer", "actor_2000": "Giancarlo Giannini", "actor_2021": "Christopher Walken"},
{"character": "Paul Atreides", "actor_1984": "Kyle MacLachlan", "actor_2000": "Alec Newman", "actor_2021": "Timoth\u00e9e Chalamet"},
{"character": "Piter De Vries", "actor_1984": "Brad Dourif", "actor_2000": "Jan Unger", "actor_2021": "David Dastmalchian"},
{"character": "Princess Irulan", "actor_1984": "Virginia Madsen", "actor_2000": "Julie Cox", "actor_2021": "Florence Pugh"},
{"character": "Reverend Mother Gaius Helen Mohiam", "actor_1984": "Si\u00e2n Phillips", "actor_2000": "Zuzana Geislerov\u00e1", "actor_2021": "Charlotte Rampling"},
{"character": "Reverend Mother Ramallo", "actor_1984": "Silvana Mangano", "actor_2000": "Drahom\u00edra Fialkov\u00e1", "actor_2021": "Giusi Merli"},
{"character": "Shadout Mapes", "actor_1984": "Linda Hunt", "actor_2000": "Jaroslava Siktancova", "actor_2021": "Golda Rosheuvel"},
{"character": "Stilgar", "actor_1984": "Everett McGill", "actor_2000": "Uwe Ochsenknecht", "actor_2021": "Javier Bardem"},
{"character": "Thufir Hawat", "actor_1984": "Freddie Jones", "actor_2000": "Jan Vlas\u00e1k", "actor_2021": "Stephen McKinley Henderson"}
]


This function simulates the server for this data. It returns the records in pages of 5, using the Python `yield` statement to return each page in turn. Python 3.12 added the `batched` function to the standard library's `itertools` module. `batched` returns a lazy iterator that produces each 5-record batch as a tuple (the last batch may be shorter).

In [2]:
def server_pages():
    """
    Return all records in pages of 5 at a time
    """
    import itertools

    for page in itertools.batched(dune_casts, 5):
        print(f"Server: send page of {len(page)} records")
        yield page
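
Since `itertools.batched` is only available from Python 3.12 onward, here is a minimal sketch of how similar batching could be done on older versions. The name `batched_fallback` is hypothetical and not part of the standard library; it relies only on `itertools.islice`.

```python
import itertools


def batched_fallback(iterable, n):
    """Yield successive tuples of up to n items (hypothetical helper, not stdlib)."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    # islice pulls at most n items per pass; an empty tuple means the input is exhausted
    while batch := tuple(itertools.islice(it, n)):
        yield batch


# 23 items in batches of 5 gives four full batches and a final batch of 3
sizes = [len(batch) for batch in batched_fallback(range(23), 5)]
print(sizes)
```

Like `server_pages()`, this is a generator function, so no batch is built until the caller asks for it.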

Just calling this function doesn't run any of its code yet; it only gives us a generator:

In [3]:
pages = server_pages()
print(pages)
print(type(pages))

<generator object server_pages at 0x000001E911560F40>
<class 'generator'>


We can manually iterate over a generator using `next()`.

In [4]:
print("First page")
print(next(pages))

print("Second page")
print(next(pages))

First page
Server: send page of 5 records
({'character': 'Alia Atreides', 'actor_1984': 'Alicia Witt', 'actor_2000': 'Laura Burton', 'actor_2021': 'Anya Taylor-Joy'}, {'character': 'Baron Vladimir Harkonnen', 'actor_1984': 'Kenneth McMillan', 'actor_2000': 'Ian McNeice', 'actor_2021': 'Stellan Skarsgård'}, {'character': 'Chani', 'actor_1984': 'Sean Young', 'actor_2000': 'Barbora Kodetová', 'actor_2021': 'Zendaya'}, {'character': 'Dr. Liet-Kynes', 'actor_1984': 'Max von Sydow', 'actor_2000': 'Karel Dobrý', 'actor_2021': 'Sharon Duncan-Brewster'}, {'character': 'Dr. Yueh', 'actor_1984': 'Dean Stockwell', 'actor_2000': 'Robert Russell', 'actor_2021': 'Chen Chang'})
Second page
Server: send page of 5 records
({'character': 'Duke Leto Atreides', 'actor_1984': 'Jürgen Prochnow', 'actor_2000': 'William Hurt', 'actor_2021': 'Oscar Isaac'}, {'character': 'Duncan Idaho', 'actor_1984': 'Richard Jordan', 'actor_2000': 'James Watson', 'actor_2021': 'Jason Momoa'}, {'character': 'Feyd-Rautha Harkonn

Note that `server_pages()` doesn't print the message "Server: send page of 5 records" until the client pulls another page by calling `next()` on the generator.
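
One way to observe this pausing behavior is with `inspect.getgeneratorstate()` from the standard library. The sketch below uses a hypothetical one-page generator, `tiny_pages`, rather than the notebook's `server_pages()`, so that it is self-contained.

```python
import inspect


def tiny_pages():
    """Hypothetical stand-in for server_pages(): yields a single page."""
    yield ("a", "b")


gen = tiny_pages()
state_created = inspect.getgeneratorstate(gen)    # body has not started running

next(gen)                                         # runs up to the first yield
state_suspended = inspect.getgeneratorstate(gen)  # paused at the yield

next(gen, None)                                   # runs off the end of the function
state_closed = inspect.getgeneratorstate(gen)     # finished

print(state_created, state_suspended, state_closed)
```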

If we keep calling `next()`, we eventually reach the end of the generator, at which point Python raises the `StopIteration` exception.

In [5]:
print(next(pages))
print(next(pages))
print(next(pages))

# this is one iteration too far
print(next(pages))


Server: send page of 5 records
({'character': 'Harah', 'actor_1984': 'Molly Wryn', 'actor_2000': '...', 'actor_2021': 'Gloria Obianyo'}, {'character': 'Jamis', 'actor_1984': 'Judd Omen', 'actor_2000': 'Christopher Lee Brown', 'actor_2021': 'Babs Olusanmokun'}, {'character': 'Lady Jessica Atreides', 'actor_1984': 'Francesca Annis', 'actor_2000': 'Saskia Reeves', 'actor_2021': 'Rebecca Ferguson'}, {'character': 'Otheym', 'actor_1984': 'Honorato Magaloni', 'actor_2000': 'Jakob Schwarz', 'actor_2021': '...'}, {'character': 'Padishah Emperor Shaddam IV', 'actor_1984': 'José Ferrer', 'actor_2000': 'Giancarlo Giannini', 'actor_2021': 'Christopher Walken'})
Server: send page of 5 records
({'character': 'Paul Atreides', 'actor_1984': 'Kyle MacLachlan', 'actor_2000': 'Alec Newman', 'actor_2021': 'Timothée Chalamet'}, {'character': 'Piter De Vries', 'actor_1984': 'Brad Dourif', 'actor_2000': 'Jan Unger', 'actor_2021': 'David Dastmalchian'}, {'character': 'Princess Irulan', 'actor_1984': 'Virginia

StopIteration: 

At this point, all future calls to `next()` on the `pages` generator will continue to raise `StopIteration`.

In [6]:
print(next(pages))

StopIteration: 
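
If an exception is not wanted, the built-in `next()` also accepts a second argument that is returned once the generator is exhausted. A small sketch, using a hypothetical two-page generator `two_pages` in place of the notebook's data:

```python
def two_pages():
    """Hypothetical stand-in generator with only two pages."""
    yield ("a", "b")
    yield ("c",)


pages_demo = two_pages()
first = next(pages_demo, None)
second = next(pages_demo, None)
third = next(pages_demo, None)   # exhausted: the default comes back, no exception
fourth = next(pages_demo, None)  # still exhausted, still the default

print(first, second, third, fourth)
```

Using `None` as the default only works when `None` can never be a legitimate page; otherwise a dedicated sentinel object is a safer choice.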

Rather than making explicit calls to `next()`, we would more often iterate over all the pages using a basic for-loop. A Python for-loop over a generator will:

- implicitly call `next()` for each pass through the loop
- catch the trailing `StopIteration` exception as the signal to stop looping and move on to the next statement
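
The two steps above can be sketched as an explicit loop. This is only an approximation of what a for-loop does behind the scenes, and the names `manual_for_loop` and `demo_pages` are hypothetical helpers for illustration.

```python
def demo_pages():
    """Hypothetical stand-in generator yielding two small pages."""
    yield ("a", "b")
    yield ("c",)


def manual_for_loop(iterable):
    """Collect items roughly the way `for item in iterable:` would visit them."""
    collected = []
    iterator = iter(iterable)      # a for-loop first asks for an iterator
    while True:
        try:
            item = next(iterator)  # implicit next() on each pass
        except StopIteration:
            break                  # exhaustion ends the loop quietly
        collected.append(item)
    return collected


manual = manual_for_loop(demo_pages())
with_for = [page for page in demo_pages()]
print(manual == with_for)
```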



In [8]:
# start by getting a new generator, since we've used up the previous one
pages = server_pages()

for page in pages:
    print(f"Client: got a page containing {len(page)} records")
    print(f"Characters: {', '.join(rec['character'] for rec in page)}")


Server: send page of 5 records
Client: got a page containing 5 records
Characters: Alia Atreides, Baron Vladimir Harkonnen, Chani, Dr. Liet-Kynes, Dr. Yueh
Server: send page of 5 records
Client: got a page containing 5 records
Characters: Duke Leto Atreides, Duncan Idaho, Feyd-Rautha Harkonnen, Glossu Beast Rabban, Gurney Halleck
Server: send page of 5 records
Client: got a page containing 5 records
Characters: Harah, Jamis, Lady Jessica Atreides, Otheym, Padishah Emperor Shaddam IV
Server: send page of 5 records
Client: got a page containing 5 records
Characters: Paul Atreides, Piter De Vries, Princess Irulan, Reverend Mother Gaius Helen Mohiam, Reverend Mother Ramallo
Server: send page of 3 records
Client: got a page containing 3 records
Characters: Shadout Mapes, Stilgar, Thufir Hawat


It would be good on the client side to hide the actual pagination structure and make this look like a continuous stream of records. So we can write a client-side generator function that, for each page received, unpacks that page and yields each record to its caller.

In [9]:
def client_get_records():
    """
    Convert pages into a continuous stream of records.
    """
    for page in server_pages():
        print(f"Client: got a page containing {len(page)} records")
        for rec in page:
            yield rec


`client_get_records` is its own generator function: it yields every record in the current page, and only fetches a new page from the server when the current page is exhausted.

In [11]:
for rec in client_get_records():
    print(rec["character"])


Server: send page of 5 records
Client: got a page containing 5 records
Alia Atreides
Baron Vladimir Harkonnen
Chani
Dr. Liet-Kynes
Dr. Yueh
Server: send page of 5 records
Client: got a page containing 5 records
Duke Leto Atreides
Duncan Idaho
Feyd-Rautha Harkonnen
Glossu Beast Rabban
Gurney Halleck
Server: send page of 5 records
Client: got a page containing 5 records
Harah
Jamis
Lady Jessica Atreides
Otheym
Padishah Emperor Shaddam IV
Server: send page of 5 records
Client: got a page containing 5 records
Paul Atreides
Piter De Vries
Princess Irulan
Reverend Mother Gaius Helen Mohiam
Reverend Mother Ramallo
Server: send page of 3 records
Client: got a page containing 3 records
Shadout Mapes
Stilgar
Thufir Hawat


`client_get_records()` can yield each page's worth of records using a single `yield from` statement instead of a for-loop:


In [12]:
def client_get_records():
    """
    Convert pages into a continuous stream of records.
    """
    for page in server_pages():
        print(f"Client: got a page containing {len(page)} records")
        yield from page

In [13]:
for rec in client_get_records():
    print(rec["character"])

Server: send page of 5 records
Client: got a page containing 5 records
Alia Atreides
Baron Vladimir Harkonnen
Chani
Dr. Liet-Kynes
Dr. Yueh
Server: send page of 5 records
Client: got a page containing 5 records
Duke Leto Atreides
Duncan Idaho
Feyd-Rautha Harkonnen
Glossu Beast Rabban
Gurney Halleck
Server: send page of 5 records
Client: got a page containing 5 records
Harah
Jamis
Lady Jessica Atreides
Otheym
Padishah Emperor Shaddam IV
Server: send page of 5 records
Client: got a page containing 5 records
Paul Atreides
Piter De Vries
Princess Irulan
Reverend Mother Gaius Helen Mohiam
Reverend Mother Ramallo
Server: send page of 3 records
Client: got a page containing 3 records
Shadout Mapes
Stilgar
Thufir Hawat
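
For plain flattening like this, the standard library's `itertools.chain.from_iterable` offers a similar result without writing a generator function at all. The sketch below compares the two approaches on a hypothetical stand-in generator, `demo_pages`, and leaves out the client-side logging.

```python
import itertools


def demo_pages():
    """Hypothetical stand-in for server_pages(): two small pages."""
    yield ("Paul Atreides", "Chani")
    yield ("Stilgar",)


def flatten_with_yield_from(pages):
    """Flatten pages into records, as client_get_records() does."""
    for page in pages:
        yield from page


via_yield_from = list(flatten_with_yield_from(demo_pages()))
via_chain = list(itertools.chain.from_iterable(demo_pages()))
print(via_yield_from == via_chain)
```

The generator-function version is still the better fit when something has to happen per page, such as the `print` call in `client_get_records()`.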


Generators can be economical on memory, since they don't produce a result until the caller pulls it from the generator's `yield` statement. Generators can also be passed around, or used as filters; this is a more functional style of programming.
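
As a rough illustration of that style, the sketch below passes one generator into a filtering generator expression and then takes only the first two matches with `itertools.islice`. The names `record_stream`, `names_longer_than`, and the `pulled` log are hypothetical; the log is there only to show how many records were actually produced.

```python
import itertools

pulled = []  # records which items the source generator actually produced


def record_stream():
    """Hypothetical record source that logs every record it yields."""
    for name in ["Alia", "Chani", "Duncan", "Feyd", "Gurney", "Harah"]:
        pulled.append(name)
        yield {"character": name}


def names_longer_than(records, min_len):
    """Lazily filter a stream of records, keeping only the longer names."""
    return (rec["character"] for rec in records if len(rec["character"]) > min_len)


# Nothing runs until islice starts pulling; it stops after two matches
first_two = list(itertools.islice(names_longer_than(record_stream(), 4), 2))
print(first_two)
print(pulled)
```

Only three of the six records are ever produced here, because each stage pulls from the previous one on demand.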