# Iterator Pattern

- Topics
    - what design patterns are
    - the iterator protocol
    - list, set, and dictionary comprehensions
    - generator functions, and how they build on other patterns

## Design patterns

- analogy: engineers and architects design bridges, towers, buildings, and other physical structures
    - they follow certain principles to ensure structural integrity
- design patterns are an attempt to bring the same kind of formal definition of correctly designed structures to software engineering
- design patterns are applied to solve a common problem faced by developers in some specific situations
    - a suggestion as to the ideal solution for that problem, in terms of object-oriented design (OOD)
- central to a pattern is that it is reused often in unique contexts
    - one clever solution is a good idea
    - two similar solutions might be a coincidence
    - three or more reuses of an idea start to look like a repeating pattern
- knowing design patterns and choosing to use them in our software doesn't guarantee a *correct* solution
    - in 1907, the Quebec Bridge collapsed before construction was completed
   
## Iterators

- one of the most powerful design patterns
- the `for` loop (a foundational concept of programming) is a lightweight wrapper around a set of object-oriented principles
- in typical design-pattern terms, an **iterator** is an object with a `next()` method and a `done()` method
- in a language without built-in support for iterators, the general idea looks as follows:

```python
while not iterator.done():
    item = iterator.next()
    # do something with the item
```

- in Python, iterators are objects with a `__next__()` method, which is called via the `next(iterator)` built-in
- `__next__()` raises a `StopIteration` exception to notify the client that the iterator has completed
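
As a minimal sketch of the Python protocol described above, the built-ins `iter()` and `next()` can be applied to an ordinary list. The variable names here are illustrative only.

```python
# A list is iterable; iter() asks it for a fresh iterator object.
letters = ["a", "b", "c"]
iterator = iter(letters)

# Each next() call invokes iterator.__next__() and returns one item.
first = next(iterator)
second = next(iterator)
third = next(iterator)

# Once the items are used up, next() raises StopIteration.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```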

## Iterator protocol

- `Iterator` abstract base class, in the `collections.abc` module, defines the *iterator* protocol
- the following diagram depicts the hierarchy of iterator protocol:
![Iterator Protocol](resources/iterator_abstraction.png)
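
The difference between an iterable and an iterator can be checked with `isinstance()` against the abstract base classes in `collections.abc`. This is a small sketch using a plain list as the example object; the variable names are assumptions made for illustration.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

# A list is Iterable (it has __iter__) but is not itself an Iterator
# (it has no __next__).
list_is_iterable = isinstance(numbers, Iterable)
list_is_iterator = isinstance(numbers, Iterator)

# The object returned by iter() is an Iterator, and every Iterator
# is also Iterable.
iter_is_iterator = isinstance(numbers_iter, Iterator)
iter_is_iterable = isinstance(numbers_iter, Iterable)

# Calling iter() on an iterator returns the iterator itself.
returns_self = iter(numbers_iter) is numbers_iter

print(list_is_iterable, list_is_iterator)
print(iter_is_iterator, iter_is_iterable, returns_self)
```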

In [2]:
print(range(20))

range(0, 20)


In [3]:
for i in range(1, 20, 2):
    print(i)

1
3
5
7
9
11
13
15
17
19


In [4]:
# example of how an iterable is implemented in a detailed and verbose way...
from typing import Iterable, Iterator

# create an Iterable class that we can iterate over
class CapitalIterable(Iterable[str]):
    def __init__(self, string: str) -> None:
        self.string = string
        
    def __iter__(self) -> Iterator[str]:
        return CapitalIterator(self.string)

In [5]:
class CapitalIterator(Iterator[str]):
    def __init__(self, string: str) -> None:
        # the first character of each word is capitalized
        self.words = [w.capitalize() for w in string.split()]
        self.index = 0
        
    def __next__(self) -> str:
        if self.index == len(self.words):
            raise StopIteration()
        
        word = self.words[self.index]
        self.index += 1
        return word

In [9]:
iterable = CapitalIterable('the quick brown fox jumps over the lazy dog')

In [10]:
iterator = iter(iterable)

In [11]:
while True:
    try:
        print(next(iterator))
    except StopIteration:
        break

The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog


In [12]:
# behind the scenes, the for loop uses the iterator pattern
for word in iterable:
    print(word)

The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog


## Comprehensions
- comprehensions are simple but powerful syntax that allows us to transform or filter an iterable object
- the result can be a list, set, or dictionary, or a *generator expression*
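
A short sketch of the two roles mentioned above, transforming and filtering, using a made-up list of words:

```python
words = ["apple", "kiwi", "banana", "fig"]

# Transform: apply an expression to every item.
upper_words = [w.upper() for w in words]

# Filter: keep only the items that satisfy a condition.
long_words = [w for w in words if len(w) > 4]

# Both at once: transform only the items that pass the filter.
long_lengths = [len(w) for w in words if len(w) > 4]

print(upper_words)
print(long_words)
print(long_lengths)
```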

### List comprehension
- one of the most powerful and fundamental tools in Python
- useful in many real-world software applications as well as in Kattis problems

In [13]:
numbers = input('Enter some integers separated by space: ')

Enter some integers separated by space: 1 2 10 15 11


In [14]:
numbers

'1 2 10 15 11'

In [15]:
lst_numbers = numbers.split()

In [17]:
lst_numbers

['1', '2', '10', '15', '11']

In [19]:
lst_integers = [int(num) for num in lst_numbers]

In [20]:
lst_integers

[1, 2, 10, 15, 11]

In [22]:
# another technique using map iterators
integers = list(map(int, lst_numbers))

In [23]:
integers

[1, 2, 10, 15, 11]

## Generator expressions

- sometimes we want to process a large sequence without creating a new list, set, or dictionary in memory
    - having all the contents in some data structure in memory may not be needed
    - it can be CPU and memory intensive
- e.g., processing a large log file and looking for specific information
    - finding max, min, range, average, etc. of a large list of numbers
- a generator expression uses the same syntax as a comprehension, but it doesn't create a final container object
    - this is called **lazy** evaluation; values are produced reluctantly, on demand
- to create a generator expression, wrap the comprehension in `( )` INSTEAD of `[ ]` or `{ }`
- wrapping a comprehension in `{ }` creates a set
- wrapping a comprehension in `{ : }` (with a `key: value` expression) creates a dict
- wrapping a comprehension in `[ ]` creates a list
- wrapping a comprehension in `( )` creates a generator expression, NOT a tuple
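
To illustrate the memory point, the sketch below compares a list comprehension with the equivalent generator expression. The value of `N` is arbitrary, and the sizes reported by `sys.getsizeof()` are specific to CPython and count only the container object itself, so treat them as a rough indication.

```python
import sys

N = 100_000

# The list comprehension builds all N results in memory up front.
squares_list = [n * n for n in range(N)]

# The generator expression builds nothing yet; values are computed on demand.
squares_gen = (n * n for n in range(N))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# Aggregates such as sum(), max(), and min() can consume the generator directly.
total = sum(squares_gen)

# A generator can be consumed only once; afterwards it is exhausted.
leftover = list(squares_gen)
print(total, leftover)
```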

In [24]:
numbers = ['1', '10', '5', '5', '11', '1']

In [25]:
set_ints = {int(n) for n in numbers}

In [26]:
set_ints

{1, 5, 10, 11}

In [28]:
# converting a list into dictionary
# create a dictionary using index as key and value as value
some_dict = {i: v for i, v in enumerate(numbers)}

In [29]:
some_dict

{0: '1', 1: '10', 2: '5', 3: '5', 4: '11', 5: '1'}

In [40]:
# creating a generator expression
int_gen = (int(n) for n in numbers)

In [41]:
int_gen

<generator object <genexpr> at 0x7fbc565c6f80>

In [42]:
# let's get the next integer
next(int_gen)

1

In [43]:
for n in int_gen:
    print(n, n**2)

10 100
5 25
5 25
11 121
1 1


In [4]:
# let's look at a more useful example
from pathlib import Path

In [5]:
! cat data/system.log

Apr 05, 2021 20:03:29 DEBUG This is a debugging message.
Apr 05, 2021 20:03:41 INFO This is an information method.
Apr 05, 2021 20:04:05 INFO Here's some information.
Apr 05, 2021 20:04:17 DEBUG Debug messages are only useful if you want
to figure something out.
Apr 05, 2021 20:04:29 INFO Information is usually harmless, but
helpful.


In [6]:
full_log_path = Path.cwd() / "data" / "system.log"

In [7]:
# path of the log file that will contain only the lines with the WARN keyword
warn_log_path = Path.cwd() / "data" / "warn.log"

In [8]:
full_log_path

PosixPath('/Users/rbasnet/Sp23/OOP/Python-Object-Oriented-Programming/data/system.log')

In [9]:
with full_log_path.open() as source:
    # lazily select the lines containing WARN; nothing is read until iteration
    warn_lines = (line for line in source if 'WARN' in line)
    with warn_log_path.open('w') as target:
        for line in warn_lines:
            target.write(line)

In [10]:
! cat data/warn.log



## Generator functions

- generator functions embody the essential features of a generator expression, which is itself a generalization of a comprehension
- special functions that return an iterator, which produces a stream of values
- resumable functions that retain their local variables so that execution can resume where it left off
- use the `yield` keyword to produce the next value, as opposed to `return`
    - when `yield` is reached, the generator's state of execution is suspended and local variables are preserved
- `next(genObject)` calls the generator's `__next__()` method, which resumes execution right after the last `yield`
- similar in concept to `range()` in that values are produced on demand; however, calling a generator function returns a generator object (an iterator), not a `range` object
- the log-processing examples in this section use a regular expression to parse each line into separate columns of values
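
The suspend-and-resume behaviour can be seen in a small sketch. The function `running_total` is a hypothetical helper written for this illustration; its local variable `total` survives between `next()` calls.

```python
def running_total(values):
    """Yield the cumulative sum after each value (hypothetical helper)."""
    total = 0  # local state is preserved while the generator is suspended
    for v in values:
        total += v
        yield total  # execution pauses here until the next value is requested


totals = running_total([5, 10, 20])

first = next(totals)   # runs the body up to the first yield
second = next(totals)  # resumes after the yield; total is still 5
rest = list(totals)    # consumes whatever values remain

print(first, second, rest)
```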

In [11]:
def generate_ints(N):
    for i in range(N):
        yield i # this makes the function a generator!

In [12]:
gen = generate_ints(10)

In [13]:
print(gen)

<generator object generate_ints at 0x7f917bf75e70>


In [14]:
next(gen)

0

In [15]:
next(gen)

1

In [16]:
# iterate over the rest of the values
for n in gen:
    print(n)

2
3
4
5
6
7
8
9


In [62]:
# function reads full_log_path and writes all the WARNING logs to warning_log_path
import csv
import re
from pathlib import Path

from typing import Match, cast

# regular function (not a generator)
def extract_and_parse(
        full_log_path: Path, warning_log_path: Path
        ) -> None:
    with warning_log_path.open("w") as target:
        writer = csv.writer(target, delimiter="\t")
        pattern = re.compile(
            # Apr 05, 2021 20:04:41
            r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")
        with full_log_path.open() as source:
            for line in source:
                if "WARN" in line:
                    line_groups = cast(Match[str], pattern.match(line)).groups()
                    writer.writerow(line_groups)
                        

In [63]:
# create a log file containing WARNING keyword
warn1_log_path = Path.cwd() / "data" / "warn1.log"

In [64]:
extract_and_parse(full_log_path, warn1_log_path)

In [65]:
! cat data/warn1.log



In [74]:
from typing import Iterable, Sequence, Iterator

# generator function
def warnings_filter(source: Iterable[str]) -> Iterator[Sequence[str]]:
    pattern = re.compile(
        r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")
    for line in source:
        if match := pattern.match(line):
            if "WARN" in match.group(2):
                yield match.groups()

In [75]:
for t in warnings_filter(full_log_path.open()):
    print(t)



In [76]:
# the same can be achieved with the comprehension-style shortcut:
# a generator expression
source = full_log_path.open()
pattern = re.compile(
        r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")
warnings_filter_gen = (
    tuple(cast(Match[str], pattern.match(line)).groups())
    for line in source
    if "WARN" in line
)

In [77]:
next(warnings_filter_gen)



In [78]:
next(warnings_filter_gen)



In [79]:
for t in warnings_filter_gen:
    print(t)
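
The sample `system.log` shown earlier contains only `DEBUG` and `INFO` entries, so the filter has nothing to yield for it. As a sketch of what the generator function produces when warnings are present, the block below repeats the `warnings_filter` logic and feeds it a few invented log lines. The `WARNING` entries and their timestamps are made up for this illustration and are not part of `data/system.log`.

```python
import re
from typing import Iterable, Iterator, Sequence


def warnings_filter(source: Iterable[str]) -> Iterator[Sequence[str]]:
    pattern = re.compile(
        r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")
    for line in source:
        # Lines without a leading timestamp do not match and are skipped.
        if match := pattern.match(line):
            if "WARN" in match.group(2):
                yield match.groups()


# Invented sample lines (hypothetical data, not from data/system.log).
sample_lines = [
    "Apr 05, 2021 20:03:29 DEBUG This is a debugging message.\n",
    "Apr 05, 2021 20:03:53 WARNING This is a hypothetical warning.\n",
    "continuation text that mentions WARN but has no timestamp\n",
    "Apr 05, 2021 20:04:41 WARNING Another hypothetical warning.\n",
]

# Any iterable of strings works as the source, not only an open file.
warnings = list(warnings_filter(sample_lines))
for row in warnings:
    print(row)
```

Because the function accepts any iterable of strings, the same filter can be tested against an in-memory list and then reused unchanged on an open file.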

