## Recap: Iteration and Sequences

RECAP: We started with sequences...

In [2]:
import reprlib
class Sentence:
    def __init__(self, text): 
        self.text = text
        self.words = text.split()
        
    def __getitem__(self, index):
        return self.words[index] 
    
    def __len__(self):
        #completes sequence protocol, but not needed for iterable
        return len(self.words) 
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [3]:
a= Sentence("Mary had a little lamb whose fleece was white as snow.")
len(a), a[3], a

(11, 'little', Sentence('Mary had a l...hite as snow.'))

Sentence is an *iterable* object, so it can be used to build lists, for example:

In [4]:
list(a)

['Mary',
 'had',
 'a',
 'little',
 'lamb',
 'whose',
 'fleece',
 'was',
 'white',
 'as',
 'snow.']

To iterate over an object `x`, Python automatically calls `iter(x)`. An **iterable** is any object for which `iter` returns an **iterator**. The `iter` built-in works as follows:

1. If `__iter__` is defined, call it to obtain an iterator.
2. If not, but `__getitem__` is defined, create an iterator that fetches items by index, starting from 0.
3. Otherwise, raise `TypeError`.

Any Python sequence is iterable because sequences implement `__getitem__`. The standard sequences also implement `__iter__`; for future-proofing yours should too, because the fallback in (2) might be deprecated in a future version of Python.
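The three lookup steps can be seen in a small sketch. The classes `OnlyGetitem` and `Neither` are hypothetical names used only for illustration; they are not part of the lecture code.

```python
class OnlyGetitem:
    """Hypothetical class: no __iter__, only __getitem__ starting at index 0."""
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]  # raises IndexError past the end, which stops iteration


class Neither:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# Step 2: iter() falls back to __getitem__ when __iter__ is missing.
fallback = list(OnlyGetitem("abc"))

# Step 3: with neither method, iter() raises TypeError.
try:
    iter(Neither())
    raised = False
except TypeError:
    raised = True
```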

Remember the basic idea: you can visit the elements of an array by POSITION, one after another, without ever writing an index explicitly. In a similar but not identical fashion, you can follow the next pointers from one POSITION to the next in a linked list. So we abstract position or pointer into an iterator, and treat arrays and linked lists through an identical interface. The salient points of this abstraction are:

- `next`: the notion of a next thing, and
- dereferencing works as long as you have not hit the end, which is signaled by a `StopIteration`.

Remember how for loops work:

In [5]:
for item in a:
    print(item)

Mary
had
a
little
lamb
whose
fleece
was
white
as
snow.


is equivalent to:

In [6]:
it = iter(a) #it is an iterator
while True:
    try:
        nextval = next(it)
        print(nextval)
    except StopIteration:
        del it
        break

Mary
had
a
little
lamb
whose
fleece
was
white
as
snow.


**EVERY collection in Python is iterable.**

Let's pause to let that sink in.

We have already seen that iterators are used to drive `for` loops.

They are also used

- to make other collections
- to loop over a file line by line from disk
- in the making of list, dict, and set comprehensions
- in unpacking tuples
- in parameter unpacking in function calls (`*args` syntax)
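A few of these uses can be sketched in one place. The helper `count_args` is a hypothetical function introduced only to show `*` unpacking.

```python
def count_args(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)


words = iter("Mary had a".split())   # an iterator, not a list
first, second, third = words         # tuple unpacking pulls items from the iterator

# A dict comprehension consumes an iterator too.
lengths = {w: len(w) for w in iter(["little", "lamb"])}

# * unpacking in a call consumes the iterator to build the argument tuple.
n_args = count_args(*iter([1, 2, 3]))

leftover = list(words)               # the unpacked iterator is now exhausted
```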


An **iterator** defines both `__iter__` and `__next__` (the first is required only to make sure that an iterator IS an iterable).

Calling `next` on an iterator triggers a call to its `__next__`. The `Sentence` class above defines neither `__iter__` nor `__next__`; `iter(a)` falls back to building an iterator that calls `__getitem__` with successive indices.

In [7]:
it=iter(a)
next(it)

'Mary'

In [8]:
next(it)

'had'
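The distinction between an iterable and an iterator can be checked directly. This is a minimal sketch that uses a plain list as a stand-in for the `Sentence` and the abstract base classes from `collections.abc`.

```python
from collections.abc import Iterable, Iterator

words = ["Mary", "had", "a"]      # a plain list stands in for the Sentence
it = iter(words)

list_is_iterable = isinstance(words, Iterable)   # lists define __iter__
list_is_iterator = isinstance(words, Iterator)   # but they have no __next__
it_is_iterator = isinstance(it, Iterator)        # the object returned by iter() has both
it_is_own_iter = iter(it) is it                  # an iterator's __iter__ returns itself
```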

Now we can completely abstract away the sequence in favor of an iterable (i.e., we don't need to support indexing anymore). From Fluent Python:

In [10]:
class SentenceIterator:
    def __init__(self, words): 
        self.words = words 
        self.index = 0
        
    def __next__(self): 
        try:
            word = self.words[self.index] 
        except IndexError:
            raise StopIteration() 
        self.index += 1
        return word 

    def __iter__(self):
        return self

class Sentence:#an iterable
    def __init__(self, text): 
        self.text = text
        self.words = text.split()
        
    def __iter__(self):
        return SentenceIterator(self.words)
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [12]:
s2 = Sentence("While we could have implemented `__next__` in Sentence itself, making it an iterator, we will run into the problem of 'exhausting an iterator'.")
s2it=iter(s2)
print(next(s2it))
s2it2=iter(s2)
next(s2it),next(s2it2)

While


('we', 'While')

In [14]:
list(s2it)

['could',
 'have',
 'implemented',
 '`__next__`',
 'in',
 'Sentence',
 'itself,',
 'making',
 'it',
 'an',
 'iterator,',
 'we',
 'will',
 'run',
 'into',
 'the',
 'problem',
 'of',
 "'exhausting",
 'an',
 "iterator'."]

SO FAR: an iterator retrieves items from a collection, one at a time. The collection must be iterable, ideally by implementing `__iter__`.
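To see the "exhausting an iterator" problem mentioned in the example text, here is a sketch of the anti-pattern in which the iterable is its own iterator. The class name `SelfIteratingSentence` is hypothetical and is not the version from the book.

```python
class SelfIteratingSentence:
    """Hypothetical anti-pattern: the iterable doubles as its own iterator."""
    def __init__(self, text):
        self.words = text.split()
        self.index = 0           # iteration state lives on the collection itself

    def __iter__(self):
        return self              # every call to iter() shares the same state

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        word = self.words[self.index]
        self.index += 1
        return word


s = SelfIteratingSentence("a b c")
first_pass = list(s)     # consumes the only cursor
second_pass = list(s)    # nothing left to yield
```

Returning a fresh iterator from `__iter__`, as `Sentence` does above, keeps each traversal independent.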

## Yield and generators

A generator function looks like a normal function, but instead of `return`ing values, it `yield`s them. The syntax is (unfortunately) the same otherwise.

Unfortunate, because a generator is a different beast. Calling the function does not run its body; it creates and returns a generator.

**The generator is an iterator.** It gets an internal implementation of `__next__` and `__iter__`, almost magically.

When `next` is called on it, the function body runs until the first `yield`. The body is then suspended, and the yielded value is passed to the calling scope as the result of the `next`.

When `next` is called again, `__next__` is called again on the generator, the body resumes where it left off, and the next value is yielded, and so on...

...until we reach the end of the function body, at which point returning raises a `StopIteration` in `next`.


Any Python function that has the yield keyword in its body is a generator function.
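The suspend-and-resume behaviour can be made visible by recording when the body runs. This is a minimal sketch; `countdown` and `log` are hypothetical names chosen for the illustration.

```python
log = []  # records when the function body actually runs


def countdown(n):
    log.append("started")
    while n > 0:
        yield n
        n -= 1
    log.append("finished")


g = countdown(2)
log_after_call = list(log)   # calling the function has not run the body yet

first = next(g)              # runs up to the first yield
second = next(g)             # resumes after the yield, loops, yields again

try:
    next(g)                  # the body runs off the end of the function
    stopped = False
except StopIteration:
    stopped = True
```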

In [72]:
def gen123():
    print("Hi")
    yield 1
    print("Little")
    yield 2
    print("Baby")
    yield 3

In [26]:
gen123, type(gen123)

(<function __main__.gen123>, function)

In [27]:
g = gen123()
type(g)

generator

In [28]:
#a generator is an iterator
g.__iter__

<method-wrapper '__iter__' of generator object at 0x1051ab9e8>

In [29]:
g.__next__

<method-wrapper '__next__' of generator object at 0x1051ab9e8>

In [30]:
next(g)

Hi


1

In [31]:
next(g),next(g)

Little
Baby


(2, 3)

In [32]:
next(g)

StopIteration: 

In [33]:
for i in gen123():
    print(i)

Hi
1
Little
2
Baby
3


Fluent Python recommends language to use, which I like:

>I find it helpful to be strict when talking about the results obtained from a generator: I say that a generator yields or produces values.

In [35]:
class Sentence:#an iterable
    def __init__(self, text): 
        self.text = text
        self.words = text.split()
        
    def __iter__(self):#one could also return iter(self.words)
        for w in self.words:#note this is implicitly making an iter from the list
            yield w
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [36]:
a = Sentence("the mad dog went home to his cat")

In [38]:
for w in a:
    print(w)

the
mad
dog
went
home
to
his
cat


## Streams and Lazy evaluation

Up to now, it might seem that we have just represented existing sequences in a different fashion. But notice above that, with the use of `yield`, we do not have to define the entire sequence ahead of time. Indeed, we talked about this a bit when we talked about iterators, but we can see this "lazy behavior" more explicitly now. We see it in the generation of infinite sequences, where there is no stored data per se!

So, because of generators, we can go from fetching items from a collection to generating items on demand from arbitrary, possibly infinite, series...

In [34]:
def fibonacci(): 
    i,j=0,1 
    while True: 
        yield j
        i,j=j,i+j

In [35]:
f = fibonacci()
for i in range(10):
    print(next(f))

1
1
2
3
5
8
13
21
34
55


In [38]:
f = fibonacci()
counter = 0
for i in f:  # the for loop calls iter(f), which returns f itself; that's fine
    print(i)
    counter += 1
    if counter > 9:
        break

1
1
2
3
5
8
13
21
34
55
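The manual counter can also be replaced by `itertools.islice`, which takes a bounded slice of any iterator, including an infinite one. A minimal sketch, restating `fibonacci` so that the snippet is self-contained:

```python
from itertools import islice


def fibonacci():
    i, j = 0, 1
    while True:
        yield j
        i, j = j, i + j


# islice stops asking for values after ten items, so the infinite loop never runs away.
first_ten = list(islice(fibonacci(), 10))
```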


### Lazy Evaluation and Streams

The basic idea in progressions such as the above is that the next element in the progression (the next Fibonacci number) is not computed until requested. This can also be achieved via streams.

A stream is a lazily computed linked list. Remember how we created a first and a rest in linked lists? Now we do the same, except that we compute the "rest" only when we access it. Basically, we store a function that computes the rest, along with the state it needs, and compute on access.

This example is taken from "Composing Programs":

In [40]:
class Stream:
    """A lazily computed linked list."""
    class empty:
        def __repr__(self):
            return 'Stream.empty'
    empty = empty()  # single sentinel instance, so `s is Stream.empty` tests work

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        assert callable(compute_rest), 'compute_rest must be callable.'
        self.first = first
        self._compute_rest = compute_rest
        
    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None
        return self._rest
    
    def __repr__(self):
        return 'Stream({0}, <...>)'.format(repr(self.first))

In [41]:
s = Stream(0, lambda: Stream(2, lambda: Stream(4)))
s

Stream(0, <...>)

In [44]:
print(s.first, "\n",
s.rest, "\n",
s.rest.first, "\n", 
s.rest.rest, "\n", 
s.rest.rest.first, "\n", 
s.rest.rest.rest)

0 
 Stream(2, <...>) 
 2 
 Stream(4, <...>) 
 4 
 Stream.empty


The same thing can be done with a generator function:

In [46]:
def progression(begin, end=None): #could be done as __iter__ in a class
    result = begin 
    forever = end is None
    index = 0
    while forever or result < end: 
        yield result
        index += 1
        result = begin + 2 * index
ag = progression(0, 20)
list(ag)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [47]:
ag

<generator object progression at 0x1051abe08>

### But we do have state

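A generator is stateful: it keeps track of its current point in the iteration, and once exhausted it stays exhausted. A stream has no such cursor. On the other hand, a stream built and processed with recursive functions can use a lot of the call stack.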

In [49]:
next(ag)

StopIteration: 

In [50]:
def get_evens(start):
    def compute_rest():
        return get_evens(start + 2)
    return Stream(start, compute_rest)

In [51]:
e=get_evens(0)
e.first, e.rest, e.rest.first, e.rest.rest

(0, Stream(2, <...>), 2, Stream(4, <...>))

Notice that the stream does not have state like the generator: the same instance can be used again.

In [29]:
e.first, e.rest, e.rest.first, e.rest.rest

(0, Stream(2, <...>), 2, Stream(4, <...>))
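The contrast with a generator can be sketched by walking the same stream twice. The snippet restates a compact `Stream` so it runs on its own; the singleton `Stream.empty` sentinel and the helper `first_k` are assumptions made for this illustration.

```python
class Stream:
    """Compact restatement of the lazily computed linked list."""
    class empty:
        def __repr__(self):
            return 'Stream.empty'
    empty = empty()  # assumption: one sentinel instance marks the end

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        self.first = first
        self._compute_rest = compute_rest

    @property
    def rest(self):
        # Compute the rest on first access, then cache it.
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None
        return self._rest


def get_evens(start):
    return Stream(start, lambda: get_evens(start + 2))


def first_k(s, k):
    """Hypothetical helper: collect up to k leading elements of a stream."""
    out = []
    while s is not Stream.empty and k > 0:
        out.append(s.first)
        s, k = s.rest, k - 1
    return out


evens = get_evens(0)
once = first_k(evens, 5)
again = first_k(evens, 5)                          # same head, same answer
finite = first_k(Stream(1, lambda: Stream(2)), 10) # stops at Stream.empty
```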

You can implement map and filter on streams, just as on iterators.

In [6]:
def map_stream(fn, s):
    if s is Stream.empty:
        return s
    def compute_rest():
        return map_stream(fn, s.rest)
    return Stream(fn(s.first), compute_rest)

In [30]:
e2=map_stream(lambda x: x*x, e)
e2.first, e2.rest, e2.rest.first, e2.rest.rest

(0, Stream(4, <...>), 4, Stream(16, <...>))

In [7]:
def filter_stream(fn, s):
    if s is Stream.empty:
        return s
    def compute_rest():
        return filter_stream(fn, s.rest)
    if fn(s.first):
        return Stream(s.first, compute_rest)
    else:
        return compute_rest()

In [31]:
e3=filter_stream(lambda x: x%4==0, e)
e3.first, e3.rest, e3.rest.first, e3.rest.rest

(0, Stream(4, <...>), 4, Stream(8, <...>))

### Lazy implementation for Sequences using generators

Despite all our talk of lazy implementation, our `Sentence` implementations so far have not been lazy, because `__init__` eagerly builds a list of all the words in the text and binds it to the `self.words` attribute. This entails processing the entire text up front, and the list may use as much memory as the text itself.

In [53]:
import re
WORD_REGEXP = re.compile(r'\w+')  # raw string, so \w is not treated as a string escape
class Sentence:#an iterable
    def __init__(self, text): 
        self.text = text
        
    def __iter__(self):
        for match in WORD_REGEXP.finditer(self.text):
            yield match.group()
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [54]:
list(Sentence("the mad dog went home to his cat"))

['the', 'mad', 'dog', 'went', 'home', 'to', 'his', 'cat']

### Generator Expressions of data sequences.

There is an even simpler way: use a generator expression, which is just a lazy version of a list comprehension. (It's really just sugar for a generator function, but it's a nice bit of sugar.)


In [55]:
RE_WORD = re.compile(r'\w+')  # raw string, so \w is not treated as a string escape
class Sentence:#an iterable
    def __init__(self, text): 
        self.text = text
        
    def __iter__(self):  # a generator expression replaces the generator function
        return (match.group() for match in RE_WORD.finditer(self.text))
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
list(Sentence("the mad dog went home to his cat"))

['the', 'mad', 'dog', 'went', 'home', 'to', 'his', 'cat']

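Which syntax to choose?

Use a generator expression when it fits comfortably on a line or two; write a generator function if the code takes more than 2 lines.

Some syntax that might trip you up: when a generator expression is the only argument of a function call, the double parentheses are not necessary.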

In [56]:
(i*i for i in range(5))

<generator object <genexpr> at 0x1051abd00>

In [57]:
list((i*i for i in range(5)))

[0, 1, 4, 9, 16]

In [58]:
list(i*i for i in range(5))

[0, 1, 4, 9, 16]
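The parentheses can be dropped only when the generator expression is the sole argument. A short sketch with `sum`, whose optional second argument is the starting value:

```python
# Sole argument: the call's own parentheses are enough.
squares_sum = sum(i * i for i in range(5))

# With a second argument, the generator expression needs its own parentheses.
offset_sum = sum((i * i for i in range(5)), 10)
```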

### yield from

This syntax delegates to another iterable, yielding each of its items in turn, so it can be used to combine iterators.

In [8]:
def mychain(*iterables):
    for it in iterables:
        for i in it: 
            yield i
        

In [9]:
def chain(*iterables):
    for it in iterables:
        yield from it# for i in it: yield i

In [10]:
s="ABC"
l=[1,2,3]
chain(s,l)

<generator object chain at 0x105180d00>

In [11]:
list(mychain(s,l))

['A', 'B', 'C', 1, 2, 3]

In [12]:
list(chain(s,l))

['A', 'B', 'C', 1, 2, 3]
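Because `yield from` accepts any iterable, including another generator, it also works recursively. Here is a sketch of flattening nested lists; `flatten` is a hypothetical helper, and it only treats `list` instances as nested.

```python
def flatten(nested):
    """Hypothetical helper: yield the leaves of arbitrarily nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the recursive generator
        else:
            yield item


flat = list(flatten([1, [2, [3, 4]], 5]))
```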

### Functools, itertools and reduction functions

Just as we defined `map` and `filter` for streams, such definitions exist for iterators.

Go look at the documentation for `functools` and `itertools`. There are many functions there, like `functools.reduce`, that are useful on iterators and iterables, and they capture many common patterns of usage. Here we just showcase some.
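As a quick illustration of a reduction, `functools.reduce` folds an iterable into a single value with a two-argument function. A minimal sketch using the standard `operator` module:

```python
from functools import reduce
import operator

total = reduce(operator.add, range(5))              # ((((0 + 1) + 2) + 3) + 4)
factorial_5 = reduce(operator.mul, range(1, 6), 1)  # the third argument is the initial value
```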

In [59]:
r = range(1000)
type(r)

range

In [60]:
map(lambda x: x*x, range(10))

<map at 0x1051c6c50>

In [61]:
list(map(lambda x: x*x, range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [62]:
filter(lambda x: x % 2, range(10))

<filter at 0x1051c19e8>

In [63]:
list(filter(lambda x: x % 2, range(10)))

[1, 3, 5, 7, 9]

In [64]:
import itertools
itertools.islice(r,4)

<itertools.islice at 0x1051bba98>

In [65]:
list(itertools.islice(r,4))

[0, 1, 2, 3]

In [66]:
list(itertools.takewhile(lambda x: x<5, range(10)))

[0, 1, 2, 3, 4]

In [67]:
list(itertools.dropwhile(lambda x: x<5, range(10)))

[5, 6, 7, 8, 9]

In [68]:
list(itertools.accumulate(itertools.islice(r,40)))

[0,
 1,
 3,
 6,
 10,
 15,
 21,
 28,
 36,
 45,
 55,
 66,
 78,
 91,
 105,
 120,
 136,
 153,
 171,
 190,
 210,
 231,
 253,
 276,
 300,
 325,
 351,
 378,
 406,
 435,
 465,
 496,
 528,
 561,
 595,
 630,
 666,
 703,
 741,
 780]

In [69]:
list(itertools.accumulate(itertools.islice(r,40), min))

[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0]

In [70]:
list(enumerate(range(10)))

[(0, 0),
 (1, 1),
 (2, 2),
 (3, 3),
 (4, 4),
 (5, 5),
 (6, 6),
 (7, 7),
 (8, 8),
 (9, 9)]

In [71]:
list(itertools.groupby('LLLLAAGGG'))

[('L', <itertools._grouper at 0x1051c1c50>),
 ('A', <itertools._grouper at 0x1051c1208>),
 ('G', <itertools._grouper at 0x1051c1ef0>)]
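Each group returned by `groupby` is itself a lazy iterator, which is why the output above shows `_grouper` objects instead of the grouped letters. A sketch that consumes each group as it is produced:

```python
from itertools import groupby

# Consume each group inside the loop, before groupby advances to the next key.
runs = [(key, len(list(group))) for key, group in groupby('LLLLAAGGG')]
```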