
# Iterables, Iterators, and Generators

## Sentence Take #1: A Sequence of Words

We’ll start our exploration of iterables by implementing a Sentence class: you give its
constructor a string with some text, and then you can iterate word by word. The first
version will implement the sequence protocol, and it’s iterable because all sequences are
iterable, as we’ve seen before, but now we’ll see exactly why.

In [0]:
# Example 14-1 shows a Sentence class that extracts words from a text by index.
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string, so \w is not treated as a string escape

class Sentence:
  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text) # re.findall returns a list with all nonoverlapping matches of the regular
                                      # expression, as a list of strings.
    
  def __getitem__(self, index):
    return self.words[index]  # self.words holds the result of .findall, so we simply return the word at the given index
  
  def __len__(self):
    return len(self.words) # To complete the sequence protocol, we implement __len__—but it is not needed to make an iterable object.
  
  def __repr__(self):
    return f'Sentence({reprlib.repr(self.text)})' # reprlib.repr is a utility function to generate abbreviated string representations
                                                  # of data structures that can be very large.


In [0]:
s = Sentence('The time has come, the walrus said')
s

Sentence('The time has...e walrus said')

## Why Sequences Are Iterable: The iter Function

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).
The iter built-in function:
1. Checks whether the object implements \_\_iter\_\_, and calls that to obtain an iterator.
2. If \_\_iter\_\_ is not implemented, but \_\_getitem\_\_ is implemented, Python creates
an iterator that attempts to fetch items in order, starting from index 0 (zero).
3. If that fails, Python raises TypeError, usually saying “C object is not iterable,” where
C is the class of the target object.

That is why any Python sequence is iterable: they all implement \_\_getitem\_\_. In fact,
the standard sequences also implement \_\_iter\_\_, and yours should too, because the
special handling of \_\_getitem\_\_ exists for backward compatibility reasons and may be
gone in the future (although it is not deprecated as I write this).
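
The lookup order described above can be sketched with three tiny classes. The names `WithIter`, `WithGetitem`, and `Plain` are hypothetical, made up only for this illustration.

```python
class WithIter:
    """Step 1: iter() finds __iter__ and calls it to obtain an iterator."""
    def __iter__(self):
        return iter(['a', 'b'])


class WithGetitem:
    """Step 2: no __iter__, so iter() falls back to __getitem__, starting at index 0."""
    def __init__(self):
        self._items = ['x', 'y', 'z']

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends the fallback iteration.
        return self._items[index]


class Plain:
    """Step 3: neither method is defined, so iter() raises TypeError."""


from_iter = list(WithIter())
from_getitem = list(WithGetitem())

try:
    iter(Plain())
except TypeError as exc:
    error_message = str(exc)

print(from_iter, from_getitem, error_message)
```

`WithGetitem` never defines `__iter__`, yet `list()` can still consume it, which is the backward-compatible behavior the text refers to.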

### Remark

As of Python 3.4, the most accurate way to check whether an object x
is iterable is to call iter(x) and handle a TypeError exception if it
isn’t. This is more accurate than using isinstance(x, abc.Iterable),
because iter(x) also considers the legacy \_\_getitem\_\_ method, while
the Iterable ABC does not.
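
A small sketch of the difference between the two checks. `Spam` and `is_iterable` are hypothetical names introduced here, not part of any library.

```python
from collections import abc


class Spam:
    """Hypothetical class: iterable only through the legacy __getitem__ protocol."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


def is_iterable(obj):
    """Return True if iter(obj) succeeds, the approach the remark recommends."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


spam = Spam()
abc_says = isinstance(spam, abc.Iterable)  # the ABC only looks for __iter__
iter_says = is_iterable(spam)              # iter() also accepts __getitem__
print(abc_says, iter_says, list(spam))
```

The two checks disagree for `Spam`: the ABC reports it as not iterable, while `iter()` handles it without error.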

## Iterator and Iterable

iterable
Any object from which the iter built-in function can obtain an iterator. Objects
implementing an \_\_iter\_\_ method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a \_\_getitem\_\_ method that takes 0-based indexes.

In [0]:
# If there was no for statement and we had to emulate the for machinery by hand with a while loop, this is what we’d have to write:
s = 'ABC'
it = iter(s)

while True:
  try:
    print(next(it))
  except StopIteration:
    del it
    break

A
B
C


The standard interface for an iterator has two methods:
1. \_\_next\_\_
Returns the next available item, raising StopIteration when there are no more
items.

2. \_\_iter\_\_ Returns self; this allows iterators to be used where an iterable is expected, for
example, in a for loop.
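
Both methods can be observed on a built-in list iterator. This is only a short sketch using the `iter` and `next` built-ins.

```python
it = iter([1, 2])

# __iter__ on an iterator returns the iterator itself.
same_object = iter(it) is it

first = next(it)
second = next(it)

# With no items left, __next__ raises StopIteration.
try:
    next(it)
except StopIteration:
    exhausted = True
else:
    exhausted = False

print(same_object, first, second, exhausted)
```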

In [0]:
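# Example 14-3. abc.Iterator class; extracted from Lib/_collections_abc.py
# The two imports below are added so that the excerpt runs on its own.
from abc import abstractmethod
from collections.abc import Iterable

class Iterator(Iterable):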
  
  __slots__ = ()
  @abstractmethod
  def __next__(self):
    # 'Return the next item from the iterator. When exhausted, raise StopIteration'
    raise StopIteration
  
  def __iter__(self):
    return self
  
  @classmethod
  def __subclasshook__(cls, C):
    if cls is Iterator:
      if (any("__next__" in B.__dict__ for B in C.__mro__) and
          any("__iter__" in B.__dict__ for B in C.__mro__)):
        return True
    return NotImplemented

### Remark

Iterators in Python aren't a matter of type but of protocol. A large and changing number of builtin types implement *some* flavor of
iterator. Don't check the type! Use hasattr to check for both "\_\_iter\_\_" and "\_\_next\_\_" attributes instead.

## Best way to check if an object is an iterator

The best way to check if an
object x is an iterator is to call isinstance(x, abc.Iterator).
Thanks to Iterator.\_\_subclasshook\_\_, this test works even if the
class of x is not a real or virtual subclass of Iterator.
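
A brief sketch of that check. `Countdown` is a hypothetical class that defines both methods but neither inherits from nor registers with `abc.Iterator`.

```python
from collections import abc


class Countdown:
    """Hypothetical iterator with no relationship to abc.Iterator."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = Countdown(3)
countdown_is_iterator = isinstance(countdown, abc.Iterator)      # has __next__ and __iter__
list_is_iterator = isinstance([1, 2], abc.Iterator)              # a list has no __next__
list_iter_is_iterator = isinstance(iter([1, 2]), abc.Iterator)   # but its iterator does
values = list(countdown)
print(countdown_is_iterator, list_is_iterator, list_iter_is_iterator, values)
```

A list is iterable but not an iterator, so the check correctly distinguishes it from the iterator that `iter()` builds for it.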

## Sentence Take #2: A Classic Iterator

In [0]:
# Example 14-4. sentence_iter.py: Sentence implemented using the Iterator pattern
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
  
  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)
    
  def __repr__(self):
    return f'Sentence({reprlib.repr(self.text)})'
  
  # The __iter__ method is the only addition to the previous Sentence
  # implementation. This version has no __getitem__, to make it clear that the class
  # is iterable because it implements __iter__.
  def __iter__(self):
    return SentenceIterator(self.words)  # __iter__ fulfills the iterable protocol by instantiating and returning an iterator.
  
  
class SentenceIterator:
  
  def __init__(self, words):
    self.words = words  # SentenceIterator holds a reference to the list of words.
    self.index = 0      # self.index determines the next word to fetch.
    
  def __next__(self):
    try:
      word = self.words[self.index]
    except IndexError:
      raise StopIteration()  # No word at self.index, so signal that the iterator is exhausted.
    self.index += 1
    return word
    
  def __iter__(self):
    return self

### Remark

An iterable should never act as an iterator over itself. In other
words, iterables must implement \_\_iter\_\_, but not \_\_next\_\_.
On the other hand, for convenience, iterators should be iterable.
An iterator’s \_\_iter\_\_ should just return self.
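
One reason for keeping the two roles separate is that several independent traversals of the same iterable become possible. A minimal sketch, using a hypothetical `Words` class that delegates to the list iterator:

```python
class Words:
    """Hypothetical iterable: each call to __iter__ builds a fresh iterator."""
    def __init__(self, text):
        self.words = text.split()

    def __iter__(self):
        return iter(self.words)  # a new list iterator on every call


words = Words('the walrus said')
it1 = iter(words)
it2 = iter(words)

a = next(it1)  # first word from it1
b = next(it1)  # second word from it1
c = next(it2)  # it2 starts from the beginning, unaffected by it1
print(a, b, c)
```

If `Words` returned itself from `__iter__` and kept a single index, `it1` and `it2` would be the same object and would share their position.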

## Sentence Take #3: A Generator Function

In [0]:
# Example 14-5. sentence_gen.py: Sentence implemented using a generator function
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
  
  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)
    
  def __repr__(self):
    return f'Sentence({reprlib.repr(self.text)})'
  
  def __iter__(self):
    for word in self.words:  # Iterate over self.words.
      yield word             # Yield the current word.
    return                   # This return is not needed; the function can simply fall through and return.

### Remark
Back in the Sentence code in Example 14-4, \_\_iter\_\_ called the SentenceIterator
constructor to build an iterator and return it. Now the iterator in Example 14-5 is in
fact a generator object, built automatically when the \_\_iter\_\_ method is called, because
\_\_iter\_\_ here is a generator function.
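
The distinction between a generator function and the generator object it returns can be checked directly. In this sketch, `gen_123` is just an illustrative name.

```python
import inspect
import types


def gen_123():
    yield 1
    yield 2
    yield 3


g = gen_123()  # calling the function runs none of its body; it builds a generator object

is_gen_function = inspect.isgeneratorfunction(gen_123)
is_gen_object = isinstance(g, types.GeneratorType)
is_own_iterator = iter(g) is g  # a generator object is an iterator
values = list(g)
print(is_gen_function, is_gen_object, is_own_iterator, values)
```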

In [2]:
def gen_AB():
  print('start')
  yield 'A'  # yield 'A' in the generator function body produces the value A consumed by
             # the for loop, which gets assigned to the c variable and results in the output --> A.
    
  print('continue')  # Iteration continues with a second call next(g), advancing the generator function
                     # body from yield 'A' to yield 'B'. The text continue is output because of the
                     # second print in the generator function body.
  yield 'B'
  print('end')
  
for c in gen_AB():
  print('-->', c)

start
--> A
continue
--> B
end
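
The same pauses can be observed without a for loop by driving the generator by hand with next(). This sketch repeats gen_AB so that it runs on its own.

```python
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end')


g = gen_AB()      # nothing is printed yet: the body has not started running
first = next(g)   # prints 'start', then pauses at the first yield
second = next(g)  # prints 'continue', then pauses at the second yield

try:
    next(g)       # prints 'end'; the body finishes, so StopIteration is raised
except StopIteration:
    finished = True
else:
    finished = False
```

The for loop in the cell above performs these same calls and swallows the final StopIteration, which is why 'end' appears in its output with no error.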


## Sentence Take #4: A Lazy Implementation