# Iterators

## Resources

 * [programiz.com](https://www.programiz.com/python-programming/iterator)
 * [datacamp.com](https://www.datacamp.com)

## What Is an Iterator?

An **iterator** is an object that returns data from an **iterable object** one element at a time. We can run through an iterator only once; after that we have to reinitialize it.

Technically speaking, an object is iterable if it implements the special method `__iter__()`, which returns an iterator. The iterator itself must implement two special methods, `__iter__()` and `__next__()`, collectively called the iterator protocol.

Most built-in containers in Python, such as `tuple`, `list`, `dict`, and `str`, are iterable objects.
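The difference between an iterable and an iterator can be checked directly. The sketch below assumes the abstract base classes `Iterable` and `Iterator` from the standard `collections.abc` module, which are not otherwise used in this text; the variable names are only illustrative.

```python
from collections.abc import Iterable, Iterator

lst = [1, 2, 3, 4]
it = iter(lst)

# A list is iterable, but it is not itself an iterator (it has no __next__).
list_is_iterable = isinstance(lst, Iterable)
list_is_iterator = isinstance(lst, Iterator)

# The object returned by iter() is both iterable and an iterator.
it_is_iterable = isinstance(it, Iterable)
it_is_iterator = isinstance(it, Iterator)

# Calling iter() on an iterator returns the very same object.
same_object = iter(it) is it

print(list_is_iterable, list_is_iterator)  # True False
print(it_is_iterable, it_is_iterator)      # True True
print(same_object)                         # True
```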

We use the `iter()` function or the `__iter__()` method to create an iterator object from an iterable object. Let's start with one of the simplest iterable objects, a `list`:

In [9]:
lst = [1,2,3,4]
it1 = iter(lst)
it2 = lst.__iter__()

print(it1)
print(it2)

<list_iterator object at 0x000002CE91E35748>
<list_iterator object at 0x000002CE91E35888>


Note that the function `iter()` and the method `__iter__()` create iterators with the same properties: `it1` and `it2` in the example above are equivalent but distinct objects, each tracking its own position in the list.

## How to Iterate Through an Iterator?

We use the `next()` function or the `__next__()` method to manually iterate through all the items, one element at a time. When we reach the end of the iterator and there is no more data to return, it raises `StopIteration`:

In [3]:
lst = [1,2,3,4]
it = iter(lst)

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())

print(next(it)) # <-- no items left, will raise error `StopIteration`

1
2
3
4


StopIteration: 

**THIS IS IMPORTANT: we can iterate over the items of an iterator only once. To repeat the iteration we have to reinitialize it by calling `iter()` on the iterable again!**
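If an exception at the end of the data is inconvenient, the built-in `next()` also accepts an optional second argument that is returned instead of raising `StopIteration`. A minimal sketch, assuming `None` is an acceptable default (that is, `None` cannot be a real item of the data):

```python
lst = [1, 2]
it = iter(lst)

first = next(it, None)
second = next(it, None)
third = next(it, None)  # the iterator is exhausted, so the default is returned

print(first, second, third)  # 1 2 None
```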

## Iterators as Function Arguments

Some functions accept iterator objects as arguments, for example the built-in `sum()`:

In [84]:
lst = [1,2,3,4]
it = iter(lst)
print(type(it))
print(sum(it))

<class 'list_iterator'>
10


Note that `sum()` accepts either an iterable object or an iterator object and produces the same result:

In [85]:
lst = [1,2,3,4]
it = iter(lst)
print(sum(lst))
print(sum(it))

10
10


Be very careful: an iterator can be consumed only **once**; after that we need to reinitialize it:

In [95]:
lst = [1,2,3,4]
it = iter(lst)
print(type(it)) # <-- this is OK, calling `type` does not iterate through the items
print("1st access: ", sum(it)) # <-- here we go through all items of the iterator and sum them up
print("2nd access: ", sum(it)) # <-- we already reached the end of the iterator, no data left

<class 'list_iterator'>
1st access:  10
2nd access:  0


It is frequently convenient to convert an iterator object to a `list`, for example to inspect its contents:

In [87]:
lst = [1,2,3,4]
it = iter(lst)
print(type(it))
print(list(it))

<class 'list_iterator'>
[1, 2, 3, 4]


Again, be very careful: **don't consume the same iterator more than once**; reinitialize it first:

In [88]:
lst = [1,2,3,4]
it = iter(lst)
print("1st access: ", list(it))
print("2nd access: ", list(it))

1st access:  [1, 2, 3, 4]
2nd access:  []
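Two simple ways around the single-pass restriction are sketched below: store the items in a list once and reuse that list, or create a fresh iterator from the original iterable. The variable names are illustrative only.

```python
lst = [1, 2, 3, 4]

# Option 1: consume the iterator once and keep the items in a list.
it = iter(lst)
items = list(it)
total = sum(items)
largest = max(items)  # the list can be read as many times as needed

# Option 2: create a fresh iterator for every pass.
it = iter(lst)
total_again = sum(it)

print(total, largest, total_again)  # 10 4 10
```

The first option costs memory proportional to the number of items, so it only suits data that fits comfortably in memory.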


## Unpack Iterator

An iterator can be unpacked with the asterisk `*` prefix, which passes all of its remaining elements as separate arguments. This is especially useful if we want to print the contents of an iterator:

In [89]:
lst = [1,2,3,4]
it = iter(lst)
print(*it)

1 2 3 4


Again, note that we can do this only **once**; after that we need to reinitialize the iterator:

In [90]:
lst = [1,2,3,4]
it = iter(lst)
print("1st access: ",*it)
print("2nd access",*it)

1st access:  1 2 3 4
2nd access
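The asterisk also works on the left-hand side of an assignment, which can be handy for splitting off the first item. A small sketch (the names `first` and `rest` are arbitrary); note that this consumes the iterator as well:

```python
it = iter([1, 2, 3, 4])

# `first` receives the first item, `rest` collects everything else into a list.
first, *rest = it

leftover = list(it)  # nothing remains in the iterator

print(first)     # 1
print(rest)      # [2, 3, 4]
print(leftover)  # []
```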


## Iterating in a Loop

A `for` loop gives shorter and more elegant access to the items of an iterable object. We can also loop through the items of an iterable object multiple times, since the `for` statement creates a new iterator every time it starts:

In [96]:
lst = [1,2,3,4]
for i in lst:
    print(i)
# and repeat loop again
for i in lst:
    print(i)

1
2
3
4
1
2
3
4
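One way to observe that each `for` loop asks the iterable for a new iterator is to count the calls to `__iter__()`. The `CountingList` class below is a hypothetical helper written only for this illustration:

```python
class CountingList(list):
    """A list that counts how many times an iterator is requested from it."""

    def __init__(self, *args):
        super().__init__(*args)
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return super().__iter__()


data = CountingList([1, 2, 3, 4])

for _ in data:
    pass
for _ in data:
    pass

print(data.iter_calls)  # 2, one new iterator per loop
```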


Under the hood, this loop behaves roughly as follows:

In [6]:
it = iter(lst)
while True:
    try:
        i = next(it)
        print(i)
    except StopIteration:
        break

1
2
3
4


## Iterate over Range

It is frequently convenient to iterate over a sequence of numbers produced by the built-in `range()`, which returns an iterable object:

In [92]:
r = range(10)
it = iter(r)
print(*it)

0 1 2 3 4 5 6 7 8 9


Or do the same in a loop:

In [93]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


## Iterate over Dictionary

If we apply `iter()` to a dictionary we get only its keys, so we need to apply it to `dct.items()` to iterate through all key-value pairs:

In [29]:
dct = {'number': 1, "string": "abc"}
it = iter(dct) # <-- note: iterating over the dictionary itself yields only its keys
print(*it)

number string


In [30]:
dct = {'number': 1, "string": "abc"}
it = iter(dct.items()) # <-- note `dct.items()` here
print(*it)

('number', 1) ('string', 'abc')


Iterate over dictionary in a loop:

In [32]:
dct = {'number': 1, "string": "abc"}
for key, val in dct.items():
    print(key, val)

number 1
string abc
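Besides `items()`, a dictionary also offers `keys()` and `values()`. The sketch below collects all three into lists and also shows that these view objects are iterables rather than iterators, so they can be traversed more than once:

```python
dct = {"number": 1, "string": "abc"}

keys = list(dct.keys())      # same result as list(dct)
values = list(dct.values())
pairs = list(dct.items())

# A view is an iterable, not an iterator: every pass gets a fresh iterator.
view = dct.items()
first_pass = list(view)
second_pass = list(view)

print(keys)                       # ['number', 'string']
print(values)                     # [1, 'abc']
print(pairs)                      # [('number', 1), ('string', 'abc')]
print(first_pass == second_pass)  # True
```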


## Iterate over File Connection

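If we apply `iter()` to an open file object we get its lines one at a time. Each line keeps its trailing newline character, which is why `print()` shows an empty line after each one: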

In [39]:
f = open('../data/test/test_14L.txt')
it = iter(f)

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())

Shall I compare thee to a summer's day?

Thou art more lovely and more temperate:

Rough winds do shake the darling buds of May,

And summer's lease hath all too short a date;



Iterate over file connection in a loop:

In [40]:
f = open('../data/test/test_14L.txt')

for line in f:
    print(line)

Shall I compare thee to a summer's day?

Thou art more lovely and more temperate:

Rough winds do shake the darling buds of May,

And summer's lease hath all too short a date;

Sometime too hot the eye of heaven shines,

And often is his gold complexion dimm'd;

And every fair from fair sometime declines,

By chance or nature's changing course untrimm'd;

But thy eternal summer shall not fade,

Nor lose possession of that fair thou ow'st;

Nor shall death brag thou wander'st in his shade,

When in eternal lines to time thou grow'st:

So long as men can breathe or eyes can see,

So long lives this, and this gives life to thee.
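The examples above leave the file open. A common pattern is to open the file in a `with` block so that it is closed automatically, and to strip the trailing newline from each line. The sketch below writes a small temporary file first so that it can run anywhere; the file contents and variable names are made up for the illustration.

```python
import os
import tempfile

# Create a small throwaway file to read from (illustrative content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

lines = []
with open(path) as f:              # the file is closed when the block ends
    for line in f:                 # the file object yields one line at a time
        lines.append(line.rstrip("\n"))

os.remove(path)                    # clean up the temporary file

print(lines)  # ['first line', 'second line']
```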


## Build Custom Iterator

Building an iterator from scratch is easy in Python. We just have to implement the methods `__iter__()` and `__next__()`.

The `__iter__()` method returns the iterator object itself. If required, some initialization can be performed.

The `__next__()` method must return the next item in the sequence. On reaching the end, and in subsequent calls, it must raise `StopIteration`.

Here is an example that gives us the next power of 2 on each iteration. The exponent runs from zero up to a user-set number. The example is taken from [programiz.com](https://www.programiz.com/python-programming/iterator).

In [17]:
class PowTwo:
    """Class to implement an iterator
    of powers of two"""

    def __init__(self, max = 0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration

First, let's go through it manually, one by one:

In [21]:
x = PowTwo(3)
it = iter(x)

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())
print(it.__next__()) # <-- this will raise error

1
2
4
8


StopIteration: 

Second, let's go through it in a loop:

In [22]:
for i in PowTwo(3):
    print(i)

1
2
4
8
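The `PowTwo` class above is its own iterator, so two loops over the same object share one position. An alternative design, sketched here under the assumption that independent passes are wanted, splits the work into an iterable and a separate iterator. The names `PowTwoRange` and `PowTwoIterator` are hypothetical and not part of the original example.

```python
class PowTwoIterator:
    """Iterator that yields 2**0, 2**1, ..., 2**max_exp."""

    def __init__(self, max_exp):
        self.max_exp = max_exp
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n > self.max_exp:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result


class PowTwoRange:
    """Iterable that hands out a fresh PowTwoIterator on every iter() call."""

    def __init__(self, max_exp=0):
        self.max_exp = max_exp

    def __iter__(self):
        return PowTwoIterator(self.max_exp)


powers = PowTwoRange(3)
first = list(powers)
second = list(powers)  # works again: each pass gets its own iterator

# Nested loops over the same iterable do not interfere with each other.
small = PowTwoRange(1)
pairs = [(a, b) for a in small for b in small]

print(first)   # [1, 2, 4, 8]
print(second)  # [1, 2, 4, 8]
print(pairs)   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```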


Note that if the `__next__()` method never raises `StopIteration`, a `for` loop over the iterator becomes **infinite**, so be careful when you use an infinite iterator in a `for` loop!

Let's implement an **infinite iterator**: the same class but without the upper boundary and without the `raise StopIteration` statement:

In [23]:
class PowTwo:
    """Class to implement an iterator
    of powers of two"""

    def __init__(self):
        pass

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        result = 2 ** self.n
        self.n += 1
        return result

In [97]:
x = PowTwo()
it = iter(x)

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())
print(it.__next__()) # <-- this will NOT raise error anymore, we can iterate to infinity

1
2
4
8
16
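To take a finite number of items from an infinite iterator without calling `next()` by hand, the standard library function `itertools.islice()` can be used. In the sketch below the infinite class is repeated under the hypothetical name `InfinitePowTwo` so that the example is self-contained:

```python
from itertools import islice


class InfinitePowTwo:
    """Infinite iterator of powers of two (never raises StopIteration)."""

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        result = 2 ** self.n
        self.n += 1
        return result


# islice stops after 5 items, so the infinite iterator is never exhausted.
first_five = list(islice(InfinitePowTwo(), 5))

print(first_five)  # [1, 2, 4, 8, 16]
```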


## Efficient Memory Management with Iterators

Note that an iterator's items are not created when it is defined; instead the iterator produces an item each time we call `next()`, until it reaches the limit.

Example from [datacamp.com](https://datacamp.com):
 * `range()` doesn't create an actual list; it creates an iterable object that stores the rules for how the items are defined
 * `iter()` prepares to create the first item
 * `next()` creates the next item, and the previous item can be discarded, so the memory footprint stays very small

If `range()` created the full list, calling it with a value such as $10^{100}$ would not work, since a list that big would far exceed a regular computer's memory. But since it doesn't create the list, we can easily define such a `range` object and iterate over it manually:

In [43]:
r = range(10 ** 100) # <-- 10^100 is used here without problem
it = iter(r)
print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())

0
1
2
3


Note that we don't want to convert this iterator to a `list`, unpack it with `*it`, or loop through it, since any of these would run practically forever or exhaust the available memory.
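The memory difference can be made visible with `sys.getsizeof()`. This is a rough sketch: the exact byte counts depend on the Python version and platform, so only the comparison is checked, and a moderate size of $10^6$ is assumed so that the list actually fits in memory.

```python
import sys

n = 10 ** 6

as_range = range(n)        # stores only start, stop and step
as_list = list(as_range)   # stores a reference to every item

range_size = sys.getsizeof(as_range)
# getsizeof counts only the list's own storage, not the int objects it refers to,
# so the real memory use of the list is even larger.
list_size = sys.getsizeof(as_list)

print(range_size < list_size)  # True
```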

## Enumerate

The `enumerate()` function adds a counter to an iterable object and returns a new object of type `enumerate`, an iterator that yields pairs of a counter value and the corresponding item:

In [78]:
r = range(10)
en = enumerate(r)
print(type(r))
print(type(en))
print(en) # <-- note, this returns `enumerate` object but not its content
print(next(en))
print(next(en))
# or
print(en.__next__())
print(en.__next__())

<class 'range'>
<class 'enumerate'>
<enumerate object at 0x000002CE91E61BD8>
(0, 0)
(1, 1)
(2, 2)
(3, 3)


By default `enumerate` starts counting from `0`, but this can be changed with the second argument `start`: `enumerate(iterable, start=0)`.

Note that, as with any iterator, we can enumerate any iterable object and print the result after converting it to a list of tuples:

In [55]:
lst = [1,2, "a", "b"]
en = enumerate(lst, start = 1)
print(list(en)) # <-- note, this converts `enumerate` object into a list of tuples

[(1, 1), (2, 2), (3, 'a'), (4, 'b')]


or we can use the unpacking asterisk `*` operator for printing:

In [56]:
lst = [1,2, "a", "b"]
en = enumerate(lst, start = 1)
print(*en)

(1, 1) (2, 2) (3, 'a') (4, 'b')


Remember, we can iterate over the items of an iterator only once; after that we have to reinitialize it:

In [108]:
lst = [1,2, "a", "b"]
en = enumerate(lst, start = 1)
print("1st access: ", list(en)) # <-- note, this converts `enumerate` object into a list of tuples
print("2nd access: ", list(en))

1st access:  [(1, 1), (2, 2), (3, 'a'), (4, 'b')]
2nd access:  []


We can loop over `enumerate` object:

In [103]:
lst = [1,2, "a", "b"]

for index, value in enumerate(lst, start = 1):
    print(index, value)

1 1
2 2
3 a
4 b
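For comparison, the sketch below builds the same pairs with a manually maintained counter and checks that the result matches `enumerate()`. It is only meant to show what `enumerate()` saves us from writing; the variable names are illustrative.

```python
lst = ["a", "b", "c"]

# Manual version: keep and update the counter ourselves.
manual = []
index = 1
for value in lst:
    manual.append((index, value))
    index += 1

# The same pairs produced by enumerate().
with_enumerate = list(enumerate(lst, start=1))

print(manual)                    # [(1, 'a'), (2, 'b'), (3, 'c')]
print(manual == with_enumerate)  # True
```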


## Zip

The `zip()` function takes iterables, aggregates their items by position, and returns an iterator of tuples:

In [98]:
lst1 = [1,2,3,4]
lst2 = ["a","b","c","d"]
z = zip(lst1, lst2)
print(type(z))
print(z) # <-- note, this returns `zip` object but not its content
print(next(z)) # <-- access first item with `next`
print(list(z)) # <-- note, this converts `zip` object into a list of tuples starting from second item

<class 'zip'>
<zip object at 0x000002CE91E5F508>
(1, 'a')
[(2, 'b'), (3, 'c'), (4, 'd')]


Unpacking also works for `zip` object:

In [100]:
lst1 = [1,2,3,4]
lst2 = ["a","b","c","d"]
z = zip(lst1, lst2)
print(*z) # <-- note, this unpacks `zip` object

(1, 'a') (2, 'b') (3, 'c') (4, 'd')


`zip()` with zero arguments:

In [70]:
z = zip()
print(type(z))
print(z) # <-- note, this returns `zip` object but not its content
print(list(z)) # <-- note, this converts `zip` object into a list of tuples

<class 'zip'>
<zip object at 0x000002CE91E70308>
[]


`zip()` with 1 argument:

In [101]:
lst = [1,2,3,4]
z = zip(lst)
print(type(z))
print(z) # <-- note, this returns `zip` object but not its content
print(list(z)) # <-- note, this converts `zip` object into a list of tuples

<class 'zip'>
<zip object at 0x000002CE91E672C8>
[(1,), (2,), (3,), (4,)]


`zip()` with 3 (or more) arguments:

In [102]:
lst1 = [1,2,3,4]
lst2 = ["a","b","c","d"]
lst3 = [1.0, 2.1, 3.2, 4.3]
z = zip(lst1, lst2, lst3)
print(type(z))
print(z) # <-- note, this returns `zip` object but not its content
print(*z) # <-- note, this unpacks `zip` object

<class 'zip'>
<zip object at 0x000002CE91E76708>
(1, 'a', 1.0) (2, 'b', 2.1) (3, 'c', 3.2) (4, 'd', 4.3)


`zip()` with arguments of different lengths truncates the result to match the shortest one:

In [106]:
lst1 = [1,2,3]
lst2 = ["a","b","c","d"]
lst3 = [1.0, 2.1, 3.2, 4.3,5.4]
z = zip(lst1, lst2, lst3)
print(type(z))
print(z) # <-- note, this returns `zip` object but not its content
print(list(z)) # <-- note, this converts `zip` object into a list of tuples

<class 'zip'>
<zip object at 0x000002CE91E84F08>
[(1, 'a', 1.0), (2, 'b', 2.1), (3, 'c', 3.2)]
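If the truncation is not wanted, the standard library offers `itertools.zip_longest()`, which continues until the longest input is exhausted and fills the gaps with a `fillvalue`. A minimal sketch, assuming `None` is a suitable filler:

```python
from itertools import zip_longest

lst1 = [1, 2, 3]
lst2 = ["a", "b", "c", "d"]

# Missing items from the shorter list are replaced by the fill value.
padded = list(zip_longest(lst1, lst2, fillvalue=None))

print(padded)  # [(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd')]
```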


We can loop over `zip` object:

In [104]:
lst1 = [1,2,3]
lst2 = ["a","b","c","d"]
lst3 = [1.0, 2.1, 3.2, 4.3,5.4]
for i in zip(lst1, lst2, lst3):
    print(i)

(1, 'a', 1.0)
(2, 'b', 2.1)
(3, 'c', 3.2)
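Two common uses of `zip()` are sketched below: building a dictionary from separate lists of keys and values, and "unzipping" a list of pairs back into separate sequences by combining `zip()` with the `*` unpacking operator. The sample data is made up for the illustration.

```python
keys = ["number", "string"]
values = [1, "abc"]

# dict() accepts an iterable of (key, value) pairs, which is what zip() yields.
dct = dict(zip(keys, values))

# zip(*pairs) passes each pair as a separate argument, regrouping by position.
pairs = [(1, "a"), (2, "b"), (3, "c")]
numbers, letters = zip(*pairs)

print(dct)      # {'number': 1, 'string': 'abc'}
print(numbers)  # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')
```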
