We've seen most of these iteration tools in previous sections, and you'll recognise that (apart from the aggregators, which return a single value) they're ***all*** **lazy iterators** rather than merely **iterables**, which makes them highly memory-efficient.

# 01 - Aggregators

#### Lecture 

Aggregators are functions that iterate through an iterable and return a single value that (usually) takes into account every element of the iterable.

For example `min(iterable)`, `max(iterable)`, `sum(iterable)`

Also: `any(iterable)` and `all(iterable)`. Regarding these two, remember that **every** object has a truth value. The rule for all objects is the following:

Every object has a **`True`** truth value, **except**:

- None
- False
- 0 in any numeric type (e.g. int, float etc)
- empty sequences (e.g. list, tuple, string)
- empty mappings and sets (e.g. dictionaries, sets)
- custom classes that implement a `__bool__` or `__len__` method that returns False or 0. If neither is present, we default to `True`.
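
A small sketch of these rules in action. The `Basket` class is a hypothetical example (it is not part of the lecture) that defines `__len__` but not `__bool__`, so its truthiness falls back to its length:

```python
class Basket:
    """Hypothetical container: no __bool__, so truthiness falls back to __len__."""

    def __init__(self, *items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


falsy_values = [None, False, 0, 0.0, '', [], (), {}, set(), Basket()]

# Every value above is falsy, so any() finds nothing truthy.
none_truthy = not any(falsy_values)

# A non-empty Basket has a non-zero length, so it is truthy.
full_basket_truthy = bool(Basket('apple'))

# A plain object defines neither __bool__ nor __len__, so it defaults to True.
plain_object_truthy = bool(object())

print(none_truthy, full_basket_truthy, plain_object_truthy)  # True True True
```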

**Definition**

Predicate: A predicate is a function that takes a single argument and returns `True` or `False`. For example, `bool()`.

We can make `all()` and `any()` more useful by first **applying a predicate** to each element of the iterable.

#### Example 1

For example, say we want to know if **every element is less than 10**:

In [18]:
l = [1, 2, 3, 4, 100, 5, 6]

pred = lambda x: x < 10

result = [pred(item) for item in l]

all(result)

False

The neater way to do this is using `map(fn, iterable)`, which lazily applies a function (here, our **predicate**) to each element in the iterable:

In [19]:
def pred_new(x):
    print(f'Mapping {x}', end=', ')
    return x < 10

all(map(pred_new, l))

Mapping 1, Mapping 2, Mapping 3, Mapping 4, Mapping 100, 

False

Because `map` is an iterator, it doesn't have to map every element first and then pass the result to `all()`. Instead, `all()` will lazily request a result from `map`; if it receives a `False` it will terminate immediately. This is why we don't see the final two elements (5 and 6) of `l` being printed.

If we do not want to use `map`, we should use a generator expression, *NOT* a list comprehension, so that we get the benefits of lazy evaluation.

In [20]:
result = (pred_new(item) for item in l)
all(result)

Mapping 1, Mapping 2, Mapping 3, Mapping 4, Mapping 100, 

False

#### Example 2

We have a file called `car-brands.txt`. We want to know if **every** car name is at least 3 characters long (4 including the `\n` character at the end of each line):

Recall that the file object returned by `open(<file>)` (here `f`) is a lazy iterator over the file's lines, so we don't need to read every line up front. We can pass `f` to another iterator such as `map`, and pass that output to an aggregator such as `all` or `any`, which will stop reading as soon as the result is known.

In [26]:
filename = '../Section 08 - Iteration Tools/01 - Aggregators/car-brands.txt'

with open(filename) as f:
    result = all(map(lambda row: len(row) >= 4, f))
    print(result)

True


# 02 - Slicing Iterables

Recall that we could slice sequences with `[]` notation as well as the `slice` object:

In [32]:
seq = list(range(0, 10, 1))
print(seq[2:8:2])
print(seq[slice(2, 8, 2)])

[2, 4, 6]
[2, 4, 6]


But we can slice **iterables** (including **iterators**) with `islice` from `itertools`:
```python
islice(iterable, start, stop, step)
```

- `islice` will iterate through the iterable until it has met the conditions of the slice. For example, if we only want a slice of the first 5 objects of an infinite iterable, the `islice` iterator is exhausted (raises `StopIteration`) after the 5th element. Unlike regular slicing, negative values for `start`, `stop` and `step` are not supported.
- Like the other `itertools` functions, `islice` returns a **lazy iterator**: it *yields* values one at a time as they are requested rather than building the whole slice up front.
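
One consequence worth seeing in code: to reach `start`, `islice` has to pull items from the underlying iterator, so the source is advanced as a side effect. A minimal sketch, where `source` is just an illustrative name:

```python
from itertools import islice

source = iter(range(10))            # an iterator over 0..9

# Skip indices 0 and 1, then take indices 2, 3 and 4 (stop=5 is exclusive).
sliced = list(islice(source, 2, 5))

# The source iterator has been advanced past everything islice read,
# so it resumes from index 5.
next_from_source = next(source)

print(sliced)            # [2, 3, 4]
print(next_from_source)  # 5
```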

In the example below `factorials()` is an infinite iterable (more specifically an iterator), not a sequence, so it cannot be sliced regularly - we need to use `islice`. 

In [46]:
import math
from itertools import islice

def factorials():
    idx = 0
    while True:
        yield math.factorial(idx)
        idx += 1

result = islice(factorials(), 2, 9, 2)
list(result)

[2, 24, 720, 40320]

We've exhausted the `islice` iterator, so we can't reuse it.

In [41]:
list(result)

[]

# 03 - Selecting and Filtering

#### `filter`

The `filter` function takes an iterable and applies a predicate to it. If the predicate returns `True`, it will retain that element; otherwise, it'll throw it away.
```python
filter(predicate, iterable)
```

The equivalent way to get the same functionality of `filter` is with the following:
```python
(item for item in iterable if pred(item))
```

##### Example 1

In [1]:
l = [2, 1, 10, 5, 3, 6, 1, 10]
result = filter(lambda x: x < 4, l)
result

<filter at 0x17bd3bda430>

In [2]:
list(result)

[2, 1, 3, 1]

##### Example 2

In this example, we are going to iterate through a list of cube values and throw away all the values that are even, but we're going to do it lazily.

In [15]:
def gen_cubes(n):
    for i in range(n):
        print(f'yielding {i}')
        yield i**3

In [16]:
def is_odd(x):
    return x % 2 == 1

In [17]:
filtered = filter(is_odd, gen_cubes(10))

In [18]:
list(filtered)

yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9


[1, 27, 125, 343, 729]

##### `filterfalse` 

This is not a builtin, but it's in the standard library (`itertools`). It does what it says on the tin: it keeps the elements for which the predicate is falsy and throws away the rest.

##### Example 1

In [3]:
from itertools import filterfalse

l = [2, 1, 10, 5, 3, 6, 1, 10]
result = filterfalse(lambda x: x < 4, l)
list(result)

[10, 5, 6, 10]

##### Example 2

This is the same as the example in `filter` but the inverse.

In [19]:
def gen_cubes(n):
    for i in range(n):
        print(f'yielding {i}')
        yield i**3

In [20]:
def is_odd(x):
    return x % 2 == 1

In [21]:
filtered = filterfalse(is_odd, gen_cubes(10))

In [22]:
list(filtered)

yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9


[0, 8, 64, 216, 512]

#### `compress`

This is not a compressor in the sense of say a zip archive.

It is basically a way of *filtering* one iterable, using the truthiness of items in another iterable, pairwise.

```python
data =      ['a',   'b', 'c', 'd',   'e']
              ^      ^    ^    ^      ^
              |      |    |    |      |
selectors = [True, False, 1,   0]  # None

compress(data, selectors) -> 'a', 'c'
```
Since the first and third element of `selectors` are truthy, we will only yield the first and third elements of `data`.

In [23]:
from itertools import compress

data = ['a', 'b', 'c', 'd', 'e']
selectors = [True, False, 1, 0]

compressed = compress(data, selectors)
list(compressed)

['a', 'c']

#### `takewhile`

```python
takewhile(pred, iterable)
```
This function returns an iterator that will yield items while `pred(item)` is truthy. Once `pred(item)` is falsy for some item, the iterator becomes exhausted.

##### Example 1

In [4]:
from itertools import takewhile

result = takewhile(lambda x: x < 5, [1, 2, 10, 3, 4])

for i in result:
    print(i)

1
2


##### Example 2

In this example, we will calculate numerous `sin(x)` values for evenly spaced `x`. 

In [11]:
from math import sin, pi

def sine_wave(n):
    start = 0
    max_ = 2 * pi
    step = (max_ - start) / (n-1)
    for _ in range(n):
        yield round(sin(start), 2)
        start += step    

In [12]:
list(sine_wave(15))

[0.0,
 0.43,
 0.78,
 0.97,
 0.97,
 0.78,
 0.43,
 0.0,
 -0.43,
 -0.78,
 -0.97,
 -0.97,
 -0.78,
 -0.43,
 -0.0]

In [13]:
from itertools import takewhile

list(takewhile(lambda x: 0 <= x <= 0.9, sine_wave(15)))

[0.0, 0.43, 0.78]

#### `dropwhile`

```python
dropwhile(pred, iterable)
```
This function is the inverse of the above. It returns an iterator that skips items while `pred(item)` is truthy; as soon as `pred(item)` is falsy for some item, it yields that item and *all* remaining items unconditionally.

##### Example 1

In [5]:
from itertools import dropwhile

result = dropwhile(lambda x: x < 5, [1, 2, 10, 3, 4])

for i in result:
    print(i)

10
3
4


# 04 - Infinite Iterators

#### `itertools.count`

This is a lazy iterator similar to `range`: it has a `start` and a `step`, but no `stop`.

Also, `start` and `step` do not have to be integers unlike with `range()` - they can be any numeric type.

As these iterators are infinite, it can be quite useful to pair them with `takewhile` (or `islice`) so that we can control when they stop.
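
As a quick sketch of that pairing (the predicate and the step of 5 are arbitrary choices for illustration):

```python
from itertools import count, takewhile

# count(0, 5) yields 0, 5, 10, ... forever; takewhile ends the stream
# at the first value that fails the predicate.
multiples_of_five = takewhile(lambda x: x < 30, count(0, 5))

result = list(multiples_of_five)
print(result)  # [0, 5, 10, 15, 20, 25]
```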

##### Example 1

In [2]:
from itertools import count, cycle, repeat, islice

g = count(10)

list(islice(g, 5))

[10, 11, 12, 13, 14]

Remember, `count` is infinite so we can't `list(g)`.

##### Example 2

In [8]:
from decimal import Decimal

g = count(Decimal('1.0'), Decimal('0.2'))

list(islice(g, 5))

[Decimal('1.0'),
 Decimal('1.2'),
 Decimal('1.4'),
 Decimal('1.6'),
 Decimal('1.8')]

#### `itertools.cycle`

This cycles over a finite iterable (including iterators) indefinitely.
```python
cycle(['a', 'b', 'c']) -> 'a', 'b', 'c', 'a', 'b', 'c', 'a', ...
```
**One important thing to note:** If an exhaustible iterator is passed as an argument to `cycle`, `cycle` still repeats forever. It saves a copy of each item the first time through, and once the original iterator is exhausted it keeps cycling through the saved items.
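
More precisely, the iterator passed in does get consumed; it is `cycle` that keeps going by replaying the items it has already seen. A small sketch that checks this, using a throwaway generator:

```python
from itertools import cycle, islice

def colours():
    yield 'red'
    yield 'green'
    yield 'blue'

source = colours()
cycled = cycle(source)

first_seven = list(islice(cycled, 7))

# cycle has run the generator to completion, so asking the generator
# directly for another item falls back to the default value.
source_is_exhausted = next(source, 'exhausted') == 'exhausted'

print(first_seven)
print(source_is_exhausted)  # True
```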

##### Example 1

In [12]:
g = cycle(('red', 'green', 'blue'))
list(islice(g, 10))

['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green', 'blue', 'red']

##### Example 2

To show that an exhaustible iterator can be passed to `cycle` and the cycling still never ends:

In [14]:
def colours():
    yield 'red'
    yield 'green'
    yield 'blue'

g = cycle(colours())
list(islice(g, 10))

['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green', 'blue', 'red']

##### Example 3

A more real-life example of `cycle` could be if you had a card deck that you want to deal out to 4 players. 

In [22]:
from collections import namedtuple

Card = namedtuple('Card', 'rank suit')

def card_deck():
    RANKS = tuple(str(i) for i in range(2, 11)) + tuple('JQKA')
    SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')

    for suit in SUITS:
        for rank in RANKS:
            yield Card(rank, suit)

hands = [list() for _ in range(4)]

index_cycle = cycle([0, 1, 2, 3])
for card in card_deck():
    hands[next(index_cycle)].append(card)

hands

But we can improve upon this...

The `index_cycle = cycle([0, 1, 2, 3])` is just cycling through like so: `hands[0]`, `hands[1]`, `hands[2]`, `hands[3]`, `hands[0]`, ...

So what we're actually doing is just cycling through the `hands` iterable.

Why not make `hands` into a cycle object?

In [24]:
from collections import namedtuple

Card = namedtuple('Card', 'rank suit')

def card_deck():
    RANKS = tuple(str(i) for i in range(2, 11)) + tuple('JQKA')
    SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')

    for suit in SUITS:
        for rank in RANKS:
            yield Card(rank, suit)

hands = [list() for _ in range(4)]
hands_cycle = cycle(hands)

for card in card_deck():
    next(hands_cycle).append(card)

hands

[[Card(rank='2', suit='Spades'),
  Card(rank='6', suit='Spades'),
  Card(rank='10', suit='Spades'),
  Card(rank='A', suit='Spades'),
  Card(rank='5', suit='Hearts'),
  Card(rank='9', suit='Hearts'),
  Card(rank='K', suit='Hearts'),
  Card(rank='4', suit='Diamonds'),
  Card(rank='8', suit='Diamonds'),
  Card(rank='Q', suit='Diamonds'),
  Card(rank='3', suit='Clubs'),
  Card(rank='7', suit='Clubs'),
  Card(rank='J', suit='Clubs')],
 [Card(rank='3', suit='Spades'),
  Card(rank='7', suit='Spades'),
  Card(rank='J', suit='Spades'),
  Card(rank='2', suit='Hearts'),
  Card(rank='6', suit='Hearts'),
  Card(rank='10', suit='Hearts'),
  Card(rank='A', suit='Hearts'),
  Card(rank='5', suit='Diamonds'),
  Card(rank='9', suit='Diamonds'),
  Card(rank='K', suit='Diamonds'),
  Card(rank='4', suit='Clubs'),
  Card(rank='8', suit='Clubs'),
  Card(rank='Q', suit='Clubs')],
 [Card(rank='4', suit='Spades'),
  Card(rank='8', suit='Spades'),
  Card(rank='Q', suit='Spades'),
  Card(rank='3', suit='Hearts'),


#### `itertools.repeat`

This function simply yields the same value indefinitely, but an additional argument can be specified to make the count finite.

##### Example 1

In [28]:
from itertools import repeat

g = repeat('Python')

for _ in range(3):
    print(next(g))

Python
Python
Python


In [29]:
g = repeat('Python', 4)
list(g)

['Python', 'Python', 'Python', 'Python']

**One important thing to note:** The items yielded by `repeat` are the **same** object. So, if that object is mutable and it is mutated between repeats, that mutation will be observed.
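
A minimal sketch of that behaviour, using an empty list as the repeated object:

```python
from itertools import repeat

shared = []                      # a mutable object
repeats = repeat(shared, 3)

first = next(repeats)
first.append('mutated')          # mutate the object between repeats

second = next(repeats)

print(first is second)  # True: both names refer to the same list
print(second)           # ['mutated']
```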

# 05 - Chaining and Teeing Iterators

#### `chain(*args)` and `chain.from_iterable(it)` 

`itertools.chain(*args)` takes any number of **iterables** as arguments and returns a lazy **iterator**.

`chain` is analogous to sequence concatenation. To implement the functionality of `chain` without using `chain`, you can do:

```python
for it in (iter1, iter2, iter3):
    yield from it
```

This will `yield` everything from `iter1` before moving onto `iter2`, and then finally from `iter3`.

With `chain` you will get:

```python
l = [iter1, iter2, iter3]
for item in chain(*l):
    print(item)
```
The items are still `yield`ed but there's a caveat!

**Caveat: unpacking an iterable of iterators is _eager_ not _lazy_. So above, we have iterated through `l` (iterable) eagerly, but we haven't touched the iterators yet. If `l` was an iterable of 1 million iterators, we would have unpacked all of them eagerly before starting the loop and getting the first item out of the first iterator.** 

Instead, if we really need **lazy** evaluation, we can use `itertools.chain.from_iterable(it)`

This takes a single iterable of iterables, like `l` above. Python will lazily iterate through this outer iterable and `yield from` each inner iterator in turn, only fetching the next inner iterator once the current one is exhausted.

In [41]:
from itertools import chain

l1 = (i**2 for i in range(4))
l2 = (i**2 for i in range(4, 8))
l3 = (i**2 for i in range(8, 12))

lists = [l1, l2, l3]

for item in chain(*lists):
    print(item)

0
1
4
9
16
25
36
49
64
81
100
121


We can observe the **eager** unpacking by storing all our iterators in a generator function.

In [44]:
def squares():
    print('yielding 1st item')
    yield (i**2 for i in range(2))
    print('yielding 2nd item')
    yield (i**2 for i in range(2, 5))
    print('yielding 3rd item')
    yield (i**2 for i in range(5, 8))


for item in chain(*squares()):
    print(item)

yielding 1st item
yielding 2nd item
yielding 3rd item
0
1
4
9
16
25
36
49


As you can see in the print statements above, all iterators are yielded before the main loop starts.

But we can get around that using `chain.from_iterable()`

In [45]:
def squares():
    print('yielding 1st item')
    yield (i**2 for i in range(2))
    print('yielding 2nd item')
    yield (i**2 for i in range(2, 5))
    print('yielding 3rd item')
    yield (i**2 for i in range(5, 8))

for item in chain.from_iterable(squares()):
    print(item)

yielding 1st item
0
1
yielding 2nd item
4
9
16
yielding 3rd item
25
36
49


#### `tee(iterable, n)`

Let's say we are sent an iterator (e.g. via a `return` of a function) that is very difficult/time-consuming to acquire. How would we make a copy of it? 

One way that we've seen is just to manually create the iterator multiple times:
```python
iters = []
for _ in range(10):
    iters.append(create_iterator())
```
but if it's time-consuming we may not want to do that.

The other solution is `itertools.tee` which returns **independent iterators in a tuple**. This can let us iterate through the same iterator **multiple times** or even **in parallel**.
```python
tee(iterable, 10) -> (iter1, iter2, ..., iter10)
```
`iter1` through `iter10` are all different objects, but they are all **lazy iterators**. *Always. Even if the original `iterable` argument was not.*

```python
l = [1, 2, 3, 4]
tee(l, 3) -> (iter1, iter2, iter3)
```
`iter1` and the rest are all **lazy iterators** despite `l` being an **iterable**.

##### Example 1

Let's make 3 **independent** copies of an iterator:

In [46]:
from itertools import tee

def squares(n):
    for i in range(n):
        yield i**2

gen = squares(5)

iters = tee(gen, 3)
iters  

(<itertools._tee at 0x2732a6dbd80>,
 <itertools._tee at 0x2732a6dbe00>,
 <itertools._tee at 0x2732a6dbe80>)

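As you can see, the three iterators are different objects (note the different memory addresses), and `tee` guarantees that iterating through one has no impact on the others. We have *tee'd up our iterator 3 times*. Once an iterator has been tee'd, the original (`gen` here) should not be used any more, otherwise the tee'd iterators can miss items.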

Let's tee up a list (**iterable**) multiple times and look for a return value of multiple **iterators**. 

In [51]:
l = [1, 2, 3, 4]

lists = tee(l, 2)
lists

(<itertools._tee at 0x2732a6e8e00>, <itertools._tee at 0x273291e4dc0>)

In [52]:
list(lists[0])

[1, 2, 3, 4]

`lists[0]` is now exhausted:

In [53]:
list(lists[0])

[]

In [54]:
lists[0] is iter(lists[0])

True

**Important Limitation**:

This is also mentioned in the `itertools.groupby` section as it's often observed with it.

Often times, we may want to `tee` our `groupby` iterator for each key in it. This is so that we can consume our keys' sub iterators when we wish.

For example, if we have the following `groupby` iterator: `it = [("Male", <subiter_obj_1>), ("Female", <subiter_obj_2>)]` and we make two copies so that we can consume the "female" subiterator and the "male" subiterator independently:

```python
it_male, it_female = itertools.tee(it, 2)

it_male -> [("Male", <subiter_obj_1>), ("Female", <subiter_obj_2>)]
it_female -> [("Male", <subiter_obj_1_copy>), ("Female", <subiter_obj_2_copy>)]
```

The issue is that **`itertools.tee` performs a SHALLOW copy**: each tuple handed out by the copies contains the very same sub-iterator object. Therefore, `subiter_obj_1 is subiter_obj_1_copy`. So, if we consume `subiter_obj_1`, that will consume its copy. And the same goes for `subiter_obj_2`.

# 06 - Mapping and Reducing

#### Mapping

```python
map(fn, iterable)
```
`map` applies `fn` to every element of the `iterable` and returns a lazy iterator.
`fn` must be a callable that accepts a **single** argument, because Python will call `fn(<val iterated from iterable>)`.

It is basically equivalent to:
```python
maps = (fn(item) for item in iterable)
```

#### Reducing

```python
reduce(fn, iterable, [initialiser])
```
`reduce` applies `fn` cumulatively to elements of an iterable, pairwise. `fn` must be a callable that requires **two** arguments.

`sum` can be implemented with `reduce`:

In [29]:
from functools import reduce

l = [1, 2, 3, 4]
reduce(lambda x, y: x + y, l)

10

What Python is doing is:

-> 1 = 1\
-> 1 + 2 = 3\
-> 3 + 3 = 6\
-> 6 + 4 = 10

Notice how on the first iteration, Python simply takes the first value in the iterable and stores that as 'previous value'.

Then, in all following iterations, it passes to the `lambda`: `x = 'previous value'` and `y = 'current_value'`

We can set this first value using the initialiser:

In [30]:
from functools import reduce

l = [1, 2, 3, 4]
reduce(lambda x, y: x + y, l, 100)

110
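
The two calls above can be mimicked with a simplified hand-written version. This is only an illustrative sketch: `my_reduce` and the `_MISSING` sentinel are hypothetical names, and it is not how `functools.reduce` is actually implemented.

```python
_MISSING = object()  # sentinel meaning "no initialiser supplied"

def my_reduce(fn, iterable, initialiser=_MISSING):
    """Simplified stand-in for functools.reduce (illustration only)."""
    it = iter(iterable)
    if initialiser is _MISSING:
        # No initialiser: the first element becomes the 'previous value'.
        try:
            previous = next(it)
        except StopIteration:
            raise TypeError('empty iterable with no initialiser') from None
    else:
        previous = initialiser
    # Every following step feeds (previous value, current value) to fn.
    for current in it:
        previous = fn(previous, current)
    return previous

total = my_reduce(lambda x, y: x + y, [1, 2, 3, 4])
total_from_100 = my_reduce(lambda x, y: x + y, [1, 2, 3, 4], 100)
print(total, total_from_100)  # 10 110
```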

#### `itertools.starmap`

If you have an iterable of iterables, such as `[ [1, 2], [3, 4] ]`, and you want to call a function with the values in each subiterable as its arguments, you can use `starmap`.

For each subiterable, `starmap` unpacks its elements and passes them to the function as separate positional arguments, i.e. it calls `fn(*subiterable)`.

For example:

In [31]:
from itertools import starmap

l = [ [1, 2, 3], [10, 20, 30], [100, 200, 300] ]

result = starmap(lambda x, y, z: x + y + z, l)
list(result)

[6, 60, 600]

Our `lambda` function is very simple but the `starmap` handles the awkwardness of unpacking. 

It is still similar to `map` in that we had an iterable of three items (subiterables) and we returned an iterator of exactly 3 items.
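
To make the unpacking concrete, here is a small sketch comparing `starmap` with a generator expression that does the `*` unpacking by hand (the `pairs` data is made up for illustration):

```python
from itertools import starmap

pairs = [(2, 3), (3, 2), (10, 0)]

# starmap unpacks each sub-iterable into positional arguments:
# pow(2, 3), pow(3, 2), pow(10, 0)
via_starmap = list(starmap(pow, pairs))

# A generator expression with explicit * unpacking does the same job.
via_genexpr = list(pow(*args) for args in pairs)

print(via_starmap)  # [8, 9, 1]
print(via_genexpr)  # [8, 9, 1]
```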

There are many ways of doing it. We could instead iterate through each iterable and reduce it to a single value with `sum`:

In [32]:
list((sum(item) for item in l))

[6, 60, 600]

Or even explicitly using `reduce`:

In [33]:
from functools import reduce
import operator

l = [ [1, 2, 3], [10, 20, 30], [100, 200, 300] ]

list((reduce(operator.add, item) for item in l))

[6, 60, 600]

#### `itertools.accumulate(iterable, fn)`

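This is very similar to `reduce`: it takes an `iterable` and cumulatively applies a function to its elements.

But, unlike `reduce`, which returns only the final value, **it returns a lazy iterator that produces all intermediate results** (the last of which is what `reduce` would return).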

Also, quite confusingly, the arguments of `accumulate` are reversed compared to `reduce`:

`functools.reduce(fn, iterable)`\
`itertools.accumulate(iterable, fn)`


The reason is that in `accumulate`, `fn` is an optional parameter. If it's left out, it defaults to addition.

In [11]:
from itertools import accumulate

l = [1, 2, 3, 4]
g = accumulate(l, lambda x, y: x * y)

for item in g:
    print(item)

1
2
6
24


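Before Python 3.8, `accumulate` didn't let us provide an initialiser value (newer versions accept a keyword-only `initial` argument). Either way, we can mimic one by putting the initial value at the front of the iterable, which can be done in many ways:

- Using the list `insert()` method
- Using `[ ]` and the `+` operator
- Using list slicing
- Using `collections.deque.appendleft` (this is often the best: O(1) time and space complexity)
- Using `itertools.chain()`

I'll quickly demonstrate 3 of those: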

In [14]:
from itertools import accumulate
import operator

res = accumulate([10] + [1, 2, 3, 4], operator.mul)
list(res)

[10, 10, 20, 60, 240]

In [17]:
l = [1, 2, 3, 4]
l.insert(0, 10)

res = accumulate(l, operator.mul)
list(res)

[10, 10, 20, 60, 240]

In [24]:
from itertools import chain

res = accumulate( chain( (10,), [1, 2, 3, 4] ) , operator.mul)
list(res)

[10, 10, 20, 60, 240]
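
On Python 3.8 and later, the same result is available directly through the keyword-only `initial` argument of `accumulate`. A short sketch, assuming a recent enough interpreter:

```python
from itertools import accumulate
import operator

# `initial` is keyword-only and requires Python 3.8+.
res = list(accumulate([1, 2, 3, 4], operator.mul, initial=10))
print(res)  # [10, 10, 20, 60, 240]
```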

# 07 - Zipping

We've covered this multiple times. `zip` produces a lazy iterator. Be aware that `zip` stops based on the shortest iterable provided to it. But we can stop on the longest iterable using `itertools.zip_longest(*args, [fillvalue=None])`

We can also use iterators as the arguments of `zip`:

In [26]:
def integers(n):
    for i in range(n):
        yield i

def squares(n):
    for i in range(n):
        yield i**2

def cubes(n):
    for i in range(n):
        yield i**3

iter1 = integers(3)
iter2 = squares(4)
iter3 = cubes(5)

res = zip(iter1, iter2, iter3)
list(res)

[(0, 0, 0), (1, 1, 1), (2, 4, 8)]
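
For comparison, here is a small sketch of `zip` next to `itertools.zip_longest` on inputs of unequal length. The `'N/A'` fill value is an arbitrary choice; if `fillvalue` is omitted, `None` is used.

```python
from itertools import zip_longest

short = [0, 1, 2]
long_ = [0, 1, 4, 9]

# zip stops as soon as the shortest input runs out.
stopped_at_shortest = list(zip(short, long_))

# zip_longest keeps going and pads the exhausted input with fillvalue.
padded_to_longest = list(zip_longest(short, long_, fillvalue='N/A'))

print(stopped_at_shortest)  # [(0, 0), (1, 1), (2, 4)]
print(padded_to_longest)    # [(0, 0), (1, 1), (2, 4), ('N/A', 9)]
```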

# 08 - Grouping

Suppose we have an iterable containing a number of tuples. How would we group those tuples into different groups based on their first element?

For example, all tuples with the first element as `1` are grouped together and all tuples with the first element as `2` are grouped together.

![8.1.png](s8-images/8.1.png)

**`itertools.groupby(data, [keyfunc])`** 

This is a **lazy iterator**. `data` is the iterable that we want to divide into numerous groups. `keyfunc` is optional - by default, it will use the data element itself; this is rare as we'd normally want to provide a key.

**Return value**

We get back a lazy iterator that produces **tuples** of structure: `(key, sub_iterator)`. In our example above, `key=1` and the sub_iterator will contain all items in that group i.e. `(1, 10, 100)`, `(1, 11, 101)`, `(1, 12, 102)`. But to get those items out, we have to iterate through the sub_iterator.

**Note**:

In SQL, `GROUP BY` puts every row with the same key into a single group, wherever those rows appear in the table, so in the above example all items in group 1 end up together.

Python's `groupby` only groups **consecutive** items that share a key, so **we have to do the sorting ourselves** (by that same key) if we want exactly one group per key.

If we have some data that looks like:
```python
data = (1, 2, 2, 2, 3, 1)
```
and we `list(itertools.groupby(data))`, then:
- firstly, no key was specified so the actual element is used as the key.
- we will have a list of **4** tuples: `[(1, <groupby_obj_1>), (2, <groupby_obj_2>), (3, <groupby_obj_3>), (1, <groupby_obj_4>)]`, instead of **3**.
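
A short sketch of the difference sorting makes, using the same `data` as above:

```python
from itertools import groupby

data = (1, 2, 2, 2, 3, 1)

# Unsorted: the two 1s are not adjacent, so they land in separate groups.
unsorted_keys = [key for key, _ in groupby(data)]

# Sorted first: equal keys become adjacent, so each key appears once.
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(data))]

print(unsorted_keys)  # [1, 2, 3, 1]
print(sorted_groups)  # [(1, [1, 1]), (2, [2, 2, 2]), (3, [3])]
```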

**Important Note:**

![8.2.png](s8-images/8.2.png)

To further explain:

- We've first created our `groups` lazy iterator by passing our iterable of tuples and a `lambda`. **Our iterable is converted into an iterator** which I shall call **main iterator**.
- Calling `next(groups)` produces the first tuple -> `(1, <sub_iterator_1>)`.
- Let's say we iterate through `<sub_iterator_1>` three times by doing `next(<sub_iterator_1>)` three times. Each time we do that, Python goes to our **main iterator** and calls `next()`.
- Let's say we call `next(groups)` to produce the second tuple -> `(2, <sub_iterator_2>)`.
- But then we realise we don't care about group 2 so we don't ever touch the `<sub_iterator_2>` and instead we want to move onto group 3.
- So we run `next(groups)` to get `(3, <sub_iterator_3>)`.
- Python will *immediately* consume all of the `<sub_iterator_2>` by calling `next(<sub_iterator_2>)`. This will internally call `next()` on our **main iterator** until all of group 2 has been consumed.
- Why does it do this? Because, working our way down our **main iterator**, we can only get to group 3 tuples by getting past all of the group 2 elements.  
- Recalling that we've got `(3, <sub_iterator_3>)` currently, if we start iterating through `<sub_iterator_3>`, we will call `next()` on our **main iterator** and produce the expected items.

The key takeaway is that **all sub-iterators internally use the same main iterator**. Skipping a group because you don't care about its items does **not** save time because Python still has to iterate through them to get to the groups you do care about.
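
The walkthrough above can be observed with a small sketch. The `rows` generator and the `pulled` log are hypothetical helpers that record every item requested from the main iterator:

```python
from itertools import groupby

pulled = []  # log of every row taken from the main iterator

def rows():
    """Hypothetical data source that records each row as it is requested."""
    for row in [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (3, 'e')]:
        pulled.append(row)
        yield row

groups = groupby(rows(), key=lambda row: row[0])

key1, sub1 = next(groups)
group1 = list(sub1)           # consume group 1

key2, sub2 = next(groups)     # get group 2, but never iterate sub2
key3, sub3 = next(groups)     # groupby must read past group 2's rows

skipped_rows_were_read = (2, 'c') in pulled and (2, 'd') in pulled
leftover = list(sub2)         # sub2 is stale now, so nothing comes out

print(key3)                    # 3
print(skipped_rows_were_read)  # True
print(leftover)                # []
```

Even though `sub2` was never iterated, its rows show up in `pulled`, and by the time we come back to `sub2` there is nothing left in it.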

Let's quickly do two examples to remind ourselves of the output structure.

In [33]:
import itertools

data = (1, 2, 2, 2, 3, 3, 1)

it = itertools.groupby(data)

for tuple_group in it:
    print(tuple_group[0], list(tuple_group[1]))

1 [1]
2 [2, 2, 2]
3 [3, 3]
1 [1]


In [38]:
data = (
    (1, 'abc'),
    (2, 'def'),
    (2, 'ghi'),
    (2, 'jkl'),
    (3, 'mno'),
    (3, 'pqr')
)

it = itertools.groupby(data, lambda x: x[0])

for group_key, sub_iter in it:
    print(group_key, list(sub_iter))

1 [(1, 'abc')]
2 [(2, 'def'), (2, 'ghi'), (2, 'jkl')]
3 [(3, 'mno'), (3, 'pqr')]


Another reminder is `it` is an **iterator**. 

If you `list(it)`, the output might look like `[(1, <groupby_obj_1>), (2, <groupby_obj_2>), (3, <groupby_obj_3>), (1, <groupby_obj_4>)]`, but `it` is now exhausted, and so is the underlying data, so iterating any of those `<groupby_obj>`'s yields nothing.

**Important Limitation**:

This is also mentioned in the `itertools.tee` section but it is just as pertinent in this section as it's often observed here.

Often times, we may want to `tee` our `groupby` iterator for each key in it. This is so that we can consume our keys' sub iterators when we wish.

For example, if we have the following `groupby` iterator: `it = [("Male", <subiter_obj_1>), ("Female", <subiter_obj_2>)]` and we make two copies so that we can consume the "female" subiterator and the "male" subiterator independently:

```python
it_male, it_female = itertools.tee(it, 2)

it_male -> [("Male", <subiter_obj_1>), ("Female", <subiter_obj_2>)]
it_female -> [("Male", <subiter_obj_1_copy>), ("Female", <subiter_obj_2_copy>)]
```

The issue is that **`itertools.tee` performs a SHALLOW copy**: each tuple handed out by the copies contains the very same sub-iterator object. Therefore, `subiter_obj_1 is subiter_obj_1_copy`. So, if we consume `subiter_obj_1`, that will consume its copy. And the same goes for `subiter_obj_2`.
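
A minimal sketch of this limitation; the `people` rows are made-up data used only for illustration:

```python
from itertools import groupby, tee

people = [('Male', 'Al'), ('Male', 'Bo'), ('Female', 'Cy')]  # hypothetical rows

it = groupby(people, key=lambda row: row[0])
it_male, it_female = tee(it, 2)

key_a, sub_a = next(it_male)
key_b, sub_b = next(it_female)

same_object = sub_a is sub_b   # tee hands out the very same sub-iterator
first_pass = list(sub_a)       # consumes the shared sub-iterator
second_pass = list(sub_b)      # nothing left for the "copy"

print(same_object)  # True
print(first_pass)   # [('Male', 'Al'), ('Male', 'Bo')]
print(second_pass)  # []
```

If the groups need to be consumed independently, one option is to materialise each sub-iterator (for example with `list`) as soon as it is produced.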

##### `cars.csv` Example

Let's get some rows from the data. 

**Useful Note: We are using `islice` because we don't want to load the entire file's content into memory and `islice` lazily evaluates from a start to a stop point.**

In [17]:
import itertools

with open('../Section 08 - Iteration Tools/08 - Grouping/cars_2014.csv') as f:
    for row in itertools.islice(f, 0, 20):
        print(row, end='')

make,model
ACURA,ILX
ACURA,MDX
ACURA,RDX
ACURA,RLX
ACURA,TL
ACURA,TSX
ALFA ROMEO,4C
ALFA ROMEO,GIULIETTA
APRILIA,CAPONORD 1200
APRILIA,RSV4 FACTORY APRC ABS
APRILIA,RSV4 R APRC ABS
APRILIA,SHIVER 750
ARCTIC CAT,1000 XT
ARCTIC CAT,500 XT
ARCTIC CAT,550 XT
ARCTIC CAT,700 LTD
ARCTIC CAT,700 SUPER DUTY DIESEL
ARCTIC CAT,700 XT
ARCTIC CAT,90 2X4 4-STROKE


**Question: How many models exist for each make?** 

Remember, `defaultdict` takes a factory callable (often a type) and calls it to create a default value whenever a missing key is accessed.

- If the type is `str`, the default value is `''`.
- If the type is `int`, the default value is `0`.
- If the type is `list`, the default value is `[]` etc. 

In [20]:
from collections import defaultdict

makes = defaultdict(int)  # if makes[<key>] doesn't exist, create the key and set its value to 0

with open('../Section 08 - Iteration Tools/08 - Grouping/cars_2014.csv') as f:
    next(f)
    for row in f:
        make, _ = row.strip('\n').split(',')
        makes[make] += 1

for key, value in makes.items():
    print(f'{key}: {value}')

ACURA: 6
ALFA ROMEO: 2
APRILIA: 4
ARCTIC CAT: 96
ARGO: 4
ASTON MARTIN: 5
AUDI: 27
BENTLEY: 2
BLUE BIRD: 1
BMW: 86
BUGATTI: 1
BUICK: 5
CADILLAC: 7
CAN-AM: 61
CHEVROLET: 33
CHRYSLER: 2
DODGE: 7
DUCATI: 4
FERRARI: 6
FIAT: 2
FORD: 34
FREIGHTLINER: 7
GMC: 12
HARLEY DAVIDSON: 29
HINO: 7
HONDA: 91
HUSABERG: 4
HUSQVARNA: 9
HYUNDAI: 13
INDIAN: 3
INFINITI: 8
JAGUAR: 9
JEEP: 5
JOHN DEERE: 19
KAWASAKI: 59
KENWORTH: 11
KIA: 10
KTM: 13
KUBOTA: 4
KYMCO: 28
LAMBORGHINI: 2
LAND ROVER: 6
LEXUS: 14
LINCOLN: 6
LOTUS: 1
MACK: 9
MASERATI: 3
MAZDA: 5
MCLAREN: 2
MERCEDES-BENZ: 60
MINI: 3
MITSUBISHI: 8
NISSAN: 24
PEUGEOT: 3
POLARIS: 101
PORSCHE: 4
RAM: 6
RENAULT: 4
ROLLS ROYCE: 3
SCION: 5
SEAT: 3
SKI-DOO: 67
SMART: 1
SRT: 1
SUBARU: 10
SUZUKI: 48
TESLA: 2
TOYOTA: 19
TRIUMPH: 10
VESPA: 4
VICTORY: 14
VOLKSWAGEN: 16
VOLVO: 8
YAMAHA: 110


`groupby` takes an **iterable** (iterators are iterables), iterates through each item and groups by some part of that item, e.g. the first element in that item.

In our case, `f` is the iterable (iterator, specifically).

In [39]:
with open('../Section 08 - Iteration Tools/08 - Grouping/cars_2014.csv') as f:
    # We don't need to strip \n because we're only interested in the first element.
    # The \n will be in the 2nd element of the split.
    make_groups = itertools.groupby(f, lambda x: x.split(',')[0])

list(make_groups)

ValueError: I/O operation on closed file.

Why did we get this error?

`groupby` is a **lazy iterator**, so the `make_groups` statement in the context manager hasn't touched any of the data yet. Instead, that would happen when we `list(make_groups)`.

But, we've closed the file already so `list(make_groups)` cannot do anything with the `f` in `.groupby(f, ...)`.

Here's the fix: 

Remember `make_groups` is an iterator of tuples of the form `(<key>, <sub_iter that generates entire rows>)`.

Since we don't care about the actual data in the `sub_iter` but rather how many items it contains, we can just find the length of the `sub_iter`.

In [41]:
with open('../Section 08 - Iteration Tools/08 - Grouping/cars_2014.csv') as f:
    next(f)
    make_groups = itertools.groupby(f, lambda x: x.split(',')[0])
    result = [(key, len(sub_iter)) for key, sub_iter in make_groups]
    print(result)

TypeError: object of type 'itertools._grouper' has no len()

Unfortunately, this iterator has not implemented `__len__`, so we're going to show two ways of finding the length of a general iterator, using `squares` to demonstrate if it works.

In [64]:
def squares(n):
    for i in range(n):
        yield i**2

In [65]:
# Approach 1
def len_iterable(iterable):
    i = 0
    for _ in iterable:
        i += 1
    return i

len_iterable(squares(8))

8

In [66]:
# Approach 2
sum(1 for _ in squares(8))

8

The second approach just lazily generates `-> (1, 1, 1, 1, 1, 1, 1, 1)` and then sums it. We'll use this approach.

In [67]:
with open('../Section 08 - Iteration Tools/08 - Grouping/cars_2014.csv') as f:
    next(f)
    make_groups = itertools.groupby(f, lambda x: x.split(',')[0])
    result = (
        (key, sum(1 for _ in sub_iter))
        for key, sub_iter in make_groups
    )
    print(list(result))

[('ACURA', 6), ('ALFA ROMEO', 2), ('APRILIA', 4), ('ARCTIC CAT', 96), ('ARGO', 4), ('ASTON MARTIN', 5), ('AUDI', 27), ('BENTLEY', 2), ('BLUE BIRD', 1), ('BMW', 86), ('BUGATTI', 1), ('BUICK', 5), ('CADILLAC', 7), ('CAN-AM', 61), ('CHEVROLET', 33), ('CHRYSLER', 2), ('DODGE', 7), ('DUCATI', 4), ('FERRARI', 6), ('FIAT', 2), ('FORD', 34), ('FREIGHTLINER', 7), ('GMC', 12), ('HARLEY DAVIDSON', 29), ('HINO', 7), ('HONDA', 91), ('HUSABERG', 4), ('HUSQVARNA', 9), ('HYUNDAI', 13), ('INDIAN', 3), ('INFINITI', 8), ('JAGUAR', 9), ('JEEP', 5), ('JOHN DEERE', 19), ('KAWASAKI', 59), ('KENWORTH', 11), ('KIA', 10), ('KTM', 13), ('KUBOTA', 4), ('KYMCO', 28), ('LAMBORGHINI', 2), ('LAND ROVER', 6), ('LEXUS', 14), ('LINCOLN', 6), ('LOTUS', 1), ('MACK', 9), ('MASERATI', 3), ('MAZDA', 5), ('MCLAREN', 2), ('MERCEDES-BENZ', 60), ('MINI', 3), ('MITSUBISHI', 8), ('NISSAN', 24), ('PEUGEOT', 3), ('POLARIS', 101), ('PORSCHE', 4), ('RAM', 6), ('RENAULT', 4), ('ROLLS ROYCE', 3), ('SCION', 5), ('SEAT', 3), ('SKI-DOO', 6

# 09 - Combinatorics

#### Cartesian Product

Consider the following two sets:
$$
s_1 = \{1, 2, 3, \ldots, 10\}
$$
$$
s_2 = \{1, 2, 3, \ldots, 10\}
$$
Then the Cartesian product of the two sets is:
$$
s_1 \times s_2 = \{(x_1, x_2) \, \vert \, x_1 \in s_1 \, \textrm{and} \, x_2 \in s_2\}
$$

Another way to think of it is by creating a table:

```
        y1        y2        y3                          y1      y2      y3
x1  (x1, y1)  (x1, y2)  (x1, y3)                  x1  (1, 1)  (1, 2)  (1, 3)        

x2  (x2, y1)  (x2, y2)  (x2, y3)                  x2  (2, 1)  (2, 2)  (2, 3)
                                       ---> 
x3  (x3, y1)  (x3, y2)  (x3, y3)                  x3  (3, 1)  (3, 2)  (3, 3)

x4  (x4, y1)  (x4, y2)  (x4, y3)                  x4  (4, 1)  (4, 2)  (4, 3)
```

We can do this with the following:

In [3]:
import itertools

l1 = range(1, 5)
l2 = range(1, 4)

product = itertools.product(l1, l2)

print(list(product))

[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)]


This is really useful if we want all the combinations of a number of iterables.
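Put differently, `itertools.product` yields the same pairs, in the same order, as a pair of nested `for` loops, only lazily. A quick check using the same ranges as above:

```python
import itertools

l1 = range(1, 5)
l2 = range(1, 4)

# Nested loops: the rightmost iterable varies fastest
nested = [(x, y) for x in l1 for y in l2]

assert nested == list(itertools.product(l1, l2))
print(len(nested))   # 12, i.e. 4 * 3
```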

Another example: creating a card deck is easy with the Cartesian product:

In [6]:
from collections import namedtuple
Card = namedtuple('Card', 'rank suit')

SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = tuple(str(i) for i in range(2, 11)) + tuple('JQKA')

deck = [Card(rank, suit) for suit, rank in itertools.product(SUITS, RANKS)]
deck[0:5]

[Card(rank='2', suit='Spades'),
 Card(rank='3', suit='Spades'),
 Card(rank='4', suit='Spades'),
 Card(rank='5', suit='Spades'),
 Card(rank='6', suit='Spades')]

What are the odds of successively picking four aces from a shuffled deck?

$$
\frac{4}{52} \times \frac{3}{51} \times \frac{2}{50} \times \frac{1}{49}
= \frac{24}{6497400} = \frac{1}{270725}
$$

In other words, we choose 4 distinct cards from 52 without replacement. There are 270725 such combinations, and only one of them is (Ace of Spades, Ace of Hearts, Ace of Diamonds, Ace of Clubs). Remember, the order doesn't matter.

In code:

In [8]:
from fractions import Fraction

deck = (Card(rank, suit) for suit, rank in itertools.product(SUITS, RANKS))
sample_space = itertools.combinations(deck, 4)
total = 0
acceptable = 0
for outcome in sample_space:
    total += 1
    for card in outcome:
        if card.rank != 'A':
            break
    else:
        # else block is executed if loop terminated without a break
        acceptable += 1
print(f'total={total}, acceptable={acceptable}')
print('odds={}'.format(Fraction(acceptable, total)))
print('odds={:.10f}'.format(acceptable/total))

total=270725, acceptable=1
odds=1/270725
odds=0.0000036938
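As a sanity check, the same numbers can be obtained without enumerating the sample space. A small sketch, assuming Python 3.8+ for `math.comb`:

```python
import math
from fractions import Fraction

# Number of ways to choose 4 cards out of 52 when order doesn't matter
total = math.comb(52, 4)

# Probability of drawing the four aces one after the other
sequential = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)

print(total, sequential)   # 270725 1/270725
```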


In [10]:
all(['str', 0, []])

False

We can rewrite this algorithm using `all` and `map` to replace one of the `for` loops.

We do this by taking each of those 270725 groups of 4 cards in turn and checking whether the rank of every card is Ace.

If an outcome contains three Aces and one other card (say, a King), the mapped values would look like `[True, True, True, False]`. Applying `all` to this results in `False` because they aren't all `True`.


In [11]:
deck = (Card(rank, suit) for suit, rank in itertools.product(SUITS, RANKS))
sample_space = itertools.combinations(deck, 4)
total = 0
acceptable = 0
for outcome in sample_space:
    total += 1
    if all(map(lambda x: x.rank == 'A', outcome)):
        acceptable += 1

print(f'total={total}, acceptable={acceptable}')
print('odds={}'.format(Fraction(acceptable, total)))
print('odds={:.10f}'.format(acceptable/total))

total=270725, acceptable=1
odds=1/270725
odds=0.0000036938
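If we'd rather avoid `map` and the `lambda`, a generator expression gives the same lazy, short-circuiting behaviour. The sketch below is one possible variant (it redefines `Card`, `SUITS` and `RANKS` so that it runs on its own), and it relies on `True` counting as 1 when added to an integer:

```python
import itertools
from collections import namedtuple
from fractions import Fraction

Card = namedtuple('Card', 'rank suit')
SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = tuple(str(i) for i in range(2, 11)) + tuple('JQKA')

deck = (Card(rank, suit) for suit, rank in itertools.product(SUITS, RANKS))

total = 0
acceptable = 0
for outcome in itertools.combinations(deck, 4):
    total += 1
    # all() stops at the first non-ace; the resulting bool adds 0 or 1
    acceptable += all(card.rank == 'A' for card in outcome)

print(Fraction(acceptable, total))   # 1/270725
```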


Here's another example: if we want an n-dimensional grid whose step size is not an integer, we can build one axis with `itertools.takewhile` and `itertools.count`, copy it with `itertools.tee`, and take the Cartesian product of the copies:

In [22]:
def grid(min_val, max_val, step, *, num_dimensions=2):
    axis = itertools.takewhile(lambda x: x <= max_val, itertools.count(min_val, step))

    axes = itertools.tee(axis, num_dimensions)
    return itertools.product(*axes)

list(grid(-1, 1, 0.5))

[(-1, -1),
 (-1, -0.5),
 (-1, 0.0),
 (-1, 0.5),
 (-1, 1.0),
 (-0.5, -1),
 (-0.5, -0.5),
 (-0.5, 0.0),
 (-0.5, 0.5),
 (-0.5, 1.0),
 (0.0, -1),
 (0.0, -0.5),
 (0.0, 0.0),
 (0.0, 0.5),
 (0.0, 1.0),
 (0.5, -1),
 (0.5, -0.5),
 (0.5, 0.0),
 (0.5, 0.5),
 (0.5, 1.0),
 (1.0, -1),
 (1.0, -0.5),
 (1.0, 0.0),
 (1.0, 0.5),
 (1.0, 1.0)]
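The Python documentation for `itertools.count` notes that, with floating-point steps, better accuracy can sometimes be achieved with multiplicative code of the form `start + step * i`. A hedged sketch of that idea follows; the function name `grid_by_index` and the `1e-9` tolerance are assumptions made for illustration, not part of the original notes. It also uses the `repeat` argument of `itertools.product` in place of `tee`:

```python
import itertools

def grid_by_index(min_val, max_val, step, *, num_dimensions=2):
    # Number of whole steps that fit between min_val and max_val;
    # the small tolerance (an assumption) guards against rounding error.
    num_steps = int((max_val - min_val) / step + 1e-9)
    # Each point is computed from its index instead of by repeated addition
    axis = [min_val + i * step for i in range(num_steps + 1)]
    return itertools.product(axis, repeat=num_dimensions)

points = list(grid_by_index(-1, 1, 0.5))
print(len(points), points[0], points[-1])   # 25 (-1.0, -1.0) (1.0, 1.0)
```

With a step of 0.5 both versions give the same 25 points; the difference only matters for steps such as 0.1 that cannot be represented exactly in binary floating point.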

Yet another example: imagine we roll several dice at once. The set of possible outcomes is just the Cartesian product of the faces of each die. What if we wanted to know all the outcomes of two dice that add up to 8? We'll need to filter the sample space:

In [33]:
sample_space = list(itertools.product(*itertools.tee(range(1, 7), 2)))
print(sample_space)

[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]


In [35]:
outcomes = list(filter(lambda x: x[0] + x[1] == 8, sample_space))
print(outcomes)

[(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
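To go beyond two dice, the `repeat` argument of `itertools.product` saves us from calling `tee`. The sketch below wraps the idea in a helper; the name `rolls_summing_to` and its parameters are hypothetical, chosen only for this example:

```python
import itertools

def rolls_summing_to(target, num_dice=2, sides=6):
    faces = range(1, sides + 1)
    # product(faces, repeat=n) is the Cartesian product of n copies of faces
    return [
        roll
        for roll in itertools.product(faces, repeat=num_dice)
        if sum(roll) == target
    ]

print(rolls_summing_to(8))
# [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(len(rolls_summing_to(8, num_dice=3)))
# 21
```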


#### Permutations

From Wikipedia: 


> In mathematics, the notion of permutation relates to the act of arranging all the members of a set into some sequence or order, or if the set is already ordered, rearranging (reordering) its elements, a process called permuting. These differ from combinations, which are selections of some members of a set where order is disregarded.


https://en.wikipedia.org/wiki/Permutation

We can create permutations of length r from an iterable of length n (r <= n) using the `itertools.permutations` function.

The number of such permutations is n! / (n - r)!, often written P(n, r). By default, r is equal to the length of the iterable, so the first example below gives 3! = 6 permutations.

Permutations differ from combinations in that order matters: `('a', 'b')` and `('b', 'a')` are two different permutations but the same combination. In both cases, elements are picked without replacement.

In [37]:
l1 = 'abc'
list(itertools.permutations(l1))

[('a', 'b', 'c'),
 ('a', 'c', 'b'),
 ('b', 'a', 'c'),
 ('b', 'c', 'a'),
 ('c', 'a', 'b'),
 ('c', 'b', 'a')]

In [38]:
list(itertools.permutations(l1, 2))

[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
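The counts above can be checked against the closed-form formulas. A short sketch, assuming Python 3.8+ for `math.perm` and `math.comb`:

```python
import itertools
import math

letters = 'abc'
n = len(letters)

for r in range(n + 1):
    perms = list(itertools.permutations(letters, r))
    combs = list(itertools.combinations(letters, r))
    # Ordered arrangements: n! / (n - r)!
    assert len(perms) == math.perm(n, r)
    # Unordered selections: n! / (r! * (n - r)!)
    assert len(combs) == math.comb(n, r)
    print(r, len(perms), len(combs))
```

For `r = 2` this gives 6 permutations but only 3 combinations, since each pair of letters appears in both orders among the permutations.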