# What is itertools?

- The itertools module provides a set of functions for building and working with iterators over sequences and other iterable data.
- Iterator-based code usually uses less memory than code that builds complete lists first.
- Because an iterator produces each item only when it is requested, the whole data set never has to be held in memory at the same time.
- This “lazy” processing model can reduce swapping and other side effects of large data sets, which can improve performance.
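
The lazy behaviour described above can be observed with a small sketch. The generator `squares` and the `produced` log are hypothetical names used only for illustration; they record which values have actually been computed.

```python
produced = []  # hypothetical log of the inputs that were actually processed


def squares(limit):
    """Yield squares one at a time, logging each value when it is computed."""
    for n in range(limit):
        produced.append(n)
        yield n * n


lazy = squares(1_000_000)  # nothing has been computed yet
first = next(lazy)         # computes only the first value
second = next(lazy)        # computes only the second value

print(first, second)  # 0 1
print(produced)       # [0, 1]
```

Only two of the one million values were ever computed, and none of them had to be stored in a list.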

In [3]:
from itertools import *  # in practice, import only the functions you need.

# Merging and Splitting Iterators

## chain()

- It __combines__ several iterables into a single iterator that yields their items one after another.

__Use case:__

- When you have two or more large iterables that need to be combined and processed as one.

In [2]:
# what does chain return

numbers = [1, 2, 3]
characters = ['a', 'b', 'c']


chained_iterables = chain(numbers, characters)

print(f"TYPE: {type(chained_iterables)}")

print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")

TYPE: <class 'itertools.chain'>
Calling next: 1
Calling next: 2
Calling next: 3
Calling next: a


In [3]:
for i in chain(numbers, characters):
    print(i, end=' ')

1 2 3 a b c 

## chain.from_iterable

- What if the iterables we want to combine are not known in advance?
- Or what if we want to evaluate these iterables __lazily__?

__Use case:__

- Chain iterables that are produced lazily or whose number is not known in advance.

In [5]:
""" 

Let's say some business logic produces iterables that we do not know in advance.
The generator below creates two lists:
one with odd numbers less than 50 and
another with even numbers greater than or equal to 50.

"""

def make_iterables_to_chain():
    
    num_odd_lt_50 = []
    num_even_gt_50 = []
    
    for num in range(100):
        if num%2 and num < 50:
            num_odd_lt_50.append(num)
        elif num >= 50 and not num%2:
            num_even_gt_50.append(num)
    
    yield num_odd_lt_50
    
    yield num_even_gt_50

In [6]:
# what does chain.from_iterable return

chained_iterables = chain.from_iterable(make_iterables_to_chain())

print(f"TYPE: {type(chained_iterables)}")

print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")
print(f"Calling next: {next(chained_iterables)}")

TYPE: <class 'itertools.chain'>
Calling next: 1
Calling next: 3
Calling next: 5
Calling next: 7


In [7]:
for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 
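
To make the "lazily" point concrete, here is a minimal sketch comparing `chain.from_iterable(...)` with `chain(*...)`. The generator `batches` and the `log` list are hypothetical; they only record when each inner list is built.

```python
from itertools import chain

log = []  # hypothetical record of which batches have been built


def batches():
    """Yield three small lists, logging each one as it is created."""
    for start in (0, 3, 6):
        log.append(start)
        yield [start, start + 1, start + 2]


lazy_chain = chain.from_iterable(batches())
first = next(lazy_chain)
log_after_lazy = list(log)        # only the first batch has been built

log.clear()
eager_chain = chain(*batches())   # argument unpacking runs batches() to completion
log_after_eager = list(log)       # all three batches were built up front

print(first)            # 0
print(log_after_lazy)   # [0]
print(log_after_eager)  # [0, 3, 6]
```

Both calls yield the same items in the end; the difference is only in when the inner iterables are produced.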

## zip()

- Returns an iterator that combines the elements of several iterables into tuples.
- zip is __in fact a built-in function and not part of itertools__, but let's have a look at it.
- Very useful when you want to loop over two iterables simultaneously.
- __zip() stops when the shortest input iterable is exhausted.__

__Use case:__

- To iterate over several iterables of the same size at the same time.

In [10]:
numbers = [1, 2, 3]
characters = ['a', 'b', 'c', 'd']

zipped_iterables = zip(numbers, characters)

print(f"TYPE: {type(zipped_iterables)}")

print(f"Calling next: {next(zipped_iterables)}")
print(f"Calling next: {next(zipped_iterables)}")
print(f"Calling next: {next(zipped_iterables)}")

TYPE: <class 'zip'>
Calling next: (1, 'a')
Calling next: (2, 'b')
Calling next: (3, 'c')


In [11]:
for num, char in zip(numbers, characters):
    print(f"num: {num}, char: {char}")

num: 1, char: a
num: 2, char: b
num: 3, char: c


## zip_longest()

- Use it when we want to zip iterables that are of different sizes.
- It uses __None__ as the default fill value once a shorter iterable is exhausted.
- However, we can specify another fill value with the __fillvalue__ argument.

__Use case:__

- To iterate over several iterables of different sizes.

In [12]:
numbers = [1, 2, 3]
characters = ['a', 'b', 'c', 'd']

zipped_iterables = zip_longest(numbers, characters)

print(f"TYPE: {type(zipped_iterables)}")

print(f"Calling next: {next(zipped_iterables)}")
print(f"Calling next: {next(zipped_iterables)}")
print(f"Calling next: {next(zipped_iterables)}")
print(f"Calling next: {next(zipped_iterables)}")

TYPE: <class 'itertools.zip_longest'>
Calling next: (1, 'a')
Calling next: (2, 'b')
Calling next: (3, 'c')
Calling next: (None, 'd')


In [None]:
for num, char in zip_longest(numbers, characters, fillvalue=0):
    print(f"num: {num}, char: {char}")

## islice()

- Returns an iterator that yields selected items from the input iterable, by index.
- The __first__ argument is the __iterable__.
- With __two arguments__, the __second is the stop index__ (exclusive) and slicing starts at index 0.
- With __three arguments__, the __second is the start__ index and the __third is the stop__ index (exclusive).
- With __four arguments__, the __second is the start__ index, the __third is the stop__ index (exclusive) and the __fourth is the step__.

__Use case:__

- Slicing any iterable, including ones that do not support regular slice syntax.

In [14]:
a_dict = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9}


print("First 5 items of the dict:\n")

sliced_dict = islice(a_dict, 5)

print(f"TYPE: {type(sliced_dict)}")

print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")

print("\nFrom 5th index to 9th index:\n")

sliced_dict = islice(a_dict, 5, 10)

print(f"TYPE: {type(sliced_dict)}")

print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")

print("\nTaking 2 steps at a time:\n")

sliced_dict = islice(a_dict, 0, 11, 2)

print(f"TYPE: {type(sliced_dict)}")

print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")
print(f"Calling next: {next(sliced_dict)}")

First 5 items of the dict:

TYPE: <class 'itertools.islice'>
Calling next: a
Calling next: b
Calling next: c
Calling next: d

From 5th index to 9th index:

TYPE: <class 'itertools.islice'>
Calling next: f
Calling next: g
Calling next: h
Calling next: i
Calling next: j

Taking 2 steps at a time:

TYPE: <class 'itertools.islice'>
Calling next: a
Calling next: c
Calling next: e
Calling next: g


In [15]:
print("First 5 items of the dict:\n")

sliced_dict = islice(a_dict, 5)

print(f"TYPE: {type(sliced_dict)}")

for item in sliced_dict:
    print(item, end=' ')

print("\n\nFrom 5th index to 9th index:\n")

sliced_dict = islice(a_dict, 5, 10)

print(f"TYPE: {type(sliced_dict)}")

for item in sliced_dict:
    print(item, end=' ')

print("\n\nTaking 2 steps at a time:\n")

sliced_dict = islice(a_dict, 0, 11, 2)

print(f"TYPE: {type(sliced_dict)}")

for item in sliced_dict:
    print(item, end=' ')

First 5 items of the dict:

TYPE: <class 'itertools.islice'>
a b c d e 

From 5th index to 9th index:

TYPE: <class 'itertools.islice'>
f g h i j 

Taking 2 steps at a time:

TYPE: <class 'itertools.islice'>
a c e g i 
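
As a further sketch, `islice()` also works on iterators that cannot be sliced with `[:]`, such as an infinite generator. The generator `naturals` below is a hypothetical example, not part of itertools.

```python
from itertools import islice


def naturals():
    """Infinite generator of 0, 1, 2, ... (does not support [:] slicing)."""
    n = 0
    while True:
        yield n
        n += 1


numbers = naturals()
first_five = list(islice(numbers, 5))       # stop only
next_three = list(islice(numbers, 3))       # continues where the last slice stopped
evens = list(islice(naturals(), 0, 10, 2))  # start, stop, step on a fresh generator

print(first_five)  # [0, 1, 2, 3, 4]
print(next_three)  # [5, 6, 7]
print(evens)       # [0, 2, 4, 6, 8]
```

Note that slicing an iterator consumes it: the second `islice()` call on `numbers` picks up at 5 rather than starting again from 0.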

## tee()

- Returns several __independent__ iterators __(defaults to 2)__ based on a single original input.

__Use case:__

- The iterators returned by tee() can be used to feed the same set of data into multiple algorithms to be processed in parallel.

In [16]:
a_string = "Python"

tee1, tee2, tee3 = tee(a_string, 3)
print(f"t1{tee1}, t2{tee2} and t3{tee3}")


print(f"TYPE: {type(a_string)}")

print(f"Calling next: {next(tee1)}")
print(f"Calling next: {next(tee1)}")
print(f"Calling next: {next(tee1)}")
print(f"Calling next: {next(tee1)}")

print(f"Calling next: {next(tee2)}")
print(f"Calling next: {next(tee2)}")
print(f"Calling next: {next(tee2)}")
print(f"Calling next: {next(tee2)}")

print("\nUsing the same tee1 iterator in loop now: ", end='')
for char in tee1:
    print(char, end=' ')
    
print("\nUsing the same tee2 iterator in loop now: ", end='')
for char in tee2:
    print(char, end=' ')
    
print("\nUsing the tee3 iterator in loop now: ", end='')
for char in tee3:
    print(char, end=' ')

TYPE: <class 'str'>
Calling next: P
Calling next: y
Calling next: t
Calling next: h
Calling next: P
Calling next: y
Calling next: t
Calling next: h

Using the same tee1 iterator in loop now: o n 
Using the same tee2 iterator in loop now: o n 
Using the tee3 iterator in loop now: P y t h o n 
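
One common use of `tee()` is walking over an iterable in overlapping pairs. The helper below is a minimal sketch of that idea; Python 3.10 and later ship an equivalent `itertools.pairwise()`.

```python
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs: (s0, s1), (s1, s2), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)


pairs = list(pairwise("Python"))
print(pairs)
# [('P', 'y'), ('y', 't'), ('t', 'h'), ('h', 'o'), ('o', 'n')]
```

Once `tee()` has been called, the original iterator should not be used anywhere else, otherwise the copies can miss items.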

# Converting Inputs

## map()

- Returns an iterator that calls a function on the values from the input iterables and yields the results.
- Like zip(), map() is a built-in function and not part of itertools.
- It stops when any input iterable is exhausted.

__Use case:__

- Doing some operation on all the elements of an iterable.

In [None]:
def times_two(x):
    return 2 * x


def multiply(x, y):
    return (x, y, x * y)

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numbers_reverse = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

doubles = map(times_two, numbers)

print(f"TYPE: {type(doubles)}")

print(f"Calling next: {next(doubles)}")
print(f"Calling next: {next(doubles)}")
print(f"Calling next: {next(doubles)}")
print(f"Calling next: {next(doubles)}")

multiplied_iterables = map(multiply, numbers, numbers_reverse)

print(f"\nTYPE: {type(multiplied_iterables)}")

for item in multiplied_iterables:
    print(f"{item[0]} * {item[1]} = {item[2]}")
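
The stopping rule can be checked with a short sketch using inputs of different lengths. The function `add` is a hypothetical helper used only for illustration.

```python
def add(x, y):
    """Return the sum of two values."""
    return x + y


# The second input has only two items, so map() stops after two calls.
sums = list(map(add, [1, 2, 3, 4], [10, 20]))
print(sums)  # [11, 22]
```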

## starmap()

- The functionality is the same as map().
- But instead of taking one argument from each of several iterables, it takes a single iterable of tuples and unpacks each tuple as the arguments to the mapping function, using the * syntax.

- Read more __[here](https://docs.python.org/2/library/itertools.html#itertools.starmap)__
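
A minimal sketch of the difference, assuming the arguments are already grouped into tuples. The list `pairs` is made-up example data.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]  # each tuple holds (base, exponent)

# With map() the function receives the whole tuple and must unpack it itself.
powers_with_map = list(map(lambda pair: pow(pair[0], pair[1]), pairs))

# With starmap() each tuple is unpacked into arguments: pow(2, 5), pow(3, 2), ...
powers_with_starmap = list(starmap(pow, pairs))

print(powers_with_map)      # [32, 9, 1000]
print(powers_with_starmap)  # [32, 9, 1000]
```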

# Producing New Values

## count()

- Returns an iterator that produces consecutive integers, indefinitely.
- The first number can be passed as an argument (the default is zero).
- There is __no upper bound__ argument.

__Use case:__

- To attach an index to an iterable that has none.
- To count items in an iterable.

In [4]:
counter = count()

print(f"TYPE: {type(counter)}")

print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")

print("\nStart with 5 instead of 0\n")

counter = count(5)
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")

print("\nStart with 5 instead of 0 and take 5 steps\n")

counter = count(5, 5)  # steps can be any numeric value including decimals and negatives.
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")
print(f"Calling next: {next(counter)}")


print("\nIterating over a counter\n")

counter = count()

for item in counter:
    if item > 100:
        break  # without this break the loop would never end, because count() is infinite.
    print(item, end=' ')

print('\n')

TYPE: <class 'itertools.count'>
Calling next: 0
Calling next: 1
Calling next: 2
Calling next: 3

Start with 5 instead of 0

Calling next: 5
Calling next: 6
Calling next: 7
Calling next: 8
Calling next: 9

Start with 5 instead of 0 and take 5 steps

Calling next: 5
Calling next: 10
Calling next: 15
Calling next: 20
Calling next: 25

Iterating over a counter

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 
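
The indexing use case can be sketched by zipping `count()` with an unindexed stream. The generator `events` is a hypothetical data source.

```python
from itertools import count


def events():
    """Hypothetical unindexed stream of values."""
    yield from ("start", "load", "stop")


# zip() stops when events() is exhausted, so the infinite counter is safe here.
numbered = list(zip(count(1), events()))
print(numbered)  # [(1, 'start'), (2, 'load'), (3, 'stop')]

# Unlike enumerate(), count() also accepts a step, including non-integer ones.
timestamps = list(zip(count(0.0, 0.5), events()))
print(timestamps)  # [(0.0, 'start'), (0.5, 'load'), (1.0, 'stop')]
```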



## cycle()

- Returns an iterator that repeats the contents of the iterable it is given, indefinitely.
- Since it has to remember the entire contents of the input iterator, __it may consume quite a bit of memory if the iterator is long.__

In [6]:
for i in zip(range(7), cycle(['a', 'b', 'c'])):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'a')
(4, 'b')
(5, 'c')
(6, 'a')


## repeat()

- The iterator returned by repeat() keeps returning data __forever.__
- unless the optional __times argument__ is provided to limit it.

In [10]:
for i in repeat([1,2,3], 5):
    print(i)
    
for i in map(lambda x, y: (x, y, x * y), repeat(2), range(5)):
    print('{:d} * {:d} = {:d}'.format(i[0], i[1], i[2]))

[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8


# Filtering

## dropwhile()

- Returns an iterator that produces elements of the input iterator after a condition becomes false for the first time.
- In simple words: skip elements while the condition is true, then take everything after that.
- After the condition is false the first time, all of the remaining items in the input are returned.

In [11]:
def should_drop(x):
    return x < 1


for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print(i, end=' ')

1 2 -2 
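
A typical application, sketched here with made-up data, is skipping a block of header lines at the start of some input. The list `lines` is hypothetical.

```python
from itertools import dropwhile

lines = ["# generated file", "# do not edit", "alpha", "# not a header", "beta"]

# Drop leading comment lines only; later lines are passed through untested.
body = list(dropwhile(lambda line: line.startswith("#"), lines))
print(body)  # ['alpha', '# not a header', 'beta']
```

The comment that appears after the first real line is kept, because the condition is no longer checked once it has been false.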

## takewhile()

- The opposite of dropwhile().
- It returns an iterator that returns items from the input iterator as long as the test function returns true.
- As soon as the test function returns False, takewhile() stops processing the input.

In [12]:
def should_take(x):
    return x < 1


for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print(i, end=' ')

-1 0 

## filter()

- Returns an iterator that includes only items for which the test function returns true.
- Like zip(), filter() is a built-in function and not part of itertools.
- filter() is different from dropwhile() and takewhile() in that every item is tested before it is returned.

In [13]:
def check_item(x):
    return x < 1


for i in filter(check_item, [-1, 0, 1, 2, -2]):
    print(i, end=' ')

-1 0 -2 

## filterfalse()

- Returns an iterator that includes only items where the test function returns false.
- The opposite of filter().

In [14]:
def check_item(x):
    return x < 1


for i in filterfalse(check_item, [-1, 0, 1, 2, -2]):
    print(i, end=' ')

1 2 

## compress()

- Offers another way to filter the contents of an iterable.
- Instead of calling a function, it uses the values in another iterable to indicate when to accept a value and when to ignore it.

In [18]:
every_third = cycle([False, False, True])
data = range(1, 10)

for i in compress(data, every_third):
    print(i, end=' ')
print()

3 6 9 


# Grouping Data

## groupby()

- Returns an iterator that produces groups of values organized by a common key.
- groupby() only groups consecutive items that share a key, so the input sequence __needs to be sorted on the key value__ in order for the groupings to work out as expected.

In [19]:
def key_to_use(item_tuple):
    return item_tuple[0]

things = [("animal", "bear"), ("animal", "duck"), 
          ("plant", "cactus"), ("vehicle", "speed boat"), 
          ("vehicle", "school bus")]

for key, group in groupby(things, key_to_use):
    print(f"{key}:")
    for thing in group:
        print(f"{thing[1]}")
    print()

animal:
bear
duck

plant:
cactus

vehicle:
speed boat
school bus
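
The sorting requirement is easiest to see with unsorted input. This is a minimal sketch; the list `words` and the key function `first_letter` are hypothetical.

```python
from itertools import groupby

words = ["apple", "bob", "avocado", "banana"]


def first_letter(word):
    """Key function: group words by their first character."""
    return word[0]


# Unsorted input: a new group starts every time the key changes.
unsorted_keys = [key for key, _ in groupby(words, first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'b']

# Sorting on the same key first gives one group per key.
sorted_words = sorted(words, key=first_letter)
sorted_groups = {key: list(group) for key, group in groupby(sorted_words, first_letter)}
print(sorted_groups)  # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana']}
```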



# Combining Inputs

## accumulate()

- Passes the running result and the next element of the input to the given function, and yields each intermediate result.
- The default function is addition.
- When used with a sequence of non-integer values, the results depend on what it means to “add” two items together.

In [22]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7]
characters = ['a', 'b', 'c', 'd', 'e']

for num in accumulate(numbers):
    print(num, end=' ')
    
print()
    
for string in accumulate(characters):
    print(string, end=' ')

0 1 3 6 10 15 21 28 
a ab abc abcd abcde 
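
Passing a different two-argument function changes what is accumulated. The sketch below uses the built-in `max` and `operator.mul`; the input values are arbitrary.

```python
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5, 9, 2]

# Running maximum: each result is max(previous result, next element).
running_max = list(accumulate(values, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# Running product: each result is previous result * next element.
running_product = list(accumulate([1, 2, 3, 4, 5], operator.mul))
print(running_product)  # [1, 2, 6, 24, 120]
```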

## product()

- Produces a single iterator whose values are the Cartesian product of the input iterables.
- To compute the product of a sequence with itself, specify how many times the input should be repeated with the __repeat__ argument.

In [23]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7]
characters = ['a', 'b', 'c', 'd', 'e']

for num in product(numbers, characters):
    print(num, end=' ')

print("\n\nproduct with self\n")    

for num in product(numbers, repeat=2):
    print(num, end=' ')

(0, 'a') (0, 'b') (0, 'c') (0, 'd') (0, 'e') (1, 'a') (1, 'b') (1, 'c') (1, 'd') (1, 'e') (2, 'a') (2, 'b') (2, 'c') (2, 'd') (2, 'e') (3, 'a') (3, 'b') (3, 'c') (3, 'd') (3, 'e') (4, 'a') (4, 'b') (4, 'c') (4, 'd') (4, 'e') (5, 'a') (5, 'b') (5, 'c') (5, 'd') (5, 'e') (6, 'a') (6, 'b') (6, 'c') (6, 'd') (6, 'e') (7, 'a') (7, 'b') (7, 'c') (7, 'd') (7, 'e') 

product with self

(0, 0) (0, 1) (0, 2) (0, 3) (0, 4) (0, 5) (0, 6) (0, 7) (1, 0) (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (1, 7) (2, 0) (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6) (2, 7) (3, 0) (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6) (3, 7) (4, 0) (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6) (4, 7) (5, 0) (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6) (5, 7) (6, 0) (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) (6, 7) (7, 0) (7, 1) (7, 2) (7, 3) (7, 4) (7, 5) (7, 6) (7, 7) 
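
One way to read `product()` is as a flat replacement for nested for loops. The following sketch checks that both forms give the same tuples in the same order.

```python
from itertools import product

# Nested loops: the rightmost loop advances fastest.
nested = []
for x in range(2):
    for y in "ab":
        nested.append((x, y))

flat = list(product(range(2), "ab"))
print(flat == nested)  # True

# repeat=3 behaves like three nested loops over the same input.
bits = list(product([0, 1], repeat=3))
print(len(bits))  # 8
print(bits[:3])   # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```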

## permutations()

- Same as mathematical permutation.
- Produces items from the input iterable combined in the possible permutations of the given length.
- It defaults to producing the full set of all permutations.
- The order of items matters, i.e. __('a', 'b') and ('b', 'a') are different__

In [27]:
print('All permutations:\n')
print(list(permutations('abc')))

print("\nLimit length to 2:\n")
print(list(permutations('abc', r=2)))

All permutations:

[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]

Limit length to 2:

[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]


## combinations()

- As long as the members of the input are unique, the output will not include any repeated values.
- Unlike with permutations, the r argument to combinations() is required.
- Order does not matter, i.e. __('a', 'b') and ('b', 'a') are the same__

In [29]:
print('Unique pairs:\n')
print(list(combinations('abc', r=2)))

Unique pairs:

[('a', 'b'), ('a', 'c'), ('b', 'c')]


## combinations_with_replacement()

- Sometimes it is useful to consider combinations that include repeated elements.
- For those cases, use combinations_with_replacement().

In [31]:
print('Unique pairs:\n')
print(list(combinations_with_replacement('abc', r=2)))

Unique pairs:

[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
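
As a sanity check, the number of results from the three functions can be compared with the standard counting formulas. This sketch assumes Python 3.8 or later for `math.comb` and `math.perm`; the input `items` is arbitrary.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, perm  # available from Python 3.8

items = "abcd"
n, r = len(items), 2

n_permutations = len(list(permutations(items, r)))
n_combinations = len(list(combinations(items, r)))
n_with_replacement = len(list(combinations_with_replacement(items, r)))

print(n_permutations, perm(n, r))              # 12 12
print(n_combinations, comb(n, r))              # 6 6
print(n_with_replacement, comb(n + r - 1, r))  # 10 10
```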
