# Day 3: Useful stuff

## Itertools and Functools



## Itertools

On day 1, we talked about iterables and iterators. As a reminder:
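- An iterable is an object that can be iterated over (e.g. a list); it has an `__iter__` method that returns an iterator.
- An iterator is an object that hands out the items one at a time (e.g. what `iter([1, 2, 3])` returns); it has both an `__iter__` and a `__next__` method. A list itself is iterable, but it is not an iterator.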

A bit more information about them from the [Python documentation](https://docs.python.org/3/glossary.html#term-iterable):

- **iterable**
    An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics.

    Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.

- **iterator**
    An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
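To make the difference concrete, here is a minimal sketch (the variable names are only for illustration): a list hands out a fresh iterator on every pass, while an iterator can be consumed only once.

```python
numbers_list = [1, 2, 3]

# A list is an iterable: every call to iter() gives a fresh iterator.
first_pass = list(iter(numbers_list))
second_pass = list(iter(numbers_list))
print(first_pass, second_pass)

# An iterator returns itself from __iter__ ...
iterator = iter(numbers_list)
returns_itself = iter(iterator) is iterator
print(returns_itself)

# ... and is exhausted after a single pass.
consumed = list(iterator)
leftover = list(iterator)  # nothing left, so this is an empty list
print(consumed, leftover)

# Calling next() on an exhausted iterator raises StopIteration.
try:
    next(iterator)
except StopIteration:
    print("iterator is exhausted")
```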

`itertools` is a module that provides a set of functions that operate on iterables. It is part of the standard library, so you don't need to install it. Its functions build more complex and powerful iterators out of simpler ones and produce their items lazily, one at a time. This makes it possible to combine processing steps without creating intermediate lists, which usually lowers memory use and can also make the code faster.
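As a rough illustration of the memory point, the sketch below compares concatenating two lists with chaining them lazily. The list sizes and variable names are made up for this example, and `sys.getsizeof` only reports the size of the container object itself, not of the items it refers to.

```python
import itertools
import sys

first = list(range(100_000))
second = list(range(100_000))

# Concatenating builds a third list that holds all 200,000 references.
combined_list = first + second

# chain() only keeps references to its inputs and yields items on demand.
combined_lazy = itertools.chain(first, second)

list_is_bigger = sys.getsizeof(combined_list) > sys.getsizeof(combined_lazy)
print(list_is_bigger)

# Both give the same items, so aggregations such as sum() agree.
same_total = sum(combined_lazy) == sum(combined_list)
print(same_total)
```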

Why would you want to do that? Well, think about some of the functions like `enumerate()` or `zip()` that we have already seen. They take iterables/iterators as an argument and return more powerful constructs. Itertools provides a lot of other functions that do similar things, but in a more general way.

First let's look at a short glossary of useful functions in itertools (from [here](https://jmduke.com/2013/11/29/itertools)) and then we'll look at how we can use them in the wild.

In [15]:
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]

In [16]:
#chain() - combines several iterables into one long one
print(itertools.chain(letters, booleans, decimals))
print(list(itertools.chain(letters, booleans, decimals)))

#chain.from_iterable() - flattens one iterable of iterables
print(list(itertools.chain.from_iterable([letters, booleans, decimals])))

<itertools.chain object at 0x107ca6e20>
['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]
['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]


In [17]:
#count() - infinite iterator, returns evenly spaced values starting with the number you pass in
counter = itertools.count(10, 0.1)
print(next(counter))
print(next(counter))

10
10.1


In [22]:
#repeat() - infinite iterator, returns the element you pass in over and over again (can be made finite with times argument)
repeater = itertools.repeat('On', 3)
print(next(repeater))
print(next(repeater))
print(next(repeater))
# print(next(repeater))

On
On
On


In [None]:
#cycle() - infinite iterator, returns elements from the iterable you pass in over and over again
cycle_counter = itertools.cycle(['On', 'Off'])
print(next(cycle_counter))
print(next(cycle_counter))
print(next(cycle_counter))

In [18]:
#compress() - filters one iterable with another
print(list(itertools.compress(letters, booleans)))


['a', 'c', 'f']


In [7]:
#dropwhile() - skips elements while the condition is true, then returns every remaining element, starting with the first one for which the condition is false
print(list(itertools.dropwhile(lambda x: x<5, [1, 4, 6, 4, 1])))
print(list(itertools.dropwhile(lambda x: x<5, [1, 4, 0, 0, 1])))


[6, 4, 1]
[]


In [8]:
#filterfalse() - returns an iterator that returns elements of the iterable for which the passed in function returns false
print(list(itertools.filterfalse(lambda x: x%2, range(10))))

[0, 2, 4, 6, 8]


In [9]:
#zip_longest() - returns an iterator that aggregates elements from two or more iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted
print(list(itertools.zip_longest('abcdefg', range(3), fillvalue='?')))

[('a', 0), ('b', 1), ('c', 2), ('d', '?'), ('e', '?'), ('f', '?'), ('g', '?')]


In [12]:
#islice() - returns an iterator that returns selected elements from the iterable. It takes three arguments: start, stop, and step
print(list(itertools.islice('abcdefg', 0, None, 2)))

['a', 'c', 'e', 'g']


In [23]:
#starmap() - returns an iterator that computes the function using arguments obtained from the iterable
print(list(itertools.starmap(pow, [(2,5), (3,2), (10,3)])))

[32, 9, 1000]


In [29]:
#tee() - returns several independent iterators (defaults to 2) based on a single original input
it = itertools.tee(range(5), 3)
print(it)
print(list(it[0]))
print(list(it[1]))
print(list(it[2]))

(<itertools._tee object at 0x10d08b1c0>, <itertools._tee object at 0x10d087940>, <itertools._tee object at 0x107b9b9c0>)
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


It is also interesting to note that some functions that originally lived in itertools have become built-in functions in Python 3. In Python 2, the built-in `zip()`, `map()` and `filter()` returned lists, which made them slower and less memory efficient for large inputs, and itertools offered lazy counterparts called `izip()`, `imap()` and `ifilter()`. In Python 3 the built-ins themselves return iterators, so the itertools versions were removed. You can find more information about this [here](https://docs.python.org/3/library/itertools.html#itertools-recipes).
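The lazy behaviour of the Python 3 built-ins is easy to check. A small sketch (the variable names are chosen for this example only):

```python
pairs = zip("abc", [1, 2, 3])

# zip() returns a lazy iterator, not a list.
is_list = isinstance(pairs, list)
is_own_iterator = iter(pairs) is pairs
print(is_list, is_own_iterator)

# Like any iterator, it can be consumed only once.
first_pass = list(pairs)
second_pass = list(pairs)
print(first_pass)
print(second_pass)

# map() and filter() behave the same way.
doubled = map(lambda x: 2 * x, [1, 2, 3])
evens = filter(lambda x: x % 2 == 0, range(6))
print(list(doubled), list(evens))
```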

Now let's look at some [recipes](https://docs.python.org/3/library/itertools.html#itertools-recipes) from the itertools documentation. All of these and more are implemented in the `more_itertools` library, which you can [install](https://more-itertools.readthedocs.io/en/latest/index.html) with `pip install more_itertools`.

In [14]:
def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(itertools.islice(it, n)):
        yield batch

print(list(batched('ABCDEFGHIJKLMNOP', 3)))

[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I'), ('J', 'K', 'L'), ('M', 'N', 'O'), ('P',)]


In [31]:
def repeatfunc(func, times=None, *args):
    """Repeat calls to func with specified arguments.

    Example:  repeatfunc(random.random)
    """
    if times is None:
        return itertools.starmap(func, itertools.repeat(args))
    return itertools.starmap(func, itertools.repeat(args, times))

print(list(repeatfunc(pow, 5, 2, 5)))

[32, 32, 32, 32, 32]


In [26]:
import collections

def sliding_window(iterable, n):
    # sliding_window('ABCDEFG', 4) --> ABCD BCDE CDEF DEFG
    it = iter(iterable)
    window = collections.deque(itertools.islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for x in it:
        window.append(x)
        yield tuple(window)

print(list(sliding_window('ABCDEFG', 4)))

[('A', 'B', 'C', 'D'), ('B', 'C', 'D', 'E'), ('C', 'D', 'E', 'F'), ('D', 'E', 'F', 'G')]


In [32]:
def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)

result = partition(lambda x: x%2, range(10))
print(result)
print(list(result[0]))
print(list(result[1]))

(<itertools.filterfalse object at 0x107da8f40>, <filter object at 0x107da8c40>)
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]


In [34]:
set([1,2,4,1,2,3])

{1, 2, 3, 4}

## Functools



## Context Managers

In [6]:
data = open("../data/assay_data.csv", 'r')
print(data)
print(data.readline())
print(data.readline())

#read data into a list
data_list = []
for line in data:
    data_list.append(line.strip().split(','))
print(data_list[0:5])

data.close()

<_io.TextIOWrapper name='../data/assay_data.csv' mode='r' encoding='UTF-8'>
time,type,activity

1,WT,20

[['1', 'Mut1', '10'], ['1', 'Mut2', '30'], ['2', 'WT', '30'], ['2', 'Mut1', '20'], ['2', 'Mut2', '10']]


In [7]:
with open("../data/assay_data.csv", 'r') as data:
    data_list = []
    for line in data:
        data_list.append(line.strip().split(','))
print(data_list[0:5])


[['time', 'type', 'activity'], ['1', 'WT', '20'], ['1', 'Mut1', '10'], ['1', 'Mut2', '30'], ['2', 'WT', '30']]


In [8]:
data = open("../data/assay_data.csv", 'r')
try:
    data_list = []
    for line in data:
        data_list.append(line.strip().split(','))
finally:
    data.close()

Internally, context managers use the `__enter__` and `__exit__` methods. The `__enter__` method is called when the context is entered and the `__exit__` method is called when the context is exited. The `__exit__` method is called even if an exception is raised in the context. This is useful for cleaning up resources even if an error occurs.
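A minimal sketch of this behaviour, using a hypothetical `Tracker` class that simply records which hooks were called. Note that `__exit__` still runs when the body raises, and that returning `False` lets the exception propagate.

```python
class Tracker:
    """Hypothetical context manager that records which hooks were called."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an error.
        reason = exc_type.__name__ if exc_type else "no error"
        self.events.append(f"exit ({reason})")
        return False  # do not swallow the exception


tracker = Tracker()
try:
    with tracker:
        tracker.events.append("body")
        raise ValueError("something went wrong")
except ValueError:
    tracker.events.append("handled outside")

print(tracker.events)
```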

In [9]:
#write a context manager that implements a timer that can be used with the with statement
import time

class Timer:
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start
        print(f"Elapsed time: {self.interval} seconds")

In [36]:
with Timer() as t:
    #perform 1 million calculations
    for i in range(1000000):
        i = i + 1
    with Timer() as t2:
        l = []
        for i in range(100000):
            l.append(i)
        for i in range(100000):
            l.pop()
    with Timer() as t3:
        l = []
        for i in range(100000):
            l.insert(0, i)
        for i in range(100000):
            l.pop(0)


Elapsed time: 0.013677120208740234 seconds
Elapsed time: 4.208178997039795 seconds
Elapsed time: 4.2911059856414795 seconds
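The standard library also provides `contextlib.contextmanager`, which builds a context manager from a generator function. Below is a sketch of the same timer idea in that style. The function name `timer`, the `result` dictionary and the use of `time.perf_counter()` are choices made for this example, not part of the class above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Hypothetical generator-based variant of a timing context manager."""
    result = {"label": label, "elapsed": None}
    start = time.perf_counter()
    try:
        yield result  # the body of the with block runs here
    finally:
        # Runs on normal exit and also when the body raises.
        result["elapsed"] = time.perf_counter() - start
        print(f"{label}: {result['elapsed']:.6f} seconds")


with timer("append/pop") as timing:
    items = []
    for i in range(1000):
        items.append(i)
    for i in range(1000):
        items.pop()
```

Everything before the `yield` plays the role of `__enter__`, and the `finally` block plays the role of `__exit__`.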


## Args and Kwargs

## Type Hints

## Building a project: Namespaces, Packages etc

## Data Structures in Python

### Dictionaries

In [24]:
#create a dictionary
d = {'a': 1, 'b': 2, 'c': 3}
x = 5
print(x.__hash__())
print(hash("a"))
# print(d.__hash__)
# print(hash(d))

5
2258155850630925145
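Dictionary keys must be hashable. Immutable built-ins such as strings, numbers and tuples of hashable items are hashable, whereas lists and dictionaries are not, so `hash()` raises a `TypeError` for them. A small sketch that checks a few candidate keys (the helper name `is_hashable` is made up for this example):

```python
def is_hashable(obj):
    """Return True if hash(obj) works, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


key_results = {}
for candidate in ("a", 5, (1, 2), [1, 2], {"a": 1}):
    key_results[type(candidate).__name__] = is_hashable(candidate)

print(key_results)

# Equal objects have equal hashes, which is what dictionary lookups rely on.
same_hash = hash((1, 2)) == hash((1, 2))
print(same_hash)
```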


In [30]:
#ordered dictionary; maintains order of keys as they are added (since Python 3.7 regular dicts preserve insertion order too)
from collections import OrderedDict
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
print(d)

#default dictionary; a missing key is created on first access with the value returned by the default factory (here 0)
from collections import defaultdict
d = defaultdict(lambda: 0)
d['a'] = 1
d['b'] = 2
print(d['c'])

#ChainMap - combines several dictionaries or mappings (updates are made to the first dictionary)
from collections import ChainMap
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d3 = {'e': 5, 'f': 6}
d = ChainMap(d1, d2, d3)
print(d['a'])
print(d['c'])
d['e'] = 10
print(d)

OrderedDict([('a', 1), ('b', 2), ('c', 3)])
0
1
3
ChainMap({'a': 1, 'b': 2, 'e': 10}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6})
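A typical use of `ChainMap` is layering settings. The sketch below uses made-up dictionaries (`defaults` and `overrides`) to show that lookups search the mappings from left to right, while writes only touch the first one.

```python
from collections import ChainMap

defaults = {"color": "blue", "size": "medium"}
overrides = {"color": "red"}

settings = ChainMap(overrides, defaults)

# Lookups search the maps from left to right, so the override wins.
print(settings["color"])
print(settings["size"])

# Writes only touch the first mapping; the later ones stay unchanged.
settings["size"] = "large"
print(overrides)
print(defaults)
print(settings["size"])
```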
