# Advanced Python Course 
## PreDoc Course EBI 2019
### by Christian Fufezan 

christian@fufezan.net

https://fufezan.net


<img src="https://octodex.github.com/images/mummytocat.gif" width="200" height="200" style="float: right;"/>

# Useful modules that are part of standard Python 

# Pretty print

In [4]:
from pprint import pprint

In [31]:
import numpy as np
room_quality = {city : {room_type : {measurement :  [x for x in np.random.normal(
    loc=np.random.random(), scale=1, size=3)] for measurement in ['air', 'light', 'smell']}
        for room_type in ['Kitchen', 'Bedroom', 'Toilet'] if city + room_type != "HeidelbergBedroom"
    } for city in ['Heidelberg', 'Paris', 'New York']}



<img src="./imgs/sure_thing.jpg" width="400" height="400" style="float: right;"/>

In [30]:
import numpy as np
room_quality = {
    city : {
        room_type : {
            measurement :  [x for x in np.random.normal(loc=np.random.random(), scale=1, size=3)]
            for measurement in ['air', 'light', 'smell'] 
        }
        for room_type in ['Kitchen', 'Bedroom', 'Toilet'] if city + room_type != "HeidelbergBedroom"
    }
    for city in ['Heidelberg', 'Paris', 'New York']
}


In [None]:
room_quality

In [None]:
from pprint import pprint
pprint(room_quality)
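For deeply nested structures like the one above, it can help to limit how much is shown. The following is a minimal sketch using `pformat`, which returns the formatted string instead of printing it. The `nested` dictionary and its values are made up for illustration.

```python
from pprint import pformat

# Made-up miniature version of a nested dictionary
nested = {"Paris": {"Kitchen": {"air": [0.1, 0.2, 0.3]}}}

# depth limits how many nesting levels are shown; deeper levels become "..."
shallow_text = pformat(nested, depth=2)
print(shallow_text)

# width sets the maximum line length the formatter aims for
narrow_text = pformat(nested, width=30)
print(narrow_text)
```

`pprint` accepts the same `depth` and `width` keywords, so the same options work when printing directly.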

# The csv module

There are several ways to interact with files that contain data in a "comma separated value" format.

We cover the [basic csv module](https://docs.python.org/3/library/csv.html), as it is sometimes really helpful to retain only a fraction of the information in a csv file to avoid running out of memory. If you read your multi-GB csv file with pandas using the default settings, everything is put into a data frame first, just to have df.drop(columns[ ... ]) applied afterwards.

In [None]:
import csv

with open("../data/amino_acid_properties.csv", newline="") as aap:  # newline="" is recommended by the csv docs
    aap_reader = csv.DictReader(aap, delimiter=",") 
    for line_dict in aap_reader:
        print(line_dict)
        break
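The memory argument can be sketched as follows: since the reader yields one row at a time, only the wanted columns need to be kept. This is a minimal sketch, not the course's own solution. The in-memory `fake_csv`, its column names and its values are assumptions that stand in for a large file.

```python
import csv
import io

# Hypothetical in-memory stand-in for a large csv file;
# column names and values are made up for illustration.
fake_csv = io.StringIO(
    "Name,3-letter code,1-letter code,comment\n"
    "Alanine,Ala,A,small\n"
    "Glycine,Gly,G,smallest\n"
)

wanted_columns = ["Name", "1-letter code"]
slim_rows = []
for row in csv.DictReader(fake_csv):
    # Keep only the wanted fields; the rest of the row is discarded
    # before the next line is read.
    slim_rows.append({key: row[key] for key in wanted_columns})

print(slim_rows)
```

With a real file, `fake_csv` would be replaced by the file handle from `open(...)`.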

We can also use the csv module to write csv files, or tab-separated value files if we change the delimiter to "\t".

In [None]:
with open("../data/test.csv", "w", newline="") as output:  # newline="" avoids extra blank lines on some platforms
    aap_writer = csv.DictWriter(output, fieldnames=["Name", "3-letter code"])
    aap_writer.writeheader()
    aap_writer.writerow({"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"})

# What do you expect to happen?

In [None]:
!cat ../data/test.csv

How to fix it?

In [None]:
# fix it
with open("../data/test.csv", "w", newline="") as output:
    aap_writer = csv.DictWriter(output, fieldnames=["Name", "3-letter code"], extrasaction='ignore')
    aap_writer.writeheader()
    aap_writer.writerow({"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"})
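The tab-separated variant mentioned earlier only needs a different `delimiter`. Here is a minimal sketch that writes into an `io.StringIO` buffer, an assumption made so the example does not touch the file system.

```python
import csv
import io

# StringIO stands in for a file opened with open(..., "w", newline="")
buffer = io.StringIO()
tsv_writer = csv.DictWriter(
    buffer,
    fieldnames=["Name", "3-letter code"],
    delimiter="\t",          # tab instead of comma
    extrasaction="ignore",   # silently skip keys that are not in fieldnames
)
tsv_writer.writeheader()
tsv_writer.writerow({"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"})

tsv_text = buffer.getvalue()
print(tsv_text)
```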

# Collections - high performance containers ... sorta

# [collections.Counter](https://docs.python.org/3.7/library/collections.html#counter-objects)
A counter tool is provided to support convenient and rapid tallies. For example:

In [None]:
from collections import Counter
s = """
MQRLMMLLATSGACLGLLAVAAVAAAGANPAQRDTHSLLPTHRRQKRDWIWNQMHIDEEK
NTSLPHHVGKIKSSVSRKNAKYLLKGEYVGKVFRVDAETGDVFAIERLDRENISEYHLTA
VIVDKDTGENLETPSSFTIKVHDVNDNWPVFTHRLFNASVPESSAVGTSVISVTAVDADD
PTVGDHASVMYQILKGKEYFAIDNSGRIITITKSLDREKQARYEIVVEARDAQGLRGDSG
TATVLVTLQDINDNFPFFTQTKYTFVVPEDTRVGTSVGSLFVEDPDEPQNRMTKYSILRG
DYQDAFTIETNPAHNEGIIKPMKPLDYEYIQQYSFIVEATDPTIDL RYMSPPAGNRAQVI
"""
Counter(s)

In [None]:
# Counter objects can be added together
Counter("AABB") + Counter("BBCC")

In [None]:
# Works with any type of object that is hashable
Counter([(1, 1), (1, 2), (2, 1), (1, 1)])
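Two details are worth a quick sketch: a `Counter` over a multi-line string also tallies the newline characters, and `most_common` returns the highest counts first. The short `raw` string is made up for illustration.

```python
from collections import Counter

# Made-up two-line "sequence" with surrounding newlines
raw = """
AAB
BBC
"""

counts_raw = Counter(raw)  # includes the "\n" characters
counts_clean = Counter(char for char in raw if char.isalpha())  # letters only

top_two = counts_clean.most_common(2)
print(counts_raw["\n"], top_two)
```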

# [collections.deque](https://docs.python.org/3.7/library/collections.html#deque-objects)
A deque (pronounced "deck"), or double-ended queue, can be used for many tasks, e.g. building a sliding window.

In [None]:
from collections import deque
s = """MQRLMMLLATSGACLGLLAVAAVAAAGANPAQRDTHSLLPTHRRQKRDWIWNQMHIDEEKNTSLPHHVGKIKSSVSRKNAKYLLKGEYVGKVFRVDAETGDVFAIERLDRENISEYHLTA"""
window = deque([], maxlen=5)

In [None]:
for pos, aa in enumerate(s):
    window.append(aa)
    print(window)
    if pos > 7:
        break

In [None]:
Counter(window)
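The sliding window idea can be wrapped into a small generator. This is a sketch under the assumption that the input consists of single-character strings; the helper name `sliding_windows` is hypothetical.

```python
from collections import deque


def sliding_windows(sequence, size):
    """Yield every full window of `size` consecutive items as a string."""
    window = deque(maxlen=size)  # the oldest item drops out automatically
    for item in sequence:
        window.append(item)
        if len(window) == size:  # skip the partially filled start
            yield "".join(window)


windows = list(sliding_windows("MQRLMML", 5))
print(windows)
```

Because of `maxlen`, no manual removal of the oldest element is needed.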

# [collections.defaultdict](https://docs.python.org/3.7/library/collections.html#defaultdict-objects)
Defaultdicts are like dicts, yet they do not raise an error for missing keys. Testing whether a key exists is therefore not necessary, which makes life easier :) Of course, one needs to define the default value that is used if a key does not exist. 

I use it a lot for counting 
```python
counter["error"] += 1
```
or collecting elements in lists
```python
sorter["type_A"].append({"name": "John"})
```

In [32]:
from collections import defaultdict

ddict_int = defaultdict(int)
#                        ^---- default factory
ddict_list = defaultdict(list)

In [None]:
ddict_int[10] += 10
ddict_int

In [None]:
ddict_int[0]

In [35]:
def default_factory_with_prefilled_dictionary():
    return {"__name": "our custom dict", "errors": 0}
ddict_custom = defaultdict(default_factory_with_prefilled_dictionary)


In [37]:
# Raises a TypeError: the default value is a dict, which cannot be incremented by an int
ddict_custom[10] += 10

In [None]:
ddict_custom[10]['errors'] += 10

In [None]:
ddict_custom[10]
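The two usage patterns, counting and collecting, can be put together in a small sketch. The `records` list is made up for illustration. Note that merely reading a missing key already creates it, while `.get()` does not.

```python
from collections import defaultdict

# Made-up records of (type, name)
records = [("type_A", "John"), ("type_B", "Jane"), ("type_A", "Joe")]

counter = defaultdict(int)
sorter = defaultdict(list)
for record_type, name in records:
    counter[record_type] += 1                   # missing keys start at int() == 0
    sorter[record_type].append({"name": name})  # missing keys start at list() == []

# Reading a missing key inserts it with the default value
_ = counter["never_seen"]

# .get() does not call the default factory and inserts nothing
missing = counter.get("also_never_seen")

print(dict(counter))
```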

# bisect module

The bisect module finds the position in a **sorted list** at which a given element can be inserted without losing the sort order.

It is an essential building block for binary searches and similar techniques.

In [39]:
import bisect
a = [1, 3, 6, 12, 14, 16]
bisect.bisect(a, 4)

2

In [40]:
bisect.bisect_right(a, 3)

2

In [41]:
bisect.bisect_left(a, 3)

1
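Building on the calls above, here is a minimal sketch of two common uses: `bisect.insort` inserts while keeping the list sorted, and `bisect_left` can serve as a membership test. The helper name `contains` is hypothetical.

```python
import bisect

a = [1, 3, 6, 12, 14, 16]

# Insert 4 at the position that keeps the list sorted
bisect.insort(a, 4)


def contains(sorted_list, value):
    """Binary search membership test on an already sorted list."""
    index = bisect.bisect_left(sorted_list, value)
    return index < len(sorted_list) and sorted_list[index] == value


print(a, contains(a, 12), contains(a, 5))
```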

In [42]:
# also works with more complex lists, as long as the elements are comparable
a = [(1, "First"), (3, "Third"), (6, "too late")]
bisect.bisect(a, (3, "Really add some more stuff"))

1

Why 1 and not 2?
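One way to explore the answer is to look at how Python compares tuples: element by element, from left to right. The sketch below reuses the list from the cell above; the variable names are only for illustration.

```python
import bisect

a = [(1, "First"), (3, "Third"), (6, "too late")]
new_item = (3, "Really add some more stuff")

# The first elements tie (3 == 3), so the second elements decide
first_elements_tie = new_item[0] == a[1][0]
strings_decide = new_item[1] < a[1][1]  # "R" sorts before "T"

# Hence the new tuple sorts before (3, "Third") ...
sorts_before_third = new_item < a[1]

# ... and its insertion point is index 1, in front of (3, "Third")
position = bisect.bisect(a, new_item)
print(first_elements_tie, strings_decide, sorts_before_third, position)
```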

# Iteration helpers

How would you code a loop that generates the following out of the iterable \['A', 'B', 'C'\]?
* AB, AC, BC
* AA, AB, AC, BA, BB, BC, CA, CB, CC



In [None]:
a = ['A', 'B', 'C']
for ...

# [itertools](https://docs.python.org/3.7/library/itertools.html)

Python iterator helpers for efficient and **readable** looping.

In [None]:
from itertools import product

list(product(a, repeat=2))

In [None]:
from itertools import combinations

list(combinations(a, 2))
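For comparison, here is one possible hand-written version next to the itertools calls. This is a sketch of one way to write the loops, not the only one; the variable names are made up for illustration.

```python
from itertools import combinations, product

a = ["A", "B", "C"]

# AB, AC, BC: pair each element only with the elements after it
pairs_by_hand = []
for i, first in enumerate(a):
    for second in a[i + 1:]:
        pairs_by_hand.append(first + second)

# AA, AB, ..., CC: pair each element with every element
products_by_hand = [first + second for first in a for second in a]

# The same results with itertools
pairs = ["".join(pair) for pair in combinations(a, 2)]
products = ["".join(pair) for pair in product(a, repeat=2)]

print(pairs)
print(products)
```

The itertools versions state the intent in the function name, which is where the readability gain comes from.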