## Advanced Python Course 
## MoBi - University Heidelberg 2019
### by Christian Fufezan 

christian@fufezan.net

https://fufezan.net


<img src="https://octodex.github.com/images/Professortocat_v2.png" width="100" height="100" style="float: right;"/>

# Mission statement

I'd like to
* Teach you what I think is advanced Python
* Give you my view on how to code (and how not to code)
* Introduce you to 
    * code structure
    * code testing
    * code documentation
* Showcase some useful Python modules (that you may or may not know)
* \[ What you want to learn \]


## Project showcases

### pymzML

pymzML is an extension to Python that offers
* easy access to mass spectrometry (MS) data that allows the rapid development of tools
* a very fast parser for mzML data, the standard mass spectrometry data format
* a set of functions to compare and/or handle spectra
* random access in compressed files
* interactive data visualization

M Kösters, J Leufken, S Schulze, K Sugimoto, J Klein, R P Zahedi, M Hippler, S A Leidel, C Fufezan; pymzML v2.0: introducing a highly compressed and seekable gzip format, Bioinformatics, doi: https://doi.org/10.1093/bioinformatics/bty046

T Bald, J Barth, A Niehues, M Specht, M Hippler, C Fufezan; pymzML - Python module for high-throughput bioinformatics on mass spectrometry data, Bioinformatics, doi: https://doi.org/10.1093/bioinformatics/bts066 

|![](https://travis-ci.org/pymzml/pymzML.svg?branch=master)|![](https://ci.appveyor.com/api/projects/status/4nlw52a9qn22921d?svg=true)|![](https://readthedocs.org/projects/pymzml/badge/?version=latest)|![](https://codecov.io/gh/pymzml/pymzml/branch/master/graph/badge.svg)|![](https://img.shields.io/pypi/v/pymzML.svg)|![](https://pepy.tech/badge/pymzml)|![](https://img.shields.io/badge/code%20style-black-000000.svg)|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|[travis](https://travis-ci.org/pymzml/pymzML)|[appveyor](https://ci.appveyor.com/project/fufezan-lab/pymzml)|[rtd](http://pymzml.readthedocs.io/en/latest/?badge=latest)|[codecov](https://codecov.io/gh/pymzml/pymzml)|[pypi](https://pypi.org/project/pymzML/)|[pepy](https://pepy.tech/project/pymzml)|[black](https://github.com/psf/black)|


### pyQms

pyQms is an extension to Python that offers amongst other things
* fast and accurate quantification of all high-res LC-MS data
* full labeling and modification flexibility
* full platform independence


Leufken, J., Niehues, A., Hippler, M., Sarin, L. P., Hippler, M., Leidel, S. A., and Fufezan, C. (2017) pyQms enables universal and accurate quantification of mass spectrometry data. MCP, https://doi.org/10.1074/mcp.M117.068007

|![](https://travis-ci.org/pyQms/pyqms.svg?branch=master)|![](https://ci.appveyor.com/api/projects/status/n4x2ug7h3ce4d49y?svg=true)|![](https://readthedocs.org/projects/pyqms/badge/?version=latest)|![](https://img.shields.io/pypi/v/pyqms.svg)|![](https://img.shields.io/badge/code%20style-black-000000.svg)|
|:---:|:---:|:---:|:---:|:---:|
|[travis](https://travis-ci.org/pyQms/pyqms)|[appveyor](https://ci.appveyor.com/project/fufezan-lab/pyqms)|[rtd](http://pyqms.readthedocs.io/en/latest/?badge=latest)|[pypi](https://pypi.org/project/pyqms/)|[black](https://github.com/psf/black)|


### ursgal

ursgal - Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis
* Peptide spectrum matching with up to eight different search engines (some available in multiple versions), including three open modification search engines
* Evaluation and post processing of search results with up to two different engines
* Integration of search results from different search engines
* De novo sequencing with up to two different search engines
* Miscellaneous tools including the creation of a target decoy database as well as filtering, sanitizing and visualizing of results


Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S. and Fufezan, C.
(2015) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis, Journal of Proteome Research, 15, 788-.
DOI:10.1021/acs.jproteome.5b00860

|![](https://travis-ci.org/ursgal/ursgal.svg?branch=master)|![](https://ci.appveyor.com/api/projects/status/hel9rowah1u3rfe1?svg=true)|![](http://readthedocs.org/projects/ursgal/badge/?version=latest)|![](https://badge.fury.io/py/ursgal.svg)|
|:---:|:---:|:---:|:---:|
|[travis](https://travis-ci.org/ursgal/ursgal)|[appveyor](https://ci.appveyor.com/project/fufezan-lab/ursgal)|[rtd](http://ursgal.readthedocs.io/en/latest/?badge=latest)|[pypi](https://badge.fury.io/py/ursgal/)|


## a lasting name 
http://proteomicsnews.blogspot.com/2017/02/ursgal-combine-all-python-proteomics.html

<img style="right" src="./imgs/proteomics-blog.png">

In [1]:
# %load topics.py
import pandas as pd
import psutil

pd.set_option("display.max_colwidth", 300)

df_high_level = pd.DataFrame(
    data=[
        {'day': 'Monday', 'Topic': 'Check-In, recaps and functions'},
        {'day': 'Tuesday', 'Topic': 'Coding philosophy, data flow and some more useful std modules'},
        {'day': 'Wednesday', 'Topic': 'Test driven development, python module, sphinx'},
        {'day': 'Thursday', 'Topic': 'OOP - Object oriented programming'},
        {'day': 'Friday', 'Topic': 'Q&A and code clean up'},
        {'day': '', 'Topic': ''},
        {'day': 'Monday', 'Topic': ''},
        {'day': 'Tuesday', 'Topic': ''},
        {'day': 'Wednesday', 'Topic': ''},
        {'day': 'Thursday', 'Topic': ''},
        {'day': 'Friday', 'Topic': 'Q&A and Tutorium'},


    ]
)

df_details = pd.DataFrame(
    data=[
        {'day': 1, 'Topic': 'Check-in'},
        {'day': 1, 'Topic': 'Procedural stuff'},
        {'day': 1, 'Topic': "python basic in 5'"},
        {'day': 1, 'Topic': 'lists and generators'},
        {'day': 1, 'Topic': 'bisect module'},
        # ----------------------------
        {'day': 2, 'Topic': 'functions'},
        {'day': 2, 'Topic': 'csv module'},
        {'day': 2, 'Topic': 'Exercises'},
        {'day': 2, 'Topic': 'Zen of Python and general coding philosophy'},
        {'day': 2, 'Topic': 'basic plotting with plotly'},
        {'day': 2, 'Topic': "String format"},
        {'day': 2, 'Topic': 'dicts'},
        {'day': 2, 'Topic': 'collections module'},
        {'day': 2, 'Topic': 'itertools'},
        {'day': 2, 'Topic': 'data flow'},
        # -----------------------------
        {'day': 3, 'Topic': "Basic Python package"},
        {'day': 3, 'Topic': "Test Driven development"},
        {'day': 3, 'Topic': "Auto documentation with Sphinx"},
        # -----------------------------
        {'day': 4, 'Topic': "OOP"},
    ]
)


def display_topics(day=1, df=None):
    if df is None:
        df = df_details
    return df[df['day'] == day][['day', 'Topic']].head(20)


In [70]:
display_high_level_topics = df_high_level.head(20)

# Full course overview

In [71]:
display_high_level_topics

Unnamed: 0,day,Topic
0,Monday,"Check-In, recaps and functions"
1,Tuesday,"Coding philosophy, data flow and some more useful std modules"
2,Wednesday,"Test driven development, python module, sphinx"
3,Thursday,OOP - Object oriented programming
4,Friday,Q&A and code clean up
5,,
6,Monday,
7,Tuesday,
8,Wednesday,
9,Thursday,


# Day 1
## Overview

In [72]:
display_topics(day=1)

Unnamed: 0,day,Topic
0,1,Check-in
1,1,Procedural stuff
2,1,python basic in 5'
3,1,lists and generators
4,1,bisect module


# Python basic types
* int
* float
* str
* list
* tuple
* set
* dict

These have been covered elsewhere. Quick recap: what is the outcome of these lines?

In [1]:
a = 34 + 3.2
type(a)

float

In [3]:
a = "Never odd" + " or even"
a[::-1]

'neve ro ddo reveN'

In [None]:
a[0] = 'n'

In [6]:
a.split()

['Never', 'odd', 'or', 'even']

In [None]:
{ [12,32]: "Well done ... "}

In [None]:
set([12, 13, 14]) & set([13, 14, 15])
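
The three unexecuted cells above can be checked with a small sketch. It assumes standard CPython behaviour: strings are immutable, lists are unhashable, and `&` is set intersection. The variable names are chosen for illustration only.

```python
a = "Never odd" + " or even"

# str is immutable, so item assignment raises a TypeError
try:
    a[0] = "n"
    string_result = "assignment worked"
except TypeError as error:
    string_result = type(error).__name__

# a list is mutable and therefore unhashable: it cannot be a dict key
try:
    {[12, 32]: "Well done ... "}
    key_result = "dict created"
except TypeError as error:
    key_result = type(error).__name__

# a tuple with the same content is hashable and works as a key
valid_dict = {(12, 32): "Well done ... "}

# & returns the elements present in both sets
common = set([12, 13, 14]) & set([13, 14, 15])

print(string_result, key_result, common)
```

To change a string, build a new one instead, for example `'n' + a[1:]`.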

## Python comparisons
* \>
* \>=
* ==
* <=
* <
* is

These have been covered elsewhere. Quick recap: what is the outcome of these lines?

In [7]:
a = 42
39.9 < a <= 42

True
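
A chained comparison is shorthand for joining the individual comparisons with `and`. The sketch below illustrates this and shows one pitfall when `==` ends up in the chain; the variable names are illustrative.

```python
a = 42

# the chained form ...
chained = 39.9 < a <= 42
# ... is equivalent to both comparisons joined by `and`
explicit = (39.9 < a) and (a <= 42)

# pitfall: this is also a chain, i.e. (1 < 2) and (2 == True)
pitfall = 1 < 2 == True
# with parentheses the comparison result itself is compared to True
intended = (1 < 2) == True

print(chained, explicit, pitfall, intended)
```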

In [8]:
a = "Nature is lethal but it doesn't hold a candle to man."
b = "Nature is lethal but it doesn't hold a candle to man."
print("a is b?", a is b)
print("a == b", a == b)

a is b? False
a == b True
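
Whether two equal strings are also the same object depends on interpreter details such as interning, so the result of `is` on strings should not be relied upon. Lists make the difference between identity and equality easier to see; a minimal sketch:

```python
first = [1, 2, 3]
second = [1, 2, 3]  # same content, but a new object
alias = first       # a second name for the very same object

same_content = first == second    # == compares values
same_object = first is second     # is compares identities
alias_is_first = alias is first

print(same_content, same_object, alias_is_first)
```

As a rule of thumb, use `==` to compare values and reserve `is` for singletons such as `None`.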


In [26]:
# assuming that truthy objects are equal to True is a major source of errors
def release_password(authentication=False):
    answer = "Not authenticated"
    if authentication:
        answer = "Here is your password"
    return answer

In [8]:
release_password(authentication=True)

'Here is your password'

In [9]:
release_password(authentication=[1])

'Here is your password'

In [10]:
release_password(authentication=[])

'Not authenticated'

In [11]:
# an explicit comparison with True rejects values that are merely truthy, such as [1]
def release_password(authentication=False):
    answer = "Not authenticated"
    if authentication == True:
        answer = "Here is your password"
    return answer

In [31]:
print("[] is True?", [] is True)
print("[1] is True?", [1] is True) 

[] is True? False
[1] is True? False


In [1]:
print("[] == True?", [] == True)
print("[1] == True?", [1] == True) 

[] == True? False
[1] == True? False


In [2]:
print("bool([]) == True?", bool([]) == True)
print("bool([1]) == True?", bool([1]) == True) 

bool([]) == True? False
bool([1]) == True? True
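
The cells above distinguish three different questions: is an object truthy (`bool(x)`), is it equal to `True` (`x == True`), and is it the object `True` itself (`x is True`). The sketch below puts them side by side. `release_password_strict` is a hypothetical variant written for this illustration, not part of the course code.

```python
def release_password_strict(authentication=False):
    """Hypothetical variant: only the singleton True unlocks the password."""
    if authentication is True:
        return "Here is your password"
    return "Not authenticated"


candidates = [True, 1, [1], "yes", [], None]
for candidate in candidates:
    print(
        repr(candidate),
        "truthy:", bool(candidate),
        "== True:", candidate == True,
        "->", release_password_strict(candidate),
    )
```

Note that `1 == True` evaluates to `True` because `bool` is a subclass of `int`, so a check with `== True` would still accept the integer `1`.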


## Other operators
### membership operators

In [None]:
'dog' in 'Not sure I went out with the dOg'

In [None]:
'dog' in ('d0g', 'dOg', 'dog', 'dot')
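
For strings, `in` performs a case-sensitive substring search, while for tuples and lists it tests whether any element is equal to the given object. A minimal sketch with illustrative variable names:

```python
sentence = 'Not sure I went out with the dOg'
animals = ('d0g', 'dOg', 'dog', 'dot')

# substring search is case sensitive: 'dOg' is not 'dog'
in_sentence = 'dog' in sentence
# normalising the case first makes the search case insensitive
in_sentence_lower = 'dog' in sentence.lower()

# for a tuple, whole elements are compared
in_tuple = 'dog' in animals
# 'do' is a substring of several elements, but not an element itself
partial_in_tuple = 'do' in animals

print(in_sentence, in_sentence_lower, in_tuple, partial_in_tuple)
```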

### logical operators

In [None]:
(1 is True) or (1 == True) 

In [33]:
# The operands do not have to be booleans ... though imho less readable
4 or 3

4
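
`or` and `and` return one of their operands rather than a strict boolean: `or` yields the first truthy operand (or the last one), `and` yields the first falsy operand (or the last one). The sketch below also evaluates the two parts of `(1 is True) or (1 == True)` separately; the integer is bound to a name first, since recent Python versions warn about `is` with a literal.

```python
one = 1
identity = one is True    # 1 and True are different objects
equality = one == True    # but they compare equal
combined = identity or equality

first_truthy = 4 or 3     # 4 is truthy, so 3 is never evaluated
fallback = 0 or 3         # 0 is falsy, so the second operand is returned
last_operand = 4 and 3    # both truthy: `and` returns the last operand
first_falsy = 0 and 3     # `and` stops at the first falsy operand

# a common (if debatable) idiom: default values for empty input
name = "" or "anonymous"

print(identity, equality, combined)
print(first_truthy, fallback, last_operand, first_falsy, name)
```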

## Object references

In [36]:
a = {}
b = a.copy()
print("a = {0}  id:{1}".format(a, id(a)))
print("b = {0}  id:{1}".format(b, id(b)))
print("a is b?", a is b)
print("Adding 1 to a..")
a[12] = 12
print("a = {0}  id:{1}".format(a, id(a)))
print("b = {0}  id:{1}".format(b, id(b)))
print("a is b?", a is b)

a = {}  id:4565606480
b = {}  id:4565609920
a is b? False
Adding 1 to a..
a = {12: 12}  id:4565606480
b = {}  id:4565609920
a is b? False
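
The example above uses `a.copy()`, which creates a new dictionary. For contrast, the sketch below shows plain assignment (which only creates a second name for the same object), a shallow copy and a deep copy of a dictionary with a nested list. The key names are illustrative.

```python
import copy

a = {"numbers": [1, 2]}
alias = a                 # no copy at all: both names refer to one dict
shallow = a.copy()        # new dict, but the nested list is shared
deep = copy.deepcopy(a)   # new dict and new nested list

a["numbers"].append(3)    # mutate the nested list
a["new"] = True           # add a key to the outer dict

print("alias  :", alias, alias is a)
print("shallow:", shallow)
print("deep   :", deep)
```

The alias sees every change, the shallow copy sees only the change to the shared nested list, and the deep copy sees none.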


## lists and generators

In [37]:
a = [x for x in range(11)]
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [39]:
a = []
for x in range(11):
    a.append(x)
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [41]:
a = {x: x**2 for x in range(11)}
a

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

In [42]:
import sys

object_1 = [x/10. for x in range(1000)]
object_2 = (x/10. for x in range(1000))

print("""
Object 1 allocates {0} bytes
Object 2 allocates {1} bytes
""".format(
    sys.getsizeof(object_1),
    sys.getsizeof(object_2),
))




Object 1 allocates 9032 bytes
Object 2 allocates 128 bytes
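
The generator is small because it does not store its elements; it produces them one at a time on demand. The price is that it can be consumed only once. A minimal sketch:

```python
squares = (x ** 2 for x in range(5))

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # nothing left to yield

# generators can feed functions like sum() directly,
# without building the full list in memory first
total = sum(x for x in range(1000))

print(first_pass, second_pass, total)
```

Note that `sys.getsizeof` reports only the size of the container object itself, not of the elements it refers to.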



Quick recap: what is the result of the following lines?

In [29]:
a = [1, 2, 5, 7]

In [30]:
a.insert(0, ">>")
a

['>>', 1, 2, 5, 7]

In [31]:
a.pop(0)
a

[1, 2, 5, 7]

In [33]:
for element in a:
    print(element)

1
2
5
7


How do we get the position of an element during iteration?

In [49]:
a = [('King', 1), ('Queen', 2), ('Knight', 'A')]

In [50]:
for pos, (element, element2) in enumerate(a):
    print("At position:", pos, "we have:", element)

At position: 0 we have: King
At position: 1 we have: Queen
At position: 2 we have: Knight
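
`enumerate` also accepts a `start` argument, which is handy for human-readable numbering. The sketch below unpacks both tuple members and collects the formatted lines; the names `name`, `value` and `lines` are illustrative.

```python
a = [('King', 1), ('Queen', 2), ('Knight', 'A')]

lines = []
# start=1 shifts the counter, the elements themselves are unchanged
for pos, (name, value) in enumerate(a, start=1):
    lines.append("{0}. {1} ({2})".format(pos, name, value))

print("\n".join(lines))
```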


## bisect module

The bisect module finds the position in a sorted list at which a given element can be inserted without losing the sort order.

It is an essential building block for binary searches and similar techniques.

In [51]:
import bisect
a = [1, 3, 6, 12, 14, 16]
bisect.bisect(a, 4)

2

In [52]:
bisect.bisect_right(a, 3)

2

In [53]:
bisect.bisect_left(a, 3)

1
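
`bisect_left` returns the position before any equal elements and `bisect_right` (alias `bisect`) the position after them. Two typical uses are sketched below: keeping a list sorted with `bisect.insort`, and a binary search lookup. `index_of` is a hypothetical helper written for this illustration.

```python
import bisect


def index_of(sorted_list, value):
    """Hypothetical helper: index of value in sorted_list, or None if absent."""
    pos = bisect.bisect_left(sorted_list, value)
    if pos < len(sorted_list) and sorted_list[pos] == value:
        return pos
    return None


a = [1, 3, 6, 12, 14, 16]

# insort inserts the element at the position bisect would report
bisect.insort(a, 4)
print(a)

found = index_of(a, 12)
missing = index_of(a, 5)
print(found, missing)
```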

In [54]:
# also works with more complex lists, as long as the elements are comparable
a = [(1, "First"), (3, "Third"), (6, "too late")]
bisect.bisect(a, (3, "Really add some more stuff"))

1

Why 1 and not 2?
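
One way to approach the question: tuples are compared element by element, so when the first elements are equal the second elements decide. The sketch below makes that comparison explicit; `probe` and the other variable names are illustrative.

```python
import bisect

a = [(1, "First"), (3, "Third"), (6, "too late")]
probe = (3, "Really add some more stuff")

# first elements tie (3 == 3), so the strings decide: "R" sorts before "T"
probe_is_smaller = probe < (3, "Third")
position = bisect.bisect(a, probe)

# an identical tuple is placed to the right of the existing entry
position_equal = bisect.bisect(a, (3, "Third"))

# a one-element tuple sorts before every longer tuple with the same prefix
position_key_only = bisect.bisect_left(a, (3,))

print(probe_is_smaller, position, position_equal, position_key_only)
```

Searching with a key-only tuple such as `(3,)` is a common way to locate the first entry for a given key without knowing the rest of the record.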