# Python for the experienced*

\* both more or less

### by Ralph Heinkel
rh@ralph-heinkel.com


<small>Published under [Creative Commons Attribution ShareAlike (CC BY-SA) License](https://creativecommons.org/licenses/by-sa/4.0/).</small>
[<img style="vertical-align: left;" src="Images/cc-by-sa.png">](https://creativecommons.org/licenses/by-sa/4.0/)

# About me

 * Studied Medical Computer Science at University of Heidelberg
 * Diploma thesis at EMBL in structural biology
 * Supercomputing Resource Manager at EMBL
 * IT unit director at Cenix Bioscience GmbH
 * Freelance biocomputing and IT consultant (GlaxoSmithKline, Oxford University, SAP, ...)
 
# Python, Linux & me
 * Using Linux and Python since 1996 at EMBL
 * Since then all my projects were implemented in Python
 * Developed three Laboratory Information Management Systems (LIMS)

# Overview of today's course

 * About Python and its ecosphere
 * Python interpreter
 * Virtual environments and external packages
 * Using jupyter notebook
 * Some Python basics recap
 * Delving deeper into Python
 * Ensuring code quality

# About Python and its ecosphere
 * Version 2.7.x reaches its end of life on Jan. 1st 2020
 * The current latest version is 3.8.0
 * Python is an interpreted language ...
 * ... but there are options to compile code
 * Cross-platform (Linux, Windows, Mac, and maaaaaany more)
 * Can be extended by C/C++/Fortran/... extension libraries
 * Comes with powerful standard library ("batteries included")
 * Huge collection of 3rd party packages, projects, frameworks, ...

## Python resources

### www.python.org
For downloading Python or quick links to the documentation.

At https://docs.python.org/3/library the full documentation of the standard library can be found (keep it under your pillow!).<br/>
*This is still my main point of reference*.

### www.pypi.org

The **Py**thon **P**ackage **I**ndex. Contains over 200,000 projects.

This is *the* main resource if you have a problem which might have already been solved. And many are ;-)



# The Python interpreter

# Exercise: Create a simple Python script and run it

Open a text editor, create a file *hello_world.py* and paste the following text into it:
``` python
#!/usr/bin/env python3
print("Hello world!")
```

This script can be executed by typing the following in your terminal:
``` bash
$ python3 hello_world.py
```

or by making the script executable and running it standalone. This requires a shebang line (line 1 of the script), so that the system knows which interpreter to use for the text file. So
``` bash
$ chmod +x hello_world.py
$ ./hello_world.py
```
gives the same result :) 

(OK, this might be different under Windows, but do yourself a favour and don't develop code under Windows)

# Use Python directly in the interpreter

In your Unix (or possibly Windows) shell type (do it yourself!):
``` python
$ python3
>>> 15 * 3
45
```
Here is an example of how to use the `math` module of the standard library:
``` python
$ python3
>>> import math
>>> math.pi * 10
31.41592653589793
```
Quiz: What happens if you type `_ / 10`?

Exit the interpreter with `Control-d` (Linux/Mac) or `Control-z` followed by `Enter` (Windows).

Quiz: What happens when typing `Control-z` in a running Python interpreter on Linux?

## Little test for your Python knowledge: Find the bug

In [None]:
import math
radius = input('Radius of the circle ')
circumference = 2 * math.pi * radius
print(circumference)

Reason: The `input()`-function does not interpret what is typed to it, it always returns strings, but for doing math you need a numeric value. 

Solution?

`circumference = 2 * math.pi * float(radius)`
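
As a minimal sketch, the fix can be wrapped in a small function. The helper name `circumference_from_text` is made up for this illustration and is not part of the course code. The sketch also shows what happens if the typed text is not a number at all:

```python
import math

def circumference_from_text(radius_text):
    """Convert text (as returned by input()) to a float, then compute 2*pi*r."""
    radius = float(radius_text)   # raises ValueError for non-numeric text
    return 2 * math.pi * radius

print(circumference_from_text('1.5'))

try:
    circumference_from_text('abc')
except ValueError as exc:
    print('not a number:', exc)
```

In a real script you would pass the result of `input()` to the function instead of a literal string.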

# Run a simple Python command directly from the shell:
```bash
$ python3 -c 'import math; print(math.pi * 10)'
31.41592653589793
```

# Directly executing a stdlib module

Some modules of the standard library (as well as other 3rd party packages) allow for direct execution.

The following starts up a very simple webserver:
```bash
$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
```

Now open a browser, and go to the shown URL. What do you see?

**Important: Never use this for a production service!!!**

# Exercise:

 1. Go to the docs for this http server module
 2. Find out how to serve the `/tmp` directory

## Installing external packages


## Virtual environments

# Using pip

How to install 3rd party packages? Use the *`pip`* command.

Possible problems with `pip`:
 * Packages are installed into system's Python `site-packages` -> usually write protected
 * Same package, but different versions for different projects could lead to conflicts
 

## The solution: Virtual Python environments

 * For each project create its own light-weight virtualenv
 * Each virtualenv only contains packages required for this project
 * Cheap to create, cheap to dispose of
 
Most Linux distros have system packages for virtualenv (and virtualenvwrapper).<br/>
Possible installer commands (as root) are:
```bash
yum install virtualenv virtualenvwrapper      # RedHat, CentOS
zypper install virtualenv virtualenvwrapper   # OpenSuse
apt-get install virtualenv virtualenvwrapper  # Ubuntu
```
Or use the `venv` module that comes with Python 3 (`python3 -m venv <directory>`).

# Virtualenv Howto

Open a Linux shell, create a new (project) directory, `cd` into it, and create the virtualenv as shown below.

For the setup here at EBI, please type `conda deactivate` first (the word `base` at your prompt should disappear!)

As (a / my) convention I name the virtualenv the same as the project.
```bash
$ mkdir mynewproj; cd mynewproj
$ mkvirtualenv -a . -p /usr/bin/python3 mynewproj
Running virtualenv with interpreter /usr/bin/python3
Already using interpreter /usr/bin/python3
New python executable in /home/ralph/.virtualenvs/mynewproj/bin/python
Installing setuptools, pip, wheel...
done.
[...]
(mynewproj) ~/mynewproj $
```

## More virtualenv commands

Quit the virtualenv:
```bash
$ deactivate   # note how the shell prompt changes
```    

Activate the virtualenv again. To see an important effect, do `cd /tmp` first.
```bash
$ workon mynewproj   # watch the shell prompt and your current directory ...
```    

Finally remove the virtualenv (and all its packages):
```bash
$ rmvirtualenv mynewproj
# note 1: the virtualenv must be deactivated first
# note 2: this does NOT remove your project!
```


# Use a virtualenv to install jupyter
```bash
$ mkdir jup; cd jup
$ mkvirtualenv -a . -p /usr/bin/python3 jup
$ pip install jupyter
```

To check where `jupyter` is installed type `which jupyter`.<br/>
Then fire up `jupyter` notebook:
```bash
$ jupyter notebook
```
and your browser should automatically open a page in `jupyter`.<br/>
If not, copy the URL printed out on the shell and paste it into a new browser tab.<br/>
(E.g. `http://localhost:8888/?token=e3c6967185a776cefc71d219cc`)

# Using jupyter notebook

# Quick jupyter notebook intro

- Click on 'New' on the right-hand side and choose 'Python3'
- Just type some Python code into the first cell and hit `Ctrl-ENTER` or `Shift-ENTER`.
- Each cell can use the imports and result values of the cells executed before it
- Some magic commands can be executed within jupyter:
    - `%ls` - List files from the current directory
    - `%%writefile python_code.py` - Write the cell's code into this file
    - `%load python_code.py` - Insert code from external file
    - `!<some bash command>` - Run external command from within cell. Examples:<br/>
        - `!pip install pandas`
        - `!ls -al *.csv`
  
  More can be found at https://ipython.readthedocs.io/en/stable/interactive/magics.html


# Some Python basics recap

# What is an "iterable"?

### Answer:  Something (usually a container) you can loop over.

### Examples?

- Lists
- Tuples
- Sets
- Dictionaries
- Strings
- ... what else? ...

### The simplest form of iteration:
Looping over a list:

In [None]:
for elem in [0, 1, 2, 3, 4]:
    print(elem)

or over a range:  (Question: which value to put for `x` to get the same output?)

In [None]:
for elem in range(x):
    print('num: %6d' % elem)       # traditional style of string formatting using % chars

For the `range()`-function the same rules apply as for slicing:

`[0, 1, 2, 3, 4, 5, 6, 7][2:4]    # -> returns [2, 3]`

Reason: The index points to the place **in between** the list items (think of "at the commas"). So to get the same result from `range()` as for the slicing do:

`range(2, 4)`
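
A short sketch to try in a cell. The list `numbers` is just an example value chosen so that each item equals its own index:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7]

# Both notations start at index 2 and stop *before* index 4.
sliced = numbers[2:4]
ranged = list(range(2, 4))
print(sliced, ranged)

# An optional third value is the step size, again in both notations.
stepped_slice = numbers[1:7:2]
stepped_range = list(range(1, 7, 2))
print(stepped_slice, stepped_range)
```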

#### Loop over a string:

In [None]:
pangram = "The quick brown fox jumps over the lazy dog"

for word in pangram:
    # format() with positional parameter, can be numbered like {0}, {1}, ...
    print('{}: {}'.format('elem', word))

**Question**: What does this print?

#### Exercise: Loop over individual words of this sentence.

In [None]:
for word in pangram.split():
    print('{elem}'.format(elem=word))     # format() with dict-style parameters

**Explanation:** The `split()` method returns a list of words that can be iterated over. By default it splits at any run of whitespace (spaces, tabs, newlines).

**Quick question in between:** How to find out what methods and attributes an object has?

**Answer:** Use the `dir()`-function!

In [None]:
dir(str)   # or dir('')

## Looping over more complex data structures
In the next example we loop over a list of tuples:


In [None]:
chess_coordinates = [('a', 5, 'king'), ('c', 6, 'bishop'), ('e', 2, 'knight')]
for x, y, figure in chess_coordinates:
    print('+{2:>9s} at position {0}{1}'.format(x, y, figure))

Note the extra formatting directives: 
- reserve 9 characters for the figure strings
- print them right-aligned

#### Exercise: Back to the pangram - additionally print the word number next to each word.

The not-so-elegant-but-functional solution:

In [None]:
pangram_words = pangram.split()
for index in range(len(pangram_words)):
    print('{0}: {1}'.format(index, pangram_words[index]))

Now my favourite solution:

In [None]:
for index, elem in enumerate(pangram.split()):
    print(f'{index}: {elem}')                  # latest feature: f-strings

**Explanation:** The `enumerate()`-function returns an iterator that yields `(<index>, <value>)`-tuples.
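
A small sketch of what `enumerate()` hands back, using a shortened example sentence. The optional `start` argument is handy when the numbering should begin at 1:

```python
words = 'The quick brown fox'.split()

pairs = enumerate(words)
print(pairs)                  # an enumerate object, not a list
pairs_as_list = list(pairs)   # materialize the (index, value) pairs
print(pairs_as_list)

# The optional 'start' argument makes the numbering begin at 1.
for number, word in enumerate(words, start=1):
    print(f'{number}: {word}')
```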

## Iterating over a dictionary

In [None]:
phone_numbers = {'emma': '0178-3434', 'john': '07225-5544', 'larissa': '555-666-77'}
for num in phone_numbers:
    print(num)

**Question:** What does this print?

**Rule:** Iterating over a dictionary means iterating over its keys, so it prints the names!<br/>
**Note:** Dictionaries preserve insertion order (guaranteed by the language since Python 3.7, already the case in CPython 3.6), so content is delivered in the same order as it has been added.

**Question:** How to iterate over the values?

In [None]:
# To iterate over values only:
for num in phone_numbers.values():
    print(num)
    
# Or to iterate over keys and values together:
for name, num in phone_numbers.items():
    print(f'{name}: {num}')

## Iterating over a file
Yes, a file is also iterable.

**Exercise**: Let's read out the `passwd` file in `/etc`.

In [None]:
fp = open('/etc/passwd')
for line in fp:
    print(line)

**Questions**: 
- Why are there empty lines in the printout?
- How to get rid of the extra empty lines?

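**Solution**: Each line read from the file still ends with its newline character, and `print()` adds another one. Use the `.strip()`-method of the `line`-string to remove it.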

In [None]:
fp = open('/etc/passwd')
for line in fp:
    print(line.strip())
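
The following sketch makes the extra newline visible. It uses `io.StringIO` with two made-up lines as a stand-in for the real file, so it runs on any system:

```python
import io

# io.StringIO stands in for an open text file so the sketch runs anywhere.
fake_file = io.StringIO('root:x:0:0\ndaemon:x:1:1\n')

for line in fake_file:
    print(repr(line))             # the trailing '\n' is part of each line

fake_file.seek(0)                 # rewind to read the "file" again
for line in fake_file:
    print(line.rstrip('\n'))      # remove only the newline, keep other whitespace

fake_file.seek(0)
for line in fake_file:
    print(line, end='')           # or tell print() not to add its own newline
```

Which variant to prefer depends on whether leading and trailing spaces in a line carry meaning for you.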

### Iterating over a file (cont.)
**Exercise**: Loop over `/etc/passwd` and only print username (pos 0) and home directory (pos 5)

In [None]:
with open('/etc/passwd') as fp:
    for line in fp:
        parts = line.strip().split(':')   
        print(f'{parts[0]}: {parts[5]}')
# from here on 'fp' is closed and cannot be used any longer.

**Exercise**: Now do the same using the `csv`-module of Python. Preferably use the `DictReader` class.

In [None]:
import csv
fields = ['user', 'pwd', 'uid', 'gid', 'desc', 'home', 'shell']
with open('/etc/passwd') as fp:
   reader = csv.DictReader(fp, fieldnames=fields, delimiter=':')
   for row in reader:
       print('{user}: {home}'.format(**row))

# Delving deeper into Python

# Comprehensions

**Comprehensions** are a way of computationally creating/filling Python containers.

Instead of:
```
mylist = []
for val in range(10):
    if val not in (3, 6):
        mylist.append(val**2)
```
use a **list-comprehension**:
```
mylist = [val**2 for val in range(10) if val not in (3,6)]
```

or, to create a set (set-comprehension):
```
myset = {val**2 for val in range(10) if val not in (3,6)}
```

## Comprehensions (cont.)
#### Exercise: Build a dictionary with the plain number as key, the squares as values:

In [None]:
{val: val**2 for val in range(10) if val not in (3,6)}

Note: The syntax is very similar to set-comprehension.<br/>The difference is `{val: val**2 ...}` instead of `{val**2 ..}`.

### About comprehensions
- Comprehensions are typically faster than equivalent explicit Python loops
- Syntax is more concise ...
- ... but might also be harder to read if spanning multiple rows or nested
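
To get a feeling for the speed difference, here is a rough timing sketch using the `timeit` module of the standard library. The function names and the filter condition are chosen for illustration only, and the measured numbers depend on your machine and Python version:

```python
import timeit

def squares_loop():
    result = []
    for val in range(1000):
        if val % 3:
            result.append(val ** 2)
    return result

def squares_comprehension():
    return [val ** 2 for val in range(1000) if val % 3]

# Both variants build exactly the same list ...
assert squares_loop() == squares_comprehension()

# ... so only the run time differs.
print('loop:         ', timeit.timeit(squares_loop, number=1000))
print('comprehension:', timeit.timeit(squares_comprehension, number=1000))
```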

#### Exercise: Build a list of dictionaries from `/etc/passwd` using dict-comprehensions.

In [None]:
fields = ['user', 'pwd', 'uid', 'gid', 'desc', 'home', 'shell']
rows = []
with open('/etc/passwd') as fp:
    for line in fp:
        parts = line.strip().split(':')   
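
One possible way to finish this exercise is sketched below. To keep it runnable everywhere it reads two made-up passwd lines from an `io.StringIO` object (an assumption of this sketch); on a Linux system you can use `open('/etc/passwd')` instead:

```python
import io

fields = ['user', 'pwd', 'uid', 'gid', 'desc', 'home', 'shell']

# Two made-up passwd lines; replace with open('/etc/passwd') on a real system.
sample = io.StringIO(
    'root:x:0:0:root:/root:/bin/bash\n'
    'alice:x:1000:1000:Alice:/home/alice:/bin/sh\n'
)

rows = []
with sample as fp:
    for line in fp:
        parts = line.strip().split(':')
        # dict-comprehension: pair the field name at each index with its value
        rows.append({fields[idx]: value for idx, value in enumerate(parts)})

print(rows[1]['user'], rows[1]['home'])
```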
        

# Iterators

### An iterator 
- has a state that tells where it is during iteration
- can be called with the `next()` function to return the next value
- raises `StopIteration` when values are exhausted.

**Example:**
```
s = 'UK'      # a string, no state, is iterable
i = iter(s)   # an iterator; its state is the position 0 of 'U'
next(i)       # returns 'U', advances internally to 'K'
next(i)       # returns 'K'
next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
An iterator can also be read out in one shot using the `list()` function:
```
list(iter(s))   # returns ['U', 'K']
```
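
A `for` loop uses exactly these building blocks. The following sketch spells out roughly what happens behind the scenes; the function name `manual_for_loop` is made up for this illustration:

```python
def manual_for_loop(iterable):
    """Collect all items the way a for loop does it behind the scenes."""
    collected = []
    iterator = iter(iterable)         # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)     # step 2: fetch the next value
        except StopIteration:         # step 3: stop cleanly when exhausted
            break
        collected.append(item)
    return collected

print(manual_for_loop('UK'))
```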

### So, why iterators?
They can save a lot of memory, compared to having all data available (loaded or computed) upfront.

#### An example for comparison:

In [None]:
import sys
fp = open('/etc/services')  # or any other biggish text file ...
print(sys.getsizeof(fp))
lines = fp.readlines()
print(sys.getsizeof(lines))

**Explanation:** `fp` is an iterator instance that loops over the file returning only one line at a time when requested (e.g. one at each iteration of a loop). 

### More examples of Python iterators (1)
The `zip()`-function returns an iterator that aggregates elements from each of the iterables passed as arguments.

#### Example:

In [None]:
list(zip(['a', 'b', 'c'], [1, 2, 3]))

Try it out - what do you get? Then make one list longer than the other and try it out again.

Again we need the `list()`-function to exhaust the iterator in one shot.
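
For lists of unequal length, the sketch below compares `zip()` with `itertools.zip_longest()` from the standard library. The fill value `0` is an arbitrary choice for this example:

```python
import itertools

letters = ['a', 'b', 'c', 'd']
numbers = [1, 2, 3]

# zip() stops as soon as the shortest input is exhausted: 'd' is dropped.
short = list(zip(letters, numbers))
print(short)

# itertools.zip_longest() keeps going and pads the shorter input instead.
padded = list(itertools.zip_longest(letters, numbers, fillvalue=0))
print(padded)
```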

One nice application of `zip()`: This iterator can be the sole argument for creating a dictionary with `dict()`:

In [None]:
dict(zip(['a', 'b', 'c'], [1, 2, 3]))

Why? A dictionary can be initialized with a list of tuples:
`dict([('a', 1), ('b', 2), ...])`. <br/>
This does only work with the `dict` command, not with the curly braces.

### More examples of Python iterators (2)
The `itertools` module provides a lot of convenient iterator implementations.

**Example:** `itertools.count(start, step)` is very useful for creating an infinite counter.

In [None]:
import itertools
counter = itertools.count(3, 0.1)
print(next(counter))
print(next(counter))
print(next(counter))
# ...

### More examples of Python iterators (3)

**Example:** `itertools.cycle()` is convenient if you need to repeat the same sequence over and over again.

In [None]:
workdays = itertools.cycle(['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
for idx in range(7):
    print(next(workdays))

#### Caution: Do NOT use the `list()` function on an infinite iterator such as `count()` or `cycle()`, it will not return until you run out of memory!!
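
If you only need a limited number of values from an infinite iterator, `itertools.islice()` is a safe alternative. A minimal sketch with arbitrarily chosen start and step values:

```python
import itertools

counter = itertools.count(10, 5)        # 10, 15, 20, ... without end
first_four = list(itertools.islice(counter, 4))
print(first_four)

workdays = itertools.cycle(['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
seven_days = list(itertools.islice(workdays, 7))
print(seven_days)
```

`islice()` itself returns an iterator, so calling `list()` on it is fine as long as the slice has an end.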

### Exercise: Put together everything we've learned so far.

Let's go back to the `/etc/passwd` file and generate the same result as `csv.DictReader()` does, i.e. produce a list of dictionaries containing all fields for every line of the file.


#### Possible solution:

In [None]:
fields = ['user', 'pwd', 'uid', 'gid', 'desc', 'home', 'shell']
fp = open('/etc/passwd')

# strip() removes the trailing newline, which would otherwise end up in 'shell'
[dict(zip(fields, row.strip().split(':'))) for row in fp]

# Generators

## Use Generators to build your own iterator functions.

### But first quickly recap Python functions:
```
def myfunction(arg1, arg2):
    res = arg1 + arg2
    return res
```
A function accepts (optional) input values, and - if it contains a `return` statement - the returned value is the result of the function. Easy.

### So, how to build a generator?

A function is automagically converted into a generator if it contains one (or more) `yield`-statements.

Example:

In [None]:
def dummy_generator():
    index = 0
    yield index
    index += 1
    yield index
    index += 2
    yield index

dg = dummy_generator()
print(dg)
list(dg)

It is important to understand that the generator remembers the state of all internal variables in between yield statements, and continues with the next statement after the yield.
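
The sketch below makes this pausing and resuming visible by printing from inside the generator. The name `noisy_generator` is made up for this illustration:

```python
def noisy_generator():
    print('  started')
    yield 1
    print('  resumed after first yield')
    yield 2
    print('  resumed after second yield, about to finish')

gen = noisy_generator()      # nothing is printed yet: the body has not started
print('first value: ', next(gen))
print('second value:', next(gen))
print('leftover:    ', list(gen))   # runs the rest; no more values are yielded
```

Note that creating the generator does not execute any of its code; the body only runs when a value is requested.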

### This of course also works using loops within a generator:

In [None]:
def my_char_generator(mystring):
    for char in mystring:
        yield char

In [None]:
mycg = my_char_generator('ebi')
print('start: ', next(mycg))
for char in mycg:
    print('other: ', char)

### Exercise: Re-implement basic `range()` functionality as generator

In [None]:
def myrange(maxval):
    counter = 0
    while counter < maxval:
        yield counter
        counter += 1
        
list(myrange(5)) == list(range(5))    # What does this print?

### Special type of generators: generator expressions

They look very similar to list-comprehensions which we showed before:

`lc = [value * 2 for value in range(5)]`

Just replace the square brackets with round parentheses and you have made your first generator expression:

In [None]:
ge = (value * 2 for value in range(5))
print(ge)
print(list(ge))                           # or alternatively loop over 'ge'

#### Keep in mind:
- list/set/dict-comprehensions generate the entire data structure immediately
- generator-expressions generate each value on demand (lazy evaluation)
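
A small sketch comparing both forms. The exact sizes reported by `sys.getsizeof()` vary between Python versions, so none are shown here:

```python
import sys

as_list = [value * 2 for value in range(100_000)]
as_genexp = (value * 2 for value in range(100_000))

# The list holds references to all 100000 results; the generator only its state.
print(sys.getsizeof(as_list), sys.getsizeof(as_genexp))

# Both can feed functions that consume an iterable, such as sum().
list_total = sum(as_list)
genexp_total = sum(as_genexp)
print(list_total == genexp_total)

# A generator expression is exhausted after one pass, the list is not.
print(len(list(as_genexp)), len(as_list))
```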

# Ensuring code quality through automatic testing

## Why testing?

- Code quality doesn't matter for hacky prototype code ...
- ... but becomes an issue when a project grows
- It costs time to write tests, but pays off quickly when automatic testing is applied
- Automatic tests are especially useful if you need to refactor (improve/enhance) an existing function and you want to be sure it still works as expected

## Different types of testing
- **Unittests:**
  Tests units of code (e.g. functions) independently of the rest of the code base
- **Integration tests:**
  Tests that larger units of code (modules, functions) properly interact with each other
- **Load tests:**
  Tests that the system still behaves reliably under heavy load

## Test-driven development (TDD)
This term means that you first define WHAT a new function (or class) should do, HOW it is called, and what it returns (aka its 'signature'). Then you write the corresponding tests that check for the expected behavior. Only AFTER writing the tests do you implement the function body. The task is finished once all tests succeed.

# Exercise: Write your first unittest functions.

### Goal
The goal of this exercise is to write a function that accepts a list of numbers and adds them up.
Items can be of type
- integers 
- floats

### Approach
Open a fresh jupyter notebook (on the jupyter homepage tab click on 'New' on the right-hand side).

In the first cell enter:

In [None]:
%%writefile mycode.py

def add_it_up(items):
   pass   # "pass" here means, nothing is done, and None is returned

## Writing the test functions

Create a second cell and enter the following test code:

In [None]:
%%writefile test_mycode.py

from mycode import add_it_up

def test_add_integers():
    assert add_it_up([1, 2, 3]) == 6
   
def test_add_floats():
    assert 1 == 0   # write your test


Now, think about corner cases for further tests ... e.g. empty lists are passed ... what should be the return value???
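
One corner case worth knowing before writing `test_add_floats()`: sums of floats are subject to rounding, so an exact `==` comparison can fail unexpectedly. A minimal sketch using only the standard library:

```python
import math

total = 0.1 + 0.2

print(total == 0.3)               # exact comparison fails due to rounding
print(math.isclose(total, 0.3))   # comparison with a tolerance succeeds
```

In a test function you could therefore write `assert math.isclose(..., ...)`; `pytest` also offers `pytest.approx()` for the same purpose.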

## Install the 'pytest' package:

In a 3rd cell enter and execute:

In [None]:
! pip install pytest

This will print out a lot of detail that we do not need any longer. You can safely delete this cell: make sure you are in 'command'-mode (i.e. the frame around it is blue), then type `dd` (hit the 'd' key twice). This will just remove this cell.

## Run the tests
Create a new cell, then type and execute `! pytest`.

### What it does:
pytest looks for files whose names start with the `test_`-prefix. Inside those files it looks for functions that also have the `test_`-prefix, and executes them.

The outcome should look something like:

In [None]:
! pytest

## It's time for coding

So, now go and find your implementation for the `add_it_up` function, and rerun `pytest`. Loop until all tests pass.

## Enhancing the `add_it_up()` function
Now we come to a typical situation in larger projects: 
- There is a function that does something (well)
- But it needs to cover some additional functionality.

In our case the `add_it_up()` function should now also be able to sum up strings. 
In Python the following is completely valid code:
```
somevar = "abc" + "def"
```
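
If your implementation relies on the built-in `sum()` function (an assumption, yours may look different), the sketch below shows why strings need extra care:

```python
numbers = [1, 2, 3]
strings = ['abc', 'def']

print(sum(numbers))            # works: 0 + 1 + 2 + 3

try:
    sum(strings)               # fails: 0 + 'abc' is not defined
except TypeError as exc:
    print('TypeError:', exc)

joined = ''.join(strings)      # the usual way to concatenate many strings
print(joined)
```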

#### Exercise: 
- Add a new test function `test_add_strings()`. 
- Refactor `add_it_up()` to support this functionality.

# The End

### I hope you had fun and learned something from this session.

### Happy hacking!