Copyright (c) 2019 OERCompBiomed

# Python

Python is a dynamic general-purpose programming language, currently on its third major version: Python 3.7. It enjoys widespread adoption in the scientific community, and it is the *de facto* standard computational environment for data science and artificial intelligence, and partly also for computational biomedicine.

The following notebook serves as a whirlwind-type introduction to Python. If you already know some Python, feel free to browse down to the first point where you see something unfamiliar or interesting.

We are using the `Jupyter Notebook` - for a comprehensive introduction and tutorial see e.g. https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook and https://www.dataquest.io/blog/jupyter-notebook-tutorial.

To practice your Python skills further, check out and register at https://practice.datacamp.com/p

## Primitive datatypes and operators

Numbers come in two varieties: integers and floating-point numbers. Run the cells, then write a number of your choice in the last cell.


In [None]:
3

In [None]:
1.2

Math works exactly like you would expect. Run the cells, then write a simple expression of your choice in the last cell.

In [None]:
2 + 3

In [None]:
6 - 2

In [None]:
3 * 7

We use `/` for true division and `//` for integer division (floor division).

In [None]:
21 / 3    # The output is a floating point number, even though the division has no remainder

In [None]:
22 / 3

In [None]:
21 // 3

In [None]:
22 // 3

The modulo operator (remainder after division) is `%`, and exponentiation is denoted by `**`. Try also 217 % 5 and 217**5.

In [None]:
7 % 3

In [None]:
2**3
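
As a small sketch of how `//` and `%` fit together (the variable names are only illustrative): the quotient and the remainder together reconstruct the original number, and the built-in `divmod` returns both at once. Note also that floor division rounds *down*, which matters for negative numbers.

```python
a, b = 22, 3

quotient = a // b      # floor division
remainder = a % b      # remainder after division

# The quotient and the remainder reconstruct the original number.
print(quotient * b + remainder == a)   # True

# divmod returns the quotient and the remainder as a tuple.
print(divmod(a, b))                    # (7, 1)

# Floor division rounds towards negative infinity, not towards zero.
print(-22 // 3)                        # -8
```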

You can of course override operator precedence with parentheses. What happens if you use the parentheses like this: 1 +(3 * 2) ?

In [None]:
1 + 3 * 2

In [None]:
(1 + 3) * 2

The two booleans* are called `True` and `False` (note the capital letters). The boolean operators are `and`, `or` and `not`.

*) Booleans are named after the British mathematician George Boole. Python "understands" the two words `True` and `False`. If you want to know more about the outcome of combining the operators with the booleans in Python, try searching for "truth table python".

In [None]:
not True

In [None]:
not False

In [None]:
True and False

In [None]:
False or True

If you want to compare two things in Python, you use so-called comparison operators. They look like they do in most other programming languages: `==` (equal value), `!=` (not equal value), `<` (less than), `>` (greater than), `<=` (less than or equal to), `>=` (greater than or equal to). Execute the cells and make a comparison of your choice in the last cell.

In [None]:
1 == 1

In [None]:
1 == 1.0

In [None]:
1 < 10

In [None]:
1 > 10

In [None]:
2 <= 2

One notable feature of Python is that you can chain comparisons. Try to predict the outcome (True or False) before you execute the cells below.

In [None]:
-5 != False != True    # Same as (-5 != False) and (False != True)

In [None]:
1 < 2 < 3              # Same as (1 < 2) and (2 < 3)
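
A common use of chained comparisons is a range check. The following sketch (with an arbitrary value for `x`) shows that the chained form gives the same result as the two comparisons joined with `and`.

```python
x = 5

# A chained comparison ...
chained = 1 < x < 10

# ... gives the same result as two comparisons combined with `and`.
explicit = (1 < x) and (x < 10)

print(chained, explicit)   # True True
```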

If you want to write a text string in Python, you use single or double quotation marks (both are acceptable). This will leave the text exactly as you wrote it. Try to write your name or a sentence in the last cell.

In [None]:
"alpha"

In [None]:
'beta'

Every value in Python has a data type. The primitive types we have seen so far are integers, floats, strings and booleans. In data science, you will often need to change the type of your data so that it becomes easier to use and work with. For type conversion, the functions `int`, `float`, `bool` and `str` are your friends.

In [None]:
int("2")

In [None]:
float(5)

In [None]:
bool(0)

In [None]:
str(15.3)
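
To see why conversion matters, here is a minimal sketch (the values are arbitrary). The same `+` operator behaves differently on strings and on numbers, so converting first changes the result.

```python
# On strings, + glues the text together.
as_text = "2" + "3"
print(as_text)                  # 23

# After converting to integers, + adds the numbers.
as_numbers = int("2") + int("3")
print(as_numbers)               # 5

# int() on a float drops the decimals instead of rounding.
print(int(3.9))                 # 3

# float() can read a decimal number from text.
print(float("1.5") * 2)         # 3.0
```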

***Add your answers here***

*(double-click here to edit the cell)*

***Question I: What is the difference between the two operations / and //?***


***Question II: What would be the outcome of bool(12)?***


***Question III: What would be the outcome of (2 + 3) * 4 != 2 + 3 * 4?***



## Collections and mutability

The most fundamental type of collection in Python is the *list*. It is an *ordered* collection of an arbitrary number of objects. 

In [None]:
[1, 2, 3, 4]

The elements of a list do not need to be of the same data type. Here we define a list (`mylist`). The list has four objects of four different data types. Do not expect anything to happen if you run the cell; for now we have just defined the list.

In [None]:
mylist = [False, 1, 2.0, "Three"]

You can access the elements of a list by indexing. However, be aware that the first object (element) does not have index 1; instead, the first element has index **ZERO**! Try to predict the outcome before you execute the two cells below.

In [None]:
mylist[0]

In [None]:
mylist[1]

Negative indices count back from the end of the list. Again, try to predict the outcome before you execute the cell below.

In [None]:
mylist[-1]

You can use the `len` function to get the number of elements in a list. The previous code could thus have been written like this.

In [None]:
mylist[len(mylist)-1]

You can also use the `len` function to get the length of the list, i.e. the number of objects in the list. 

In [None]:
len(mylist)

To *slice* a list (extract sublists), you can use a colon to separate the starting and ending index. Note that the ending index is exclusive, thus `0:4` contains the indices `0,1,2,3`.
Try to predict the outcome of the cell below before you execute it. Then, in the same cell, try to extract a sublist containing only the two objects `1` and `2.0`.

In [None]:
mylist[0:2]

Negative indices work here too, and if you omit an index, it defaults to the start or end, respectively.

In [None]:
mylist[:-1]

In [None]:
mylist[::]

An optional third "argument" gives the step. Adding a step of 2 to the previous cell gives us every second object in the entire list. If you do not specify a step, it is 1 by default.

In [None]:
mylist[::2]
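
The three parts of a slice can be combined freely. The sketch below uses a separate, made-up list called `letters` so that it does not give away the exercises above. It also shows that a negative step walks through the list backwards.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1:4])     # ['b', 'c', 'd']
print(letters[::2])     # ['a', 'c', 'e']
print(letters[::-1])    # ['e', 'd', 'c', 'b', 'a']

# Slicing builds a new list; the original is left untouched.
print(letters)          # ['a', 'b', 'c', 'd', 'e']
```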

When you define a list like mylist, it is saved in the notebook memory. 
Lists are **mutable**. See the following code.

In [None]:
a = [0, 1, 2]
b = a
a[0] = 'changed!'
b[0]

What happens here is that `a` and `b` refer to **the same list in memory**, so when the first element of the list (`a[0]`) is changed, the change is also visible through `b` (`b[0]`). In other words, changes made via the name `a` are also reflected under the name `b`. This is sometimes what you want, and sometimes not. If it's not what you want, consider making a *copy* of the list instead of having two names referring to the same list. To make a copy, use the `list` function as illustrated below. Then you can change any object in list `a` without changing anything in list `b` (the first object in list `b` will still be the integer 0).

In [None]:
a = [0, 1, 2]
b = list(a)
a[0] = 'changed!'
b[0]

Another way of having a collection of elements in Python is to store them in a tuple. The major difference between tuples and lists is that a list is mutable, whereas a tuple is an immutable ordered collection like (0, 1, 2). This means that a list can be changed, but a tuple cannot. Note that it's the *commas* that make the tuple, not the parentheses.

In [None]:
0, 1, 2
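
The following sketch illustrates both points. The `try`/`except` construct is not covered in this notebook; it is used here only so that the cell reports the error instead of stopping. The names `point` and `single` are arbitrary.

```python
point = (0, 1, 2)

assignment_failed = False
try:
    point[0] = 'changed!'        # tuples do not support item assignment
except TypeError as error:
    assignment_failed = True
    print('TypeError:', error)

# A trailing comma is enough to make a one-element tuple.
single = 1,
print(type(single), len(single))
```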

Also note that the protection against mutations only extends as far as the elements of the tuple. For example:

In [None]:
a = ([0], 1, 2)
b = a
a[0][0] = 'changed!'
b[0][0]

However, the same thing would happen if you made a copy (b=list(a)), since the copy is only "one level deep."

In [None]:
a = [[0], 1, 2]
b = list(a)
a[0][0] = 'changed!'
b[0][0]
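
If you need a copy that is independent at every level, one option is `copy.deepcopy` from the standard library. This is a minimal sketch that reuses the example above.

```python
import copy

a = [[0], 1, 2]
b = copy.deepcopy(a)     # copies the outer list and the inner list

a[0][0] = 'changed!'
print(a[0][0])           # changed!
print(b[0][0])           # 0
```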

The third major type of collection we will look at is the *dictionary*. Dictionaries are key-value maps where the keys can be (almost) any type of object.

In [None]:
mydict = {'a': 1, 'b': 2, 'c': 3}
mydict['a']

Dictionaries, like lists, are mutable.

In [None]:
mydict['d'] = 4
mydict['d']
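
The "almost" above refers to the fact that keys must be *hashable*, which in practice means immutable objects such as numbers, strings and tuples. The sketch below uses a made-up dictionary called `grid`, and uses `try`/`except` only to report the error.

```python
# Tuples are immutable, so they work as keys.
grid = {(0, 0): 'origin', (1, 0): 'right'}
print(grid[(1, 0)])              # right

# Lists are mutable, so they are rejected as keys.
list_key_failed = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError as error:
    list_key_failed = True
    print('TypeError:', error)
```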

The final collection that you might find useful is the *set*. A set is an unordered collection of objects that ensures no duplicates are possible.

In [None]:
myset = {1, 2, 3, 2, 3}
myset
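
A typical use of this property is removing duplicates from a list. In this sketch the list `names` is made up for illustration. Because a set has no defined order, `sorted` is used to get a predictable result.

```python
names = ['a', 'b', 'a', 'c', 'b']

unique = set(names)              # duplicates disappear
print(len(names), len(unique))   # 5 3

# sorted() returns a list in a well-defined order.
print(sorted(unique))            # ['a', 'b', 'c']
```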

***Add your answers here***

*(double-click here to edit the cell)*

***Question I: List the four types of collections you can use in Python.***


***Question II: Is the following statement true? The first element in a list will have index 0. As a consequence mylist[2] will give you the third element of the list.***


***Question III: Which collection is mutable - a list or a tuple?***



## Working with collections

To check whether an object is in a collection, you can use the `in` operator. This is much faster on sets and dictionaries than on lists and tuples.

In [None]:
1 in [1, 2, 3]

In [None]:
4 in (1, 2, 3)

On dictionaries, the `in` operator checks whether the object is a *key*, not whether it is a value. This could be relevant if you want a function to do something only if the element is in a predefined collection.

In [None]:
'a' in {'a': 1}

In [None]:
1 in {'a': 1}

Instead of writing `not (x in y)` you can write `x not in y`. This could be relevant if you want a function to do something only if the element is not in the predefined collection. Thus,

In [None]:
'a' not in {'a': 1}

In [None]:
"b" not in {'a': 1}

You can convert between different types of collections using the `list`, `tuple`, `dict` and `set` functions. As discussed before, this is also useful to make copies of collections in the case you might want to change them. In the last cell, try to make a conversion of your own choice. 

In [None]:
list((1,2,3))

In [None]:
tuple({1, 2, 3})

In [None]:
dict([('a',1), ('b',2)])

In [None]:
set(dict([('a',1), ('b',2)]))

It is often easier to extract elements from a tuple or a list by *unpacking* it, rather than indexing. This is an elegant mechanism that allows for very nice code. Some examples:

In [None]:
a, b = (1, 2)
print(a, b)

In [None]:
a, b, *rest = (1, 2, 3, 4, 5)
print(a, b, rest)

In [None]:
a, b, *rest = (1, 2)
print(a, b, rest)
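
Two more patterns that build on unpacking, shown as a small sketch with arbitrary values: swapping two variables without a temporary one, and placing the starred name in the middle.

```python
a, b = 1, 2

# The right-hand side is packed into a tuple, then unpacked again.
a, b = b, a
print(a, b)                      # 2 1

# The starred name collects whatever is left over, wherever it stands.
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)       # 1 [2, 3, 4] 5
```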

***Add your answers here***

*(double-click here to edit the cell)*

***Question I: When could it be useful to change a collection of one type into another type?***


***Question II: Explain what `*rest` does.***



## Looping over collections

The Python `for`-loop runs the same code for each element (elt) in a collection. As such it is best compared to the `for each` loops in some other programming languages.

In [None]:
for elt in [1, 2, 3]:
    print(elt)

Note that a block of code in Python is determined by its indentation. Therefore there's a difference between this:

In [None]:
for elt in [1, 2, 3]:
    print(elt)
    print('Done!')

and this:

In [None]:
for elt in [1, 2, 3]:
    print(elt)
print('Done!')

Again, looping over a dictionary just gets you the keys.

In [None]:
for key in {'a': 1, 'b': 2}:
    print(key)

If you need both the keys and the values, use `.items()`, like this:

In [None]:
mydict = {'a': 1, 'b': 2}
for key, value in mydict.items():
    print(key, '=>', value)

**Note:** This is a special form of unpacking syntax. `mydict.items()` is a collection of tuples. This is equivalent:

In [None]:
mydict = {'a': 1, 'b': 2}
for item in mydict.items():
    key, value = item
    print(key, '=>', value)

## Branching

In Python, branching is achieved via `if`.

In [None]:
a = 2

if a == 2:
    print('a is 2')

An `if`-branch may have an arbitrary number of "else if" branches followed by an optional "else". Only one of these branches will be chosen. Try to change the value of 'a' and see what happens. 

In [None]:
a = 3

if a == 1:
    print('a is 1')
elif a == 2:
    print('a is 2')
elif a == 3:
    print('a is 3')
else:
    print("I don't know what a is")
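
The statement that only one branch is chosen also holds when several conditions are true: Python takes the first matching branch and skips the rest. A minimal sketch, with an arbitrary value for `a` and an illustrative variable `label`:

```python
a = 15

if a > 10:
    label = 'greater than 10'
elif a > 5:                      # also true for 15, but never reached
    label = 'greater than 5'
else:
    label = '5 or less'

print(label)                     # greater than 10
```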

***Add your answers here***

*(double-click here to edit the cell)*

***Question I: Explain in your own words what an if-branch is.***


## Functions

A function is a block of code which only runs when it is called. You can pass data into a function, and the function can then return a result; for example, you can build a function that returns the first element of a list. To define a function in Python, use the `def` keyword. The first function we will look at simply prints the word Hello! We call the function `say_hello()`.

We here define the function (no result will be given before we call the function):

In [None]:
def say_hello():
    print('Hello!')

When the function is defined, you can call the function like this.

In [None]:
say_hello()

Like you might expect, functions can take arguments. An argument could e.g. be a name. Study the function below. Notice that the printed comma is the one from 'Hello,'. 

In [None]:
def say_hello(name):
    print('Hello,', name)
    
say_hello('Bob')

They can also return values. This function will give you the first element (index 0, [0]) in the collection you specify. 

In [None]:
def get_first_element(collection):
    return collection[0]

This function now works with lists, tuples and strings.

In [None]:
get_first_element([5, 6, 7])

In [None]:
get_first_element((6, 7, 8))

In [None]:
get_first_element('abc')

Functions can take multiple arguments too.

In [None]:
def get_an_element(collection, index):
    return collection[index]

get_an_element('abcdef', 4)

When calling a function, you can give named arguments too (also called keyword arguments).

In [None]:
get_an_element('abcdef', index=4)

When doing so you can even change the order.

In [None]:
get_an_element(index=4, collection='abcdef')

However, don't try this: a positional argument may not follow a keyword argument, so Python raises a `SyntaxError`.

In [None]:
# get_an_element(index=4, 'abcdef')
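
If you want to see the error without breaking the notebook, one option is to hand the offending line to the built-in `compile` function as text. This only parses the code and never runs it. The sketch assumes nothing beyond the standard library; `source` and `is_valid` are illustrative names.

```python
source = "get_an_element(index=4, 'abcdef')"

try:
    compile(source, '<example>', 'eval')   # parse only, do not execute
    is_valid = True
except SyntaxError as error:
    is_valid = False
    print('SyntaxError:', error.msg)

print(is_valid)                            # False
```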

You can have arguments with default values, effectively making them optional.

In [None]:
def get_an_element(collection, index=0):
    return collection[index]

get_an_element('abcdef')

In [None]:
get_an_element('abcdef', 4)

It is customary to use named arguments when providing values for optional parameters, and to use positional arguments otherwise. However, this is merely custom. Here you see a list of possible calls: some are useful, others work but are not considered normal, and the last two do not work.

In [None]:
get_an_element('abcdef')                        # OK, index has its default value
get_an_element('abcdef', index=4)               # OK, override default value of index
get_an_element('abcdef', 4)                     # Works, not considered normal
get_an_element(collection='abcdef', index=4)    # Works, not considered normal
# get_an_element(collection='abcdef', 4)          # Illegal
# get_an_element(index=4, 'abcdef')               # Illegal, but also ambiguous

You can also write functions that take an arbitrary number of arguments. Here, the asterisk `*` is called the "splat" operator. Use it when you don't want to specify the number of possible arguments.

In [None]:
def print_all_args(*args):
    print(args)

print_all_args('a', 'b', 'c')

Note that `args` becomes a tuple containing all the arguments. You can also collect keyword arguments into a dictionary with the double-splat operator.

In [None]:
def print_all_args(*args, **kwargs):
    print(args, kwargs)
    
print_all_args('a', 'b', 'c', name='Eivind', place='Geilo')

A combination of actual arguments and splats also works "as expected", although it's not always obvious what is expected. :-)

In [None]:
def print_all_args(a, b, *args, c=1, **kwargs):
    print(a, b, c, args, kwargs)
    
print_all_args(1, 2, 3, 4, 5, c=6, d=7, e=8)
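
To make the sorting of arguments easier to follow, here is a sketch with a hypothetical function `describe_call` that has the same signature but returns a dictionary showing where each argument ended up. The call uses different values from the cell above.

```python
def describe_call(a, b, *args, c=1, **kwargs):
    # Return every parameter under its own name so the routing is visible.
    return {'a': a, 'b': b, 'args': args, 'c': c, 'kwargs': kwargs}

result = describe_call('x', 'y', 'z', 'w', colour='red')

print(result['a'], result['b'])  # x y  (the first two positional arguments)
print(result['args'])            # ('z', 'w')  (the remaining positional ones)
print(result['c'])               # 1  (not given, so the default is used)
print(result['kwargs'])          # {'colour': 'red'}  (unknown keyword arguments)
```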

Splatting also works the other way, for example, here's a function that sums three numbers:

In [None]:
def sum_three(a, b, c):
    return a + b + c

We can call it like this (`args[0]` refers to the first element of the list called `args`):

In [None]:
args = [5, 6, 7]
sum_three(args[0], args[1], args[2])

But this is much more elegant:

In [None]:
sum_three(*args)

You can mix splats and normal arguments.

In [None]:
sum_three(5, *[6, 7])

In [None]:
sum_three(*[5], 6, *[7])

A similar construction exists for keyword arguments, which requires a dictionary.

In [None]:
kwargs = {'a': 5, 'b': 6, 'c': 7}
sum_three(**kwargs)

Combinations of regular arguments, named arguments, splat arguments and double-splat keyword arguments all work, and should produce the expected results. If Python ever produces an error, you are probably just trying to do something that doesn't make sense.

***Add your answers here***

*(double-click here to edit the cell)*

***Question I: Explain in your own words what a function is.***

***Question II: Explain the difference between the single-star and double-star "splat" operators.***

***Question III: Explain the output of the cell with "print_all_args(1, 2, 3, 4, 5, c=6, d=7, e=8)".***


## Comprehensions and generators (advanced)

*Comprehensions* are very useful to make code cleaner and easier to read. Let us say we have a function that determines whether a number is a prime number. (This function is very inefficient, so don't "do this at home.") If there's anything in this function that is unclear, don't worry. We'll get to it.

In [None]:
import math

def is_prime(number):
    return number > 1 and all(number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))

Let us say we want to create a list of all primes up to 20. We might be tempted to write code like this. Note the use of the `range` function to loop over integers up to a maximum (like a traditional for-loop) and the `.append()` method for lists. The purpose of each line is explained in a comment (the text after `#`).

In [None]:
primes = []                         # Create an empty list of prime numbers
for num in range(20):               # range(20) is the collection 0, 1, 2, ..., 19
    if is_prime(num):               # Check whether it is a prime number using the function above
        primes.append(num)          # If so, add it to the list
primes                              # When you write primes, the generated list will be printed

While this works, a much more elegant solution is the following.

In [None]:
[num for num in range(20) if is_prime(num)]

This is called a *list comprehension*, and it's a thing of beauty. (Take a moment to reflect if you like.) The basic syntax looks like this:

`[<something> for <something> in <collection>]`

or like this:

`[<something> for <something> in <collection> if <condition>]`

Note that the condition is optional, therefore we can create a list of the numbers from 0 to 19 like this.

In [None]:
[num for num in range(20)]

Or, we could create a list of the *squares* of prime numbers like this:

In [None]:
[num**2 for num in range(20) if is_prime(num)]

You can use comprehensions to create sets too.

In [None]:
{num for num in range(20) if is_prime(num)}

Or even dictionaries. What do you think this does?

In [None]:
mydict = {num: is_prime(num) for num in range(20)}

You might think, then, that this creates a tuple:

In [None]:
something = (num for num in range(20) if is_prime(num))

However, this is a *generator*. A generator is a collection-like object that only creates output when requested. Therefore no primes have been computed yet. However when we loop over `something` (for example), primes appear.

In [None]:
for prime in something:
    print(prime)

If you try to loop over the same generator again, it won't work. They are one-use only.

In [None]:
for prime in something:
    print(prime)           # No output, `something` is empty
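
You can also pull values out of a generator one at a time with the built-in `next` function. This makes it visible that each value is produced only once. A minimal sketch with an arbitrary generator called `squares`:

```python
squares = (n**2 for n in range(3))

first = next(squares)            # computes only the first value
second = next(squares)           # computes the second value
print(first, second)             # 0 1

remaining = list(squares)        # consumes whatever is left
print(remaining)                 # [4]

# The generator is now exhausted, so nothing more comes out.
print(list(squares))             # []
```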

Looking back at the `is_prime` function again, we find this code:
    
    (number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))
    
This is a generator that runs over all candidate divisors of `number`. (The largest divisor we need to check is the square root of `number`, since any larger divisor would be paired with a smaller one. We add one because the upper end of a `range` is exclusive, and we convert to an `int` because `range` doesn't work on floating point numbers.)

It then checks whether `number` leaves a remainder of zero when divided by `divisor`, i.e. whether `divisor` is an *actual* divisor of `number`. It produces `False` if this is the case, or `True` if not.

A prime number is a number with no proper divisors. Therefore `number` is prime if *all* outputs of this generator are `True`. The function `all` checks this.

    all(number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))
    
Python allows you to drop one layer of parentheses if a generator is the only argument to a function, which lets us write

    all(x for x in ...)
    
instead of

    all((x for x in ...))
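
For comparison, the same logic can be written as an explicit loop. The following is a sketch of an equivalent function under the hypothetical name `is_prime_loop`; it is just as inefficient as the original, only more verbose.

```python
import math

def is_prime_loop(number):
    # Numbers up to and including 1 are not prime.
    if number <= 1:
        return False
    # Try every candidate divisor up to the square root.
    for divisor in range(2, int(math.sqrt(number) + 1)):
        if number % divisor == 0:
            return False         # found an actual divisor
    return True                  # no divisor found

print([n for n in range(20) if is_prime_loop(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```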

## Iterables and itertools (advanced)

In Python, an *iterable* is anything that can be iterated over, in other words anything that fits in a `for`-loop. Lists, tuples, dictionaries, sets and strings are all iterables, but we have seen others too: the return value of the `range` function is iterable, as are generators.

The Python ecosystem revolves heavily around iterables, and Python itself has a large number of tools to work with them, often leading to very elegant code. I will present some of these tools here.

**WARNING:** With very few exceptions, functions that return iterables return lazy objects that behave like *generators*. In other words, they don't produce elements unless those elements are consumed by something, such as a `for`-loop. The exceptions are the functions `list`, `tuple`, `dict`, and `set`, which accept an iterable as an argument and then consume it, returning the elements as a list, tuple, dictionary or set. Therefore, in the following, we will use `list(...)` to show the result of a piece of code. In regular code this would usually not be necessary.

The `map` function applies a function to each element of an iterable.

In [None]:
list(map(int, ['1', 2.0, 3.1]))

The `filter` function filters out the items of an iterable which fail a predicate test.

In [None]:
def has_length_two(s):
    return len(s) == 2

list(filter(has_length_two, ['a', 'abc', 'de', 'fg', 'hij']))

Note that both `map` and `filter` can be expressed with comprehension syntax, and that this sort of syntax is usually considered preferable among Pythonistas.
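
As a sketch of that equivalence, here are the two examples above rewritten as list comprehensions. The variable names are illustrative only.

```python
values = ['1', 2.0, 3.1]
words = ['a', 'abc', 'de', 'fg', 'hij']

# Same result as list(map(int, values))
mapped = [int(value) for value in values]
print(mapped)                    # [1, 2, 3]

# Same result as list(filter(has_length_two, words))
filtered = [word for word in words if len(word) == 2]
print(filtered)                  # ['de', 'fg']
```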

The `enumerate` function allows you to iterate over both the elements of a collection *and* their indices at the same time.

In [None]:
for index, value in enumerate('abcd'):
    print(index, '=>', value)

This is much more elegant than code such as this:

In [None]:
s = 'abcd'
for index in range(len(s)):
    print(index, '=>', s[index])

The `zip` function lets you iterate over multiple iterables simultaneously, like a zipper.

In [None]:
list(zip('abcd', 'zyxw'))

`zip` accepts an arbitrary number of iterables. They can even be of different length, and the total length of the iterable will be that of the shortest argument.

In [None]:
list(zip('abcd', 'zyx', 'abcdefghijkl'))
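
Two common uses of `zip`, sketched with made-up data: looping over two lists in parallel, and building a dictionary from a list of keys and a list of values.

```python
names = ['a', 'b', 'c']
values = [1, 2, 3]

# Loop over both lists at the same time, unpacking each pair.
for name, value in zip(names, values):
    print(name, '=>', value)

# dict() accepts an iterable of (key, value) pairs.
lookup = dict(zip(names, values))
print(lookup)                    # {'a': 1, 'b': 2, 'c': 3}
```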

The `itertools` module contains many more goodies. Let's try some of them by importing it.

In [None]:
import itertools as it

The `product` function creates a Cartesian product of several iterables.

In [None]:
list(it.product([0, 1], 'ab'))

The `combinations` function returns all subsets of a given size from a collection.

In [None]:
list(it.combinations('abcd', 2))

The `chain` function concatenates several iterables together.

In [None]:
list(it.chain('abc', range(3)))

The `repeat` function creates an infinite iterable that just outputs a single thing. (Don't try to do `list(it.repeat(...))` however, since it would never finish.)

In [None]:
it.repeat(3)   # => 3, 3, 3, ...

The `cycle` function creates an iterable that cycles through another iterable endlessly.

In [None]:
it.cycle('abc')    # => 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', ...

The `count` function creates an iterable that counts up from a given number.

In [None]:
it.count(0)     # => 0, 1, 2, 3, ...
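
Since `repeat`, `cycle` and `count` never end on their own, a safe way to look at their output is to take only the first few elements. One option is `itertools.islice`, which slices any iterable lazily. `repeat` also accepts an optional second argument that limits the number of repetitions.

```python
import itertools as it

# Take the first five numbers from an endless counter.
print(list(it.islice(it.count(0), 5)))       # [0, 1, 2, 3, 4]

# Take the first seven elements from an endless cycle.
print(list(it.islice(it.cycle('abc'), 7)))   # ['a', 'b', 'c', 'a', 'b', 'c', 'a']

# With a second argument, repeat stops by itself.
print(list(it.repeat(3, 4)))                 # [3, 3, 3, 3]
```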