 1_python_basics Copyright (c) 2019 OERCompBiomed

# Python

Python is a dynamic general-purpose programming language, currently on its third major version, Python 3 (3.7 at the time of writing). It enjoys widespread adoption in the scientific community, and it is the *de facto* standard computational environment for data science and artificial intelligence, and partly also for computational biomedicine.

Excerpts from Tim J. Stevens: _The Python Programming Language_ in : A. Hofmann and S. Clokie: [Wilson and Walker's Principles and Techniques of Biochemistry and Molecular Biology](https://www.cambridge.org/core/books/wilson-and-walkers-principles-and-techniques-of-biochemistry-and-molecular-biology/2159004E019DDD87C0A97EE8DB72B79F), 8th edition, Cambridge University Press, 2018.


> ... A biologist will often turn to computer programming in situations where the amount or the complexity of data is too much to be sensibly handled by spreadsheets, and where no other, more specialized, software exists. Often only a relatively simple program needs to be written to get something useful from biological data, which would otherwise not be available.
>
> For biologists, the task of writing a computer program can sometimes seem like a significant barrier, but once the basic programming skills are learned then many possibilities are enabled. [Python](https://www.python.org) is one of the most popular programming languages and is becoming an increasingly attractive option for the biologist. It is a high-level, general-purpose language that is well supported and relatively easy to learn. Also, it has a large number of external modules, including many related to mathematics, science and biology. Python is easy to install and runs on almost all kinds of computer system. Presently, Python 3 is the _de facto_ standard.
>
> Even if you don't intend to use Python in the long-run or for all programming work [in some cases [R](https://www.r-project.org) could be an alternative], it nonetheless serves as a good starting point to learn some of the major principles of many modern computing languages.

# How to use this notebook:

This is a very detailed, and at times advanced, introduction to Python, and it is unrealistic to get through it in a single go. Rather, you are encouraged to focus on the primary learning outcomes (summary below), and return to the notebook whenever you need to refresh your skills for solving tasks later in the course.

# Learning outcomes, and summary of syntax

**Be able to read and construct code related to the core Python language concepts:**

### Primitive datatypes
  - **Integers**
  - **Floats**
  - **Strings**
  - **Booleans**
  
```python
# keywords:
int
float
str
bool
```

### Operators
  - `+`, `-`, `*`, `/`, `%` (mathematical operators)
  - `=` (assigning variables) 
  - `==`, `!=`, `<`, `>`, `<=`, `>=` (comparing values)
  - `and`, `or`, `not` (Boolean operators); `&`, `|` (bitwise operators; applied to two Booleans they behave like `and` and `or`)
  
### Collections
  - **Lists** (adding elements, slicing, list comprehension,  ...)
  - **Tuples**
  - **Dictionaries** (key-value maps, dictionary comprehension, ... )
  - **Sets**
  
```python
# make empty collections:
ls = list()
tp = tuple()
st = set()
dc = dict()

# make collections with some content:
ls = [1,2,3]
tp = (1,2,3)
st = {1,2,3}
dc = {'a':1, 'b':2, 'c':3}

```  

- **indexing**
`[]` (square brackets)

```python
ls = [2,5,10]
ls[0] # returns the "0th" (1st) element, i.e. 2
ls[2] # returns the 3rd element, i.e. 10

```
  
 
### Loops
  - **for**
  - **while**

```python
sm = 0
for i in [1,2,3]:
    sm = sm + i

condition = True
while condition:
    print('this code is repeated until condition is false')
    condition = False  # without this update the loop would never end
    
```

### Control flow
- `if`, `elif`, `else`

```python
time = 14  # hour of the day (0-23), example value
if time < 12:
    print('good morning')
elif time < 18:
    print('good afternoon')
else: 
    print('good evening')
    
    
```

### Functions
- `def`

```python
def function_name(input1, input2):
    return input2 + input1

# function call:
function_name(3, 5)
# returns 8
          
```

### Libraries

- Specialized collections of functionality
- Extends the available functions and classes

```python
import time # basic syntax
import numpy as np # alias
from matplotlib import pyplot as plt # import module, not whole library, then alias
```

This notebook serves as a whirlwind-type introduction to Python. If you already know some Python, feel free to browse down to the first point where you see something unfamiliar or interesting.

We are using the `Jupyter Notebook` - for a comprehensive introduction and tutorial see e.g. https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook and https://www.dataquest.io/blog/jupyter-notebook-tutorial and https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007007

**You are encouraged to experiment with all the code!** <br>

NOTE: The original notebook on GitHub might change over time, and we recommend that you make a copy of our notebooks before you edit them. For this you might adopt the naming convention my_<'name_of_notebook'>.ipynb, e.g. `my_01-begin-python-programming.ipynb`

To further practice your skills in Python check and register for https://practice.datacamp.com/p <br>(https://www.datacamp.com/onboarding/learn?technology=python)

## Primitive datatypes and operators

Numbers come in two varieties: integers and floating-point numbers.


In [1]:
3

3

In [2]:
1.2

1.2

Math works exactly like you would expect.

In [3]:
2 + 3

5

In [4]:
6 - 2

4

In [5]:
3 * 7

21

We use `/` for true division and `//` for integer division (floor division).

In [6]:
21 / 3    # The output is a floating point number, even though the division has no remainder

7.0

In [7]:
22 / 3

7.333333333333333

In [8]:
21 // 3

7

In [9]:
22 // 3

7

The modulo operator (remainder after division) is `%`, and exponentiation is denoted by `**`.

In [10]:
7 % 3

1

In [11]:
2**3

8
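
Floor division and modulo belong together: for integers `a` and nonzero `b`, the results always satisfy `a == (a // b) * b + a % b`. The sketch below (variable names are illustrative and not part of the original notebook) shows what this means for negative numbers, where `//` rounds down toward negative infinity rather than toward zero.

```python
# Floor division rounds down (toward negative infinity), not toward zero
a, b = -7, 2
quotient = a // b       # -4, because -3.5 is rounded down
remainder = a % b       # 1, the remainder takes the sign of the divisor

# The two results always recombine to the original number
recombined = quotient * b + remainder
print(quotient, remainder, recombined)

# divmod returns the quotient and the remainder in one call
print(divmod(7, 3))
```

If you expected `-3` from `-7 // 2`, note that `int(-7 / 2)` truncates toward zero instead.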

You can of course override operator precedence with parentheses.

In [12]:
1 + 3 * 2

7

In [13]:
(1 + 3) * 2

8

or, this one: **Can you solve 8÷2(2+2) = ?**  $\ldots$ going viral in August 2019 (see [Popular Mechanics](https://www.popularmechanics.com/science/math/a28569610/viral-math-problem-2019-solved), [Fox News](https://www.foxnews.com/tech/viral-math-problem-baffles-many-internet), and the [New York Times](https://www.nytimes.com/2019/08/06/science/math-equation-pemdas.html))

In [9]:
8/2*(2+2)

16.0

In [10]:
(8/2)*(2+2)

16.0

In [11]:
8/(2*(2+2))

1.0

## Booleans

The two boolean values are called `True` and `False` (note the capital letters). The boolean operators are `and`, `or` and `not`.

In [17]:
not True

False

In [18]:
not False

True

In [19]:
True and False

False

In [20]:
False or True

True

Booleans are named after the British mathematician George Boole. Python "understands" the two words `True` and `False`. If you want to know more about the outcome of combining the operators with the Booleans in Python, try searching for "truth table python".

Comparison operators look like they do in most other programming languages: `==` (equal value), `!=` (not equal value), `<` (less than), `>` (greater than), `<=` (less than or equal to), `>=` (greater than or equal to)

In [21]:
1 == 1

True

In [22]:
1 == 1.0

True

In [23]:
1 < 10

True

In [24]:
1 > 10

False

In [25]:
2 <= 2

True

In [26]:
2 >= 2

True

One notable feature of Python is that you can chain comparisons.

In [27]:
-5 != False != True    # Same as (-5 != False) and (False != True)

True

In [28]:
1 < 2 < 3              # Same as (1 < 2) and (2 < 3)

True
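
Chained comparisons are most often used for range checks. A minimal sketch, assuming a hypothetical `temperature` variable and arbitrary example limits that are not part of the original notebook:

```python
temperature = 37.2   # hypothetical measurement, arbitrary example value

# One chained comparison instead of two comparisons joined by `and`
in_range = 36.0 <= temperature <= 37.5
same_check = (36.0 <= temperature) and (temperature <= 37.5)

print(in_range, same_check)
```

Both expressions give the same answer; the chained form simply reads closer to the mathematical notation.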

Strings of text work as you might expect, too. Both double and single quotation marks are acceptable.

In [29]:
"alpha"

'alpha'

In [30]:
'beta'

'beta'

## Converting between types
For type conversion, the functions `int`, `float`, `bool` and `str` are your friends.

In [31]:
int("2")

2

In [32]:
float(5)

5.0

In [33]:
bool(0)

False

In [34]:
str(15.3)

'15.3'
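
Type conversion has a few edge cases that are worth trying once. The following sketch uses only built-in functions; the particular values are arbitrary examples chosen for illustration.

```python
print(int(2.9))        # float -> int drops the fractional part (no rounding)
print(int(-2.9))       # truncation goes toward zero
print(round(2.9))      # use round() if you want the nearest integer

print(bool(""))        # an empty string is falsy
print(bool("False"))   # any non-empty string is truthy, whatever it says

try:
    int("2.5")         # a string with a decimal point is not a valid integer
except ValueError as err:
    print("conversion failed:", err)

print(int(float("2.5")))   # converting in two steps works
```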

### Try to answer the below questions before you run the code. Does it check out?

Question I: What is the difference between the two operations `/` and `//`?

Question II: What would be the outcome of `bool(12)`?

Question III: What would be the outcome of `(2 + 3) * 4 != 2 + 3 * 4`?

### Assigning variables

You can store a value of any type into a variable using the = symbol.

In [15]:
a = 5
b = 5.0
c = "5"

#### Ex1.1. evaluate whether a, b and c are equal (pairwise). 

In [16]:
# %load solutions/ex1_1.py


In [17]:
print(type(a))
print(type(b))
print(type(c))

# you could also call everything in a single print statement using
# print(type(a), type(b), type(c))

<class 'int'>
<class 'float'>
<class 'str'>


# Collections

Python supports 4 basic types of collections: `list`, `tuple`, `set` and `dict`.

## Lists
The most fundamental collection in Python is the `list`. It is an *ordered* collection of an arbitrary number of objects. A list is the Python equivalent of an array, but is resizeable and can contain elements of different types.
- `list`
- example: `[1,3,5]`

## Tuples
Tuples are just like lists, but once a value has been stored, it cannot be changed (immutability).
- `tuple`
- example: `(1,3,5)`

## Sets
Sets are unique in that they cannot have redundancy - i.e. the same value cannot appear more than once. This data structure is underutilized in Python.
- `set`
- example: `{1,3,5}`

## Dictionaries
Dictionaries are collections of key-value pairs.
- `dict`
- example: `{'a':1, 'b':3, 'c':5}`

### We start working with lists

In [45]:
xs = [5, 6, 7]
print(xs)

[5, 6, 7]


## List indexing
You can access the elements of a list individually by calling its index using the syntax `list[number]`.

**NOTE:** Python starts counting at 0!


In [41]:
print(xs[0])    # Access the "0th" element in xs
print(xs[1])    # Access the element at index 1 (the second element)
print(xs[2])
print(xs[-1])   # the last element
print(xs[-2])   # the second to last element

5
6
7
7
6


In [29]:
# you can change the entry of the list at a given index
xs[0] = 999
xs

[999, 6, 7]

### Slicing
You can "chop" a list to access a range of values using the syntax `list[start:stop]`. The element at index `start` is included, the element at index `stop` is not.

In [88]:
xs = [20, 14, 8, 1, 0, 9, 13, 10, 0, 6]

xs[2:5]

[8, 1, 0]

Omitting the lower or upper boundary makes the slice extend to the start or the end of the list, respectively, using either `list[:stop]` or `list[start:]`.

In [91]:
xs[:5]

[20, 14, 8, 1, 0]

Slicing can take one more optional input: the step parameter. The general syntax is `list[start:stop:step]`.

In [93]:
xs[2:8:2]

[8, 0, 13]
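
A few more slicing patterns follow from the same `list[start:stop:step]` syntax. This is a small sketch reusing the list `xs` from above; the variable names on the left are made up for illustration.

```python
xs = [20, 14, 8, 1, 0, 9, 13, 10, 0, 6]

every_other = xs[::2]    # start and stop omitted, step of 2
reversed_xs = xs[::-1]   # a negative step walks the list backwards
last_three = xs[-3:]     # negative indices count from the end

print(every_other)
print(reversed_xs)
print(last_three)
print(len(xs))           # slicing returns new lists, xs itself is unchanged
```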

Lists can contain multiple data types inside

In [43]:
dog = ['Freddie', 9, True, 1.1, 2001, ['bone', 'little ball']]

Can you find out how to get the ‘bone’ element, which is located in a nested list, like the following? 

In [44]:
dog[-1][0]

'bone'

## Functions (on lists)

Python has built-in functions that let you perform a variety of operations. Functions are executed like `function(argument)`. Below are examples of some built-in functions.

In [136]:
print(xs)

[20, 14, 8, 1, 0, 9, 13, 10, 0, 6]


In [137]:
len(xs)

10

In [138]:
type(xs)

list

In [139]:
sum(xs)

81

In [146]:
min(xs)

0

In [148]:
max(xs)

20

In [48]:
# first let's make a long list. You can make a sequence as follows
long_list = list(range(20))
print(long_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


#### Ex1.2 print the first half of long_list.

**Hint:** your index must be of type `int`.

In [50]:
# %load solutions/ex1_2.py


## List methods
Another kind of function in Python is the *method*, which differs from a regular function in its syntax. Methods are built into the object, and are executed by writing the name of your object and a "." followed by the method name.

`object_name.method()`

In [51]:
# append to a list
xs = [0,1]
xs.append(5)
xs

[0, 1, 5]

In [35]:
# chain together two lists using .extend
xs = [5,6,7]
ys = [1,2,3]
xs.extend(ys)
print(xs)

[5, 6, 7, 1, 2, 3]


In [36]:
# remove the last element
print('before: ', xs)
xs.pop()
print('after: ', xs)

before:  [5, 6, 7, 1, 2, 3]
after:  [5, 6, 7, 1, 2]


In [37]:
# remove a specific value
print(xs)
xs.remove(7) # removes the first occurrence of 7
xs

[5, 6, 7, 1, 2]


[5, 6, 1, 2]

A list can even be a list of lists:

In [52]:
sample_matrix = [[1, 4, 9], [1, 8, 27], [1, 16, 81]]

In [53]:
sample_matrix[1][2]   # third item of the second list

27

**Tip**: Write `object_name.` and press tab to see all available methods

**Tip**: append `?` to a name to get its docstring (`??` additionally shows the source code when it is available)

In [None]:
xs.

In [48]:
xs.insert??

#### Ex1.3. Can you insert 999 after the first value in xs?

In [None]:
# %load solutions/ex1_3.py


## For loops

A loop lets you run through each element in an iterable, performing some code at each stop on the way.

In [57]:
for i in range(10):
    print('This is a repeat')

This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat
This is a repeat


In [59]:
xs = [0,4,10]

for i in xs:
    print(i, "is a number")
    print(i*2, "is twice of that")
    print()    # Prints a new line

0 is a number
0 is twice of that

4 is a number
8 is twice of that

10 is a number
20 is twice of that



You can name the variable whatever you'd like:

In [61]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

cat
dog
monkey


If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function:

In [62]:
for idx, animal in enumerate(animals):
    print('#', idx+1, ":", animal)

# 1 : cat
# 2 : dog
# 3 : monkey
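
The built-in `enumerate` also accepts a `start` argument, which avoids the manual `idx+1`. A short sketch reusing the `animals` list from above (the names `number` and `numbered` are illustrative):

```python
animals = ['cat', 'dog', 'monkey']

# start=1 makes the counter begin at 1, so no idx+1 is needed
for number, animal in enumerate(animals, start=1):
    print('#', number, ':', animal)

# enumerate produces (counter, element) pairs
numbered = list(enumerate(animals, start=1))
print(numbered)
```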


**List comprehensions**:
#### Ex1.4. Write a for loop which creates a list of the squares of 0-5. 
**Tip:** make an empty list, then append the values iteratively 

In [55]:
# %load solutions/ex1_4.py


[0, 1, 4, 9, 16, 25]

An easier way to achieve the same thing is with **list comprehensions**

In [56]:
squares = [x ** 2 for x in range(6)]
squares

[0, 1, 4, 9, 16, 25]

#### Ex1.5. Are you able to edit the code below to print a symmetrical Christmas tree?

In [59]:
n = 15
for i in range(n):
    print('*'*i)


*
**
***
****
*****
******
*******
********
*********
**********
***********
************
*************
**************


In [74]:
# %load solutions/ex1_5.py



### The if statement

Another fundamental building block in all of programming is the if statement. It allows you to execute code, only if some condition is satisfied.


In [107]:
num = 5
if num > 4: print(num, 'is bigger than four!')

5 is bigger than four!


If you want the program to do different things depending on whether the condition is true or false, you can add an else clause.

In [108]:
if num > 10:
    print(num, 'is bigger than ten!')
else:
    print(num, 'is smaller than ten.')

5 is smaller than ten.


Finally, if you want to test for multiple conditions before reaching the final else clause, you can use elif (else if).


In [104]:
if num > 10:
    print("gee, that's a large number")
elif num == 5:
    print("it's just five")
else:
    print("I don't know what it is")

it's just five


#### Ex1.6. Make a list of the numbers 0-9. Loop through the list and print "yes" if the number is greater than 4. Otherwise, print "no".


In [None]:
# %load solutions/ex1_6.py


List comprehensions can also contain conditions:

In [79]:
[print('yes') if x>4 else print('no') for x in range(10)]

no
no
no
no
no
yes
yes
yes
yes
yes


[None, None, None, None, None, None, None, None, None, None]

In [116]:
[x*3 for x in range(6) if x <= 3]

[0, 3, 6, 9]
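
The two cells above use a condition in two different positions, and the position changes the meaning. The sketch below puts them side by side; the names `kept` and `labels` are made up for illustration.

```python
numbers = range(10)

# Filter: an `if` at the end decides which elements are kept
kept = [x for x in numbers if x > 4]

# Conditional expression: `if ... else` in front decides what each
# element becomes; every element produces exactly one output value
labels = ['yes' if x > 4 else 'no' for x in numbers]

print(kept)
print(labels)
```

Building a list of strings and printing it afterwards also avoids the list of `None` values that appears when `print` is called inside the comprehension.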

#### Ex1.7. Make a list of the first 5 squares like above, but only if it is an even number.
**Tip:** use the modulo operator: `%` to determine evenness

In [83]:
# %load solutions/ex1_7.py


Lists are **mutable**. See the following code.

In [85]:
a = [0, 1, 2]
b = a
a[0] = 'changed!'
b[0]

'changed!'

This happens because after the line `b = a`, both `b` and `a` point to **the same list in memory**. Therefore, changes made via the name `a` are also reflected under the name `b`. This is sometimes what you want, and sometimes not. If it's not what you want, consider making a *copy* of the list. To do that, use the `list` function.

In [86]:
a = [0, 1, 2]
b = list(a)
a[0] = 'changed!'
b[0]

0
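
Whether two names refer to the same object can be checked with the `is` operator, while `==` compares the contents. A minimal sketch with illustrative names (`alias`, `clone`):

```python
a = [0, 1, 2]
alias = a           # a second name for the same list object
clone = list(a)     # a new list with the same elements

print(alias is a)   # True: same object
print(clone is a)   # False: a different object...
print(clone == a)   # True: ...with equal contents

a.append(3)
print(alias)        # the alias sees the change
print(clone)        # the copy does not
```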

## Tuples are immutable lists
Another type of collection is known as a tuple. In contrast to lists, tuples are immutable. We can make a tuple as follows.

In [107]:
tup = (0,1,2)
type(tup)

tuple

Note that it's the *commas* that make the tuple, not the parentheses.

In [56]:
0, 1, 2

(0, 1, 2)

Also note that the protection against mutations only extends as far as the elements of the tuple. For example:

In [57]:
a = ([0], 1, 2)
b = a
a[0][0] = 'changed!'
b[0][0]

'changed!'

However, the same thing would happen if you made a copy, since the copy is only "one level deep."

In [58]:
a = [[0], 1, 2]
b = list(a)
a[0][0] = 'changed!'
b[0][0]

'changed!'
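
When the nested objects should be copied as well, the standard library offers `copy.deepcopy`. A short sketch contrasting it with the one-level copy used above (the names `shallow` and `deep` are illustrative):

```python
import copy

a = [[0], 1, 2]
shallow = list(a)          # new outer list, but the inner list is shared
deep = copy.deepcopy(a)    # the inner list is copied as well

a[0][0] = 'changed!'
print(shallow[0][0])       # 'changed!'
print(deep[0][0])          # 0
```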

# Dictionaries

The third major type of collection we will look at is the *dictionary*. Dictionaries are key-value maps where the keys can be (almost) any type of object, as long as it is hashable (e.g. numbers, strings and tuples of these, but not lists). The values in a dictionary are accessed by key, not by index, and each key is used only once.

In [59]:
mydict = {'a': 1, 'b': 2, 'c': 3}    # Create a dictionary with some data
print(mydict['a'])                   # Get an entry from a dictionary
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

1
cute
True
wet
N/A
wet
N/A


The following example is a dictionary where the keys are strings (DNA base codes) and the values are numbers (nucleotide masses).

In [60]:
d = {"G":329.21, "C":289.18, "A":313.21, "T":314.19}
print(d['A'])      # 313.21 -  value associated with 'A'
print(len(d))      # 4 - number of key:value pairs
print(d.keys())    # Just keys 'G', 'C', 'A', 'T'
print(d.values())  # Just values 329.21, 289.18, 313.21, 314.19

313.21
4
dict_keys(['G', 'C', 'A', 'T'])
dict_values([329.21, 289.18, 313.21, 314.19])
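
A dictionary like this is typically used as a lookup table inside a loop. The sketch below reuses the mass values from the cell above; the sequence `seq` is made up, and simply summing per-base values is meant as dictionary practice, not as a proper molecular-weight calculation.

```python
masses = {"G": 329.21, "C": 289.18, "A": 313.21, "T": 314.19}
seq = "GATTACA"   # made-up example sequence

# Count how often each base occurs, using dict.get with a default of 0
counts = {}
for base in seq:
    counts[base] = counts.get(base, 0) + 1

# Look up the value for every base in the sequence and add them up
total = sum(masses[base] for base in seq)

print(counts)
print(round(total, 2))
```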


If a key is already present in the dictionary, then a simple assignment of the form `dict[key]=value` is used to change the value associated with that key. If the key was not already present, this kind of assignment will add a new `key:value` pair. Existing keys cannot be changed directly, but it is possible to remove a `key:value` pair using `del` and add the same value back again with a different key.

In [61]:
d = {"G":329.21, "C":289.18, "A":313.21, "T":314.19}
d['T'] = 304.19   # Change the value of an existing item
d['U'] = 291.08   # Add a new key:value pair
print(len(d))     # 5 - dict is larger  
del d['U']        # Delete a key and its value from the dictionary
d

5


{'G': 329.21, 'C': 289.18, 'A': 313.21, 'T': 304.19}

**Loops**: It is easy to iterate over the keys in a dictionary:

In [62]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


If you want access to keys and their corresponding values, use the `items` method:

In [63]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


**Dictionary comprehensions**: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

In [64]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square)  # Prints "{0: 0, 2: 4, 4: 16}"

{0: 0, 2: 4, 4: 16}


Dictionaries, like lists, are mutable.

In [65]:
mydict = {'a': 1, 'b': 2, 'c': 3}    # Create a dictionary with some data
mydict['d'] = 4
mydict

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Sets

The final collection that you might find useful is the *set*. A set is an unordered collection of distinct elements, i.e. a collection that cannot contain duplicates. As a simple example, consider the following:

In [66]:
myset = {1, 2, 3, 2, 3}   # Duplicates are eliminated
print(myset)
animals = {'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"

{1, 2, 3}
True
False
True
3
3
2
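
Sets also support the usual mathematical set operations through the operators `&`, `|` and `-`. A small sketch; the two sets `dna` and `rna` are examples introduced here and are not part of the original notebook.

```python
dna = {'G', 'C', 'A', 'T'}
rna = {'G', 'C', 'A', 'U'}

shared = dna & rna        # intersection: elements in both sets
either = dna | rna        # union: elements in at least one set
only_dna = dna - rna      # difference: in dna but not in rna

print(sorted(shared))     # sorted() returns a list in a predictable order
print(sorted(either))
print(sorted(only_dna))
```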


**Loops**: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:

In [67]:
animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints the three animals in arbitrary order, e.g. "#1: fish", "#2: cat", "#3: dog"

#1: fish
#2: cat
#3: dog


**Set comprehensions**: Like lists and dictionaries, we can easily construct sets using set comprehensions:

In [68]:
from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print(nums)  # Prints "{0, 1, 2, 3, 4, 5}"

{0, 1, 2, 3, 4, 5}


## Working with collections

To check whether an object is in a collection, you can use the `in` operator. This is much faster on sets and dictionaries than on lists and tuples.

In [70]:
1 in [1, 2, 3]

True

In [71]:
4 in (1, 2, 3)

False

If your collection is very long (hundreds or thousands of elements), like the Merriam Webster English dictionary, the `in` operator is extremely handy.

On dictionaries, the `in` operator checks whether the object is a *key*, not whether it is a value.

In [72]:
'a' in {'a': 1}

True

In [73]:
1 in {'a': 1}

False

Instead of writing `not (x in y)` you can write `x not in y`. Thus,

In [74]:
'a' not in {'a': 1}

False

In [75]:
1 not in {'a': 1}

True

You can convert between different types of collections using the `list`, `tuple`, `dict` and `set` functions. As discussed before, this is also useful to make copies of collections in the case you might want to change them.

In [76]:
list((1,2,3))

[1, 2, 3]

In [77]:
tuple({1, 2, 3})

(1, 2, 3)

In [78]:
dict([('a',1), ('b',2)])

{'a': 1, 'b': 2}

In [79]:
set(dict([('a',1), ('b',2)]))

{'a', 'b'}

It is often easier to extract elements from a tuple or a list by *unpacking* it, rather than indexing. This is an elegant mechanism that allows for very nice code. Some examples:

In [80]:
a, b = (1, 2)
print(a, b)

1 2


In [81]:
a, b, *rest = (1, 2, 3, 4, 5)
print(a, b, rest)

1 2 [3, 4, 5]


In [82]:
a, b, *rest = (1, 2)
print(a, b, rest)

1 2 []
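
Unpacking shows up in a few other everyday patterns. The following sketch uses made-up values (`points`, `label`) to illustrate swapping, unpacking inside a `for` loop, and a starred name in the middle.

```python
a, b = 1, 2
a, b = b, a              # swap two values without a temporary variable
print(a, b)

# Unpacking also works directly in the header of a for loop
points = [(0, 'origin'), (5, 'far')]
for x, label in points:
    print(x, label)

# The starred name does not have to come last
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)
```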


The collection types have a number of **built-in functions** (methods) that are accessed with the **dot syntax**. The functions available to a given collection are restricted to the characteristics of its type (e.g. sets do not have functions that refer to positional indices). Some examples are:

In [83]:
x = ['Mon', 'Tue', 'Wed'] # A list of strings
y = ['Fri', 'Sat', 'Sun'] # And another
print('x:', x)
print('y:', y)
x.append('Thu')  # Add a single new item to end
print("x.append('Thu'):\t", x)
x.extend(y)      # Extend with items from another collection
print('x.extend(y):\t', x)
x.sort()         # Sort contents alphabetically
print('x.sort():\t', x)
x.remove('Sun')  # Remove an item
print("x.remove('Sun'):\t", x)
print("x.index('Sat'):\t", x.index('Sat'))  # Positional index of an item
print('\n')
s = {'G', 'C', 'A', 'T'}   # A set with 4 strings
t = {'N', 'R', 'Y'}
print('s:', s)
print('t:', t)
s.add('U')       # Add a single item (if not present)
print("s.add('U'):\t", s)
s.update(t)      # Add any new items from another collection
print('s.update(t):\t', s)

x: ['Mon', 'Tue', 'Wed']
y: ['Fri', 'Sat', 'Sun']
x.append('Thu'):	 ['Mon', 'Tue', 'Wed', 'Thu']
x.extend(y):	 ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
x.sort():	 ['Fri', 'Mon', 'Sat', 'Sun', 'Thu', 'Tue', 'Wed']
x.remove('Sun'):	 ['Fri', 'Mon', 'Sat', 'Thu', 'Tue', 'Wed']
x.index('Sat'):	 2


s: {'G', 'T', 'A', 'C'}
t: {'R', 'Y', 'N'}
s.add('U'):	 {'T', 'A', 'C', 'G', 'U'}
s.update(t):	 {'Y', 'R', 'T', 'A', 'C', 'G', 'N', 'U'}


## Summarizing questions about collections

***Question I: What are the four basic collections you can use in Python?***

***Question II: Is the following statement true? The first element in a list will have index 0. As a consequence, `mylist[2]` will give you the third element of the list.***

***Question III: Which collection is mutable - a list or a tuple?***

***Question IV: You want to make a digital phone book. Which Python collection is best suited for that task?***

## More about loops

The Python `for`-loop runs the same code for each element in a collection. As such it is best compared to the `for each` loops in some other programming languages.

In [95]:
for elt in [1, 2, 3]:
    print(elt)

1
2
3


Note that a block of code in Python is determined by its indentation. Therefore there's a difference between this:

In [96]:
for elt in [1, 2, 3]:
    print(elt)
    print('Done!')

1
Done!
2
Done!
3
Done!


and this:

In [97]:
for elt in [1, 2, 3]:
    print(elt)
print('Done!')

1
2
3
Done!


Again, looping over a dictionary just gets you the keys.

In [98]:
for key in {'a': 1, 'b': 2}:
    print(key)

a
b


If you need both the keys and the values, use `.items()`, like this:

In [87]:
mydict = {'a': 1, 'b': 2}
for key, value in mydict.items():
    print(key, '=>', value)

a => 1
b => 2


**Note:** This is a special form of unpacking syntax. `mydict.items()` is a collection of tuples. This is equivalent:

In [88]:
mydict = {'a': 1, 'b': 2}
for item in mydict.items():
    key, value = item
    print(key, '=>', value)

a => 1
b => 2


##  Branching and control flow

Lines of Python code are generally executed in sequential order. There are situations, however, where we wish to deviate from this paradigm, e.g. repeat a section of code several times in a loop, or only execute a block of code under certain conditions.

In Python, conditional execution is achieved via the `if` statement.

In [1]:
a = 2

if a == 2:
    print('a is 2')

a is 2


An `if` statement may have an arbitrary number of `elif` ("else if") branches followed by an optional `else`. At most one of these branches is executed: the first one whose condition is true, or the `else` branch if none is.

In [2]:
a = 3

if a == 1:
    print('a is 1')
elif a == 2:
    print('a is 2')
elif a == 3:
    print('a is 3')
else:
    print("I don't know what a is")

a is 3


In [91]:
x = 3
if x < -1 or x > 1:    # Run the next indented lines only when true   
    x *= 2                
    print('Value was doubled')  

print('Value is:', x)  # Always executed, not in indented block

Value was doubled
Value is: 6


In [92]:
x = 3
print(x)
if x > 0:
    print('Positive')
elif x < 0:            # Checked if first condition was false
    print('Negative')
else:                  # If all fails
    print('Zero')

x = -5
print(x)
if x > 0:
    print('Positive')
elif x < 0:            # Checked if first condition was false
    print('Negative')
else:                  # If all fails
    print('Zero')

3
Positive
-5
Negative


**Repetitive loops** can be created with a `for` statement or a `while` statement, e.g.:

In [93]:
total = 0
data = [1,4,9,25,36]
for x in data:       # x is first 1, then 4, then 9 etc.
    print(x)         # Current value of x in this cycle
    total += x       # Add current value of x to total

1
4
9
25
36


It is often convenient to use the `enumerate()` function with a `for` loop. This allows the loop to iterate over both numbers for the items (usually the positional indices) and their actual values, e.g.:

In [94]:
text = 'AGCAGTAGACGAACAT'     # String of characters
for i, x in enumerate(text):  # Extract index and character value
    print(i, x)               # Print index and value for each cycle

0 A
1 G
2 C
3 A
4 G
5 T
6 A
7 G
8 A
9 C
10 G
11 A
12 A
13 C
14 A
15 T


A `while` loop repeats a block of code while a certain condition evaluates to be true, e.g.:

In [95]:
x = 1
while x < 1000:   # Repeat the indented block while this is true
    print(x) 
    x *= 2        # Double the value

print(x)  # 1024 - final value stopped the loop: not less than 1000

1
2
4
8
16
32
64
128
256
512
1024


Within loops of both kinds, `continue` skips the remainder of the current iteration and moves on to the next one, while `break` stops the loop entirely, e.g.:

In [96]:
t = 0
data = [3, -1, 2, -5, 999, 9, -2]
for x in data:
    print('x =', x)
    if x < 0:
        continue      # Skip the rest of this iteration, go to the next x
    elif x == 999:
        break         # Quit entirely
     
    t += x * x        # Otherwise do a calculation
    print('t =', t)

print('Final',t)

x = 3
t = 9
x = -1
x = 2
t = 13
x = -5
x = 999
Final 13


### `try:` and `except:`

A `try`/`except` block is used to catch and handle errors that occur while the program runs. The code in the `try` block is run, and if a problem occurs, an `except` block is run when it matches the kind of **error** (a type of Exception object) that was raised. In this way we can prevent the program from failing and handle the error sensibly. If we want, the original error can be re-raised using the `raise` statement, e.g.:

In [138]:
x = 1
y = 0
try:                 # Run the following block and check for failure
    w = x / y

except ZeroDivisionError as err:        # y was zero
    print('divided by zero, continuing') # warn, but otherwise ignore

except Exception:          # Other, unspecified problem
    raise                  # Re-raise the original error, do not continue

divided by zero, continuing
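
A common use of `try`/`except` is cleaning up input that may not be convertible. The sketch below wraps the conversion in a hypothetical helper called `to_float` (function definitions are covered in the Functions section); the `readings` list is made-up data.

```python
def to_float(text):
    """Return `text` as a float, or None if it cannot be converted."""
    try:
        value = float(text)
    except ValueError:
        # Catch only the error we expect; anything else still propagates
        return None
    else:
        # Runs only when the try block raised no exception
        return value

readings = ['3.5', 'n/a', '7']            # made-up input strings
cleaned = [to_float(r) for r in readings]
print(cleaned)
```

Catching the specific `ValueError` rather than a bare `except:` keeps unrelated bugs visible.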


# Functions

To define a function in Python, use the `def` keyword. Like this:

In [149]:
def say_hello():
    print('Hello!')

You can then call the function like this.

In [150]:
say_hello()

Hello!


Like you might expect, functions can take arguments, e.g.:

In [151]:
def say_hello(name):
    print('Hello,', name)
    
say_hello('Bob')

Hello, Bob


In [152]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))

negative
zero
positive


A function can return just about anything, for example the first element of the input:

In [3]:
def get_first_element(collection):
    return collection[0]

This function now works with lists, tuples and strings (in fact any indexable collection).

In [6]:
get_first_element([5, 6, 7])

5

In [7]:
get_first_element((6, 7, 8))

6

In [8]:
get_first_element('abc')

'a'

#### Ex1.8. Write a function which takes a list (or other iterable, like a string) as input, and tells whether it contains any duplicates (the same element more than once).

In [None]:
# %load solutions/ex1_8.py    

# there are many possible solutions...

In [32]:
# here you can test if it works

test1 = 'abd43ghj'
test2 = 'abcefg'

duplicates(test1)

No duplicates


## Rules for calling (applying) functions in python:

You can choose to pass arguments either by the name of the argument or by position, in the right order.

In [43]:
def get_an_element(collection, index):
    return collection[index]

get_an_element('abcdef', 4)

'e'

In [44]:
# by order
get_an_element('abcdef', 4)

'e'

In [45]:
# by name:
get_an_element(collection='abcdef', index=4)

'e'

When you call by name, the order can be arbitrary:

In [46]:
get_an_element(index=4, collection='abcdef')

'e'

When we pass an argument by name, we call it a keyword argument. A positional argument cannot follow a keyword argument, so the following won't work:

In [47]:
get_an_element(collection='abcdef', 4)

SyntaxError: positional argument follows keyword argument (<ipython-input-47-420d5bf658cd>, line 1)

But this will:

In [42]:
get_an_element('abcdef', index=4)

'e'
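A related rule is that each parameter can receive a value only once. The sketch below passes `index` both by position and by name, which raises a `TypeError` when the call is made (unlike the `SyntaxError` above, which is detected before any code runs). The function is redefined here so that the block is self-contained.

```python
def get_an_element(collection, index):
    return collection[index]

try:
    # 4 already fills `index`, so index=2 would be a second value for it
    get_an_element('abcdef', 4, index=2)
except TypeError as err:
    print('TypeError:', err)
```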

## Default arguments

When we construct a function, we can pass in a default value in the `def` statement. If we don't pass in that argument when calling the function, it will just use the default value:

In [48]:
def get_an_element(collection, index=0):
    return collection[index]

get_an_element('abcdef')

'a'

But we have the power to change that at any point:

In [49]:
get_an_element('abcdef', 4)

'e'

In other words, these arguments are *optional*:

In [113]:
get_an_element('abcdef')                        # OK, index has its default value
get_an_element('abcdef', index=4)               # OK, override default value of index
get_an_element('abcdef', 4)                     # Works, not considered normal
get_an_element(collection='abcdef', index=4)    # Works, not considered normal
# get_an_element(collection='abcdef', 4)          # Illegal
# get_an_element(index=4, 'abcdef')               # Illegal, but also ambiguous

'e'

You can also write functions that take an arbitrary number of arguments. Here, the asterisk `*` is called the "splat" operator.

In [51]:
def print_all_args(*args):
    print(args)

print_all_args('a', 'b', 'c')

('a', 'b', 'c')


Note that `args` becomes a tuple containing all the arguments. You can also collect keyword arguments into a dictionary with the double-splat operator.

In [52]:
def print_all_args(*args, **kwargs):
    print(args, kwargs)
    
print_all_args('a', 'b', 'c', name='Arvid', place='Seili')

('a', 'b', 'c') {'name': 'Arvid', 'place': 'Seili'}


A combination of actual arguments and splats also works "as expected", although it's not always obvious what is expected. :-)

In [53]:
def print_all_args(a, b, *args, c=1, **kwargs):
    print(a, b, c, args, kwargs)
    
print_all_args(1, 2, 3, 4, 5, c=6, d=7, e=8)

1 2 6 (3, 4, 5) {'d': 7, 'e': 8}
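One detail worth spelling out: because `c` is defined after `*args`, it can only be set by keyword. Any extra positional values are collected into `args` instead. A short sketch, reusing the signature above but returning the values rather than printing them (the name `collect_all_args` is hypothetical):

```python
def collect_all_args(a, b, *args, c=1, **kwargs):
    return a, b, c, args, kwargs

# The third positional value goes into args; c keeps its default
print(collect_all_args(1, 2, 3))        # (1, 2, 1, (3,), {})

# c changes only when it is passed by name
print(collect_all_args(1, 2, 3, c=6))   # (1, 2, 6, (3,), {})
```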


Splatting also works the other way. For example, here's a function that sums three numbers:

In [54]:
def sum_three(a, b, c):
    return a + b + c

We can call it like this:

In [55]:
args = [5, 6, 7]
sum_three(args[0], args[1], args[2])

18

But this is much more elegant:

In [56]:
sum_three(*args)

18

You can mix splats and normal arguments.

In [57]:
sum_three(5, *[6, 7])

18

In [58]:
sum_three(*[5], 6, *[7])

18

A similar construction exists for keyword arguments, which requires a dictionary.

In [122]:
kwargs = {'a': 5, 'b': 6, 'c': 7}
sum_three(**kwargs)

18

#### Exercise 9: Can you make a function `mult` that will return the product of any number of inputs?
NB: Python has a built-in function `sum` that does this for sums.

In [75]:
# %load solutions/ex1_9.py

def mult(*args):
    p=1
    for num in args:
        p*=num
    return p
    
mult(1, 3, 5, 7)

105
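An alternative sketch of the same idea uses `functools.reduce` together with `operator.mul`, both from the standard library. The name `mult_reduce` is hypothetical; the initial value `1` makes a call with no arguments return 1, just like the loop version.

```python
from functools import reduce
import operator

def mult_reduce(*args):
    # Fold the arguments together with multiplication, starting from 1
    return reduce(operator.mul, args, 1)

print(mult_reduce(1, 3, 5, 7))   # 105
```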

Combinations of regular arguments, named arguments, splat arguments and double-splat keyword arguments all work, and should produce the expected results. If Python ever produces an error, you are probably just trying to do something that doesn't make sense.

# A word on Python libraries

By default Python has many available functions and classes. Some of them are only accessible after importing specialized modules. Let's say you wish to do math operations. We use the `import` statement to make the math functions available:

In [171]:
import math

Now we can access the contents of `math` by writing `math.` followed by the name of the function or constant. For instance, Euler's number is:

In [176]:
math.e

2.718281828459045

In [178]:
math.sin(1)

0.8414709848078965
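As a brief sketch, the same module can also be brought in under an alias, or individual names can be imported from it directly. The alias `m` is an arbitrary choice made for illustration.

```python
import math as m             # import the module under a shorter alias
from math import sqrt, pi    # import selected names directly

print(m.sin(1))              # same function as math.sin
print(sqrt(16))              # 4.0
print(pi)
```

Importing names directly saves typing, but the module prefix makes it clearer where a function comes from.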

Far more extensive is the `numpy` library, which crucially lets you work with matrices and vectors. We usually import it using the alias `np`.

In [191]:
import numpy as np

In [192]:
np.e

2.718281828459045

In [194]:
mat = np.array([[0,1,2],[3,4,5],[6,7,8]])
mat

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [196]:
# get the transpose
mat.T

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

## Comprehensions and generators

We have already had a sneak peek at comprehensions, but here we explain them in more detail.

*Comprehensions* are very useful to make code cleaner and easier to read. Let us say we have a function that determines whether a number is a prime number. (This function is very inefficient, so don't "do this at home.") If there's anything in this function that is unclear, don't worry. We'll get to it.

In [164]:
import math

def is_prime(number):
    return number > 1 and all(number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))

Let us say we want to create a list of all primes up to 20. We might be tempted to write code like this. Note the use of the `range` function to loop over integers up to a maximum (like a traditional for-loop) and the `.append()` method for lists.

In [165]:
primes = []                         # Create an empty list of prime numbers
for num in range(20):               # range(20) is the collection 0, 1, 2, ..., 19
    if is_prime(num):               # Check whether it is a prime number
        primes.append(num)          # If so, add it to the list
primes

[2, 3, 5, 7, 11, 13, 17, 19]

While this works, a much more elegant solution is the following.

In [125]:
[num for num in range(20) if is_prime(num)]

[2, 3, 5, 7, 11, 13, 17, 19]

This is called a *list comprehension*, and it's a thing of beauty. (Take a moment to reflect if you like.) The basic syntax looks like this:

`[<something> for <something> in <collection>]`

or like this:

`[<something> for <something> in <collection> if <condition>]`

Note that the condition is optional, so we can create a list of the numbers from 0 to 19 like this.

In [126]:
[num for num in range(20)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Or, we could create a list of the *squares* of prime numbers like this:

In [127]:
[num**2 for num in range(20) if is_prime(num)]

[4, 9, 25, 49, 121, 169, 289, 361]

You can use comprehensions to create sets too.

In [128]:
{num for num in range(20) if is_prime(num)}

{2, 3, 5, 7, 11, 13, 17, 19}

Or even dictionaries. What do you think this does?

In [129]:
mydict = {num: is_prime(num) for num in range(20)}

You might think, then, that this creates a tuple:

In [130]:
something = (num for num in range(20) if is_prime(num))

However, this is a *generator*. A generator is a collection-like object that only creates output when requested. Therefore no primes have been computed yet. However when we loop over `something` (for example), primes appear.

In [131]:
for prime in something:
    print(prime)

2
3
5
7
11
13
17
19


If you try to loop over the same generator again, it produces nothing. Generators are single-use.

In [132]:
for prime in something:
    print(prime)           # No output, `something` is empty
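To make the lazy, single-use behaviour concrete, here is a small sketch that uses the built-in `next` function to request items one at a time. The generator `squares` is a made-up example.

```python
squares = (n**2 for n in range(3))   # nothing is computed yet

print(next(squares))    # 0
print(next(squares))    # 1
print(list(squares))    # [4], only the remaining item
print(list(squares))    # [], the generator is now exhausted

# To use the values more than once, store them in a list instead
squares_list = [n**2 for n in range(3)]
print(squares_list)     # [0, 1, 4]
print(squares_list)     # [0, 1, 4] again
```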

Looking back at the `is_prime` function again, we find this code:
    
    (number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))
    
This is a generator that runs over all candidate divisors of `number`. (It is enough to check candidates up to the square root of `number`: if `number` had a larger divisor, the matching cofactor would be smaller than the square root and would already have been found. We add one because the upper end of a `range` is exclusive, and we convert to an `int` because `range` doesn't work on floating point numbers.)

For each candidate it checks whether `number` leaves a remainder of zero when divided by `divisor`, i.e. whether `divisor` is an *actual* divisor of `number`. It produces `False` if this is the case, or `True` if not.

A prime number is a number greater than one with no divisors other than one and itself. Therefore `number` is prime if *all* outputs of this generator are `True`. The function `all` checks this.

    all(number % divisor != 0 for divisor in range(2, int(math.sqrt(number) + 1)))
    
Python allows you to drop one layer of parentheses if a generator is the only argument to a function, which lets us write

    all(x for x in ...)
    
instead of

    all((x for x in ...))
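For comparison, the same logic can be sketched as an explicit loop. The name `is_prime_loop` is hypothetical; it is meant to behave like `is_prime` above, with `all` replaced by an early `return False`.

```python
import math

def is_prime_loop(number):
    if number <= 1:                       # 0, 1 and negatives are not prime
        return False
    for divisor in range(2, int(math.sqrt(number) + 1)):
        if number % divisor == 0:         # found an actual divisor
            return False
    return True                           # no divisor found

print([n for n in range(20) if is_prime_loop(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```

Like `all`, the loop stops as soon as the first divisor is found, so neither version does more work than necessary.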

## Iterables and itertools

In Python, an *iterable* is anything that can be iterated over, in other words anything that fits in a `for`-loop. Lists, tuples, dictionaries, sets and strings are all iterables, but we have seen others too: the return value of the `range` function is iterable, as are generators.

The Python ecosystem revolves heavily around iterables, and Python itself has a large number of tools to work with them, often leading to very elegant code. I will present some of these tools here.

**WARNING:** With very few exceptions, functions that return iterables return lazy *iterators*, which behave like generators. In other words, they don't produce elements unless those elements are consumed by something, such as a `for`-loop. The exceptions are the functions `list`, `tuple`, `dict`, and `set`, which accept an iterable as an argument and consume it, returning the elements as a list, tuple, dictionary or set. Therefore, in the following, we will use `list(...)` to show the result of a piece of code. In regular code this would usually not be necessary.

The `map` function applies a function to each element of an iterable.

In [133]:
list(map(int, ['1', 2.0, 3.1]))

[1, 2, 3]

The `filter` function keeps only the items of an iterable that pass a predicate test, i.e. a function that returns `True` or `False`.

In [134]:
def has_length_two(s):
    return len(s) == 2

list(filter(has_length_two, ['a', 'abc', 'de', 'fg', 'hij']))

['de', 'fg']

Note that both `map` and `filter` can be expressed with comprehension syntax, and that this sort of syntax is usually considered preferable among Pythonistas.
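As a sketch, here are the two examples above rewritten as list comprehensions. The variable names `as_ints`, `words` and `length_two` are made up for illustration.

```python
# Equivalent of list(map(int, ['1', 2.0, 3.1]))
as_ints = [int(x) for x in ['1', 2.0, 3.1]]
print(as_ints)       # [1, 2, 3]

# Equivalent of list(filter(has_length_two, [...]))
words = ['a', 'abc', 'de', 'fg', 'hij']
length_two = [w for w in words if len(w) == 2]
print(length_two)    # ['de', 'fg']
```

Unlike `map` and `filter`, a list comprehension builds the whole list immediately. Use round brackets instead of square ones to get a lazy generator.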

The `enumerate` function allows you to iterate over both the elements of a collection *and* their indices at the same time.

In [135]:
for index, value in enumerate('abcd'):
    print(index, '=>', value)

0 => a
1 => b
2 => c
3 => d


This is much more elegant than code such as this:

In [136]:
s = 'abcd'
for index in range(len(s)):
    print(index, '=>', s[index])

0 => a
1 => b
2 => c
3 => d


The `zip` function lets you iterate over multiple iterables simultaneously, like a zipper.

In [137]:
list(zip('abcd', 'zyxw'))

[('a', 'z'), ('b', 'y'), ('c', 'x'), ('d', 'w')]

`zip` accepts an arbitrary number of iterables. They can even be of different lengths, in which case the result is as long as the shortest argument.

In [138]:
list(zip('abcd', 'zyx', 'abcdefghijkl'))

[('a', 'z', 'a'), ('b', 'y', 'b'), ('c', 'x', 'c')]
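If dropping the extra elements is not what you want, the standard library offers `itertools.zip_longest`, which pads the shorter iterables instead. A minimal sketch, where the fill value `'-'` is an arbitrary choice:

```python
from itertools import zip_longest

padded = list(zip_longest('abcd', 'zyx', fillvalue='-'))
print(padded)
# [('a', 'z'), ('b', 'y'), ('c', 'x'), ('d', '-')]
```

Without the `fillvalue` argument, missing positions are filled with `None`.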

The `itertools` module contains many more goodies. Let's try some of them by importing it.

In [139]:
import itertools as it

The `product` function creates a Cartesian product of several iterables.

In [140]:
list(it.product([0, 1], 'ab'))

[(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]

The `combinations` function returns all subsets of a given size from a collection.

In [141]:
list(it.combinations('abcd', 2))

[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

The `chain` function concatenates several iterables together.

In [142]:
list(it.chain('abc', range(3)))

['a', 'b', 'c', 0, 1, 2]

The `repeat` function creates an infinite iterable that just outputs a single thing over and over. (Don't try `list(it.repeat(...))`, however: it would try to build an infinitely long list.)

In [143]:
it.repeat(3)   # => 3, 3, 3, ...

repeat(3)

The `cycle` function creates an iterable that cycles through another iterable endlessly.

In [144]:
it.cycle('abc')    # => 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', ...

<itertools.cycle at 0x7f027c867910>

The `count` function creates an iterable that counts up from a given number.

In [145]:
it.count(0)     # => 0, 1, 2, 3, ...

count(0)
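To look at the first few elements of these infinite iterables safely, one option is `itertools.islice`, which stops after a given number of items. A minimal sketch:

```python
import itertools as it

print(list(it.islice(it.repeat(3), 4)))      # [3, 3, 3, 3]
print(list(it.islice(it.cycle('abc'), 7)))   # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
print(list(it.islice(it.count(10), 3)))      # [10, 11, 12]

# zip stops at the shortest input, so it pairs well with count
print(list(zip(it.count(1), 'abc')))         # [(1, 'a'), (2, 'b'), (3, 'c')]
```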

# Classes, objects and OOP

[Classes](https://docs.python.org/3/tutorial/classes.html) provide a means of bundling data and functionality together. Creating a new class creates a new type of object, allowing new instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by its class) for modifying its state.
The syntax for defining classes in Python is straightforward:

In [167]:
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

Hello, Fred
HELLO, FRED!


In [168]:
class Person():         # Next indented block is in the class definition

    def __init__(self, name, age):  # Values specified when object is made
        self.name = name              # Link input values to the object
        self.age = age
  
    def get_first_name(self):       # A second, custom function
        names = self.name.split()     # self refers to the run-time object
        return names[0]               # Give back first word

p1 = Person('Lisa Simpson', 8)    # Make object of Person class
p2 = Person('Bart Simpson', 10)   # Make another
print(p1.age, p2.age)             # Values linked to objects - 8, 10 
print(p1.get_first_name())        # Run a linked function - gives 'Lisa'

8 10
Lisa


# Summary

### Some fundamental datatypes:
- **ints** : `1, 2, -10, 999`
- **floats** : `1.2, 3.141...`
- **bools** : `True, False`
- **strings** : `'a', 'HELLO'`

You can store any object into a variable.

### Comparisons
- `==`
- `!=`
- `<`
- `>`
- `<=`
- `>=`

##### Boolean operators
- `and`
- `or`
- `not`


### Some collections
- **list** : `[1, 2, 3, 1]`
- **tuple** : `(1, 2, 3, 1)`
- **set** : `{1, 2, 3}`
- **dict** : `{'a':1, 'b':2, 'c':3}`

- **indexing** : `xs[i]`
- **slicing** : `xs[l:u]`

Lists are mutable, tuples are immutable, sets have no order and contain only unique elements, dicts store key-value pairs. Collections can also be collections of collections. 

##### Check if a collection contains some value `a`:
- `a in my_collection`


### Loops
- **For loops** : loop through the elements of an iterable (the number of repetitions is known in advance)
- **While loops** : repeat for as long as some condition is met (possibly indefinitely)

##### List comprehensions
- a more compact and more readable way of implementing (and storing values generated in) a loop

### Control flow
- **if**
- **else**
- **elif**

These let you set up conditions for whether or not to execute some code.
 

### Functions
- **def** keyword
- **return** keyword

Functions may or may not take arguments. The `return` keyword is not mandatory (if absent, the function returns `None`). You can store the return value of a function in a variable.

### Classes
- **methods:** functions tailored for that object.
- **attributes:** features of an object.

A class is a blueprint for making an object, with some predefined behaviour.

### Libraries
- `import library`
- `import library as lib`
- `from library import function/module`

- **numpy, pandas, matplotlib, sklearn...**

The space of libraries in Python is vast, and highly specialized. Make good use of them: don't reinvent the wheel. 


#### Other things to keep in mind:
- *Indentations are important: 4 spaces*
- *Python counts from 0*
- *Make use of the python debugger* 
- *Google is your friend*
- *Play around with complicated code to better understand what it does*
- *Make use of the official docs and the built-in help (in Jupyter, append `?` or `??` to a function or object)*


With a firm understanding of these building blocks, you can write fairly complex programs.

In [201]:
xs??