![title](https://www.dropbox.com/scl/fi/4te3zawuuvrd8g2iv6vmk/Header_SGS.01101_fancy.png?rlkey=eub0vjjwjqitlmxpf73o9c919&dl=1)

# Lesson 3 - Python programming

## Introduction

### Scope

The aim of this lecture is to learn the basics of programming in `Python`. In particular, this notebook will demonstrate:

* Basics
* Data structures
* Indexing/indices
* Functions
* Control flow statements (loops and logic)



Python is arguably the most popular open-source language in science, thanks to its versatility. Like any language, Python has rules (its syntax). Because of its large user base, there are thousands of pre-made modules (already programmed by others) that make it easy to analyse various data formats and to apply various methods.



We recall the baseline elements of programming from the previous lecture:

* **input:** Get data from the keyboard, a file, or some other device. 
* **output:** Display data on the screen or send data to a file or other device. 
* **math:** Perform basic mathematical operations like addition and multiplication.
* **conditional execution:** Check for certain conditions and execute the appropriate code. 
* **repetition:** Perform some action repeatedly, usually with some variation.



The example from the previous lesson showed this function: `print('Hello World!')`.

In one version, you saw a prompt in which you entered the message that the `print()` function would `output`. That already covers two of the elements:
- `input`: We gave the `print()` function the input `'Hello World!'` (or whatever you typed into the prompt).
- `output`: We get the text output `Hello World!`

Less obvious are the things that happen invisibly inside the `print()` function. Printing the sentence `'Hello World!'` actually covers **four** elements from the list. The additional two are:
- `math`: in the background, Python takes the input, turns it into a string object, and works out how to display it on the screen 
- `conditional execution`: the print function checks if the input is valid 

*The `print()` function is actually written in a different programming language and many lines long. Great that someone else did that programming for us already  (╯°□°)╯*

***

## Basics
### Comments

In [1]:
# 2+2
# This is "only" a comment (in fact, it is very useful)
# A comment is created by using the "#" symbol. Everything after this symbol is not
# executed by Python.
# You will see no output from this cell ... that's how comments behave

### Print Function

The **`print()`** function can be used either directly with a string, or with a variable that represents a string:


In [2]:
# directly with a string
print("Hello World")

Hello World


In [3]:
# with a variable
string1="Hello World"
print(string1)

Hello World


***
### *Exercise 1*
Use the `print()` function to:

- Print the string `"My favorite number is:"` 
- On the next line, print your favorite number
- Use the `+` operator to combine the two (the sentence and the number) into one line, and then `print()` it
- `+` only works if both elements are strings (or both are numbers); you can use the `str()` function to transform something into a string.


In [4]:
# write your code here:
# - remember the operator to combine strings "+"
# - remember "+" does not work if one element is a number and the other one is a string
# - the "str()" function transforms any input into a string !



***

### Relational Operators

| Symbol | Task Performed |
|----|---|
| == | True if equal |
| != | True if not equal |
| < | True if less than |
| > | True if greater than |
| <= | True if less than or equal to |
| >= | True if greater than or equal to |

In [5]:
# What is the first and what is the second line of code doing?
z = 10
z == 1

False

In [6]:
5>10

False

In [7]:
z==1

False

In [8]:
z!=2

True

In [9]:
z>1

True

In [10]:
z<3

False

In [11]:
5>10

False

## Packages
Python is extremely popular because of its large community and their contributions in the form of `packages` or `libraries`. A package is a collection of `functions` and `definitions` for handling and processing data in a specific way. Basic ones include `numpy`, which provides ways to handle **matrices**, and `math`; both provide constants like **pi** and trigonometric functions like **cosine**. Packages must be installed once before using them (all packages we will use in this course are already installed), and loaded whenever we want to use them.

The cell below shows how a package is loaded. Once loaded, all the functions of that package are available to us. These functions can be accessed through the `namespace` and the `"."` symbol; in the example below, `numpy` is the namespace and `array()` is the function of that namespace.


In [12]:
# Loading the numpy package
import numpy

In [13]:
numpy.array([1,2,3])

array([1, 2, 3])

### Import "alias"

Programmers are lazy. Instead of writing a long package name (the `namespace`) over and over again, we can define a shorter alias for it. For the same example as above, we load the package under an alias and call the function through this alias.

In [14]:
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

### Namespace "."

A package usually contains several functions, definitions, and variables. The `"."` visually shows that they belong to a specific package. A package has a specific name, and within this space (hence `namespace`) we find its functions and values. Below you see some examples of how different values and functions from the `numpy` package are called.

In [15]:
np.pi

3.141592653589793

In [16]:
np.cos(3.141592)

np.float64(-0.9999999999997864)
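
The `math` package mentioned above offers the same kind of constants and functions under its own namespace. A minimal sketch (assuming only the standard `math` module and `numpy`) showing how the namespace tells Python which package a name comes from:

```python
import math
import numpy as np

# The same constant exists in two different namespaces
print(math.pi)            # pi from the math package
print(np.pi)              # pi from the numpy package
print(math.pi == np.pi)   # True: both hold the same value

# Trigonometric functions are namespaced too
print(math.cos(math.pi))  # -1.0
```

`math` works on single numbers, whereas `numpy` functions such as `np.cos()` also accept whole arrays.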

### Namespace ".", ".", ...

Some functions only become available for specific data structures (see below). For example, a function that inserts a number into a list only makes sense if we have a list! These functions are also accessed with a `"."`, this time directly after the variable name, which is why you may see several dots in one line of code. Here is an example with a list, where we count how many times the value `4` appears in it.

In [17]:
a = [4,5,6,4]
type(a)

list

In [18]:
a.count(4)

2
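
Two other list methods, `append()` and `insert()`, are good examples of functions that only make sense for a list. A short sketch with the same list `a` as above:

```python
a = [4, 5, 6, 4]

a.append(7)        # add 7 at the end                              -> [4, 5, 6, 4, 7]
a.insert(1, 99)    # insert 99 at index 1 (indexing is explained below) -> [4, 99, 5, 6, 4, 7]
print(a)
print(a.count(4))  # 2: the value 4 still appears twice
```

Note that these methods change the list itself (in place) instead of returning a new list.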

## Data Structures
In simple terms, a data structure is a collection or group of data organized in a particular way. An analogy with food: you store milk in a bottle and biscuits in a box. Some data structures are more suitable for a certain kind of data, just as a bottle suits milk better than a box does.

The main data structures in `Python` are:
- `lists`: an ordered collection of single data entries without any further structure --> `[]` or `list()`
- `array`: similar to lists, but holding data of one type (usually numbers); used to represent matrices; normally created with the `numpy` package --> `numpy.array([])` (1D), `numpy.array([[],[]])` (2D), ...
- `dictionaries`: data organized by specific `keys` --> `{"a":[1,2,3]}` (a list stored under the key `"a"`)
- `pandas.DataFrame`: like matrices/tables, with headers and the option to represent time series --> `pandas.DataFrame()`

`Python` is a real programming language and thus sometimes "picky": it requires specific data `structures` and `data types`. You might see errors related to wrong data types and structures. Often you can convert a structure or data type into a different one, e.g. 

- `list` to `array`: `numpy.array(my_list)`
- `float` to `string`: `str(my_number)`
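
A minimal sketch of a dictionary and of both conversions listed above; the variable names (`d`, `my_list`, `my_number`) are only for illustration:

```python
import numpy as np

# A dictionary stores values under keys; here the key "a" holds a list
d = {"a": [1, 2, 3]}
print(type(d))        # <class 'dict'>
print(d["a"])         # [1, 2, 3]: access by key, not by position

# list -> array
my_list = [1, 2, 3]
my_array = np.array(my_list)
print(type(my_array)) # <class 'numpy.ndarray'>

# float -> string, so that it can be combined with other text using +
my_number = 3.5
my_string = str(my_number)
print("The value is " + my_string)  # The value is 3.5
```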
  

### Lists
Lists are the simplest data structure in `Python`. They are created with square brackets `[]`, which is equivalent to calling `list()`. You can use the `type()` function to find out the `data structure` or `data type` of a variable.


Values are included simply by separating them with a comma,
e.g.:`[1,2,3,9,10]`

In [19]:
var1 = [1,2,4]
type(var1)

list

In [20]:
# You can check (True/False) whether a variable is of a certain data structure (list, array, dict, ...),
# like so:

isinstance(var1, list)  
# 1. argument is the variable, 2. argument is which structure 

True

### Matrix (Arrays)
<div>
    <img src="https://drive.switch.ch/index.php/s/Htjt9X9IBDY8lVo/download" width="200">
</div>

A `matrix` has 2 dimensions: rows and columns. For any number of dimensions (e.g. a cube has 3), we speak more generally of an `array`, so a matrix is a special case of an array.

Arrays can have any number of dimensions (1D, 2D, 3D, 4D, ...). For arrays we usually use the `numpy` package, which is very efficient for calculations. Because of that, `numpy` is used inside many other packages, such as `pandas`, which we will mostly use in this course.

In [21]:
import numpy as np

In [22]:
# 1D
a = np.array([1,2])
a

array([1, 2])

In [23]:
# 2D
b = np.array([[1,2],[3,4]])
b

array([[1, 2],
       [3, 4]])

Now that our two variables `a` and `b` are `numpy` arrays, we have access to the functions and properties of these **objects**. For example, we can look at the dimensions of an array using its property `shape`.

In [24]:
b.shape
# 2 rows, 2 columns

(2, 2)
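
Since arrays can have any number of dimensions, the same `shape` property works for a 3D array as well. A small sketch using `np.zeros()` to create an array filled with zeros (the sizes are arbitrary):

```python
import numpy as np

# 3D array: 2 "layers", each with 3 rows and 4 columns
c = np.zeros((2, 3, 4))
print(c.shape)  # (2, 3, 4)
print(c.ndim)   # 3: the number of dimensions
print(c.size)   # 24: the total number of values (2 * 3 * 4)
```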

### Pandas DataFrame
`pandas` is a package that facilitates working with spreadsheet data. It combines reading functions (text, Excel, ...), organizing functions (for example for dates and times), selecting data with queries, and a rich set of plotting functions for visualization.

A `pd` (short for `pandas`) `DataFrame`: 
- can store `numeric` and `string` (text) data, with one data type per column
- is two-dimensional (rows and columns), like a `matrix`

In the cell below, the function `DataFrame()` is used to convert the 2D array `b` into a `DataFrame`.


In [25]:
import pandas as pd

In [26]:
df1 = pd.DataFrame(b)
df1

|   | 0 | 1 |
|---|---|---|
| **0** | 1 | 2 |
| **1** | 3 | 4 |


Note how the top row and the left-hand column are displayed in **bold**. These are the `DataFrame`'s `columns` and `index` labels.

These are properties of the `DataFrame` and so they are accessible with `".columns"` and `".index"`.

In [27]:
df1.columns

RangeIndex(start=0, stop=2, step=1)

Since `df1` is now a `pandas` object, many functions are available, also accessible using `"."`, e.g. the `mean` of each column with `".mean()"`. 


***Note: the functions are called with the `()`, and the properties without ~~`()`~~***. 

In [28]:
df1.mean()

0    2.0
1    3.0
dtype: float64
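
Unlike a `numpy` array, a `DataFrame` can hold text and numbers side by side, with one data type per column. A minimal sketch with made-up data (the column names `site`, `count` and `temperature` are hypothetical); `numeric_only=True` restricts `.mean()` to the numeric columns:

```python
import pandas as pd

# Hypothetical example data: one text column and two numeric columns
df_mixed = pd.DataFrame({
    "site": ["north", "south", "east"],
    "count": [3, 5, 10],
    "temperature": [12.5, 10.0, 13.5],
})
print(df_mixed.dtypes)                   # one data type per column
print(df_mixed.mean(numeric_only=True))  # count: 6.0, temperature: 12.0
```

Without `numeric_only=True`, recent `pandas` versions raise an error here, because a mean of text values cannot be computed.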

#### Pandas Columns and Index
When reading in data from spreadsheet files, you usually have column names. For time series, the date and time are often used as row names (the index). We can assign column and row names with the properties `.columns` and `.index`.


In [29]:
df1.columns = ['Column1', 'Column2']
df1

|   | Column1 | Column2 |
|---|---|---|
| **0** | 1 | 2 |
| **1** | 3 | 4 |


In [30]:
df1.index = ['Row A', 'Row B']
df1

|   | Column1 | Column2 |
|---|---|---|
| **Row A** | 1 | 2 |
| **Row B** | 3 | 4 |


<div id="Indexing"/>
    
## Indexing 


Indexing refers to finding or accessing the data **at certain positions** in your variables. There are slightly different ways of using indices, depending on the **data structure** (list, array, DataFrame).

For multidimensional data (e.g. array, DataFrame), we also need to provide more than one index (**row-index, column-index**).

<span style="color:red">
    
- In `Python`, indexing starts at **0**! So the first position has the index value 0.
- Indices are always written in square brackets: `[2]` is the 3rd value.
- To get all positions, we use the `:` symbol --> `[:]`
- To count backwards from the end, use a minus sign; the last position is `-1` --> `[-1]`
- A `:` can also be used to give a range, e.g. positions **1 to 3** with `[1:4]`. This might be the most confusing one, because we write `4`: the stop index is **not** included. Think of the time span "from 1 to 4 p.m.": it covers the hours starting at 1, 2 and 3, but not the hour starting at 4. The example below shows where the indices are located.




</span>

```
Example: [1:4]

Index:     0   1   2   3   4   5   6
           | a | b | c | d | e | f |
               ^           ^
            start=1      stop=4   --> selects b, c, d (positions 1, 2, 3)

```
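
The rules above can be tried out on a small list. A sketch with a hypothetical list `letters`:

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[0])         # a  (first position, index 0)
print(letters[-1])        # f  (last position)
print(letters[:])         # all positions
print(letters[1:4])       # ['b', 'c', 'd']: positions 1, 2, 3; stop index 4 is excluded
print(len(letters[1:4]))  # 3 = stop - start
```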

### List

In [40]:
# List
shopping_list = ['apple', 'orange', 'carrot','potato']
shopping_list

['apple', 'orange', 'carrot', 'potato']

### Array

In [32]:
m2 = np.array([[1,2,3],[4,5,6]])
m2

array([[1, 2, 3],
       [4, 5, 6]])

In [33]:
m2[1,2]

np.int64(6)

### DataFrame

`pandas` DataFrames use `.loc` and `.iloc` to access data via indices. `.loc` uses the row and column names (labels), whereas `.iloc` uses the integer positions (**i** for index). 

In [34]:
df2 = pd.DataFrame(m2)

# Assigning names to columns and rows
df2.columns = ['Column1', 'Column2', 'Column3']
df2.index = ['A', 'B']
df2

|   | Column1 | Column2 | Column3 |
|---|---|---|---|
| **A** | 1 | 2 | 3 |
| **B** | 4 | 5 | 6 |


#### Using '.loc[row_name, col_name]'
For a 2D DataFrame, `.loc` takes 2 arguments in the form `df.loc[row_name, col_name]`, where `row_name` is the row `name` and `col_name` the column `name`.

In [35]:
df2.loc["A","Column2"]

np.int64(2)

In [36]:
df2.loc["B","Column2"]

np.int64(5)

#### Using '.iloc[row_index, col_index]'
For a 2D DataFrame, `.iloc` takes 2 arguments in the form `df.iloc[i, j]`, where `i` is the row `index` (position) and `j` the column `index` (position).

In [37]:
df2.iloc[0,1]  # same as df2.loc['A','Column2']

np.int64(2)

In [38]:
df2.iloc[1,1]  # same as df2.loc["B","Column2"]

np.int64(5)

***
### *Exercise 2*

Use indexing to get the data stored in 
- the **2**nd position in the list `var1`
- the **3**rd position of the `shopping_list`
- the **1**st row, **3**rd column of the data frame `df2`

In [39]:
var1[?]


In [None]:
shopping_list[?]

In [None]:
df2.iloc[?,?]

***

<font color=blue> $\Rightarrow$ Indices can be used to **access** OR to **assign** data</font>

In [None]:
# assign a new value to position 1 of var1
var1[1] = "Eric"
var1

In [None]:
shopping_list[2] = 'Eric'
shopping_list

In [41]:
df2.iloc[0,2] = 'Eric'
df2

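*Note: recent `pandas` versions emit a `FutureWarning` at this step, because writing a string into an integer column silently changes the column's data type.*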


|   | Column1 | Column2 | Column3 |
|---|---|---|---|
| **A** | 1 | 2 | Eric |
| **B** | 4 | 5 | 6 |


***
## Indexing multiple items
With the above methods, we can only get **1** item at a time. Below you see how to get multiple items from the data structures in `Python`.

- You can use a **sequence** (of indices) for this, or a **start** and **end+1** index.
- Think of **slicing** as taking the values from `x` to `x+n`, as in `a[x:x+n]`
    - if `x` is for example `2`, and `n` is the number of values you want (e.g. `2`), this gives:
        - `a[2:2+2]` or `a[2:4]`

Remember where the indices are located ([Section "Indexing"](#Indexing)): the stop index must be one larger than the last position we want. Check out the examples below.


### List


In [48]:
var1 = [1,2,7,8]
var1

[1, 2, 7, 8]

In [50]:
var1[1:3] 

[2, 7]

In [51]:
shopping_list

['apple', 'orange', 'carrot', 'potato']

In [52]:
shopping_list[1:1+2]

['orange', 'carrot']

### Array

In [53]:
m2[0:3,1:2]  # rows 0:3 -> rows 0 to 2 (m2 only has rows 0 and 1, so both are returned); columns 1:2 -> column 1 only

array([[2],
       [5]])

### DataFrame

In [68]:
df2.iloc[0:2,1:3]

|   | Column2 | Column3 |
|---|---|---|
| **A** | 2 | Eric |
| **B** | 5 | 6 |


In [71]:
df2.iloc[0,:]

Column1       1
Column2       2
Column3    Eric
Name: A, dtype: object

<font color=blue>$\Rightarrow$A sequence of indices using an additional `list` </font><br>
***Note: Here we do not have a slice but two single indices (0 and 2)***

In [75]:
index_vector = [0,2]
df2.iloc[:, index_vector]

|   | Column1 | Column3 |
|---|---|---|
| **A** | 1 | Eric |
| **B** | 4 | 6 |


***
### *Exercise 3*
Below you see the DataFrame called `df3`. 
You can use the indices even more flexibly. Instead of writing `df3.iloc[:,:]`, we will now use:
- `df3.iloc[start:stop:increment, start:stop:increment]`

Below there is an example in which rows C and E are extracted. Try to reverse the order by choosing:
- a negative increment (e.g. `ii = -1`). You need to adjust start and stop accordingly (you start at the back and go towards the front!)
- How can you get a subset of `df3` with rows (F, E, D) and columns (6, 5, 4), both in reversed order?

In [127]:
data = np.arange(1, 43).reshape(6, 7, order='F')
df3 = pd.DataFrame(data,
                  columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 5', 'Column 6', 'Column 7'],
                  index=['A','B','C','D','E','F'])
df3

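|   | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 |
|---|---|---|---|---|---|---|---|
| **A** | 1 | 7 | 13 | 19 | 25 | 31 | 37 |
| **B** | 2 | 8 | 14 | 20 | 26 | 32 | 38 |
| **C** | 3 | 9 | 15 | 21 | 27 | 33 | 39 |
| **D** | 4 | 10 | 16 | 22 | 28 | 34 | 40 |
| **E** | 5 | 11 | 17 | 23 | 29 | 35 | 41 |
| **F** | 6 | 12 | 18 | 24 | 30 | 36 | 42 |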


In [133]:
i0 = 2  # start row index
i1 = 5  # end row index
ii = 2  # increment (e.g. jump 2 instead of 1 row)
# 2:5:2 -> from index 2 up to (not including) 5, in steps of 2: rows C (i=2) and E (i=4)
df3.iloc[i0:i1:ii,:]

|   | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 |
|---|---|---|---|---|---|---|---|
| **C** | 3 | 9 | 15 | 21 | 27 | 33 | 39 |
| **E** | 5 | 11 | 17 | 23 | 29 | 35 | 41 |


***

## Functions

We have already used several functions. You can identify them easily in `Python` because they always require **parentheses**:
- `print()` to print
- `pd.DataFrame()` to make a DataFrame
- ...

You have also seen already that we can provide `arguments` to functions:
- `pd.DataFrame(data=..., columns=...)`
- You can find out about the arguments in the `help()` description (see below).
- Alternatively, you can turn on "Contextual Help" in the menu: Help --> Contextual Help

In [140]:
# Try pressing [Shift]+[Tab] inside the parentheses of a function to see its help!
# help(pd.DataFrame)


### Defining a new function
Below are examples of self-made functions.

The simplest function:

In [145]:
def thesimplestfunction():
    # here could be an operation
    # but this is a very simple function that does absolutely nothing
    return 


### Using (calling) the function
We execute (or "call") the function the same way as we did with the other functions before:

In [146]:
thesimplestfunction()

<font color=blue>$\Rightarrow$But the real reasons to use our own functions are to do `math`, `conditional execution`, and `repetition`</font>

In [150]:
def greet(name):
    """Greet a person whose name is passed in as an argument."""
    text_string = "Hello, " + name + ". Good morning!"
    print(text_string)


In [151]:
greet('Nicolas')

Hello, Nicolas. Good morning!
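
`greet()` prints its text but gives nothing back. To do `math` with a function and keep the result, use `return`. A minimal sketch with a hypothetical function `circle_area()`:

```python
import math

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    area = math.pi * radius ** 2   # pi * r^2
    return area                    # hand the result back to the caller

a1 = circle_area(1)
a2 = circle_area(2)
print(a1)        # 3.141592653589793
print(a2 / a1)   # 4.0: doubling the radius quadruples the area
```

Because the result is returned rather than printed, it can be stored in a variable and used in further calculations.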


***
### *Exercise 4*
- Change the `greet` function into a function that prints your age (copy & paste is your friend!)

In [None]:
# write your code here


***

### Pandas - Some more useful functions 
As mentioned before, we will mainly work with `pandas.DataFrame`, which comes with many built-in functions.

Many of them are related to statistics:
- mean()
- max()
- min()
- std()
- median()

When you use e.g. `.mean()`, you receive the mean value **for each column**. Why is that? The `.mean()` function has an `argument` (additional information on how to execute the function) called `axis=`. The default is `axis=0`, which means the function works along the `index` (down the rows, the y-direction), producing one value per column.

Look in the example below and:
- Add the argument `axis=0` in the function (df3.mean(axis=0)). Does this change the result?
- Change the argument value of axis to 1.
- What happens if you change the value to `None`?

In [161]:
df3.mean()

Column 1     3.5
Column 2     9.5
Column 3    15.5
Column 4    21.5
Column 5    27.5
Column 6    33.5
Column 7    39.5
dtype: float64
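
The other statistics functions listed above work the same way, giving one value per column by default. A sketch with a small made-up DataFrame called `df_stats`:

```python
import pandas as pd

df_stats = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 60]})

print(df_stats.max())     # x: 3,   y: 60
print(df_stats.min())     # x: 1,   y: 10
print(df_stats.median())  # x: 2.0, y: 20.0
print(df_stats.std())     # x: 1.0, y: about 26.46 (sample standard deviation)
```

`pandas` computes the *sample* standard deviation by default (dividing by n-1, set by the argument `ddof=1`).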

***
## Control Flow Statements
Often we need to control how our program executes a task: to react to `conditional` information, or simply to prevent our program from crashing. 


- A navigation app will suggest the route ***`for`*** which the travel time will be the shortest

- The app should take a different route ***`if`*** there is a traffic jam on the shortest route, and use a different combination of possible paths ***`if`*** we are on foot, or ***`if`*** we are going by car. 

- And ***`while`*** we are driving, the app should still check if the jam persists or dissolves, so we can reevaluate if we should go back to the old route



![Gmap.png](https://www.dropbox.com/scl/fi/eyqvxydf118jyrhdz9b8l/01101_L3_gmap.jpg?rlkey=3yajszof5wy7t9kh8nyb448nu&dl=1)

*Figure 1: Control flow in navigation example. Depending on transport means, here by foot (left) or by car (right), and depending on traffic situation, a different route is the fastest.*

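Control flow statements are written similarly to a function definition: a keyword, a condition, a colon (`:`), and an indented block of code.
```
if value1 comparison_operator value2:
    # then some operation (indented by 4 spaces)
```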



### If and Else Statements
    
#### if
- check if something is True 
    - if it is True --> do it
    - if it is not True (False) --> don't do it
 

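Notice that the condition comes directly after `if` and ends with a colon (`:`), and what shall be done is written in the indented lines below it. Python uses this indentation (usually 4 spaces) instead of brackets to know which lines belong to the `if`.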

In [None]:
num = 3
if num <= 30:
    print("The value of the variable 'num' is smaller than or equal to 30 (True)")

In [None]:
# Nothing is printed because num > 30 is False
num = 3
if num > 30:
    print('Is this True? If it is printed, it is!')

In [None]:
if True:
    print("True is True")

In [None]:
# Note that 1 also counts as True and 0 as False
if 1:
    print("1 is True")
if 0:
    print("0 is False")  # never printed, because 0 counts as False

if not 0:
    print("The keyword 'not' reverses a condition: 'not False' equals True")

#### else: 
- if the `if` condition is False, another action can take place (an `else` is optional):
    - do something if and only if the `if` condition is False

In [None]:
# the else block runs whenever the preceding if condition is False
# else must directly follow the if block, at the same indentation level as the if
num = 4
if num == 1:
    print("num is 1")
else:  # <- the else goes here, followed by a colon
    print("num is not 1")
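
When there are more than two cases, as in the navigation example (on foot, or by car with or without a traffic jam), Python chains conditions with `elif` ("else if"). A minimal sketch; the function `choose_route()` and the route names are hypothetical:

```python
def choose_route(mode, traffic_jam):
    """Return a route name depending on the transport mode and traffic."""
    if mode == "foot":
        return "footpath"   # checked first
    elif traffic_jam:
        return "detour"     # only checked if mode is not "foot"
    else:
        return "main road"  # runs if all conditions above are False

print(choose_route("foot", True))   # footpath
print(choose_route("car", True))    # detour
print(choose_route("car", False))   # main road
```

Only the first block whose condition is True runs; the remaining conditions are skipped.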

### Or / Not / And

| Logical Operator | Name | Description |
| --- | --- | --- |
| **and** | AND | If both operands are True, the condition becomes True. |
| **or** | OR | If at least one of the two operands is True, the condition becomes True. |
| **not** | NOT | Reverses the logical value (not False becomes True, not True becomes False). |

In [None]:
num = 4
num > 0 and num > 15
# You can use parentheses to make it clearer:
# (num > 0) and (num > 15)

In [None]:
# both conditions are True, so the message is printed
if num > 0 and num < 15:
    print(str(num) + " is bigger than 0 and smaller than 15")
    

In [None]:
# "or" means "at least one of the two"
# num > 0 is True, num > 15 is False
# Since the first condition is True, the overall condition is True (is EITHER statement True?)
num = 4
num > 0 or num > 15
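
`and`, `or` and `not` combine single True/False values. When comparing whole `numpy` arrays (or `pandas` columns) element by element, the symbols `&` (and), `|` (or) and `~` (not) are used instead. Because these symbols bind more tightly than `<` and `>`, each comparison needs its own parentheses. A short sketch with a made-up array `values`:

```python
import numpy as np

values = np.array([-5, 4, 12, 20])

# Element-wise: one True/False per value
in_range = (values > 0) & (values < 15)  # -> False, True, True, False
outside = (values < 0) | (values > 15)   # -> True, False, False, True
print(in_range)
print(outside)
print(~in_range)          # not: -> True, False, False, True

# A True/False array can be used as an index to keep only matching values
print(values[in_range])   # -> 4 and 12
```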

### The "for" loop
A `for` loop iterates over a given sequence, such as a `list` or a `range` of numbers.
You will see that the structure is again similar to what we had before:

```
for value in sequence:
    # do something (with the value)
```

Here is an example:

In [None]:
# looping through a range: from 1 up to (but not including) 10, in steps of 2
sequence = range(1, 10, 2)
for i in sequence:
    print(i)

In [None]:
# looping through a list of numbers and printing the values
primes = [2, 3, 5, 7]
for prime in primes:
    print(prime)
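
Loops are often used to build up a result step by step ("repetition with some variation"). A short sketch that adds up the values of `primes` in a variable called `total` (a name chosen for illustration):

```python
primes = [2, 3, 5, 7]

total = 0                   # start value before the loop
for prime in primes:
    total = total + prime   # add the current value to the running total
print(total)                # 17
```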
    

In [None]:
# looping through a list of strings
for item in shopping_list:
    print(item)
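
The navigation example also mentioned a `while` loop: it repeats its block as long as a condition stays True. A minimal sketch with made-up numbers, where a hypothetical traffic jam shrinks by 2 km at each check:

```python
jam_length_km = 5   # hypothetical jam length
checks = 0

while jam_length_km > 0:          # keep checking while the jam persists
    jam_length_km = jam_length_km - 2
    checks = checks + 1
print(checks)                     # 3 (5 -> 3 -> 1 -> -1)
```

Make sure the condition eventually becomes False; otherwise a `while` loop runs forever.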

***
### *Exercise 5*
Write a program that counts the number of common elements in the lists 

`a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]` and 


`b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]`

In [None]:
# replace the ???
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
counted_in_both = ???
for i in ???:
    for j in ???:
        if ??? == ???:
            counted_in_both = counted_in_both + 1
counted_in_both

***