![title](https://www.dropbox.com/scl/fi/4te3zawuuvrd8g2iv6vmk/Header_SGS.01101_fancy.png?rlkey=eub0vjjwjqitlmxpf73o9c919&dl=1)

# Lesson 3 - Python programming

## Introduction

### Scope

The aim of this lecture is to learn the basics of programming in `Python`. In particular, this notebook will demonstrate:

* Basics
* Data structures
* Indexing/indices
* Functions
* Control flow statements (loops and logic)



Python is arguably one of the most popular open-source languages in science, largely due to its versatility. Like any language, Python has rules (its syntax). Thanks to its large user base, there are thousands of ready-made modules (already programmed by others) that make it easy to read various data formats and to apply a wide range of methods.



Recall the basic elements of programming from the previous lecture:

* **input:** Get data from the keyboard, a file, or some other device. 
* **output:** Display data on the screen or send data to a file or other device. 
* **math:** Perform basic mathematical operations like addition and multiplication.
* **conditional execution:** Check for certain conditions and execute the appropriate code. 
* **repetition:** Perform some action repeatedly, usually with some variation.



The example from the previous lesson showed this function: `print('Hello World!')`.

In one version, you saw a prompt in which you entered the message that the `print()` function would `output`. This already covers two of the elements:
- `input`: We gave the `print()` function the input `'Hello World!'` (or whatever you wrote in the prompt).
- `output`: We get the text output `Hello World!`

Less obvious are the things that happen invisibly inside the `print()` function. Printing the sentence `'Hello World!'` actually covers **four** elements from the list. The additional two are:
- `math`: in the background, Python converts the input into a string object and determines how to display it on the screen
- `conditional execution`: the `print()` function checks whether the input is valid
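Here is a small illustration of the hidden conversion step. `print()` turns non-string input into text using the same conversion as `str()`, so printing a number and printing its string version look identical on screen. The variable names `value` and `text` are only examples:

```python
value = 3.5           # a number (float), not a string
text = str(value)     # explicit conversion to the string "3.5"

print(value)          # print() converts the number to text by itself
print(text)           # prints the already converted string
print(type(value), type(text))  # the two variables still have different types
```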

*The `print()` function is actually written in a different programming language (in the standard CPython interpreter, it is written in C) and is many lines long. Good thing someone else already did that programming for us  (╯°□°)╯*

***

## Basics
### Comments

In [1]:
# 2+2
# This is "only" a comment (in fact, it is very useful)
# A comment is created by using the "#" symbol. Everything after this symbol is not
# executed by Python.
# You will see no output from this cell ... that's how comments behave
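Comments can also follow code on the same line. Python executes everything before the `#` and ignores the rest of that line. A minimal sketch (the variable name `total` is arbitrary):

```python
total = 2 + 2  # only "total = 2 + 2" is executed, this part is a comment
# total = 100   <- this whole line is ignored
print(total)
```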

### Print Function

The **`print()`** function can be used either directly with a string, or with a variable that represents a string:


In [2]:
# directly with a string
print("Hello World")

Hello World


In [3]:
# with a variable
string1="Hello World"
print(string1)

Hello World


***
### *Exercise 1*
Use the `print()` function to:

- Print the string `"My favorite number is:"` 
- On the next line, print your favorite number
- Use the `+` operator to combine the two (the sentence and the number) in one line, and then `print()` it
- The `+` only works if both elements are strings (or both are numbers); you can use the `str()` function to transform something into a string.


In [None]:
# write your code here:
# - remember the operator to combine strings "+"
# - remember "+" does not work if one element is a number and the other one is a string
# - the "str()" function transforms any input into a string !



***

### Relational Operators

| Symbol | Task Performed |
|----|---|
| == | True if equal to |
| != | True if not equal to |
| < | True if less than |
| > | True if greater than |
| <= | True if less than or equal to |
| >= | True if greater than or equal to |
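To try out every operator from the table at once, here is a short sketch that compares two example numbers. The variables `x` and `y` are arbitrary values chosen for illustration. Each comparison produces a Boolean (`True` or `False`), and that result can also be stored in a variable:

```python
x = 3
y = 7

print(x == y)   # False: 3 is not equal to 7
print(x != y)   # True
print(x < y)    # True
print(x > y)    # False
print(x <= 3)   # True: 3 is equal to 3
print(y >= 10)  # False: 7 is smaller than 10

is_smaller = x < y       # the result of a comparison can be stored in a variable
print(type(is_smaller))  # <class 'bool'>
```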

In [4]:
# What is the first and what is the second line of code doing?
z = 10
z == 1

False

In [5]:
5>10

False

In [6]:
z==1

False

In [7]:
z!=2

True

In [8]:
z>1

True

In [9]:
z<3

False


## Data Structures
In simple terms, a data structure is a collection of data organized in a particular way. As an analogy with food: you store milk in a bottle and biscuits in a box. Likewise, some data structures are better suited than others to store a certain kind of data.

The main data structures in `Python` are:
- `lists`: an ordered collection of single data entries (of any type) without any further structure --> `[]` or `list()`
- `array`: similar to lists, but all elements share one data type (usually numeric); used to represent matrices; normally created with the `numpy` package --> `numpy.array([])` (1D), `numpy.array([[],[]])` (2D), ...
- `dictionaries`: data organized by specific `keys` --> `{"a":[1,2,3]}` (a list stored under the key `"a"`)
- `pandas.DataFrame`: like matrices/tables with headers, with the option to represent time series --> `pandas.DataFrame()`
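Dictionaries are not shown elsewhere in this lesson, so here is a minimal sketch of one. Values are stored under keys and retrieved by key rather than by position. The keys `"a"`, `"b"` and `"c"` are arbitrary examples:

```python
my_dict = {"a": [1, 2, 3], "b": "some text"}

print(type(my_dict))    # <class 'dict'>
print(my_dict["a"])     # the list stored under the key "a"
print(my_dict.keys())   # all keys of the dictionary

my_dict["c"] = 42       # adding a new key-value pair
print(len(my_dict))     # the dictionary now has 3 entries
```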

`Python` is a real programming language and is therefore sometimes "picky", meaning it requires specific data `structures` and `data types`. You might see errors related to wrong data types and structures. Often you can convert the structure or data type into a different structure or type, e.g. 

- `list` to `array` (matrix): `numpy.array(my_list)`
- `float` to `string`: `str(my_number)`
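Here are a few more conversions between data types, using only built-in functions. The variable names are illustrative. Converting text back into a number only works if the text actually looks like a number:

```python
my_number = 3.5
my_string = str(my_number)      # float -> string: "3.5"
back_to_float = float("3.5")    # string -> float: 3.5
as_integer = int("42")          # string -> integer: 42
truncated = int(3.9)            # float -> integer: the decimals are dropped, giving 3
letters = list("abc")           # string -> list of characters: ['a', 'b', 'c']

print(my_string, back_to_float, as_integer, truncated, letters)
# int("hello") would raise a ValueError, because "hello" is not a number
```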
  

### Lists
Lists are the simplest data structure in `Python`. An empty list is created with the `[]` symbols, which is equivalent to writing `list()`. You can use the `type()` function to find out the `data structure` or `data type` of a variable.


Values are included simply by separating them with commas,
e.g.: `[1,2,3,9,10]`

In [19]:
var1 = [1,2,4]
type(var1)

list

Unlike vectors in some other languages (e.g. `R`), a Python list does **not** convert its elements to a common data type. What happens if we add a `string` (text) to the list?

In [20]:
var1 = [1,2,'u']
type(var1)
var1

[1, 2, 'u']

> Note that each element keeps its own data type: `1` and `2` are still integers, and only `'u'` is shown with quotation marks `' '` because it is a string.

In [None]:
# The isinstance() function checks (True/False) whether a variable is of a certain
# data structure or data type (list, dict, str, ...), like so:
isinstance(var1, list)

### Lists with mixed data types
Because a list can hold any kind of element, **a single list can contain different data types (numeric & string)!**

In [None]:
a = [1, 'bread']
a

In [None]:
type(a)

### Matrix (Arrays)
<div>
    <img src="https://drive.switch.ch/index.php/s/Htjt9X9IBDY8lVo/download" width="200">
</div>

A `matrix` is defined by 2 dimensions. If we go above two dimensions (e.g. a cube), we generally talk about an `array`. So a matrix is a special case of an array. 

The 2 dimensions are the rows and columns. With `numpy`, we can create a matrix from a flat sequence of values with the `reshape()` method, whose arguments are the number of rows and the number of columns.


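In [None]:
import numpy as np

# arrange the 6 values into 2 rows and 3 columns (numpy fills the matrix row by row)
m1 = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
type(m1)

In [None]:
m1

We can find out about the dimensions with the `shape` attribute (number of rows, number of columns).

In [None]:
m1.shape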
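For comparison, here is a sketch without `numpy`. A plain nested list (a list of rows) can also hold a 2 x 3 matrix, but it has no dimension attribute, so rows and columns have to be counted with `len()`. This is one reason to prefer `numpy` arrays, which provide a `shape` attribute, for matrices. The name `nested` is illustrative, and `nested[0]` (the first row) uses the indexing explained in the next section:

```python
nested = [[1, 2, 3],
          [4, 5, 6]]     # 2 rows, 3 columns, stored row by row

n_rows = len(nested)     # number of inner lists = number of rows
n_cols = len(nested[0])  # length of the first row = number of columns
print((n_rows, n_cols))  # (2, 3)
```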


### Data Frame
A `pandas.DataFrame` is one of the most comprehensive data structures in `Python`. 
- It can store `numeric` and `string` data types (each column can have its own type)
- It has two dimensions (rows and columns), like a `matrix`

Data frames are very useful to work with tables that contain a mix of data. They are created with the `pandas.DataFrame()` function.

In [None]:
import pandas as pd

df1 = pd.DataFrame(m1)
type(df1)

In [None]:
df1

As you can see above, data frames are also displayed differently than matrices. They have column labels, stored in the `columns` attribute.

In [None]:
df1.columns

## Indexing
Indexing refers to finding or accessing the data **at certain positions** in your variables. The way indices are written differs slightly depending on the **data structure** (list, array, data frame).

For multidimensional data (e.g. 2D arrays, data frames), we also need to provide more than one index (row index, column index).

> In `Python`, indexing starts at **0** (in `R` and some other languages it starts at 1).
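A quick way to see Python's zero-based indexing in practice, using an example list `letters`: the first element sits at index 0, and the last one at index `len(letters) - 1`.

```python
letters = ['a', 'b', 'c', 'd']

first = letters[0]                # 'a': the 1st element has index 0
last = letters[len(letters) - 1]  # 'd': the 4th element has index 3
print(first, last)
# letters[4] would raise an IndexError, because index 4 would be a 5th element
```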

### List

In [None]:
# List
shopping_list = ['apple', 'orange', 'carrot', 'potato']
shopping_list

In [None]:
# Indexing starts at 0, so index 1 returns the 2nd element:
shopping_list[1]

# # When using a range of indices (a "slice"), we get the value back in the form of a new list:
# shopping_list[1:2]

### Matrix

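In [None]:
import numpy as np

m2 = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
m2

In [None]:
# row index 1 (2nd row), column index 1 (2nd column)
m2[1, 1]

### Data frame

In [None]:
import pandas as pd

df2 = pd.DataFrame(m2)
df2

In [None]:
# .iloc[] selects by position: row index 1, column index 1
df2.iloc[1, 1]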

***
### *Exercise 2*

Use indexing to get the data stored in 
- the **2**nd position in the list `var1`
- the **3**rd position of the `shopping_list`
- the **1**st row, **3**rd column of the data frame `df2` (use `.iloc[]`)

In [None]:
# write your code here:

***

<font color=blue> $\Rightarrow$ Indices can be used to **access** OR to **assign** data</font>

In [None]:
# This shows us the value at row index 1, column index 1
m2[1, 1]

In [None]:
# This writes a new value of 1000 at row index 1, column index 1
m2[1, 1] = 1000
m2

In [None]:
shopping_list[1] = 'chocolate'
shopping_list

***
## Indexing multiple items
With the above methods, we always get only **1** item. In the following you see how to get multiple items from the data structures in `Python`.

In general, you can use a **sequence** (of indices) for this, or a **start** and **end** index. In `Python`, such a range `start:end` (a "slice") includes the start index but **excludes** the end index.
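In Python, a slice `start:end` includes `start` and stops just before `end`. Here is a short check on an example list `numbers`. Leaving out the start or the end of a slice means "from the beginning" or "until the end":

```python
numbers = [10, 20, 30, 40, 50]

middle = numbers[1:3]     # [20, 30]: indices 1 and 2, index 3 is excluded
start = numbers[:2]       # [10, 20]: from the beginning up to index 2 (excluded)
end = numbers[2:]         # [30, 40, 50]: from index 2 until the end
print(middle, start, end)
print(len(numbers[1:4]))  # 3: the slice 1:4 contains 4 - 1 = 3 elements
```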

### List


In [None]:
var1 = [1, 2, 7, 8]
var1[0:3]  # the 0:3 means from index 0 up to (but not including) index 3

In [None]:
shopping_list[1:3]

### Matrix

In [None]:
m2[0:2, 1:3]  # the first part (0:2) means rows 0 to 1, and the second part (1:3) means columns 1 to 2

### Data frame

In [None]:
df2.iloc[0:2, 1:3]

> Note that the data frame output shows the row labels in this example (0, 1). The column labels (1, 2) show us that really only the 2nd and 3rd columns were selected!

### Tricks for indexing
<font color=blue>$\Rightarrow$In the case of `array` and `DataFrame` objects, we can also select **entire** rows or columns by using only a colon `:` as the index for this dimension:</font><br>


In [None]:
m2[:, 1:3]

<font color=blue>$\Rightarrow$A sequence of indices can be given either as an additional `list` </font><br>

In [None]:
index_list = [0, 2]
df2.iloc[:, index_list]

<font color=blue>$\Rightarrow$or created with the `range()` function</font><br>

In [None]:
list(range(5, 10, 2))

In [None]:
index_sequence = list(range(0, 2))
df2.iloc[:, index_sequence]
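In Python, sequences of integers are created with the built-in `range(start, stop, step)` function. Like slices, it stops *before* the `stop` value, and `start` and `step` are optional. A `range` object does not display its values directly, so here each one is wrapped in `list()`. The variable names are illustrative:

```python
from_zero = list(range(5))          # [0, 1, 2, 3, 4]: start defaults to 0
two_to_four = list(range(2, 5))     # [2, 3, 4]: 5 itself is excluded
step_two = list(range(5, 10, 2))    # [5, 7, 9]: every 2nd value
backwards = list(range(10, 0, -3))  # [10, 7, 4, 1]: a negative step counts down

print(from_zero, two_to_four, step_two, backwards)
```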

***
### *Exercise 3*
 
- Create a sequence that goes backwards from index 2 to index 1, i.e. the 3rd and the 2nd column (assign it to the variable `my_sequence`)
- What happens if you use this sequence to select the columns of the data frame `df2`?
- What happens if you use this sequence to assign a value?

In [None]:
# write your code here (first two tasks)
# hint: range(start, stop, step) stops before the stop value, and a negative step counts backwards
my_sequence = list(range(?, ?, ?))
my_sequence

In [None]:
df2.iloc[???]

In [None]:
# write your code here (third task)
df2.iloc[???] = ???
df2

***

## Functions

We have already used several functions. You can identify them easily in `Python` because they are always called with **parentheses**:
- `print()` to display output
- `pd.DataFrame()` to make a data frame
- `range()` to make a sequence
- ...

You have also seen already that we can provide `arguments` to functions:
- `range(start, stop, step)`
- You can find out about the arguments if you press [Shift]+[Tab] inside the parentheses (in a "Code Cell")

In [None]:
# Try pressing [Shift]+[Tab] inside the parentheses!
print()
range(0, 10, 2)

### Defining a new function
Below are examples of self-made functions.

The most simple function:

In [None]:
def thesimplestfunction():
    # here could be an operation
    # but this is a very simple function that does absolutely nothing
    pass  # "pass" is a placeholder statement that does nothing

### Using (calling) the function
We execute or "call" the function the same way we did with the other ones before:

In [None]:
thesimplestfunction()

<font color=blue>$\Rightarrow$But the real reasons to use our own functions are to do `math`, `conditional execution`, and `repetition`</font>

In [None]:
def greet(name):
    # This function greets a person, whose name is passed in as an argument
    text_string = "Hello, " + name + ". Good morning!"
    print(text_string)

In [None]:
greet('Nicolas')
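Unlike `greet`, which only prints its text, a function can hand a result back with `return`, so the result can be stored and reused. Here is a minimal sketch of a function that does `math`. The name `celsius_to_fahrenheit` is a hypothetical example, and it uses the formula F = C x 9/5 + 32:

```python
def celsius_to_fahrenheit(celsius):
    # Converts a temperature from degrees Celsius to degrees Fahrenheit
    fahrenheit = celsius * 9 / 5 + 32
    return fahrenheit

boiling = celsius_to_fahrenheit(100)  # the returned value is stored in a variable
print(boiling)                        # 212.0
print(celsius_to_fahrenheit(-40))     # -40.0
```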

***
### *Exercise 4*
- Change the `greet` function into a function that prints your age (copy & paste is your friend!)

In [None]:
# write your code here


***

### Some more useful functions 

In [None]:
var3 = [10, 40, 299, 3]
# Below you find several functions. Use the list "var3" as input to these functions and see what they output
#max()
#min()
#sum()
#sorted()
#len()

## Control Flow Statements
Often we need to control how our program executes a task, to include `conditional` information, or simply to prevent our model from crashing. 


- A navigation app will suggest the route ***`for`*** which the travel time will be the shortest

- The app should take a different route ***`if`*** there is a traffic jam on the shortest route, and use a different combination of possible paths ***`if`*** we are on foot, or ***`if`*** we are going by car. 

- And ***`while`*** we are driving, the app should still check if the jam persists or dissolves, so we can reevaluate if we should go back to the old route



![Gmap.png](https://www.dropbox.com/scl/fi/eyqvxydf118jyrhdz9b8l/01101_L3_gmap.jpg?rlkey=3yajszof5wy7t9kh8nyb448nu&dl=1)

*Figure 1: Control flow in navigation example. Depending on transport means, here by foot (left) or by car (right), and depending on traffic situation, a different route is the fastest.*

Control flow statements are written similarly to a function definition: a header line ending with a colon (`:`), followed by an indented block
```
if value1 conditional value2:
    # then some operation
```



### If and Else Statements
    
#### if
- check if something is True 
    - if it is True --> do it
    - if it is not True (False) --> don't do it
 

Notice that the condition is followed by a colon (`:`), and what shall be done is written in the indented block below it (by convention, indented by 4 spaces).

In [None]:
num = 3
if num <= 30:
    print("The value of the variable 'num' is smaller or equal to 30 (True)")

In [None]:
# Nothing is outputted because num > 30 is False
num = 3
if num > 30:
    print('Is this True? If it is printed it is!')

In [None]:
if True:
    print("True is True")

In [None]:
# Note that 1 also stands for True and 0 for False
if 1:
    print("1 is True")
if 0:
    print("0 is False")

if not 0:
    print("The keyword <<not>> reverses the logic; and 'not False equals True'")

#### else: 
- runs if the `if` resulted in False. Then another action can take place (it is not necessary to have an `else`):
    - do something if and only if the `if` is False

In [None]:
# the else statement executes whenever the previous if statement is False
# else must ALWAYS be at the same indentation level as the if it belongs to
num = 4
if num == 1:
    print("num is 1")
else:  # <- the else needs to be here, aligned with the if
    print("num is not 1")
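When there are more than two cases, `elif` (short for "else if") checks further conditions in order, and only the first branch whose condition is True runs. Here is a sketch of the navigation example from the introduction. The variables `transport` and `traffic_jam` and the route descriptions are hypothetical:

```python
transport = "car"     # hypothetical input: "foot" or "car"
traffic_jam = True    # hypothetical input: is there a jam on the shortest route?

if transport == "foot":
    route = "footpaths and pedestrian zones"
elif traffic_jam:
    route = "detour around the traffic jam"
else:
    route = "shortest road route"

print(route)
```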

### Or / Not / And

| Logical Operator | Name | Description |
| --- | --- | --- |
| **and** | AND | If both operands are True, the condition becomes True. |
| **or** | OR | If any of the two operands is True, the condition becomes True. |
| **not** | NOT | Reverses the logical value (not False becomes True, not True becomes False) |

> In `Python`, `&` and `|` are bitwise operators; for conditions like the ones below, use the keywords `and`, `or`, and `not`.
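In Python, the logical operators are written as the keywords `and`, `or`, and `not`. They can be tried directly on the Boolean values `True` and `False`:

```python
print(True and False)   # False: "and" needs both operands to be True
print(True or False)    # True: "or" needs only one True operand
print(not True)         # False: "not" reverses the value

# combined, with parentheses to make the order explicit
result = (True or False) and (not False)
print(result)           # True
```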

In [None]:
num = 4
num > 0 and num > 15
# You can use parentheses to make it more clear
# (num > 0) and (num > 15)

In [None]:
# both conditions are True, so num will be printed out
if num > 0 and num < 15:
    print(num, "is bigger than 0 and smaller than 15")


In [None]:
# "or" means OR
# num > 0 is True, num > 15 is False
# Since the first condition is True, the overall condition is True (is EITHER statement True?)
num = 4
num > 0 or num > 15

### The "for" loop
A for loop iterates over a given sequence, such as a `range`, a `list`, or a string.
You will see that the structure is again similar to what we had before:

```
for value in sequence:
    # do something (with the value)
```

Here is an example:

In [None]:
# looping through a sequence
sequence = seq(from = 1,to = 10,by = 2)
for(i in sequence){
    print(i)
}

In [None]:
# looping through a vector and print the values
primes = c(2, 3, 5, 7)
for(prime in primes){
    print(prime)
}
    

In [None]:
# looping through a list
for(i in shopping_list){
    print(i)
}
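The introduction also mentioned `while`. A `while` loop repeats its block as long as its condition is True, so something inside the loop must eventually make the condition False. Here is a minimal countdown sketch. The variable names are illustrative, and the list `steps` only records what happened:

```python
countdown = 3
steps = []                      # collects the values, to show what happened

while countdown > 0:
    print(countdown)
    steps.append(countdown)
    countdown = countdown - 1   # without this line the loop would never end

print("Done!")
```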

***
### *Exercise 5*
Write a program that counts the number of common elements in the lists 

`a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]` and 


`b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]`

In [None]:
# replace the ???
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
counted_in_both = ???
for i in ???:
    for j in ???:
        if ??? == ???:
            counted_in_both = counted_in_both + 1
counted_in_both

***