![header](figures/Header_SGG_02101.png)


# Programming basics

Programming aims to **find solutions to our problems** that we cannot (or do not want to) solve manually.

For a quick overview on your understanding, please fill out the questionnaire: https://tinyurl.com/w5metyhv

In [1]:
3 + 3

6

## Programming language Python
We will use `Python` as our programming language. Key points on `Python`:
- Omnipresent language in operating systems and scientific computing 
- Large user numbers and user-made packages
- Huge amount of documentation and user groups
- Object-oriented and high-level (designed to be used without the need for detailed knowledge about computer components)
- Fast despite being a high-level language

`Python` is stand-alone software. Different user interfaces are available that make it easier to work with, e.g.:
- (base version / terminal)
- PyCharm, Spyder, ...
- **Jupyter Notebooks** 

![image_R_APIs](figures/api_python.png)

*Figure 1: The forms/flavors of `Python`. From left to right: Console/terminal, PyCharm, Jupyter Notebook.*

### Jupyter and Renkulab
- Jupyter Notebooks have a simple and intuitive design (website). 
- They allow saving the output in the form of a website with outputs and images inside (good for sharing your work)

Jupyter (Notebooks) can run on your own computer. However, this requires the installation and configuration of two programs (Anaconda and Python), which can cause problems that would take a lot of time to solve for many students. This is especially true for so-called external `libraries` or `packages`, which require additional installation and configuration. Therefore, we use the `JupyterLab`/`RenkuLab` environment (a special form developed by the Swiss Data Science Center).

![structure_renkulab](figures/python_RENKU_flow.jpg)

*Figure 2: Layercake environment. `Python` is the basis to do the programming. If used in PyCharm or Jupyter directly, users have individual installations with slight variations (and problems). Renkulab is running on a dedicated server and provides a "copy" of a running Python-version and Jupyter environment that allows every user to work with the same software.*

**RenkuLab** (https://renkulab.io) is a centralized server that runs Jupyter for us. On this server we have created a working environment for you. This working environment is an installation of Jupyter, Python, and all needed packages.
You can log in to https://renkulab.io using your **SWITCH account** and use it as if it were your own Jupyter installation. RenkuLab runs inside your browser, so all you need is a browser! However, we recommend using a laptop (not a smartphone) so that you can see the content better and have a proper keyboard.

> **NOTE:** Some browsers might cause problems (especially Safari). We tested everythin

> Make sure to save your Jupyter notebooks <font color=red>**and copy them to your personal computer**  </font> after the classes.



## Working with online resources
> From Quantitative Methods I (-> Geostatistic) Lecture 2

Programming aims to **find solutions to our problems**! Knowing all the programming syntax is not the central goal, and it is hardly possible anyway. 
It is more important to know that most programming-related issues have already been solved by others. These solutions often result from someone asking in an online forum "How can I solve this particular problem?". Finding these solutions can be a challenge because you have to know the right search terms.<br> (**note:** *searching in English will often have a higher success rate because the English-speaking community is the biggest*)


You can also find plenty of online books, courses, and tutorials on R. 

* In English: A large variety of online courses and tutorials exists
    * "R for Data Science" (R4DS) by Wickham and Grolemund https://r4ds.had.co.nz/
    * Official R guide for beginners (a bit tech-savvy!): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

* In German: 
    * "Angewandte Statistik - Methodensammlung mit R" Hedderich and Sachs (https://link.springer.com/book/10.1007%2F978-3-662-45691-0)

<br/><br/>
A good source to look for specific questions is Stack Overflow (part of the Stack Exchange network):

* https://stackoverflow.com/questions/tagged/r

* ChatGPT: ChatGPT can also be extremely useful for **understanding** code and for finding out why your code is not working. <font color=red> $\Rightarrow$ Do not use ChatGPT to write all your code. You will not learn! Use it as an assistant when you get stuck. ChatGPT often fails to solve complex problems. The time you spend explaining the details of your problem to ChatGPT is about the same as the time needed to learn it yourself.</font>
![chatgpt](figures/chatgpt_r.jpg)
*Figure 3: Examples using ChatGPT to get example code and understand an error message.*


<font color=blue> $\Rightarrow$ You will come across the terms IDE (integrated development environment) and GUI (graphical user interface). If you are going to program more seriously in the future, it is recommended to download such an IDE/GUI platform. It will help you organize bigger programming tasks. However, for this course it is absolutely not necessary because Jupyter provides all the functionality we need.

</font>


***
## Programming

> From QM I / Geostatistic (SGG.00272/SGG.01101) Lecture 2:

The individual pieces of a program:

* **input:** Get data from the keyboard, a file, or some other device. 
* **output:** Display data on the screen or send data to a file or other device. 
* **math:** Perform basic mathematical operations like addition and multiplication.
* **conditional execution:** Check for certain conditions and execute the appropriate code. 
* **repetition:** Perform some action repeatedly, usually with some variation.

*(freely adapted from: Allen Downey - Think Python www.thinkpython.com)*

![image_programming](figures/programming_overview_bakery_code_v3.jpg)

*Figure 4: Programming analogy: A bakery where inputs (ingredients) used with the right methods (kneading, baking) produce the output (bread, pastries).*

<br/><br/>


### Why programming?

- <font color=red>Automated **problem solving**</font>
    - vs. manual work (comfort)
- <font color=red>Application of advanced methods</font>
    - Quantitative methods require (complex) mathematical operations, too difficult to apply by hand in many cases
- <font color=red>Recurring application</font>
    - Same data format and operations but different data (e.g. climate diagrams, population growth, demography, COVID)
        - Anything where the data is updated but the format stays the same
    - You can change the input data and all calculations/analyses are performed the same way.
- <font color=red>Reproducibility </font>
    - Allow others to come to the same results (Scientific standard) 
- <font color=red>Large data amounts</font>
- <font color=red>Interdisciplinarity</font>
    - Dealing with data from different fields
    - Different formats
    - Merging them into combined analysis
    
### How to write your programs?

- <font color=red>Concept / Idea !</font>
    - Understand the problem (Human language)
    - Translate into logic and mathematical operations
    - Think of what kind of data needs to be used
- <font color=red>Single case solution</font>
    - Testing the logical and mathematical operations
    - Use training data or a small sample of the original dataset
- <font color=red>Upscaling </font>
    - Application to the general case / all the data
    - Loops
    - Functions (self-made)




<font color=green>**For newcomers (and those who forgot)**:
This course builds on Quantitative Methods I. In QM I, an introduction to programming was given, and we cannot repeat all of it here. You should be able to understand the programming examples that follow. 

<font color=green>In order to "catch up", a detailed tutorial script that goes through the most important aspects is provided on moodle. It is highly recommended that you practice by yourself so that in the end you can understand the code of this lecture. You have three weeks. Work through it little by little. Do not hesitate to play around in the script (change things and see what happens). If everything "breaks", you can just download the file again.

<font color=green>The RenkuLab setup is explained at the end; also look at the "Renku Setup PDF" on moodle. The practice script is provided in a plain `R` script format. It can be opened in `R` or `RStudio`. **It is recommended to use `RStudio` on your personal computer**. If you have `Jupyter` installed on your personal computer, you can copy the content of the script (plain text format) and paste it into multiple "coding" cells within a new `Jupyter` Notebook. The script is very long, so it would be difficult to read if everything were in one cell. 

<font color=green>Instructions on how to install `R` and `RStudio` on your own personal computer are provided in a separate document on moodle.
</font>
***


#### Programming examples 

In quantitative analysis we deal with data and a question that we aim to answer. Examples from SGG.00272 are:
- *Did the COVID deaths increase in the second wave of the pandemic?*
- *Did air temperatures rise significantly over the last 50 years in Switzerland?*

We used statistical analysis, such as statistical tests and linear regression, as well as graphical representations of the data to answer these questions. The datasets that were the basis for these analyses are sometimes very large (too large in some cases to open e.g. in Excel). 

In `R` we can open many different data formats (e.g. tables in plain text format, images in various formats, satellite data in special formats). 



***
### How to write your programs - step by step
In SGG.00272, you learned already how to write programs in the Jupyter Notebook. The two types of **cells** (this one is a markdown cell, and the cells below are program code cells) serve the purpose of documenting and calculating. We will therefore use the markdown cells to write down the ideas, and the code cells to run the code we derive from our ideas. 

#### Data
A concept or an idea arises of course from a problem. We will generate one by creating data after this overview. The first dataset will be some numbers, and the second one a table of movies with ratings of how suitable they are for different moods.

#### Concept/Idea
At the beginning of a program is your idea of what you want to do. If you have a concept of how to solve the problem, you can work on translating it into program code. The more concrete your idea is on what a program shall do, the better.

A simple example:

You want to find the smallest number in a vector; a real-world example could be **who is the smallest person in the first row?**. We use the variable `mydata1`, which represents a `vector` (similar to a list of values) of data, as an example. The concept could look like this:
<font color=blue>
1. Start by assuming that the first value (**the first person**) is the smallest number (person)
2. Go through all the values (students), one by one, and test if maybe another value (person) is smaller than the first value
3. If that is the case, mark this value as the smallest value until we find another value that is even smaller.
4. Continue until we have gone through all the values
</font>

The computer does of course not understand our instructions like this. We have to use operations that control the **flow** and perform the **logic**.

<br/><br/>
#### Single case solution
The basis for testing whether one value is smaller than another is a logical operator:
- `<` smaller
- `<=` smaller or equal
- `>` greater
- `>=` greater or equal
- `!=` not equal
- `==` equal

<font color=red> $\Rightarrow$ The single equal sign `=` is used to assign a variable, not for logical operations!</font>
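
A minimal sketch of how these operators behave, using made-up values (`a`, `b`, and `heights` are assumptions for this illustration). Each comparison returns `TRUE` or `FALSE`; applied to a vector, the comparison is evaluated for every element:

```r
# hypothetical example values (not part of the course data)
a = 154
b = 190

a < b    # TRUE  - a is smaller than b
a >= b   # FALSE - a is not greater than or equal to b
a != b   # TRUE  - the two values differ
a == b   # FALSE - the two values are not equal

# applied to a vector, the comparison is done for every element
heights = c(190, 154, 169, 187.8)
is_smaller = heights < 170
is_smaller   # FALSE TRUE TRUE FALSE
```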

#### Upscaling
Once we have created, tested, and verified the single case solution, we can apply our method to the entire dataset we want to investigate. This addresses points 2 to 4 of our concept outline.

The **flow** operations become important now. How do we ensure that we go through **all** the data and implement the **logic**?

**Control flow statements** are the solution. They comprise loops and conditional execution commands:
- `for()` loops
- `if()` statements

<br/>

### Example 1

##### Data generation
The methods to open and access data are often the first step, because they provide the data basis. In the following, we recap two methods to get started with data.

In [None]:
# Create a vector with data that we typed in by hand
mydata1 = c(190,154,169,187.8)  # the height of the students in the first row in cm
mydata1

##### Single case solution 

In [None]:
# single case solution with example data
smallest_number = 190  # corresponding to the first value of mydata1
value_to_compare = 154
smallest_number < value_to_compare 


The basis for our program will be this comparison. However, we have to now think of the **flow** so that we can do a comparison with all the values.

<br/><br/>

##### Upscaling 
Our concept requires us to go further, because we do not just want to compare two values, but find the smallest of all the values in our vector `mydata1`.

We will translate all the individual parts of our concept into code, one by one.

In [None]:
# --- translating our concept --- #
# 1 - take first value as smallest number to start with
smallest_number = mydata1[1]

# 2 - there are two things we want to do:
# 2.1 - go through all values
for(value in mydata1){
    
}
# 2.2 - test if a value is smaller than the one we think is the smallest one
# value < smallest_number

# 3 - if we find a smaller value, then we assign it as the new smallest value
test_true_or_false = (value < smallest_number)
# test_true_or_false

if(test_true_or_false){
    smallest_number = value
}

# 4 - Point 4 is actually covered by point 2.1 (go through all the values)

In [None]:
for(value in mydata1){
    print(value)
    
}

The program is not ready yet. So far, each individual step stands by itself. We now have to think about the **order of the individual tasks**.

We have to combine them in a way that corresponds to our concept. In particular, we have to do the test (2.2) for every value. 
- So the test must be **in** the `for`-loop (2.1)
- And point 3 (overwrite the smallest number) needs to come **after** the test.

In [None]:
smallest_number = mydata1[1]
for(value in mydata1){
    if(value < smallest_number){
        smallest_number = value
#         print("Found smaller number:")
#         print(value)
    }
}

paste('The smallest number in our dataset is:',smallest_number)

In [None]:
mydata1


***
The **logic** in this example is the test, whether `value` is smaller than the value of the variable `smallest_number`.

The **flow** in this example is controlled by the `for` loop, which goes through each value of the vector `mydata1`, and by the `if` statement, which only allows the variable `smallest_number` to be overwritten **if** a value is smaller.

The first category is hence called:
- logical operation/statement

And the second one:
- control flow operation/statement
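
As a sketch of the self-made functions mentioned under upscaling, the loop from Example 1 could be wrapped into a function so that it can be reused for any numeric vector. The function name `find_smallest` and the vector `heights` are assumptions made for this illustration. `R`'s built-in `min()` gives the same result and serves as a check here:

```r
# hypothetical helper: returns the smallest value of a numeric vector
find_smallest = function(values){
    smallest_number = values[1]          # 1 - start with the first value
    for(value in values){                # 2.1 - go through all values
        if(value < smallest_number){     # 2.2 - test
            smallest_number = value      # 3 - remember the smaller value
        }
    }
    return(smallest_number)
}

heights = c(190, 154, 169, 187.8)  # same values as mydata1
find_smallest(heights)             # 154
min(heights)                       # 154 - built-in function for comparison
```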

***

### Example 2

<font color=red>Find all the movies in `mydata2` (see below) that are good to watch while traveling.</font>

What do we need in our concept? Follow the link and select your answers:
https://tinyurl.com/r6h24edc


In [None]:
# read in external data (here in form of a text-file table)
mydata2 = read.table('../../data/movie_list.csv', sep=';', header=TRUE)
mydata2

#### Part 1
Start with your concept:

1. Select the data that describes if the movie is a **potential Travel** category movie
2. Test if the category is actually **Travel**
3. Remember **which** (indices!) of the movies is such a category
4. **Select** the data that will tell us the movie name
5. Use the indices to **select** the movie name



<br/>
<br/>

#### Part 2
Translating the concept into code:

In [None]:
mydata2$Mood

In [None]:
# 1 - where is the information stored?
# you can also look at the data again to find the respective column by calling the variable
# mydata2  # uncomment (remove the first '#') and run
category = mydata2$Mood
category_to_find = 'Travel'

# 2 - test
category == category_to_find

# 3 - remember the position
index_to_remember = which(category == category_to_find)
# index_to_remember

# 4 - select movie name data
movie_names = mydata2$Name
# movie_names

# 5 - select the movie names using the identified indices
movie_names[index_to_remember]

In [None]:
mydata2

In [None]:
mydata2$Name[which(mydata2$Mood == 'Travel')]

***
Unlike in the first example, we do not need a `loop` or `conditional` control flow statement. We get the results directly. If we had two movie files, we might need a loop. Imagine the data was split into two parts:

In [None]:
mydata3 = mydata2[1:4,] # rows 1 to 4 of mydata2
mydata4 = mydata2[5:NROW(mydata2),]  # rows 5 to the end of mydata2
mydata3
mydata4

# creates a list object that has mydata3 in position 1, and mydata4 in position 2
movie_list = list(mydata3, mydata4)

In [None]:
# using the same code as before but with a loop that goes through the different movie files

# What we are looking for does not change, so we can have it outside the loop
category_to_find = 'Travel'

for(data in movie_list){
    # we have to change the variable name, because we are not looking in mydata2 anymore
    category = data$Mood  # <-- we now look in "data", a variable that corresponds to either mydata3 or mydata4
    index_to_remember = which(category == category_to_find)
    movie_names = data$Name  # <-- changed mydata2 to "data"
    
    print(movie_names[index_to_remember])
}


In [None]:
a = list(mydata1, '5')
a

***

## Data, data types, indices, and functions

### Data and data types
Data refers to all the values, numbers, texts, variables that we deal with. That can be a single number or a huge data.frame or images.

For the purpose of demonstration, the files and datasets are small. Much bigger files, like the Worldbank data (SGG.00272) can be read in as well.

`mydata2` showcases a dataset that consists of different **data types**. The entire data object, saved as the variable `mydata2` is a `data.frame` object. The columns have different data types. In Jupyter Notebooks these are indicated with the `<chr>` and `<int>` descriptions underneath the column names.

`<chr>` stands for character, which is simply speaking any kind of text. The character data type - sometimes also called **string** - is non-numeric. We cannot perform numerical operations with it. We can, however, attach it to other characters, e.g. with the `paste()` function.


In [None]:
paste('Some text', 'and some more text')

The other data type in `mydata2` is **integer** (`<int>`) - whole numbers. There are more data types, like **floating point numbers** (shown as `<dbl>`, short for double, in `R`). Integers and floats (short for floating point numbers) can be used in numerical operations, like addition, subtraction, etc. Whole numbers are **also used as indices** to get a value at a certain **position**. 
<br/>
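
A short sketch of how to inspect these data types with `class()`. The values are made up for illustration. Note that `R` stores a plain number such as `5` as a floating point number (class `numeric`); the suffix `L` creates a true integer:

```r
class(5L)       # "integer"   - the L suffix marks a whole number
class(5)        # "numeric"   - floating point number (double)
class(10.4)     # "numeric"
class('abc')    # "character"
class(TRUE)     # "logical"

# integers and floating point numbers can be mixed in calculations
result = 100L + 10.4
result          # 110.4
class(result)   # "numeric"
```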


In [None]:
# example numerical operation with an integer and a float
# 100 + 10.4

# # or
# mydata1
# mean(mydata1)
# mean(c(TRUE,TRUE))

TRUE == 1
FALSE == 0

In [None]:
# logical operations with numerics and characters
5 == 5
'acd' == 'acd'  # works also for characters

5 > 5
# 'Apple' > 'Pear' - NO, do not do that !!!!


In [None]:
# Object (data) types -  the class() function tells us what data or object type we have
class(5)
class(mydata1)
class(mydata2)
class('abc')

### Indices

Indices are the bread and butter of selecting the data that you need. Indices (singular: index) are the positions (whole numbers, data type `int`) in vectors and data.frames. 
- In **1D** (e.g. vectors) objects, we need **1 index**, and 
- in **2D** (e.g. data.frame, image) we need **2 indices** to select data.

In [None]:
# example of using an integer as index in a 1D object
mydata1
mydata1[2]

In [None]:
# example of using integers as indices in 2 dimensions
# mydata2

mydata2[3,1]  # 3rd row, 1st column

#### Data selection with integers
Finding indices is very important. We need these to create subsets (data we are interested in), like the movies that are good for traveling, or if we want to inspect data of a country but the original file has data for many countries.

Two steps are required: 
1. a **logical** operation, like comparing a character string ("Travel") or numerical value (snow height > 20 cm) with the entries of the dataset that contains the relevant information (e.g. column "Mood" for the movies, or a dataset with multiple snow heights).
2. Identify **where** the logical expression is TRUE.

Step 1 gives TRUE and FALSE values. Step 2 only gives back the positions where step 1 yields TRUE.

- `>=`, `==`, etc. - Step 1
- `which()` - Step 2
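
The two steps can be sketched with the snow height example mentioned above. The vector `snow_height` and its values are invented for this illustration:

```r
# hypothetical snow heights in cm
snow_height = c(5, 32, 18, 41, 20)

# Step 1: logical operation - TRUE/FALSE for every entry
deep_snow = snow_height > 20
deep_snow                      # FALSE TRUE FALSE TRUE FALSE

# Step 2: positions where the test is TRUE
index_deep_snow = which(deep_snow)
index_deep_snow                # 2 4

# use the indices to select the matching values
snow_height[index_deep_snow]   # 32 41
```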


In [None]:
mydata2$Mood[5] = NA

# mydata2$Mood == 'Travel'
which(mydata2$Mood == 'Travel')

<font color=green> There are several examples on indices in the `R` script. Try to understand the different ways on how to use indices and why it matters. </font>
***

## Plotting

`R` can produce very high-quality figures. These can be saved as PDF and in other formats. Several kinds of plots are available. To create a plot, start with one of the following functions:
- `plot()` - the most generic way to plot
- `barplot()` - barplots to show e.g. quantities of categories
- `hist()` - a plot for histograms and densities
- `boxplot()` - calculates the quartiles (box), 1.5\*InterQuartileRange (IQR), and points outside the 1.5\*IQR and plots the results


### The anatomy of a figure 
> Repetition from QM I, Lecture 5

The following figure contains most elements of an `R` plot and their names, as well as the functions to decorate it.


![figure anatomy](figures/overview_plot_units2.png)

*Figure 5: Layout of a plot and naming of central elements*

### `plot( )`
Figure 5 shows several elements that a figure consists of. The training script will showcase the addition of different elements and increasing control over them. The most basic `plot()` function does not need many arguments to be plotted.

Here is a list of all the different functions that we can use for plotting together with the most important arguments:

- `plot()` - the generic plotting function of `R` for data with cartesian coordinates (x and y)
    - Arguments:
    - `xlab` - text of the x-axis 
    - `ylab` - text of the y-axis
    - `axes` - plot the axes? TRUE (default) or FALSE  
    - `ann` - should `xlab` and `ylab` be plotted? TRUE (default) or FALSE
    - `col` - the color for the points/line
    - `type` - shall the data be plotted as points `"p"` / lines `"l"` /points and line both `"b"` / or bars `"h"`?
    - `xlim` - two values defining from where to where the axis should extend. E.g. xlim=c(0,100) will limit the x-axis to 0 to 100
    - `ylim` - two values from where to where the y-axis should extend
    - `main` - a title (e.g. main='This is the plot title')
- `axis()` - plot an axis
    - Arguments:
    - `side` - which side shall the axis be plotted; in clock-wise direction (1-bottom, 2-left, 3-top, 4-right)
- `par()` - Adjust the figure space, e.g. to create multiple sub-plots, or to plot another dataset with different values on top
    - Arguments:
    - `mfrow` - split the figure into multiple parts (`n*m` , with n=number of rows, m=number of columns). E.g. mfrow=c(2,3) will give you a total of 6 plotting spaces; 2 rows and 3 columns.
    - `new` - tell R that the next plotting command will plot on top of an existing plot `par(new=TRUE)`
    - `mar` - margins, i.e. the space from the figure border to the actual graph. 4 values (one for each side, in the order bottom, left, top, right) must be provided, e.g. `mar=c(5,5,4,5)` gives 5 lines of space for all but the top margin (=4)
- `lines()` - add lines to an existing plot 
    - Arguments:
    - `x` and `y` positions (the same way as in the `plot()` function)
    - `lty` - line type - same as for `plot()`, e.g. 1-solid line; 2-dashed line; 3-dotted line
    - `col` - color
- `points()` - add points to an existing plot 
    - Arguments:
    - `x` and `y` positions (the same way as in the `plot()` and `lines()` functions)
    - `pch` - point character type - same as for `plot()`, e.g. 1 to 21 for various shapes
    - `col` - color
- `mtext()` - adds text to your plot
    - Arguments:
    - `text` - the text to be plotted
    - `side` - which side of the plot (1,2,3,4)
    - `line` - how far away from the figure border (positive-further outside; negative-inside the plot)
    
### Colors
Colors can be given as a hexadecimal code of the form `#RRGGBB`. This code represents a color as a combination of **Red**, **Green**, and **Blue** with values between 0 and 255 for each of the 3 RGB colors (**in hexadecimal: 00 to FF**)
- <font color='#FF0000'> red: `#FF0000`</font>
- <font color='#AA0000'> darkred: `#AA0000`</font>
- <font color='#660000'> darkred2: `#660000`</font>
- <font color='#00FF00'> green: `#00FF00`</font>
- <font color='#00AA00'> darkgreen: `#00AA00`</font>
- <font color='#0000FF'> blue: `#0000FF`</font>
- <font color='#00FFFF'> turquoise: `#00FFFF`</font>
- <font color='#FFFF00'> yellow: `#FFFF00`</font>
- <font color='#FFAA00'> orange: `#FFAA00`</font>
- <font color='#FF00FF'> pink: `#FF00FF`</font>
- <font color='#000000'> black: `#000000`</font>
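
As a sketch, base `R` can convert between RGB values and hexadecimal codes with `rgb()` and `col2rgb()`. The orange from the list above is used as the example; the variable names are chosen for this illustration only:

```r
# build the hexadecimal code from red, green, blue values (0 to 255)
orange_hex = rgb(255, 170, 0, maxColorValue = 255)
orange_hex            # "#FFAA00"

# and back: split a hexadecimal code into its red, green, blue parts
orange_rgb = col2rgb('#FFAA00')
orange_rgb[, 1]       # red = 255, green = 170, blue = 0
```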

##### plot( )

In [None]:
# Example elements of a plot
df = data.frame(x = c(3,5,7,9),  
                y = c(4,7,6,12)) 
plot(x = df$x, 
     y = df$y, 
     xlab = 'Label of the x-axis',
     ylab='Label of the y-axis',
     lwd=2,
     col='#00FF00',
     lty=1,
     main='Playing around with plot',
     pch=2,
     cex=9,    
     type = 'p')

In [None]:
# Example additional elements
plot(x = df$x, 
     y = df$y)
lines(x = df$x, 
      y = df$y + 1, # slightly higher
      col='orange',
      lwd=5,
      lty=3)


grid()
abline(a=0, b=1, lty=3, col='green', lwd=3)  # with intercept (a) and slope (b)
abline(h=10, lty=4, col='pink', lwd=3)  # horizontal fixed
abline(v=3, lty=5, col='red', lwd=3)    # vertical fixed
legend('bottomleft', 
       legend=c('original','shifted'),
       col=c('black', 'orange'),
       pch=c(1, -1),
       lty=c(-1, 3),
       bg='lightgrey',
       lwd=3,
       cex=1)

In [None]:
# Example plot over another plot
plot(x = df$x, 
     y = df$y,
     cex=3)
par(new=TRUE)
plot(x = df$x, 
     y = df$y + 20000,  # much higher
     col='orange',
     pch=2, axes=FALSE, ann=FALSE)
axis(4, col='orange')      

##### hist( )
The `hist()` function calculates and plots a histogram from a series of data points. It will divide the data into groups (`bins`,`breaks` - x-axis) and reports the count (number of observations) on the y-axis.

In [None]:
df$y

In [None]:
hist(df$y,
    xlab='The units of our data [?]')

##### barplot( )
The `barplot()` function shows a value (y-axis) for different categories or classes (not following a physically meaningful scale).

In [None]:
barplot(df$y,
       names.arg = c('Class1', 'Class2','Class100','another label'))

##### boxplot( )
Calculates the data distribution statistics:
- 25% quantile (first quartile)
- 50% quantile (median)
- 75% quantile (third quartile)
- Inter Quartile Range (IQR = distance between 75% and 25%) - the "box"
- 1.5\*IQR - the maximum length of the "whiskers"
- and the points outside the 1.5\*IQR
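
The numbers behind a boxplot can be sketched with `quantile()` and `IQR()`. The durations below are invented for this illustration (they are not the values of `mydata2`), and `boxplot()` itself may compute the box edges slightly differently for small samples:

```r
# hypothetical movie durations in minutes, with one very long movie
duration = c(90, 95, 100, 105, 110, 115, 120, 125, 130, 300)

quartiles = quantile(duration, probs = c(0.25, 0.5, 0.75))
quartiles                     # 25%, 50% (median), and 75% of the data

iqr = IQR(duration)           # distance between the 75% and 25% values
iqr

# points further away than 1.5 * IQR from the box count as outliers
upper_limit = quartiles[['75%']] + 1.5 * iqr
lower_limit = quartiles[['25%']] - 1.5 * iqr
outliers = duration[duration > upper_limit | duration < lower_limit]
outliers                      # 300
```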

In [None]:
mydata2$Duration
boxplot(mydata2$Duration)

###  How to change the appearance?

Some things are easy to guess but some are not. Quick check if you know/remember: https://tinyurl.com/3pumejx6




### Legend
As soon as two or more datasets are used in one plot, we should use a legend to show which of the points or lines represent which dataset. The `legend()` function does that for us.
It requires the following arguments:
- `x` and `y` - x and y positions in the figure (alternatively one of the relative positions "top","right","bottom","left", or a combination of these like "topleft", "bottomright", etc.)
- `legend` - a vector of text strings (e.g. `legend=c('dataset 1', 'dataset 2')`)
- `pch` - which point character. Either a single value for all the legend entries or as a vector (`pch=c(1,3)` - `-1` if no point shall be plotted)
- `lty` - which line type shall be used. Either a single value for all the legend entries or as a vector (`lty=c(1,3)` - `-1` if no line shall be plotted)
- `bg` - the background color (e.g. 'white')



### Plot on top
- `par(new=TRUE)` - creates a new plot on top of the existing one (Attention: the value ranges in x and y are NOT the same as before!!!)
- `plot(same_x_values, new_y_values)`
- `axis(4)` - 4 = right-hand side axis ; side 1 = bottom, side 2 = left, side 3 = top

### Saving figures
- `pdf(filename)` - arguments on figure output size: `width = 6, height = 5` (units in inches)
- `dev.off()` - finishes writing the PDF output
     - The output PDF is written in the folder of your notebook.
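
A minimal sketch of saving a figure. The file is written to a temporary folder here so that the example runs anywhere; in your notebook you would typically pass a plain file name such as `'my_figure.pdf'` (a hypothetical name):

```r
# hypothetical output file in a temporary folder
filename = file.path(tempdir(), 'my_figure.pdf')

pdf(filename, width = 6, height = 5)   # open the PDF output (size in inches)
plot(x = c(3, 5, 7, 9),
     y = c(4, 7, 6, 12),
     main = 'Saved figure')
dev.off()                              # finish writing the PDF output

file.exists(filename)                  # TRUE if the PDF was written
```

Everything plotted between `pdf()` and `dev.off()` goes into the file instead of being shown in the notebook.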


### Output Examples
All made only with `R`

![showcase](figures/image_example5.png)

*Figure 6: plotting examples using only `R`*


***
## Overview of functions used in QM I / Geostatistic

- `c()` - creates a vector
- `data.frame()` - creates a data frame (like a matrix but can contain different data types)
- `length()` - returns the length of a vector or a list
- `NROW()` - returns the number of rows of a matrix or data.frame
- `NCOL()` - returns the number of columns of a matrix or data.frame
- `seq()` - creates a sequence of numbers; if only one argument is given, the sequence will be from 1 to this argument.
    - `seq(3)` will return `1,2,3`
    - `seq(from=1, to=3)` will also return `1,2,3`
    - `seq(from=1, to=3, by=2)` will return `1,3`
- `x^n` - taking a value x to the power of n (e.g. `2^3 = 2*2*2`)
- `paste()` - combines all arguments into one `character` string 
- `print()` - prints an object, e.g. a **single** `character` string 
- `return()` - **only** inside your self-written `functions` to give back an object (a `character` string, a value, an entire `data.frame`)
- `for(x in sequence/vector/data.frame){do something}`
- `if(TRUE condition){do something}`
- `read.table()` - read a text file (**.txt, .csv** files)
    - Arguments:
    - `file` - the filename with path e.g. "C:\\Downloads\\myfile.txt" or "/Users/myname/Documents/anotherfile.csv"
    - `sep` - the data separator, e.g. space (sep=' '), comma (sep=','), or tab (sep='\t')
    - `header` - is there a first row that represents the column names? (TRUE/FALSE; `read.table()` usually assumes FALSE unless told otherwise, so set `header=TRUE` explicitly)
    - `na.strings` - the symbols in the dataset to represent missing data. Can be multiple ones, e.g. `na.strings=c(-9999,'NA','*')`
    - `check.names` - makes sure there are no strange symbols in the column names. Results, however, also in numeric column names to get a leading X, e.g. 2000 -> `X2000`. `check.names=FALSE` prevents that
- `head()` - shows the first few rows of a matrix/data frame
- `colnames()` - returns the column names of a data.frame or a matrix
- `rownames()` - returns the row names of a data.frame or a matrix

- `grepl()` - searches for strings/words in lists/vectors of strings
    - Arguments:
    - `pattern` - what do we search, e.g. `"CO2"`
    - `x` - where do we search for, e.g. vector/list of strings/words
    - returns TRUE/FALSE if the search found the string or not

- `which()` - returns the index/indices of vectors, matrices, and data.frames where a conditional statement is TRUE
    - Arguments:
    - a test in the form of `x >= 12`. If x is a list/column/row of values, the `which()` function will give back the positions in this list where x is greater than or equal to 12
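
A few of these functions can be combined in a short sketch. The vector `gases` and its values are invented for this illustration:

```r
# hypothetical vector of measurement names
gases = c('CO2_2000', 'CH4_2000', 'CO2_2010', 'N2O_2010')

length(gases)                       # 4
seq(from = 1, to = 3, by = 2)       # 1 3

# grepl() gives TRUE/FALSE for every entry, which() gives the positions
found = grepl(pattern = 'CO2', x = gases)
found                               # TRUE FALSE TRUE FALSE
index_co2 = which(found)
index_co2                           # 1 3

# paste() combines everything into one character string
message_text = paste('Entries with CO2:', length(index_co2))
print(message_text)                 # "Entries with CO2: 2"
```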

***
   
