<img src="../NAWI_Graz_Logo.png" align="right" width=150>

# Notebook 3: Python Functions and reading files

*Developed by Raoul Collenteur, University of Graz*

In this notebook we will learn how to write reusable pieces of Python code: functions. Python functions are useful when we need to perform a certain operation or calculation multiple times. We will also learn how to read CSV (comma-separated values) files, a common format to store data.


In [None]:
# Import the python packages needed in this session
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

%matplotlib inline

## 1. Previous lecture

Before we start with the new concepts we will learn in this lecture, let us have a look at the core concepts we learned in the previous notebook.


### 1a. Refreshing for-loops

In [None]:
numbers =  [1,2,3,4,5]

for item in numbers:
    # Execute this code in the for-loop
    print("the number is", item)
    
# We are done
print("We are done!")

### 1b. If/else statements and comparison operators
- equals: `==`
- not equal: `!=`
- larger than: `>`
- smaller than: `<`
- greater than or equal to: `>=`
- smaller than or equal to: `<=`
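
Each of these operators evaluates to a boolean (`True` or `False`), which is what an `if` statement tests. As a small sketch with made-up values (`a`, `b` and `results` are names chosen only for this illustration):

```python
a = 3
b = 10

# Each comparison evaluates to a boolean value
results = {
    "a == b": a == b,
    "a != b": a != b,
    "a > b": a > b,
    "a < b": a < b,
    "a >= b": a >= b,
    "a <= b": a <= b,
}

for expression, value in results.items():
    print(expression, "is", value)
```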

In [None]:
x = 3
y = 10
z = 1

if x == y:
    print("x is equal to y")
# We can check multiple conditions using an if/elif/else statement
elif x < z:
    print("x is smaller than z")
else:
    print("x is between y and z")

## 2. Python functions
In this lecture we will learn about Python functions. Functions are fundamental building blocks in most programming languages and are used to store repetitive sequences of code to perform a certain (single) task.

As you might have noticed, we have already used quite a few functions throughout our Python programming course. Remember for example `np.sin`, `plt.plot`, or `type`? All of these are in fact functions that define a piece of code to perform a certain task. Python packages can also be seen as collections of functions for tasks we have to perform more often, such as calculating the sine, plotting a variable, or discovering the type of a variable.

A function is 'called' by typing in the function name, followed by an opening round bracket (`(`), the function's input arguments, and a closing round bracket (`)`). For example:

In [None]:
np.sin(2)

In the above code block we use the function `sin` from the numpy package to calculate the $\sin$ of 2. The number 2 is the input argument, and `sin` is the function. Let's look at how we can define a function ourselves.

The following code block defines a function named `multiply` that can be used to multiply `a` by `b`.

In [None]:
def multiply(a, b):
    val = a * b
    return val

In [None]:
# Let's use our multiply function
print("2 times 3 is:", multiply(2, 3))

To define a function we start by writing `def`, followed by a space and then the name of the new function. Function names may contain letters, numbers and underscores (`_`), but cannot start with a number. It is good practice to give the function a name that makes clear what the function does (such as `multiply` above).

The name is followed by round brackets with the arguments in between, and a colon (`:`). The actual operation is defined in the code block starting on the next line, which is indented by four spaces or a tab. This is quite similar to how for-loops are written. At the end of this block, the `return` statement defines the value or values that are handed back when the function is called. A function without a `return` statement returns `None`.
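
To make the role of `return` concrete, here is a minimal sketch comparing a function that returns a value with one that only prints. The names `rectangle_area` and `print_area` are made up for this illustration.

```python
def rectangle_area(width, height):
    # The indented lines form the body of the function
    area = width * height
    # return hands the result back to the caller
    return area


def print_area(width, height):
    # This function only prints; it has no return statement
    print("The area is", width * height)


result = rectangle_area(2, 3)  # result holds the returned value
nothing = print_area(2, 3)     # nothing is None, because no value is returned
print(result, nothing)
```

Only the returned value can be stored in a variable and reused later in a calculation; printed text cannot.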

### Arguments and keyword arguments
A function can take one or more input arguments, as for example `a` and `b` in the `multiply` function. In many cases, these arguments have some kind of default value. If this is the case, you can provide the default value in the function definition to make your function easier to use. Have a look at this example:

In [None]:
def multiply2(a, b=2):
    val = a * b
    return val

In [None]:
print(multiply2(5)) # No need to fill in the value b
print(multiply2(5, 3)) # But we still can

Remember that you can use shift-tab to find out about the input arguments of a function. This will also tell you about the default values.
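
Arguments can also be passed by name, which is where the term keyword argument comes from. The sketch below repeats the `multiply2` definition so that it runs on its own, and uses `inspect.signature` from the standard library as one possible way to look up arguments and defaults outside of Jupyter.

```python
import inspect


def multiply2(a, b=2):
    # Same definition as in the cell above: b defaults to 2
    return a * b


by_position = multiply2(5, 3)     # arguments matched by position
by_keyword = multiply2(a=5, b=3)  # arguments matched by name
reordered = multiply2(b=3, a=5)   # the order is free when names are used
with_default = multiply2(5)       # b falls back to its default value

# inspect.signature shows the arguments and their default values
signature = str(inspect.signature(multiply2))
print(signature)
```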

### Local vs. Global variables
The last thing we want to know about functions (for now at least) is the difference between local and global variables. Local variables are defined within the function definition and are not available outside the function. Global variables are defined outside the function, but are available for use within the function.

It is good practice, however, not to use any global variables in a Python function. This ensures that we can use the function in other places. When a name is used inside a function, Python first looks for a local variable with that name and, if it cannot find one, falls back to the global variables.
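
The lookup order can be illustrated with a small sketch. The names `factor`, `scale_with_global` and `scale` are made up for this example.

```python
factor = 10  # Global variable, defined outside any function


def scale_with_global(value):
    # Not recommended: the result depends on a name defined elsewhere
    return value * factor


def scale(value, factor=2):
    # Inside the function, the argument 'factor' hides the global one
    scaled = value * factor  # 'scaled' is a local variable
    return scaled


print(scale_with_global(3))  # uses the global factor
print(scale(3))              # uses the local default value
try:
    print(scaled)
except NameError as error:
    # Local variables do not exist outside the function
    print("NameError:", error)
```

The second version is easier to reuse, because everything it needs is passed in through its arguments.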

In [None]:
b = 2 # Global variable

def add(a, b=4):
    c = 4 # Local variable
    #d = a + b + c
    #return d
    return a + b + c

add(3)
#print(c) # Will give an error

### In-class exercise: calculate and plot the cumulative sum
In this exercise we will calculate the cumulative sum of a list of numbers. Write a function named `cumsum` that calculates the cumulative sum for a list of numbers. For example, the cumulative sum of the list [1, 2, 3] is [1, 3, 6]. Make a plot of the result.
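
Once your own function is written, one way to check it is to compare its output with a reference from the standard library. The sketch below uses `itertools.accumulate`; the names `example` and `expected` are chosen only for this illustration.

```python
from itertools import accumulate

example = [2, 4, 6]

# accumulate yields the running total; list() collects the values
expected = list(accumulate(example))
print(expected)
```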

In [None]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Write your function here

cumsum(numbers) # => should return [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
# np.cumsum(numbers)

plt.plot(cumsum(numbers))

In [None]:
# Possible solutions

def cumsum(numbers):
    cs = [numbers[0]]
    for num in numbers[1:]:
        cs.append(cs[-1] + num)
    return cs

cumsum(numbers)

## 3. Read CSV-files
So far, we have only worked with data that was directly defined in our Jupyter Notebook (see e.g., the list with numbers in the above code blocks). In real life, however, we often need to import data that has been stored in different types of files. A common format is the CSV (comma-separated values) file. These files can be recognized by the file extension .csv (e.g., `myfile.csv`).

There are many functions available through different Python packages to import CSV files. A common one is numpy's `loadtxt` function. Here you can find more about this function: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.loadtxt.html
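
As a minimal sketch of how `loadtxt` can be used, the example below reads a few made-up lines of CSV text. To keep it self-contained, `io.StringIO` stands in for a real file; with an actual file you would pass the file name instead.

```python
import io

import numpy as np

# Made-up CSV content; io.StringIO lets us treat the text like a file
csv_text = """time,level
0,1.5
1,1.7
2,1.6
"""

# skiprows=1 skips the header line, delimiter="," splits on commas
values = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print(values.shape)  # one row per data line, one column per field
print(values[:, 1])  # the second column
```

Note that `loadtxt` returns a plain numpy array, so the column names in the header are not kept.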

Another option is available through the Pandas package, Python's go-to package for data analysis. This method is called `read_csv` and contains many options for importing different forms of CSV-files. More information on this method can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html 

Let's import pandas and have a look at this powerful method:

In [None]:
import pandas as pd
pd.read_csv # Use shift_tab or ? to get more information

### Pandas read_csv 
Let's look at some real data now! To access hydrological data, the Austrian government has created a web portal named EHYD: http://ehyd.gv.at/. Through this website you can download a variety of hydrological data ranging from water levels to groundwater temperature and rainfall time series. 

Although the data we can download from EHYD are CSV files, we need to provide some arguments to the Pandas `read_csv` method to be able to import them. This is mostly because German formatting is used for the files. You can use the file provided for this lecture or download a file from the website yourself.

NB: If you have downloaded a file yourself, please replace every occurrence of the word "Lücke" in the CSV file with -999.
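
To see what German formatting means for `read_csv`, here is a small sketch on made-up text. The column names `Datum` and `Wert` and all values are invented for illustration and do not reproduce the real EHYD file layout. As a possible alternative to editing the file by hand, the sketch passes the gap marker "Lücke" directly to `na_values`.

```python
import io

import pandas as pd

# Made-up sample: ";" between columns, "," as decimal mark,
# day.month.year dates, and the word "Lücke" marking a gap
csv_text = """Datum;Wert
01.02.1990;312,45
01.03.1990;Lücke
01.04.1990;312,10
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",              # column separator
    decimal=",",          # decimal comma instead of decimal point
    index_col=0,          # use the first column as the index
    parse_dates=True,     # convert the index to dates
    dayfirst=True,        # 01.02.1990 is 1 February, not 2 January
    na_values=["Lücke"],  # treat the gap marker as a missing value
)

print(sample)
```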

In [None]:
# Make sure this file is in the same folder as this notebook
fname = "Grundwasserstand-Monatsmittel-310532.csv"

# Go through all of the (keyword) arguments to see what they do
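data = pd.read_csv(fname, sep=";", decimal=",", dayfirst=True, index_col=0,
                   parse_dates=True, usecols=[0, 1], skiprows=50, na_values=[-999])

# The squeeze keyword was removed from read_csv in pandas 2.0;
# squeeze("columns") turns the single-column DataFrame into a Series
data = data.squeeze("columns")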

### Pandas datatypes
Pandas has two new data types that are used a lot and are very powerful: `Series` and `DataFrame`. A `Series` is a one-dimensional array and a `DataFrame` is a two-dimensional table, comparable with a table in Excel. Both have an index, which can be really helpful when selecting data. In the example above, the index has a datetime format, as the data we are looking at is a time series of the groundwater level. Pandas converted the index to a datetime format automatically for us, because we used the argument `parse_dates=True` in the function `read_csv`. In the final lecture we will learn more about Pandas and these data types.
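
The difference between the two data types, and the use of a datetime index, can be sketched with a few made-up values (the names `dates`, `series` and `frame` are chosen for this illustration only):

```python
import pandas as pd

# Made-up daily values with a datetime index
dates = pd.date_range("2020-01-01", periods=4, freq="D")
series = pd.Series([1.0, 1.2, 1.1, 1.3], index=dates, name="level")

# Select a single value or a range of dates using the index
one_day = series.loc["2020-01-02"]
first_days = series.loc["2020-01-01":"2020-01-03"]

# A DataFrame holds one or more columns that share the same index
frame = series.to_frame()

print(type(series), type(frame))
print(one_day, len(first_days), frame.shape)
```

Unlike slicing a list, slicing by date labels includes the end point, so `first_days` contains three values.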

In [None]:
print("The data type of the variable data is: ", type(data))

# Print the first five rows of data to see what is inside
data.head(5)

In [None]:
# Pandas Series and DataFrames have some pretty cool built-in methods, such as plot
data.plot()

### Exercise 1. Larger than function

You can do this exercise at home if you want to practice a little more. Write a function named `is_larger_than` that returns `True` if `x` is larger than `y`, and `False` otherwise. Use a for-loop to test the function by comparing the numbers in the following list to `y=2`:

In [None]:
values = [1, 2, 7, 0, -1, 4, 3, 1.5]
y = 2

# Your function goes here

for x in values:
    print(is_larger_than(x, y))