# Python Block Course
# Session 2: Functions and modules in focus

Prof. Dr. Karsten Donnay, Stefan Scholz

Winter Term 2019 / 2020

In this second session we will learn how to write our own Python code for simple applications. Along the way we will learn how to write our **own functions**, how **built-in modules** support us and how we can make the most of them.

## 2.1 Recap: Built-In Functions

A convenient way to get things done in Python is to use functions. **Functions** are indicated by **round brackets** which are appended directly after the **name** of the **function**. **Inside** the **brackets** the **input** is handed over to the function.

Yesterday we used the functions `print()`, `help()`, `len()`, `sum()` and `type()`, among others. These functions have in common that they are built into Python and are available without any import, and most of the time we passed only one input parameter to them.

Let us call a simple built-in function.

In [1]:
# define tuple
points = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# print length
print(len(points))

11


In the following section we will see that we can write our own functions, and that functions can take more than one input parameter in general. 

## 2.2 User-Defined Functions

**User-defined functions** have the goal to **recycle** code blocks, such that the same code block can be executed **several times**. With functions, the code can be made more **understandable** and **modular**. In addition, functions offer a certain **flexibility** through their input parameters, so that the same code can be reused for similar tasks with different inputs.

To declare your own function in Python, you first write the **keyword** `def`, then the **name** of the function, followed by the **parameters** in **round brackets** `(` `)`, and end the declaration with a **colon** `:`. This is followed by the **code** to be executed in the function, which is **indented**. At the end of the code there can be a `return` **statement**, which returns one or more values, so that they can be used **outside** the function. A function without a `return` statement returns `None`.

Let us look at the **syntax** of a function.

```python
def function_name(argument_1, argument_2):
    {this is the code in the function}
    {more code doing something with the arguments}
    {more code}
    return {value to return to the main program}
```

Suppose we have a list of students we want to **welcome** personally. We could write the greeting for each student individually, or better **automate** it with a **function**.

In [None]:
print("Hello Hans!")
print("Hello Adam!")
print("Hello Christine!")

In [2]:
# define function
def hello(name):
    print("Hello {}!".format(name))

# define list of students
students = ["Hans", "Adam", "Christine"]

# loop over students
for student in students:
    hello(student)

Hello Hans!
Hello Adam!
Hello Christine!


The function above has a **required parameter**. If we called the function without a name, Python would raise a `TypeError`. However, if we set a **default parameter**, the default value is used whenever no other value is passed.
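
To see what happens when a required parameter is missing, the following minimal sketch calls such a function without an argument and catches the resulting error. The function name `hello_required` is chosen here only for illustration and is not part of the course material.

```python
# define function with a required parameter (name chosen for illustration)
def hello_required(name):
    print("Hello {}!".format(name))

# call function without the required argument and catch the error
try:
    hello_required()
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
```

The error message names the missing parameter, which is usually enough to find the faulty call.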

Let us set a **default parameter**. 

In [3]:
# define function
def hello(name = "Lisa"):
    print("Hello {}!".format(name))
    
# call function without name
hello()

Hello Lisa!


In [4]:
# define function
def hello(name = "Lisa"):
    print("Hello {}!".format(name))
    
# call function with name
hello("Eva")

Hello Eva!


If your functions get more complicated and have **several input parameters**, it is a good idea to **explicitly mention** the **parameter names** in the **function call**. If no parameter names are given, the values passed must be in the order of the parameters. If parameter names are given, the values can be written in any order.

Let us write a function with **two inputs** and pass its **parameters explicitly**. 

In [5]:
# define function
def hello(name = "Lisa", location = "Konstanz"):
    print("Hello {} in {}!".format(name, location))
    
# call function with explicit parameters
hello(name = "Anton", location = "Litzelstetten")

Hello Anton in Litzelstetten!
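
The difference between passing values by position and by name can be sketched as follows. The variant `hello_text` is a hypothetical helper that returns the message instead of printing it, so that both calls can be compared directly.

```python
# variant of the function above that returns the message (name chosen for illustration)
def hello_text(name="Lisa", location="Konstanz"):
    return "Hello {} in {}!".format(name, location)

# named parameters may be passed in any order
mixed_order = hello_text(location="Litzelstetten", name="Anton")

# unnamed values must follow the order of the parameters
positional = hello_text("Anton", "Litzelstetten")

print(mixed_order)
print(positional)
```

Both calls should produce the same message, because the names tell Python which value belongs to which parameter.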


If you do not know in advance how many input parameters you want to pass, you can write the function in such a way that it takes **any number of parameters**. All you have to do is write an **asterisk** `*` before a parameter name. This parameter will capture all additional unnamed parameters and store them in a `tuple`. You can name this parameter as you like, but by convention it is named `*args`.

Let us write a function in which we greet an unknown number of students. 

In [6]:
# define function
def hello(location, *names):
    print("Hello {} in {}!".format(" and ".join(names), location))
    
# call function with any number of names (order matters: the location comes first)
hello("Konstanz", "Egon", "Sarah", "Linda", "Chris")

Hello Egon and Sarah and Linda and Chris in Konstanz!


Besides `*args`, there is also `**kwargs`, which lets you pass **named parameters** instead of unnamed ones. Neither of them is a keyword of the language: only the asterisks matter, and the names are conventions. You can pass as many named parameters as you want to `**kwargs`. They are stored in a `dict` in which the parameter names are the keys and the parameter values are the values. You can also use `*args` and `**kwargs` together, but we will not go into detail here. At the moment these constructs may not seem very interesting, but you will see them very often in the code of **large packages**.

Let us rewrite our **example** using **named parameters**. 

In [7]:
# define function
def hello(**kwargs):
    for name, location in kwargs.items():
        print("Hello {} in {}!".format(name, location))
        
# call function with unknown named parameters
hello(Lea = "Zurich", Jose = "Geneva", Konrad = "Stuttgart")

Hello Lea in Zurich!
Hello Jose in Geneva!
Hello Konrad in Stuttgart!
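
For completeness, here is a brief sketch of how `*args` and `**kwargs` can appear in the same function. The function `hello_all` and its behavior are assumptions made for this illustration only.

```python
# define function that accepts unnamed and named parameters (illustrative name)
def hello_all(*args, **kwargs):
    # args is a tuple of unnamed values, kwargs is a dictionary of named values
    messages = []
    for name in args:
        messages.append("Hello {}!".format(name))
    for name, location in kwargs.items():
        messages.append("Hello {} in {}!".format(name, location))
    return messages

# call function with two unnamed and one named parameter
messages = hello_all("Egon", "Sarah", Lea="Zurich")

# print messages
for message in messages:
    print(message)
```

In a call, all unnamed values have to come before the named ones.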


Now you have seen some simple examples of what you can do with functions. So far our examples have been heavily shortened. In practice you would probably use a more meaningful name, a `return` **statement** and **print** the message **outside** the function, and you would write a **docstring**.

Wait, what is a docstring? A **docstring** is a **string**, usually spanning multiple lines, placed directly after the declaration of a function, in which the **purpose** and **parameters** of the function are explained. If you do not know how to use a function, you can display its docstring later with the function `help()`. Well-written code always includes docstrings. You can get an idea of well-written docstrings from **style guides**. For now, we will look at a complete example of a function.

Let us write a **complete function** as it could be used in practice. 

In [8]:
def hello_message(location, *names):
    """
    Function that returns welcome message for people in location
    
    Parameters
    ----------
    location: String
        Name of location
        
    *names: String(s)
        Name(s) of people
    
    Returns
    -------
    String
        Welcome message
    
    """
    
    message = "Hello {} in {}!".format(" and ".join(names), location)
    
    return message

# call function and print message
print(hello_message("Konstanz", "all"))

Hello all in Konstanz!
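
How a docstring can be retrieved later is sketched below. The function is a shortened copy of the one above with a one-line docstring, used here only to keep the example compact.

```python
# define function with a short docstring (shortened for illustration)
def hello_message(location, *names):
    """Return a welcome message for people in a location."""
    return "Hello {} in {}!".format(" and ".join(names), location)

# show the docstring together with the function signature
help(hello_message)

# the raw docstring is stored in the attribute __doc__
docstring = hello_message.__doc__
print(docstring)
```

The same works for built-in functions, e.g. `help(len)`.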


## 2.3 Anonymous Functions

Small **anonymous functions** can be created with the `lambda` **keyword**. They are also called lambda functions in Python because you declare them with the `lambda` keyword instead of the standard `def` keyword. What is special about these functions is that they have **no name**. Lambda functions can be used wherever function objects are required. They are syntactically restricted to a **single expression**, whose value is returned automatically. Anonymous functions are useful when you need a small function only briefly, for example as an argument to another function.

```python
lambda argument_1, argument_2 : expression
```
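
To make the syntax more concrete, the following sketch writes the same small function once with `def` and once with `lambda`. The names `get_year` and `get_year_lambda` are chosen for illustration; binding a lambda to a name is done here only for the comparison.

```python
# function declared with def
def get_year(date):
    return date.split("/")[-1]

# equivalent anonymous function, bound to a name only for comparison
get_year_lambda = lambda date: date.split("/")[-1]

# call both functions with the same date string
year_def = get_year("14/10/2019")
year_lambda = get_year_lambda("14/10/2019")

print(year_def, year_lambda)
```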

In the following example we will use a `lambda` function to extract the year from each date in a list so that we can sort the dates by year. This is a very typical use of `lambda` functions.

Let us sort a **list of dates** by their years with an **anonymous function**. To **sort** the list, we use the function `sorted()`. 

In [9]:
# define list of dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# sort by default
dates = sorted(dates)

# print sorted dates
print(dates)

['01/05/2018', '02/04/2013', '13/02/2017', '28/07/2016', '30/09/2018']


In [12]:
# define list of dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# sort by year
dates = sorted(dates, key=lambda x: x.split('/')[-1])

# print dates sorted by year
print(dates)

['02/04/2013', '28/07/2016', '13/02/2017', '30/09/2018', '01/05/2018']


## 2.4 Local and Global Variables

**Local variables** are assigned inside a **function** and exist only in the local scope of that function. In comparison, **global variables** are assigned outside of functions and exist in the entire **program**, also inside called functions. If a variable is assigned inside a function and a variable with the same name already exists in the global scope, then Python will work with the local instead of the global variable inside that function.

**Global variables** can be read or printed within a **function** without any problems, but they **cannot** be **reassigned** there. What is the reason for this behavior? By default, every variable that is assigned anywhere in a function is treated as local throughout that function. To assign to the global variable we have to use the keyword `global`. However, it is recommended to **pass** all **variables** used inside a function explicitly as parameters to avoid this problem.

The first example shows the problem of mixing local and global variables. The second example shows how this problem can be solved with the keyword `global`. 

Let us see how we can use **local** and **global variables** in a **function**. 

In [13]:
# define global variable
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# define function
def first_date():
    # sort dates
    dates = sorted(dates, key=lambda x: x.split('/')[-1]) 
    
    return dates[0]

# call function
first_date()

# print dates
print(dates)

UnboundLocalError: local variable 'dates' referenced before assignment

In [19]:
# define global variable
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# define function
def first_date():
    # use global variable
    global dates
    
    # sort dates
    dates = sorted(dates, key=lambda x: x.split('/')[-1]) 
    
    return dates[0]

# call function
first_date()

# print dates
print(dates)

['02/04/2013', '28/07/2016', '13/02/2017', '30/09/2018', '01/05/2018']
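
The recommended alternative, passing the list explicitly instead of using `global`, might look like the following sketch. The variable names `dates_sorted` and `earliest` are chosen for illustration.

```python
# define list of dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# define function that receives the list as a parameter
def first_date(dates):
    # sorted() returns a new list, so the list passed in is not changed
    dates_sorted = sorted(dates, key=lambda x: x.split("/")[-1])

    return dates_sorted[0]

# call function and keep the returned value
earliest = first_date(dates)

# print the earliest date and the unchanged list
print(earliest)
print(dates)
```

With this design the function has no side effects on the global list, which makes it easier to reuse with other lists.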


<div class="alert alert-block alert-info">
    <b>Exercise</b>: Write a function that sorts any list of strings with dates in the format DD/MM/YYYY. Sort by year, month and day. Define an additional parameter to decide whether the dates are sorted in ascending or descending order. Finally, sort the list of dates above in ascending order. 
</div>

In [3]:
# define global variable
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]



# define function
def first_date(i):
    # use global variable
    global dates
    
    # sort dates
    dates = sorted(dates, key=lambda x: x.split('/')[i], reverse= True) 
    
    return dates

# call function
for i in range(3):
    first_date(i)

# print dates
print(dates)

['30/09/2018', '01/05/2018', '13/02/2017', '28/07/2016', '02/04/2013']
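
One possible way to approach the exercise without a global variable is sketched below, assuming dates written like the ones in the list above. The function name `sort_dates` and the parameter `ascending` are assumptions of this sketch, not a prescribed solution.

```python
# define function with a parameter for the sorting direction (illustrative names)
def sort_dates(dates, ascending=True):
    # the key reverses [day, month, year] into [year, month, day],
    # so the dates are compared by year first, then month, then day
    return sorted(dates, key=lambda x: x.split("/")[::-1], reverse=not ascending)

# define list of dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# sort in ascending and in descending order
dates_ascending = sort_dates(dates)
dates_descending = sort_dates(dates, ascending=False)

print(dates_ascending)
print(dates_descending)
```

This relies on days, months and years being zero-padded, so that comparing the strings gives the same order as comparing the numbers.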


## 2.5 Built-In Modules

We already looked at some built-in functions yesterday. But there are many more functions available in Python. All you have to do is **import modules** in which more **functions**, **classes** and **variables** are included. Each of these modules provides functionalities in a certain category. These built-in modules can be imported directly without the need to install them. 

The **syntax** for **importing** a **module** looks as follows. This statement is executed at the beginning of code so that all its functionalities are available in the actual code later.

```python
import module
```
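
Besides the plain `import` statement, Python offers a few common variants, sketched here with modules from the standard library. The alias `dt` and the example path are chosen for illustration.

```python
# import the whole module and access its contents with the module name
import os
working_dir = os.getcwd()

# import a module under a shorter alias
import datetime as dt
course_start = dt.date(2019, 10, 14)

# import a single function from a module
from os.path import join
file_path = join("folder", "file.txt")

print(working_dir)
print(course_start)
print(file_path)
```

Which variant to use is largely a matter of readability; the plain `import module` form makes it most obvious where a function comes from.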

We will discuss a few of these **modules** and have a **quick look** at some **use cases** soon. Here is a first **overview** of these **modules**:

| Module | Description |
| -------- | ------- |
| `os` | Miscellaneous operating system interfaces |
| `random` | Generate pseudo-random numbers |
| `datetime` | Basic date and time types |
| `re` | Regular expression operations |
| `csv` | Read and write tabular data in CSV format |

To get a **complete list** of all modules that are available in your installation, you can call the function `help()` with the string `"modules"`. 

In [43]:
help("modules")


Please wait a moment while I gather a list of all available modules...


Crypto              builtins            menuinst            sockshandler
Cython              bz2                 mimetypes           sortedcollections
IPython             cProfile            mistune             sortedcontainers
OpenSSL             calendar            mkl                 soupsieve
PIL                 certifi             mkl_fft             sphinx
PyQt5               cffi                mkl_random          sphinxcontrib
__future__          cgi                 mmap                spyder
_abc                cgitb               mmapfile            spyder_breakpoints
_ast                chardet             mmsystem            spyder_io_dcm
_asyncio            chunk               mock                spyder_io_hdf5
_bisect             click               modulefinder        spyder_kernels
_blake2             cloudpickle         more_itertools      spyder_profiler
_bootlocale         clyent              mpmath              spyder_pylint
_bz2                cmath               m

### Os

The module `os` provides dozens of functions for **operating system tasks**. For example, **files** and **directories** can be **located**, **deleted** or **created**.

Let us first **import** the **module**. 

In [45]:
import os

To find out in which directory Python is currently working, i.e. what its **working directory** is, you can use the function `os.getcwd()`.

Let us find out the **path** of our **current working directory**.

In [46]:
# define working directory
work_dir  = os.getcwd()

# print working directory
print(work_dir)

C:\Users\JohnDoe\Documents\GitHub\python-block-course-2019-SebastianStaab\2-modules


Next, we may be interested which **files** are in our **working directory**. This job can be done with the function `os.listdir()`. 

Let us see which **files** are in our **working directory**. 

In [None]:
# define working directory
work_dir  = os.getcwd()

# define files in working directory
work_files = os.listdir(work_dir)

# print files in working directory
print(work_files)

You should see that one of the files is the Jupyter notebook you are currently working in, i.e. one of the files should be called *2-session.ipynb*.

Suppose we want to test in our code whether a certain **file exists** with a certain path. First, we would need to create a **valid path**, with the **absolute directory** and the **file name**, and join them together with the function `os.path.join()`. Second, we should check whether this **file exists** by its joined path using the function `os.path.exists()`. 

Let us check whether we can **find** the **files** of today's session and assignment.

In [47]:
# define working directory
work_dir  = os.getcwd()

# define file of session
session_file = os.path.join(work_dir, "2-session.ipynb")

# check file of session
print(os.path.exists(session_file))

True


In [48]:
# define working directory
work_dir  = os.getcwd()

# define file of assignment
assignment_file = os.path.join(work_dir, "2-assignment.ipynb")

# check file of assignment
print(os.path.exists(assignment_file))

False
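
Creating and deleting files and directories can be sketched in the same way. To avoid touching any real files, this example works inside a temporary directory from the standard module `tempfile`, which is not covered in this session; the names `results` and `notes.txt` are made up for illustration.

```python
import os
import tempfile

# create a temporary directory so that no real files are touched
with tempfile.TemporaryDirectory() as temp_dir:
    # create a new directory inside the temporary directory
    new_dir = os.path.join(temp_dir, "results")
    os.mkdir(new_dir)

    # create an empty file inside the new directory
    new_file = os.path.join(new_dir, "notes.txt")
    with open(new_file, mode="w") as text_file:
        text_file.write("")

    # check that the file exists
    existed_before = os.path.exists(new_file)

    # delete the file and check again
    os.remove(new_file)
    existed_after = os.path.exists(new_file)

print(existed_before, existed_after)
```

Be careful with `os.remove()` outside of such a sandbox: deleted files do not go to the recycle bin.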


### Random

The module `random` implements a generator for **pseudo-random numbers**. It can be used, for instance, to draw a **random number** between 0 and 1, to draw a **random integer** from a certain range, to make a **random pick** from a list, or to **shuffle** a list randomly.

Let us first **import** the **module**.

In [49]:
import random

As noted above, these numbers are only **pseudo-random**, because they are computed by a deterministic algorithm from a starting value, the **seed**. By default, the seed is taken from a source that changes from run to run, such as a randomness source of the operating system or the **system time**. That means you will almost certainly not get the same random numbers when you run your code again. To make your code **deterministic**, you need to call the **function** `random.seed()` and give it, for example, an `int` or `str` as input. This seed determines the next random number, and all further numbers drawn follow from it like a chain. With the **function** `random.random()`, you can **draw numbers** in the **interval** `[0,1)`.

Let us first consider the **stochastic** and then the **deterministic** behavior of **random numbers** without and with a **seed**. 

In [50]:
# generate random number
random_num = random.random()

# print random number
print(random_num)

0.11368541106186603


In [51]:
# seed 
random.seed(2019)

# generate random number
random_num = random.random()

# print random number
print(random_num)

0.8323024001314224
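
That a seed really makes the draws repeatable can be checked with a short sketch: setting the same seed twice should yield the same number both times. The variable names are chosen for illustration.

```python
import random

# seed and draw a first number
random.seed(2019)
first_draw = random.random()

# set the same seed again and draw a second number
random.seed(2019)
second_draw = random.random()

# compare both draws
print(first_draw == second_draw)
```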


There are many more functions that draw numbers from other distributions. A **function** that is especially helpful for programming is `random.randint()`, which draws a **random integer** from the closed **interval** `[a,b]`, i.e. both end points are included. It is useful, for example, if you need a **random index**.

Let us draw a **random integer** between 1 and 100.

In [52]:
# seed 
random.seed(2019)

# generate random integer
random_num = random.randint(1,100)

# print random integer
print(random_num)

20


Instead of drawing a random integer for an index, you can also directly draw a **random element** from a **sequence** with the **function** `random.choice()`. 

Let us try to draw a **random date** from a **list** of dates. 

In [55]:
# seed 
random.seed(2019)

# define dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# draw random date
random_date = random.choice(dates)

# print random date
print(random_date)

28/07/2016
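
The random shuffle mentioned at the beginning of this section can be sketched with `random.shuffle()`, and several distinct elements can be drawn at once with `random.sample()`. The copy of the list is an assumption of this sketch, made so that the original order stays available.

```python
import random

# seed
random.seed(2019)

# define dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# shuffle a copy of the list, since random.shuffle() works in place
dates_shuffled = list(dates)
random.shuffle(dates_shuffled)

# draw two different dates without replacement
dates_sample = random.sample(dates, 2)

print(dates_shuffled)
print(dates_sample)
```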


### Datetime

The module `datetime` provides various functionalities to work with **dates** and **times**. It offers the possibility to read, create, modify and return dates in any **type of format**. 

Let us first **import** the **module**.

In [56]:
import datetime

To get the **current date**, the **function** `datetime.datetime.now()` can be used. This date includes year, month, day, hour, minute, second and microsecond. 

Let us find out the **current date**.

In [58]:
# define current date
date_now = datetime.datetime.now()

# print current date
print(date_now)

2019-10-15 12:06:06.106467


To create a **specific date** you can use the **function** `datetime.datetime()`. The function requires at least parameters for the year, month and day. Optionally, it also takes parameters for the hour, minute, second, microsecond and timezone. 

Let us create the **date** when the **course started**. 

In [59]:
# define date when course started
date_start = datetime.datetime(2019, 10, 14)

# print date when course started
print(date_start)

2019-10-14 00:00:00


If you want to jump from a certain date to another date, and you know the **difference** between them, you can use `datetime.timedelta()` to create a time difference. It accepts differences in weeks, days, hours, minutes, seconds, milliseconds and microseconds.

Let us determine the date of the **second day** of the course from the first day.

In [60]:
# define date when course started
date_start = datetime.datetime(2019, 10, 14)

# define date delta
date_day = datetime.timedelta(days = 1) 

# compute date when day 2
date_second = date_start + date_day

# print date when day 2
print(date_second)

2019-10-15 00:00:00
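
The reverse direction works as well: subtracting one date from another gives a `timedelta`. In the sketch below, the end date is a made-up value used only for illustration.

```python
import datetime

# define date when course started
date_start = datetime.datetime(2019, 10, 14)

# define a later date (hypothetical value for illustration)
date_end = datetime.datetime(2019, 10, 18)

# compute difference between both dates
date_diff = date_end - date_start

# print difference and its number of days
print(date_diff)       # 4 days, 0:00:00
print(date_diff.days)  # 4
```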


The dates you have created so far are `datetime` objects. To get a string from them you have to use the **method** `strftime()`. This method also allows you to specify in which **format** the **date** should be returned. For example, the date format `DD/MM/YYYY` is written as `%d/%m/%Y`. There are many other ways to build your own date format, as you can see in the [Python documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).

Let us write the date of the **first day** as a **string**. 

In [61]:
# define date when day 1
date_start = datetime.datetime(2019, 10, 14)

# define date as string in specific format
date_str = date_start.strftime("%d/%m/%Y")

# print date when day 1
print(date_str)

14/10/2019


The other way around, you can also **parse** a **string** in any date format into a `datetime` object. For that, the function `datetime.datetime.strptime()` takes the **string** with the **date** and its corresponding **date format**, and returns the **date**. With the parsed date you can perform all the steps shown so far and many others.

Let us load the **date string** of the **first day** with **datetime**. 

In [64]:
# define date string
date_str = "14/10/2019"

# parse date string as date
date_par = datetime.datetime.strptime(date_str, "%d/%m/%Y")

# print parsed date
print(date_par)

2019-10-14 00:00:00


<div class="alert alert-block alert-info">
    <b>Exercise</b>: Repeat the previous task where you wrote a function to sort a list of dates. This time use the datetime module to sort the list. Return the dates in the same date format again. 
</div>

In [77]:
# define list of dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# define function
def sort_dates(dates, date_format="%d/%m/%Y"):
    # parse date strings as dates
    dates_parsed = [datetime.datetime.strptime(date, date_format) for date in dates]
    
    # sort dates and return them as strings in the same format
    return [date.strftime(date_format) for date in sorted(dates_parsed)]

# call function
dates = sort_dates(dates)

# print dates
print(dates)

['02/04/2013', '28/07/2016', '13/02/2017', '01/05/2018', '30/09/2018']

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Write a function that generates a random date. Define two additional parameters which take strings with the earliest and latest date. The dates handed in and out of the function are strings in the format DD/MM/YYYY. Use the modules random and datetime.
</div>
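
One possible approach to this exercise is sketched below: parse both boundary dates, draw a random number of days between them, and add it to the earliest date. The function name `random_date`, its parameter names and the example dates are assumptions of this sketch.

```python
import datetime
import random

# define function that draws a random date between two dates (illustrative names)
def random_date(earliest, latest, date_format="%d/%m/%Y"):
    # parse both date strings as dates
    date_min = datetime.datetime.strptime(earliest, date_format)
    date_max = datetime.datetime.strptime(latest, date_format)

    # draw a random number of days, both boundary dates included
    days_between = (date_max - date_min).days
    days_offset = random.randint(0, days_between)

    # add the offset and return the date as string in the same format
    date_drawn = date_min + datetime.timedelta(days=days_offset)
    return date_drawn.strftime(date_format)

# draw a random date from the course week
random_str = random_date("14/10/2019", "18/10/2019")

print(random_str)
```

The sketch draws whole days only and assumes that the earliest date is not later than the latest date; otherwise `random.randint()` raises an error.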

### Re

The module `re` allows us to work with **regular expressions**, which are **search patterns** for **strings**. Regular expressions can be made as elaborate as needed in order to find **complicated** and **varying character sequences**. Once you have written a regular expression, you can check whether a string that matches the pattern occurs inside another string, and what exactly the match looks like.

Let us first **import** the **module**.

In [89]:
import re

First, we need to get an idea of how to **write** a **regular expression**. In regular expressions some **characters** are reserved for a **specific meaning**, i.e. they do not match themselves but stand for **something else**. If you want to match such a special character literally, you need to **escape** it with a **backslash** `\`. Because Python itself also treats the backslash as special inside ordinary strings, it is good practice to write regular expressions as raw strings with the prefix `r`. We will only be able to discuss the simplest characters here. A few of these characters are listed in the table below.

| Character | Description |
| -------- | ------- |
| `.` | any character |
| `^` | starts with |
| `$` | ends with |
| `[]` | set of characters |
| `\|` | either or |
| `*` | zero or more occurrences |
| `+` | one or more occurrences |
| `\d` | digit |
| `\D` | non-digit |
| `\w` | word character |
| `\W` | non-word character |
| `\s` | whitespace character |
| `\S` | non-whitespace character |

Suppose we want to check whether strings have a certain **date format**. For that, we want to write a suitable **regular expression**. We want the dates to be in the **format** `DD/MM/YYYY`. 

Let us check with a **regular expression** whether the correct date format is found. To **match** a string with a regular expression, we use the **function** `re.match()`. 

In [90]:
# define date string in correct format
date_str = "14/10/2019"

# write regular expression
reg_ex = r"^\d\d/\d\d/\d\d\d\d$"

# search with regular expression
reg_match = re.match(reg_ex, date_str)

# print whether match
print(reg_match)

<re.Match object; span=(0, 10), match='14/10/2019'>


In [93]:
# define date string in wrong format
date_str = "14-10-19"

# write regular expression for the format DD/MM/YYYY
reg_ex = r"^\d\d/\d\d/\d\d\d\d$"

# search with regular expression
reg_match = re.match(reg_ex, date_str)

# print whether match
print(reg_match)

None
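
The function `re.match()` only looks for a match at the beginning of a string. To find a pattern anywhere inside a longer string, `re.search()` and `re.findall()` can be used, as in the following sketch. The example sentence is made up for illustration.

```python
import re

# define text that contains dates (made-up example sentence)
text = "The course started on 14/10/2019 and ended on 18/10/2019."

# write regular expression without ^ and $, so it may match anywhere
reg_ex = r"\d\d/\d\d/\d\d\d\d"

# find the first match in the text
first_match = re.search(reg_ex, text)
first_date = first_match.group()

# find all matches in the text
all_dates = re.findall(reg_ex, text)

print(first_date)
print(all_dates)
```

Note that this pattern only checks the shape of the string; it would also accept impossible dates such as `99/99/9999`.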


### Csv

The **module** `csv` provides functionalities to **read** from and **write** to **CSV files**. This way you can conveniently get information **into** and **out** of your **program**. CSV files contain tabular data as **c**omma-**s**eparated **v**alues. Each line is a data record, and each record has one or more fields which are separated by commas.

Let us first **import** the **module**.

In [82]:
import csv

Suppose we have calculated results in our code which we now want to save in a CSV file. To do so, we first have to open the file we want to write to. The most common way to **open** a **file** is to call the **function** `open()` in a `with` **block** and bind the opened file to a **variable name**. Inside the `with` block you can access the opened file through this variable, and the file is closed automatically when the block ends. Next, you create a **writer** with `csv.writer()`, which will write your information into the file. There you can also specify the **format** of the CSV file, for example the delimiter. Once the writer is created, you can use its **method** `writerow()` to **write** a **sequence of values** as one row into the file.

Let us try to **write** some **test data** into a CSV file. 

In [83]:
# open file (newline="" prevents blank rows between records on Windows)
with open(file="test.csv", mode="w", newline="") as csv_file:
    # create writer
    writer = csv.writer(csv_file, delimiter=',')
    
    # write row with header
    writer.writerow(["number1", "number2", "number3", "number4", "number5"])
    
    # write row with zeros
    writer.writerow([0,0,0,0,0])
    
    # write row with random numbers and list comprehension
    writer.writerow([random.random() for i in range(5)])

You can also **load data** from a CSV file in the same way. In principle, this works exactly the same, except that the file is now opened in **read mode** `r` and a **reader** is created with `csv.reader()`. Then you can get the rows of the CSV file as a **sequence** from the **reader**, e.g. with a `for` loop. 

Let us try to **read** our **test data** from the CSV file again. 

In [84]:
# open file
with open(file="test.csv", mode="r", newline="") as csv_file:
    # create reader
    csv_reader = csv.reader(csv_file, delimiter=',')
    
    # iterate over rows in file
    for row in csv_reader:
        # print row
        print(row)

['number1', 'number2', 'number3', 'number4', 'number5']
['0', '0', '0', '0', '0']
['0.7889290619064494', '0.9385678954207153', '0.16059637997392429', '0.6488539744120456', '0.8325765404421139']
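
The reader returns every field as a string, so numbers usually have to be converted after loading. The following sketch collects all rows in a nested list and converts the data rows to floats. It writes its own small file into a temporary directory (using the standard module `tempfile`, which is not covered in this session), and the column names and values are made up for illustration.

```python
import csv
import os
import tempfile

# work in a temporary directory so that no real files are touched
with tempfile.TemporaryDirectory() as temp_dir:
    csv_path = os.path.join(temp_dir, "test.csv")

    # write a small file with a header and one data row
    with open(csv_path, mode="w", newline="") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        writer.writerow(["number1", "number2"])
        writer.writerow([0, 0.5])

    # read all rows into a nested list
    with open(csv_path, mode="r", newline="") as csv_file:
        rows = list(csv.reader(csv_file, delimiter=","))

# separate the header from the data and convert the data to floats
header = rows[0]
data = [[float(value) for value in row] for row in rows[1:]]

print(header)
print(data)
```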


<div class="alert alert-block alert-info">
    <b>Exercise</b>: Download the table available <a href="https://raw.githubusercontent.com/therbootcamp/BaselRBootcamp_2018July/master/_sessions/_data/baselrbootcamp_data/%5EGDAXI.csv">here</a>. Import the data as nested list. Use the module csv. Inspect the data. 
</div>