# Introduction to Programming with Python
# Session 3-2: Modules

Jack Krüger, Sebastian Staab  
Summer Term 2021 (SS 21)

In this session we will learn how to write our own Python code for (simple) applications. At the same time, we will see how **built-in modules** support us and how we can make the most of them. 

## 3.4 Built-In Modules

We already looked at some built-in functions last week, but Python offers many more. To use them, you **import modules**, which contain additional **functions**, **classes** and **variables**. Each module provides functionality for a certain area. Built-in modules are part of Python's standard library, so they can be imported directly without installing anything. 

The **syntax** for **importing** a **module** looks as follows. Import statements are usually placed at the top of a script or notebook so that the module's functionality is available in all the code that follows.

```python
import module
```
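As a minimal illustration, the snippet below imports the built-in module `math` (not otherwise discussed in this session) and calls one of its functions with dot notation, `module.function()`. It also shows two common variants of the import statement: importing a single name with `from ... import ...` and giving a module a shorter alias with `as`.

```python
# import the whole module; its contents are accessed with dot notation
import math

print(math.sqrt(16))  # 4.0

# import a single function so it can be used without the module prefix
from math import sqrt

print(sqrt(25))  # 5.0

# import a module under a shorter alias
import datetime as dt

print(dt.date(2021, 2, 12))  # 2021-02-12
```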

We will discuss a few of these **modules** and have a **quick look** at some **use cases** soon. Here is a first **overview** of these **modules**:

| Module | Description |
| -------- | ------- |
| `os` | Miscellaneous operating system interfaces |
| `random` | Generate pseudo-random numbers |
| `datetime` | Basic date and time types |
| `re` | Regular expression operations |
| `csv` | Read and write tabular data in CSV format |

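To get a **complete list** of all available modules, you can call the function `help()` with the argument `"modules"`. This list also contains the external packages installed in your environment, so it will look different on your machine. 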

In [1]:
help("modules")


Please wait a moment while I gather a list of all available modules...

Cython              calendar            mkl_fft             spacy_legacy
GetOldTweets3       catalogue           mkl_random          sphinx
IPython             certifi             mmap                sphinxcontrib
OpenSSL             cffi                mmapfile            spyder
PIL                 cgi                 mmsystem            spyder_kernels
PyQt5               cgitb               mock                sqlalchemy
__future__          chardet             modulefinder        sqlite3
_abc                chromedriver_py     more_itertools      sre_compile
_ast                chunk               mpmath              sre_constants
_asyncio            click               msgpack             sre_parse
_bisect             cloudpickle         msilib              srsly
_blake2             clyent              msvcrt              ssl
_bootlocale         cmath               multidict           sspi
_bz2                cmd                 multipledispatch    sspicon
_cffi_backend       code  
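`help("modules")` only lists module names. To see which functions, classes and variables a single module provides, one option is the built-in function `dir()`, which returns a sorted list of the names defined in a module; calling `help()` on a specific function then shows its documentation. A short sketch using the `os` module discussed next:

```python
import os

# list the public names (those not starting with an underscore) defined in os
public_names = [name for name in dir(os) if not name.startswith("_")]

# check for a function that is used later in this session
print("getcwd" in public_names)  # True

# show the documentation of a single function
help(os.getcwd)
```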

### Os

The module `os` provides dozens of functions for **operating system tasks**. For example, you can use it to **locate**, **create** or **delete** **files** and **directories**. 

Let us first **import** the **module**. 

In [1]:
import os

To find out which directory Python is currently working in, i.e. its **current working directory**, you can use the function `os.getcwd()`. Relative file names are resolved with respect to this directory.

Let us find out the **path** of our **current working directory**.

In [2]:
# get current working directory
work_dir = os.getcwd()

# print working directory
print(work_dir)

C:\Users\Jack\Python Course


Next, we may be interested in which **files** are in our **working directory**. This can be done with the function `os.listdir()`, which returns the names of all files and subdirectories in a given directory as a list. 

Let us see which **files** are in our **working directory**. 

In [3]:
# get current working directory
work_dir = os.getcwd()

# define files in working directory
work_files = os.listdir(work_dir)

# print files in working directory
print(work_files)

['.ipynb_checkpoints', '1-2 Answers.ipynb', '1-2.ipynb', '2-1 Answers.ipynb', '2-1.ipynb', '2-2.ipynb', '2-session.ipynb', '3-2 (1).ipynb', '3-2 Answers.ipynb', '3-2.ipynb', '4-1 Answers.ipynb', '4-1.ipynb', 'CASchools.csv', 'Downloads', 'flights.csv', 'Python Project Answers.ipynb', 'Python Project.ipynb', 'test.csv']


You should see that one of the files is the Jupyter notebook you are currently working in, i.e. one of the files should be called *3-2.ipynb*. 

Suppose we want to test in our code whether a **file exists** at a certain path. First, we need to build a **valid path** by joining the **absolute directory** and the **file name** with the function `os.path.join()`, which inserts the correct path separator for your operating system. Second, we check whether a **file exists** at this joined path with the function `os.path.exists()`. 

Let us check whether we can **find** the **files** of today's session and assignment. 

In [5]:
# get current working directory
work_dir = os.getcwd()

# define file of session
session_file = os.path.join(work_dir, "3-2.ipynb")

# check file of session
print(os.path.exists(session_file))

True


In [6]:
# get current working directory
work_dir = os.getcwd()

# define file of assignment
second_file = os.path.join(work_dir, "3-2.ipynb")

# check file of assignment
print(os.path.exists(second_file))

True
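The `os` module can also create and delete directories, as mentioned at the start of this subsection. The sketch below works inside a temporary folder from the standard-library module `tempfile`, so it does not touch your own files; the folder name `session_data` and the file name `notes.txt` are hypothetical names chosen for illustration. It uses `os.mkdir()`, `os.path.isdir()`, `os.listdir()`, `os.remove()` and `os.rmdir()`.

```python
import os
import tempfile

# create a temporary base folder so no existing files are touched
base_dir = tempfile.mkdtemp()

# build the path of a new (hypothetical) sub-directory and create it
new_dir = os.path.join(base_dir, "session_data")
os.mkdir(new_dir)
print(os.path.isdir(new_dir))  # True

# create an empty (hypothetical) file inside the new directory
file_path = os.path.join(new_dir, "notes.txt")
with open(file_path, mode="w") as f:
    f.write("")
print(os.listdir(new_dir))  # ['notes.txt']

# delete the file first, then the now-empty directories
os.remove(file_path)
os.rmdir(new_dir)
os.rmdir(base_dir)
print(os.path.exists(new_dir))  # False
```

`os.rmdir()` only removes empty directories, which is why the file is deleted first.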


### Random

The module `random` implements generators for **pseudo-random numbers**. These can be used, for instance, to return a **random number** between 0 and 1, return a random integer within a certain range, make a **random pick** from a list, or **randomly shuffle** a list. 

Let us first **import** the **module**.

In [7]:
import random

As noted above, these numbers are only **pseudo-random**: they are produced by a deterministic algorithm (the Mersenne Twister) from an internal state. By default, this state is initialized from a source of randomness provided by the operating system, or from the **system time** if no such source is available. That means you will almost certainly get different numbers each time you run your code. To make your code **deterministic**, you need to call the **function** `random.seed()` with an integer or a string as input. The seed sets the initial state of the generator, and every number drawn afterwards builds on that state like a link in a chain, so the same seed always produces the same sequence. With the **function** `random.random()`, you can **draw numbers** from the **interval** `[0,1)`.

Let us first consider the **stochastic** and then the **deterministic** behavior of **random numbers** without and with a **seed**. 

In [8]:
# generate random number
random_num = random.random()

# print random number
print(random_num)

0.9450382830261884


In [9]:
# seed 
random.seed(2019)

# generate random number
random_num = random.random()

# print random number
print(random_num)

0.8323024001314224
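To see the deterministic behavior directly, you can set the same seed twice and compare the numbers drawn after each call. A small sketch:

```python
import random

# draw three numbers after seeding with 2019
random.seed(2019)
first_run = [random.random() for _ in range(3)]

# reset the seed and draw three numbers again
random.seed(2019)
second_run = [random.random() for _ in range(3)]

# the same seed reproduces the same sequence
print(first_run == second_run)  # True
```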


There are plenty more functions to draw numbers from other distributions, too. A helpful function, especially for programming, is the **function** `random.randint()`, which draws a **random integer** from the **interval** `[a,b]`, including both endpoints. It is popular, for example, when you need a **random index**. 

Let us draw a **random integer** between 1 and 100.

In [10]:
# seed 
random.seed(2019)

# generate random integer
random_num = random.randint(1,100)

# print random integer
print(random_num)

20
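For a random index, keep in mind that `random.randint()` includes both endpoints, so the upper bound must be the length of the sequence minus one. A minimal sketch with a list of date strings:

```python
import random

# a list of dates as strings
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# both endpoints are included, so the largest valid index is len(dates) - 1
index = random.randint(0, len(dates) - 1)

# use the random index to pick an element
print(index, dates[index])
```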


Instead of drawing a random integer for an index, you can also directly draw a **random element** from a **sequence** with the **function** `random.choice()`. 

Let us try to draw a **random date** from a **list** of dates. 

In [11]:
# seed 
random.seed(2019)

# define dates
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# draw random date
random_date = random.choice(dates)

# print random date
print(random_date)

28/07/2016
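The **random shuffle** mentioned at the beginning of this subsection is available as `random.shuffle()`. It reorders a list **in place** and returns `None`, so copy the list first if you want to keep the original order. A sketch:

```python
import random

dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# copy the list so the original order is preserved
shuffled_dates = dates.copy()

# shuffle the copy in place (the function itself returns None)
random.seed(2019)
random.shuffle(shuffled_dates)

print(shuffled_dates)
```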


### Datetime

The module `datetime` provides various functionalities to work with **dates** and **times**. It lets you parse, create, modify and output dates in almost any **format**. 

Let us first **import** the **module**.

In [12]:
import datetime

To get the **current date and time**, the **function** `datetime.datetime.now()` can be used. The result includes year, month, day, hour, minute, second and microsecond. 

Let us find out the **current date**.

In [13]:
# define current date
date_now = datetime.datetime.now()

# print current date
print(date_now)

2021-05-06 18:51:28.371345


To create a **specific date** you can use the **function** `datetime.datetime()`. The function requires at least parameters for the year, month and day. Optionally, it also takes parameters for the hour, minute, second, microsecond and timezone. 

Let us create the **date** when the **course started**. 

In [14]:
# define date when course started
date_start = datetime.datetime(2021, 2, 12)

# print date when course started
print(date_start)

2021-02-12 00:00:00


If you want to jump from a certain date to another date and you know the **difference** between them, you can use the **function** `datetime.timedelta()` to represent that time difference and add it to the date. It accepts the keyword arguments `weeks`, `days`, `hours`, `minutes`, `seconds`, `milliseconds` and `microseconds`. 

Let us determine the date of the **second day** of the course from the first day.

In [15]:
# define date when course started
date_start = datetime.datetime(2021, 2, 12)

# define date delta
date_day = datetime.timedelta(days = 1) 

# compute date of day 2
date_second = date_start + date_day

# print date of day 2
print(date_second)

2021-02-13 00:00:00
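Conversely, subtracting one `datetime` from another yields a `timedelta`, whose attribute `days` gives the number of whole days in between. A sketch with an arbitrarily chosen end date:

```python
import datetime

# start of the course and an arbitrary later date
date_start = datetime.datetime(2021, 2, 12)
date_end = datetime.datetime(2021, 5, 21)

# subtracting two datetimes gives a timedelta
difference = date_end - date_start

print(difference)       # 98 days, 0:00:00
print(difference.days)  # 98
```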


The dates you have created so far are `datetime` objects. To turn them into **strings**, you have to use the **method** `strftime()`. This method also lets you specify the **format** in which the **date** should be returned. For example, the date format `DD/MM/YYYY` is written as `%d/%m/%Y`. There are many other format codes to build your own date format, as listed in the [Python documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior). 

Let us write the date of the **first day** as a **string**. 

In [16]:
# define date of day 1
date_start = datetime.datetime(2021, 2, 12)

# define date as string in specific format
date_str = date_start.strftime("%d/%m/%Y")

# print date of day 1 as string
print(date_str)

12/02/2021


Conversely, you can also **parse** a **string** in any date format into a `datetime` object. For that, the function `datetime.datetime.strptime()` takes the **string** with the **date** and its corresponding **date format**, and returns the **date**. With these functions you can perform all the steps shown so far and many others.

Let us load the **date string** of the **first day** with **datetime**. 

In [17]:
# define date string
date_str = "12/2/2021"

# parse date string as date
date_par = datetime.datetime.strptime(date_str, "%d/%m/%Y")

# print parsed date
print(date_par)

2021-02-12 00:00:00
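Because `strftime()` and `strptime()` use the same format codes, they undo each other. The sketch below converts a date to a string and parses it back, which is a quick way to check a format string; it also shows that a string not matching the format raises a `ValueError`.

```python
import datetime

date_format = "%d/%m/%Y"
date_start = datetime.datetime(2021, 2, 12)

# datetime -> string
date_str = date_start.strftime(date_format)
print(date_str)  # 12/02/2021

# string -> datetime, using the same format
date_back = datetime.datetime.strptime(date_str, date_format)
print(date_back == date_start)  # True

# a string that does not fit the format raises a ValueError
try:
    datetime.datetime.strptime("2021-02-12", date_format)
except ValueError as error:
    print("Could not parse:", error)
```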


In [2]:
import datetime

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Repeat the previous task where you wrote a function to sort a list of dates. This time use the datetime module to sort the list. Return the dates in the same date format again. 
</div>

In [10]:
# define global variable
dates = ["13/02/2017", "28/07/2016", "02/04/2013", "30/09/2018", "01/05/2018"]

# define function
def datetime_sort(date_list, date_format="%d/%m/%Y"):
    # parse the strings into datetime objects
    parsed = [datetime.datetime.strptime(date, date_format) for date in date_list]
    # sort chronologically and convert back to strings in the same format
    return [date.strftime(date_format) for date in sorted(parsed)]

# call function
new_dates = datetime_sort(dates)

# print dates
for date in new_dates:
    print(date)

02/04/2013
28/07/2016
13/02/2017
01/05/2018
30/09/2018


In [6]:
import random

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Write a function that generates a random date. Define two additional parameters which take strings with the earliest and latest date. The dates handed in and out of the function are strings in the format DD/MM/YYYY. Use the modules random and datetime.
</div>

In [9]:
def random_date(earliest="01/01/2000", latest="21/05/2021"):
    # parse the boundary strings into datetime objects
    start = datetime.datetime.strptime(earliest, "%d/%m/%Y")
    end = datetime.datetime.strptime(latest, "%d/%m/%Y")
    # move a random fraction of the way from start to end
    random_dt = start + random.random() * (end - start)
    # return the result in the same DD/MM/YYYY format
    return random_dt.strftime("%d/%m/%Y")

random_date()

'31/03/2004'

### Re

The module `re` allows us to work with **regular expressions**, which are **search patterns** for **strings**. Regular expressions can be as complex as needed, so they can also describe **complicated** and **varying character sequences**. Once you have written a regular expression, you can check whether a string matches it, whether it occurs inside another string, and what the matching part looks like. 

Let us first **import** the **module**.

In [53]:
import re

First, we need to get an idea of how to **write** a **regular expression**. In regular expressions, some **characters** are reserved for a **special meaning**, i.e. they do not match themselves literally. If you want to match such a special character literally, you need to **escape** it with a **backslash** `\`. We can only discuss the simplest characters here; a few of them are listed in the table below. 

| Character | Description |
| -------- | ------- |
| `.` | any character |
| `^` | starts with |
| `$` | ends with |
| `[]` | set of characters |
| `\|` | either or |
| `*` | zero or more occurrences |
| `+` | one or more occurrences |
| `\d` | digit |
| `\D` | no digit |
| `\w` | word character |
| `\W` | no word character |
| `\s` | whitespace character |
| `\S` | no whitespace character |
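To see the difference between a special character and its escaped version, compare the unescaped dot `.` (any character) with `\.` (a literal dot). Writing patterns as raw strings `r"..."` keeps Python from interpreting the backslashes itself, so they reach `re` unchanged. A sketch:

```python
import re

# "." matches any single character, so both strings match
print(re.match(r"^\d\d.\d\d$", "12.34") is not None)  # True
print(re.match(r"^\d\d.\d\d$", "12a34") is not None)  # True

# "\." matches only a literal dot
print(re.match(r"^\d\d\.\d\d$", "12.34") is not None)  # True
print(re.match(r"^\d\d\.\d\d$", "12a34") is not None)  # False

# "[]" defines a set of characters and "+" allows one or more of them
print(re.match(r"^[abc]+$", "cab") is not None)  # True
print(re.match(r"^[abc]+$", "cad") is not None)  # False
```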

Suppose we want to check whether strings have a certain **date format**. For that, we want to write a suitable **regular expression**. We want the dates to be in the **format** `DD/MM/YYYY`. 

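Let us check with a **regular expression** whether a string has the correct date format. To **match** a string with a regular expression, we use the **function** `re.match()`, which tries to match the pattern at the beginning of the string and returns a match object if it succeeds and `None` otherwise. 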

In [54]:
# define date string in correct format
date_str = "14/10/2019"

# write regular expression as raw string (r"...") so the backslashes reach re unchanged
reg_ex = r"^\d\d/\d\d/\d\d\d\d$"

# match regular expression against string
reg_match = re.match(reg_ex, date_str)

# print whether match
print(reg_match)

<re.Match object; span=(0, 10), match='14/10/2019'>


In [55]:
# define date string in wrong format
date_str = "14-10-19"

# write the same regular expression as above
reg_ex = r"^\d\d/\d\d/\d\d\d\d$"

# match regular expression against string
reg_match = re.match(reg_ex, date_str)

# print whether match
print(reg_match)

None
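`re.match()` only looks at the beginning of a string. To find every occurrence of a pattern anywhere inside a longer text, the module offers `re.findall()`, which returns a list of all matching substrings. In the sketch below the anchors `^` and `$` are dropped, since the dates sit in the middle of the text; the example sentence is made up for illustration.

```python
import re

text = "Valid: 12/02/2021 and 30/09/2018, invalid: 14-10-19."

# same date pattern as above, but without ^ and $ so it can match mid-text
reg_ex = r"\d\d/\d\d/\d\d\d\d"

found_dates = re.findall(reg_ex, text)
print(found_dates)  # ['12/02/2021', '30/09/2018']
```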


### Csv

The **module** `csv` provides functionality to **read** and **write** **CSV files**. This way you can conveniently get information **into** and **out** of your **program**. CSV files store tabular data as **c**omma-**s**eparated **v**alues: each line is a data record, and each record consists of one or more fields separated by commas. 

Let us first **import** the **module**.

In [56]:
import csv

Suppose we have calculated results in our code that we want to save. For this, we use a CSV file, so we first have to open the file we want to write to. The most common way to **open** a **file** is the **function** `open()` inside a `with` **block**, which binds the opened file to a **variable name** and closes the file automatically at the end of the block. For CSV files, the [Python documentation](https://docs.python.org/3/library/csv.html) recommends passing `newline=""` to `open()`; otherwise extra blank rows can appear, for example on Windows. Inside the `with` block, you create a **writer** with `csv.writer()`, which writes your information into the file. There you can also specify the **format** of your CSV file, e.g. its delimiter. Once the writer is created, you can use its **method** `writerow()` to **write** a **sequence of values** as one row into the file. 

Let us try to **write** some **test data** into a CSV file. 

In [57]:
# open file (newline="" lets the csv module handle line endings itself)
with open(file="test.csv", mode="w", newline="") as csv_file:
    # create writer
    writer = csv.writer(csv_file, delimiter=',')
    
    # write row with header
    writer.writerow(["number1", "number2", "number3", "number4", "number5"])
    
    # write row with zeros
    writer.writerow([0,0,0,0,0])
    
    # write row with random numbers and list comprehension
    writer.writerow([random.random() for i in range(5)])

You can also **load data** from a CSV file in the same way. This works almost exactly the same, except that the file is opened in **read mode** `r` and a **reader** is created with `csv.reader()`. You can then iterate over the **reader**, e.g. with a `for` loop, to get the rows of the CSV file one by one as lists of strings. 

Let us try to **read** our **test data** from the CSV file again. 

In [58]:
# open file
with open(file="test.csv", mode="r", newline="") as csv_file:
    # create reader
    csv_reader = csv.reader(csv_file, delimiter=',')
    
    # iterate over rows in file
    for row in csv_reader:
        # print each row as a list of strings
        print(row)

['number1', 'number2', 'number3', 'number4', 'number5']
['0', '0', '0', '0', '0']
['0.6488539744120456', '0.8325765404421139', '0.9888735139157468', '0.9322143923769549', '0.7933123406870071']
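Besides `csv.writer()` and `csv.reader()`, the module also provides `csv.DictWriter()` and `csv.DictReader()`, which work with dictionaries keyed by the header row instead of plain lists. The sketch below writes to and reads from an in-memory text buffer (`io.StringIO`) instead of a file, so it runs without creating `test.csv`; the column names and values are hypothetical. As with `csv.reader()`, every value comes back as a string.

```python
import csv
import io

# in-memory text buffer that behaves like a file opened for writing
buffer = io.StringIO()

# the header row is taken from fieldnames
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerow({"name": "Anna", "score": 0.5})
writer.writerow({"name": "Ben", "score": 0.75})

# jump back to the start of the buffer before reading
buffer.seek(0)

# each row becomes a dictionary keyed by the header
reader = csv.DictReader(buffer)
rows = list(reader)

for row in rows:
    print(row["name"], row["score"])

# all values come back as strings
print(type(rows[0]["score"]))  # <class 'str'>
```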


<div class="alert alert-block alert-info">
    <b>Exercise</b>: Download the table available <a href="https://raw.githubusercontent.com/therbootcamp/BaselRBootcamp_2018July/master/_sessions/_data/baselrbootcamp_data/%5EGDAXI.csv">here</a>. Import the data as a nested list using the module csv. Inspect the data. 
</div>

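In [19]:
# open file (assumes the downloaded table was saved as GDAXI.csv in the working
# directory; otherwise open() raises a FileNotFoundError)
with open(file="GDAXI.csv", mode="r", newline="") as csv_file:
    # create reader
    csv_reader = csv.reader(csv_file, delimiter=',')
    # create list
    gdaxi = []
    
    # iterate over rows in file
    for row in csv_reader:
        # add row to list
        gdaxi.append(row)

# inspect the data: header row, first data rows and number of rows
print(gdaxi[0])
print(gdaxi[1:4])
print(len(gdaxi))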

## 3.5 External Packages

One of the major advantages of Python is its variety of **external packages**. In addition to the built-in modules, there is an enormous number of external packages. These have to be **installed** before they can be imported and used, for example with the package manager `pip` as in the cells below; once installed, they are imported just like built-in modules. The following table shows which **packages** we will use today and in the rest of the week. Note that the name used to install a package can differ from the name used to import it. 

| Package (`pip install`) | Import name | Description |
| -------- | ------- | ------- |
| `numpy` | `numpy` | scientific computing with arrays |
| `scikit-learn` | `sklearn` | machine learning |
| `pandas` | `pandas` | data structures and data analysis |
| `matplotlib` | `matplotlib` | plotting and figures |
| `beautifulsoup4` | `bs4` | parsing of web pages |


In [None]:
pip install numpy

In [None]:
pip install scikit-learn

In [None]:
pip install pandas

In [None]:
pip install matplotlib

In [None]:
pip install beautifulsoup4

Original Source: 
Python Block Course; Prof. Dr. Karsten Donnay, Stefan Scholz; Winter Term 2019 / 2020