# Python Workshop
# Session 1: Python Basics and Programming Fundamentals

Stefan Scholz

In this first session we will learn how to use **Jupyter Notebooks** and other ways to execute Python code. Along the way we will see how basic **functions** are executed, how information is stored in **variables** and how code is structured with **conditional** and **control flow statements**. We will also see how to use **object-oriented programming** with **classes** and **modules**.

## 1.1 Python

Python is a programming language that enjoys great popularity because of various advantages. Some advantages are:

- Fast to Learn
- Easy to Write
- Flexible
- Extensive Library Support
- Hidden Low-Level Aspects of Computer Architecture

This makes Python an ideal basis for beginners like us to start programming because of its simplicity. But beyond that it is also used by experts worldwide, e.g. in machine learning and software development, because of its extensive libraries.


### Installation

- Install from [python.org](https://docs.python.org/3/using/index.html)
- On Linux: already installed or installable as package of the Linux Distribution (Debian, Ubuntu, Red Hat, SuSE, etc.)
- Otherwise: it's recommended to rely on a distribution which bundles the Python interpreter with common Python modules and tools - esp. [Anaconda](https://docs.anaconda.com/anaconda/), a distribution of Python and R for scientific computing

## 1.2 Development Environments

### Jupyter Lab

Jupyter Lab is a very useful interface for Python. Inside Jupyter Lab you can work with Jupyter Notebooks, just like the one you are looking at right now. These notebooks are one of the ways to write and execute Python code. Their major advantage over other ways is that code can be written very interactively, because the entire code can be divided into smaller snippets, and each snippet is displayed directly next to its output.

A Jupyter notebook consists of separate text cells and code cells. A cell can be executed either by clicking on the play button in the tool bar on top or by pressing __shift + enter__.

If you make a mistake you can always edit the content of a cell and execute it again to update it. You can insert additional cells by clicking on the plus button in the tool bar on top.


### Editors

A good editor or an [integrated development environment (IDE)](https://en.wikipedia.org/wiki/Integrated_development_environment) will speed up coding by providing autocompletion, syntax highlighting and syntax checking. If your code gets bigger, an IDE supports the development by automated builds and deployments of the code, a runtime for tests and a visual debugger to locate errors ("bugs") in your code.

Fortunately, there are many good IDEs available for Python. To list just a few:

- [PyDev](https://www.pydev.org/)
- [Visual Studio Code](https://code.visualstudio.com/docs/languages/python)
- [PyCharm](https://www.jetbrains.com/pycharm/) (commercial)


### Virtual Environments

Why do you need encapsulated environments to run applications or projects? The documentation of the [Python virtual environments](https://docs.python.org/3/tutorial/venv.html) explains:

> Python applications will often use packages and modules that don’t come as part of the standard library. Applications will sometimes need a specific version of a library, because the application may require that a particular bug has been fixed or the application may be written using an obsolete version of the library’s interface.
>
> This means it may not be possible for one Python installation to meet the requirements of every application. If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.

1. create a virtual environment in the current directory in the subfolder `.env/`
   ```
   python3 -m venv .env
   ```
2. activate the environment
   ```
   source .env/bin/activate
   ```
3. install packages (placed below `./.env/`)
   ```
   pip install ...
   ```
4. run Python...
5. deactivate the environment
   ```
   deactivate
   ```

If more than Python modules are project-specific, [Docker](https://docs.docker.com/get-started/) allows you to bundle a Python interpreter (e.g. an older version), specific modules and additional software, pack them as a runtime image and run it in a "container" without the need to install anything on the host system.


### Google Colaboratory

[Google Colaboratory](https://colab.research.google.com/) or "Colab" is a Jupyter notebook environment running in the Google cloud. The notebooks are stored on Google Drive. Basic usage is free but requires a Google user account. The paid "Colab Pro" allows you to use more hardware resources (RAM, CPU, GPU, TPU) and to run the notebooks without being connected to a web browser.

Colab supports [loading notebooks on Github](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb). To load one of the workshop notebooks, please navigate to https://colab.research.google.com/github/stefan-scholz/python-workshop-2023/blob/main/.

Running a notebook in the cloud requires that analyzed data is uploaded to the cloud or is available online. This might be a hurdle if the data is private or sensitive. In order to load the workshop data from Github into the workspace of a Colab notebook, simply add a cell to the beginning of a notebook with the following two instructions:

```
!git clone https://github.com/stefan-scholz/python-workshop-2023.git
%cd /content/python-workshop-2023/
```

<div class="alert alert-block alert-info">
    <b>Discussion</b>: Which Python environment are you planning on using?
</div>

## 1.3 Calculations

As you will see, you can use Python like a simple calculator. You can use the arithmetic operators `+`, `-`, `*`, `/`  and parentheses `()` just like you would expect for a normal calculator. The Python symbol to calculate powers is `**`.

Let us give it a try: What is the result of $\sqrt{5-3}$?

In [None]:
(5-3)**(1/2)

## 1.4 Built-In Functions

A convenient way to get things done in Python is to use functions. Functions are indicated by round brackets which are appended directly after the name of the function. Inside the brackets the input is handed over to the function.

We have seen above that after our simple calculation the result is automatically shown, but in practice it is better to request the output explicitly with a function. This is done with the function `print()`, which prints all its inputs on the screen.

As we can see, the following code gives us the same result, but we make clear that we want the result to be printed.

In [None]:
# print result
print((5-3)**(1/2))

However, the function `print()` can do much more than just print simple inputs. It allows you to combine multiple inputs, either as a list of inputs or as a formatted string.

In [None]:
# print as list of inputs
print("The result is:", (5-3)**(1/2))

In [None]:
# print as formatted string
print("The result is: {}".format((5-3)**(1/2)))

What else you can do with the function `print()` can be found out with another function: the function `help()` shows the purpose of a function, its input parameters and their descriptions.

The help for the function `print()` looks as follows, where `value` stands for the input arguments we have passed to the function so far.

In [None]:
help(print)
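
Among the parameters listed by `help(print)` are `sep` and `end`. As a small sketch of what they do, the following cell reuses the calculation from above; the example texts are made up for illustration.

```python
# sep sets the text placed between the inputs (default: a single space)
print("The result is", (5-3)**(1/2), sep=": ")

# end sets the text printed after the last input (default: a line break)
print("Hello", end=" ")
print("world")
```

The last two calls end up on the same line, because the first of them finishes with a space instead of a line break.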

If you do not know a certain function, or if you are uncertain, then please use the function `help()` or check it on the internet.

To name a few more functions besides `print()` and `help()`, these are also popular functions:

| Function | Purpose |
| -------- | ------- |
| `abs()` | absolute value of the argument |
| `dir()` | list of attributes and methods of an object |
| `len()` | number of items in a container |
| `max()` | with a single iterable argument, return its biggest item |
| `min()` | with a single iterable argument, return its smallest item |
| `open()` | open file and return a stream |
| `range()` | produces a sequence of integers from start (inclusive) to stop (exclusive) by step |
| `round()` | round a number to a given precision in decimal digits (default 0 digits) |
| `sorted()` | new list containing all items from the iterable in ascending order |
| `sum()` | sum of iterable of numbers |
| `type()` | type of an object |
| `zip()` | iterator of tuples, where the i-th tuple contains the i-th element of each iterable argument |

For a complete list of functions which are always available in the Python interpreter, please have a look at the [Python documentation](https://docs.python.org/3/library/functions.html).
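
To get a feeling for some of the functions in the table, here is a minimal sketch. The list `numbers` and its values are made up for illustration; lists themselves are introduced in the section on data types.

```python
# a small list of example values (made up for illustration)
numbers = [3, -7, 10, 2]

# number of items, biggest and smallest item, and sum
print(len(numbers), max(numbers), min(numbers), sum(numbers))

# new list in ascending order, the original list stays unchanged
print(sorted(numbers))

# absolute value and rounding to two decimal digits
print(abs(-7), round(3.14159, 2))

# integers from 0 (inclusive) to 10 (exclusive) in steps of 2
print(list(range(0, 10, 2)))

# pair up the i-th elements of two lists
print(list(zip([1, 2, 3], ["a", "b", "c"])))
```

Note that `range()` and `zip()` do not return lists directly, which is why they are wrapped in `list()` before printing.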

## 1.5 Variables

For more complex calculations it is convenient to store numbers as variables. As a variable name, any combination of letters, digits, and underscores that does not start with a digit can be used in principle. For readability purposes, it is recommended to use lowercase words separated by underscores as variable names. The equals sign `=` is used to assign a value to a variable.

Let us try it with variables again: What is the result of $\sqrt{5-3}$?

In [None]:
# define variables
base = 5 - 3
exponent = 1 / 2

# calculate result
result = base ** exponent

# print result
print(result)

Note that the `=` in code is similar to the equals sign `=` in math, but it is not quite the same. Python always evaluates the code on the right of the equals sign and assigns the result to the variable on the left.

So while the first snippet works, the second one does not:

In [None]:
# define variable correctly
base = 5 - 3

In [None]:
# define variable incorrectly
5 - 3 = base

However, variables can also be used on the right side. Note that the right side is evaluated first and then assigned to the variable.

In [None]:
# define base with right side variable
base = 5
base = base - 3

# print base
print(base)

Here, we took the value of `base`, subtracted 3 from it and then assigned the result back to the same variable name `base`. As a shortcut for such an operation we can use the operator for subtraction assignment `-=`.

In [None]:
# define base with subtraction assignment
base = 5
base -= 3

# print base
print(base)

Such shortcuts, the so-called assignment operators, exist for all common arithmetic operators:

| Operator | Description |
| -------- | ------- |
| `+=` | adds right side to left variable and assigns the result to left variable |
| `-=` | subtracts right side from left variable and assigns the result to left variable |
| `*=` | multiplies left variable by right side and assigns the result to left variable |
| `/=` | divides left variable by right side and assigns the result to left variable |
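
A short sketch of how these operators behave when they are applied one after another. The variable name `value` and the numbers are chosen for illustration only.

```python
# start value
value = 10

# add 5, the variable now holds 15
value += 5

# multiply by 2, the variable now holds 30
value *= 2

# divide by 4, the variable now holds 7.5
value /= 4

# print final value
print(value)
```

Note that `/=` always produces a decimal number, even if the division leaves no remainder.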

As we said earlier, always make sure that your variable names do not start with a digit, are written in lowercase words separated by underscores and clearly identify your variable. Pay attention to upper and lower case letters as well, because variable names are case sensitive. Also make sure that none of your variable names coincides with one of the keywords which are reserved and have a specific meaning.

The following set of keywords cannot be used in Python as variable names:

In [None]:
import keyword

# print keywords occupied by interpreter
print(keyword.kwlist)

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Compute numerically the sum and average of all integers from 0 to 1,000. Use variables for all intermediate results. Print your results.
</div>

## 1.6 Data Types

In Python, every value has a certain data type. In fact, every data type is a class and every value is an instance of one of these classes. When we define a variable, we do not need to explicitly mention the data type. This feature is known as dynamic typing.

We will discuss the standard data types in Python step by step. But first let us get an overview of them all:

| Data Type | Description |
| -------- | ------- |
| `Integer` | integer number |
| `Float` | floating point number |
| `Boolean` | truth value (either true or false) |
| `String` | text |
| `Array` | mutable sequence of values with same type |
| `Tuple` | immutable sequence of values of any types |
| `List` | mutable sequence of values of any types |
| `Dictionary` | associative mapping with keys and values |
| `Set` | unordered set of distinct values |

### Integer and Float

We start with the numerical types because we have already got to know them.

The first is `Integer`, which can hold whole numbers, positive or negative. `Integer` cannot hold decimals. In order to use decimals, a `Float` must be used. `Float` can also hold positive and negative numbers.

Let us try this with an example. To verify the type of any object, we use the function `type()`.

In [None]:
# define integer number
base = 5

# check type of integer number
print(type(base))

In [None]:
# define decimal number
result = 1.4142

# check type of decimal number
print(type(result))
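
The two numerical types can be mixed in calculations. The following sketch, with numbers made up for illustration, shows which type the result has and how the functions `int()` and `float()` convert between the two.

```python
# adding two integers gives an integer
whole = 7 + 3
print(whole, type(whole))

# as soon as a float is involved, the result is a float
mixed = 7 + 3.0
print(mixed, type(mixed))

# the division operator / always returns a float
quotient = 6 / 3
print(quotient, type(quotient))

# int() cuts off the decimal places, float() turns a whole number into a float
print(int(2.9), float(5))
```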

### Boolean

The next data type is of great importance in programming, as we will see with the conditional and control flow statements later in this session. `Boolean` can hold truth values, which are either `True` or `False`. In a numerical context they behave like `1` and `0`.

Let us initialize our first boolean. To verify the type of any object, we use the function `type()`.

In [None]:
# initialize boolean
difficult = False

# check type of boolean
print(type(difficult))

In [None]:
# initialize boolean
understood = True

# check type of boolean
print(type(understood))
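
That truth values behave like `1` and `0` in a numerical context can be checked directly. This is only a small sketch; the variable name `total` is chosen for illustration.

```python
# True equals 1 and False equals 0
print(True == 1, False == 0)

# truth values can be added like numbers
total = True + True + False

# print number of True values
print(total)
```

This is handy for counting how many conditions are fulfilled.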

### String

We have already seen the next data type too, as we printed text with the function `print()`. In general, text can be held in `String`, with the text surrounded either by single quotation marks or double quotation marks. Accordingly `"Hello world"` is the same as `'Hello world'`. To write a `String` over multiple lines, you can use three quotation marks.

Let us write some first texts. To print the texts, we use the function `print()`. To verify the type of any object, we use the function `type()`.

In [None]:
# initialize string
sentence = "I love Python!"

# check type of string
print(type(sentence))
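
The different kinds of quotation marks can be sketched as follows. The texts and variable names are made up for illustration.

```python
# single and double quotation marks create the same string
single = 'Hello world'
double = "Hello world"
print(single == double)

# three quotation marks allow a string over multiple lines
poem = """Roses are red,
violets are blue."""
print(poem)

# number of characters in a string
print(len(single))
```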

### Array

In addition to the primitive data types that we have dealt with so far, there are also data types which are collections of these. Traditionally, the `Array` is the first non-primitive data type that is dealt with. In general, an `Array` in Python is a compact way of collecting basic data types; all the entries in an `Array` must be of the same data type. However, unlike in other programming languages, `Array` is not widely used in Python and not a built-in data type. To work with arrays you would need to `import` additional modules or libraries.

In general, when people talk of an `Array` in Python, they are actually referring to a `List`. However, when we work with the `numpy` library, you will see that there are fundamental differences between them. But at this point, we will first discuss what a `List` is.

### Tuple

But before we actually get to `List`, we will discuss another data type that exists in Python. `Tuple` is another standard sequence data type. The main difference between `Tuple` and `List` is that `Tuple` is immutable, which means once defined you cannot delete, add or edit any values inside it. Due to this property, `Tuple` is mainly suitable for collections which are fixed and will not change.

`Tuple` is written with round brackets, `(` and `)`, and its elements are separated by commas `,`. To access certain values inside your `Tuple`, you can use their indexes.

Let us create our first `Tuple`. To verify the type of any object, we use the function `type()`.

In [None]:
# define tuple
points = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# check type of tuple
print(type(points))
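
Accessing values by index and the immutability of a `Tuple` can be sketched as follows. The `try` and `except` statements are not covered in this session; they are used here only so that the cell keeps running after the expected error.

```python
# define tuple
points = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# indexes start at 0, negative indexes count from the end
first = points[0]
last = points[-1]
print(first, last)

# trying to change a value raises a TypeError
try:
    points[0] = 100
except TypeError as error:
    print("Tuples cannot be changed:", error)
```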

### List

A `List` is a data structure that contains multiple values in an ordered sequence. A `List` is mutable, which means that you can change its content without changing its identity. You can recognize a `List` by its square brackets `[` and `]` that hold elements, separated by commas `,`. `List` is built into Python: you do not need to import anything to use it.

Let us create our first list. To verify the type of any object, we use the function `type()`.

In [None]:
# define list of integers
points = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# check type of list with integers
print(type(points))
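
What it means that a `List` can change its content without changing its identity can be sketched with the built-in function `id()`. The shortened list and the variable names are chosen for illustration.

```python
# define short list and remember its identity
points = [0, 1, 2]
identity_before = id(points)

# add a value at the end
points.append(3)

# replace the first value
points[0] = 10

# delete the second value
del points[1]

# print changed list
print(points)

# the list is still the same object
print(id(points) == identity_before)
```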

### Dictionary

Like a `List`, a `Dictionary` is a collection of many values. Unlike `List`, which is indexed by a range of numbers, `Dictionary` is indexed by keys. Keys for `Dictionary` can use many different data types, not just integers. It is best to think of a dictionary as a set of key-value pairs, with the requirement that the keys are unique (within one dictionary); since Python 3.7, a `Dictionary` keeps its pairs in insertion order. A pair of curly brackets `{}` creates an empty `Dictionary`. Inside them you can add as many key-value pairs as you like, each written in the notation `key: value` and separated by commas `,`.

`Dictionary` is exactly what you need if you want to implement something similar to a telephone book. None of the data structures that you have seen before are suitable for a telephone book.

Let us create our first `Dictionary`. To verify the type of any object, we use the function `type()`.

In [None]:
# define dictionary
happy = {"no": [0, 1, 2, 3, 4], "yes": [5, 6, 7, 8, 9, 10]}

# check type of dictionary
print(type(happy))
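
The telephone book mentioned above could be sketched like this. All names and numbers are made up for illustration.

```python
# define telephone book with names as keys and numbers as values
phone_book = {"Hans": "0151-1234", "Eva": "0160-9876"}

# look up a value by its key
print(phone_book["Hans"])

# add a new key-value pair
phone_book["Adam"] = "0170-5555"

# overwrite the value of an existing key
phone_book["Eva"] = "0160-0000"

# check number of entries and whether a key exists
print(len(phone_book), "Eva" in phone_book)
```

Because keys are unique, assigning to an existing key replaces its value instead of creating a second entry.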

### Set

A `Set` is an unordered and mutable collection of distinct (unique) elements. It is useful to create something like a `List` that can only hold unique values. This is particularly helpful when going through a huge dataset. `Set` objects also support mathematical operations like union `|`, intersection `&`, difference `-`, and symmetric difference `^`. Within a pair of curly brackets `{` and `}`, you can add as many elements to the `Set` as you like, each separated by a comma `,`.

Note, however, that if you want to create an empty `Set`, you do not use the curly brackets because they will create a dictionary, but you rather call the function `set()` explicitly.

In contrast to a `List` or `Tuple`, a `Set` cannot be accessed with indexing or slicing because it is an unordered data type. A `Set` can still be edited, but this only works with methods which we will introduce later.

Let us create a first set anyway and show that the elements are deduplicated automatically.

In [None]:
# define set from a list with duplicates
points = set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10])

# print set to see that duplicates were removed
print(points)

# check type of set
print(type(points))
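
The mathematical operations and the creation of an empty `Set` can be sketched as follows. The two example sets are made up for illustration.

```python
# define two sets
evens = {0, 2, 4, 6, 8, 10}
small = {0, 1, 2, 3, 4}

# union: elements that are in at least one of the sets
union = evens | small

# intersection: elements that are in both sets
intersection = evens & small

# difference: elements of the first set that are not in the second
difference = evens - small

# symmetric difference: elements that are in exactly one of the sets
symmetric_difference = evens ^ small

print(union, intersection, difference, symmetric_difference)

# empty curly brackets create a dictionary, set() creates an empty set
print(type({}), type(set()))
```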

## 1.7 Conditional Statements

So far you know the basics of individual instructions and that a program is just a series of instructions. But the real strength of programming is not just executing one instruction after another. Based on how the expressions evaluate, the program can decide to skip instructions, repeat them, or choose one of several instructions to run. In fact, you almost never want your programs to start from the first line of code and simply execute every line, straight to the end. Control flow statements can decide which Python instructions to execute under which conditions.

For this, we have to understand the different types of operators in the first place.

### Comparison Operator

Comparison operators are used to compare two values and evaluate down to a single `Boolean`. They evaluate either as `True` or `False`. In Python, you can use the following comparison operators.

| Operator | Description  |
| -------- | ------- |
| `==` | equal |
| `!=` | not equal |
| `>` | greater than |
| `<` | less than |
| `>=` | greater than or equal to |
| `<=` | less than or equal to |

Let us try comparison operators out with a little example.

In [None]:
# define age
age = 17

# print first comparison
print("I am allowed to get beer:", age >= 16)

# print second comparison
print("I am allowed to get spirits:", age >= 18)

### Logical Operator

Logical operators are used to combine multiple `Boolean` values. Like comparison operators, they evaluate these expressions down to a single `Boolean`. Since the comparison operators evaluate to `Boolean`, you can use them in expressions with the logical operators. Note that such expressions are often grouped into smaller statements with round brackets `(` `)`. In Python, you can use the following logical operators.

| Operator | Description  |
| -------- | ------- |
| `and` | returns `True` if both statements are `True` |
| `or` | returns `True` if at least one of the statements is `True` |
| `not` | returns `True` if the statement is `False` and vice versa |

Let us put logical operators in our previous example.

In [None]:
# define age
age = 17

# print first comparison
print("I am allowed to get beer and spirits:", age >= 16 and age >= 18)

# print second comparison
print("I am not allowed to get beer and spirits:", not (age >= 16 and age >= 18))

# print third comparison
print("I am allowed to get either beer or spirits:", age >= 16 or age >= 18)

## 1.8 Control Flow Statements

With the help of conditions, we can decide which parts of our code are executed, skipped or repeated. To do this, we write control flow statements. We will now discuss each of these statements one after the other.

### If Statement

The most common type of control flow statement is the `if` statement. The clause of an `if` statement is executed only if the statement's condition is `True` and is skipped if the condition is `False`. The optional `else` clause is executed only when the condition of the `if` statement is `False`. While only one of the `if` or `else` clauses will execute, you may have a case where you want one of many possible clauses to execute. For this there is the optional `elif` statement, an "else if" statement that always follows an `if` or another `elif` statement. The conditions of the `if` and `elif` statements are checked from top to bottom; once a condition is `True`, the respective clause is executed and the remaining statements are skipped.

The syntax of an `if` statement looks as follows.

```python
if condition1:
    clause(s)
elif condition2:
    clause(s)
elif condition3:
    clause(s)
...
else:
    clause(s)
```

Let us make an example where we check our statements before we print the results.

In [None]:
# define age
age = 17

# check conditions and print results
if age < 16:
    print("I am allowed to get water!")
elif age < 18:
    print("I am allowed to get water and beer!")
else:
    print("I am allowed to get water, beer and spirits!")

### For Loop

The `for` statement supports repeated execution of a block of code. We also speak of a `for` loop. The number of repetitions is determined by the iterable object named in the `for` statement, e.g. a `List`. We will also briefly discuss what the `break` and `continue` statements are.

The syntax of a `for` loop looks as follows.

```python
for target in iterable:
    clause(s)
```

Let us make an example where we loop over many ages and print their results.

In [None]:
# define ages
ages = range(0, 19)

# print ages
print(list(ages))

# for loop over ages
for age in ages:
    # check conditions and print results
    if age < 16:
        print("I am {} years old and allowed to get water!".format(age))
    elif age < 18:
        print("I am {} years old and allowed to get water and beer!".format(age))
    else:
        print("I am {} years old and allowed to get water, beer and spirits!".format(age))

### Break Statement

If inside a `for` loop the execution reaches a `break` statement, it immediately exits the `for` loop's clause. The `break` statement is only allowed inside a loop body. It should also be noted that when you have multiple nested `for` loops, only the innermost loop that contains the `break` is exited. A common reason to use a `break` statement is that the desired result has already been reached before the `for` loop has gone through all its items.

Let us make an example where we exit the previous loop.

In [None]:
# define ages
ages = range(0, 19)

# print ages
print(list(ages))

# for loop over ages
for age in ages:
    # check conditions and print results
    if age < 16:
        print("I am {} years old and allowed to get water!".format(age))
    elif age < 18:
        print("I am {} years old and allowed to get water and beer!".format(age))
        break
    else:
        print("I am {} years old and allowed to get water, beer and spirits!".format(age))
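
That a `break` statement exits only the innermost loop can be sketched with two nested loops. The groups, the drink counter and the list `visited` are made up for illustration.

```python
# collect all combinations that were actually reached
visited = []

# outer loop over groups
for group in [1, 2]:
    # inner loop over drinks
    for drink in range(1, 6):
        # stop the inner loop after two drinks, the outer loop goes on
        if drink > 2:
            break
        visited.append((group, drink))

# print reached combinations
print(visited)
```

Both groups appear in the result, so the outer loop was not affected by the `break` statement.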

### Continue Statement

If inside a `for` loop the execution reaches a `continue` statement, it immediately ends the current iteration of the loop body and continues with the next iteration of the loop. The `continue` statement is only allowed inside a loop body. It should also be noted that when you have multiple nested `for` loops, only the innermost loop is continued. A common reason to use a `continue` statement is that it is already clear within an iteration that it cannot contribute to the result, so the rest of this iteration can be skipped.

Let us make an example where we skip some iterations.

In [None]:
# define ages
ages = range(0, 19)

# print ages
print(list(ages))

# for loop over ages
for age in ages:
    # check conditions and print results
    if age < 16:
        continue  # the print below is never reached for these ages
        print("I am {} years old and allowed to get water!".format(age))
    elif age < 18:
        print("I am {} years old and allowed to get water and beer!".format(age))
    else:
        print("I am {} years old and allowed to get water, beer and spirits!".format(age))

### While Loop

In the `while` statement, the condition is checked at the start of each iteration, that is, each time before the loop body is executed. We also speak of a `while` loop. If the condition is `True`, then the clause is executed, and afterward, the condition is checked again. The first time the condition is found to be `False`, the `while` clause is skipped and the loop ends. If the execution reaches a `break` statement, it immediately exits the `while` loop's clause. When the program execution reaches a `continue` statement, the program execution immediately jumps back to the start of the loop and reevaluates the loop's condition. This is also what happens when the execution reaches the end of the loop.

The syntax of a `while` loop looks as follows.

```python
while condition:
    clause(s)
```

Let us make an example where we loop over many ages and print their results.

In [None]:
# define initial age
age = 0

# while loop over ages
while age < 19:
    # check conditions and print results
    if age < 16:
        print("I am {} years old and allowed to get water!".format(age))
    elif age < 18:
        print("I am {} years old and allowed to get water and beer!".format(age))
    else:
        print("I am {} years old and allowed to get water, beer and spirits!".format(age))
    # increment age
    age += 1
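
A `break` statement inside a `while` loop can be sketched as follows. The condition `True` would let the loop run forever, so the loop relies entirely on the `break` statement to stop; the age limit of 16 is taken from the examples above.

```python
# define initial age
age = 0

# while loop that only ends with the break statement
while True:
    # exit the loop as soon as beer is allowed
    if age >= 16:
        break
    # increment age
    age += 1

# print age at which the loop was exited
print("I am {} years old and allowed to get beer!".format(age))
```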

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Suppose Hans goes out for a pub crawl tonight. He is 18 years old, drinks only Tequilas and has 35 Euros in his pocket. If he joins group 1 first, then he goes into the Klimperkasten and wants to drink at least 5 tequilas there before he moves on to the Steigenberger. But if he joins group 2 first, then he goes into the Steigenberger and wants to drink at least 5 tequilas there before he moves on to the Klimperkasten. A Tequila costs 4 € in the Klimperkasten and 10€ in the Steigenberger. Define budget, prices and the first group as variables. Prompt where Hans goes, what he drinks and how much money he has left. How does Hans' evening go?
</div>

## 1.9 Functions

User-defined functions serve to reuse code blocks, so that the same code block can be executed several times. With functions, the code can be made more understandable and modular. In addition, functions offer flexibility through their input parameters, so that the same code can be applied to different inputs.

To declare your own function in Python, you must first write the keyword `def`, then the name of the function itself, followed by the parameters in round brackets `(` `)`, and end the declaration with a colon `:`. This is followed by the indented code to be executed in the function. At the end of the code there can be a `return` statement, which returns one or more values, so that they can be used outside the function.

Let us look at the syntax of a function.

```python
def function_name(argument_1, argument_2):
    {this is the code in the function}
    {more code doing something with the arguments}
    {more code}
    return {value to return to the main program}
```

Suppose we have a list of students we want to welcome personally. We could write the greeting for each student individually, or better automate it with a function.

In [None]:
# define function
def hello(name):
    print("Hello {}!".format(name))

# define list of students
students = ["Hans", "Adam", "Christine"]

# loop over students
for student in students:
    hello(student)

In the function above, `name` is a required parameter. If we called the function without a name, Python would raise a `TypeError`. However, if we set a default value for the parameter, this default is used in case no argument is passed.

Let us set a default parameter.

In [None]:
# define function
def hello(name = "Lisa"):
    print("Hello {}!".format(name))

# call function without name
hello()

# call function with name
hello("Eva")
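
The functions so far only print text. A `return` statement hands values back to the code that called the function, as the following sketch shows. The function names `square_root` and `min_and_max` are hypothetical and chosen for illustration.

```python
# define function that returns one value
def square_root(number):
    return number ** (1 / 2)

# store returned value in a variable
result = square_root(4)
print(result)

# define function that returns two values
def min_and_max(values):
    return min(values), max(values)

# store both returned values in separate variables
smallest, biggest = min_and_max([3, -7, 10, 2])
print(smallest, biggest)
```

Returning several values separated by commas actually returns a single `Tuple`, which is unpacked into the two variables on the left side.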

## 1.10 Classes and Objects

Object-oriented programming, or OOP for short, is a programming paradigm that structures programs so that properties and behaviors are bundled into individual objects. Another common paradigm is procedural programming, which structures a program like a recipe: it provides a set of steps, in the form of functions and code blocks, which are executed sequentially to complete a task. In object-oriented programming, objects are at the center: they not only represent the data, as in procedural programming, but also shape the overall structure of the program. An object can be anything that has some characteristics and functions. Each object is an instance of some class, and classes are used to create new user-defined data structures that contain arbitrary information about something.

Object-oriented programming offers the following advantages:

- If code is written in classes, it can be shared by multiple instances and reused multiple times.
- The modular structure, in which classes are strictly separated, ensures clear and maintainable code.
- Through the logical separation of each object, possible errors can be traced back to the actual problem more easily, especially with highly nested code.
- When users write their code, it is clearer which object and which data they are working with.

Despite these advantages, object-oriented programming also has a few drawbacks:

- With the amount of code and the number of classes, also the overall complexity of the program increases.
- If real-world objects and their relations are unclear, then it can be very difficult to find an object-oriented structure.

In a moment you will see some examples of what object-oriented programming may look like.


### Classes

A class is a blueprint for an object. A class contains all the attributes and methods related to the real-world object. With the keyword `class` you can create such a class, followed by the class name and a colon `:`. You can initialize an object of a class by calling the class exactly like a function. You can then access the attributes and methods of this object via attribute references `object.attribute`, as is common in Python.

In the simplest case, you can initialize an instance like a parameterless function and assign it to a variable. Then you can work with the instance using the corresponding variable. Such an object is empty, because it is not specialised further during its initialization. But often we want to pass certain attributes to new objects. To do this, you can write an initialization method `__init__`. When you create your object, you can pass the parameters for that specific object into the class. This syntax is basically the same as with a function again.

In the initialization method, `self` refers to the newly created object. In any other method, it refers to the instance whose method was called. However, this is nothing more than a convention: the name `self` has absolutely no special meaning to Python.
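To illustrate that `self` is only a convention, here is a small sketch in which the first parameter is called `this` instead. The class `Counter` is a hypothetical example; in real code you should stick to `self`, because every Python programmer expects it.

```python
class Counter:

    # `this` plays the role that `self` usually has
    def __init__(this, start=0):
        this.count = start

    def increment(this):
        this.count += 1
        return this.count


counter = Counter()

# the instance is passed automatically as the first argument
print(counter.increment())

# calling the method via the class makes that argument explicit
print(Counter.increment(counter))
```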

Let us create a class for estimating sentiments with one attribute and four methods.

In [None]:
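class Sentiment:

    # class attribute: the set of supported values, shared by all instances
    values = {"sad", "neutral", "happy"}

    def __init__(self, value="neutral"):
        # reject any value that is not supported
        if value not in Sentiment.values:
            raise ValueError("Only the following values are supported: %s" % Sentiment.values)
        # store the sentiment value on the new object
        self.value = value

    def __repr__(self):
        # string representation, used e.g. by print()
        return self.value

    def get(self):
        # return the stored sentiment value
        return self.value

    @staticmethod
    def guess(text):
        # static method: called on the class itself, without `self`
        if "happy" in text or "excited" in text:
            return Sentiment("happy")
        if "sad" in text or "angry" in text:
            return Sentiment("sad")
        return Sentiment("neutral")

# guess the sentiment of a text and print the resulting object
im_feeling = Sentiment.guess("I'm really happy!")

print(im_feeling)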

## 1.11 Modules

We already looked at some built-in functions. But there are many more functions available in Python. All you have to do is import the modules that contain these additional functions, classes and variables. Each module provides functionality in a certain category. The modules of the standard library ship with Python and can be imported directly without the need to install them.

The syntax for importing a module looks as follows. By convention, this statement is placed at the beginning of the code so that the module's functionality is available in the code that follows.

```python
import module
```
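Besides the plain `import` statement, there are two common variants: importing a single name from a module and importing a module under an alias. The following sketch shows all three forms with the standard library modules `math` and `datetime`; the alias `dt` is just a frequently used abbreviation, not a requirement.

```python
# import the whole module and access its content via the module name
import math
print(math.sqrt(16))

# import a single name from a module and use it directly
from math import pi
print(pi)

# import a module under a shorter alias
import datetime as dt
print(dt.date(2019, 1, 1).year)
```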

We will discuss a few of these modules and have a quick look at some use cases soon. Here is a first overview of these modules:

| Module | Description |
| -------- | ------- |
| `os` | Miscellaneous operating system interfaces |
| `random` | Generate pseudo-random numbers |
| `datetime` | Basic date and time types |
| `re` | Regular expression operations |
| `csv` | Read and write tabular data in CSV format |


### Os

The module `os` provides dozens of functions for operating system tasks. For example, files and directories can be located, deleted or created.

Let us first import the module.

In [None]:
import os

To find out where on your file system Python is working, that is, what its current working directory is, you can use the function `os.getcwd()`.

Let us find out the path of our current working directory.

In [None]:
# define working directory
work_dir = os.getcwd()

# print working directory
print(work_dir)

Next, we may be interested in which files are in our working directory. The function `os.listdir()` does this job; it returns the names of all files and subdirectories in the given directory.

Let us see which files are in our working directory.

In [None]:
# define working directory
work_dir = os.getcwd()

# define files in working directory
work_files = os.listdir(work_dir)

# print files in working directory
print(work_files)
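Creating and deleting files and directories, as mentioned at the start of this section, could look like the following sketch. It works inside a temporary directory from the standard library module `tempfile`, so that none of your real files are touched; the names `demo` and `notes.txt` are arbitrary.

```python
import os
import tempfile

# work inside a temporary directory so that no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:

    # create a new subdirectory
    demo_dir = os.path.join(tmp_dir, "demo")
    os.mkdir(demo_dir)

    # create a small text file inside the subdirectory
    demo_file = os.path.join(demo_dir, "notes.txt")
    with open(demo_file, "w") as f:
        f.write("hello")

    # check that the file exists and list the directory content
    file_exists = os.path.exists(demo_file)
    files_before = os.listdir(demo_dir)
    print(file_exists, files_before)

    # delete the file again and list the directory content once more
    os.remove(demo_file)
    files_after = os.listdir(demo_dir)
    print(files_after)
```

`os.path.join()` builds the path with the separator of your operating system, which keeps the code portable between Windows, macOS and Linux.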

### Random

The module `random` implements a generator for pseudo-random numbers. It can be used, for instance, to return a random number between 0 and 1, return a random integer from a certain range, pick a random element from a list, or shuffle a list randomly.

Let us first import the module.

In [None]:
import random

As noted above, these numbers are only pseudo-random: they are produced by a deterministic algorithm that starts from a seed. By default, the seed is taken from a randomness source of the operating system or, if none is available, from the current system time. That means you will almost certainly get different numbers each time you run your code. To make your code deterministic, call the function `random.seed()` with a fixed value such as an `int` or a `str`. The seed determines the first random number; all subsequent numbers form a chain in which each builds on the previous state. With the function `random.random()`, you can draw numbers from the interval `[0, 1)`.

Let us first consider the stochastic and then the deterministic behavior of random numbers without and with a seed.

In [None]:
# generate random number
random_num = random.random()

# print random number
print(random_num)

In [None]:
# seed
random.seed(2019)

# generate random number
random_num = random.random()

# print random number
print(random_num)
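The other use cases mentioned at the start of this section can be sketched in the same way. The following cell uses `random.randint()`, `random.choice()` and `random.shuffle()`, and resets the seed before each round to show that the same seed reproduces the same results. The list `colors` and the helper function `draw` are made up for this illustration.

```python
import random

colors = ["red", "green", "blue", "yellow"]


def draw():
    # same seed, hence the same chain of pseudo-random numbers
    random.seed(2019)

    # random integer between 1 and 6, both ends included
    number = random.randint(1, 6)

    # random element of the list
    pick = random.choice(colors)

    # shuffle a copy of the list in place
    shuffled = colors.copy()
    random.shuffle(shuffled)

    return number, pick, shuffled


first = draw()
second = draw()

print(first)
print(first == second)
```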

### Custom Modules

Object-oriented programming is used particularly often in large packages. But even for smaller projects it is worthwhile to organize the code in packages. The structure of a package allows the code and its classes to be organized in folders and files that can refer to one another. In the end, you can use the code very simply: as with all large modules and packages, you import the package and its code is available.

We will demonstrate with a small demo package how you can easily package your previously written classes. The best way to start is with the structure of the folders and files. The following is an example of a possible folder hierarchy:

```text
.
└── scripts
    └── sentiment.py
```

Let's copy the definition of the class `Sentiment` into a file [sentiment.py](./scripts/sentiment.py) in the folder `scripts` and load the class.

In [None]:
from scripts.sentiment import Sentiment

Sentiment()
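If you want to try the mechanism without touching your project folder, the following sketch builds a tiny package in a temporary directory and imports from it. The package name `demo_pkg`, the module `tools.py` and the function `shout` are placeholders, and adding the folder to `sys.path` is only needed here because the temporary directory is not the working directory of the notebook.

```python
import importlib
import os
import sys
import tempfile

# source code of a tiny module; the names are placeholders
module_code = '''
def shout(text):
    return text.upper() + "!"
'''

with tempfile.TemporaryDirectory() as project_dir:

    # build the folder structure: <project_dir>/demo_pkg/tools.py
    package_dir = os.path.join(project_dir, "demo_pkg")
    os.mkdir(package_dir)
    with open(os.path.join(package_dir, "tools.py"), "w") as f:
        f.write(module_code)

    # make the project folder visible to the import system
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()

    # import from the module by its dotted path: folder.file
    from demo_pkg.tools import shout

    result = shout("it works")
    print(result)

    # clean up the import path again
    sys.path.remove(project_dir)
```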

### External Modules

One of the major advantages of Python is its variety of external packages. In addition to the built-in modules, as if they were not enough already, there is an enormous number of external packages. These external packages have to be installed before they can be imported and used. Once installed, they can be used like built-in modules. The following table shows which packages we will use today and in the rest of the week.

| Package | Description |
| -------- | ------- |
| `numpy` | scientific computing with arrays |
| `scikit-learn` (imported as `sklearn`) | machine learning |
| `pandas` | data structures and data analysis |
| `matplotlib` | figures |
| `beautifulsoup4` (imported as `bs4`) | parsing of web pages |

A good and common practice is to list all packages required by a project in a file `requirements.txt`. The entire list of requirements can then be installed with `pip install -r requirements.txt`. This project also ships with a [requirements.txt](./requirements.txt).
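To check which of the packages from the table above are already available in your environment, you could use a sketch like the following. It relies on `importlib.util.find_spec()` from the standard library, which returns `None` for a top-level module that cannot be found. Note that it needs the import names, which differ from the install names for some packages.

```python
import importlib.util

# import names of the packages from the table above
# (`scikit-learn` is imported as `sklearn`, `beautifulsoup4` as `bs4`)
import_names = ["numpy", "sklearn", "pandas", "matplotlib", "bs4"]

# find_spec returns None if a top-level module cannot be found
status = {
    name: importlib.util.find_spec(name) is not None
    for name in import_names
}

# print the result for each package
for name, installed in status.items():
    print(name, "installed" if installed else "missing")
```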

<div class="alert alert-block alert-info">
    <b>Exercise</b>: Install the entire list of requirements in your Python environment, e.g. via the Anaconda Navigator.
</div>

## 1.12 Style Guide

Maybe you have already noticed that code can quickly become complicated, confusing and unclear. This becomes a serious problem especially when you look at your code after a long time or show it to somebody else. For that reason, there are various style guides which use quite simple rules to ensure your code is understandable, concise and clear.

They deal with the following questions, for example:

- How should I make imports?
- How should I use line breaks?
- How should I indent code?
- How should I comment code?
- How should I name variables?
- How should I handle exceptions?

We recommend having a look at the [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md).
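To give an impression of what such rules look like in practice, here is a short sketch that touches several of the questions above: imports at the top, four spaces of indentation, descriptive lowercase names, a docstring with `Args` and `Returns` sections, and an `except` clause that catches only the expected exception. The function `count_lines` and the file path are hypothetical examples, not part of any style guide.

```python
"""Small example module following common style conventions."""

import os


def count_lines(path):
    """Counts the lines of a text file.

    Args:
        path: Path to the text file.

    Returns:
        The number of lines, or 0 if the file does not exist.
    """
    try:
        with open(path, encoding="utf-8") as text_file:
            return sum(1 for _ in text_file)
    except FileNotFoundError:
        # catch only the specific exception we expect
        return 0


# a path that does not exist, to trigger the exception handling
missing_path = os.path.join("does", "not", "exist.txt")
line_count = count_lines(missing_path)
print(line_count)
```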