# Introduction to Python for People with Programming Experience


## Session 2 / 4 - 27.09.2022 9:00 - 13:00

*by Fabian Wilde, Katharina Hoff, Matthis Ebel, Mario Stanke & Felix Becker*

Contact: felix.becker@uni-greifswald.de

<br><br><br>
## List Comprehensions and Generators
<br>
For-loops in Python very often do not contain complicated code blocks. In fact, the most frequent use case is just to apply a certain function or formula to every element e.g. in a list and store the result somewhere.<br><br>
A very elegant (and often faster) way to avoid for-loops in Python in those cases is to use so-called <a href="https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions"><i><b>list comprehensions</b></i></a>.<br><br>
<b>A list comprehension in Python is defined by</b><br><br>

<font face="Courier"><b>a = [ <i>output_expression</i> for elem in x]</b></font>

Note that you could achieve the same result with `append` and a `for` loop. However, list comprehensions are quicker to write and usually more readable, and once you are familiar with them you will likely use them on a daily basis in your future Python career.

**We can also generate elements conditionally:**

<font face="Courier"><b>a = [ <i>out_expr</i> if <i>condition</i> else <i>alt_out_expr</i> for elem in x]</b><br>
</font>
<br>
    where <i>out_expr</i> and <i>alt_out_expr</i> are expressions (which should include elem) to be evaluated. This could be anything from <font face="Courier">1+1</font> to any function call (with the element as argument). <i>condition</i> is then a Python expression returning a boolean (True/False value) as for "normal" if-elif-else statements in Python.

### Example 1:

The `range` function returns a range object, a lazy iterable where the next value is produced on-the-fly 
on each iteration. In order to get a full list of the numbers, enclose it with the list constructor,
which forces all values of the range to be generated.

(If you do not understand this, that's okay for the moment.)

In [None]:
# Generate a list of ascending numbers
x = list(range(10))      # define elements to work on
print("x =", x)

In [None]:
# define a lambda expression that computes the square root
sqr = lambda x: x**0.5

Assume you want to apply `sqr` to all elements in `x`. You already learned how to do this with a for loop:

In [None]:
# create empty list to store the result
y = []
for elem in x:
    # evaluates function sqr with elem as argument and adds the result to the list y
    y.append(sqr(elem))
print("y =", str(y))

A better way to express this using a list comprehension is:

In [None]:
y = [sqr(elem) for elem in x]
print("y =", str(y))

### Example 2

In [None]:
# when printed, the list y is not very readable
# one could use a list comprehension to generate a list of strings with fewer digits after the decimal point

def print_nice(y):
    return ["%.3f" % elem for elem in y]

print("rounded y=", print_nice(y))

Here the expression `"%.3f" % elem` converts the float `elem` to a string, rounded to 3 digits after the decimal point.
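The same formatting can also be written with an f-string or with the built-in `format` function, both of which are common in newer Python code. The following is a minimal sketch; the names `value` and `values` are illustrative and not part of the notebook.

```python
# Three equivalent ways to format a float with 3 digits after the decimal point
value = 2.0 ** 0.5

old_style = "%.3f" % value          # printf-style formatting, as used in print_nice
f_string = f"{value:.3f}"           # f-string (Python 3.6+)
with_format = format(value, ".3f")  # built-in format function

print(old_style, f_string, with_format)

# The same idea inside a list comprehension
values = [0.0, 1.0, 2.0 ** 0.5]
formatted = [f"{v:.3f}" for v in values]
print(formatted)
```

Which style to use is mostly a matter of taste; f-strings tend to be the most readable when the expression is short.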

### Example 3

In [None]:
# Example 3: conditionally generate a list of values

# import numpy module to use numpy functions
# it is sufficient to import it once in any cell, typically at the start of a notebook
import numpy as np

# create an array with 10 random values in [0, 1)
x = np.random.uniform(0,1,(10,))
y = [1 if elem >= 0.5 else 0 for elem in x]
print("x =", print_nice(x)) 
print("y =", y)

In [None]:
#or to just keep "True" elements and omit the else statement (i.e. filter the list) 
x = np.random.uniform(0,1,(10,))
y = [elem for elem in x if elem >= 0.5]
print("x =", print_nice(x)) 
print("y =", print_nice(y))



<div class="alert alert-block alert-success"><b>Exercise:</b> Try it yourself: In the following cell, generate a list 
    <ol type="1">
        <li>containing the lengths of all elements in `strings` without using a loop.</li>
        <li>containing the first letters of each string.</li>
        <li>containing a tuple for each string that contains the index of the string in the list (e.g. 1 for "Adam") and the last letter of the string.</li>
        <li>of lists such that two consecutive elements of `strings` are packed into their own sublist (i.e. [["Jenny", "Adam"], ["tea", "apple"], ...]).</li>
        <li>containing only the strings that start with "A" or "a".</li>
    </ol>
</div>

Tip: For task 3. you could use Python's `zip` (https://docs.python.org/3/library/functions.html#zip) function. Also check out `enumerate` (https://docs.python.org/3/library/functions.html#enumerate).
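If `enumerate` and `zip` are new to you, the following sketch shows how they behave inside list comprehensions. It deliberately uses made-up data (`fruits` and `prices`) rather than the exercise list.

```python
# Illustrative data (not the exercise list)
fruits = ["apple", "banana", "cherry"]
prices = [0.5, 0.25, 3.0]

# enumerate yields (index, element) pairs, counting from 0
indexed = [(i, fruit) for i, fruit in enumerate(fruits)]
print(indexed)

# zip pairs up elements of several iterables position by position
priced = [f"{fruit}: {price}" for fruit, price in zip(fruits, prices)]
print(priced)

# zipping a list with a shifted copy of itself yields neighbouring pairs
neighbours = list(zip(fruits, fruits[1:]))
print(neighbours)
```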

In [None]:
strings = ["Jenny", "Adam", "tea", "apple", "keyboard", "computer"]

#YOUR CODE HERE

<br><br><br>
<font size="3">
<b>We can also define our own generators using round brackets instead of square brackets for a list comprehension:</b>
</font>

In [None]:
# create an array with 10 random values in [0, 1)
x = np.random.uniform(0,1,(10,))
print("x =", print_nice(x)) 

#define a generator 
y = (1 if elem >= 0.5 else 0 for elem in x)
print(y)

<font size="3">
In fact, y is now not a list of values but a generator object. We still get the same result as before for the list comprehension if the generator object is passed as argument to the list constructor:
</font>

In [None]:
list(y)

In [None]:
# or use the generator (like range) in a for loop
y = (1 if elem >= 0.5 else 0 for elem in x)
for elem in y:
    print(elem)

<font size="3">
The generator object lazily generates a new value at each iteration and not in advance:
</font>
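A minimal sketch that makes this laziness visible: the helper `noisy_square` is a made-up function that prints a message whenever a value is actually computed. The built-in `next()` used here is the usual way to request the next element of a generator.

```python
def noisy_square(n):
    # the print shows when the value is actually computed
    print(f"computing {n}**2")
    return n ** 2

squares = (noisy_square(n) for n in range(3))  # nothing is computed yet

first = next(squares)      # computes only the first value
rest = list(squares)       # computes the remaining values
leftover = list(squares)   # the generator is exhausted now, so this is empty

print(first, rest, leftover)
```

A generator can only be consumed once, which is why the cells in this notebook redefine `y` before each use.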

In [None]:
#redefining the generator
y = (1 if elem >= 0.5 else 0 for elem in x)
iter_count = 0

In [None]:
# the elements are generated on-the-fly
# the built-in function next() asks the generator for its next element
# (internally it calls the special method __next__ of the generator object)

# evaluate the same cell repeatedly with CTRL + ENTER
# the counter increments and the generator yields a new value each time
# (once all 10 values are consumed, next() raises StopIteration)
print("iter:"+str(iter_count))
print(next(y))
iter_count += 1

<div class="alert alert-block alert-success"><b>Exercise:</b> Define a function that generates a list of random integers and decides for each of the numbers if it is even or odd. As output for each number, we want a string "even" or "odd". <b>Solve the problem with a for-loop and a list comprehension separately and compare the required time.</b><br><br>
    <b>Hint:</b> Use the function <a href="https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html"><b>np.random.randint</b></a> to generate random integers. Before using that function, import numpy with <i>import numpy as np</i>. Use the module <a href="https://docs.python.org/3/library/timeit.html">timeit</a> and wrap it around your function to benchmark it. You can also use the function time from the module <a href="https://docs.python.org/3/library/time.html">time</a> to get the current timestamp before and after the execution of the function.
   
</div>


### Try it yourself:

In [None]:
import numpy as np
import timeit

def example_func():
    randints = np.random.randint(-100, 100, 10) #generate 10 random numbers in the range [-100, 100)
    return randints

#example timeit
print("example_func finished after:")
print(timeit.timeit('example_func()', number=100, globals=globals())) #run example_func 100 times and time it

#YOUR CODE HERE, reuse the example above

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
from typing import List, Tuple
import numpy as np
import timeit

# function definition using type hints
def list_compr_fun(lower_int: int, upper_int: int, num: int) -> Tuple[np.ndarray, List[str]]:
    randints = np.random.randint(lower_int, upper_int, (num,))
    return randints, ["even" if (elem % 2 == 0) else "odd" for elem in randints]

# function definition using type hints
def python_for_fun(lower_int: int, upper_int: int, num: int) -> Tuple[np.ndarray, List[str]]:
    randints = np.random.randint(lower_int, upper_int, (num,))
    result = []
    for rand_int in randints:
        if (rand_int % 2 == 0):
            result.append("even")
        else:
            result.append("odd")
    return randints, result

randints, result = python_for_fun(-100,100,10)
print("randints ="+str(randints))
print("result ="+str(result))
randints, result = list_compr_fun(-100,100,10)
print("randints ="+str(randints))
print("result ="+str(result))

#compare the speed of the Python for-loop with the list comprehension
print("python_for_fun finished after:")
print(timeit.timeit('python_for_fun(-100,100,int(1E5))', number=100, globals=globals()))
print("list_compr_fun finished after:")
print(timeit.timeit('list_compr_fun(-100,100,int(1E5))', number=100, globals=globals()))

<br><br><br>
## Plain File Loading in Python

Python offers a way to read text files line-by-line. That is great, because now we can write programs that can load (lots of) data from a file. 

The syntax to do so is

    with open('path/to/file', 'rt') as file_handle:
        # do stuff with file_handle
        
`path/to/file` is the path from the current working directory to the file that you want to open. `rt` stands for "read text"; this tells the `open()` function that it should expect a text file and that we want to read from it.

Python takes care of opening the file and creating a _stream_ that can be used through `file_handle`. 

- The `file_handle` is an **iterator** (it behaves much like a generator), i.e. it iterates over lines lazily without loading the whole file into memory at once:

        with open('path/to/file', 'rt') as file_handle:
            for line in file_handle:
                print(line)


- `readlines()` reads the complete file and returns a list containing all the lines:
        
        with open('path/to/file', 'rt') as file_handle:
            file_content = file_handle.readlines()
        
        print(file_content)
    


Note: after the indented code below `with open(...) as ...:` has ended, Python automatically closes the connection to the file. That's convenient, otherwise we would have to do this manually!
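The automatic closing can be checked with the attribute `closed` of the file object. The sketch below writes a small throwaway file first so that it is self-contained; the file name `demo.txt` and the variable names are illustrative only. The last part shows roughly what we would have to write without `with`.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wt") as fh:
    fh.write("first line\nsecond line\n")

with open(path, "rt") as fh:
    inside = fh.closed      # False: the file is open inside the block
    lines = [line.rstrip("\n") for line in fh]
after = fh.closed           # True: closed automatically when the block ends

# Roughly what the with statement does for us
fh2 = open(path, "rt")
try:
    first_line = fh2.readline()
finally:
    fh2.close()             # runs even if reading raised an exception

print(inside, after, lines, repr(first_line))
```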

In [None]:
# Read the first 10 lines of the file `data/iris.csv` and print them
with open("data/iris.csv", "rt") as fh:
    i = 0
    for line in fh:
        print(line)
        i += 1
        if i >= 10:
            break

In [None]:
# Read the complete file `data/iris.csv` and print the first 10 lines
with open("data/iris.csv", "rt") as fh:
    content = fh.readlines()
    
content[:10]

**Note:** For many file types like CSV (comma-separated values), JSON, etc., there are better ways to read them provided by (third-party) modules!
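As one possible illustration, Python's standard library already ships a `csv` module. The sketch below parses a small in-memory stand-in for a CSV file; the column names and values are made up and are not taken from `data/iris.csv`.

```python
import csv
import io

# Stand-in for a CSV file; the column names and values are made up
csv_text = "name,length,width\nrose,5.1,3.5\ntulip,4.9,3.0\n"

# io.StringIO behaves like an opened text file
with io.StringIO(csv_text) as fh:
    reader = csv.DictReader(fh)   # uses the first line as column names
    rows = list(reader)

# All values are read as strings, so numeric columns need converting
lengths = [float(row["length"]) for row in rows]
print(rows[0]["name"], lengths)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by `open(...)` as shown above.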

#### File Writing

Of course, we can also write to files! The approach is very similar:

    with open('path/to/file', 'wt') as file_handle:
        # do stuff with the file_handle, e.g.
        file_handle.write("This line is written to the file!")
        
`wt` stands for "write text" and tells the `open()` function that it should open the file for writing, and that we want to write text to it.  

Useful `file_handle` methods are
* `write(string)` - writes the string to the file
* `writelines([string1, string2, ...])` - writes each element in the list to the file (without adding newlines between them)
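A small sketch of both methods, assuming a throwaway file in a temporary directory (the file name `demo.txt` is illustrative). Neither method adds line breaks on its own, so the newline characters are written explicitly.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
lines = ["alpha", "beta", "gamma"]

with open(path, "wt") as fh:
    fh.write("header\n")                          # newline must be written explicitly
    fh.writelines(line + "\n" for line in lines)  # writelines adds no separators either

# read the file back to check what was written
with open(path, "rt") as fh:
    content = fh.read()

print(repr(content))
```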

<font size="3"><div class="alert alert-block alert-success"><b>Exercise:</b><br>
Create a file from Python! First, create a `dict` with some keys (`str`) and some values. The values might be `int`, `float`, `str`, `bool`, `list` or `dict` (please do not use `tuple` and `set`!).<br>
Open a file for text writing, and use the function `dump()` from the module `json` to write your dictionary to that file!
</div>

Hints:
* The use of `dump` is: `dump(dictionary_name, file_handle)`
* You don't need to use another `write()` or `writelines()` method!

In [None]:
import json

# YOUR CODE HERE

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution

In [None]:
import json

my_dict = {
    'key1': 1,
    'key2': [True, False],
    'foo': 'bar',
    'new key': 42
}

with open('testfile.json', 'wt') as file_handle:
    json.dump(my_dict, file_handle)

In [None]:
with open('testfile.json', 'rt') as fh:
    for line in fh:
        print(line)

## Directory Batch Processing
<br>
<font size="3">
Very often, you not only need to load a single file, but a whole set of files located in one or more directories. Luckily, Python also offers builtin solutions for this problem. A directory can be traversed (also recursively) to yield a list of files to be processed in your Python script.<br><br>
    <b>The simplest solution is offered by the builtin library <i>os</i> with <i>os.walk</i>.</b>
</font>

### Examples:

In [None]:
# imports the required module
import os

def walk_through_files(path, file_extension='.txt'):
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            if filename.endswith(file_extension): 
                # yield keyword instead of return defines a generator instead of a regular function
                yield os.path.join(dirpath, filename)

In [None]:
# that's why we iterate over walk_through_files instead of calling it
for fname in walk_through_files("data/batch/"):
    print(fname)

In [None]:
# force the generator to return a list with the results
print(list(walk_through_files("data/batch/")))

<font size="3"><div class="alert alert-block alert-success"><b>Exercise:</b> Write a function to traverse the path data/batch/, read the found files and concatenate their content to one string separated by spaces.
</div>
    
Hints: 
* concatenate strings in a list `ls` using a string `s` as separator with `s.join(ls)`, e.g.  


    print(" ".join(['create', 'a', 'single', 'sentence']))
    
    create a single sentence


* Remove trailing spaces and newline characters from a string with the string method `rstrip()`
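A short sketch of how `rstrip()` and its relatives behave; the example string is made up.

```python
line = "  some text \n"

stripped_right = line.rstrip()       # trailing whitespace and newline removed
only_newline = line.rstrip("\n")     # only the trailing newline removed
stripped_both = line.strip()         # whitespace removed on both sides

print(repr(stripped_right), repr(only_newline), repr(stripped_both))
```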
    
<b>Try it yourself here:</b></font>

In [None]:
#YOUR CODE HERE

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
# iterate over files
buffer = []
for fname in walk_through_files("data/batch/"):
    print(fname)
    # open file
    with open(fname, 'rt') as fh:
        # read file content and put it in the buffer list
        buffer.extend(fh.readlines())
    
# remove newlines
for i in range(len(buffer)):
    buffer[i] = buffer[i].rstrip()

print("Result:", " ".join(buffer))

However, we should not make assumptions about the order in which files are iterated. It might not be the same order as they are displayed for us in the file browser!
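If a reproducible order matters, one option is to sort the file names before processing them. The sketch below builds a small throwaway directory tree so that it runs on its own; the file names and the helper name `find_files` are made up, and the helper follows the same idea as `walk_through_files` above.

```python
import os
import tempfile

# Build a small throwaway directory tree (file names are made up)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel_path in ["b.txt", "a.txt", os.path.join("sub", "c.txt"), "notes.md"]:
    with open(os.path.join(root, rel_path), "wt") as fh:
        fh.write(rel_path)

def find_files(path, file_extension=".txt"):
    # same idea as walk_through_files: lazily yield matching file paths
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(file_extension):
                yield os.path.join(dirpath, filename)

# sorted() consumes the generator and returns a list in a reproducible order
found = sorted(find_files(root))
relative = [os.path.relpath(p, root) for p in found]
print(relative)
```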

In [None]:
print(str.join.__doc__)

<br><br><br>
## Non-interactive Python: 
## Running your code from the command-line

So far, you have used Python interactively in this Jupyter notebook, but often, you'd like to run a long task in the background and you don't need to see intermediate results. You can run Python scripts from the command-line to achieve this behaviour. 
    
**Here, we assume that you're using Linux.** But it works in a similar manner if you're using Windows. If you'd like to run your Python script on the command-line (which is Bash under Linux in the default case), you need to save your script first and then run it with

`python your_script.py` or `python3 your_script.py`


In case you have both Python 2.x and Python 3.x installed on your system, make sure you're running the right interpreter using

`which python` or `python --version` 

Of course, you can also run your Python script directly on the command-line, treating it like a Bash script (a script file for the command-line), by putting the "shebang line" 

`#!/usr/bin/python3`

as first line in your Python script to tell Bash which interpreter to use for the file.

Then, you need to make your newly created file **executable** by setting the **executable** permission flag, e.g. with the command

`chmod u+x your_script.py`

You can check the file permission flags in the listing of your directory using

`ls -la scripts`

Then you should be able to simply run your Python script as if it were a Bash script using

`./your_script.py`        
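For orientation, a script prepared this way could look like the following sketch. It is not one of the scripts in the `scripts` folder; the function name `main` and the message are illustrative. It uses the common variant `#!/usr/bin/env python3`, which looks up `python3` on the `PATH` instead of hard-coding its location.

```python
#!/usr/bin/env python3
# The env form looks up python3 on the PATH instead of hard-coding its location
import sys

def main():
    # sys.version_info tells us which interpreter is running the script
    major, minor = sys.version_info[:2]
    message = f"Hello from Python {major}.{minor}"
    print(message)
    return message

# Only run main() when the file is executed, not when it is imported
if __name__ == "__main__":
    main()
```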

In the following examples, a special "cell magic" command is used, so that bash scripts and commands can be run within a Jupyter notebook cell.

### Examples:

In [None]:
%%bash
which python3
python3 --version
python3 scripts/hello_world.py

Or directly run your Python script as if it were a Bash script or native executable, after you have set the executable file flag:

In [None]:
%%bash
ls -la scripts/
chmod u+x scripts/hello_world2.py
./scripts/hello_world2.py

We check what <i>hello_world2.py</i> contained by printing its file content with the command <i>cat</i>:

In [None]:
%%bash
cat scripts/hello_world2.py

## Handling of command-line arguments 
<br>
<font size="3">
Very often, it's the case that you'd like to run your Python script, but with slightly different parameters or paths with files to work on. It would be annoying to make the required changes every time in your Python code.<br><br>
Luckily, you can easily handle given command-line arguments in Python and work with them in your script. <b>The most basic option is to use the builtin library <i>sys</i> and the provided list <i>argv</i>.</b> If any command-line argument was given to run your Python script, it will appear in <i>argv</i>.
</font>
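A minimal sketch of working with `sys.argv`. This is not the content of `scripts/cli_args.py`; the function name `describe_args` is made up for illustration. The first entry of the list is always the script name, so the actual arguments start at index 1.

```python
import sys

def describe_args(argv):
    # argv[0] is the script name, the actual arguments start at index 1
    script_name, args = argv[0], argv[1:]
    if not args:
        return f"{script_name}: no arguments given"
    return f"{script_name}: {len(args)} argument(s): {', '.join(args)}"

if __name__ == "__main__":
    print(describe_args(sys.argv))
```

Passing the list as a parameter instead of reading `sys.argv` inside the function makes the logic easy to try out in a notebook cell, e.g. with `describe_args(["cli.py", "--test1"])`.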

<font size="3">Let's first check the content of the little example script:</font>

In [None]:
%%bash
cat scripts/cli_args.py

Then we run it giving various command-line arguments:

In [None]:
%%bash 

# modifies file permission flags
chmod u+x scripts/cli_args.py

# run it without any command-line argument
./scripts/cli_args.py

# run it with one command-line argument
./scripts/cli_args.py --test1

# run it with multiple command-line arguments
./scripts/cli_args.py --test1 --test2 --test3

<font size="3">
    <b>A more convenient option to handle (define and check) command-line arguments is the module <a href="https://docs.python.org/3/library/argparse.html"><i>argparse</i></a></b>.<br><br> With <i>argparse</i> you can easily define expected mandatory or optional command-line arguments for your scripts with built-in checking of the user input. You can even define nice description and help texts for your parameters to help others to use your scripts later independently.
</font>
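To give an idea of what such a definition can look like, here is a minimal sketch. It is not the content of `scripts/argparse_example.py`; the argument names `numbers` and `--sum` as well as the helper names `build_parser` and `run` are assumptions chosen for illustration.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Combine a few integers.")
    # one or more mandatory positional arguments, converted to int
    parser.add_argument("numbers", type=int, nargs="+", help="integers to combine")
    # optional flag: if present, sum the numbers instead of taking the maximum
    parser.add_argument("--sum", action="store_true",
                        help="sum the numbers (default: maximum)")
    return parser

def run(arg_list):
    # parse_args accepts an explicit list, which is handy for testing;
    # called without arguments it reads the arguments from sys.argv
    args = build_parser().parse_args(arg_list)
    return sum(args.numbers) if args.sum else max(args.numbers)

print(run(["1", "2", "3", "--sum"]))
print(run(["1", "2", "3"]))
```

The help text shown with `-h` is generated automatically from the `description` and `help` strings.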

In [None]:
%%bash

# modifies file permission flags
chmod u+x scripts/argparse_example.py

# list file content
cat scripts/argparse_example.py

If we now attempt to run the script without command-line arguments, we get an error and argparse gives us a hint what we have done wrong:

In [None]:
%%bash
./scripts/argparse_example.py

Argparse provides a nice help function when the argument *-h* is used:

In [None]:
%%bash
./scripts/argparse_example.py -h

In [None]:
%%bash
./scripts/argparse_example.py 1 2 3 --sum

<font size="3"><div class="alert alert-block alert-success"><b>Exercise:</b> Write your own script evaluating command-line arguments and run it on the command-line yourself. Either simply use the builtin <i>sys.argv</i> variable or use the <i>argparse</i> module if you already feel comfortable enough.<br><br>
    <b>Simply create a new Python script in the directory <i>scripts</i> via <i>File -> New -> TextFile</i> and rename it to <i>my_script.py</i>.</b>
</div>
    
<b>Try to run it yourself here:</b></font>

In [None]:
%%bash
python3 scripts/my_script.py

<br><br><br>
## Importing Modules and Namespaces

Structuring the code in functions which can be easily reused and maintained is the first step to achieve cleaner and leaner code. Nevertheless, in big Python projects, keeping all the code in one single file would make it hard to read and would make collaboration difficult. Python projects are therefore organised in modules, which allow the code to be distributed over multiple files. We have already imported modules/packages several times to use standard or 3rd party functions in the examples. 
    
The figure below shows an example project directory structure organized in modules and submodules:
    
<div align="center">
    <img src="img/absolute-import.jpg" width="40%">
</div>

<font size="2"><i>Source: <a href="https://www.geeksforgeeks.org/absolute-and-relative-imports-in-python/">https://www.geeksforgeeks.org/absolute-and-relative-imports-in-python/</a></i></font>
<br><br>

**In order to use an existing (3rd party) module in your new script or project, you need to import the module using**
    
`import module_name`
    
**or define a shorter alias for the module name, if it is too long when used in the code:**
    
`import module_name as alias`

**You can also import a specific class or function from module:**

`from module_name import function_name`

`from module_name import function_name as alias`
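The four forms above can be tried out with modules from the standard library. In this sketch the aliases `m` and `path_join` are arbitrary names chosen for illustration.

```python
import math                              # access via the module name
import math as m                         # access via an alias
from math import sqrt                    # import a single function
from os.path import join as path_join    # single function with an alias

same_module = math.pi == m.pi            # both names refer to the same module
root = sqrt(16)
joined = path_join("data", "iris.csv")

print(same_module, root, joined)
```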

**You can also further specify an *absolute path* to the submodule (or the file) to import a specific class or function from:**

`from module_name.submodule_name.file_name import function_name`

Related to the project structure above, we could use the statement:

`from pkg2.subpkg1.module5 import fun3`

to specify an *absolute import* path. PEP 8 recommends absolute imports in general; however, they can become unnecessarily verbose when the directory structure is very large or deeply nested.

A relative import with respect to the project structure in the figure above (written in a module located inside `pkg2`) may be defined using

`from .subpkg1.module5 import fun3`
    
The best practice in Python regarding imports can be found in the official Python style guide <a href="https://www.python.org/dev/peps/pep-0008/#imports">PEP8</a>. 

The most important best practices are:

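- import statements should be located at the beginning of your script.
- standard library imports come before 3rd party imports, which come before imports of your own modules.
- within each group, import statements are commonly sorted in alphabetical order of their module names (a widespread convention, not a PEP 8 requirement).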

### Examples:

In [None]:
# standard library imports first
# use a new line for each import
import math
import os

# the most popular example to use numpy
# imports numpy and sets the namespace to the alias "np"
# the module content is then accessible via the alias
import numpy as np

# the most popular example to plot/visualize data
from matplotlib import pyplot as plt

# import specific class from module
from tqdm import tqdm

<font size="3"><b>You can check the path, version and short documentation of a module using the module attributes:</b></font>

In [None]:
# the hidden attribute __version__ contains the module version
print(np.__version__)
# the hidden attribute __path__ contains the module path or location
print(np.__path__)
# the hidden attribute __doc__ contains a short documentation
# or description of the module, the so-called docstring
print(np.__doc__[:200], "... (more text follows) ...")

You can also import your own modules (python files):

In [None]:
from scripts import hello_world3 as hw3 #scripts is a folder in our local directory structure

hw3.useful_function()

<div class="alert alert-block alert-success"><b>Exercise:</b> Implement a simple pipeline called <i>hash</i> consisting of 3 functions that modify a user input (a string). Each function should be in its own module (python file) and the files should be imported here. Think about arguments and return values of each function. 
<br>
The 3 stand-alone functions should do:
    
<ol type = "1">
<li>Convert all characters in the string to upper case. Return the capitalized string.</li>
<li>Replace all characters with their respective character code (function <i>ord</i>; this is the ASCII code for ASCII characters and the Unicode code point in general). Return a list of numbers.</li>
<li>Sum up all numbers in the list and return the result.</li>
</ol>

Name the 3 modules appropriately according to their roles. For example:

<ol type = "1">
<li>stringmod.py - Contains functions that modify strings in very specific ways.</li>
<li>convert.py - Contains functions that convert strings to other things that might be useful or not.</li>
<li>mymath.py - Contains simple arithmetic functions that operate on lists.</li>
</ol>
    
Try to use list comprehension and other things you already learned to keep your code as short as possible. (Bonus: Can you implement all 3 functions in at most 2 lines each?)
</div>

In [None]:
#YOUR imports here

user_string = "I don't have enough € for this coffee!!"

#YOUR CODE HERE

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
#just the functions, put these into separate files and import them
def stringmod(s):
    s_mod = "".join([c.upper() for c in s])
    return s_mod

def convert(s):
    return [ord(c) for c in s]

def my_sum(L):
    return sum(L)

s_mod = stringmod(user_string)
L = convert(s_mod)
x = my_sum(L)

print("User string:", user_string)
print("Upper case:", s_mod)
print("Codes:", L)
print("Sum:", x)

<br><br><br>
## Some popular Python Libraries (3rd party Modules)


Let's take a look at the most popular Python libraries which you will use yourself sooner rather than later. Each library is accompanied by a minimal piece of code just to give you an idea of what you can do with it.

### Numpy 

<a href="https://numpy.org/">Numpy</a> is one of the most widely used Python libraries. It offers fast handling and efficient storage of large amounts of numerical data in numpy arrays as well as a variety of useful arithmetic functions and utilities to read in various data file formats.

In [None]:
import numpy as np

n = 10000

X = np.random.rand(n, n) #many numbers
Y = np.random.rand(n, n) #more numbers

S = X + Y #summing them in no time
M = np.amax(S) #computing the maximum in no time

In [None]:
#summing naively with loops takes much longer
S = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        S[i,j] = X[i,j] + Y[i,j]

In [None]:
del X,Y,S #free memory, just because these arrays are so large and so pointless, you normally don't have to do this

### Matplotlib and seaborn
<a href="https://matplotlib.org/">Matplotlib</a> is the most common Python library for data visualization. The library <a href="https://seaborn.pydata.org/">seaborn</a> builds on matplotlib and offers more beautiful plots and more sophisticated plot types for statistics.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
plt.plot(t, s)

plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.savefig("test.png")
plt.show()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")

# Load the example diamonds dataset
diamonds = sns.load_dataset("diamonds")

# Draw a scatter plot while assigning point colors and sizes to different
# variables in the dataset
f, ax = plt.subplots(figsize=(6.5, 6.5))
sns.despine(f, left=True, bottom=True)
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
sns.scatterplot(x="carat", y="price",
                hue="clarity", size="depth",
                palette="ch:r=-.2,d=.3_r",
                hue_order=clarity_ranking,
                sizes=(1, 8), linewidth=0,
                data=diamonds, ax=ax)

### Pandas
<a href="https://pandas.pydata.org/">Pandas</a> is a library for handling and analysing large amounts of tabular data and also offers basic plotting of such data. It became most popular for time series analysis.

In [None]:
import pandas as pd

d = {'col1': [1, 2, 9, -17], 'col2': [3, 4, 4, 1000], 'col3': ["A", "B", "C", "D"]}

pd.DataFrame(data=d)

### Scipy
<a href="https://www.scipy.org/">Scipy</a> is a general purpose library for science and engineering offering, among other things, functionality for optimization, integration, interpolation, signal and image processing, and statistics.

In [None]:
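# preprocessing e.g. for machine learning
# just one of the more illustrative examples of the many different things you can do with scipy
from scipy import ndimage
from matplotlib import pyplot as plt
try:
    from scipy.datasets import face  # SciPy >= 1.10 (needs the optional package pooch)
except ImportError:
    from scipy.misc import face      # older SciPy versions
panda = face()  # example image provided by scipy (it actually shows a raccoon)
# rotation function of scipy for images: image rotated by 135 degrees
panda_rotate = ndimage.rotate(panda, 135)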
plt.imshow(panda_rotate)
plt.show()

### Statsmodels
As the name implies, the library <a href="https://www.statsmodels.org/">Statsmodels</a> offers a big variety of statistical models and tests for your data.

In [None]:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load data
dat = sm.datasets.get_rdataset("Guerry", "HistData").data

# Fit regression model (using the natural log of one of the regressors)
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

# Inspect the results
print(results.summary())

### Tensorflow
<a href="https://www.tensorflow.org/">Tensorflow</a> is one of the most popular machine learning libraries, developed mostly by Google; its main competitor is <a href="https://pytorch.org/">pytorch</a>, originally developed by Facebook. It is popular for its automatic differentiation feature which lets you specify arbitrary models instead of just predefined ones like in statsmodels.

In [None]:
import tensorflow as tf

#load your data
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

#define a creative (and useful) model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

#train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

### Requests

There are many other non-scientific fields with very useful libraries like <a href="https://requests.readthedocs.io/">Requests</a>, which lets you send HTTP requests using Python. 

In [None]:
import requests
   
# Making a GET request - let's hack my github account
r = requests.get('https://api.github.com/users/felbecker')
  
# check status code for response received
# success code - 200
print(r)
  
# print content of request
print(r.json()["name"])
print(r.json()["created_at"])
print(r.json()["site_admin"])
print(r.json()["public_repos"])

## How to list installed modules and install new modules in Python

Often you would like to use the functionality of a 3rd-party module, but the module is not installed (this may not be the case for the above libraries as we use a jupyter image with preinstalled packages).

In most Python environments, a package manager is used to administer the installed modules. 

In most cases, this is either <a href="https://docs.python.org/3/installing/index.html">pip</a> or, if you use the Anaconda environment, <a href="https://docs.conda.io/">conda</a> and pip.

`pip` (Package Installer for Python) downloads and installs packages and their dependencies from the Python Package Index (PyPI). Everyone can contribute their packages to PyPI and make them available for others.

`conda` is an open source package management system and environment management system that runs on Windows, macOS, Linux and z/OS. It is not limited to Python and installs pre-built packages. 

**In order to check which modules are installed (in your current virtual environment), you can use the command:**

`pip list` or `conda list`

on the command-line prompt in a console or in a cell:

In [None]:
%%bash
conda list

In [None]:
%%bash
conda list | grep tensorflow

These are the packages that are installed in the default environment. In our case they are part of the jupyter image that we loaded. 
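The installed packages can also be inspected from within Python itself. The following sketch uses `importlib.metadata` from the standard library (available since Python 3.8); the variable name `installed` is illustrative, and `pip` is only used as an example package name that may or may not be present in your environment.

```python
from importlib import metadata  # standard library, Python 3.8+

# Collect name and version of every installed distribution
installed = {}
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name:                       # skip entries with broken metadata
        installed[name] = dist.version

for name in sorted(installed)[:5]:  # show only the first few
    print(name, installed[name])

# Look up a single package; PackageNotFoundError is raised if it is missing
try:
    print("pip", metadata.version("pip"))
except metadata.PackageNotFoundError:
    print("pip is not installed in this environment")
```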

However, what do you do if you need a package in different versions for different projects, or if you need a very specific package only for one project and do not want it to appear in the default environment?

Each `conda` environment can have its own packages and package versions installed. To avoid a mess of conflicting versions of Python modules, or if your code only works with specific versions of Python modules, you can create and use different Python environments.

**You can list the currently existing environments using**

In [None]:
%%bash
conda env list

<font size="3"><b>You can create a new (empty) environment using</b></font>

In [None]:
%%bash
conda create --name pyprog

<font size="3"><b>You can switch to a specific environment using</b></font>

In [None]:
%%bash
conda activate pyprog

<font size="3"><b>And check which environment is currently activated using</b></font>

In [None]:
%%bash
conda info

In [None]:
%%bash
conda deactivate
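
If you want to check from within Python which environment your code is running in, the following sketch may help. It only uses the standard library; that `conda` sets the environment variable `CONDA_DEFAULT_ENV` on activation is the usual behaviour, but the variable can be missing (e.g. outside of conda), so the code falls back to a placeholder:

```python
import os
import sys

# directory of the Python installation / environment that runs this code
print("sys.prefix =", sys.prefix)
# path of the interpreter executable itself
print("sys.executable =", sys.executable)
# name of the activated conda environment, if conda has set it
env_name = os.environ.get("CONDA_DEFAULT_ENV", "(not set)")
print("CONDA_DEFAULT_ENV =", env_name)
```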

<br><br><br>
## Numpy 

Often, you have a file, e.g. a CSV file with comma-separated values, and you need to convert the content yourself to a table-like data structure. In this example, we have the famous <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">Iris dataset</a> (a very popular standard data set in data science). Let's have a look at the first 10 lines of the file:

In [None]:
%%bash
head -n 10 data/iris.csv

Of course, we could now manually read in the file line by line, but then we would need to process the strings and convert them to a table-like data structure. <b>Luckily, there are already functions in the 3rd party libraries <a href="https://numpy.org/"><i>numpy</i></a> and <a href="https://pandas.pydata.org/"><i>pandas</i></a> that solve this problem.</b>
    
Recently, a <a href="https://www.nature.com/articles/s41586-020-2649-2">paper</a> about the numpy Python package was published in *Nature*.

<img src="img/numpy_nature.webp" width="100%">

*Source: <a href="https://www.nature.com/articles/s41586-020-2649-2">https://www.nature.com/articles/s41586-020-2649-2</a>*
    
**Numpy** is a powerful 3rd party package offering the new data type of the <b><i>numpy array</i></b> (numpy.ndarray), whose operations run in precompiled C code in the background and are therefore much faster than equivalent Python loops. **Numpy** is essential for scientific programming. In contrast to lists in Python, <b>the size of a numpy array cannot be changed without creating a new array, so the best practice is to allocate space in advance by initializing an array of the final size (e.g. filled with zeros)</b>. Also, all elements share one data type and every row (or column) has to contain the same number of elements, since <b>a two-dimensional numpy array represents an N x M matrix.</b><br><br>
<b>To sum it up, a numpy array behaves like a "classic" array in other high-level programming languages due to its precompiled C implementation in the background.</b><br><br>
<font size="3">In order to use Numpy, you have to import the package once. Since you will often refer to numpy in the following, we assign an alias that is faster to type (np), since programmers are lazy:</font>

In [None]:
# import numpy with alias np
import numpy as np
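
To illustrate the fixed size and the single data type described above, here is a small sketch (the variable names are only illustrative):

```python
import numpy as np

a = np.zeros(3)          # pre-allocated array of 3 floats
b = np.append(a, 1.0)    # returns a NEW array; a itself is unchanged

print(a.shape, b.shape)  # (3,) (4,)

# mixing int and float values: all elements are converted to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64
```

Because `np.append` copies the whole array each time, growing an array inside a loop is slow; allocating the final size once and filling it is usually the better choice.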

### NumPy Array

<br>
<font size="3">A NumPy Array can have many dimensions. Let's start with one (similar to a Python list) and two dimensions (similar to a matrix):</font>

In [None]:
# create a numpy array from a list
arr1 = np.array([1.0,2.35,3.141])
print("type(arr1) =", type(arr1))
print("arr1 =", arr1)
print("arr1.dtype =", arr1.dtype)

In [None]:
# create a 2D array / a 3x3 matrix filled with zeros
arr2 = np.zeros((3,3))
print("type(arr2) =", type(arr2))
print("arr2 = \n", arr2)
print("arr2.dtype =", arr2.dtype)

In [None]:
# create a numpy array filled with a specific value
arr3 = np.full((3,2), 2)
print("arr3 = \n", arr3)
print("arr3.dtype =", arr3.dtype)

Question: In the above cell, why is the dtype suddenly integer?

In [None]:
# index an array element in a multi-dimensional array
arr2[0,0] = 1
arr2[1,1] = 2
arr2[2,2] = 3
print("type(arr2) =", type(arr2))
print("arr2 = \n", arr2)
print("arr2.dtype =", arr2.dtype)

Slicing follows the same rules as for regular Python lists, applied to each axis separately:

In [None]:
# take all rows after the first (first axis) and all columns except the last (second axis)
# this returns a "view" into the original array without copying anything
sliced_arr2 = arr2[1:, :-1] 
print("sliced arr2 = \n", sliced_arr2)
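
What "view" means in practice can be seen in the following sketch: writing into a slice changes the original array, whereas an explicit `copy()` is independent. The array here is a fresh example and not the `arr2` from above:

```python
import numpy as np

arr = np.zeros((3, 3))
view = arr[1:, :-1]          # slice: shares memory with arr
copy = arr[1:, :-1].copy()   # explicit copy: independent data

view[0, 0] = 7               # also changes arr[1, 0]
copy[0, 0] = -1              # does not touch arr

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))  # True False
```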

In [None]:
# the attribute "size" holds the total number of elements, whereas len() only returns the length of the first axis
# the attribute "shape" contains a tuple with the array or matrix dimensions
print("arr2.size =", arr2.size)
# convention for the shape tuple: (number of rows, number of columns) in case of a 2D array
print("arr2.shape =", arr2.shape)
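
A quick comparison on a non-square array (a made-up 3 x 4 example) shows why `len()` can be misleading:

```python
import numpy as np

m = np.zeros((3, 4))
print(len(m))    # 3      -> length of the first axis only
print(m.size)    # 12     -> total number of elements
print(m.shape)   # (3, 4) -> (number of rows, number of columns)
print(m.ndim)    # 2      -> number of axes
```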

Sometimes you want or need to change the shape of your numpy array using the `reshape` method:

In [None]:
# defines a numpy array whose number of rows differs from its number of columns
arr3 = np.array([[2,3],[3,5],[6,4]])
print("content of arr3:")
print(arr3)

# prints the shape (number of rows, number of columns)
print("shape of arr3:")
print(arr3.shape)

# reshape returns an array with the same elements arranged in a new shape;
# the new shape is passed as a tuple or as separate integers
# (total number of elements needs to be the same!)
arr4 = arr3.reshape(2,3)

print("reshaped array:\n", arr4)
print("shape of the reshaped array:", arr4.shape)

There is also the `transpose` method (or its shorthand, the attribute `T`). Beware that it does *not* do the same as `reshape`:

In [None]:
# transposes the array / matrix
arr5 = arr3.T
print("transposed array (rows and columns interchanged):")
print(arr5)

# prints the shape of the transposed array
print("shape of the transposed array:", arr5.shape)
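
Both operations turn a 3 x 2 array into a 2 x 3 array, but the elements end up in different places. A side-by-side sketch with the same numbers as above:

```python
import numpy as np

arr = np.array([[2, 3], [3, 5], [6, 4]])   # shape (3, 2)

reshaped = arr.reshape(2, 3)    # refills the elements row by row
transposed = arr.T              # swaps rows and columns

print(reshaped)
# [[2 3 3]
#  [5 6 4]]
print(transposed)
# [[2 3 6]
#  [3 5 4]]
```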

<font size="3"><b>Often you would like to perform computations over an entire row or column of your array (hence your dataset), like:
    </b></font>

In [None]:
# generates a 2D matrix of random floating-point numbers
# np.random.normal generates normally distributed random numbers and expects (mean, standard deviation, size=(a,b))
arr = np.random.normal(0,2,size=(6,8))
print("random number array:")
print(arr)

# sums all values in each column
col_sum = np.sum(arr, axis=0)
#print("col_sum = "+str(col_sum))

# sums all values in each row
row_sum = np.sum(arr, axis=1)
#print("row_sum = "+str(row_sum))

# multiplies all values in each column
col_prod = np.prod(arr, axis=0)
#print("col_prod = "+str(col_prod))

# multiplies all values in each row
row_prod = np.prod(arr, axis=1)
#print("row_prod = "+str(row_prod))

# calculate mean
total_mean = np.mean(arr)
print("Overall mean:"+str(total_mean))
# calculate standard deviation
total_std = np.std(arr)
print("Overall standard deviation:"+str(total_std))

# calculate mean over all elements in a column (iterates over first array dimension)
col_mean = np.mean(arr, axis=0)
print(col_mean)

# calculate mean over all elements in a row (iterates over second array dimension)
row_mean = np.mean(arr, axis=1)
print(row_mean)
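
Since the random numbers above make it hard to check the results by hand, here is the same idea on a tiny array with known values (a sketch for illustration only):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])      # 2 rows, 3 columns

col_sum = np.sum(arr, axis=0)    # axis 0 (the rows) is collapsed    -> [5 7 9]
row_sum = np.sum(arr, axis=1)    # axis 1 (the columns) is collapsed -> [ 6 15]
row_mean = np.mean(arr, axis=1)  # mean of each row                  -> [2. 5.]

print(col_sum, row_sum, row_mean)
```

A helpful way to remember this: the axis you pass is the one that disappears from the shape of the result.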

<font size="3"><b>Of course, you can also do elementwise manipulations and matrix multiplications (that was one of the original purposes of numpy):</b></font>

In [None]:
arr = np.array([1,2,3,4])
print("arr:")
print(arr)

# e.g. multiply each element by a scalar value
print("result of arr * 3.141:")
print(arr * 3.141)

# compute the products of two arrays element-wise
arr2 = np.array([5,6,0,1])
print("arr2:")
print(arr2)
result = arr * arr2
print("result of arr * arr2:")
print(result)

# compute the dot or scalar product of two vectors
result = np.dot(arr, arr2)
print("result of arr (dot) arr2:")
print(result)

# more general, compute the product of two matrices
arr = np.array([[1,2],[5,6],[4,2]])
arr2 = np.array([[5,6,7],[2,3,2]])
print("arr:")
print(arr)
print("arr2:")
print(arr2)
result = np.matmul(arr, arr2)
print("result of matmul(arr,arr2):")
print(result)
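
As a short addition, the `@` operator is equivalent to `np.matmul` for arrays like these. The sketch below also shows what happens when the inner dimensions do not match (same numbers as above, renamed to `a` and `b`):

```python
import numpy as np

a = np.array([[1, 2], [5, 6], [4, 2]])   # shape (3, 2)
b = np.array([[5, 6, 7], [2, 3, 2]])     # shape (2, 3)

product = a @ b      # same as np.matmul(a, b), result has shape (3, 3)
print(product)

# the inner dimensions must match: (3, 2) @ (3, 2) is not defined
try:
    a @ a
except ValueError as err:
    print("shape mismatch:", err)
```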

<font size="3"><b>Now back to our CSV file: we can read in the file using <a href="https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html">numpy.genfromtxt</a></b></font>

In [None]:
# print the documentation string of the function
print(np.genfromtxt.__doc__)

In [None]:
# load data file and convert it to (named) numpy array
iris_data = np.genfromtxt("data/iris.csv", names=True, delimiter=",")
print("type(iris_data) = "+str(type(iris_data)))
# output just the first 20 rows
print("iris_data[:20] = "+str(iris_data[:20]))
# print the guessed data types for the columns of the table
print("iris_data.dtype = \n"+str(iris_data.dtype))

<font size="3"><b>Obviously, the numpy function couldn't properly guess the data type of the last column and just returns <i>NaN</i> (not a number) there. But we can correct that:</b></font>

In [None]:
# load data file again with defined data types and column names and 
# convert it to (named) numpy array
iris_data = np.genfromtxt("data/iris.csv", names=True,
                          dtype=[('sepal_length', np.float32), ('sepal_width', np.float32),
                                 ('petal_length', np.float32), ('petal_width', np.float32),
                                 ('species', "<U16")], delimiter=",")
print("iris_data[:20] = "+str(iris_data[:20]))
# print the data types that we specified for the columns of the table
print("iris_data.dtype = \n"+str(iris_data.dtype))

In [None]:
# access column data with column name
print("iris_data['sepal_length'] = \n"+str(iris_data['sepal_length'][:20]))
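
If you want to experiment with `np.genfromtxt` without touching the data file, you can also pass it a file-like object. The following sketch uses a tiny CSV text held in memory (the three rows are only illustrative) and, instead of listing the types by hand as above, asks numpy to guess a type per column with `dtype=None`:

```python
import io
import numpy as np

# a tiny CSV in memory, with fewer columns than iris.csv
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
7.0,3.2,versicolor
"""

# dtype=None lets genfromtxt guess a type per column, including strings
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

print(data.dtype.names)              # column names taken from the header row
print(data["species"])               # the text column is kept as strings
print(data["sepal_length"].mean())   # numeric columns can be used directly
```

Specifying the types explicitly, as in the cell above, is the more robust choice when you know the layout of your file; `dtype=None` is convenient for a first look.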

In [None]:
#work with the data
#compute mean of column
mean = np.mean(iris_data['sepal_length'])
#compute standard deviation of column
std = np.std(iris_data['sepal_length'], axis=0)
print("Mean of column sepal_length:" + str(mean))
print("Std of column sepal_length:" + str(std))

# round to n decimal places with np.round
print("Mean of column sepal_length:" + str(np.round(mean,3)))

# or just change the number output format, e.g. with an f-string
print(f"Mean of column sepal_length:{mean:.3f}")

We now want to explore how to solve a relatively simple task in multiple ways, to get a feeling for the variety of tools Python offers and an intuition for which one to choose depending on the complexity of the task.

<div class="alert alert-block alert-success"><b>Exercise:</b> We want to count the number of examples we have for each species (e.g. setosa). Try to solve this task...
       <ol type="1">
           <li>using only basic python syntax and a <b>dict</b>.
           <li>using the class <a href="https://docs.python.org/3/library/collections.html#collections.defaultdict">defaultdict</a> from the standard library <b>collections</b>.
           <li>using the <b>==</b> operator and <b>np.sum</b>.
           <li>using the function <a href="https://numpy.org/doc/stable/reference/generated/numpy.unique.html">numpy.unique</a> using the argument <i>return_counts</i>. 
       </ol>
</div>

In [None]:
# YOUR CODE HERE

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
species = iris_data["species"]

# 1
counts = {}
for s in species:
    if s in counts:
        counts[s] += 1
    else:
        counts[s] = 1
print("1.\n", counts)

#2
from collections import defaultdict
counts = defaultdict(int) # values will default to 0
for s in species: 
    counts[s] += 1
print("2.\n", counts)

#3 
# using a dictionary comprehension; iterating over set(species) evaluates each species only once
counts = {s : np.sum(species == s) for s in set(species)}
print("3.\n", counts)

#4 
counts = np.unique(species, return_counts=True)
print("4.\n", counts)
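
Two small additions that are not part of the exercise: the standard library also offers `collections.Counter`, and the two arrays returned by `np.unique` can be zipped into a dict. The sketch below uses a short made-up array instead of the iris data so that it runs on its own:

```python
from collections import Counter
import numpy as np

species = np.array(["setosa", "virginica", "setosa", "versicolor", "setosa"])

# Counter counts hashable items in a single pass
counts = Counter(species.tolist())
print(counts)

# np.unique returns two arrays; zip them to get a dict like in approaches 1-3
names, freqs = np.unique(species, return_counts=True)
counts_np = {str(n): int(f) for n, f in zip(names, freqs)}
print(counts_np)
```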

Let's have a short discussion:

1. Which is the most natural and readable attempt in your opinion?
2. Is there a *best* way of solving this task?
3. Would you ever want to use approach 1 over the others?

<font size="3"><div class="alert alert-block alert-success"><b>Exercise:</b> Write a function which loads a file (<b>in this case applied to <i>data/glass.csv</i></b>), prints the shape of the resulting array, prints the first N rows (adjustable via a function argument), computes and returns the means and standard deviations of the data columns and optionally prints the results (adjustable via a parameter). The result of <b>numpy.genfromtxt is different in this case since glass.csv does not contain mixed data types. You don't need to specify the column data types in this case.</b><br><br>
    <b>Hint:</b> <b>Use the numpy functions <a href="https://numpy.org/doc/stable/reference/generated/numpy.mean.html">np.mean</a> and <a href="https://numpy.org/doc/stable/reference/generated/numpy.std.html">np.std</a></b> to compute the mean and the standard deviation. The parameter axis allows you to specify along which dimension/axis the computation is performed, so you don't need to use a loop.<br><br>
If you use the keyword argument <b>names = True</b> in numpy.genfromtxt, a named array will be returned where column data can be addressed via the column name defined in the first row of the file.
</div>
    
<b>Try it yourself here:</b></font>

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
import numpy as np

def process_file(file, n_rows=10, print_result=True):
    # skip the header row; all remaining values are numeric
    glass_data = np.genfromtxt(file, delimiter=",", skip_header=1)
    # shape of the resulting array (rows, columns)
    print("shape:", glass_data.shape)
    # first n_rows rows
    print(glass_data[:n_rows, :])
    # column-wise mean and standard deviation
    means = np.mean(glass_data, axis=0)
    stds = np.std(glass_data, axis=0)
    if print_result:
        print("means:"+str(means))
        print("stds:"+str(stds))
    return means, stds

means, stds = process_file("data/glass.csv", n_rows=10, print_result=True)

<br><br><br>
## Visualizing Data with Matplotlib
<br>
<font size="3">
    <b>One picture can say more than 1000 words:</b> Visualizing your data is one of the most important aspects of your daily work routine. <a href="https://matplotlib.org/">Matplotlib</a> is the most widely used Python package for data visualization, offering a multitude of different plot types and options to tailor the plots according to your needs. Even LaTeX can be used to obtain nice looking plot titles, axes labels and annotations.<br><br>
    <a href="https://matplotlib.org/gallery/index.html">See a plot gallery here</a> of all the possibilities and functions matplotlib offers out-of-the-box.
</font>

### Example 1:

<font size="3">First you need to import the pyplot module from matplotlib using</font>

In [None]:
import matplotlib.pyplot as plt

<font size="3">Then we plot a simple sine:</font>

In [None]:
# x values between 0 and 2*pi
# one option is np.arange with a fixed step size: [0, .01, .02, ...]
#x = np.arange(start = 0, stop = 2 * np.pi, step = 0.01) 
# another option is np.linspace with a fixed number of points (here 100)
x = np.linspace(0, 2*np.pi, 100)
# applies sine elementwise
y = np.sin(x) 
# data for the 2nd graph
y2 = np.cos(x)

# figure size and resolution in dpi (dots per inch)
width_px = 640
height_px = 480
dpi = 100
# create a figure
fig = plt.figure(figsize=(width_px/dpi, height_px/dpi), dpi=dpi)
# create a new plot
ax = fig.add_subplot(111)
# create the plot
ax.plot(x, y, label="graph 1")
ax.plot(x, y2, color="red", label="graph 2")
# adds a grid
ax.grid(which="both", color="gray", linestyle=":")
# semicolon suppresses output of last line in cell
ax.set_title("Sine Wave"); 
ax.set_xlabel("x");
ax.set_ylabel("y");
# adds a legend for the graphs in the plot
ax.legend(loc="upper right")
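
The difference between the two options mentioned in the comments above, `np.arange` and `np.linspace`, is easiest to see on a short interval (a small sketch with made-up values):

```python
import numpy as np

# linspace: fixed NUMBER of points, the stop value is included
x_lin = np.linspace(0, 1, 5)
print(x_lin)    # [0.   0.25 0.5  0.75 1.  ]

# arange: fixed STEP size, the stop value is excluded
x_ar = np.arange(0, 1, 0.25)
print(x_ar)     # [0.   0.25 0.5  0.75]
```

For plotting, `np.linspace` is often more convenient because the number of points and both interval ends are under your control.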

In [None]:
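# x values between 0 and 2*pi
# one option is np.arange with a fixed step size: [0, .01, .02, ...]
#x = np.arange(start = 0, stop = 2 * np.pi, step = 0.01) 
# another option is np.linspace with a fixed number of points (here only 20, so the single dots stay visible)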
x = np.linspace(0, 2*np.pi, 20)
# applies sine elementwise
y = np.sin(x) 
# data for the 2nd graph
y2 = np.cos(x)

# figure size and resolution in dpi (dots per inch)
width_px = 640
height_px = 480
dpi = 100
# create a figure
fig = plt.figure(figsize=(width_px/dpi, height_px/dpi), dpi=dpi)
# create a new plot
ax = fig.add_subplot(111)
# create the plot
ax.scatter(x, y, label="graph 1")
ax.scatter(x, y2, color="red", label="graph 2")
# adds a grid
ax.grid(which="both", color="gray", linestyle=":")
# semicolon suppresses output of last line in cell
ax.set_title("Sine Wave"); 
ax.set_xlabel("x");
ax.set_ylabel("y");
# adds a legend for the graphs in the plot
ax.legend(loc="upper right")

In [None]:
# generates two arrays with normally distributed random values
N = int(1E4)
x = np.random.normal(0, 1, (N,))
# y is drawn independently from the same distribution
y = np.random.normal(0, 1, (N,))

# figure size and resolution in dpi (dots per inch)
width_px = 640
height_px = 480
dpi = 100
# create a figure
fig = plt.figure(figsize=(width_px/dpi, height_px/dpi), dpi=dpi)
# create a new plot
ax = fig.add_subplot(111)
# create the plot
ax.scatter(x, y, label="graph 1", s=1, color="black", marker=".")
# adds a grid
ax.grid(which="both", color="gray", linestyle=":")
# semicolon suppresses output of last line in cell
ax.set_title("Random values"); 
ax.set_xlabel("x");
ax.set_ylabel("y");
# adds a legend for the graphs in the plot
ax.legend(loc="upper right")
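
Each of the cells above creates its own figure. If you would rather see a line plot and a scatter plot side by side, one possibility is `plt.subplots`, which creates the figure and a grid of axes in one call. A minimal sketch (titles and labels are chosen freely here):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# one figure with a 1 x 2 grid of plots; axes is an array with two Axes objects
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot(x, np.sin(x), label="sine")
axes[0].set_title("line plot")
axes[0].legend()

# use only every 5th point so that the single dots are visible
axes[1].scatter(x[::5], np.cos(x[::5]), color="red", label="cosine")
axes[1].set_title("scatter plot")
axes[1].legend()

fig.tight_layout()   # avoids overlapping labels
```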

### Example 2:

In [None]:
Img = np.random.randint(low=0, high=256, size=(100, 100, 3), dtype='B') # B: unsigned byte (0..255), high is exclusive
#print(Img) # Img is a three-dim array
plt.imshow(Img, interpolation="none"); # third dimension interpreted as red, green, blue
# image coordinates are numbered as in matrices, first rows (top to bottom) then columns (left to right)
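
With random pixels the coordinate convention is hard to see. The following sketch uses a tiny, hand-made image instead (the sizes and colours are arbitrary choices) and marks the first row and the first column:

```python
import numpy as np
import matplotlib.pyplot as plt

# 4 rows x 6 columns, 3 colour channels, all black at first
img = np.zeros((4, 6, 3), dtype=np.uint8)

img[0, :, 0] = 255    # first ROW (top edge): red channel on
img[:, 0, 2] = 255    # first COLUMN (left edge): blue channel on

plt.imshow(img, interpolation="none");
```

The top edge appears red and the left edge blue; the pixel in the top left corner belongs to both and therefore has red and blue switched on.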

### More complex Example 3:

<br>
<font size="3">
A more complex example demonstrating many of the options to customize your plot:
</font>

In [None]:
# a more complete example with many customization options
import numpy as np
from matplotlib import pyplot as plt

# generate the data to plot
# yields a numpy array of N equidistant points on the given interval
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.sin(x)

# here you can set the size and resolution of the plot
# resolution in dpi = dots per inch
# for the screen 75 to 120 dpi is fine
# for a publication you'd need to use a higher resolution of 150 or 300 dpi
dpi = 150
# height in inches
height = 3 
# width in inches
width = 4

# initialize the new figure
fig = plt.figure(figsize=(width, height), dpi=dpi)

# Here you could specify multiple plots side by side or in a matrix configuration
# we just want one plot in the figure, 
# that's why we set 111 = number of rows, number of columns, index of this plot
ax = fig.add_subplot(111)

# create the plot itself (a line plot connecting the points)
# plot the sine
ax.plot(x, y, color = "r", linestyle = "--", linewidth = 2, label = "sine")
# plot a constant line at zero
ax.plot(x, np.zeros(x.shape), color = "black", linestyle = "-", linewidth = 2, label = "line")
# plot the identity line y = x
ax.plot(x, x, color = "blue", linestyle = ":", linewidth = 1, label = "graph")

# use a grid in the background
ax.grid(which="both", color="gray", linestyle="-.", linewidth=0.5)

# set the plot title using LaTeX
# the semicolon at the end of the line is to suppress the output of the function
# (one of the rare cases where you'd end a line of Python code with a semicolon)
ax.set_title("A first plot of " + r'$y = \sin\left(x\right)$');

# set the axes labels
ax.set_xlabel("x");
ax.set_ylabel(r'$\sum_{i=1}^{N} \frac{1}{i} \psi$');

# normal string definition in Python uses "string" or 'string' (both are equivalent)
# in order to use LaTeX in a title or axes label, use a raw string: r'string' or r"string"
# Why is that?
# In a normal string, the so-called escape sequences are processed (e.g. \n is replaced by a linebreak)
# With the prefix r, the string is stored as it is, so LaTeX commands like \frac keep their backslash.

# use the plot legend
# only works if keyword argument 'label' was given to the method plot
ax.legend()

# save the plot in a file
# the file ending determines the format
# valid formats are e.g. jpg, png for raster graphics (will become pixelated if you zoom in and the resolution is too small)
# you can also export the plot as vector graphics with ending .eps, .pdf or .svg (stays sharp at every zoom level)
fig.savefig("plot.png")
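
The remark about raw strings in the comments above can be checked directly in plain Python. A small sketch:

```python
plain = "\n"     # escape sequence: a single newline character
raw = r"\n"      # raw string: a backslash followed by the letter n

print(len(plain), len(raw))   # 1 2

# the kind of quotation mark makes no difference
print('\n' == "\n")           # True
# without the r prefix, every backslash has to be doubled
print(r'\sin' == "\\sin")     # True
```

This is why LaTeX labels are usually written as raw strings: commands such as `\frac` or `\nu` would otherwise be damaged by the escape sequences `\f` and `\n`.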

### More complex Example 4:

<br>
<font size="3">A histogram plot as more complex example for a plot:</font>

In [None]:
# a histogram of normally distributed random numbers
import numpy as np
from matplotlib import pyplot as plt

# generate data to work with
num_values = 1E6
x = np.random.normal(0,1,int(num_values))

# here you can set the size and resolution of the plot
# resolution in dpi = dots per inch
# for the screen 75 to 120 dpi is fine
# for a publication you'd need to use a higher resolution of 150 or 300 dpi
dpi = 150
# height in inches
height = 3 
# width in inches
width = 4

# initialize the new figure
fig = plt.figure(figsize=(width, height), dpi=dpi)

# Here you could specify multiple plots side by side or in a matrix configuration
# we just want one plot in the figure, 
# that's why we set 111 = number of rows, number of columns, index of this plot
ax = fig.add_subplot(111)

# plots a histogram, use parameter density to normalize the vertical axis
h=ax.hist(x, bins=100, color="black", density=True)
# with density=True the vertical axis shows a probability density, not raw counts
ax.set_ylabel("probability density");
ax.set_xlabel("x");
ax.set_title("A simple histogram");
ax.grid(which="both",color="gray",linewidth=0.5,linestyle="--")
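
What the normalization with `density=True` means can be checked numerically. The sketch below uses `np.histogram`, which accepts the same `bins` and `density` arguments, on a smaller, seeded sample (the seed and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # seeded generator for reproducibility
x = rng.normal(0, 1, size=10_000)

# density holds the bar heights, edges the 101 bin boundaries
density, edges = np.histogram(x, bins=100, density=True)
widths = np.diff(edges)

# with density=True the areas of all bars add up to 1
total_area = np.sum(density * widths)
print(total_area)
```

So the bar heights are not counts but values of an estimated probability density, which makes histograms of samples with different sizes directly comparable.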

<font size="3">For further options to customize plots, also have a look at the predefined <a href="https://matplotlib.org/3.1.0/gallery/color/named_colors.html">color</a> and <a href="https://matplotlib.org/3.3.1/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D.lineStyles">linestyle</a> keywords on the matplotlib website.
</font>

<font size="3">
<div class="alert alert-warning"><b>Exercise 5:</b> Create a plot for the function f(x) = 3x + 2 in the interval [-2, 3] in <b>black</b>. Also plot a horizontal, <b>red</b>, <b>dashed</b> line in the same graph at y = 1.
<br><br>
<b>Hint:</b> Use np.arange or <a href="https://numpy.org/doc/stable/reference/generated/numpy.linspace.html">np.linspace</a> to create an array with x values. Use the style keywords color and linestyle to change the appearance of the graphs. Use <a href="https://numpy.org/doc/stable/reference/generated/numpy.full.html">np.full</a> to create an array filled with a given value.
</div>
</font>

### Try it yourself:

<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
### Example Solution:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# creates an array with 100 values between -2 and 3 
x = np.linspace(-2,3,100)
y = 3 * x + 2
# creates an array with 100 elements filled with 1
y2 = np.full(100, 1)

plt.plot(x,y,color="black")
plt.plot(x,y2,color="red",linestyle="--")