# 1. Python Concepts

We start by reviewing some more advanced Python concepts which might prove useful
in implementing your Machine Learning projects. In this notebook we focus on the following
topics, several of which concern functions in Python:

* f-strings
* subplots with Matplotlib
* lambda functions
* generators
* built-in functions map and filter
* *args and **kwargs
* decorators
* speeding up function executions with Numba

Keywords: ```f"String"```, ```plt.subplots```, ```Figure.add_subplot```, ```np.ravel```, 
```lambda x: ...```, ```yield```, ```map```, ```filter```, ```*args```, ```**kwargs```,
```@decorator```, ```@jit```, ```%timeit```

***

## Important Python Libraries

There are several useful libraries for scientific computing and Machine Learning in Python.

* ```NumPy``` provides an efficient multi-dimensional array type (a fast alternative to basic Python data structures like lists) and numerical functions for computations with large data arrays.

* ```SciPy``` builds on NumPy and provides extended functionality for numerical and statistical methods.

* ```Pandas``` provides data frames (similar to those in the R programming language) and tools for data manipulation and statistical analysis.

* ```Matplotlib``` is the standard plotting library in Python.

* ```Seaborn``` builds on Matplotlib and, in particular, extends Pandas functionality to create appealing plots.

* ```Scikit-learn``` is a powerful Machine Learning library providing a wide range of learning algorithms. It builds upon NumPy, SciPy, and Matplotlib.

* ```TensorFlow``` is a library for efficient Machine Learning implementation and a standard library for Deep Learning.

* ```Keras``` is a high-level Deep Learning library built on top of TensorFlow which simplifies creating, training, and evaluating neural networks.

* ```PyTorch``` is another standard library for Deep Learning adhering more closely to basic Python principles.


In this course, we will mostly work with ```NumPy```, ```Matplotlib```, ```scikit-learn```, and ```TensorFlow``` / ```Keras```.
You can load these libraries like this:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

***

## f-Strings

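f-strings are the recommended way to format strings since Python 3.6. Expressions inside curly braces are evaluated and inserted into the string; an optional format specification after the colon (here ```.2f```, i.e. two decimal places) controls how the value is displayed: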

In [None]:
float_var = 3.141592

print(f'String formatting allows including variables like {float_var:.2f}.')
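A few more format specifications are sketched below (the variable names are only illustrative). They all use the standard format-spec mini-language; the self-documenting ```=``` form requires Python 3.8 or newer.

```python
import math

value = math.pi
count = 42
ratio = 0.25

formatted = [
    f"{value:.3f}",      # fixed-point with 3 decimals
    f"[{value:8.2f}]",   # right-aligned in a field of width 8
    f"{count:05d}",      # integer zero-padded to width 5
    f"{ratio:.1%}",      # multiplied by 100 and shown as a percentage
    f"{count = }",       # prints the expression and its value (Python 3.8+)
]

for line in formatted:
    print(line)
```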

***

## Subplots with Matplotlib

Let us create some dummy images:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

dummy_images = [[[0,0,0,0,0],
                 [0,1,1,1,0],
                 [0,0,1,0,0],
                 [0,0,1,0,0],
                 [0,0,0,0,0]],
                
                [[0,0,0,0,0],
                 [0,0,1,0,0],
                 [0,1,1,1,0],
                 [0,0,1,0,0],
                 [0,0,0,0,0]],
                
                [[0,0,0,0,0],
                 [0,1,1,0,0],
                 [0,0,1,0,0],
                 [0,0,1,0,0],
                 [0,0,0,0,0]]]

We usually use ```plt.subplots``` to create a figure with several sub-figures.

In [None]:
fig, axs = plt.subplots(1,3)

axs[0].imshow(dummy_images[0])
axs[0].set_title('Example 1')

axs[1].imshow(dummy_images[1])
axs[1].set_title('Example 2')

axs[2].imshow(dummy_images[2])
axs[2].set_title('Example 3')

plt.show()

Another way to plot this is:

In [None]:
fig, axs = plt.subplots(1,3)

for i, ax in enumerate(axs.ravel()):
    ax.imshow(dummy_images[i])
    ax.set_title(f"Example {i + 1}")  # enumerate starts at 0; add 1 to match the titles above

plt.show()

#### Note
that you can iterate over the subplots stored in the object ```axs``` 
 
```Python
In: print(axs)
Out: [<Axes: > <Axes: > <Axes: >]
```

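with the method ```ravel```, which returns a flattened (1D) NumPy array of the subplots. For a single row of subplots ```axs``` is already 1D, but for a grid such as ```plt.subplots(2, 3)``` it is a 2D array, and ```ravel``` lets a single loop visit every subplot.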
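A minimal sketch of the grid case, assuming random 5 x 5 images as stand-ins for real data: ```axs``` is a 2 x 2 array of ```Axes```, and ```ravel``` visits them row by row.

```python
import numpy as np
import matplotlib.pyplot as plt

# a 2 x 2 grid: here axs is a 2D NumPy array of Axes objects
fig, axs = plt.subplots(2, 2, figsize=(6, 6))
print(axs.shape)  # (2, 2)

rng = np.random.default_rng(0)

# ravel() flattens the grid in row-major order, so one loop reaches all four subplots
for i, ax in enumerate(axs.ravel()):
    ax.imshow(rng.random((5, 5)))
    ax.set_title(f"Random image {i + 1}")
    ax.axis("off")

plt.tight_layout()
plt.show()
```

Because the order is row-major, "Random image 3" ends up in the bottom-left subplot ```axs[1, 0]```.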

There is a second way to create subplots, which uses the ```add_subplot``` method of a figure created with ```plt.figure```.
This is particularly useful when combining 3D and 2D sub-figures, as in the
following example.

In [None]:
data_3d = np.random.multivariate_normal(mean=[1,2,4], cov=np.eye(3), size=100)

fig = plt.figure(figsize=(10,2))

ax = fig.add_subplot(1,4,1, projection='3d')
ax.scatter(data_3d[:,0], data_3d[:,1], data_3d[:,2], marker='.')

ax.set_title('3D Example')

ax = fig.add_subplot(1,4,2)

ax.imshow(dummy_images[0])
ax.set_title("Example 1")

ax = fig.add_subplot(1,4,3)

ax.imshow(dummy_images[1])
ax.set_title("Example 2")

ax = fig.add_subplot(1,4,4)
ax.imshow(dummy_images[2])
ax.set_title("Example 3")

plt.show()

***

## Lambda Functions

Typically, we declare a function like this in Python:

In [None]:
def fahrenheit_to_celsius(deg_F):
    return (deg_F - 32) * 5/9

In [None]:
fahrenheit_to_celsius(98.6)

For such simple **single expression functions** a lambda function might
be a convenient choice.

In [None]:
fahrenheit_to_celsius_lambda = lambda x: (x - 32) * 5/9

In [None]:
fahrenheit_to_celsius_lambda(98.6)

They are sometimes referred to as **anonymous functions**, as they are not 
required to be bound to a name, e.g.

In [None]:
(lambda x: (x - 32) * 5/9)(98.6)
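Anonymous functions are handy wherever a function is expected only as an argument. As an illustration (with made-up temperatures), the built-ins ```sorted``` and ```max``` accept a ```key``` function, and a lambda keeps that key next to the call:

```python
# made-up example temperatures in degrees Fahrenheit
temperatures_F = {"Zurich": 64.4, "Cairo": 95.0, "Oslo": 50.0}

# the lambda receives one (city, temperature) pair and returns the value to sort by
by_temperature = sorted(temperatures_F.items(), key=lambda item: item[1])
print(by_temperature)   # [('Oslo', 50.0), ('Zurich', 64.4), ('Cairo', 95.0)]

# max iterates over the dictionary keys; the lambda maps each city to its temperature
warmest = max(temperatures_F, key=lambda city: temperatures_F[city])
print(warmest)          # Cairo
```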

We will see a more interesting example below!

***

## Generators

Generator functions are special functions particularly suited for 
generating sequences of various kinds, one element at a time.

In [None]:
def squared_sequence(limit):
    num = 0
    
    while num < limit:
        yield num**2
        num += 1

In [None]:
squared_sequence(10)

In [None]:
gen = squared_sequence(10)

In [None]:
next(gen)

In [None]:
for i in squared_sequence(10):
    print(i)

#### Note the differences to standard functions:
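* Generators *yield* values: at each ```yield``` the function is suspended and its local state is kept until the next value is requested.
* The values are generated only when they are required (e.g. when calling ```next()``` or iterating in a ```for``` loop), i.e. we do not store a whole list but generate one element at a time.
* Once all elements are generated, a further call to ```next()``` raises ```StopIteration```, which is also what ends a ```for``` loop over the generator.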

However, you can still collect all elements in a list:

In [None]:
list(squared_sequence(10))
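The exhaustion behaviour can be made visible with a short sketch that reuses ```squared_sequence``` with a small limit: once the three values are consumed, ```next``` raises ```StopIteration``` and the generator stays empty.

```python
def squared_sequence(limit):
    num = 0
    while num < limit:
        yield num**2
        num += 1

gen = squared_sequence(3)
first_three = (next(gen), next(gen), next(gen))
print(first_three)                 # (0, 1, 4)

try:
    next(gen)                      # no values left
    exhausted = False
except StopIteration:
    exhausted = True
    print("StopIteration raised: the generator is exhausted")

# an exhausted generator stays exhausted; create a new one to start over
print(list(gen))                   # []
print(list(squared_sequence(3)))   # [0, 1, 4]
```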

***

## Map and Filter

The built-in functions ```map``` and ```filter``` are convenient ways to 
apply a function to all elements of an iterable and avoid writing a loop for that.

With ```map``` we apply the function to each element of the iterable(s) and get back a lazy iterator (a ```map``` object) that produces the results on demand. 
We use it in the following way:

```Python
map( function, iterable(s) )
```

Let's see some examples.

In [None]:
from matplotlib.colors import to_rgb

colours = ['green', 'red', 'blue', 'yellow']

rgb_colours = map(to_rgb, colours)

print(f"Try to print rgb_colours: {rgb_colours}\n")

print(list(rgb_colours))

In [None]:
decimals = [3.14159, 2.71828, 1.61803]

rounded = list( map( round, decimals, range(1,4) ) )

print(rounded)
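When ```map``` receives several iterables, as in the ```round``` example, it passes one element from each iterable per call, so ```round(3.14159, 1)```, ```round(2.71828, 2)```, and so on. The sketch below compares this with an equivalent ```zip``` comprehension and shows that ```map``` stops at the shortest iterable.

```python
decimals = [3.14159, 2.71828, 1.61803]

# map with two iterables calls round(decimals[i], digits[i]) for each position i
rounded_map = list(map(round, decimals, range(1, 4)))

# equivalent list comprehension using zip
rounded_zip = [round(x, n) for x, n in zip(decimals, range(1, 4))]

# like zip, map stops as soon as the shortest iterable is exhausted
short = list(map(round, decimals, [1]))

print(rounded_map)  # [3.1, 2.72, 1.618]
print(short)        # [3.1]
```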

With ```filter``` we apply a boolean function to an iterable and keep only the elements
for which it returns ```True```. We use it like this:

```Python
filter( function, iterable )
```

where ```function``` returns boolean values ```True``` or ```False```.

In [None]:
def passed(grade):
    return grade > 4.0

In [None]:
grades = [5.5, 6.0, 2.0, 3.5, 4.5]

list( filter(passed, grades) )

A typical use case of lambda functions is actually in connection with ```map``` or ```filter```.

In [None]:
list( filter( lambda grade: grade > 4.0, grades ) )

In [None]:
list( map( lambda grade: grade > 4.0, grades ) )
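Since ```map``` and ```filter``` return lazy iterators, their results can be consumed only once. The sketch below shows this and, for comparison, the equivalent list comprehensions, which build lists directly and are often considered more readable.

```python
grades = [5.5, 6.0, 2.0, 3.5, 4.5]

passed_iter = filter(lambda grade: grade > 4.0, grades)
first_pass = list(passed_iter)    # [5.5, 6.0, 4.5]
second_pass = list(passed_iter)   # [] because the iterator is already consumed

# equivalent list comprehensions for filter and map
passed_list = [grade for grade in grades if grade > 4.0]
passed_flags = [grade > 4.0 for grade in grades]

print(first_pass, second_pass, passed_list, passed_flags)
```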

***

## *args and **kwargs

Sometimes you see these ```*args``` and ```**kwargs``` arguments in functions and classes.
We use them to write functions that accept a variable number of positional and keyword arguments.

Here's an example:

In [None]:
def print_arguments(first, *args, **kwargs):
    
    print(first)
    
    if args:
        print(args)
    if kwargs:
        print(kwargs)

In [None]:
# raises a TypeError: the positional argument `first` is required
print_arguments()

In [None]:
print_arguments(1)

In [None]:
print_arguments(1, 2, 3)

In [None]:
print_arguments(1, 2, 3, color='red', marker='x')

#### Note 
that with ```*args``` we collect additional positional arguments in a tuple
and with ```**kwargs``` the additional keyword arguments in a dictionary.
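The same star syntax also works in the other direction, at the call site: ```*``` unpacks a list or tuple into positional arguments and ```**``` unpacks a dictionary into keyword arguments. A minimal sketch with a hypothetical helper ```describe``` that simply returns what it receives:

```python
def describe(first, *args, **kwargs):
    """Hypothetical helper: return the collected arguments instead of printing them."""
    return first, args, kwargs  # args is a tuple, kwargs is a dict

positional = [1, 2, 3]
options = {"color": "red", "marker": "x"}

# equivalent to describe(1, 2, 3, color='red', marker='x')
result = describe(*positional, **options)
print(result)  # (1, (2, 3), {'color': 'red', 'marker': 'x'})
```

This is how wrapper functions typically forward arbitrary arguments to another function, e.g. passing plotting options on to Matplotlib.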

***

## Decorators

Decorators are a great way to modify or extend the behaviour of a function or class
without changing its source code.

In [None]:
def fahrenheit_to_celsius(deg_F):
    deg_C = (deg_F - 32) * 5/9
    return deg_C

In [None]:
fahrenheit_to_celsius(deg_F=80)

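Let's define a decorator and "decorate" the previous function! A decorator is simply a function that takes a function ```func``` and returns a new function (here ```wrapper```) which calls ```func``` and post-processes its result. Writing ```@just_int``` above a definition is shorthand for ```fahrenheit_to_celsius = just_int(fahrenheit_to_celsius)```.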

In [None]:
def just_int(func):

    def wrapper(*args, **kwargs):
        # call the original function with all its arguments, then truncate the result
        res = func(*args, **kwargs)
        return int(res)

    return wrapper

In [None]:
@just_int
def fahrenheit_to_celsius(deg_F):
    deg_C = (deg_F - 32) * 5/9
    return deg_C

In [None]:
fahrenheit_to_celsius(deg_F=80)

In [None]:
def just_int(print_out=None):
    
    def just_int_inner(func): 

        def wrapper(*args,**kwargs):

            if print_out:
                print(print_out)
                
            res = func(*args,**kwargs)
            return int(res)

        return wrapper  
    
    return just_int_inner

In [None]:
@just_int(print_out='Get rid of decimals!')
def fahrenheit_to_celsius(deg_F):
    deg_C = (deg_F - 32) * 5/9
    return deg_C

In [None]:
fahrenheit_to_celsius(deg_F=80)
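A decorator with arguments, like ```just_int(print_out=...)```, is a function that *returns* a decorator, so ```@just_int(print_out=...)``` is shorthand for ```f = just_int(print_out=...)(f)```. The sketch below writes this out explicitly and adds ```functools.wraps```, a standard-library decorator that copies ```__name__```, ```__doc__```, and similar metadata from the original function onto ```wrapper``` (without it, the decorated function would report the name ```wrapper```).

```python
import functools

def just_int(print_out=None):
    """Decorator factory: returns a decorator configured by print_out."""
    def just_int_inner(func):
        @functools.wraps(func)  # keep the metadata of the decorated function
        def wrapper(*args, **kwargs):
            if print_out:
                print(print_out)
            return int(func(*args, **kwargs))
        return wrapper
    return just_int_inner

def fahrenheit_to_celsius(deg_F):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_F - 32) * 5/9

# equivalent to writing @just_int(print_out=...) above the definition
fahrenheit_to_celsius = just_int(print_out="Get rid of decimals!")(fahrenheit_to_celsius)

print(fahrenheit_to_celsius(80))       # prints the message, then 26
print(fahrenheit_to_celsius.__name__)  # fahrenheit_to_celsius
```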

***

## Numba

Numba is a just-in-time compiler for Python which can speed up your code substantially 
if it is based on loops, NumPy arrays and NumPy functions. Its main feature is the
```jit``` decorator. Find more on Numba in the [official documentation](https://numba.readthedocs.io/en/stable/user/5minguide.html).

Let's see what it does on a small example.

In [None]:
from numba import jit
import numpy as np

In [None]:
def calc_pi(total_num_points): 
    np.random.seed(1234)
    
    num_circle_points = 0
    
    for i in range(total_num_points):
        x = np.random.random()
        y = np.random.random()
        radius_squared = np.power(x,2) + np.power(y,2)
        
        if radius_squared < 1:
            num_circle_points += 1
           
    pi_estimate = 4 * num_circle_points / total_num_points
    return pi_estimate

In [None]:
@jit
def calc_pi_faster(total_num_points): 
    np.random.seed(1234)
    
    num_circle_points = 0
    
    for i in range(total_num_points):
        x = np.random.random()
        y = np.random.random()
        radius_squared = np.power(x,2) + np.power(y,2)
        
        if radius_squared < 1:
            num_circle_points += 1
           
    pi_estimate = 4 * num_circle_points / total_num_points
    return pi_estimate

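The function ```calc_pi_faster``` is compiled to machine code when it is called for 
the first time, so this first call includes the compilation overhead. Subsequent calls run the compiled version directly.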

In [None]:
total_num_points = 100000

print(f"calc_pi: {calc_pi(total_num_points)}")
print(f"calc_pi_faster: {calc_pi_faster(total_num_points)}")

In [None]:
%timeit calc_pi(total_num_points)

In [None]:
%timeit calc_pi_faster(total_num_points)

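#### Note
that IPython (the kernel behind Jupyter notebooks) provides additional functionality through 
*magic commands* such as ```%timeit```, which are not part of standard Python. ```%timeit``` runs a statement many times and reports the mean and standard deviation of its execution time.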
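Outside a notebook, magic commands are not available. A rough plain-Python counterpart is the standard-library ```timeit``` module; the sketch below times a hypothetical loop-based function ```sum_of_squares``` (you could pass e.g. ```lambda: calc_pi(total_num_points)``` instead). Reporting the fastest repetition is one common convention, not the only reasonable one.

```python
import timeit

def sum_of_squares(n):
    """Hypothetical loop-based function used only as a timing target."""
    total = 0
    for i in range(n):
        total += i * i
    return total

n_calls = 100
# 5 repetitions of 100 calls each; each entry of `times` is the total seconds for 100 calls
times = timeit.repeat(lambda: sum_of_squares(10_000), number=n_calls, repeat=5)
best_per_call = min(times) / n_calls
print(f"best of 5: {best_per_call * 1e6:.1f} µs per call")
```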

***

## Exercise Section

Generators are very useful when loading datasets, for instance. Suppose your dataset is too large to fit into memory (RAM),
e.g. several tens of GB. A common approach is then to load only the parts of the dataset needed at each step of training. 

In this exercise, we implement a generator to load *batches* of images from the **MNIST dataset**. MNIST is a standard Deep Learning 
benchmark dataset comprising 70 000 images (28 x 28 pixels) of hand-written digits, each with the corresponding digit label as the 
classification target. 

<center><img src="images/MNIST_example.png" alt="MNIST Example" width="500"/></center>

You can load the dataset (reduced to 5 000 samples) in the next cell.

In [None]:
import numpy as np

mnist_data = np.load('data/mnist_data_5k.npy', mmap_mode='r', allow_pickle=True)
mnist_targets = np.load('data/mnist_labels_5k.npy', mmap_mode='r', allow_pickle=True)

Note that ```mmap_mode='r'``` specifies that the data is not loaded into memory; instead, a read-only memory-mapped array is constructed.
This means the data is read from disk only when it is accessed. Note that

```Python
    In[1]:  mnist_data = np.load('data/mnist_data_5k.npy', mmap_mode='r')
    In[2]:  type(mnist_data)
    Out[2]: numpy.memmap
```

provides a ```memmap``` object, while 

```Python
    In[1]:  mnist_data = np.load('data/mnist_data_5k.npy')
    In[2]:  type(mnist_data)
    Out[2]: numpy.ndarray
```

provides a standard NumPy array. ```np.memmap``` is a subclass of ```np.ndarray```, but the two behave differently: the data of a ```memmap``` still lives on disk,
while the NumPy array is loaded into RAM. Most, but not all, NumPy operations work on a ```memmap```.
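The difference can be checked on a small scale without the MNIST files. This sketch writes a tiny array to a temporary ```.npy``` file (a stand-in for a large dataset, not the course data), opens it both ways, and copies one slice into RAM.

```python
import os
import tempfile
import numpy as np

# write a small array to a temporary .npy file (stand-in for a large dataset)
path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(path, np.arange(12, dtype=np.float32).reshape(4, 3))

on_disk = np.load(path, mmap_mode="r")   # read-only memory map, data stays on disk
in_ram = np.load(path)                   # regular array, fully loaded into RAM

print(type(on_disk), type(in_ram))       # numpy.memmap vs numpy.ndarray
print(isinstance(on_disk, np.ndarray))   # True: memmap subclasses ndarray

# a slice is still backed by the file; np.array copies it into an in-memory ndarray
batch_in_ram = np.array(on_disk[1:3])

try:
    on_disk[0, 0] = -1.0                 # mode 'r' does not allow writing
except ValueError:
    print("read-only: mmap_mode='r' does not allow writing")
```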

(1.) In the following cell, define a generator that takes the data, the targets, and the batch size as arguments. 
For e.g. ```batch_size=10```, the first ```next``` call shall provide the first 10 samples of both arrays. 
With every following call, the next 10 samples are provided. So you need to think about how to
access a slice of a NumPy array and how to get the next slice with each ```next``` call by adjusting the indices.

In [None]:
def batch_data(data, targets, batch_size):
    # fill in: yield consecutive (data, targets) batches of size batch_size
    pass

If you were successful, you can create the generator with

In [None]:
mnist_gen = batch_data(data=mnist_data, targets=mnist_targets, batch_size=10)

and retrieve a (new) batch of data with

In [None]:
new_data, new_targets = next(mnist_gen)

print(f"Target labels in this batch: {new_targets}")