## Recap: what is Python, what are modules and why do you need to know that

Python is so widely used because it is so flexible. For pretty much any task you have, there is a specialized "toolbox" that makes your work simpler. These "toolboxes" are called modules. You have used them already.

When we import things we actually plug in new modules into Python. For data science one of the key libraries is numpy.

<div>
<img src="Data/KEMM30_001.jpg" width="300">
</div>

Most modules/libraries contain many of these objects and we can choose to load one or all. They also contain functions that do things (which are also objects in a sense).



<div>
<img src="Data/KEMM30_003.jpg" width="200">
</div>

What are objects? Objects are "boxes" that contain things, e.g. data, functions, or instructions on what you can do with them and how.

<div>
<img src="Data/KEMM30_002.jpg" width="150">
</div>

These functions are hidden in each object and can be accessed with the "dot".

<div>
<img src="Data/KEMM30_004.jpg" width="180">
</div>
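A minimal sketch of the "dot" in action, using only built-in objects. The variable names `greeting` and `shopping` are made up for illustration.

```python
# Every value in Python is an object that carries its own functions (methods).
greeting = "hello world"  # a string object
shopping = ["milk", "eggs"]  # a list object

# The dot reaches into the object and calls one of its functions.
print(greeting.upper())  # the string's own upper-case function
shopping.append("bread")  # the list's own function to add an entry
print(shopping)

# dir() lists everything that is reachable with the dot (first few names only).
print([name for name in dir(greeting) if not name.startswith("_")][:5])
```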

Why so complicated? Try:

<div>
<img src="Data/KEMM30_005.jpg" width="400">
</div>

# Data Science

## Data types

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
types = [
    float(0.94545),
    int(3),
    str("hello world"),
    bool(True),
    1.0e4,
]  # this is a list of data types
for i in types:  # and now we loop through the contents of this list
    print(type(i), i, f'I am in a string and show: "{i}"')

## Collecting data, what have we learned?

* Lists = square brackets [], think append and sort
* Tuples = round brackets (), think order matters, in and out of functions
* Dictionaries = curly brackets {}, think data with keywords = cookbook, filenames, lookup tables
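As a short reminder, the three containers could be used like this. This is only a sketch; the names `measurements`, `min_and_max` and `filenames` are hypothetical.

```python
# A list can grow and be sorted in place.
measurements = [3.2, 1.5, 2.8]
measurements.append(0.9)
measurements.sort()
print(measurements)


# A tuple is fixed; functions often return several values as a tuple.
def min_and_max(values):
    return min(values), max(values)


lowest, highest = min_and_max(measurements)
print(lowest, highest)

# A dictionary looks values up by keyword instead of by position.
filenames = {"sample": "run_01.csv", "reference": "run_00.csv"}
print(filenames["sample"])
```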

## Manipulating data of the same kind: vectors/matrices/images

Classically we collect large amounts of the same kind of data, and we could store it in a list of lists. Let me show you with an image, as if it were constructed from a list of lists.

In [None]:
def make_smiley_list():
    """Make a smiley face as a list of lists"""
    return [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 1, 0, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]


print(make_smiley_list())

Or let me show you this as an image. For this we need a module to plot things.

In [None]:
plt.imshow(make_smiley_list(), cmap="gray", vmin=0, vmax=1)
plt.axis("off");

If we now wanted to do something as simple as multiplying each entry by 255 to get a proper gray image (here we also set the background to a mid-gray value of 128), we would need to loop over each entry, which you will hopefully agree is somewhat tedious.

In [None]:
smiley = make_smiley_list()  # make a list-of-lists smiley

for row in range(0, 9, 1):
    print(f"row = {row}")
    for col in range(0, 7, 1):
        print(f"column = {col}")
        if smiley[row][col] == 1:
            smiley[row][col] = smiley[row][col] * 255
        else:
            smiley[row][col] = 128

print(smiley)
plt.imshow(smiley, cmap="gray", vmin=0, vmax=255)
plt.axis("off");

# Numpy

Enter NumPy, the de facto standard library for data manipulation. There are many additional tools for data handling, but most use NumPy as a standard. We can of course convert our list of lists into an array and then perform calculations on it.

Two ways to import numpy are common: either directly with **import numpy**, after which you will need to write numpy.sin for the sine function, or using **import numpy as np**, after which we can use np.sin. The latter is more common.

In [None]:
smiley_list = make_smiley_list()
smiley = np.array(smiley_list)  # convert the list of lists into an array

print(smiley)
print(type(smiley))

Now we can perform calculations on the array as a whole
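To see why the conversion matters, here is a small sketch comparing the same operation on a list and on an array. The names `numbers_list` and `numbers_array` are made up for illustration.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# Multiplying a list repeats the list; it does not touch the numbers.
print(numbers_list * 2)

# Multiplying an array acts on every entry at once.
print(numbers_array * 2)
```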

In [None]:
smiley = smiley * 128  # each entry is multiplied by 128
smiley += 128  # short form to add 128 to each element

plt.imshow(smiley, cmap="gray", vmin=0, vmax=255)
plt.axis("off")

In [None]:
def make_smiley():
    """Make smiley as a 2D numpy array"""
    face = np.zeros((9, 7))
    for a in [(2, 2), (2, 4), (4, 1), (4, -2), (5, 2), (5, -3), (6, 3)]:
        face[a] = 128
    face += 128
    return face

## Statistics

The object "array" contains a whole bunch of useful operations that can be found with the "dot".
Among them are many statistical tools, all of which can be applied either over all axes (dimensions) or along a single axis.

In [None]:
smiley = make_smiley()
print(f"the array min() is {smiley.min():.1f}")
print(f"the array max() is {smiley.max():.1f}")
print(f"the array mean() is {smiley.mean():.2f}")
print(f"the mean for each column (axis=0) is {smiley.mean(axis=0)}")

We will talk more about statistical calculations later
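The effect of the `axis` argument is easiest to see on a tiny array. The following is a sketch with a made-up array called `table`.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])  # 2 rows, 3 columns

print(table.mean())        # one number for the whole array
print(table.mean(axis=0))  # average over the rows: one value per column
print(table.mean(axis=1))  # average over the columns: one value per row
```

A useful rule of thumb: the axis you name is the one that disappears from the shape of the result.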

# Creating data and doing standard calculations
One of the most important features of numpy is that one can very efficiently perform calculations on one "variable" that is in reality a long vector with many numbers. Very often one starts by creating a vector of numbers and then performs calculations with it.

In [None]:
x = np.arange(0, 2, 0.00001)  # create an x-vector with fine steps
%time np.sin( 2 * np.pi * x) # now we calculate the sine for each of these values.

See the wall time? This is the time the calculation of 200000 sine values took. So even calculations on large amounts of numbers can be very fast, as can the preparation of plots.
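For comparison, here is a sketch that times the same calculation once with a plain Python loop and once with NumPy. It uses `time.perf_counter` instead of the `%time` magic so that it also runs outside a notebook; the variable names are made up, and the actual times depend on your machine.

```python
import math
import time

import numpy as np

x_demo = np.arange(0, 2, 0.00001)  # the same fine-stepped vector as above

# Plain Python: one sine at a time in a loop.
start = time.perf_counter()
y_loop = [math.sin(2 * math.pi * value) for value in x_demo]
loop_seconds = time.perf_counter() - start

# NumPy: the whole vector in one call.
start = time.perf_counter()
y_numpy = np.sin(2 * np.pi * x_demo)
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"numpy: {numpy_seconds:.4f} s")
print("same result:", np.allclose(y_loop, y_numpy))
```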

In [None]:
y = np.sin(2 * np.pi * x)
plt.plot(x, y)

There are many functions that create types of data. As numpy is created to be efficient in as many dimensions as you like, you can also create data in many dimensions.

In this course we will restrict ourselves to two dimensions, but all the code can also be used in 3 or more dimensions, e.g. if the movement of a particle in space and time is of interest. Some of the useful creation functions are:

* `np.zeros((3,3))`       # creates 3x3 array with zeroes
* `np.ones(3)`            # creates 1-dim array (vector) with 3 ones
* `np.arange(0,1,0.1)`    # creates evenly spaced array with a fixed step (maybe the most used function)
* `np.linspace(1,10,5)`   # creates array with a fixed number of evenly spaced entries (here 5)
* `np.logspace(1,3,5)`    # creates array with a fixed number of logarithmically spaced entries
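A quick way to get a feeling for these functions is to print what they return. The following sketch uses the same arguments as the list above.

```python
import numpy as np

print(np.zeros((3, 3)).shape)  # (3, 3): two dimensions
print(np.ones(3).shape)        # (3,): one dimension
print(np.arange(0, 1, 0.1))    # step given, stop value excluded
print(np.linspace(1, 10, 5))   # number of points given, stop value included
print(np.logspace(1, 3, 5))    # from 10**1 to 10**3, evenly spaced on a log scale
```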

## Important additional error check

When working with arrays, probably the most common error will be problems with the shape and the dimension of an array. So an important test is to check both the shape and the number of dimensions.

In [None]:
print("the shape of x is", x.shape)
print("the shape of y is", y.shape)
print("the number of dimensions of x is", x.ndim)
# note that this array has only one dimension (it's a vector)
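The typical shape problem looks like the following sketch: two vectors of different length cannot be combined element by element, and NumPy raises a `ValueError`. The names `short`, `long` and `message` are made up for illustration.

```python
import numpy as np

short = np.ones(3)
long = np.ones(4)

print(short.shape, short.ndim)
print(long.shape, long.ndim)

message = None
try:
    short + long  # shapes (3,) and (4,) do not fit together
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```

When you see such an error, printing `.shape` of every array involved is usually the fastest way to find the culprit.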

## Task

Create an x-vector from -3 to 3 with steps of 0.01.

Then create the y values of a bell curve: $g(x)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}((x-\mu)/\sigma)^2}$ 
with $\mu$=0.2 and $\sigma$=1,

and plot it using **plt.plot(x,y)**.

In [None]:
# arrays can be reshaped as long as all values are used
x2 = x.reshape(100, 2000)  # rows, columns
print("after reshaping of x the shape of x2 is", x2.shape)

# the -1 means "as many rows as needed", the 1 means one column
x2 = x.reshape(-1, 1)
print("after reshaping of x the shape of x2 is", x2.shape)
# note that this array now has two dimensions and is a matrix

There are other ways to create different matrices that we will not discuss further. The name will usually give you a hint about what they do.
* `np.vstack`
* `np.hstack`
* `np.concatenate`
* `np.tile`
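To illustrate the hint in the names, here is a small sketch with two made-up vectors `a` and `b`.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(np.vstack([a, b]).shape)       # (2, 3): stacked as rows
print(np.hstack([a, b]).shape)       # (6,): joined one after the other
print(np.concatenate([a, b]).shape)  # (6,): same as hstack for 1-dim arrays
print(np.tile(a, 2))                 # the array repeated twice
```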

## Task

Use one of these to stack x and y together into a single array, and then use the function **np.savetxt** to write this matrix to a file. Use 2 digits after the decimal point and a `;` between the numbers. Give the file a recognizable name.

# Data selection: slices


In data science we often work with a selection of data. This means that we want to use one of two ways of selecting data:
* Either we want to select which columns or rows of the data we are using: index-based selection.
* More often we want to select the portion of data that is "good", i.e. where the values are above or below a certain threshold. For this we use boolean-based indexing.
 
## index based selection

Syntax 1-dim

* `a[i]` for single cell
* `a[i:j]` for range from i to (excl.) j
* `a[i:]` from i to end
* `a[:i]` from beginning to (excl.) i
* `a[:-1]` from beginning until (excl.) last

### Examples

In [None]:
x = list(range(15))  # a list, so that the slices print their actual contents
print(f"x = list(range(15)) is {x}")
print(f"x[2]   is {x[2]}")
print(f"x[:]   is {x[:]}")
print(f"x[:2]  is {x[:2]}")
print(f"x[2:]  is {x[2:]}")
print(f"x[2:6] is {x[2:6]}")
print(f"x[:-1] is {x[:-1]}")
print(f"x[:-2] is {x[:-2]}")

* Syntax 2-dim: `a[i:j,k:h]`
* Syntax 3-dim: `a[i:j,k:h,l:m]`

### Examples

In [None]:
y = np.reshape(np.arange(20), [4, 5])
print(f"y[:,:] is:\n{y[:,:]}\n")
print(f"y[2,:] is:\n{y[2,:]}\n")
print(f"y[:,2] is:\n{y[:,2]}\n")
print(f"y[2,2] is:\n{y[2,2]}\n")
print(f"y[:1,:] is:\n{y[:1,:]}\n")
print(f"y[:-1,:] is:\n{y[:-1,:]}\n")
print(f"y[:,1:] is:\n{y[:,1:]}\n")
print(f"y[:,:2] is:\n{y[:,:2]}\n")
print(f"y[:,:-2] is:\n{y[:,:-2]}\n")
print(f"y[1:3,2:4] is:\n{y[1:3,2:4]}")
print(f"y[::2,:] is \n{y[::2,:]}")

### Task: 
Create a new array for each case and use a single (!) index-based slice to create each of the following images: 

In [None]:
smiley = make_smiley()
plt.imshow(smiley[:, :], cmap="gray", vmin=0, vmax=255)
plt.axis("off")

<div>
<img src="http://www.jensuhlig.de/Kemm30/kemm30_day3_task1_1.png" width="100">
</div>

<div>
<img src="http://www.jensuhlig.de/Kemm30/kemm30_day3_task1_2.png" width="100">
</div>

<div>
<img src="http://www.jensuhlig.de/Kemm30/kemm30_day3_task1_3.png" width="100">
</div>

<div>
<img src="http://www.jensuhlig.de/Kemm30/kemm30_day3_task1_4.png" width="100">
</div>

Hint: think "every second".
<div>
<img src="Data/kemm30_day3_task1_5.png" width="100">
</div>

## Value based slicing

This is often the more useful of the two. The basic idea is that if you create a logical condition that evaluates to True or False for each entry, you can select only the pixels that fulfill the condition.
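On a tiny made-up array (called `values` here, not the smiley) the idea can be sketched like this:

```python
import numpy as np

values = np.array([[128, 256, 128],
                   [256, 128, 128]])

mask = values > 130  # array of True/False with the same shape
print(mask)
print(mask.sum())    # True counts as 1, so this counts the selected entries
print(values[mask])  # only the entries where the mask is True

values[mask] = 0     # assign to the selected entries only
print(values)
```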

In [None]:
smiley = make_smiley()
smiley > 130

In [None]:
fig, ax = plt.subplots(1, 2)  # we learn that next

smiley = make_smiley()
ax[0].imshow(smiley, cmap="gray", vmin=0, vmax=255)
ax[0].set_title("before change")

# Now we select these specific values and change them
smiley[smiley > 130] = 0

ax[1].imshow(smiley, cmap="gray", vmin=0, vmax=255)
ax[1].set_title("after change")

Task: use two operations to create this:
<div>
<img src="Data/kemm30_day3_task1_6.png" width="100">
</div>

And three operations for this:
<div>
<img src="Data/kemm30_day3_task1_7.png" width="100">
</div>

Let's do a real task to finish this part. 

* We download the data folder.
* We load a real image from disk using a special image library; here we use PIL.
* We use PIL to convert to grayscale and make a simple array out of the data. <br> The latter step makes it easier to handle, but we could perform all of the steps on an image too.


In [None]:
!git clone https://github.com/luchem/Kemm30.git --depth=1

In [None]:
from PIL import Image  # tool to open and convert images
import os  # tool to handle file operations

path_to_files = os.path.join(os.getcwd(), "Kemm30", "lectures", "Data")
image_open = Image.open(os.path.join(path_to_files, "2D_measured.png"))
arrayen = np.asarray(image_open.convert("L"))

print(arrayen.shape)
print(type(arrayen))

plt.imshow(arrayen, cmap="gray")
plt.axis("off")

Many analysis techniques collect grayscale images. But an image is also nothing more than the representation of two independently conducted measurements. A nice example is two-dimensional liquid chromatography, as often used in analytical chemistry. This image (from Sun, Mingzhe, PhD Thesis LU2019) illustrates this method. 
<div>
<img src="Data/2dLC.png" width="700">
</div>
After each run a number of substances are not well separated. For example, method 1 (x-axis) cannot separate the black, yellow and green signatures, while the second method (axis 2) cannot separate the blue and yellow peaks. In this sketch the two methods are very nicely "orthogonal", meaning that e.g. the green, yellow and black features are perfectly aligned; in general this can of course not be assumed. Plotting the two methods in a 2D image as shown here allows a visual inspection of the two analysis runs, and one can "see" the different compounds. But to determine quantitatively, e.g., what fraction of each compound is in the mixture, we need to extract the information from the image.

<div>
<img src="http://www.jensuhlig.de/Kemm30/2D_measured_indicated.png" width="200">
</div>

## Task

* Plot the isolated intensity profile of these features.
* Maximize and invert the contrast of these features by 
    * subtracting the minimum
    * then dividing by the maximum
    * then inverting the intensity by calculating 1-arrayen 
* Rotate the image by 90 degrees; here you can use the transpose function.
* Take the average in the right direction using the "mean" function and plot it with plt.plot(). This should show two clear peaks.
* While you would normally fit a shape to the measured signatures, a good approximation of the ratio of the peaks is the sum under the features (use the sum function on a slice of the image).
* Compare against the integration of the peaks obtained from np.trapz over the same slices (in NumPy 2.0 and later this function is called np.trapezoid).
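To show how the last two points relate, here is a sketch on a synthetic bell-shaped profile (an assumption for illustration, not the measured image). The names `position`, `profile`, `peak` and `integrate` are made up; the helper line only picks whichever name the installed NumPy version offers for the trapezoidal rule.

```python
import numpy as np

# np.trapz was renamed to np.trapezoid in NumPy 2.0; use whichever exists.
integrate = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

# Synthetic profile: a bell curve with total area 1.
position = np.arange(-5, 5, 0.1)
profile = np.exp(-0.5 * position**2) / np.sqrt(2 * np.pi)

peak = slice(20, 80)  # index-based slice covering the peak
step = position[1] - position[0]

area_sum = profile[peak].sum() * step  # simple sum times step width
area_trapz = integrate(profile[peak], position[peak])  # trapezoidal integration
print(area_sum, area_trapz)
```

For a smooth, finely sampled peak both numbers are nearly identical, which is why the plain sum is a reasonable first approximation.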

Once we have learned plotting we will look more closely at the statistical tools in numpy.

# Advanced

### Advanced slicing

This one you can create in many different ways:
Using np.tile, using np.concatenate or np.hstack, using ravel or some clever looping. 
<div>
<img src="http://www.jensuhlig.de/Kemm30/kemm30_day3_task1_9.png" width="100">
</div>

### Image work:

As you might have guessed, images are nothing but matrices of gray values.

In [None]:
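try:
    # SciPy >= 1.10 ships the sample images in scipy.datasets (needs the pooch package);
    # we keep the name "misc" so the rest of the notebook works unchanged
    from scipy import datasets as misc
except ImportError:
    from scipy import misc  # older SciPy versions
import matplotlib.pyplot as plt  # the plotting library

plt.show()  # this is needed once to make the plots show
face = misc.face(gray=True)  # here we load a sample image
plt.imshow(face, cmap="gray", vmin=0, vmax=255)
plt.axis("off");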

and we can manipulate them with simple matrix operations:


In [None]:
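def contrast(image, gamma=1, lightness=0):
    """Change contrast and brightness of an image using matrix operations"""
    image = image.astype(float)  # work on a float copy to avoid integer overflow
    image = image * gamma  # scale all intensities
    image = image + lightness  # shift all intensities
    image = image.clip(0, 255)  # keep the values in the valid 8-bit range
    return image.astype(int)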

In [None]:
# since we want to compare multiple images, we use subplots to create
# multiple axes within a single figure and then plot one image into each axis
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 4))
# use the contrast function to calculate something on the image
ax1.imshow(contrast(face, lightness=0, gamma=1), cmap="gray", vmin=0, vmax=255)
ax2.imshow(contrast(face, lightness=-30, gamma=1), cmap="gray", vmin=0, vmax=255)
ax3.imshow(contrast(face, lightness=0, gamma=1.3), cmap="gray", vmin=0, vmax=255)
for ax in (ax1, ax2, ax3):
    ax.axis("off")

A useful tool for any image work (or a lot of other processes like spectroscopy ;o) is the histogram.
In a histogram the contents of a matrix/vector are sorted into bins and counted. So we say we have xxx many pixels with yyy intensity. The function np.histogram returns the counts and the bin edges (beginning and end of each bin); with (bins[:-1]+bins[1:])/2 we calculate the center of each bin.
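On a handful of made-up numbers the counts, edges and centers can be sketched like this:

```python
import numpy as np

data = np.array([0, 1, 1, 2, 2, 2, 3])
counts, edges = np.histogram(data, bins=[0, 1, 2, 3, 4])

centers = (edges[:-1] + edges[1:]) / 2  # midpoint between neighbouring edges
print(counts)   # how many values fell into each bin
print(edges)    # one more edge than there are bins
print(centers)
```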

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 4))
intensity, bins = np.histogram(
    contrast(face, lightness=0, gamma=1).ravel(), bins=range(0, 255)
)
ax1.plot((bins[:-1] + bins[1:]) / 2, intensity, "*")
intensity, bins = np.histogram(
    contrast(face, lightness=-100, gamma=1).ravel(), bins=range(0, 255)
)
ax2.plot((bins[:-1] + bins[1:]) / 2, intensity, "*")
ax2.set_ylim(0, 6000)
intensity, bins = np.histogram(
    contrast(face, lightness=0, gamma=1.3).ravel(), bins=range(0, 255)
)
ax3.plot((bins[:-1] + bins[1:]) / 2, intensity, "*")

The cool thing is that the image manipulations you find in tools like Photoshop are nothing else but fancy matrix operations.
Here we use convolve to convolve the image (a big matrix) with a smaller matrix (the kernel).
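What a convolution does is easiest to see on a tiny example. The following sketch uses a made-up 5x5 image with a single bright pixel and a simple 3x3 averaging kernel (not one of the kernels used in the cell that follows).

```python
import numpy as np
from scipy import ndimage

# A 5x5 float image that is black except for one bright pixel in the middle.
image = np.zeros((5, 5))
image[2, 2] = 9.0

# A 3x3 averaging kernel: each output pixel becomes the mean of its neighbourhood.
kernel = np.ones((3, 3)) / 9

blurred = ndimage.convolve(image, kernel, mode="constant", cval=0)
print(blurred)  # the single pixel is spread over a 3x3 block
```

Note that the image is a float array: `ndimage.convolve` returns the same data type as its input, so an integer image would round the result.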

In [None]:
from scipy import ndimage

coins = misc.ascent().astype(float)  # float copy so negative kernel values cannot wrap around

edge_detection = np.ones((3, 3)) * -1
edge_detection[1, 1] = 8
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
gauss_blur = np.array(
    [
        [1, 4, 6, 4, 1],
        [4, 16, 24, 16, 4],
        [6, 24, 36, 24, 6],
        [4, 16, 24, 16, 4],
        [1, 4, 6, 4, 1],
    ]
) * (1 / 256.0)

coins1 = ndimage.convolve(coins, edge_detection, mode="constant", cval=0)
coins2 = ndimage.convolve(coins, sharpen, mode="constant", cval=0)
coins3 = ndimage.convolve(coins, gauss_blur, mode="constant", cval=0)

fig, ax = plt.subplots(1, 4, figsize=(16, 8))
ax[0].imshow(coins1, cmap="gray", vmin=0, vmax=255)
ax[0].set_title("edge_detection")
ax[1].imshow(coins2, cmap="gray", vmin=0, vmax=255)
ax[1].set_title("sharpen")
ax[2].imshow(coins, cmap="gray", vmin=0, vmax=255)
ax[2].set_title("original")
ax[3].imshow(coins3, cmap="gray", vmin=0, vmax=255)
ax[3].set_title("blur")
for axis in ax:  # switch off the axis decoration in all four plots
    axis.axis("off")