--------------------------------------------------
# NumPy and Matplotlib
---------------------------------------------------

*NumPy* is a Python library for multi-dimensional arrays and matrices as well as mathematical functions for arrays. 

*Matplotlib* is a plotting library for Python and *NumPy*. *Matplotlib* provides a *MATLAB*-like interface for plotting in Python.

For more information, check out these links to YouTube lectures and other tutorial notebooks:
<br>
*NumPy*: 
<br>
Material: https://github.com/gertingold/euroscipy-numpy-tutorial/releases/tag/v2017
<br>
YouTube lecture:
<br>
https://www.youtube.com/watch?v=R2rCYf3pv-M&t=129s (1/2)
<br>
https://www.youtube.com/watch?v=sunNXIxIGV8 (2/2)

*Matplotlib*:
<br>
YouTube lecture:
https://www.youtube.com/watch?v=YrnHdgZ8n1U

## Lists as vectors or matrices

Simple slicing

In [None]:
mylist = list(range(10))
print(mylist)

Use slicing to produce the following outputs:

[2, 3, 4, 5]

[0, 1, 2, 3, 4]

[6, 7, 8, 9]

[0, 2, 4, 6, 8]

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

[7, 5, 3]

## Matrices and lists of lists

In [None]:
matrix = [[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]]

In [None]:
matrix

Get the second row by slicing twice

Try to get the second column by slicing. Do not use a list comprehension!

Lists of lists don't work as matrices
<br>
You can't easily extract columns...
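
A short sketch of why column access fails with plain lists. It rebuilds the same `matrix` as above; the names `second_row` and `not_a_column` are only illustrative.

```python
# Same nested list as in the cell above.
matrix = [[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]]

# Index the row, then take a full slice of it (a copy of the row).
second_row = matrix[1][:]

# matrix[:] only copies the outer list, so [1] picks the second row again,
# not the second column.
not_a_column = matrix[:][1]

print(second_row)
print(not_a_column)
```

Both expressions return the same row, because slicing a list never reaches into the inner lists.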

# 1. Getting started with NumPy

Import the NumPy package

In [None]:
import numpy as np

## Create an array

In [None]:
np.lookfor('create array')

In [None]:
help(np.array)

The variable `matrix` contains a list of lists. Turn it into an `ndarray` and assign it to the variable `myarray`. Verify that its type is correct.

In [None]:
myarray = np.array(matrix)
myarray

In [None]:
type(myarray)

For practice, arrays can conveniently be created with the `arange` function.

In [None]:
myarray1 = np.arange(6)
myarray1

## Data types

Use `np.array()` to create arrays containing
 * floats
 * complex numbers
 * booleans
 * strings
 
and check the `dtype` attribute.

In [None]:
np.array([[1.0, 2.0], [3.0, 4.0]]).dtype

In [None]:
np.array([[1+2j, 3+4j], [3-4j, 1-2j]]).dtype

In [None]:
np.array([True, False]).dtype

In [None]:
np.array(['Python', 'EuroSciPy', 'Erlangen']).dtype

Do you understand what is happening in the following statement?

In [None]:
np.arange(1, 160, 10, dtype=np.int8)
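
One way to investigate the statement above is to look at the range of `int8` and at what an explicit cast does to values outside that range. This is only a sketch: it uses `astype` to make the cast visible, on the assumption that it illustrates the same wrap-around effect.

```python
import numpy as np

limits = np.iinfo(np.int8)          # representable range of int8
values = np.arange(1, 160, 10)      # default integer dtype, no overflow here
wrapped = values.astype(np.int8)    # explicit cast: values above 127 wrap around

print(limits.min, limits.max)
print(values)
print(wrapped)
```

Every value above `limits.max` reappears as a negative number, e.g. 131 becomes 131 - 256 = -125.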

## Some array creation routines

### numerical ranges

arange(*start*, *stop*, *step*), *stop* is not included in the array

In [None]:
np.arange(5, 30, 5)

arange resembles range, but also works for floats

Create the array [1, 1.1, 1.2, 1.3, 1.4, 1.5]

linspace(*start*, *stop*, *num*) determines the step to produce *num* equally spaced values, *stop* is included

Create the array [1., 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.]
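
A possible way to build both arrays, shown as a sketch rather than the only solution. With a float step, `arange` can include or drop the last element because of rounding, so the sketch either uses `linspace` or scales an integer `arange`. The variable names are illustrative.

```python
import numpy as np

# linspace takes the number of points; the stop value is included.
by_count = np.linspace(1, 1.5, 6)
by_count_long = np.linspace(1, 2, 11)

# Integer arange followed by scaling avoids float-step rounding issues.
by_step = np.arange(10, 16) / 10

print(by_count)
print(by_count_long)
print(by_step)
```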

For equally spaced values on a logarithmic scale, use logspace.

In [None]:
np.logspace(-2, 2, 5)

In [None]:
np.logspace(0, 4, 9, base=2)

### Homogeneous data

In [None]:
np.zeros((4, 4))

Create a 4x4 array with integer zeros

Create a 5x2 array with ones

Create a 3x3 array filled with tens

### Diagonal elements

In [None]:
np.diag([1, 2, 3, 4])

diag has an optional argument k. Try to find out what its effect is.

Replace the 1d array by a 2d array. What does diag do?

In [None]:
np.diag(np.arange(4).reshape(2, 2))

In [None]:
np.info(np.eye)

Create the 3x3 array

```[[2, 1, 0],
 [1, 2, 1],
 [0, 1, 2]]
```
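
A sketch of how the `k` argument shifts a diagonal, and one possible way to assemble the 3x3 array above from shifted identity matrices. Other constructions (for example with `np.diag` only) work just as well.

```python
import numpy as np

# k=1 places the values on the first diagonal above the main one,
# so three values need a 4x4 array.
upper = np.diag([1, 2, 3], k=1)

# Main diagonal of twos plus ones on the diagonals directly above and below.
tridiag = (2 * np.eye(3, dtype=int)
           + np.eye(3, k=1, dtype=int)
           + np.eye(3, k=-1, dtype=int))

print(upper)
print(tridiag)
```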

## Indexing and slicing

### 1d arrays

In [None]:
a = np.arange(10)

Create the array [7, 8, 9]

Create the array [2, 4, 6, 8]

Create the array [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

### Higher dimensions

In [None]:
a = np.arange(40).reshape(5, 8)
a

Create the 3x3 array out of a

```[[21, 22, 23],
 [29, 30, 31],
 [37, 38, 39]]
```

Get the 4th column of the matrix a.

Create the array [11, 12, 13] from a

Get the matrix 

```[[8, 11, 14],
 [24, 27, 30]]
```

I.e. figure out how you can skip certain elements or use a specific step to slice through the matrix a
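
As a hint for the step syntax, here is a minimal sketch using `start:stop:step` along both axes of the same 5x8 array. The name `picked` is illustrative.

```python
import numpy as np

a = np.arange(40).reshape(5, 8)

# Rows 1 and 3 (start 1, stop before 4, step 2); every third column.
picked = a[1:4:2, ::3]
print(picked)
```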

## Fancy indexing ‒ Boolean mask

In [None]:
a = np.arange(40).reshape(5, 8)
a

In [None]:
a %3 == 0

Get the values of a for which `a % 3 == 0` is `True`
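
A minimal sketch of Boolean-mask indexing: using the mask as an index returns the selected elements as a flat 1d array. The names `mask` and `multiples` are illustrative.

```python
import numpy as np

a = np.arange(40).reshape(5, 8)

mask = a % 3 == 0        # Boolean array with the same shape as a
multiples = a[mask]      # 1d array of the elements where mask is True
print(multiples)
```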

Creating an array with specific values from a

In [None]:
a[(1, 1, 2, 2, 3, 3), (3, 4, 2, 5, 3, 4)]

## Application: sieve of Eratosthenes

Read about the Sieve of Eratosthenes and write an algorithm that yields all prime numbers up to an arbitrary number nmax (e.g. 50). (Bonus: Can you implement it without a for-loop?)


https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
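
One possible NumPy sketch of the sieve, using a Boolean array and slice assignment to cross out multiples. It still loops over the candidate numbers, so it does not answer the bonus question. The function name `sieve` is an assumption made for this example.

```python
import numpy as np

def sieve(nmax):
    """Return all primes <= nmax as a 1d integer array."""
    is_prime = np.ones(nmax + 1, dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not prime
    for i in range(2, int(nmax**0.5) + 1):
        if is_prime[i]:
            # Cross out i*i, i*i + i, ...; smaller multiples were already
            # removed by smaller primes.
            is_prime[i * i::i] = False
    return np.flatnonzero(is_prime)

print(sieve(50))
```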

## Axes

Create an array and calculate the sum over all elements

In [None]:
a = np.arange(9).reshape(3, 3)
a

Now calculate the sum along axis 0 ...

In [None]:
np.sum(a, axis=0)

and now along axis 1

In [None]:
np.sum(a, axis=1)

Identify the axes in the following array

In [None]:
a = np.arange(24).reshape(2, 3, 4)
a

In [None]:
np.sum(a, axis=0)

In [None]:
np.sum(a, axis=1)

In [None]:
np.sum(a, axis=2)

## Axes in more than two dimensions

Create a three-dimensional array

In [None]:
a = np.arange(24).reshape(2, 3, 4)

Produce a two-dimensional array by cutting along axis 0 ...

and axis 1 ...

and axis 2

What do you get by simply using the index `[0]`?

What do you get by using `[..., 0]`?
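
Checking the shapes is a quick way to see which axis a given index removes. The sketch below fixes index 0 along each axis in turn; `...` stands for "as many full slices as needed".

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

cut0 = a[0]          # fixes axis 0, same as a[0, :, :]
cut1 = a[:, 0]       # fixes axis 1
cut2 = a[:, :, 0]    # fixes axis 2
cut_ellipsis = a[..., 0]   # the ellipsis expands to the two leading full slices

print(cut0.shape, cut1.shape, cut2.shape, cut_ellipsis.shape)
```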

## Exploring numerical operations

In [None]:
a = np.arange(4)
b = np.arange(4, 8)
a, b

In [None]:
a+b

In [None]:
a*b

Operations are elementwise. Check this by multiplying two 2d arrays...

In [None]:
a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)
print('a=',a)
print('b=', b)

In [None]:
a*b

This is the elementwise multiplication.
<br>
Try to write a function (dotprod) that computes the scalar product of two 1D-vectors.
<br>
Test your code by comparing to np.dot
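
A possible sketch of `dotprod`, built on the elementwise product from above and compared to `np.dot`. The shape check is an addition for this example, not a requirement of the exercise.

```python
import numpy as np

def dotprod(u, v):
    """Scalar product of two 1d vectors of equal length."""
    u = np.asarray(u)
    v = np.asarray(v)
    if u.ndim != 1 or u.shape != v.shape:
        raise ValueError("dotprod expects two 1d vectors of equal length")
    # Multiply elementwise, then add up the products.
    return np.sum(u * v)

a = np.arange(4)
b = np.arange(4, 8)
print(dotprod(a, b), np.dot(a, b))
```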

Real matrix multiplication

In [None]:
a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

In [None]:
np.dot(a,b)

In [None]:
a.dot(b)

In [None]:
a @ b

## Broadcasting

In [None]:
a = np.arange(12).reshape(3, 4)
a

In [None]:
a+1

In [None]:
a+np.arange(4)

In [None]:
a+np.arange(3)

In [None]:
np.arange(3)

In [None]:
np.arange(3).reshape(3, 1)

In [None]:
a+np.arange(3).reshape(3, 1)

Create a multiplication table for the numbers from 1 to 10 starting from two appropriately chosen 1d arrays.
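
A minimal sketch of the broadcasting idea behind the table: a column of shape (10, 1) times a row of shape (10,) is broadcast to a (10, 10) array. The names `column`, `row` and `table` are illustrative.

```python
import numpy as np

row = np.arange(1, 11)                  # shape (10,)
column = row.reshape(10, 1)             # shape (10, 1)

# Broadcasting stretches both operands to shape (10, 10).
table = column * row
print(table)
```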

As an alternative to `reshape`, one can add additional axes with `newaxis`:

In [None]:
a = np.arange(5)
b = a[:, np.newaxis]

Check the shapes.

In [None]:
a.shape, b.shape

## Linear Algebra in NumPy

In [None]:
import numpy.linalg as LA

In [None]:
a=np.diag((2,10,-3))
a

In [None]:
eigenvalues, eigenvectors = LA.eig(a)
print(eigenvalues)
print(eigenvectors)

A bit more complex...

In [None]:
a = np.arange(4).reshape(2, 2)
a

In [None]:
eigenvalues, eigenvectors = LA.eig(a)

Explore whether the eigenvectors are the rows or the columns.

In [None]:
a @ eigenvectors[:, 0]

In [None]:
eigenvalues[0]*eigenvectors[:, 0]

Validate if everything worked out. Recalculate the matrix a from the eigenvalues and eigenvectors. You will need to invert a matrix (LA.inv). 
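
A sketch of the check, assuming the eigenvectors are stored as columns: with V the eigenvector matrix and D the diagonal matrix of eigenvalues, the original matrix should equal V D V⁻¹. The names `V`, `D` and `reconstructed` are illustrative.

```python
import numpy as np
import numpy.linalg as LA

a = np.arange(4).reshape(2, 2)
eigenvalues, eigenvectors = LA.eig(a)

V = eigenvectors                 # eigenvectors as columns
D = np.diag(eigenvalues)         # eigenvalues on the diagonal
reconstructed = V @ D @ LA.inv(V)

print(reconstructed)
print(np.allclose(reconstructed, a))
```

Because of floating-point rounding, compare with `np.allclose` rather than `==`.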

# 2. Plotting Data with Matplotlib

In [None]:
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline
# The above line is a 'magic' line for the Jupyter notebook which allows plots to be placed inside of the notebook.

In [None]:
x = np.linspace(0, 10, 100)
y = np.cos(x)

In [None]:
plt.plot(x, y)

In [None]:
x=np.arange(2,11)

In [None]:
y=x**2

In [None]:
plt.plot(x,y)
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title(r'My Nice Plot $\pi$') # supports LaTeX math mode; the raw string keeps the backslash intact
plt.grid(True)
plt.savefig('myplot1.png', dpi=300)

**How to use matplotlib in the terminal:**
<br>
`import matplotlib`
<br>
`matplotlib.use('TkAgg')`
<br>
`import matplotlib.pyplot as plt`
<br>
`import numpy as np`
<br>
<br>
`x=np.arange(2,11)`
<br>
`y=2*x+1`
<br>
<br>
`plt.plot(x,y)`
<br>
`plt.xlabel('X Axis')`
<br>
`plt.ylabel('Y Axis')`
<br>
`plt.title(r'My Nice Plot $\pi$')`
<br>
`plt.grid(True)`
<br>
<br>
`plt.show()`

In [None]:
plt.plot(x,y, label='measured')
plt.plot(x, y+20, label='calculated')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title(r'My Nice Plot $\pi$')
plt.legend()

In [None]:
plt.plot(x,y, 'r--o', label='measured')
plt.plot(x, y+20, label='calculated', color=(0.5, 0.5, 1))
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title(r'My Nice Plot $\pi$')
plt.legend()

In [None]:
lines1 =plt.plot(x,y, 'r--o', label='measured')
plt.plot(x, y+20, label='calculated', color=(0.5, 0.5, 1))
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title(r'My Nice Plot $\pi$')
plt.legend()
ax = plt.gca()

In [None]:
lines1 =plt.plot(x,y, 'r--o', label='measured') #makes it red
plt.plot(x, y+20, label='calculated', color=(0.5, 0.5, 1))
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title(r'My Nice Plot $\pi$')
p=lines1[0]
p.set_color('darkgreen') #makes it green
plt.setp(p, marker='+')
plt.legend() #if you put it before set_color, the old color (here red) will be used.

### ... and also

In [None]:
x2 = np.linspace(0, 2 * np.pi, 500)
y1 = np.sin(x2)
y2 = np.sin(3 * x2)

fig, ax = plt.subplots()
ax.fill(x2, y1, 'b', x2, y2, 'r', alpha=0.3) # both curves are treated in the same way
plt.show()

In [None]:
plt.plot(x,y)
plt.text(5,60, 'hello') #positioned in data coordinates, like another data point
plt.figtext(0.2,0.6, 'figure text') #positioned in figure coordinates (0 to 1), independent of the data
plt.figtext(1.1,0.6, 'figure text outside')

In [None]:
plt.plot(x,y)
plt.text(5,60, r'hello $\delta$')
plt.figtext(0.2,0.6, 'figure text')
ax=plt.gca()
ax.annotate('Important', xy=(6,35), xytext=(7,20), arrowprops={'facecolor': 'r'})

## Subplots

In [None]:
sub1=plt.subplot(2,3,1) ### with at most 9 subplots, the commas are optional: plt.subplot(231)
sub3=plt.subplot(2,3,3)
sub6=plt.subplot(2,3,6)
sub1.plot(x,y)
plt.tight_layout() ### prevents overlap between neighboring subplots

In [None]:
rows=2
cols=3
n=0
for row in range(1,rows+1):
    for col in range(1, cols+1):
        n+=1
        sub = plt.subplot(rows, cols, n)
        sub.plot(x,y)
plt.tight_layout()

In [None]:
rows=2
cols=3
n=0
def plot_many(rows, cols, plot='plot'):
    n=0
    for row in range(1,rows+1):
        for col in range(1, cols+1):
            n+=1
            sub = plt.subplot(rows, cols, n)
            getattr(sub,plot)(x,y*n)
    plt.tight_layout()
plot_many(rows=5, cols=3, plot='semilogy')

### Random numbers

In [None]:
np.random.rand(5, 2)

In [None]:
np.random.seed(1234)
np.random.rand(5, 2)

In [None]:
mu, sigma = 0, 1
data=np.random.normal(mu,sigma,100)
plt.hist(data)

In [None]:
sub1=plt.subplot(1,4,1) 
sub2=plt.subplot(1,4,2)
sub3=plt.subplot(1,4,3)
sub4=plt.subplot(1,4,4)
sub1.hist(data, edgecolor='black', bins=2)
sub2.hist(data, edgecolor='black', bins=5)
sub3.hist(data, edgecolor='black', bins=15)
sub4.hist(data, density=True, edgecolor='black', color='r', bins=5) #what is the difference here? (density replaces the removed normed argument)
plt.tight_layout()

Comparing random numbers to density

In [None]:
mu, sigma = 0, 1
data=np.random.normal(mu,sigma,50)
y, x, others = plt.hist(data, 30, density=True, edgecolor='black')
plt.plot(x, 1/(sigma*np.sqrt(2*np.pi))*np.exp(-(x - mu)**2 / (2 * sigma**2)), color='r')

Why does the actual density function not compare well to the random data set?
<br>
How can you improve the comparison?
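
One way to explore these questions numerically, without plotting: compare the normalized histogram with the density at the bin centers for a small and a large sample. This is a sketch under assumptions: the helper `max_density_error`, the fixed seed and the sample sizes are choices made for this example.

```python
import numpy as np

def max_density_error(n_samples, bins=30, mu=0.0, sigma=1.0, seed=0):
    """Largest gap between a normalized histogram and the normal density."""
    rng = np.random.default_rng(seed)
    data = rng.normal(mu, sigma, n_samples)
    heights, edges = np.histogram(data, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pdf = np.exp(-(centers - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return np.max(np.abs(heights - pdf))

err_small = max_density_error(50)        # 50 samples spread over 30 bins
err_large = max_density_error(100_000)   # many samples per bin
print(err_small, err_large)
```

With only 50 samples most bins hold just a few counts, so the histogram is dominated by noise; drawing more samples (or using fewer bins) brings it closer to the density.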

Try to figure out what the next couple of lines do...

In [None]:
data=np.random.randint(1, 7, (100, 3)) 
data[0:3]

In [None]:
casts = np.random.randint(1, 7, (100, 3))
plt.hist(casts, np.linspace(.5, 6.5, 7)) 

### Scatterplot

In [None]:
data1=np.random.uniform(1,10, 100)
data2=np.random.uniform(200, 300, 100)
plt.scatter(data1, data2)

In [None]:
data = np.random.rand(20, 20)
plt.imshow(data, cmap=plt.cm.hot, interpolation='none')
plt.colorbar()

## Example 1: Predicting temperature with crickets

Read the data in crickets.csv (use np.loadtxt() and look up its options. What do you need to take care of?)
<br>
In the data set, the first column contains the number of chirps per second; the second column gives the temperature in °F.

Separate the data into the vectors x and y for chirps/second and temperature, respectively.

Create the vector yc containing the temperature in °C.
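
The conversion is C = (F - 32) · 5/9, and it applies elementwise to a whole array. The sketch below uses a hypothetical helper `fahrenheit_to_celsius` and two reference temperatures instead of the cricket data.

```python
import numpy as np

def fahrenheit_to_celsius(temp_f):
    """Convert temperatures from °F to °C, elementwise."""
    return (np.asarray(temp_f, dtype=float) - 32.0) * 5.0 / 9.0

# Freezing and boiling point of water as a sanity check.
reference_c = fahrenheit_to_celsius([32.0, 212.0])
print(reference_c)
```

Applied to the temperature vector, this would read `yc = fahrenheit_to_celsius(y)`.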

Make a scatterplot of x against yc

Do a linear regression on the cricket data.
Google linear regression in python. (Hint: scipy)
<br>
Define a model function based on the regression model
<br> 
How good is the linear model?
<br>
Plot the model into the scatterplot. Add a legend and the correlation coefficient.

In [None]:
from scipy import stats

How warm is it if you hear 25 chirps/second?
<br>
How many times per second does a cricket chirp at freezing temperatures?
<br>
At what temperature do the crickets stop chirping?
<br> 
What is a clear limit of this model?


## Example 2: Wines

In wine-data.xlsx you can find information for three types of wines (in total 178 samples). For each wine, 13 characteristics have been determined:
<br>
1) Alcohol
<br>
2) Malic acid
<br>
3) Ash
<br>
4) Alcalinity of ash
<br>
5) Magnesium
<br>
6) Total phenols
<br>
7) Flavanoids
<br>
8) Nonflavanoid phenols
<br>
9) Proanthocyanins
<br>
10) Color intensity
<br>
11) Hue
<br>
12) OD280/OD315 of diluted wines
<br>
13) Proline 


Look at wine.csv in vi.
<br>
What do you need to take care of while reading in the data?

How many samples were taken for each wine type?

What's the average color intensity of the complete wine dataset?

How many wines have more than the average malic acid of the data set?

What is the median of malic acid? How many wines have more malic acid than the median (think first, and try to calculate afterwards)?

What is the average percentage of alcohol in wines of type 2? How large is the standard deviation?

Make a histogram of the magnesium content of the wines of type 1.
<br>
plt.hist()

Make a histogram to look at the distribution of the alcohol content comparing all three wine types. Add a legend to the plot.
<br>
Try the alpha and color options for plt.hist.
<br>
Use plt.legend()

What is the magnesium content of the type 1 wine that has the highest alcohol content within type 1?

Is there a linear relationship between Color intensity and the amount of phenols? 

Nope.

How about Flavanoids and phenols? If so, determine the correlation for each wine type.

## Example 3: Time Series  Analysis

Compute the **Moving average**:


Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by "shifting forward"; that is, excluding the first number of the series and including the next value in the subset.
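
The definition above can be sketched with `np.convolve`: averaging over a window of `size` values is a convolution with `size` equal weights of `1/size`. The function name `moving_average` and the toy series are assumptions for this example; `mode='valid'` keeps only the windows that fit completely inside the series.

```python
import numpy as np

def moving_average(series, size):
    """Average over each full window of `size` consecutive values."""
    series = np.asarray(series, dtype=float)
    weights = np.ones(size) / size
    return np.convolve(series, weights, mode='valid')

toy = [1, 2, 3, 4, 5]          # toy series, not the electricity data
smoothed = moving_average(toy, 3)
print(smoothed)
```

The result has `len(series) - size + 1` elements, so it needs a correspondingly shorter x axis when plotted next to the original data.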


Use the Electricity-Australia.csv data set showing the annual electricity sales to residential customers in south Australia.

Make a linear plot of the data 

Compute the moving average and plot the results along with the original data.

## Curve fitting

In [None]:
## example data
x_data=np.linspace(-5,5,num=50)
y_data=3*np.sin(1.5*x_data)+np.random.normal(size=50)
plt.scatter(x_data, y_data)

In [None]:
## fitting model function
def test_func(x,a,b):
    return a*np.sin(b*x)

Easy-to-use fitting routines can be found in the SciPy package. Read about the `scipy.optimize` tools.

In [None]:
## fitting with scipy optimize

from scipy import optimize

params, params_cov = optimize.curve_fit(test_func, x_data, y_data, p0=[2,2])

print(params)

In [None]:
plt.scatter(x_data, y_data)
plt.plot(x_data, test_func(x_data, params[0], params[1]), label='Fitting function', color='red')
plt.legend()

## Polynomials

https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.polynomials.polynomial.html

In [None]:
from numpy.polynomial import polynomial as P

Powers increase from left to right (index corresponds to power)
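
A small sketch of this coefficient ordering: `[1, 2]` means 1 + 2x, not x + 2. It imports the `Polynomial` class directly, which is only one of several equivalent ways to access it; the names `line` and `quartic` are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

line = Polynomial([1, 2])                       # 1 + 2*x
value_at_3 = line(3)                            # 1 + 2*3

quartic = Polynomial([24, -50, 35, -10, 1])     # 24 - 50x + 35x^2 - 10x^3 + x^4
at_roots = quartic(np.array([1, 2, 3, 4]))      # this polynomial is (x-1)(x-2)(x-3)(x-4)

print(value_at_3)
print(at_roots)
```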

In [None]:
p1 = P.Polynomial([1, 2])

In [None]:
p1.degree()

In [None]:
p1.roots()

In [None]:
p4 = P.Polynomial([24, -50, 35, -10, 1])

In [None]:
p4.degree()

In [None]:
p4.roots()

In [None]:
p4.deriv()

In [None]:
p4.integ()

In [None]:
P.polydiv(p4.coef, p1.coef)

## Polynomial fit

In [None]:
x = np.linspace(0, np.pi, 100)
y = np.sin(x)+0.2*np.random.rand(100)
plt.plot(x, y, 'o')
fit = P.Polynomial(P.polyfit(x, y, 2))
plt.plot(x, fit(x))

## Image manipulation

In [None]:
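# scipy.misc.face was removed in SciPy 1.12; scipy.datasets (SciPy >= 1.10) provides it and needs the pooch package
from scipy import datasets
face = datasets.face(gray=True)
face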

In [None]:
plt.imshow(face, cmap=plt.cm.gray)

Modify this image, e.g. convert it to a black and white image, put a black frame, change the contrast, ...

In [None]:
plt.imshow(face > 128, cmap=plt.cm.gray)

In [None]:
framedface = np.zeros_like(face)
framedface[30:-31, 30:-31] = face[30:-31, 30:-31]
plt.imshow(framedface, cmap=plt.cm.gray)

In [None]:
plt.imshow(255*(face/255)**1.5, cmap=plt.cm.gray)

In [None]:
sy, sx = face.shape
y, x = np.ogrid[:sy, :sx]
centerx, centery = 660, 300
mask = ((y - centery)**2 + (x - centerx)**2) > 230**2
face[mask] = 0
plt.imshow(face, cmap=plt.cm.gray)

### Seaborn

Another complementary package, built on top of Matplotlib, is [**Seaborn**](https://seaborn.pydata.org/), which provides a high-level interface for drawing statistical graphics.
Here, we provide only a couple of examples. For extensive documentation, examples and tutorials, please refer to the Seaborn [website](https://seaborn.pydata.org/).

In [None]:
#Import the necessary library
import seaborn as sns

In [None]:
#load a built-in seaborn dataset contained in a Pandas Dataframe (you will learn more about it in the next notebook)
iris = sns.load_dataset('iris')

The iris dataset is perhaps the best-known database in the pattern recognition literature. It
contains 3 classes of 50 instances each, where each class refers to a type of iris plant (Iris setosa, Iris versicolor, and Iris virginica). Each iris plant is characterized by its sepal length, sepal width, petal length, and petal width.

In [None]:
#you can have a look at the Dataset:
iris

In [None]:
#construct a Violin plot
sns.violinplot(x="species",y="petal_length", data =iris)

### another example

To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. This creates a matrix of axes and shows the relationship for each pair of columns in a **DataFrame**. 
By default, it also draws the univariate distribution of each variable on the diagonal Axes:

In [None]:
sns.pairplot(iris)