WISO100303 / Johannes Schmidt & Peter Regner

# **An introduction to scientific programming**

<br> <br> <br> <br><br> <br> <br> <br>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Numpy

<!-- does not work in Datalore
<img src="numpy-logo.png" style="width: 50%; margin: auto;">
-->
![Numpy logo](images/numpy-logo.png)

# The ndarray data type

In [None]:
import numpy as np

In [None]:
np.array([1., 2., 5.])

**Warning:** confusingly `np.array([1., 2., 5.])` is used to create an array, but the type of the resulting object is `np.ndarray`!

Do not use `np.ndarray()` to create a new `ndarray` object; use `np.array()` instead!
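
To see why `np.ndarray()` is not the right tool, the following minimal sketch compares both calls. `np.ndarray()` interprets its first argument as a shape and leaves the memory uninitialized, so the values it contains are arbitrary. The variable names are made up for illustration.

```python
import numpy as np

# np.array() takes the values the array should contain
values = np.array([1., 2., 5.])
print(values)                 # [1. 2. 5.]

# np.ndarray() takes a shape, not values: this allocates a 2x3 array
# whose contents are uninitialized (arbitrary numbers)
uninitialized = np.ndarray((2, 3))
print(uninitialized.shape)    # (2, 3)
```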

In [None]:
type(np.array([1., 2., 5.]))

In [None]:
arbitrary_values = np.array([1., 2., 5.])

# Elementwise operations and broadcasting

In [None]:
np.array([1, 2, 3]) + 10

In [None]:
np.array([1, 2, 3]) * 10

In [None]:
np.arange(5)

In [None]:
np.arange(5)**2

In [None]:
np.cos(np.arange(5) * 2 * np.pi)
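
As a rough comparison, the elementwise operations above correspond to loops or list comprehensions in pure Python. The sketch below also shows broadcasting between a 2D and a 1D array; the variable names are made up for illustration.

```python
import numpy as np

values = np.array([1, 2, 3])

# Pure Python needs an explicit loop (here: a list comprehension) ...
added_python = [v + 10 for v in [1, 2, 3]]
# ... while Numpy applies the operation to every element at once
added_numpy = values + 10
print(added_python)   # [11, 12, 13]
print(added_numpy)    # [11 12 13]

# Broadcasting also works between arrays of different shapes:
# the 1D array is added to each row of the 2D array
table = np.array([[0, 0, 0],
                  [10, 10, 10]])
print(table + values)
# [[ 1  2  3]
#  [11 12 13]]
```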

# The Python Scientific Ecosystem

<!-- <br>
<div style="text-align:center">
<img alt="Python scientific ecosystem" src="python-scientific-ecosystem.png" style="height:650px;">
</div> -->

![](images/python-scientific-ecosystem.png)

<small>Source: https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8</small>

# Why numpy? Performance!

One reason for using Numpy instead of pure Python is performance. 

<img src="images/benchmarks.svg" style="width:750px">



<small>Note that this comparison is not very fair. The Python tests are mostly done with pure Python. One can easily speed up the tests by using numpy or other libraries. Adding ``@numba.jit`` to the Fibonacci test yields a speedup by a factor of 50 (tested with $n=30$).</small>

<small>Source: https://julialang.org/benchmarks/</small>

Also, vector and matrix operations are much simpler to write, and many numerical algorithms are already implemented.

# How fast is Numpy in comparison to pure Python?

No numpy used:

In [None]:
numbers = list(range(10**6))

In [None]:
#%timeit sum(numbers)

Numpy used:

In [None]:
numbers_np = np.array(numbers)

In [None]:
#%timeit numbers_np.sum()

A speedup by a factor of 10 usually comes for free with good, simple code, and speedups by a factor of 500 are not uncommon.
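
Outside of a notebook, where the `%timeit` magic is not available, a similar measurement can be sketched with the `timeit` module from the standard library. The number of repetitions is an arbitrary choice, and the measured times depend on the machine, so no particular speedup is guaranteed.

```python
import timeit

import numpy as np

numbers = list(range(10**6))
numbers_np = np.array(numbers)

# Run each variant 10 times and report the average time per run
repetitions = 10
time_python = timeit.timeit(lambda: sum(numbers), number=repetitions) / repetitions
time_numpy = timeit.timeit(lambda: numbers_np.sum(), number=repetitions) / repetitions

print(f"pure Python: {time_python * 1000:.2f} ms")
print(f"Numpy:       {time_numpy * 1000:.2f} ms")
```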

## Exercise 1 - Find the error

A user wants to add 0.5 to 4 values. There is a mistake in the code - find and fix it!

Why does the code not work as written here? Check the docstring of `np.array` to find an answer!

In [None]:
# np.array(1, 2, 3, 42) + 0.5

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

## Exercise 2 - Plot sine and cosine

Plot $\sin(x)$ and $\cos(4x)$ from -2 to 4.5. Use the function `np.linspace(start, stop)` to generate values for `x`.

The plot should contain two lines: one should be the graph of $\sin(x)$ and one the graph of $\cos(4x)$.

Note: this is very similar to exercise 4 in lecture03.

*Hint:* Use the functions `np.sin(x)` and `np.cos(x)`.

<br>

**Bonus task:**

Plot a line defined by points with coordinates $\cos(x)$ on the x axis and $\sin(x)$ on the y axis for x in the range $[0, 2\pi]$. Can you guess the result before actually plotting it?

Note: The plot in the bonus task is not a function $f(x)$ in the mathematical sense, where each value x is mapped to some value $f(x)$.

You can use this additional matplotlib call to make it look nicer: `plt.gca().set_aspect('equal')`

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

## Exercise 3 - Pythagorean identity

Write a function `pythagorean_identity(x)` which calculates $\cos(x)^2 + \sin(x)^2$. Call the function with a numpy array with a range of values from 0 to 7.

Do you know the result before calling the function?

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

# Accessing elements in Numpy arrays

In [None]:
arbitrary_values

In [None]:
arbitrary_values[2]

In [None]:
arbitrary_values[0:2]

Note that the start before the `:` and the stop after it can be omitted. The slice then runs from the first element (index `0`) or up to and including the last element (index `-1`):

In [None]:
arbitrary_values[:2]

In [None]:
arbitrary_values[2:]

In [None]:
arbitrary_values[:]
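
The index `-1` refers to the last element, because negative indices count from the end of the array. A small sketch, using a fresh array so that it does not depend on earlier cells:

```python
import numpy as np

values = np.array([1., 2., 5.])

print(values[-1])    # 5.0, the last element
print(values[:-1])   # [1. 2.], everything except the last element
print(values[-2:])   # [2. 5.], the last two elements
```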

# Setting elements in Numpy arrays

In [None]:
arbitrary_values

In [None]:
arbitrary_values[2] = 3

In [None]:
arbitrary_values

In [None]:
arbitrary_values[0:2] = 42

In [None]:
arbitrary_values

# Dimensions and shape

Numpy arrays can be multidimensional. They are created by passing a list of lists to `np.array()`:

In [None]:
unity_matrix = np.array([[1., 0., 0.],
                         [0., 1., 0.],
                         [0., 0., 1.]])

Lists are always one-dimensional, but 2D objects can be created by putting lists in lists:

![](images/list-of-lists.svg)

A numpy array can be 2D itself, without nesting objects.
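
The difference becomes visible when accessing elements. In the following sketch (variable names are chosen for illustration only), a column of a list of lists has to be collected row by row, whereas a 2D array can be indexed along both axes at once:

```python
import numpy as np

nested_list = [[1, 2, 3],
               [4, 5, 6]]
array_2d = np.array(nested_list)

# List of lists: index the outer list first, then the inner list
print(nested_list[1][2])                  # 6
# A column has to be collected with a loop
print([row[2] for row in nested_list])    # [3, 6]

# 2D array: one index per axis, separated by a comma
print(array_2d[1, 2])                     # 6
print(array_2d[:, 2])                     # [3 6]
```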

In [None]:
unity_matrix

In [None]:
unity_matrix.shape

Note that `.shape` is an _attribute_ of a numpy array. It uses a very similar syntax to a method, but there are no parentheses!

In [None]:
unity_matrix[2, 2]

In [None]:
unity_matrix[2, :]

Note that `len()` probably does not do what you want for multidimensional arrays, because it returns only the length of the first axis:

In [None]:
len(unity_matrix)
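
For comparison, the following sketch prints `len()` next to the `shape` and `size` attributes of a small example array (the name `matrix` is made up for illustration):

```python
import numpy as np

matrix = np.array([[1, 1, 1, 1],
                   [2, 2, 2, 2]])

print(len(matrix))     # 2, the length of axis 0 (number of rows)
print(matrix.shape)    # (2, 4), the length of every axis
print(matrix.size)     # 8, the total number of elements
```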

In [None]:
arbitrary_values.shape

In [None]:
another_2d_array = np.array([[1, 1, 1, 1],
                             [2, 2, 2, 2]])

In [None]:
another_2d_array.shape

In Numpy the dimensions are called _axes_ and are numbered starting from 0. If you want to sum along a certain axis, you can do so by using the `sum` method and the `axis` parameter:

In [None]:
another_2d_array.sum(axis=0)

In [None]:
another_2d_array.sum(axis=1)
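
One way to remember the `axis` parameter: the axis passed to `sum` is the one that disappears from the shape of the result. A short sketch with the same values as above:

```python
import numpy as np

another_2d_array = np.array([[1, 1, 1, 1],
                             [2, 2, 2, 2]])      # shape (2, 4)

# axis=0 sums over the rows: one value per column remains
column_sums = another_2d_array.sum(axis=0)
print(column_sums, column_sums.shape)    # [3 3 3 3] (4,)

# axis=1 sums over the columns: one value per row remains
row_sums = another_2d_array.sum(axis=1)
print(row_sums, row_sums.shape)          # [4 8] (2,)

# Without the axis parameter, all elements are summed
print(another_2d_array.sum())            # 12
```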

In [None]:
another_2d_array[0, :]

In [None]:
another_2d_array[0, :].shape

## Exercise 4 - Create a matrix with ones surrounded by zeros 

Create a matrix `m` with shape `(4, 4)` by using `np.zeros([4, 4])` and set the inner 4 elements to `1` by using slicing in axis 0 and axis 1. Do not use more than two assignment statements in total for this exercise!

Note: The _inner_ and _outer_ elements of a matrix are not widely used technical terms. By _outer elements_ we simply mean the first/last row and column. _Inner elements_ are all other matrix values. So in the case of a 4x4 matrix, there are 4 inner elements and 12 outer elements.

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

## Exercise 5 - Plot first row and first column

Plot the first row and the first column of the matrix `some_matrix` given below. Can you imagine what the plot should look like before plotting?

In [None]:
some_matrix = np.array([[1., 0., 1., 0., 1.],
                        [2., 0., 2., 0., 2.],
                        [3., 0., 3., 0., 3.]])

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

## Exercise 6 - Sum over rows and columns

Calculate the sum of rows and then the sum of columns of `some_matrix`.

In [None]:
some_matrix = np.array([[1., 0., 1., 0., 1.],
                        [2., 0., 2., 0., 2.],
                        [3., 0., 3., 0., 3.]])

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

<div style="color:#555;border-top:1px solid #999;text-align:right;padding:4px;">End of exercise</div>

# Summary: use Numpy for faster and easier computations with gridded data

Numpy makes calculations with gridded data:
- faster
- easier

Also there are plenty of common functions available (mean, cosine, sine, filters, ...).

![](images/list-of-lists.png)

Non-gridded data doesn't work very well with Numpy:

Imagine somebody wants to collect the age of each person infected with Covid-19 for each day. One could use a list of lists, i.e. a Python list where each element is a list of integers:

In [None]:
infected_age_per_day = [
    [16, 41],
    [73],
    [40, 24, 83, 29],
    [],
    [64, 13],    
]

In [None]:
infected_age_per_day[2]

The number of infected persons varies from day to day, so this is not a regular array with rectangular shape.

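One can convert this list of lists to a `numpy` array, but the result is of dtype `object` and not `int`. Recent Numpy versions even raise an error unless `dtype=object` is requested explicitly. This should be avoided!

In [None]:
np.array(infected_age_per_day, dtype=object)  # dtype=object is required for ragged input in recent Numpy versions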
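
To see why an `object` array is of little use, the following sketch inspects the result. `dtype=object` is passed explicitly, on the assumption that recent Numpy versions refuse to build an array from ragged input without it.

```python
import numpy as np

infected_age_per_day = [
    [16, 41],
    [73],
    [40, 24, 83, 29],
    [],
    [64, 13],
]

# dtype=object is passed explicitly so that this runs on old and new Numpy versions
ragged = np.array(infected_age_per_day, dtype=object)

print(ragged.dtype)      # object
print(ragged.shape)      # (5,): one entry per day, not a 2D grid
print(type(ragged[2]))   # <class 'list'>: the elements are still Python lists
```

Because each element is an ordinary Python list, fast elementwise arithmetic on the ages is not available for this array.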

Instead, we can convert this to a gridded array if we add a new row for each infected person. The first column contains a value indicating the day of infection and the second column the age of the person:

In [None]:
infected_day_age = [
    [0, 16],
    [0, 41],
    [1, 73],
    [2, 40],
    [2, 24],
    [2, 83],
    [2, 29],
    [4, 64],
    [4, 13],    
]
np.array(infected_day_age)
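
With the gridded layout, questions about the data can be answered with ordinary array operations. The sketch below selects all ages recorded for day 2 using a boolean comparison on the first column. This selection technique is not covered above and is only meant as an illustration; the names `days`, `ages` and `ages_day_2` are made up.

```python
import numpy as np

infected_day_age = np.array([
    [0, 16],
    [0, 41],
    [1, 73],
    [2, 40],
    [2, 24],
    [2, 83],
    [2, 29],
    [4, 64],
    [4, 13],
])

days = infected_day_age[:, 0]    # first column: day of infection
ages = infected_day_age[:, 1]    # second column: age of the person

ages_day_2 = ages[days == 2]     # keep only the rows where the day is 2
print(ages_day_2)                # [40 24 83 29]
print(ages.mean())               # average age over all days
```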

Note that there is a difference in how [lists](https://en.wikipedia.org/wiki/Linked_list) are stored internally vs how [arrays](https://en.wikipedia.org/wiki/Array_(data_structure)) are [stored internally](https://en.wikipedia.org/wiki/Row-_and_column-major_order).
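
The order of the elements of a 2D array can be made visible with `ravel()`, which returns them as a 1D array. By default Numpy uses row-major order (also called C order); `order='F'` reads the same array in column-major order instead. A minimal sketch with a made-up example array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Arrays created like this are stored in row-major (C) order
print(grid.flags['C_CONTIGUOUS'])   # True

# Row-major: the elements of each row follow each other
print(grid.ravel())                 # [1 2 3 4 5 6]
# Column-major: the elements of each column follow each other
print(grid.ravel(order='F'))        # [1 4 2 5 3 6]
```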

<small>Source: https://transfem.social/notes/9re0vbzxmg8u1wm7</small>

![Matrix pants meme](images/matrix-pants.png)