# Basics

## NumPy

You will now play around with some basics of tensor manipulation in NumPy. The basic object in NumPy is the homogeneous multidimensional array, in which all elements share one data type. NumPy's array class is called `ndarray`. Check out the [quickstart tutorial](https://numpy.org/devdocs/user/quickstart.html).

First, we import the numpy package.

In [None]:
import numpy as np


Let's create two arrays and check their properties.

In [None]:
A = np.arange(6)  # np.arange already returns an ndarray
B = np.array([-1, 3])
print(f"A (shape: {A.shape}, type: {type(A)}) = {A}")
print(f"B (shape: {B.shape}, type: {type(B)}) = {B}")


**Explanation:** First, two arrays (also called tensors) are created. Each NumPy array has a `shape` attribute (`numpy.ndarray.shape`), a tuple with the size of the array along each dimension: `A` has shape `(6,)` and `B` has shape `(2,)`. The first output of the cell shows the shape, type, and contents of each array.
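Besides `shape`, an `ndarray` exposes a few other descriptive attributes. A minimal sketch, recreating `A` and `B` from the cell above, prints the number of dimensions (`ndim`), the total number of elements (`size`) and the element type (`dtype`). The exact integer `dtype` can differ between platforms, so no expected output is given for it.

```python
import numpy as np

# Recreate the arrays from the cell above
A = np.arange(6)
B = np.array([-1, 3])

# ndim: number of axes, size: total number of elements, dtype: element type
for name, arr in [("A", A), ("B", B)]:
    print(f"{name}: ndim={arr.ndim}, size={arr.size}, dtype={arr.dtype}")
```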

Please note how we are using [f-strings](https://realpython.com/python-f-strings/#f-strings-a-new-and-improved-way-to-format-strings-in-python) to output variables.
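f-strings can also control how a value is formatted. In this small example, `ratio` is a hypothetical variable used only for illustration: a format spec after `:` sets the number of decimal places, and a trailing `=` (Python 3.8 and later) prints the expression together with its value.

```python
import numpy as np

A = np.arange(6)
ratio = 22 / 7  # hypothetical value, just for illustration

print(f"{ratio:.3f}")   # 3.143  (three decimal places)
print(f"{A.shape=}")    # A.shape=(6,)  (expression and value)
```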

Next, we try to multiply the two arrays with [np.matmul](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html). Their shapes are incompatible, so we expect an error.

In [None]:
print(f"A={A}\nB={B}\n")
try:
    np.matmul(A, B)
except ValueError as e:
    print(f"Operation failed: {e}")


We get a *ValueError*: for two 1-D arrays, `np.matmul` computes the inner product, which requires both arrays to have the same length, but `A` has 6 elements and `B` has 2.
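A short sketch makes the shape requirement concrete: slicing `A` to the length of `B` (purely for illustration) turns the same call into a valid inner product, while the original call still raises.

```python
import numpy as np

A = np.arange(6)           # [0 1 2 3 4 5], shape (6,)
B = np.array([-1, 3])      # shape (2,)

# Equal lengths: np.matmul returns the inner product 0*(-1) + 1*3
inner = np.matmul(A[:2], B)
print(inner)               # 3

# Mismatched lengths (6 vs 2) raise ValueError
try:
    np.matmul(A, B)
    failed = False
except ValueError:
    failed = True
print("mismatch raised ValueError:", failed)  # True
```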

To handle arrays of different shapes in arithmetic operations, we can either [reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html) an array explicitly or rely on [broadcasting](https://docs.scipy.org/doc/numpy/user/theory.broadcasting.html#array-broadcasting-in-numpy), where NumPy stretches the smaller array across the larger one so that their shapes become compatible. Here we reshape `A` into a $3 \times 2$ matrix.

In [None]:
C = A.reshape([3, 2])
print(f"C={C}\nshape: {C.shape}")
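`reshape` also accepts `-1` for one dimension, which NumPy then infers from the total number of elements. Broadcasting normally happens implicitly, but `np.broadcast_to` makes it visible by returning a read-only view of `B` stretched to the shape of `C`. A minimal sketch:

```python
import numpy as np

A = np.arange(6)
B = np.array([-1, 3])

# -1 lets NumPy infer the second dimension: 6 elements / 3 rows = 2 columns
C = A.reshape(3, -1)
print(C.shape)          # (3, 2)

# What B looks like once it is broadcast to C's shape (read-only view, no copy)
B_stretched = np.broadcast_to(B, C.shape)
print(B_stretched)
# [[-1  3]
#  [-1  3]
#  [-1  3]]
```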


Now the matrix-vector product $CB$ works: $C$ has shape $(3,2)$ and $B$ has shape $(2,)$, so the result has shape $(3,)$.

In [None]:
print(f"C={C}\nB={B}\n")
matmul_result = np.matmul(C, B)
print(matmul_result)
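The `@` operator is shorthand for `np.matmul`. This sketch also checks the result by hand: each entry of $CB$ is the dot product of one row of $C$ with $B$.

```python
import numpy as np

C = np.arange(6).reshape(3, 2)
B = np.array([-1, 3])

# Same as np.matmul(C, B)
result = C @ B
print(result)  # [ 3  7 11]

# Row-by-row dot products give the same vector
manual = np.array([np.dot(row, B) for row in C])
print(np.array_equal(result, manual))  # True
```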


Let's do matrix addition with [np.add](https://numpy.org/doc/stable/reference/generated/numpy.add.html), which is equivalent to the `+` operator.
When adding $C$ with shape $(3,2)$ and $B$ with shape $(2,)$, $B$ is automatically broadcast to the shape of $C$ by repeating it over the 3 rows.

In [None]:
print(C + B)
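Broadcasting compares shapes starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1. A sketch with a hypothetical per-row offset `col` shows a case that fails and how reshaping it into a column vector fixes it:

```python
import numpy as np

C = np.arange(6).reshape(3, 2)
col = np.array([10, 20, 30])   # hypothetical offsets, one per row of C, shape (3,)

# (3, 2) vs (3,): trailing dimensions 2 and 3 differ and neither is 1
try:
    C + col
    ok = True
except ValueError:
    ok = False
print("(3,2) + (3,) works:", ok)   # False

# Shape (3, 1) broadcasts along the columns instead
print(C + col.reshape(3, 1))
# [[10 11]
#  [22 23]
#  [34 35]]
```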


The star operator `*` performs element-wise multiplication ([np.multiply](https://numpy.org/doc/stable/reference/generated/numpy.multiply.html)) between $C$ and $B$. Again, $B$ is broadcast to match the shape of $C$.

In [None]:
print(C * B)
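Element-wise and matrix multiplication are easy to confuse. This sketch contrasts `C * B` with `C @ B` and shows how they relate: summing each row of the element-wise product gives the matrix-vector product.

```python
import numpy as np

C = np.arange(6).reshape(3, 2)
B = np.array([-1, 3])

elementwise = C * B            # same as np.multiply(C, B), shape (3, 2)
print(elementwise)
# [[ 0  3]
#  [-2  9]
#  [-4 15]]

# Row sums of the element-wise product equal C @ B
row_sums = elementwise.sum(axis=1)
print(row_sums)                               # [ 3  7 11]
print(np.array_equal(row_sums, C @ B))        # True
```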


The function [np.diag](https://numpy.org/doc/stable/reference/generated/numpy.diag.html) turns the vector $B$ of shape $(2,)$ into a diagonal matrix of shape $(2,2)$ with the entries of $B$ on its diagonal.

In [None]:
print(f"B={B}\n")
print(np.diag(B))
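`np.diag` works in both directions: given a 2-D array, it extracts the diagonal instead. The sketch below also connects it to broadcasting: right-multiplying $C$ by the diagonal matrix scales its columns, which gives the same result as `C * B`.

```python
import numpy as np

B = np.array([-1, 3])
C = np.arange(6).reshape(3, 2)

D = np.diag(B)        # 1-D input -> 2-D diagonal matrix
print(D)
# [[-1  0]
#  [ 0  3]]

# 2-D input -> 1-D array holding the diagonal
print(np.diag(D))     # [-1  3]

# Column j of C @ D is column j of C times B[j]
print(np.array_equal(C @ D, C * B))  # True
```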


To transpose an ndarray, use [np.transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html) or the `.T` attribute (`my_array.T`).

In [None]:
print(f"C={C}\n")
print(np.transpose(C))
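A common pitfall: transposing a 1-D array such as `B` does nothing, because it has only one axis. This sketch compares `.T` with `np.transpose` and shows one way to get an explicit column vector.

```python
import numpy as np

C = np.arange(6).reshape(3, 2)
B = np.array([-1, 3])

print(C.T.shape)                             # (2, 3)
print(np.array_equal(C.T, np.transpose(C)))  # True

# A 1-D array has a single axis, so transposing it changes nothing
print(B.T.shape)                             # (2,)

# An explicit column vector needs a second axis
print(B.reshape(-1, 1).shape)                # (2, 1)
```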


[Indexing operations](https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing) are used to select parts of the tensor.

In [None]:
print(f"C={C}\n")
print(C[0])  # select row 0
print(C[0, 0])  # select row 0, column 0
print(C[:, 0])  # select all rows, column 0
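A few more indexing patterns that come up often: negative indices, slices with a step, and boolean masks. The sketch also shows that basic slices are views, so writing through them modifies the array they came from; it works on a copy `D` so that `C` stays unchanged.

```python
import numpy as np

C = np.arange(6).reshape(3, 2)

print(C[-1])       # negative index counts from the end: [4 5]
print(C[::2])      # every second row: rows 0 and 2
print(C[C > 2])    # boolean mask, 1-D result: [3 4 5]

# Basic slices are views into the original data
D = C.copy()
row = D[0]         # view of the first row of D
row[0] = 100
print(D[0, 0], C[0, 0])  # 100 0
```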


Tensor operations are a central part of the exercises.
Play around with the notebook to get familiar with them.

**Note:** Python for-loops over array elements are usually slow; use vectorized NumPy expressions instead.

**Slow** sum of squares in plain Python:

In [None]:
%%timeit -r 1 -n 1
total = 0
for i in range(1000000):
    total += i ** 2


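**Fast** sum by letting NumPy do the math. We request the datatype `np.int64` explicitly: the largest square is about $10^{12}$ and the total is about $3.3 \times 10^{17}$, both far too large for 32-bit integers, which are the default integer type on some platforms. The square and sum operations are vectorized and run in compiled C code internally.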

In [None]:
%%timeit -r 1 -n 1
numbers = np.arange(1000000, dtype=np.int64)
total = (numbers ** 2).sum()
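Speed only matters if the result is the same. This sketch checks that the loop and the vectorized version agree, and compares both with the closed form $\sum_{i=0}^{n-1} i^2 = \frac{(n-1)\,n\,(2n-1)}{6}$, using Python's arbitrary-precision integers as the reference.

```python
import numpy as np

n = 1_000_000

# Plain Python loop
total_py = 0
for i in range(n):
    total_py += i ** 2

# Vectorized NumPy version with an explicit 64-bit dtype
numbers = np.arange(n, dtype=np.int64)
total_np = int((numbers ** 2).sum())

# Closed form computed with exact Python integers
closed_form = (n - 1) * n * (2 * n - 1) // 6
print(total_py == total_np == closed_form)  # True
```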


## Plotting

In the exercises you will use [matplotlib](https://matplotlib.org/) to visualize data.

Let's create some Gaussian random data first:

In [None]:
data_2d = np.random.randn(1000, 2)


We created an array with 1000 rows, one per data point, and 2 columns holding each point's x- and y-coordinate. The values are drawn from a standard normal distribution. Now we can plot it:

In [None]:
from matplotlib import pyplot as plt

# increase the font size for all plots in the notebook
plt.rc('font', size=16)

# create a figure that is bigger than the default size
plt.figure(figsize=(12,8))

# create the scatterplot
plt.scatter(data_2d[:, 0], data_2d[:, 1], label="The data")

# set axis labels
plt.xlabel("x-axis")
plt.ylabel("y-axis")

# display a grid and legend
plt.grid()
plt.legend()

# show the plot
plt.show()
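Each run of the cells above draws new samples, so the scatter plot changes every time. When reproducible data is needed, NumPy's newer `Generator` interface (`np.random.default_rng`) can be seeded. This is a swap from the legacy `np.random.randn` used above, and the seed `0` is an arbitrary choice. The sketch also checks the per-column mean and standard deviation, which should be close to 0 and 1 for standard normal samples.

```python
import numpy as np

# Seeded generator: the same seed gives the same data on every run
rng = np.random.default_rng(seed=0)
data_2d = rng.standard_normal((1000, 2))

print(data_2d.shape)          # (1000, 2)
print(data_2d.mean(axis=0))   # per-column mean, close to 0
print(data_2d.std(axis=0))    # per-column standard deviation, close to 1
```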


Instead of a scatter plot, we can also do a line plot:

In [None]:
x = np.arange(100)
y = np.random.randn(100)

plt.figure(figsize=(12,8))
plt.plot(x, y, "X-", label="A sequence")  # "X-": filled X markers joined by a solid line
plt.xlabel("Step")
plt.ylabel("Value")
plt.grid()
plt.legend()
plt.show()
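The plots above use pyplot's implicit "current figure" functions (`plt.xlabel`, `plt.grid`, ...). Matplotlib also offers an object-oriented interface through `plt.subplots`, which is convenient for putting several plots in one figure. A sketch combining both plots side by side, with seeded random data as an assumption so the figure is reproducible:

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(seed=0)
data_2d = rng.standard_normal((1000, 2))
x = np.arange(100)
y = rng.standard_normal(100)

# One figure with two side-by-side axes
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

axes[0].scatter(data_2d[:, 0], data_2d[:, 1], label="The data")
axes[0].set_xlabel("x-axis")
axes[0].set_ylabel("y-axis")

axes[1].plot(x, y, "X-", label="A sequence")
axes[1].set_xlabel("Step")
axes[1].set_ylabel("Value")

# Grid and legend on both axes
for ax in axes:
    ax.grid()
    ax.legend()

fig.tight_layout()
plt.show()
```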
