# 03 Linear algebra

Part of ["Introduction to Data Science" course](https://github.com/kupav/data-sc-intro) by Pavel Kuptsov, [kupav@mail.ru](mailto:kupav@mail.ru)

Recommended reading for this section:

1. Grus, J. (2019). Data Science From Scratch: First Principles with Python (2nd ed.). Sebastopol, CA: O’Reilly Media.
1. Williams, G. (2019). Linear Algebra with Applications (9th ed.). Burlington, MA: Jones & Bartlett Learning.

The following Python modules will be required. Make sure that you have them installed.
- `matplotlib`
- `numpy`

## Lesson 1

### Linear, affine and non-linear operations

Everything starts with equations because of their great practical importance. 
Most problems solved with the help of mathematics are stated as 
an equation or as a set of equations. 

To solve equations we develop mathematical methods and concepts. 
That is how many branches of mathematics arise.

Equation: given the formula 
$$ax=b$$
find the unknown $x$.

Another equation:
$$ax^2+bx+c=0$$
Also one needs to find $x$.

The first equation is _linear_ while the second one is _non-linear_.

The first equation is linear since $x$ undergoes only _linear_ operations.

So we can classify operations done with the unknown of an equation.

Let $f(x)$ be some operation. For example
$$f(x)=ax$$<br/>
$$f(x)=ax+k$$<br/>
$$f(x)=\sin x$$<br/>
$$f(x)=x^2$$

So let $f(x)$ be some operation, not necessarily one of those above. It is said to be linear if the following holds:

$$f(x+y) = f(x) + f(y)$$<br/>
$$f(cx) = cf(x)$$

where $x$ and $y$ are variables and $c$ is merely a number.

Of the examples above, only $f(x)=ax$ is a linear operation:

$$f(x+y)=a(x+y)=ax+ay=f(x)+f(y)$$<br/>
$$f(cx)=cax=cf(x)$$

Let us test $f(x)=ax+k$:

$$f(x+y)=a(x+y)+k=ax+ay+k\neq f(x)+f(y)$$

Due to $k$ the equality is not fulfilled: $f(x)+f(y)=ax+ay+2k$, which differs from $ax+ay+k$ unless $k=0$.

The operation
$$f(x)=ax+k$$
is called affine. This is a linear operation with a shift.

All other operations are non-linear.
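
The two conditions of linearity can also be checked numerically. Below is a minimal sketch: the helper `is_linear`, the sample points and the values of `a` and `k` are illustrative assumptions, not part of the course code. Passing the check on a few points is only a spot check, not a proof, but a single failure is enough to show that an operation is not linear.

```python
import math

def is_linear(f, samples, scalars, tol=1e-12):
    """
    Spot check of f(x+y) = f(x) + f(y) and f(c*x) = c*f(x)
    on a finite set of sample points (not a proof)
    """
    for x in samples:
        # additivity
        for y in samples:
            if abs(f(x + y) - (f(x) + f(y))) > tol:
                return False
        # homogeneity
        for c in scalars:
            if abs(f(c * x) - c * f(x)) > tol:
                return False
    return True

a, k = 2.0, 3.0             # illustrative values
samples = [-1.5, 0.5, 2.0]  # test points for x and y
scalars = [-2.0, 0.5, 3.0]  # test values for c

print("a*x    :", is_linear(lambda x: a * x, samples, scalars))
print("a*x + k:", is_linear(lambda x: a * x + k, samples, scalars))
print("sin(x) :", is_linear(math.sin, samples, scalars))
print("x**2   :", is_linear(lambda x: x**2, samples, scalars))
```

Only the first operation is expected to pass both checks.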

The equation 
$$ax=b$$
reads: find an $x$ whose linear transformation $ax$ results in $b$. 
That is why this equation is linear.

### Linear equations

The question is: what mathematical object can be denoted by $x$? Mathematics gives many different answers to this question, and we need only some of them.

The simplest case: $x$ denotes some unknown number. We will say that $x$ is _a scalar_. In this case $a$ and $b$ are also scalars. We can find $x$ as follows:
$$ax=b$$<br/>
$$x=b/a$$
provided that $a\neq0$. 

If $a=0$ there are two options. If $b\neq 0$ a solution does not exist. But if $b=0$ any $x$ fulfills the equation. A bit later we will consider similar options for sets of equations.

But what if we have a problem with two unknown scalars, say $x_0$ and $x_1$, that depend on each other? In this case we have an equation set:

$$a_{00} x_0 + a_{01} x_1 = b_0$$
$$a_{10} x_0 + a_{11} x_1 = b_1$$

How do these equations appear, and why do they have this structure? 
We have simply taken into account all possibilities allowed by linear operations.

Observe that in the left hand sides we have linear operations both with $x_0$ and with $x_1$. 

Why have we written exactly two equations for two unknowns? One equation is not enough to find both $x_0$ and $x_1$:
$$a_{00} x_0 + a_{01} x_1 = b_0$$<br/>
$$a_{00} x_0 = b_0 - a_{01} x_1$$<br/>
$$x_0 = \frac{b_0 - a_{01} x_1}{a_{00}}$$
This expresses $x_0$ through $x_1$, but $x_1$ itself remains arbitrary. A third equation would either be a combination of the other two, and thus not needed at all, or make the whole system inconsistent.

Just to make ourselves more comfortable with it, let us solve the equation set.
First we take one of the equations and solve it for one of the variables. We have already done it above (assuming $a_{00}\neq 0$):

$$x_0 = \frac{b_0 - a_{01} x_1}{a_{00}}$$

Now let us substitute it into the other equation: we take the second equation

$$a_{10} x_0 + a_{11} x_1 = b_1$$

and replace $x_0$ with its expression:

$$a_{10} \left(\frac{b_0 - a_{01} x_1}{a_{00}}\right) + a_{11} x_1 = b_1$$

Now find $x_1$ from here:

$$a_{10} (b_0 - a_{01} x_1) + a_{11}a_{00} x_1 = b_1 a_{00}$$<br/>
$$a_{10} b_0 - a_{10} a_{01} x_1 + a_{11}a_{00} x_1 = b_1 a_{00}$$<br/>
$$a_{10} b_0 - (a_{10} a_{01} - a_{11}a_{00}) x_1 = b_1 a_{00}$$<br/>
$$-(a_{10} a_{01} - a_{11}a_{00}) x_1 = b_1 a_{00}-a_{10} b_0$$<br/>
$$x_1 = \frac{b_1 a_{00}-a_{10} b_0}{-(a_{10} a_{01} - a_{11}a_{00})}$$<br/>
<br/>
$$x_1 = \frac{a_{00}b_1-a_{10} b_0}{a_{11}a_{00} - a_{10} a_{01}},\;\;x_0 = \frac{b_0 - a_{01} x_1}{a_{00}}$$

Let us now consider an example:

In [None]:
a = [[-1.0, 2.0], [2.0, 4.0]]
b = [3.0, -4.0]

x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / (a[1][1] * a[0][0] - a[1][0] * a[0][1])

x0 = (b[0] - a[0][1] * x1) / a[0][0]

print(f"x0={x0}, x1={x1}")

In [None]:
# Check if the solution is correct
b0_new = a[0][0] * x0 + a[0][1] * x1
b1_new = a[1][0] * x0 + a[1][1] * x1

W = 5
print(f"b0_new={b0_new:{W}}, b[0]={b[0]:{W}}")
print(f"b1_new={b1_new:{W}}, b[1]={b[1]:{W}}")

But what if $a_{11}a_{00} - a_{10} a_{01}=0$? Looking ahead, this expression is called the determinant. If the determinant is zero the obtained formulas cannot be used because of division by zero. 

This happens when the left hand sides are proportional to each other:

In [None]:
a = [[-1.0, 2.0], [2.0, -4.0]]
print(f"det={a[1][1] * a[0][0] - a[1][0] * a[0][1]}")

$$-x_0 + 2 x_1=b_0$$
$$2 x_0 - 4 x_1=b_1$$

Observe what happens if we multiply the first equation by $-2$:

$$-2(-x_0 + 2 x_1)=-2 b_0$$
$$2 x_0 - 4 x_1=b_1$$
<br/>
$$2x_0 - 4 x_1=-2 b_0$$
$$2 x_0 - 4 x_1=b_1$$

Again we have two options. If $b_1\neq -2 b_0$ this equation set has no solutions.

But if $b_1=-2 b_0$, in particular at $b_0=b_1=0$, the two equations become fully identical. The initial equations in this case are called linearly dependent since one equation is proportional to the other.

One of the unknowns can have an arbitrary value and the other one is computed from one of the equations:
$$x_0 = \frac{b_0 - a_{01} x_1}{a_{00}}$$
In other words there are infinitely many solutions.

For example:

In [None]:
import numpy as np
rng = np.random.default_rng()

In [None]:
a = [[-1.0, 2.0], [2.0, -4.0]]
print(f"det={a[1][1] * a[0][0] - a[1][0] * a[0][1]}")

b = [2, -4.0]

x1 = rng.standard_normal()
print(f"Generate random x1: {x1}")

x0 = (b[0] - a[0][1] * x1) / a[0][0]

print(f"Solution with random x1: x0={x0}, x1={x1}")

# Check if the solution is correct
b0_new = a[0][0] * x0 + a[0][1] * x1
b1_new = a[1][0] * x0 + a[1][1] * x1

W = 5
print(f"b0_new={b0_new:{W}}, b[0]={b[0]:{W}}")
print(f"b1_new={b1_new:{W}}, b[1]={b[1]:{W}}")
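
The three cases discussed above (a single solution, no solution, infinitely many solutions) can be gathered in one helper. This is only a sketch: the function name `classify_2x2` and the tolerance `tol` used to compare floats with zero are assumptions made for illustration.

```python
def classify_2x2(a, b, tol=1e-12):
    """
    Classify the set of two linear equations with coefficients a and right hand side b.
    Returns 'unique', 'none' or 'infinite'.
    """
    det = a[1][1] * a[0][0] - a[1][0] * a[0][1]
    if abs(det) > tol:
        return 'unique'

    # Zero determinant: the left hand sides are proportional (or all coefficients are zero).
    if all(abs(v) <= tol for row in a for v in row):
        # No unknowns are left: the equations read 0 = b0, 0 = b1
        consistent = abs(b[0]) <= tol and abs(b[1]) <= tol
    else:
        # The right hand sides must be proportional with the same factor
        m0 = a[0][0] * b[1] - a[1][0] * b[0]
        m1 = a[0][1] * b[1] - a[1][1] * b[0]
        consistent = abs(m0) <= tol and abs(m1) <= tol

    return 'infinite' if consistent else 'none'

print(classify_2x2([[-1.0, 2.0], [2.0, 4.0]], [3.0, -4.0]))
print(classify_2x2([[-1.0, 2.0], [2.0, -4.0]], [2.0, -4.0]))
print(classify_2x2([[-1.0, 2.0], [2.0, -4.0]], [2.0, 1.0]))
```

The first call uses the coefficients of the first example, and the other two use the matrix with zero determinant, once with $b_1=-2 b_0$ and once without.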

What if there are three variables: $x_0$, $x_1$, and $x_2$ and three linear equations for them?

Everything is basically the same. Again one has to compute the determinant. If it is non-zero there is a single solution, and for a zero determinant there are two options: either no solution or infinitely many solutions.
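
As a hedged illustration of the three-variable case, here is a sketch that computes a $3\times 3$ determinant by expansion along the first row and, when it is non-zero, finds the solution by Cramer's rule. Cramer's rule is brought in here only for brevity and is not derived in the text; the names `det3` and `solve3` and the coefficients are made up for the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix via expansion along the first row"""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b, tol=1e-12):
    """
    Solve three linear equations by Cramer's rule.
    Returns None if the determinant is (numerically) zero.
    """
    d = det3(a)
    if abs(d) <= tol:
        return None
    x = []
    for j in range(3):
        # Replace column j of the coefficients with the right hand side
        aj = [[b[i] if k == j else a[i][k] for k in range(3)] for i in range(3)]
        x.append(det3(aj) / d)
    return x

# Illustrative coefficients, not taken from the text
a = [[2.0, 1.0, -1.0], [1.0, 3.0, 2.0], [1.0, 0.0, 1.0]]
b = [1.0, 13.0, 4.0]

print(f"det={det3(a)}")
print(f"x={solve3(a, b)}")

# The second row is twice the first one: the determinant vanishes
a_dep = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]
print(f"det={det3(a_dep)}, solution: {solve3(a_dep, b)}")
```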

### Sets of linear equations: introducing vectors and matrices

There is a unified approach for linear equations.

Let us arrange two scalars $x_0$ and $x_1$ into a column and call it a vector:
$$\vec x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}$$

Also form a vector of $b_0$ and $b_1$:
$$\vec b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$

Let us arrange the equation coefficients into a $2\times 2$ table and call it a matrix:
$$A = \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}$$

Now we define a matrix-vector multiplication as follows:
$$
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}=
\begin{pmatrix} a_{00} x_0 + a_{01} x_1 \\ a_{10} x_0 + a_{11} x_1 \end{pmatrix}
$$
- Take the first row of the matrix and the vector
- Multiply corresponding components: first by first, second by second
- Sum the results of the multiplications. This is the first resulting component
- Do the same with the second row

Thus the equation set can be written as follows:
$$
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}=
\begin{pmatrix} b_0 \\ b_1 \end{pmatrix}
$$
or in a compact form:
$$
A \vec x = \vec b
$$

In the same way we can construct larger vectors, with three, four or more entries. In general vectors $\vec x$ and $\vec b$ contain $N$ components and the matrix is a table of $N\times N$ components.

Recall the question above: what mathematical object can be denoted by $x$? We see now that linear equations can be written for vectors.
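
The matrix-vector multiplication rule described above can be sketched with native Python lists. The function name `matrix_vector_mul` is an assumption made for this example; the matrix is the one from the first numerical example, and the vector is the solution that follows from the formulas derived there.

```python
def matrix_vector_mul(mat, vec):
    """
    Multiply a matrix (a list of rows) by a vector (a list of numbers)
    """
    # Every row must have as many entries as the vector
    assert all(len(row) == len(vec) for row in mat)

    # Each resulting component is the sum of componentwise products of a row and the vector
    return [sum(r * v for r, v in zip(row, vec)) for row in mat]

A = [[-1.0, 2.0], [2.0, 4.0]]
x = [-2.5, 0.25]

print(matrix_vector_mul(A, x))
```

The result should reproduce the right hand side `b = [3.0, -4.0]` of that example.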

### Two useful drawing functions

In what follows we will need two functions for conveniently drawing vectors and lines.

In [None]:
def draw_vector(ax, label, color, vec, orig):
    """
    Draw a vector
    -------------
    ax : axes
        The axes to draw
    label : str
        The label of the vector
    color : str
        Color specification
    vec : list of two components
        The vector - shifts dx and dy from orig
    orig : list of two components
        The vector beginning point
    """
    
    vec_end = [orig[0] + vec[0], orig[1] + vec[1]] 
    ax.annotate("", 
                xytext=orig,  # arrow beginning point
                xy=vec_end,   # arrow end point
                arrowprops={'color': color})

    # Draw a transparent circle around a label to improve visibility
    bbox = {'facecolor':'w',
            'edgecolor': 'w',
            'boxstyle': 'circle',
            'alpha': 0.8}

    # Add a text in the middle of the arrow
    ax.text(orig[0] + 0.5*vec[0], orig[1] + 0.5*vec[1], label,
           fontsize=16, bbox=bbox, va='center', ha='center')

def draw_line(ax, label, beg, end, linestyle='-', lw=4):
    """
    Draw a line (without an arrow)
    ------------------------------
    ax : axes
        The axes to draw
    label : str
        The label of the line (an empty string means no label)
    beg : list of two components
        Starting coordinates
    end : list of two components
        Ending coordinates
    linestyle : str
        Line style, solid by default
    lw : float
        Line width
    """

    ax.annotate("", 
                xytext=beg,
                xy=end,
                arrowprops={'color': '0.5',          # this is 50% of gray level
                            'arrowstyle': '-',       # no arrow, just a line
                            'linestyle': linestyle,  # sometimes we will need dashed lines
                            'lw': lw})               # line width

    if label != '':
        # Draw a transparent circle around a label to improve visibility
        bbox = {'facecolor':'w',
                'edgecolor': 'w',
                'boxstyle': 'circle',
                'alpha': 0.8}

        # Add a text in the middle of the line
        mid = [0.5 * (pb + pe) for pb, pe in zip(beg, end)]
        ax.text(mid[0], mid[1], label,
               fontsize=16, bbox=bbox, va='center', ha='center')

### The vectors

From a practical point of view a vector is an array of numbers. 

Recall that earlier we discussed one-dimensional arrays: these are sequences of numbers. There were also two-dimensional arrays: tables of numbers. Vectors are one-dimensional arrays. A bit later we will consider matrices, which are two-dimensional arrays.

Usually a vector consists of real numbers, but its components can also be integers, complex numbers and some others.

The number of coordinates (the length of the array) equals the dimension of the space where the vector is defined: two coordinates for vectors in 2D space, three coordinates for 3D space and so on.

Do not confuse them: we have used the word "dimension" twice with two different meanings. The array dimension refers to the structure of the vector representation. It is always equal to one. The space dimension is the number of vector components, its size. It refers to the playground where the vectors exist. It can have any value, from one to even infinity.

Geometrically a vector is an arrow pointing to some point in space. Sometimes instead of vectors we speak about the points themselves. 

Let us draw a 2D vector:

In [None]:
# Vector in 2D space
vec = [0.7, 0.8]

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([0, 1])
ax.set_ylim([0, 1])

draw_line(ax, label=r'$x_0$', beg=vec, end=[0, vec[1]])
draw_line(ax, label=r'$x_1$', beg=vec, end=[vec[0], 0])
draw_vector(ax, label=r'$\vec v$', color='C0', vec=vec, orig=[0,0])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

ax.grid();

If needed a vector can be shifted from the origin:

In [None]:
# Shifted vector in 2D space
orig = [0.2, 0.1]
vec = [0.7, 0.8]

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([0, 1])
ax.set_ylim([0, 1])

# Lines x and y
draw_line(ax, label=r'$x_0$', beg=[vec[0] + orig[0], vec[1] + orig[1]], end=[orig[0], vec[1] + orig[1]])
draw_line(ax, label=r'$x_1$', beg=[vec[0] + orig[0], vec[1] + orig[1]], end=[vec[0] + orig[0], orig[1]])

# Dashed lines as delimiters
draw_line(ax, label='', beg=[orig[0], vec[1] + orig[1]], end=orig, lw=2, linestyle='--')
draw_line(ax, label='', beg=[vec[0] + orig[0], orig[1]], end=orig, lw=2, linestyle='--')

# The vector
draw_vector(ax, label=r'$\vec v$', color='C0', vec=vec, orig=orig)

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

ax.grid();

### A bit deeper into mathematics 

(Actually this is not very important for us):

Rigorously, a vector is a mathematical object with some properties. 
If we want to show it, we choose coordinate axes and then represent 
a vector with respect to these axes as a 1D array of scalar coordinates.

Thus an array of coordinates is not the vector itself, this is 
its particular representation.

We can choose other coordinate axes and represent the vector with respect 
to them. We will obtain another set of coordinates representing the same vector.

Again, $\vec v$ is the vector, a mathematical object, and an array of 
coordinates, say $(-3, 1)$, is just one of its representations.

### Vector-columns and vector-rows. Transposition

There are row-vectors and column-vectors. A row vector:
$$
\vec v = (x_0, x_1)
$$
A column vector:
$$
\vec v = \begin{pmatrix}x_0 \\ x_1 \end{pmatrix}
$$
A row vector is transformed into a column vector and back through an operation named transposition. It is denoted by the superscript "T":
$$
(x_0, x_1)^{\mathrm{T}}=\begin{pmatrix}x_0 \\ x_1 \end{pmatrix},\;\;
\begin{pmatrix}x_0 \\ x_1 \end{pmatrix}^{\mathrm{T}}=(x_0, x_1)
$$

In texts column vectors take more space than row vectors. That is why column vectors are sometimes written as transposed row vectors. For example
$$
(x_0, x_1)^{\mathrm{T}}
$$
actually means 
$$
\begin{pmatrix}x_0 \\ x_1 \end{pmatrix}
$$

In [None]:
# Row vector as a Python list
row_vec = [2, 3, -1]

# Col vector - sequence of unit length rows
col_vec = [[2], [3], [-1]]

print(f"row_vec={row_vec}")
print(f"col_vec={col_vec}")

For mathematics the difference between row and column vectors is essential. For computations sometimes it is ignored for simplicity: row vectors can be used where the columns are required in a strict sense.
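
For the list representation used above, transposition of a vector can be sketched as follows. The helper names `row_to_col` and `col_to_row` are assumptions introduced for this example.

```python
def row_to_col(row_vec):
    """Row vector [x0, x1, ...] -> column vector [[x0], [x1], ...]"""
    return [[x] for x in row_vec]

def col_to_row(col_vec):
    """Column vector [[x0], [x1], ...] -> row vector [x0, x1, ...]"""
    return [row[0] for row in col_vec]

row_vec = [2, 3, -1]
col_vec = row_to_col(row_vec)

print(f"col_vec={col_vec}")
print(f"back to row: {col_to_row(col_vec)}")
```

Applying the two functions one after another returns the initial vector, in the same way as a double transposition does.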

### Vector addition and subtraction

Vectors admit arithmetic: two of them can be added or subtracted. Addition and subtraction are componentwise operations:

Let
$$
\vec v_1 = (3.0, 4.0), \; 
\vec v_2 = (1.5, -1.5)
$$
Their sum is
$$
\vec v_1 + \vec v_2 = (3.0, 4.0) + (1.5, -1.5) = (4.5, 2.5)
$$

Let us implement a vector addition as a function. Vectors can have arbitrary dimensions.

Actually, we will use NumPy arrays as vectors. The componentwise addition rule is already implemented for them. But for now we want to get better acquainted with vectors.

In [None]:
def vector_add(vec1, vec2):
    """
    Vector addition vec1 + vec2
    """
    assert len(vec1) == len(vec2)  # lengths must be identical
    
    return [v1 + v2 for v1, v2 in zip(vec1, vec2)]

In [None]:
vec1 = [2.3, 3.5, -1.2, 5.7]
vec2 = [1.7, -1.5, 1.2, 1.3]
vec3 = [2.7, 2.5, -0.8, -0.7, 1.2]

In [None]:
# try to add vec1 and vec2
try:
    vec_sum_12 = vector_add(vec1, vec2)
    print(f"vec_sum_12={vec_sum_12}")
except AssertionError:
    print("Dimensions are incompatible")

In [None]:
# try to add vec1 and vec3 - will fail
try:
    vec_sum_13 = vector_add(vec1, vec3)
    print(f"vec_sum_13={vec_sum_13}")
except AssertionError:
    print("Dimensions are incompatible")
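
For comparison, here is a short sketch of the same addition done with NumPy arrays, for which the `+` operator already works componentwise. The variable names are chosen only for this example.

```python
import numpy as np

v1 = np.array([3.0, 4.0])
v2 = np.array([1.5, -1.5])

# Componentwise addition is built in
print(v1 + v2)

# Arrays of lengths 4 and 5 cannot be added componentwise
try:
    np.array([1.0, 2.0, 3.0, 4.0]) + np.array([1.0, 2.0, 3.0, 4.0, 5.0])
except ValueError as err:
    print(f"Dimensions are incompatible: {err}")
```

One caveat worth keeping in mind: because of NumPy broadcasting an array of length one is stretched to match the other operand instead of raising an error, so NumPy is somewhat more permissive than `vector_add`.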

An illustration: how vector addition looks geometrically

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

vec1 = [3.0, 4.0]
vec2 = [1.5, -1.5]
vec_sum = vector_add(vec1, vec2)

draw_vector(ax, r'$\vec v_1$', 'C0', vec1, orig=[0, 0])
draw_vector(ax, r'$\vec v_2$', 'C1', vec2, orig=vec1)
draw_vector(ax, r'$\vec v_1+\vec v_2$', 'C2', vec_sum, orig=[0, 0])

ax.grid();

Now consider a vector subtraction: 

$$
\vec v_1 = (3.3, -4.2), \; 
\vec v_2 = (1.3, 2.8)
$$
<br/>
$$
\vec v_1 - \vec v_2 = (3.3, -4.2) - (1.3, 2.8) = (2.0, -7.0)
$$

The function can be obtained easily from the addition by changing plus to minus:

In [None]:
def vector_sub(vec1, vec2):
    """
    Vector subtraction: vec1 - vec2
    """
    assert len(vec1) == len(vec2)
    
    # The difference is here: "-" instead of "+"
    return [v1 - v2 for v1, v2 in zip(vec1, vec2)]

In [None]:
vec1 = [3.2, -1.0, 2.7, 3.8]
vec2 = [1.2, 1.0, 0.7, 1.8]
vec_dif = vector_sub(vec1, vec2)
print(vec_dif)

A geometric illustration

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

vec1 = [4.5, 2.5]
vec2 = [3, -2]
vec_dif = vector_sub(vec1, vec2)

draw_vector(ax, r'$\vec v_1$', 'C0', vec1, orig=[0, 0])
draw_vector(ax, r'$\vec v_2$', 'C1', vec2, orig=vec_dif)
draw_vector(ax, r'$\vec v_1+(-\vec v_2)$', 'C2', vec_dif, orig=[0, 0])

ax.grid();

### Vector length and Euclidean norm

A vector has a length that can be computed via the Pythagorean theorem:

$$
\vec v = (x_0, x_1)
$$
<br/>
$$
|\vec v|_2 = \sqrt{x_0^2 + x_1^2}
$$
Subscript 2 means using squares and a square root in computations.

A vector length computed as above is also called a Euclidean norm, 2-norm or $L^2$ norm. 

Euclidean norms of higher dimensional vectors are computed in a similar way. Let $\vec v$ be an $N$-dimensional vector:
$$
\vec v = (x_0, x_1, x_2, \ldots, x_{N-1})
$$

Its Euclidean norm reads:
$$
|\vec v|_2 = \sqrt{x_0^2 + x_1^2 + \cdots + x_{N-1}^2}
$$

The same formula in a compact form
$$
|\vec v|_2 = \sqrt{\sum_{i=0}^{N-1} x_i^2}
$$

To get hands-on experience with vectors let us write a function that computes the Euclidean vector norm. In actual computations later we will use NumPy arrays, for which the norm computation is built in.

In [None]:
def vector_norm2(vec):
    """
    Euclidean norm of vector
    """
    tmp = [v**2 for v in vec]
    return sum(tmp)**0.5

vec = [3.0, 4.0]
print(f"|vec|={vector_norm2(vec)}")
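
As a preview of the built-in NumPy version mentioned above, here is a minimal sketch. It relies on `np.linalg.norm`, which for a one-dimensional array and default arguments returns the Euclidean norm.

```python
import numpy as np

vec = np.array([3.0, 4.0])

# Built-in Euclidean norm
nrm_builtin = np.linalg.norm(vec)

# The same value computed directly from the formula
nrm_formula = np.sqrt(np.sum(vec**2))

print(f"np.linalg.norm: {nrm_builtin}")
print(f"formula:        {nrm_formula}")
```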

An illustration:

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')


vec = [3.0, 4.0]
nrm = vector_norm2(vec)

draw_line(ax, r'$x_0={{{}}}$'.format(vec[0]), [vec[0], 0], [0, 0])
draw_line(ax, r'$x_1={{{}}}$'.format(vec[1]), [vec[0], 0], vec)

draw_vector(ax, r'$|\vec v|_2={{{}}}$'.format(nrm), 'C0', vec=vec, orig=[0, 0])

ax.grid();

print(f"L2(vec)=sqrt({vec[0]}**2+{vec[1]}**2)={nrm}")

### Various vector norms

A norm is a generalization of the idea of length: the length can be computed in different ways.

A norm is a function $d(\vec v)$ that returns a real number for any vector $\vec v$. Required properties of a norm:
- If $d(\vec v)=0$ then $\vec v=0$.
- $d(a \vec v) = |a| d(\vec v)$, where $a$ is scalar.
- $d(\vec v + \vec u) \leq d(\vec v) + d(\vec u)$.

__No need to understand this in detail! It is here just to show what a norm is.__

In other words, mathematics tells us that if there is a function $d(\vec v)$ that computes a real number for each vector and the conditions above are fulfilled, this number can be considered as a vector length, and it will be as good as the intuitively obvious Euclidean length.
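
The second and the third properties can be spot checked numerically on random vectors. The sketch below does this for the Euclidean length and, for contrast, for the squared length (the same sum without the square root), which is not a norm. The helper `check_norm_properties`, the number of trials and the tolerance are assumptions made for this example, and passing the check is not a proof.

```python
import random

def euclidean_length(vec):
    return sum(v**2 for v in vec) ** 0.5

def squared_length(vec):
    # Not a norm: scaling the vector by a multiplies this value by a**2
    return sum(v**2 for v in vec)

def check_norm_properties(d, n_trials=1000, dim=3, tol=1e-9, seed=0):
    """
    Spot check d(a*v) = |a| d(v) and d(v + u) <= d(v) + d(u) on random vectors
    """
    rng = random.Random(seed)
    for _ in range(n_trials):
        v = [rng.uniform(-10, 10) for _ in range(dim)]
        u = [rng.uniform(-10, 10) for _ in range(dim)]
        a = rng.uniform(-10, 10)
        # Scaling property
        if abs(d([a * x for x in v]) - abs(a) * d(v)) > tol:
            return False
        # Triangle inequality
        if d([x + y for x, y in zip(v, u)]) > d(v) + d(u) + tol:
            return False
    return True

print("Euclidean length:", check_norm_properties(euclidean_length))
print("Squared length:  ", check_norm_properties(squared_length))
```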

Why do we do this at all? Why do we need many norms? For computations: some norms are simpler and thus faster to compute than others. For data science: when we tune a model for data processing, 
an appropriate norm sometimes helps to get the result faster and with higher precision. 

### Taxicab norm or Manhattan norm

This is the distance that a taxi has to drive in a city with a rectangular street grid:
$$
|\vec v|_1 = \sum_{i=0}^{N-1} |x_i|
$$

This is also called the $L^1$ norm.

A function for $L^1$ norm:

In [None]:
def vector_norm1(vec):
    """
    Taxicab norm
    """
    tmp = [abs(v) for v in vec]
    return sum(tmp)

vec = [-3.0, 4.0]
print(f"|vec|={vector_norm1(vec)}")    

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-5, 0.5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')


vec = [-3.0, 4.0]
nrm = vector_norm1(vec)

draw_line(ax, r'$x_0={{{}}}$'.format(vec[0]), [vec[0], 0], [0, 0])
draw_line(ax, r'$x_1={{{}}}$'.format(vec[1]), [vec[0], 0], vec)

draw_vector(ax, r'$|\vec v|_1={{{}}}$'.format(nrm), 'C0', vec=vec, orig=[0, 0])

ax.grid();

print(f"L1(vec)=|{vec[0]}|+|{vec[1]}|={nrm}")

### Maximum norm or Infinity norm

This is the largest absolute value among the vector coordinates:
$$
|\vec v|_\infty = \textrm{max}(|x_0|, |x_1|, \ldots, |x_{N-1}|)
$$

This is the $L^{\infty}$ norm.

In [None]:
def vector_norminf(vec):
    """
    Infinity norm
    """
    tmp = [abs(v) for v in vec]
    return max(tmp)

vec = [-4.0, 3.0]
print(f"|vec|={vector_norminf(vec)}") 

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-5, 0.5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')


vec = [-4.0, 3.0]
nrm = vector_norminf(vec)

draw_line(ax, r'$x_0={{{}}}$'.format(vec[0]), [vec[0], 0], [0, 0])
draw_line(ax, r'$x_1={{{}}}$'.format(vec[1]), [vec[0], 0], vec)

draw_vector(ax, r'$|\vec v|_{{\infty}}={{{}}}$'.format(nrm), 'C0', vec=vec, orig=[0, 0])

ax.grid();

print(f"Linf(vec)=max(|{vec[0]}|,|{vec[1]}|)={nrm}")

### p-norm

This is a general formula for a family of norms:
$$
|\vec v|_p=\left(\sum_{i=0}^{N-1} |x_i|^p \right)^{1/p}
$$
where $p\geq 1$.

- Taxicab norm at $p=1$
- Euclidean norm at $p=2$
- Infinity norm at $p\to\infty$

The last item is not so obvious but we will not prove it.

In [None]:
def vector_norm(vec, p):
    """
    p-norm
    """
    tmp = [abs(v)**p for v in vec]
    return sum(tmp)**(1/p)

vec = [-4.0, 3.0]

print(f"L1:   {vector_norm1(vec)}, {vector_norm(vec, 1)}")
print(f"L2:   {vector_norm2(vec)}, {vector_norm(vec, 2)}")
print(f"Linf: {vector_norminf(vec)}, {vector_norm(vec, 20)}") # take large value for p instead of infinity
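
Although the limit $p\to\infty$ is not proved here, it can be observed numerically. The idea behind it: if the largest $|x_i|$ is factored out of the sum, the remaining terms are at most 1, so the sum stays between 1 and $N$, and its power $1/p$ tends to 1. The sketch below redefines the p-norm to stay self-contained; the list of values of `p` is an arbitrary choice.

```python
def vector_norm(vec, p):
    """p-norm"""
    return sum(abs(v)**p for v in vec) ** (1 / p)

vec = [-4.0, 3.0]
max_abs = max(abs(v) for v in vec)  # the infinity norm

for p in [1, 2, 5, 10, 20, 50, 100]:
    nrm = vector_norm(vec, p)
    print(f"p={p:3d}: norm={nrm:.10f}, difference from max={nrm - max_abs:.3e}")
```

The difference shrinks as $p$ grows. Note that for this straightforward formula very large values of `p` may overflow the floating point range, so in practice the maximum is computed directly.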

### Distances

Assume that John stands at coordinates given by a vector $\vec a$ and Ann is at location $\vec b$. Find the distance between them.

To find the distance we subtract one vector from the other and then compute a norm of the result.

If both persons are in a flat open area, say in a corn field, the Euclidean norm is appropriate: 
$$
L^2=|\vec a - \vec b|_2 = \sqrt{(a_0-b_0)^2 + (a_1-b_1)^2}
$$
This is the length of the straight path, the shortest possible route between them.

In a city a straight Euclidean path is usually not available. One has to follow a rectangular mesh of streets. Then the taxicab norm is suitable:
$$
L^1=|\vec a - \vec b|_1 = |a_0 - b_0| + |a_1 - b_1|
$$

So we see that the distance can be computed in different ways. But what is the worst case? It can be estimated with the help of the maximum norm:
$$
L^{\infty}=|\vec a - \vec b|_{\infty} = \textrm{max}(|a_0 - b_0|, |a_1 - b_1|)
$$
The worst distance estimate is $N L^{\infty}$, where $N$ is the space dimension, i.e. 2 in our example: neither $L^1$ nor $L^2$ can exceed this value.
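
The three distances and the worst-case estimate are related by the chain $L^{\infty} \leq L^2 \leq L^1 \leq N L^{\infty}$. Here is a small sketch that checks it for one pair of locations. The coordinates of John and Ann are made up for the illustration.

```python
# Hypothetical locations of John and Ann
a = [1.0, 5.0]
b = [4.0, 1.0]

diff = [ai - bi for ai, bi in zip(a, b)]

L1 = sum(abs(d) for d in diff)       # taxicab distance
L2 = sum(d**2 for d in diff) ** 0.5  # Euclidean distance
Linf = max(abs(d) for d in diff)     # maximum norm distance
N = len(diff)                        # space dimension

print(f"Linf={Linf}, L2={L2}, L1={L1}, N*Linf={N * Linf}")
print(f"Chain holds: {Linf <= L2 <= L1 <= N * Linf}")
```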

### Vectors and distances for data science

In data science vectors are used to store descriptions of entities: a person, a flower, a picture, a text and so on. Components of a vector are the features of the entity. For example, a vector representing a person in a medical database can contain age, height and weight: 
$$
\vec a = (25, 180, 65)
$$
$$
\vec b = (28, 170, 84)
$$

Given these vectors we can check how similar these two persons are. For this we compute the distance between them.

In [None]:
a = [25, 180, 65]
b = [28, 170, 84]

L1 = vector_norm1(vector_sub(a, b))
L2 = vector_norm2(vector_sub(a, b))
Linf = vector_norminf(vector_sub(a, b))

print(f"L1   = {L1}")
print(f"L2   = {L2}")
print(f"Linf = {Linf}")

We have three estimates of the similarity. Which one is correct? There is no unambiguous answer. It depends on the problem we solve. 

It is possible that none of them is appropriate. It means that our representation is not adequate and we need to preprocess the feature vectors somehow before doing the comparison. 

Here is another case where we need the distance between vectors. Assume we have a model that takes a feature vector as an input, processes it somehow and returns an answer vector. This vector is often called a prediction because the model tries to predict some properties of the input entity. 

When we tune our model we feed it with entities for which the true answer is known in advance. Computing the distance between the prediction and the true answer we can say how good our model is and also how it must be changed to make it better.

### Scaling of vectors: multiplication by a scalar

Vectors can be multiplied by a scalar. This is a componentwise operation:
$$
a \vec v = a (x_0, x_1) = (a x_0, a x_1)
$$

No matter what norm is used, when a vector is multiplied by a scalar $a$ its length is multiplied by $|a|$:
$$
|a \vec v|_p=\left(\sum_{i=0}^{N-1} |a x_i|^p \right)^{1/p} =
\left( |a|^p \right)^{1/p} \left(\sum_{i=0}^{N-1} |x_i|^p \right)^{1/p} =
|a| \left(\sum_{i=0}^{N-1} |x_i|^p \right)^{1/p}
$$

Let us create a function for scaling of vectors

In [None]:
def vector_scl(a, vec):
    """
    Multiply vector by scalar
    """
    return [a*v for v in vec]

Now an illustration

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

vec = [2.0, 3.0]
a1 = 1.5
a2 = 0.5

vec_scl_1 = vector_scl(a1, vec)
vec_scl_2 = vector_scl(a2, vec)

draw_vector(ax, r'$\vec v$', 'C0', vec, orig=[0, 0])

# We shift the rescaled vectors to improve visibility
draw_vector(ax, r'$a_1 \vec v$', 'C1', vec_scl_1, orig=[1, 0])
draw_vector(ax, r'$a_2 \vec v$', 'C3', vec_scl_2, orig=[2, 0])

ax.grid();

### Vector normalization 

Each vector has a length and a direction. When we want to know its length we compute a vector norm. 

To get only the direction we need to normalize the vector, that is, to divide it by its norm. 

The result is a so-called unit vector whose length with respect to the chosen norm equals one. Thus the only essential information that this vector carries is its direction.

Here is the function that normalizes a vector. Observe that we pass the function for the norm computation as a parameter.

In [None]:
def vector_normalize(vec, norm):
    """
    Normalization of vector
    
    norm : function
        This is a function employed to find a vector norm
    """
    nrm = norm(vec)
    return vector_scl(1/nrm, vec)

Consider an illustration. Notice that we shift normalized vectors from the origin to avoid overlapping.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

vec = [2.0, 5.0]

vec_unit_1 = vector_normalize(vec, vector_norm1)
vec_unit_2 = vector_normalize(vec, vector_norm2)
vec_unit_inf = vector_normalize(vec, vector_norminf)

draw_vector(ax, r'$\vec v$', 'C0', vec, orig=[0, 0])

# We shift the rescaled vectors to improve visibility
draw_vector(ax, '1', 'C1', vec_unit_1, orig=[1, 0])
draw_vector(ax, '2', 'C2', vec_unit_2, orig=[2, 0])
draw_vector(ax, r'$\infty$', 'C3', vec_unit_inf, orig=[3, 0])

ax.grid();

We see that all normalized vectors are parallel to each other and to the original one. 

Once more: normalization preserves the vector direction and drops the information about its length (with respect to the chosen norm).
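
A quick numerical check of this statement: after normalization the norm that was used for it should return one. The sketch below is self-contained, so the three norms and the normalization are redefined in a compact form under names chosen for this example.

```python
def norm1(vec):
    return sum(abs(v) for v in vec)

def norm2(vec):
    return sum(v**2 for v in vec) ** 0.5

def norminf(vec):
    return max(abs(v) for v in vec)

def normalize(vec, norm):
    """Divide a vector by its norm"""
    nrm = norm(vec)
    return [v / nrm for v in vec]

vec = [2.0, 5.0]

for name, norm in [("L1", norm1), ("L2", norm2), ("Linf", norminf)]:
    unit = normalize(vec, norm)
    print(f"{name}: unit vector {unit}, its norm {norm(unit)}")
```

Note that a zero vector cannot be normalized this way: its norm is zero and the division fails.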

### Dot product

The dot product is an operation that sums the componentwise products of two vectors:
$$
(\vec x, \vec y) \equiv \vec x \cdot \vec y \equiv  \vec x \vec y =
\sum_{i=0}^{N-1} x_i y_i
$$

The symbol "$\equiv$" means identity. It is used here to show equivalent notations for the dot product.

Observe that the dot product of a vector with itself equals its squared Euclidean norm:
$$
\vec x \vec x = \sum_{i=0}^{N-1} x_i^2 = |\vec x|_2^2
$$

Using the dot product we can compute angles between vectors due to the following property:
$$
\vec x \vec y = |\vec x|_2 |\vec y|_2 \cos(\alpha)
$$
where $|\vec x|_2$ and $|\vec y|_2$ are Euclidean norms and $\alpha$ is the angle between $\vec x$ and $\vec y$.

Since $\cos 90^\circ=\cos (\pi/2)=0$ the dot product of two orthogonal vectors is always zero.

The cosine of the angle between two vectors can be used as a measure of their similarity. This is called the cosine similarity. 

The cosine similarity is the highest and equals 1 when the angle between two vectors is zero. For nonzero angles the similarity is less than 1.

Here is the function that computes the dot product:

In [None]:
def vector_dot(vec1, vec2):
    """
    Dot product of two vectors
    """
    
    assert len(vec1) == len(vec2)
    
    tmp = [v1 * v2 for v1, v2 in zip(vec1, vec2)]
    return sum(tmp)
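
The cosine similarity mentioned above follows directly from the dot product and the Euclidean norms. Below is a self-contained sketch; the function name `cosine_similarity` and the test vectors are chosen for this example, and both vectors are assumed to be nonzero.

```python
def cosine_similarity(vec1, vec2):
    """
    Cosine of the angle between two nonzero vectors
    """
    assert len(vec1) == len(vec2)

    dot = sum(v1 * v2 for v1, v2 in zip(vec1, vec2))
    nrm1 = sum(v**2 for v in vec1) ** 0.5
    nrm2 = sum(v**2 for v in vec2) ** 0.5
    return dot / (nrm1 * nrm2)

print(cosine_similarity([2.0, 4.0], [1.0, 2.0]))    # same direction
print(cosine_similarity([2.0, 4.0], [2.0, -1.0]))   # orthogonal vectors
print(cosine_similarity([2.0, 4.0], [-2.0, -4.0]))  # opposite directions
```

Because of rounding errors the first and the last values may differ slightly from 1 and -1.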

This is an illustration that the dot product of two orthogonal vectors is zero.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-1.5, 4])
ax.set_ylim([-1.5, 4])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

vec1 = [2.0, 4.0]
vec2 = [2.0, -1.0]

dot = vector_dot(vec1, vec2)
print(f"(vec1, vec2) = {dot}")

draw_vector(ax, r'$\vec v_1$', 'C0', vec1, orig=[0, 0])
draw_vector(ax, r'$\vec v_2$', 'C1', vec2, orig=[0, 0])


ax.grid();

### Exercises

1\. Given the set of linear equations
$$ 2 x_1 + 2 x_3 = -2 $$
$$ 2 x_1 - 2 x_3 =  6 $$
$$ 3 x_2 - x_1 = -1 $$
derive formulas for $x_1$, $x_2$ and $x_3$ (not necessarily in this order) and write a Python program to compute the solution.

2\. Create a function that computes the angle between two vectors. After computing the cosine of the angle use the inverse trigonometric function arc cosine to recover the angle. You will find this function in either the `math` or the `numpy` module: `math.acos` or `numpy.arccos`, respectively.

3\. A unit vector directed along a coordinate axis contains all zeros except a single 1 at a certain position. For example in a three-dimensional Cartesian coordinate system these vectors are $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. 

Create a function that accepts two parameters: the space dimension `dim` and the order number of a vector `vec_num`.  This function has to return the corresponding unit vector. For example for `dim=5` and `vec_num=2` it has to be $(0, 1, 0, 0, 0)$.

4\. Draw these three vectors 
$$
  \vec v_1 = (0.98480775, 0.17364818)
$$

$$
  \vec v_2 = (-0.64278761,  0.76604444)
$$

$$
  \vec v_3 = (-0.34202014, -0.93969262)
$$

to demonstrate that their sum is zero. Use function `draw_vector` defined 
above or create your own better version if you like.

5\. Draw these two vectors
$$
  \vec v_1 = (0.96592583, 0.25881905)
$$

$$
  \vec v_2  = (-0.38822857,  1.44888874)
$$

to show that they are orthogonal. Prove their orthogonality using computations. Use function `draw_vector` defined above or create your own better version if you like.

## Lesson 2

### Matrices

Recall a set of linear equations:
$$
\begin{pmatrix} 
    a_{00} & a_{01} & \dots & a_{0,N-1} \\
    a_{10} & a_{11} & \dots & a_{1,N-1} \\
    \vdots & \vdots & \ddots & \vdots \\
    a_{N-1,0} & a_{N-1, 1} & \dots & a_{N-1,N-1}
\end{pmatrix}
\begin{pmatrix} 
    x_0 \\ x_1 \\ \vdots \\ x_{N-1} 
\end{pmatrix}=
\begin{pmatrix} 
    b_0 \\ b_1 \\ \vdots \\ b_{N-1} 
\end{pmatrix}
$$

In a compact form it reads
$$
A \vec x = \vec b
$$
where $A$ is an $N\times N$ matrix.

Now we are going to get acquainted with matrices in more detail.

A convenient way of working with matrices is to use NumPy 2D arrays. But here we will consider matrices as native Python lists.

NumPy can do almost everything we need with matrices. But it produces the result at once, and we want to know how it works.

Using native Python lists, a matrix can be written as a list of lists.

A list of lists can be treated as a matrix if
- all of its components have identical types;
- all of its sublists have the same length.
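The two conditions above can be checked in code. The following is only a minimal sketch: the helper name `is_matrix` is hypothetical and is not used elsewhere in this course.

```python
def is_matrix(A):
    """
    Check that a list of lists A can be treated as a matrix:
    all rows have the same length and all components have the same type
    """
    # A must be a non-empty list whose items are lists
    if not A or not all(isinstance(row, list) for row in A):
        return False
    ncols = len(A[0])
    same_length = all(len(row) == ncols for row in A)
    # Collect the types of all components; a matrix has at most one
    types = {type(x) for row in A for x in row}
    return same_length and len(types) <= 1


print(is_matrix([[1.2, 2.1], [-3.6, 9.1]]))    # True
print(is_matrix([[1.2, 2.1], [-3.6]]))         # False: rows of different lengths
print(is_matrix([[1.2, 2.1], [-3.6, "9.1"]]))  # False: mixed types
```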

Here is a list of lists representing a $2\times 2$ matrix of floats:
```python
A = [[1.2, 2.1], [-3.6, 9.1]]
```

### Matrix main diagonal and the trace

The collection of matrix components with equal row and column indexes is called the _main diagonal_ of the matrix (also a major diagonal, a leading diagonal, or a principal diagonal).

For the $N\times N$ matrix $A$ above the main diagonal is $( a_{00}, a_{11}, \ldots, a_{N-1,N-1})$.

The sum of the components along the main diagonal is the matrix _trace_, denoted $\textrm{Tr}$:
$$
\textrm{Tr} A = \sum_{i=0}^{N-1} a_{i,i}
$$

Let
```python
A = [[3.4, -4.1, 3.7], [-0.1, 2.2, -9.8], [6.0, -4.5, -5.3]]
```

Its main diagonal is
```python
[3.4, 2.2, -5.3]
```
and its trace is
```python
TrA = 3.4 + 2.2 - 5.3  # approximately 0.3
```
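As a quick check of this example, the main diagonal can be collected with a list comprehension. This is only a sketch: the names `diag` and `A_np` are chosen for illustration, and NumPy's `np.trace` is used here merely to cross-check the trace computed by hand.

```python
import numpy as np

A = [[3.4, -4.1, 3.7], [-0.1, 2.2, -9.8], [6.0, -4.5, -5.3]]

# Components with equal row and column indexes
diag = [A[i][i] for i in range(len(A))]
print(diag)  # [3.4, 2.2, -5.3]

# Cross-check of the trace with NumPy
A_np = np.array(A)
print(np.trace(A_np))  # approximately 0.3, up to floating-point rounding
```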

### Row-major and column-major forms

When we write a list of lists we do it _linearly_, number by number, separated by commas and square brackets. But a matrix is two-dimensional by its nature, and we can go either along its rows or along its columns.

In other words, there are two options: to write a matrix row by row or column by column.

We will always assume that a matrix is written row by row. This is called the row-major form.

An example:
$$
A = \begin{pmatrix}
    2.1 &  3.2 &  1.7 \\ 
    4.1 & -1.9 &  2.2 \\ 
    2.9 & -1.1 & -1.7 
\end{pmatrix}
$$
This matrix in a row-major form reads:

```python
Ar = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]
```

Also a matrix can be written column by column. This is called a column-major form.

```python
Ac = [[2.1, 4.1, 2.9], [3.2, -1.9, -1.1], [1.7, 2.2, -1.7]]
```

We will always use the row-major form since it is more natural for a human reader. Also, when a NumPy array is created from a list of lists, the sublists are treated as rows, so the row-major form is what NumPy expects.
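The two forms are related by swapping rows and columns. As a small sketch (the name `Ar_from_Ac` is introduced here only for illustration), a column-major list can be turned into a row-major one with the built-in `zip`:

```python
Ar = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]  # row-major
Ac = [[2.1, 4.1, 2.9], [3.2, -1.9, -1.1], [1.7, 2.2, -1.7]]  # column-major

# zip(*Ac) groups together the i-th components of every column,
# and these groups are exactly the rows of the matrix
Ar_from_Ac = [list(row) for row in zip(*Ac)]

print(Ar_from_Ac == Ar)  # True
```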

### Getting matrix rows

It is very easy to extract a row from a row-major matrix using list indexing:

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]

print(f"row 0: {A[0]}")
print(f"row 1: {A[1]}")
print(f"row 2: {A[2]}")
```

We can even iterate over rows:

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]

for row in A:
    print(row)
```

To get the row number as well, we use `enumerate`:

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]

for i, row in enumerate(A):
    print(f"row {i}: {row}")
```

Using iteration over matrix rows one can write a function for matrix printing:

```python
def matrix_print(A):
    """
    Printing a matrix A specified as a list of lists
    """
    for row in A:
        for x in row:
            # Fixed-point notation: width 7, two digits after the decimal point
            print(f"{x:7.2f}", end="")
        print()  # end of the row
```

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]
matrix_print(A)
```

### Getting matrix columns

Extracting a column from a row-major matrix requires some effort.

This matrix will be our test example:
$$
A = \begin{pmatrix}
    2.1 &  3.2 &  1.7 \\ 
    4.1 & -1.9 &  2.2 \\ 
    2.9 & -1.1 & -1.7 
\end{pmatrix}
$$

To get a column we need to iterate over rows and extract a required component from each one:

```python
def matrix_getcol(A, n):
    """
    Returns n-th column of matrix A
    """
    return [row[n] for row in A]
```

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]

print(f"col 0: {matrix_getcol(A, 0)}")
print(f"col 1: {matrix_getcol(A, 1)}")
print(f"col 2: {matrix_getcol(A, 2)}")
```

### Matrix transposition

A matrix can be flipped over its main diagonal. This operation is called transposition.

```python
def transp(A):
    """
    Matrix transposition
    """
    ncols = len(A[0])
    B = []
    # Each column of A becomes a row of B
    for n in range(ncols):
        col = matrix_getcol(A, n)
        B.append(col)
    return B
```

```python
A = [[2.1, 3.2, 1.7], [4.1, -1.9, 2.2], [2.9, -1.1, -1.7]]
B = transp(A)

matrix_print(A)
print()
matrix_print(B)
```

Observe that after the transposition columns become rows and vice versa.
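In index notation this can be written down explicitly. The transposed matrix is commonly denoted $A^T$ (this notation is introduced here only for illustration); its components are those of $A$ with the row and column indexes swapped:
$$
\left(A^T\right)_{ij} = a_{ji}
$$

For the test matrix used in this section this gives
$$
A = \begin{pmatrix}
    2.1 &  3.2 &  1.7 \\ 
    4.1 & -1.9 &  2.2 \\ 
    2.9 & -1.1 & -1.7 
\end{pmatrix},
\qquad
A^T = \begin{pmatrix}
    2.1 &  4.1 &  2.9 \\ 
    3.2 & -1.9 & -1.1 \\ 
    1.7 &  2.2 & -1.7 
\end{pmatrix}
$$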

### Matrix - vector multiplication

Linear equation again:
$$
A \vec x = \vec y
$$
Another point of view on this equation: given a vector $\vec x$, we apply a matrix $A$ to obtain a new vector $\vec y$.

Matrices can be considered as tools for vector transformations: applying a matrix to a vector results in a new vector that can have a different length and direction.

Let us recall how we apply a matrix to a vector. This is called matrix - vector multiplication:
$$
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}=
\begin{pmatrix} a_{00} x_0 + a_{01} x_1 \\ a_{10} x_0 + a_{11} x_1 \end{pmatrix}=
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix}
$$
- Take the first row of the matrix and the whole vector
- Multiply corresponding components: first by first, second by second and so on
- Sum the results of the multiplications. This is the first resulting component
- Do the same with the second and other rows

Recall the dot product: $\vec a\vec b=a_0 b_0 + a_1 b_1$. In terms of the dot product, the matrix - vector multiplication reads:
- Take the first row of the matrix and the whole vector
- Find their dot product - this is the first component of the result
- Do the same with the second and other rows
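The steps listed above can be sketched with explicit loops. The function name `matvec_loops` is hypothetical and is used only in this illustration; the sketch assumes that every row of `A` has the same length as `vec`.

```python
def matvec_loops(A, vec):
    """
    Matrix - vector multiplication written step by step
    """
    result = []
    for row in A:
        # Dot product of the current row and the vector:
        # multiply corresponding components and sum the products
        s = 0
        for a, x in zip(row, vec):
            s += a * x
        result.append(s)
    return result


A = [[1, 2], [3, 4]]
x = [5, 6]
print(matvec_loops(A, x))  # [17, 39]
```

Here the first component is $1 \cdot 5 + 2 \cdot 6 = 17$ and the second one is $3 \cdot 5 + 4 \cdot 6 = 39$.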

Strictly speaking, we must multiply a matrix by a column vector:
```python
vec = [[2], [-3], [4]]
```
But, as mentioned above, in computations we will ignore this for simplicity and take a row vector instead:
```python
vec = [2, -3, 4]
```

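Here is a function that implements matrix - vector multiplication. It uses the dot product function `vector_dot`, which has to be defined beforehand:

```python
def matrix_vecmult(A, vec):
    """
    Matrix - vector multiplication
    """
    # Each component of the result is the dot product of a row and the vector
    return [vector_dot(row, vec) for row in A]
```

```python
A = [[-30, 10], [40, -20]]
x = [2, 3]
```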

Let us first compute the result by hand:

```python
y = [0, 0]  # here we reserve a space for y

# Dot product of the first row and the vector
# [-30, 10] . [2, 3]
y[0] = -30 * 2 + 10 * 3

# Second row and the vector
# [40, -20] . [2, 3]
y[1] = 40 * 2 - 20 * 3

print(f"by hand: y={y}")
```

Now use the function and compare the results:

```python
yf = matrix_vecmult(A, x)
print(f"using function: y={yf}")
```

When a vector is multiplied by a matrix, a new vector appears that in general has a different length and direction. Here is an illustration:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5,5))

ax.set_xlim([-0.5, 5])
ax.set_ylim([-0.5, 5])

ax.set_xlabel('$x_0$')
ax.set_ylabel('$x_1$')

A = [[-6.8, 3.2], [-2.1, 0.9]]
x = [1.7, 4.5]

y = matrix_vecmult(A, x)

# draw_vector is the helper function defined earlier
draw_vector(ax, r'$x$', 'C0', x, orig=[0, 0])
draw_vector(ax, r'$y$', 'C1', y, orig=[0, 0])

ax.grid();
```

### Vector - matrix multiplication (multiplication from the left)

A row vector multiplies a matrix from the left:
$$
\begin{pmatrix} x_0 & x_1 \end{pmatrix}
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}=
\begin{pmatrix} a_{00} x_0 + a_{10} x_1 & a_{01} x_0 + a_{11} x_1 \end{pmatrix}=
\begin{pmatrix} y_0 & y_1 \end{pmatrix}
$$
It obeys the same rule: row by column. Now there is one row and multiple columns.
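A quick numerical check of this rule can be sketched with NumPy (the numbers are arbitrary and chosen only for illustration). Multiplying a row vector by a matrix from the left gives the same components as multiplying the transposed matrix by the vector.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([5, 6])

# Row vector times matrix: each component is x dotted with a column of A
left = x @ A
print(left)  # [23 34]

# The same numbers are obtained from the transposed matrix
right = A.T @ x
print(right)  # [23 34]
```

The first component is $1 \cdot 5 + 3 \cdot 6 = 23$ and the second one is $2 \cdot 5 + 4 \cdot 6 = 34$, in agreement with the formula above.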

### Rectangular matrices

All matrices above were square: equal numbers of rows and columns. But matrices can also be rectangular.

A rectangular matrix changes the vector dimension (size):

$$
\begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2\end{pmatrix}=
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix}
$$

Our function for multiplication still works:

```python
A = [[2.0, 1.0, -5.0], [4.0, 3.0, 1.0]]
x = [-1.0, 2.0, 2.0]
y = matrix_vecmult(A, x)
print(y)
```
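For reference, the same computation can be written out by hand, row by row:
$$
\begin{pmatrix} 2 & 1 & -5 \\ 4 & 3 & 1 \end{pmatrix}
\begin{pmatrix} -1 \\ 2 \\ 2 \end{pmatrix}=
\begin{pmatrix} 2 \cdot (-1) + 1 \cdot 2 + (-5) \cdot 2 \\ 4 \cdot (-1) + 3 \cdot 2 + 1 \cdot 2 \end{pmatrix}=
\begin{pmatrix} -10 \\ 4 \end{pmatrix}
$$
A three-dimensional vector is transformed into a two-dimensional one.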

### Matrix - matrix multiplication

A column vector technically is a rectangular matrix with one column. We can append some more columns to it and consider a matrix - matrix multiplication:
$$
A B =
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}
\begin{pmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \end{pmatrix}=
\begin{pmatrix} 
    a_{00} b_{00} + a_{01} b_{10} & a_{00} b_{01} + a_{01} b_{11} \\ 
    a_{10} b_{00} + a_{11} b_{10} & a_{10} b_{01} + a_{11} b_{11} 
\end{pmatrix}
$$

The order of matrix - matrix multiplication is important! In general,
$$
A B \neq B A
$$
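The following sketch illustrates this with NumPy. The two matrices are arbitrary and chosen only so that the products are easy to verify by hand.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

AB = A @ B
BA = B @ A

print(AB)  # rows [2 1] and [4 3]: the columns of A are swapped
print(BA)  # rows [3 4] and [1 2]: the rows of A are swapped
print(np.array_equal(AB, BA))  # False
```

Some pairs of matrices do give the same product in either order, so the inequality should be read as "not equal in general".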

### Diagonal matrix

A special form of matrix is a diagonal matrix. All its elements are zero except those on the main diagonal:
$$
D = 
\begin{pmatrix}
d_{0} & 0 & 0 \\
0 & d_{1} & 0 \\
0 & 0 & d_{2}
\end{pmatrix}
$$

A diagonal matrix can be created from a list of its diagonal components:

```python
def matrix_diag(diag):
    """
    Creates diagonal matrix from a list diag
    """
    size = len(diag)
    D = []
    for n, d in enumerate(diag):
        # A row of zeros with the n-th diagonal component at position n
        row = [0.0] * size
        row[n] = d
        D.append(row)

    return D
```

```python
diag = [2.3, 5.4, 6.2]
D = matrix_diag(diag)
matrix_print(D)
```

### Identity matrix

A special case of a diagonal matrix is the identity matrix. This matrix contains ones along the main diagonal and zeros everywhere else.
$$
I = 
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
$$

This matrix behaves like the number 1 in ordinary arithmetic: multiplying another matrix by it, from either side, leaves that matrix unchanged.
$$
A I = 
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} 
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}=
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} 
$$

$$
I A = 
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}=
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} 
$$
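Both equalities can be verified numerically. This is a minimal NumPy sketch with an arbitrarily chosen matrix; `np.eye` creates an identity matrix of the given size.

```python
import numpy as np

A = np.array([[1.5, -2.0], [0.5, 4.0]])
I = np.eye(2)  # 2x2 identity matrix

print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True

# A vector is not changed by the identity matrix either
x = np.array([3.0, -7.0])
print(np.array_equal(I @ x, x))  # True
```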


### Matrix addition and subtraction

Matrices of identical sizes can be added or subtracted componentwise:
$$
A + B =
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} +
\begin{pmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \end{pmatrix}=
\begin{pmatrix} 
    a_{00} + b_{00} & a_{01} + b_{01} \\ 
    a_{10} + b_{10} & a_{11} + b_{11}
\end{pmatrix}
$$

$$
A - B =
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} -
\begin{pmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \end{pmatrix}=
\begin{pmatrix} 
    a_{00} - b_{00} & a_{01} - b_{01} \\ 
    a_{10} - b_{10} & a_{11} - b_{11}
\end{pmatrix}
$$

The order of matrix addition is not important:
$$
A + B = B + A
$$

### Zero matrix

A matrix of zeros plays the role of 0: adding it to another matrix or subtracting it from that matrix does not change the matrix:

$$
A + Z =
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} +
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}=
\begin{pmatrix} 
    a_{00} & a_{01} \\ 
    a_{10} & a_{11}
\end{pmatrix}
$$


### Matrix multiplication by a scalar

Matrix multiplication by a scalar is a componentwise operation:
$$
c A =
c \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix} =
\begin{pmatrix} c a_{00} & c a_{01} \\ c a_{10} & c a_{11} \end{pmatrix}
$$

Order is not important:
$$
cA =Ac
$$

### Vectors and matrices in NumPy

Matrix inversion, identity matrix, zeros, addition, multiplication

```python
import numpy as np

A = np.array([[1.2, 2.3, -3.1], [4.3, -9.1, 2.1], [3.1, 3.7, -4.1]])
B = np.array([[4.3, -1.2, 0.5], [-0.2, 4.5, -6.7], [0.9, -0.9, 1.3]])
C = np.array([[-2.3, 6.5], [0.5, 4.5], [1.7, 3.8]])
x = np.array([-3.3, 1.2, -6.1])
y = np.array([3.1, -4.2, 2.1])
```

```python
# Add two vectors
print(x + y)
```

```python
# Vector norms
print(np.linalg.norm(x))          # 2-norm (Euclidean)
print(np.linalg.norm(x, 2))       # also 2-norm
print(np.linalg.norm(x, 1))       # 1-norm
print(np.linalg.norm(x, np.inf))  # inf-norm
```

```python
# Distance between two points
print(np.linalg.norm(x - y)) # Euclidean distance (2-norm)
```

```python
# Multiplication by a scalar
x1 = 2 * x
print(x1)
print(np.linalg.norm(x))
print(np.linalg.norm(x1))
```

```python
# Normalization
x2 = x / np.linalg.norm(x)
print(x2)
print(np.linalg.norm(x2))
```

```python
# Dot product
print(x @ y)
```

```python
# Extracting matrix rows and columns
print(A)
print()
print(f"row 1: {A[1, :]}")
print(f"col 0: {A[:, 0]}")
```

```python
# Transposition
print(A)
print()
print(A.T)
```

```python
# Matrix - vector multiplication
print(A @ x)
```

```python
# Vector - matrix multiplication (multiplication from the left)
print(x @ A)
```

```python
# Matrix - matrix multiplication
print(A)
print()
print(C)
print()
print(A @ C)
```

```python
# Diagonal matrix
print(np.diag(x))
```

```python
# Identity matrix 5x5
print(np.eye(5))
```

```python
# Matrix addition and subtraction
print(A)
print()
print(B)
print()
print(A + B)
print()
print(A - B)
```

```python
# Zero matrix 4x4. Observe that the size is specified as a tuple
print(np.zeros((4, 4)))
```

```python
# Multiplication by a scalar
print(A)
print()
print(10 * A)
```
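Matrix inversion, listed among the topics of this subsection, can be tried in the same way. The sketch below assumes that the matrix is invertible, which holds for the matrix `A` used here; the names `A_inv`, `b` and `sol` are introduced only for illustration. It also solves a set of linear equations $A \vec x = \vec b$ with `np.linalg.solve`.

```python
import numpy as np

A = np.array([[1.2, 2.3, -3.1], [4.3, -9.1, 2.1], [3.1, 3.7, -4.1]])

# Inverse matrix: A @ A_inv is the identity matrix up to rounding errors
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(3)))  # True

# Solving the set of linear equations A x = b
b = np.array([1.0, 2.0, 3.0])  # arbitrary right-hand side
sol = np.linalg.solve(A, b)
print(np.allclose(A @ sol, b))  # True
```

The comparison is done with `np.allclose` rather than exact equality because floating-point computations introduce small rounding errors.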

### Tensors

A vector is a one-dimensional array: one index is required to get a component.

```python
import numpy as np

# Vector
v = np.array([3.4, 3.2, 5.6])
print(v[1])
```

A matrix is two-dimensional since two indexes are needed:

```python
m = np.array([[3.2, -1.2], [4.3, -9.1]])
print(m[0, 1])
```

Higher-dimensional arrays are called tensors. In general, all array-like objects are tensors. The number of dimensions of a tensor is called its order, degree or rank.

A vector is a tensor of order 1. A matrix is a tensor of order 2. Here is an example of a tensor of order 4.

```python
d = np.arange(600).reshape(10, 5, 4, 3)
print(d[0,1,2,1])
```
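The order and the sizes of a NumPy array are available as its attributes `ndim` and `shape`. The sketch below also shows where the printed value comes from: with the default row-major layout the component of this particular tensor equals its position in the flat sequence `0, 1, ..., 599`.

```python
import numpy as np

d = np.arange(600).reshape(10, 5, 4, 3)

print(d.ndim)   # 4: number of indexes needed to get a component
print(d.shape)  # (10, 5, 4, 3): size along each dimension

# One step along each index skips 5*4*3 = 60, 4*3 = 12, 3 and 1 components
print(d[0, 1, 2, 1])                 # 19
print(0 * 60 + 1 * 12 + 2 * 3 + 1)   # 19
```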

Tensors are used to feed data into data processing models.

Shapes of data tensors:
- Order 2: Each entity is described by a one-dimensional vector. A batch of such vectors is an order 2 tensor.
- Order 3: Each entity is a black and white image described by a matrix. A batch of images is an order 3 tensor.
- Order 4: Each entity is a color image described by a three-dimensional array. A batch of color images is an order 4 tensor.
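The three cases above can be sketched with empty NumPy arrays. All sizes here are hypothetical and chosen only for illustration: a batch of 32 entities, 10 features per entity, images of 28 by 28 pixels, and 3 color channels.

```python
import numpy as np

batch_size = 32  # hypothetical number of entities in a batch

vectors = np.zeros((batch_size, 10))       # 10 features per entity
gray = np.zeros((batch_size, 28, 28))      # 28x28 black and white images
color = np.zeros((batch_size, 28, 28, 3))  # 28x28 images, 3 color channels

for name, t in [("vectors", vectors), ("gray", gray), ("color", color)]:
    print(f"{name}: order {t.ndim}, shape {t.shape}")
```

In each case the first index selects an entity within the batch, and the remaining indexes address its description.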

### Vectors and matrices for data science

As already mentioned above, in data science vectors are used to store entity descriptions. Vector components are entity features.

In general, the purpose of data science is to process data vectors in order to draw conclusions.

Matrices modify vectors. Thus they are the main tool for the analysis of data vectors.

Matrices are responsible for linear transformations. But in most cases this is not enough, and additional nonlinear transformations are needed.
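The difference can be checked with the linearity condition $f(x+y) = f(x) + f(y)$ from the beginning of this section. In the sketch below the matrix and the vectors are arbitrary, and the componentwise `np.tanh` is just one possible example of a nonlinear transformation, chosen here only for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])

# A matrix is a linear transformation: f(x + y) = f(x) + f(y)
print(np.allclose(A @ (x + y), A @ x + A @ y))  # True


def f(v):
    """Matrix multiplication followed by a componentwise nonlinear function"""
    return np.tanh(A @ v)


# With the nonlinear function the condition no longer holds
print(np.allclose(f(x + y), f(x) + f(y)))  # False
```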

### Exercises

1\. Create a function that computes a matrix trace: $\textrm{Tr} A$.

2\. Create functions that perform matrix - matrix addition and subtraction: $C = A \pm B$.

3\. Create a function that performs left vector - matrix multiplication: $\vec x A = \vec y$.

4\. Create a function that performs multiplication of a matrix by a scalar: $B = c A$.

5\. Create a function that performs matrix - matrix multiplication: $C = A B$.

6\. Create a function that generates an identity matrix of a given size. You may not solve the problem by simply calling the function `matrix_diag` defined above.

7\. Create functions that perform left and right multiplication of a matrix by a diagonal matrix.

8\. Create a function that generates a zero matrix. You may not solve the problem by simply calling the function `matrix_diag` defined above.