In [1]:
import matplotlib.pyplot as plt

# Linear Algebra

Linear algebra is the study of vectors and the rules for manipulating them. Vectors can be added together and multiplied by scalars to produce another vector of the same kind.

Scalar: a value with only a magnitude; it is not a vector.

#### Vector Examples

##### Geometric Vectors
These are directed segments and can be drawn in at least two dimensions. They are usually denoted with
an arrow e.g. $\overrightarrow{x}$. Two geometric vectors can be added to make a new vector
e.g. $\overrightarrow{x} + \overrightarrow{y} =  \overrightarrow{z}$. They can also be multiplied by a
scalar: $\lambda\overrightarrow{x}$, with $\lambda\in{\rm I\!R}$, is also a geometric vector, namely the original vector
scaled by $\lambda$. Interpreting vectors as geometric objects lets us reason about direction and magnitude.

![image.png](attachment:e4142754-1191-47b3-95bd-3a93adc9159e.png)

_Geometric Vectors_

##### Elements of ${\rm I\!R}^n$
Elements of ${\rm I\!R}^n$ (tuples of real numbers) are vectors too e.g.
\begin{equation}
a = \begin{bmatrix}
1\\
2\\
3
\end{bmatrix}
\in {\rm I\!R}^3
\end{equation}
Here $a$ is a triplet of real numbers. Adding two vectors $a,b \in{\rm I\!R}^n$ component-wise results in a new
vector $a+b=c \in {\rm I\!R}^n$. Multiplying $a \in {\rm I\!R}^n$ by a scalar $\lambda\in{\rm I\!R}$ results in a scaled vector
$\lambda a \in {\rm I\!R}^n$.
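
As a small illustration of component-wise addition and scaling, here is a minimal NumPy sketch. The vector `b` and the scalar `lam` are example values chosen for this sketch and are not part of the notes above.

```python
import numpy as np

# Example vectors in R^3; a matches the triplet above, b is an assumed example
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
lam = 2.0  # an assumed scalar in R

c = a + b         # component-wise addition, again a vector in R^3
scaled = lam * a  # every component of a is multiplied by lam

print(c)       # [5. 7. 9.]
print(scaled)  # [2. 4. 6.]
```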

Many of the underlying properties used in ML come from the concept of a "vector space". Informally, a vector space is the set of vectors that can be produced by starting with a small set of vectors and repeatedly adding them to each other and scaling them.

## Systems of Linear Equations

- A system of linear equations is a collection of one or more linear equations involving the same set of variables
- If a system has two linear equations with variables $x_1$, $x_2$, each equation defines a line on the $x_1 x_2$-plane.
    - A solution must satisfy all equations simultaneously, so the solution set is the intersection of these lines
    - The intersection can be a line (both equations describe the same line), a point, or empty (the lines are parallel)

![image.png](attachment:819b6dad-bef9-4d72-a39c-170fb9e8ca86.png)

_The solution space of a system of two linear equations with two variables can be geometrically interpreted as the intersection of two lines. Every linear equation represents a line_

For three variables, each linear equation defines a plane in 3D space. The solution set is the intersection of all three planes, i.e. the set of points that satisfy all three equations simultaneously.
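
To make the geometric picture concrete, the following sketch solves a small two-variable system and then detects a pair of parallel lines. Both systems are made-up examples. The rank comparison used for the parallel case is a standard consistency check (Rouché–Capelli) that is not introduced in these notes.

```python
import numpy as np

# Assumed example system with a unique solution:
#   x1 + x2 = 3
#   x1 - x2 = 1
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([3.0, 1.0])
x = np.linalg.solve(A, b)  # the intersection point of the two lines
print(x)  # [2. 1.]

# Assumed example of parallel lines (no intersection):
#   x1 + x2 = 3
#   x1 + x2 = 5
A_par = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
b_par = np.array([3.0, 5.0])

# The system is inconsistent when rank(A) < rank([A | b])
rank_A = np.linalg.matrix_rank(A_par)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A_par, b_par]))
has_solution = rank_A == rank_Ab
print(rank_A, rank_Ab, has_solution)  # 1 2 False
```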

### Matrices

Definition: $m,n \in \mathbb{N}$

\begin{equation}
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots &  & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}\\
\end{bmatrix}
, a_{ij} \in {\rm I\!R}
\end{equation}

$A$ is a matrix ordered in a rectangular scheme consisting of $m$ rows and $n$ columns. By convention:
- $(1, n)$-matrices are called rows
- $(m, 1)$-matrices are called columns

${\rm I\!R}^{m\times n}$ is the set of all real-valued $(m,n)$-matrices. $A \in {\rm I\!R}^{m \times n}$ can also be expressed as $a \in {\rm I\!R}^{mn}$ by stacking all $n$ columns of the matrix into a long vector:

![image.png](attachment:5cacdf63-ea06-40f4-93f2-673675022869.png)

_By stacking its columns, a matrix $A$ can be represented as a long vector $a$_
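
In NumPy, column stacking corresponds to a column-major ("Fortran order") reshape. The matrix below is an arbitrary example used only for this sketch.

```python
import numpy as np

# An assumed 2x3 example matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# order="F" reads the entries column by column, which stacks the columns
a = A.reshape(-1, order="F")
print(a)  # [1 4 2 5 3 6]

# The same ordering recovers the original matrix
A_back = a.reshape(A.shape, order="F")
```

Note that NumPy's default (`order="C"`) would stack rows instead, so the `order` argument matters here.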

#### Matrix Multiplication & Addition

The sum of two matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times n}$ is defined as the elementwise sum:

\begin{equation}
A+B := \begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1n} + b_{1n}  \\
\vdots & & \vdots \\
a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn}  \\
\end{bmatrix} \in \mathbb{R}^{m \times n}
\end{equation}

For matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times k}$, the elements $c_{ij}$ of the product $C=AB \in \mathbb{R}^{m \times k}$ are computed as:

\begin{equation}
c_{ij} = \sum_{l=1}^{n} a_{il}b_{lj},\quad i=1,\dots,m,\quad j=1,\dots,k
\end{equation}

To compute element $c_{ij}$ we multiply the elements of the $i$th row of $A$ with the $j$th column of $B$ and sum them up; this is the dot product of that row and that column. In cases where we need to be explicit that we are performing multiplication, we use the notation $A \cdot B$.

The dot product between two vectors $a$ and $b$ can also be written as $a^{\top} b$ or $\langle a,b\rangle$.
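
The summation formula can be written out directly as three nested loops. This is a sketch for illustration only: the helper name `matmul_from_definition` and the example matrices are assumptions, and in practice NumPy's `@` operator does the same job far more efficiently.

```python
import numpy as np

def matmul_from_definition(A, B):
    """Compute C = AB entry by entry: c_ij = sum_l a_il * b_lj."""
    m, n = A.shape
    n_b, k = B.shape
    if n != n_b:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, k))
    for i in range(m):          # rows of A
        for j in range(k):      # columns of B
            for l in range(n):  # shared inner index
                C[i, j] += A[i, l] * B[l, j]
    return C

# Assumed example matrices: A is 2x3, B is 3x2, so C is 2x2
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

C = matmul_from_definition(A, B)
print(C)  # entries: [[4, 5], [10, 11]]

# Each entry is the dot product of a row of A with a column of B
c_00 = A[0, :] @ B[:, 0]
```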

Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an $n \times k$-matrix $A$ can be multiplied with a $k \times m$-matrix $B$, but only from the left side:

\begin{equation}
\underbrace{A}_{n \times k}\ \underbrace{B}_{k \times m} = \underbrace{C}_{n \times m}
\end{equation}

The product $BA$ is not defined if $m \neq n$, since the neighbouring dimensions do not match.

![Screenshot 2021-08-02 at 11.08.54.png](attachment:ab858c8b-76de-4197-8812-afa3fa6a0f7a.png)
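
The dimension rule can be checked quickly in NumPy. The shapes below ($n=2$, $k=3$, $m=4$) are assumed values for this sketch.

```python
import numpy as np

A = np.ones((2, 3))  # n x k
B = np.ones((3, 4))  # k x m

C = A @ B            # neighbouring dimensions (3 and 3) match
print(C.shape)       # (2, 4)

# B @ A would need the neighbouring dimensions 4 and 2 to match
try:
    B @ A
    ba_defined = True
except ValueError:
    ba_defined = False
print(ba_defined)    # False
```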

**Identity Matrix**

The identity matrix in $\mathbb{R}^{n \times n}$ is defined as the $n \times n$-matrix containing $1$ on the diagonal and $0$ everywhere else:


\begin{equation}
I_n := \begin{bmatrix}
1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & \cdots & 1 \\
\end{bmatrix}
\end{equation}

Now that we have defined matrix multiplication, matrix addition and the identity matrix, let us have a look at some properties of matrices:

![Screenshot 2021-08-02 at 11.38.45.png](attachment:b7b0b07a-f012-4458-85f2-8f61da3908ae.png)

- One of the biggest differences between real number multiplication and matrix multiplication is that matrix multiplication is not commutative. In other words, in matrix multiplication, the order in which two matrices are multiplied matters!
- **Associative**: This property states that you can change the grouping surrounding matrix multiplication. For example, you can multiply matrix A by matrix B, and then multiply the result by matrix C, or you can multiply matrix B by matrix C, and then multiply matrix A by the result: $(AB)C = A(BC)$. When using this property, be sure to pay attention to the order in which the matrices are multiplied, since we know that the commutative property does not hold for matrix multiplication!
- **Distributive**: We can distribute matrices in much the same way we distribute real numbers. If a matrix A is distributed from the left side, be sure that each product in the resulting sum has A on the left! Similarly, if a matrix A is distributed from the right side, be sure that each product in the resulting sum has A on the right!
- **Identity**: The multiplicative identity property states that the product of any $n \times n$ matrix A and $I_n$ is always A, regardless of the order in which the multiplication was performed: $A \cdot I = I \cdot A = A$
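
These properties can be sanity-checked numerically. The sketch below uses small integer matrices (assumed examples) so that the comparisons are exact. A numeric check on one example only illustrates the properties; it does not prove them.

```python
import numpy as np

# Assumed 2x2 integer example matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 3]])
I = np.eye(2, dtype=int)

associative = np.array_equal((A @ B) @ C, A @ (B @ C))
left_distributive = np.array_equal(A @ (B + C), A @ B + A @ C)
right_distributive = np.array_equal((B + C) @ A, B @ A + C @ A)
identity = np.array_equal(A @ I, A) and np.array_equal(I @ A, A)

# AB = [[2, 1], [4, 3]] but BA = [[3, 4], [1, 2]], so the order matters
commutative = np.array_equal(A @ B, B @ A)

print(associative, left_distributive, right_distributive, identity)  # True True True True
print(commutative)  # False
```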

#### Multiplication by a Scalar

- Scalar: $\lambda \in \mathbb{R}$
- Matrix: $A \in \mathbb{R}^{m \times n}$

- $\lambda A = K$
- $K_{ij} = \lambda A_{ij}$

Practically $\lambda$ scales each element of $A$. Associativity and distributivity also hold when multiplying by a scalar.
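
A short sketch of scalar multiplication, with assumed example values for the scalars `lam`, `psi` and the matrices `A`, `B`:

```python
import numpy as np

lam, psi = 2, 3  # assumed example scalars
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

K = lam * A  # K_ij = lam * A_ij
print(K)     # entries: [[2, 4], [6, 8]]

# Associativity: (lam * psi) A = lam (psi A)
assoc_ok = np.array_equal((lam * psi) * A, lam * (psi * A))
# Distributivity: lam (A + B) = lam A + lam B
dist_ok = np.array_equal(lam * (A + B), lam * A + lam * B)
print(assoc_ok, dist_ok)  # True True
```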

### Vector Spaces

#### Group

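A group is a set of elements $G$ together with an operation $\cdot$ defined on these elements that keeps some structure of the set intact. The operation must satisfy:

- **Closure**: For all $a, b$ in $G$ the result $a \cdot b$ is also in $G$
- **Associativity**: For all $a, b, c$ in $G$ one has $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
- **Identity element**: There exists an element $e$ in $G$ such that $a \cdot e = a$ and $e \cdot a = a$ for all $a$ in $G$
- **Inverse element**: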
    - For each $a$ in $G$ there exists an element $b$ in $G$ such that $a \cdot b = e$ and $b \cdot a = e$ where $e$ is the identity element
    - For each $a$, the element $b$ is unique; it is called the inverse of $a$
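
For a small finite set, the group axioms (closure, associativity, an identity element and inverses) can be checked by brute force. The helper `is_group` and the two examples, the integers modulo 5 under addition and under multiplication, are assumptions made for this sketch.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms on a finite set."""
    elements = list(elements)
    # Closure: the operation never leaves the set
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a.b).c == a.(b.c) for all triples
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity element e with e.a == a.e == a for all a
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Every element needs an inverse b with a.b == b.a == e
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

z5_add = is_group(range(5), lambda a, b: (a + b) % 5)
z5_mul = is_group(range(5), lambda a, b: (a * b) % 5)
print(z5_add)  # True
print(z5_mul)  # False, because 0 has no multiplicative inverse
```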

#### Vector Spaces

A real-valued vector space $V = (\nu, +, \cdot)$ is a set $\nu$ with two operations:
- $+$ : $\nu \times \nu \rightarrow \nu$ (vector addition, the inner operation)
- $\cdot$ : $\mathbb{R} \times \nu \rightarrow \nu$ (multiplication by a scalar, the outer operation)

$(\nu, +)$ is an Abelian group, i.e. a group in which the result of applying the group operation to two elements does not depend on the order in which they are written; the group operation is commutative. With addition as the operation, the integers and the real numbers form Abelian groups, and the concept of an Abelian group may be viewed as a generalization of these examples. In addition, the following hold:

- Distributivity of vector sums:
    - $r(X+Y)=rX+rY$
- Associativity of scalar multiplication:
    - $r(sX)=(rs)X$

### Vector Subspaces

- These are sets contained in the original vector space
- When we perform operations on elements within this subspace we will never leave it
- In a sense they are closed

**Definition**
- Let $V = (\nu, +, \cdot)$ be a vector space
- $u \subseteq \nu$, i.e. $u$ is a subset of $\nu$
- $u \neq \emptyset$, i.e. $u$ cannot be the empty set
- Then $U = (u, +, \cdot)$ is called a vector subspace of $V$ (or linear subspace) if $U$ is itself a vector space with the vector space operations $+$ and $\cdot$ restricted to $u \times u$ and $\mathbb{R} \times u$

We write $U \subseteq V$ to denote a subspace $U$ of $V$.

If $u \subseteq \nu$ and $V$ is a vector space, then $U$ inherits many properties directly from $V$, because they hold for all $x \in \nu$ and therefore in particular for all $x \in u$. To determine whether ($u, +, \cdot$) is a subspace of $V$ we still need to show:

1. $U \neq \emptyset$, in particular $0 \in u$
2. Closure of $U$:
    - with respect to the outer operation: $\forall \lambda \in \mathbb{R} \forall x \in u$: $\lambda x \in u$
    - with respect to the inner operation: $\forall x, y \in u$: $x + y \in u$
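
The two conditions can be illustrated with a line through the origin (a subspace of $\mathbb{R}^2$) and the same line shifted away from the origin (not a subspace). The two sets and the sample vectors are assumptions chosen for this sketch. Checking a few samples illustrates the conditions but is not a proof of closure.

```python
import numpy as np

def in_line_through_origin(x):
    """Membership test for U1 = {x in R^2 : x2 = 2 * x1}."""
    return bool(np.isclose(x[1], 2 * x[0]))

def in_shifted_line(x):
    """Membership test for U2 = {x in R^2 : x2 = 2 * x1 + 1}."""
    return bool(np.isclose(x[1], 2 * x[0] + 1))

zero = np.zeros(2)

# U1: contains 0 and stays closed for these sample vectors
x = np.array([1.0, 2.0])
y = np.array([-3.0, -6.0])
u1_ok = (in_line_through_origin(zero)
         and in_line_through_origin(x + y)     # inner operation
         and in_line_through_origin(2.5 * x))  # outer operation

# U2: fails both conditions
p = np.array([0.0, 1.0])
q = np.array([1.0, 3.0])
u2_contains_zero = in_shifted_line(zero)  # 0 is not in U2
u2_sum_inside = in_shifted_line(p + q)    # p + q = (1, 4) leaves U2

print(u1_ok, u2_contains_zero, u2_sum_inside)  # True False False
```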

### Linear Independence

**Linear Combination**: Consider a vector space $V$ and a finite number of vectors $x_1, \cdots, x_k \in V$. Then every $\nu \in V$ of the form:

\begin{equation}
\nu = \lambda_1 x_1 + \cdots + \lambda_k x_k = \sum^k_{i=1} \lambda_i x_i \in V
\end{equation}

with $\lambda_1, \cdots, \lambda_k \in \mathbb{R}$ is a linear combination of the vectors $x_1, \cdots, x_k$.


**Linear Independence**
- vector space $V$
- $k \in \mathbb{N}$
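- $x_1, \cdots, x_k \in V$

If there is a non-trivial linear combination such that $0 = \sum^k_{i=1} \lambda_i x_i$ with at least one $\lambda_i \neq 0$, the vectors $x_1, \cdots, x_k$ are linearly dependent. If only the trivial solution exists, i.e. $\lambda_1 = \cdots = \lambda_k = 0$, the vectors are linearly independent.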

Linear independence is one of the most important concepts in linear algebra. Intuitively, a set of linearly independent vectors consists of vectors that have no redundancy, i.e., if we remove any of those vectors from the set, we will lose something.

**Example**

A person in Nairobi describing where Kigali is might say:
- Go 506km Northwest to Kampala
- Then 374km Southwest

Another person may say "it is 751km West of here".

Although the last statement is true, it is not needed to find Kigali given the previous information.

<img src="attachment:3f81e6db-35ae-4b94-83f9-693ddeccd15c.png" width="500"/>

The 506km and 374km vectors are linearly independent. This means the Southwest vector cannot be described in terms of the Northwest vector and vice versa. The 751km vector is a linear combination of the other two vectors, which makes the set of vectors linearly dependent. Equivalently, the 751km vector and the 374km vector can be linearly combined to obtain the 506km vector.

The following properties are useful to find out whether vectors are linearly independent:
- $k$ vectors are either linearly dependent or linearly independent. There is no third option.
- If at least one of the vectors $x_1, \cdots, x_k$ is 0 then they are linearly dependent. The same holds if two vectors are identical.
- Vectors $\{x_1, \cdots, x_k : x_i \neq 0, i = 1, \cdots, k \}, k \geqslant 2$ are linearly dependent if and only if (at least) one of them is a linear combination of the others. In particular, if one vector is a multiple of another, i.e. $x_i = \lambda x_j, \lambda \in \mathbb{R}$, then the set is linearly dependent.
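
One practical way to test linear independence numerically is to stack the vectors as columns of a matrix and compare its rank with the number of vectors. This rank-based test is a swapped-in technique that these notes do not introduce, and the helper name and example vectors are assumptions for this sketch.

```python
import numpy as np

def linearly_independent(vectors):
    """True if the given vectors are linearly independent (rank test)."""
    X = np.column_stack(vectors)  # one vector per column
    return bool(np.linalg.matrix_rank(X) == len(vectors))

# Assumed example vectors in R^3
x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0])
x3 = x1 + 2 * x2  # a linear combination of x1 and x2

print(linearly_independent([x1, x2]))           # True
print(linearly_independent([x1, x2, x3]))       # False: x3 is redundant
print(linearly_independent([x1, np.zeros(3)]))  # False: contains the zero vector
print(linearly_independent([x1, 3 * x1]))       # False: one is a multiple of the other
```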

### Affine Subspaces

One-dimensional affine subspaces are called lines and can be written as $y = x_0 + \lambda b_1$, where $\lambda \in \mathbb{R}$ and $U = span[b_1] \subseteq \mathbb{R}^n$ is a one-dimensional subspace of $\mathbb{R}^n$. This means that a line is defined by a support point $x_0$ and a vector $b_1$ that defines the direction.

<img src="attachment:fdb8a597-70cd-4660-8340-5615a6eb462b.png" width="400"/>

Lines are affine subspaces. Vectors $y$ on a line $x_0 + \lambda b_1$ lie in an affine subspace $L$ with support point $x_0$ and direction $b_1$.

Two-dimensional affine subspaces of $\mathbb{R}^n$ are called planes. The parametric equation for planes is $y= x_0 + \lambda_1 b_1 + \lambda_2 b_2$, where $\lambda_1, \lambda_2 \in \mathbb{R}$ and $U = span[b_1, b_2] \subseteq \mathbb{R}^n$. This means that a plane is defined by a support point $x_0$ and two linearly independent vectors $b_1, b_2$ that span the direction space.

In $\mathbb{R}^n$, the (n-1)-dimensional affine subspaces are called hyperplanes and the corresponding parametric equation is $y = x_0 + \sum_{i=1}^{n-1} \lambda_i b_i$ where $b_1, \cdots, b_{n-1}$ form a basis of an (n-1)-dimensional subspace $U$ of $\mathbb{R}^n$. This means that a hyperplane is defined by a support point $x_0$ and (n-1) linearly independent vectors $b_1, \cdots, b_{n-1}$ that span the direction space. In $\mathbb{R}^2$, a line is also a hyperplane. In $\mathbb{R}^3$, a plane is also a hyperplane
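
The parametric equations translate directly into code. The sketch below builds points on a plane in $\mathbb{R}^3$ (which is also a hyperplane there) and tests membership by checking whether $y - x_0$ lies in the span of the direction vectors. The support point, direction vectors, helper names and the least-squares membership test are all assumptions made for this illustration.

```python
import numpy as np

# Assumed support point and linearly independent direction vectors in R^3
x0 = np.array([1.0, 1.0, 0.0])
b1 = np.array([1.0, 0.0, 1.0])
b2 = np.array([0.0, 1.0, 1.0])

def plane_point(lam1, lam2):
    """y = x0 + lam1 * b1 + lam2 * b2"""
    return x0 + lam1 * b1 + lam2 * b2

def in_affine_subspace(y, support, basis):
    """True if y - support can be written as a combination of the basis."""
    B = np.column_stack(basis)
    coeffs, *_ = np.linalg.lstsq(B, y - support, rcond=None)
    return bool(np.allclose(B @ coeffs, y - support))

y = plane_point(2.0, -1.0)  # (3, 0, 1)
on_plane = in_affine_subspace(y, x0, [b1, b2])
off_plane = in_affine_subspace(x0 + np.array([0.0, 0.0, 1.0]), x0, [b1, b2])
origin_on_plane = in_affine_subspace(np.zeros(3), x0, [b1, b2])

print(on_plane, off_plane, origin_on_plane)  # True False False
```

The last check shows that, unlike a vector subspace, an affine subspace does not have to contain the zero vector.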