# Some Linear Algebra

To understand how data fed into a neural network gets transformed, we need the definition of a linear transformation from one vector space into another.

Let $\mathbb{R}$ be the set of real numbers. Our discussion will be limited to the vector spaces $\mathbb{R}^n$ over $\mathbb{R}$; the contents below may be easier to follow if you apply them to $\mathbb{R}^2$ or $\mathbb{R}^3$ over $\mathbb{R}$.

#### *Definition: Vector Space* 
($V$, $\mathbb{R}$, $+$, $.$), where $V$ is a set and $\mathbb{R}$ is the set of real numbers, is said to be a vector space with vector addition '$+$' and scalar multiplication '$.$' if it satisfies the following axioms:
<br>(In the axioms stated below, let us write $a.v$ as $av$ when $a\in\mathbb{R}$ and $v \in V$ )
- $u, v \in V \implies$ &nbsp;  $u+v$ $\in$ $V$ (Closure under addition)
- $u+v$ = $v+u$ &emsp; $\forall u,v \in V$ (Commutativity of addition)
- $u + (v+w) = (u+v)+w$ &emsp; $\forall u,v,w \in V$ (Associativity of addition)
- $\exists$ a zero vector $0 \in V$ such that $v+0=v$ &emsp; $\forall v \in V$ (Existence of zero vector)
- $\forall v \in V$ $\exists$ an additive inverse $-v \in V$ such that $v + (-v) = 0$ (Existence of additive inverse)
- $a \in \mathbb{R}, v \in V$  $\implies$  $av \in V$  (Closure under scalar multiplication)
- $a, b \in \mathbb{R}, v \in V$  $\implies$  $a(bv) = (ab)v$  (Associativity of scalar multiplication)
- $a(u+v) = au + av$ &emsp; $\forall a \in \mathbb{R}, u,v \in V$ (Distributivity of scalar multiplication over vector addition)
- $(a+b)v = av + bv$ &emsp; $\forall a,b \in \mathbb{R}, v \in V$ (Distributivity of scalar multiplication over scalar addition)
- $1v = v$ &emsp; $\forall v \in V$ (Multiplicative identity)
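
The axioms above can be spot-checked numerically for $\mathbb{R}^3$. The snippet below is a minimal sketch, assuming NumPy arrays stand in for vectors; the names `u`, `v`, `w`, `a`, `b` and `axiom_checks` are chosen here for illustration. It tests a few random vectors in floating point, so it illustrates the axioms rather than proving them. Closure holds because every sum and scalar multiple is again an array of three real numbers.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Three arbitrary vectors in R^3 and two arbitrary real scalars.
u, v, w = rng.normal(size=(3, 3))
a, b = 2.0, -0.5
zero = np.zeros(3)

# Each entry pairs an axiom name with a numerical check of that axiom.
axiom_checks = {
    "commutativity of addition": np.allclose(u + v, v + u),
    "associativity of addition": np.allclose(u + (v + w), (u + v) + w),
    "zero vector": np.allclose(v + zero, v),
    "additive inverse": np.allclose(v + (-v), zero),
    "associativity of scalar multiplication": np.allclose(a * (b * v), (a * b) * v),
    "distributivity over vector addition": np.allclose(a * (u + v), a * u + a * v),
    "distributivity over scalar addition": np.allclose((a + b) * v, a * v + b * v),
    "multiplicative identity": np.allclose(1.0 * v, v),
}

for name, holds in axiom_checks.items():
    print(f"{name}: {holds}")
```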



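$T$ is said to be a linear transformation from a vector space $V$ to a vector space $W$ if it preserves vector addition and scalar multiplication; the precise definition is given in the Linear Transformations section below.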

To know more, consider reading an undergraduate-level textbook on Linear Algebra. Some suggestions are below:
- https://open.umn.edu/opentextbooks/textbooks/5?form=MG0AV3
- https://understandinglinearalgebra.org/home.html?form=MG0AV3 

##### Examples of Vector spaces:

- ($\mathbb{R}^2$, $\mathbb{R}$, $+$, $.$) is a vector space. Here the set of vectors is the set of all ordered pairs of real numbers, often visualized as points in a 2D plane. This vector space is of dimension 2. 

- ($\mathbb{R}^3$, $\mathbb{R}$, $+$, $.$) is the vector space of all ordered triples of real numbers, often visualized as points in 3D space. This vector space is of dimension 3.

- ($\mathbb{R}^n$, $\mathbb{R}$, $+$, $.$) is a vector space of dimension $n$. Going forward, let us write ($\mathbb{R}^n$, $\mathbb{R}$, $+$, $.$) as $\mathbb{R}^n$.

- In the [**Attention is All You Need**](https://arxiv.org/abs/1706.03762) paper, tokens in the vocabulary are represented by vectors in the 512-dimensional space $\mathbb{R}^{512}$.
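
To make the last example concrete, here is a small sketch of how tokens might be stored as vectors in $\mathbb{R}^{512}$. Everything except the dimension 512 is an assumption made for illustration: the vocabulary size, the token indices, and the random values (a trained model learns its embedding values instead).

```python
import numpy as np

d_model = 512      # embedding dimension mentioned above
vocab_size = 1000  # hypothetical vocabulary size, chosen only for illustration

rng = np.random.default_rng(0)

# Hypothetical embedding table: one row per token, each row a vector in R^512.
# Random numbers stand in for values a real model would learn.
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([5, 42, 7])            # made-up token indices
token_vectors = embedding_table[token_ids]  # look up one vector per token

print(token_vectors.shape)  # (3, 512)
```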

### Linear Transformations

A Linear Transformation is a function from a vector space $V$ to another vector space $W$ that preserves vector addition and scalar multiplication. 

#### Definition: Linear Transformation
A function $T : V \to W$ is a linear transformation if:
- $\forall u,v \in V$,  &emsp; $ T(u + v) = T(u) + T(v) $
- For any $v \in V$ and scalar $c \in \mathbb{R}$, $T(cv) = cT(v)$
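
The two conditions can be tested numerically on random inputs. The sketch below uses a hypothetical helper, `looks_linear`, written for this note; passing the check on random samples suggests linearity but does not prove it, while a single failure does show that a map is not linear. The two example maps, `double` and `shift`, are also chosen here for illustration.

```python
import numpy as np

def looks_linear(T, dim, trials=100, seed=0):
    """Spot-check additivity and homogeneity of T on random inputs from R^dim."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u = rng.normal(size=dim)
        v = rng.normal(size=dim)
        c = rng.normal()
        additive = np.allclose(T(u + v), T(u) + T(v))   # T(u + v) = T(u) + T(v)
        homogeneous = np.allclose(T(c * v), c * T(v))   # T(cv) = cT(v)
        if not (additive and homogeneous):
            return False
    return True

def double(v):
    # T(v) = 2v satisfies both conditions.
    return 2.0 * v

def shift(v):
    # T(v) = v + (1, ..., 1) is a translation; it fails additivity.
    return v + 1.0

print(looks_linear(double, dim=3))  # True
print(looks_linear(shift, dim=3))   # False
```

The translation fails because $T(u+v) = u + v + 1$ componentwise, whereas $T(u) + T(v) = u + v + 2$.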

#### Example

We will be interested in linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$ for some $m$ and $n$.

Let us represent vectors in $\mathbb{R}^n$ as column matrices of dimension $n \times 1$:


$$
v =
\begin{pmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{pmatrix} \in \mathbb{R}^n
$$

Let $T : \mathbb{R^n} \to \mathbb{R^m}$ be defined as follows:  $$T(v) = Av \in \mathbb{R^m}$$
where 
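$$ A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{pmatrix} \in \mathbb{R}^{m \times n}
$$

$T$ is a linear transformation because matrix multiplication satisfies $A(u+v) = Au + Av$ and $A(cv) = c(Av)$.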
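
As a small worked instance of $T(v) = Av$, the sketch below takes $m = 2$ and $n = 3$. The entries of `A`, `u`, `v` and the scalar `c` are made up for illustration, and vectors are stored as one-dimensional NumPy arrays rather than explicit $n \times 1$ columns.

```python
import numpy as np

# A has m = 2 rows and n = 3 columns, so T(v) = A v maps R^3 to R^2.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, -1.0]])

def T(v):
    # Matrix-vector product: each output entry is a row of A dotted with v.
    return A @ v

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, -1.0])
c = 2.5

print(T(u))  # [7. 3.]
print(T(v))  # [2. 1.]

# Both defining properties of a linear transformation hold for this T.
print(np.allclose(T(u + v), T(u) + T(v)))  # True
print(np.allclose(T(c * v), c * T(v)))     # True
```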

- Geometrically, rotations, reflections, scaling and shearing are all linear transformations. 

- Projections followed by any of the above transformations are also linear transformations.

- In neural networks, weights form the entries of the matrices associated with linear transformations. The computation at each layer consists of a linear transformation, followed by the addition of a bias term, followed by the application of an activation function such as softmax. See [003_Multi Layer Perceptrons](<./003_Multi Layer Perceptrons.ipynb>).
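
The points above can be sketched in NumPy. The first part builds $2 \times 2$ matrices for a rotation, a reflection, a scaling and a shear and applies them to sample points. The second part is a minimal, hypothetical single layer: a matrix product, plus a bias, followed by softmax. The weights `W`, the bias `b`, and the helper names `softmax` and `layer` are assumptions made for illustration and are not taken from the linked notebook.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counterclockwise about the origin

rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[1.0,  0.0],
                       [0.0, -1.0]])  # reflect across the x-axis
scaling = np.array([[2.0, 0.0],
                    [0.0, 0.5]])      # stretch x by 2, shrink y by half
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])        # horizontal shear

e1 = np.array([1.0, 0.0])
p = np.array([1.0, 1.0])

rotated = rotation @ e1     # approximately (0, 1)
reflected = reflection @ p  # (1, -1)
scaled = scaling @ p        # (2, 0.5)
sheared = shear @ p         # (2, 1)

def softmax(z):
    # Subtract the maximum first for numerical stability.
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

def layer(x, W, b, activation):
    # Linear transformation W x, then bias b, then the activation function.
    return activation(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))  # hypothetical weights: maps R^2 to R^3
b = rng.normal(size=3)       # hypothetical bias in R^3

output = layer(p, W, b, softmax)
print(output.shape)  # (3,)
```

Softmax returns non-negative entries that sum to 1, so `output` can be read as a probability distribution over three classes. Note that once the bias is added, the map $x \mapsto Wx + b$ is affine rather than linear unless $b = 0$.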