<a href="https://colab.research.google.com/github/rares985/machine-learning/blob/master/Notations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear algebra notations

The following notations will be used throughout the notebook collection:



## Vectors


### Notation
A vector is a collection of $ n $ numbers arranged in a single column. This is also referred to as a *column vector*.
Column vectors will be marked with a **bold lowercase letter** (either Latin or Greek), as follows:
<br><br>
$$
\mathbf{x} =
\begin{bmatrix}
    x_1 \\
    x_2 \\
    \vdots \\
    x_n \\
\end{bmatrix}
\in \mathbb{R}^{n\times1}
$$
<br><br>
A *row vector* is written as the transpose of a column vector:
<br><br>
$$ 
\mathbf{x}^{T}=
\begin{bmatrix}
    x_1 & x_2 & \dots & x_n 
\end{bmatrix}
\in \mathbb{R}^{1\times n}
$$

### Examples

Let's take for example the vector which contains the first 4 even numbers:

$$
\mathbf{x} = 
\begin{bmatrix}
2 \\
4 \\
6 \\
8
\end{bmatrix}
\in \mathbb{R}^{4 \times 1}
$$
<br>
This is a *column* vector, which is the default way we are going to write vectors. Whenever you see the word **vector**, you should think of its components placed in this format. Let's also write a *row* vector:
<br><br>
$$
\mathbf{y}^{T}=
\begin{bmatrix}
1 & 3 & 5 & 7
\end{bmatrix}
\in \mathbb{R}^{1 \times 4}
$$
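
As a rough illustration, the two example vectors can be sketched in NumPy. The variable names `x` and `y_T` are chosen here only to mirror the notation, and the vectors are stored as 2-D arrays so that the $ 4 \times 1 $ and $ 1 \times 4 $ shapes are explicit:

```python
import numpy as np

# Column vector x with the first 4 even numbers, stored with shape (4, 1).
x = np.array([[2], [4], [6], [8]])

# Row vector y^T with the first 4 odd numbers, stored with shape (1, 4).
y_T = np.array([[1, 3, 5, 7]])

print(x.shape)    # (4, 1)
print(y_T.shape)  # (1, 4)

# Transposing turns a column vector into a row vector and vice versa.
print(x.T.shape)    # (1, 4)
print(y_T.T.shape)  # (4, 1)
```

A plain 1-D array such as `np.array([2, 4, 6, 8])` has shape `(4,)` and does not distinguish rows from columns, which is why the 2-D form is used in this sketch.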

## Matrices



### Notation
A matrix is a collection of $ m \times n $ numbers arranged in $ m $ rows and $ n $ columns. Matrices are usually marked with an **uppercase symbol** (e.g. $ A $, $ X $):
<br><br>
$$ 
X=
\begin{bmatrix}
    x_{11} & x_{12} & \dots & x_{1n} \\
    x_{21} & x_{22} & \dots & x_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & \dots & x_{mn} \\
\end{bmatrix}
\in \mathbb{R}^{m\times n}
$$

The element in row $ i $ and column $ j $ is marked with the lowercase letter of the matrix, with indices $ i $ and $ j $. In our case, this is $ x_{ij} $.
Sometimes, it is useful to write a matrix either as a *row vector* of *column vectors*,
<br><br>
$$
X=
\begin{bmatrix}
    \mathbf{x}_{1} & \mathbf{x}_{2} & \dots & \mathbf{x}_{n}
\end{bmatrix}
=
\begin{bmatrix}
    \vdots & \vdots & \dots & \vdots \\
    \mathbf{x}_{1} & \mathbf{x}_{2} & \dots & \mathbf{x}_{n} \\
    \vdots & \vdots & \dots & \vdots \\
\end{bmatrix}
$$
<br><br>
or as a *column vector* of *row vectors*:
<br><br>
$$
X=
\begin{bmatrix}
    \mathbf{x}_{1}^{T} \\
    \mathbf{x}_{2}^{T} \\
    \vdots \\
    \mathbf{x}_{m}^{T}
\end{bmatrix}=
\begin{bmatrix}
    \dots & \mathbf{x}_{1}^{T} & \dots \\
    \dots & \mathbf{x}_{2}^{T} & \dots \\
    \vdots & \vdots & \vdots \\
    \dots & \mathbf{x}_{m}^{T} & \dots
\end{bmatrix}
$$

### Examples

Let's take a $ 3 \times 2 $ matrix, containing the first 6 even numbers.

$$
A=
\begin{bmatrix}
2 & 4 \\
6 & 8 \\
10 & 12
\end{bmatrix}
\in \mathbb{R}^{3 \times 2}
$$
<br><br>
Using the notation above, let's write the first column of $ A $ as a column vector:
<br><br>
$$
\mathbf{a}_{1}=
\begin{bmatrix}
2 \\
6 \\
10
\end{bmatrix}
\in \mathbb{R}^{3 \times 1}
$$
<br><br>
and the first row of $ A $ as a row vector:
<br><br>
$$
\mathbf{a}_{1}^{T}=
\begin{bmatrix}
2 & 4
\end{bmatrix}
\in \mathbb{R}^{1 \times 2}
$$
<br><br>
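
The same extraction can be sketched in NumPy. The names `a_col_1` and `a_row_1` are illustrative only, and note that NumPy indices start at 0 while the mathematical indices start at 1:

```python
import numpy as np

# The 3 x 2 example matrix A with the first 6 even numbers.
A = np.array([[2, 4],
              [6, 8],
              [10, 12]])

# First column of A, kept as a 3 x 1 column vector (index list [0] preserves 2-D shape).
a_col_1 = A[:, [0]]

# First row of A, kept as a 1 x 2 row vector.
a_row_1 = A[[0], :]

print(a_col_1.shape)  # (3, 1)
print(a_row_1.shape)  # (1, 2)

# The row is NOT the transpose of the column: the shapes already differ.
print(a_col_1.T.shape == a_row_1.shape)  # False
```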


⚠️ ***We can see pretty quickly that when these two notations are used together in the same equation, it is very easy to get confused. A workaround notation is explained later, in the machine learning notations.***



# Machine learning notations





## Usual letters
This glossary explains what each letter usually denotes in a machine learning problem.

|Notation|Explanation|
|--------|-----------|
| $ m $  |    Number of training samples                            |
| $ n $  | Number of input features (dimensions of training sample) |
| $ X $  | The dataset |
| $ y $  | The output (target) variable |
| $ \theta $ | The weights of the model |
| $ \mathbf{w} $ | Same as $ \theta $: the weights of the model |


### Training examples
As mentioned, if we have a look at the matrix written as a row of columns or as a column of rows, we note that $\mathbf{x}_{1}^{T}$ and $\mathbf{x}_{1}$ are **different** vectors with **different elements**:

$$
\mathbf{x}_{1}^{T} = 
\begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1n}
\end{bmatrix}
\in \mathbb{R}^{1 \times n}
$$

<br><br>
which represents the first **row** of the matrix, whereas $\mathbf{x}_{1}$ is:
<br><br>

$$
\mathbf{x}_{1} = 
\begin{bmatrix}
x_{11} \\
x_{21} \\
\vdots \\
x_{m1}
\end{bmatrix}
\in \mathbb{R}^{m \times 1}
$$

<br><br>
which represents the first **column** of the matrix. Not only do the two vectors have *different* sizes, but their elements also differ, since in general $ x_{ji} \neq x_{ij} $. To avoid this confusion, the usual notations in machine learning are as follows:

<br><br>
#### Row vectors
For the rows, instead of the $\mathbf{x}_{i}^{T}$ notation, $\mathbf{x}^{(i)^{T}}$ is used to denote the $i^{th}$ row (training example) in our dataset:
$$
\mathbf{x}^{(i)^{T}}=
\begin{bmatrix}
x_{1}^{(i)} &
x_{2}^{(i)} &
\dots &
x_{n}^{(i)}
\end{bmatrix}
\in \mathbb{R}^{1 \times n}
$$

#### Column vectors
For the columns, the normal $\mathbf{x}_{j}$ notation is used to denote the $j^{th}$ column (feature) in our dataset:

$$
\mathbf{x}_{j}=
\begin{bmatrix}
x_{j}^{(1)} \\
x_{j}^{(2)} \\
\vdots \\
x_{j}^{(m)} \\
\end{bmatrix}
\in \mathbb{R}^{m \times 1}
$$
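
As a small sketch of how this notation maps onto array indexing, assume a hypothetical dataset with $ m = 3 $ training examples and $ n = 2 $ features (the values below are made up for illustration). The superscript $ (i) $ selects a row and the subscript $ j $ selects a column:

```python
import numpy as np

# Hypothetical dataset: m = 3 training examples (rows), n = 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

i, j = 2, 1  # 1-based indices, as in the math notation

# x^{(i)T}: the i-th training example as a 1 x n row vector.
x_i_T = X[[i - 1], :]

# x_j: the j-th feature across all examples as an m x 1 column vector.
x_j = X[:, [j - 1]]

# x_j^{(i)}: feature j of training example i, a single number.
x_j_i = X[i - 1, j - 1]

print(x_i_T.shape)  # (1, 2)
print(x_j.shape)    # (3, 1)
print(x_j_i)        # 3.0
```

The `- 1` offsets are needed only because NumPy counts from 0 while the notation counts from 1.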


### Training dataset
Usually, the training dataset is denoted as $ X $, or `X_train` in code. Using the notations explained before, the matrix $ X $ can be written in any of the following forms:
<br><br>
$$
X=
\begin{bmatrix}
    \mathbf{x}^{(1)^{T}} \\
    \mathbf{x}^{(2)^{T}} \\
    \vdots \\
    \mathbf{x}^{(m)^{T}}
\end{bmatrix}=
\begin{bmatrix}
    \mathbf{x}_{1} &
    \mathbf{x}_{2} &
    \dots &
    \mathbf{x}_{n}
\end{bmatrix}=
\begin{bmatrix}
    x_{1}^{(1)} & x_{2}^{(1)} & \dots & x_{n}^{(1)} \\
    x_{1}^{(2)} & x_{2}^{(2)} & \dots & x_{n}^{(2)} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{1}^{(m)} & x_{2}^{(m)} & \dots & x_{n}^{(m)} \\
\end{bmatrix}
\in \mathbb{R}^{m \times n}
$$
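
To check that the three forms describe the same matrix, here is a minimal NumPy sketch. The values in `X_train` are made up, and the helper names `rows`, `cols`, `from_rows`, and `from_cols` are illustrative only:

```python
import numpy as np

# Hypothetical training set: m = 3 examples, n = 2 features.
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

m, n = X_train.shape  # number of training examples, number of features

# Form 1: a column of row vectors x^{(1)T}, ..., x^{(m)T}, each of shape (1, n).
rows = [X_train[[i], :] for i in range(m)]
from_rows = np.vstack(rows)

# Form 2: a row of column vectors x_1, ..., x_n, each of shape (m, 1).
cols = [X_train[:, [j]] for j in range(n)]
from_cols = np.hstack(cols)

# Both reconstructions match the element-wise form of X.
print(np.array_equal(from_rows, X_train))  # True
print(np.array_equal(from_cols, X_train))  # True
```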