## Sets

- Sets are one of the most fundamental concepts in mathematics. 
- **Sets are well-defined collections of objects**. 
- Such objects are called **elements or members** of the set. 
- In the context of linear algebra, we say that a line is a set of points, and the set of all lines in the plane is a set of sets. Similarly, we can say that *vectors* are sets of points, and *matrices* sets of vectors.

## Belonging and inclusion

We build sets using the notion of **`belonging`**. We write that $a$ *belongs to* (or is an *element* or *member* of) $\textit{A}$ using the symbol $\in$, a stylized Greek epsilon:

$$
a \in \textit{A}
$$

Another important idea is **`inclusion`**, which allows us to build *subsets*. Consider sets $\textit{A}$ and $\textit{B}$. When every element of $\textit{A}$ is an element of $\textit{B}$, we say that $\textit{A}$ is a *subset* of $\textit{B}$, or that $\textit{B}$ *includes* $\textit{A}$. The notation is:

$$
\textit{A} \subset \textit{B}
$$

or

$$
\textit{B} \supset \textit{A}
$$
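As a rough illustration, Python's built-in `set` type mirrors both ideas. The sets `A` and `B` below are made-up examples; note that Python's `<=` also returns `True` when the two sets are equal.

```python
# Hypothetical example sets; the names A and B mirror the notation above.
A = {1, 2}
B = {1, 2, 3}

a_belongs = 1 in A        # belonging: 1 is an element of A
not_member = 4 in B       # 4 is not an element of B
a_subset_b = A <= B       # inclusion: every element of A is also in B
b_includes_a = B >= A     # the same fact read from B's side

print(a_belongs, not_member, a_subset_b, b_includes_a)
```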


## Set specification

In general, anything we assert about the elements of a set results in **generating a subset**. In other words, asserting things about sets is a way to manufacture subsets. Take as an example the set of all dogs, that I'll denote as $\textit{D}$. I can assert now "$d$ is black". This is denoted as:
$$
\textit{B} = \{ d \in \textit{D} : \text{d is black} \}
$$

or 

$$
\textit{B} = \{ d \in \textit{D} \vert \text{ d is black} \}
$$

The colon ($:$) or vertical bar ($\vert$) reads as "such that". Therefore, we can read the above expression as: *all elements $d$ in $\textit{D}$ such that $d$ is black*. And that's how we obtain the set $\textit{B}$ from $\textit{D}$. 
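Set-builder notation translates almost literally into a Python set comprehension. The sketch below assumes a toy representation in which each dog is a `(name, color)` tuple; the names and colors are invented for illustration.

```python
# Hypothetical data: each dog is a (name, color) tuple.
D = {("Rex", "black"), ("Luna", "white"), ("Bolt", "black")}

# B = { d in D : d is black }
B = {d for d in D if d[1] == "black"}

print(sorted(B))   # the black dogs, in alphabetical order
print(B <= D)      # the assertion manufactured a subset of D
```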
 

## Ordered pairs 

Consider a pair of sets $\textit{x}$ and $\textit{y}$. An **`unordered pair`** is the set whose elements are $\textit{x}$ and $\textit{y}$, written $\{ \textit{x},\textit{y} \}$. Since $\{ \textit{x},\textit{y} \} = \{ \textit{y},\textit{x} \}$, presentation order does not matter: the set is the same.

In machine learning, we usually do care about presentation order. For this, we need to define an **`ordered pair`**. An **`ordered pair`** is denoted as $( \textit{x},\textit{y} )$, with $\textit{x}$ as the *first coordinate* and $\textit{y}$ as the *second coordinate*. Unlike the unordered pair, order matters: $( \textit{x},\textit{y} ) \ne ( \textit{y},\textit{x} )$ whenever $\textit{x} \ne \textit{y}$.
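In Python terms, a `set` behaves like an unordered pair and a `tuple` like an ordered pair. This is only an analogy, sketched with arbitrary values:

```python
unordered_same = {1, 2} == {2, 1}   # sets ignore presentation order
ordered_same = (1, 2) == (2, 1)     # tuples compare coordinate by coordinate

print(unordered_same, ordered_same)
```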

## Relations

From ordered pairs, we can derive the idea of **`relations`** among sets or between elements and sets.   
Relations can be binary, ternary, quaternary, or N-ary. 
In set theory, **relations** are defined as *sets of ordered pairs*, and denoted as $\textit{R}$. Hence, we can express the relation between $\textit{x}$ and $\textit{y}$ as:

$$
\textit{x R y}
$$

Further, for any $\textit{z} \in \textit{R}$, there exist $\textit{x}$ and $\textit{y}$ such that $\textit{z} = (\textit{x}, \textit{y})$. 

## Domain and Range

From the definition of $\textit{R}$, we can obtain the notions of **`domain`** and **`range`**. The **`domain`** is a set defined as:

$$
\text{dom } \textit{R} = \{ \textit{x} : \text{for some } \textit{y}, \ (\textit{x R y}) \}
$$

This reads as: the set of values of $\textit{x}$ such that, for at least one $\textit{y}$, $\textit{x}$ has a relation with $\textit{y}$. 

The **`range`** is a set defined as:

$$
\text{ran } \textit{R} = \{ \textit{y} : \text{for some } \textit{x}, \ (\textit{x R y}) \}
$$

This reads: the set formed by the values of $\textit{y}$ such that, for at least one $\textit{x}$, $\textit{x}$ has a relation with $\textit{y}$. 
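A small sketch can make the two definitions concrete. Here a relation is stored as a Python set of tuples; the relation "x divides y" and its four pairs are an invented example, not something defined in the text.

```python
# Hypothetical relation R ("x divides y") stored as a set of ordered pairs.
R = {(1, 2), (1, 3), (2, 4), (3, 6)}

dom_R = {x for (x, y) in R}   # every x that is related to at least one y
ran_R = {y for (x, y) in R}   # every y that at least one x is related to

print(sorted(dom_R))
print(sorted(ran_R))
```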

## Functions

Consider a pair of sets $\textit{X}$ and $\textit{Y}$. We say that a **`function`** from $\textit{X}$ to $\textit{Y}$ is a relation $\textit{f}$ such that:

- $\text{dom } \textit{f} = \textit{X}$, and
- for each $\textit{x} \in \textit{X}$ there is a unique element $\textit{y} \in \textit{Y}$ with $(\textit{x}, \textit{y}) \in \textit{f}$

More informally, we say that a function "*transforms*", "*maps*", or "*sends*" $\textit{x}$ onto $\textit{y}$, and for each "*argument*" $\textit{x}$ there is a unique value $\textit{y}$ that $\textit{f}$ "*assumes*" or "*takes*".

We typically denote a function (or transformation, or mapping) from $\textit{X}$ to $\textit{Y}$ as:

$$
\textit{f}: \textit{X} \rightarrow \textit{Y}
$$
or
$$
\textit{f}(\textit{x}) = \textit{y} 
$$
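The two conditions can be checked mechanically for a finite relation. The helper `is_function` below is a hypothetical name introduced for this sketch; it only tests the domain and uniqueness conditions and does not check that the second coordinates lie in a given set $\textit{Y}$.

```python
def is_function(relation, X):
    """Return True if `relation` (a set of (x, y) pairs) is a function on X."""
    domain = {x for (x, _) in relation}
    if domain != set(X):
        return False                      # condition 1: dom f must equal X
    # Condition 2: each x appears in exactly one pair, so the number of
    # pairs must equal the number of distinct first coordinates.
    return len(relation) == len(domain)

X = {1, 2, 3}
f = {(1, "a"), (2, "b"), (3, "a")}             # each x has exactly one y
g = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}   # x = 1 is sent to two values
h = {(1, "a"), (2, "b")}                       # x = 3 has no value

print(is_function(f, X), is_function(g, X), is_function(h, X))
```

Note that `f` sends both 1 and 3 to `"a"`; that is allowed, since uniqueness is required per argument, not per value.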



## Functions

In the figure, the left pane shows a valid function, i.e., each value of $\textit{x}$ *maps* onto exactly one value of $\textit{y}$. The right pane is not a function, since a single value of $\textit{x}$ *maps* onto multiple values of $\textit{y}$.

<center>
<img src="./images/b-function.svg" width="1000" height="700">
</center>

## Goal of Machine Learning

**The ultimate goal of machine learning is learning functions from data**, i.e., transformations or mappings from the *domain* onto the *range* of a function.   
This may sound simplistic, but it's true.   
The *domain* $\textit{X}$ is usually a vector (or set) of *`variables`* or *`features`* mapping onto a vector of *`target`* values.   
`In machine learning the words transformation and mapping are used interchangeably, but both just mean function.`

## IMPORT LIBRARIES

In [3]:
# Libraries for this section 
import numpy as np
import pandas as pd
import altair as alt
alt.themes.enable('dark')

ThemeRegistry.enable('dark')

# Vectors

Linear algebra is the study of vectors. At the most general level, vectors are **`ordered finite lists of numbers`**.   
Vectors are the most fundamental mathematical object in machine learning.   
We use them to **`represent attributes of entities`**: age, sex, test scores, etc.   
We represent vectors by a bold lower-case letter like $\bf{v}$ or as a lower-case letter with an arrow on top like $\vec{v}$.    
Vectors are a type of mathematical object that can be **`added together`** and/or **`multiplied by a number`** to obtain another object of **`the same kind`**.   
For instance, if we have a vector $\bf{x} = \text{age}$ and a second vector $\bf{y} = \text{weight}$, we can add them together and obtain a third vector $\bf{z} = x + y$. We can also multiply $2 \times \bf{x}$ to obtain $2\bf{x}$, again, a vector.   
This is what we mean by *the same kind*: the returning object is still a *vector*. 

## Types of vectors

Vectors come in three flavors:   
(1) **`geometric vectors`**   
(2) **`polynomials`**  
(3) and **elements of $\mathbb{R^n}$ space**.  

### Geometric vectors

**`Geometric vectors are oriented segments`**.   
Many linear algebra concepts come from the geometric point of view of vectors: `space, plane, distance,` etc.

## 

<center> Fig. 2: Geometric vectors </center>

<center>
<img src="./images/b-geometric-vectors.svg">
</center>

## Polynomials

**`A polynomial is an expression` like $f(x) = x^2 + x + 1$**. That is, an expression adding multiple "`terms`" (nomials). Polynomials are vectors because they meet the definition of a vector: they can be added together to get another polynomial, and they can be multiplied by a scalar to get another polynomial. 

$$
\text{function addition is valid} \\
f(x) + g(x)
$$
$$
\text{and}
$$
$$
\text{multiplying by a scalar is valid} \\
5 \times f(x)
$$
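One way to see this numerically is NumPy's `numpy.polynomial.Polynomial` class, which stores a polynomial as a vector of coefficients (constant term first). The polynomials `f` and `g` below are illustrative choices, not taken from the figures.

```python
from numpy.polynomial import Polynomial

# Coefficients are listed from the constant term upward.
f = Polynomial([1, 1, 1])   # f(x) = 1 + x + x^2
g = Polynomial([0, 2])      # g(x) = 2x (hypothetical second polynomial)

h = f + g                   # adding polynomials gives another polynomial
k = 5 * f                   # scaling a polynomial gives another polynomial

print(h.coef)               # coefficient vector of f + g
print(k.coef)               # coefficient vector of 5 f
```

Both results are again `Polynomial` objects, which is the "same kind" property the vector definition asks for.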

## Polynomials 

<center> Fig. 3: Polynomials </center>

<center>
<img src="./images/b-polynomials-vectors.svg">
</center>

## Elements of $\mathbb{R}^n$

**Elements of $\mathbb{R}^n$ are ordered lists of $n$ real numbers**. This type of representation is arguably the most important for applied machine learning. It is how data is commonly represented in computers to build machine learning models. For instance, a vector in $\mathbb{R}^3$ takes the shape of:

$$
\bf{x}=
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}
\in \mathbb{R}^3
$$


## Operations in $\mathbb{R}^n$ {.smaller}

$$
\text{addition is valid} \\
\phantom{space}\\
\begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix} +
\begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix}=
\begin{bmatrix}
2 \\
4 \\
6
\end{bmatrix}\\
$$
$$
\text{and}
$$
$$
\text{multiplying by a scalar is valid} \\
\phantom{space}\\
5 \times
\begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix} = 
\begin{bmatrix}
5 \\
10 \\
15
\end{bmatrix}
$$

## Numpy Arrays 

In `NumPy`, vectors are represented as n-dimensional arrays. To create a vector in $\mathbb{R}^3$:

In [2]:
x = np.array([[1],
              [2],
              [3]])

We can inspect the vector shape by:

In [3]:
x.shape # (3 rows, 1 column): a column vector with 3 entries

(3, 1)

In [4]:
print(f'A 3-dimensional vector:\n{x}')

A 3-dimensional vector:
[[1]
 [2]
 [3]]


## Zero vector, unit vector, and sparse vector {.smaller}

**`Zero vectors`** are vectors composed of zeros, and zeros only. It is common to see this vector denoted simply as $0$, regardless of the dimensionality. Hence, you may see a 3-dimensional or 10-dimensional vector with all entries equal to 0 referred to as "the 0 vector". For instance:

$$
\bf{0} = 
\begin{bmatrix}
0\\
0\\
0
\end{bmatrix}
$$

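**`Unit vectors`**, as used here, are vectors with a single element equal to one and the rest equal to zero (these are also called standard basis vectors; more generally, a unit vector is any vector of length one). Unit vectors are important to understand applications like norms. For instance, $\bf{x_1}$, $\bf{x_2}$, and $\bf{x_3}$ are unit vectors: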

$$
\bf{x_1} = 
\begin{bmatrix}
1\\
0\\
0
\end{bmatrix},
\bf{x_2} = 
\begin{bmatrix}
0\\
1\\
0
\end{bmatrix},
\bf{x_3} = 
\begin{bmatrix}
0\\
0\\
1
\end{bmatrix}
$$

**`Sparse vectors`** are vectors with most of their elements equal to zero. We denote the number of nonzero elements of a vector $\bf{x}$ as $\text{nnz}(\bf{x})$. The sparsest possible vector is the zero vector. Sparse vectors are common in machine learning applications and often require some type of method to deal with them effectively.  
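The three kinds of vectors can be built in `NumPy` along these lines. This is a minimal sketch: the variable names and the 5-dimensional sparse example are assumptions made for illustration, and `np.count_nonzero` plays the role of $nnz$.

```python
import numpy as np

zero = np.zeros((3, 1))                 # zero vector in R^3, as a column

I = np.eye(3)                           # 3x3 identity matrix
x1, x2, x3 = I[:, [0]], I[:, [1]], I[:, [2]]   # its columns are the unit vectors

sparse = np.array([[0], [0], [7], [0], [0]])   # mostly zeros
nnz_sparse = np.count_nonzero(sparse)   # number of nonzero entries
nnz_zero = np.count_nonzero(zero)       # the zero vector has none

print(x2.ravel(), nnz_sparse, nnz_zero)
```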


## Vector dimensions and coordinate system

Vectors can have any number of dimensions.   
The most common are the 2-dimensional Cartesian plane and the 3-dimensional space.   
Vectors in 2 and 3 dimensions are used often for pedagogical purposes since we can visualize them as geometric vectors.   Nevertheless, most problems in machine learning entail more dimensions, sometimes hundreds or thousands of dimensions.   
The notation for a vector $\bf{x}$ of arbitrary dimension $n$ is:  

$$
\bf{x} = 
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}
\in \mathbb{R}^n
$$

## Coordinate system

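Vector dimensions map onto **coordinate systems or perpendicular axes**.   
Coordinate systems have an origin at $(0,0,0)$. Hence, when we define a vector:

$$\bf{x} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \in \mathbb{R}^3$$

we are saying: starting from the origin, move 3 units along the first axis, 2 units along the second axis, and 1 unit along the third axis.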


## Graphical Example

<center> Fig. 4: Coordinate systems </center>

<center>
<img src="./images/b-coordinate-system.svg">
</center>

## Basic vector operations

### Vector-vector addition 

We used vector-vector addition to define vectors without defining vector-vector addition. Vector-vector addition is an element-wise operation, only defined for vectors of the same size (i.e., number of elements). Consider two vectors of the same size, then: 

$$
\bf{x} + \bf{y} = 
\begin{bmatrix}
x_1\\
\vdots\\
x_n
\end{bmatrix}+
\begin{bmatrix}
y_1\\
\vdots\\
y_n
\end{bmatrix} =
\begin{bmatrix}
x_1 + y_1\\
\vdots\\
x_n + y_n
\end{bmatrix}
$$

For instance:

$$
\bf{x} + \bf{y} = 
\begin{bmatrix}
1\\
2\\
3
\end{bmatrix}+
\begin{bmatrix}
1\\
2\\
3
\end{bmatrix} =
\begin{bmatrix}
1 + 1\\
2 + 2\\
3 + 3
\end{bmatrix} =
\begin{bmatrix}
2\\
4\\
6
\end{bmatrix}
$$

## PROPERTIES

Vector addition has a series of **fundamental properties** worth mentioning:

1. `Commutativity`: $x + y = y + x$
2. `Associativity`: $(x + y) + z = x + (y + z)$
3. `Adding the zero vector has no effect`: $x + 0 = 0 + x = x$
4. `Subtracting a vector from itself returns the zero vector`: $x - x = 0$
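These properties can be spot-checked numerically. The vectors below are arbitrary integer examples; a check on one example illustrates the properties but does not prove them.

```python
import numpy as np

x = np.array([[1], [2], [3]])
y = np.array([[4], [5], [6]])
z = np.array([[7], [8], [9]])
zero = np.zeros_like(x)

commutative = np.array_equal(x + y, y + x)
associative = np.array_equal((x + y) + z, x + (y + z))
identity = np.array_equal(x + zero, x) and np.array_equal(zero + x, x)
inverse = np.array_equal(x - x, zero)

print(commutative, associative, identity, inverse)
```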

## NUMPY 

In `NumPy`, we add two vectors of the same size with the `+` operator or the `np.add` function:

In [5]:
x = y = np.array([[1],
                  [2],
                  [3]])

In [6]:
x + y

array([[2],
       [4],
       [6]])

In [7]:
np.add(x,y)

array([[2],
       [4],
       [6]])

## Vector-scalar multiplication

Vector-scalar multiplication is an element-wise operation. It's defined as:

$$
\alpha \bf{x} = 
\begin{bmatrix}
\alpha x_1\\
\vdots \\
\alpha x_n
\end{bmatrix}
$$

Consider $\alpha = 2$ and $\bf{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$:

$$
\alpha \bf{x} = 
\begin{bmatrix}
2 \times 1\\
2 \times 2\\
2 \times 3
\end{bmatrix} = 
\begin{bmatrix}
2\\
4\\
6
\end{bmatrix}
$$

## PROPERTIES

Vector-scalar multiplication satisfies a series of important properties:

1. `Associativity`: $(\alpha \beta) \bf{x} = \alpha (\beta \bf{x})$
2. `Left-distributive property`: $(\alpha + \beta) \bf{x} = \alpha \bf{x} + \beta \bf{x}$
3. `Right-distributive property`: $\bf{x} (\alpha + \beta) = \bf{x} \alpha + \bf{x} \beta$
4. `Right-distributive property for vector addition`: $\alpha (\bf{x} + \bf{y}) = \alpha \bf{x} + \alpha \bf{y}$
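As with addition, a quick numerical spot check can build intuition. The scalars and vectors here are arbitrary integers chosen so that the comparisons are exact.

```python
import numpy as np

alpha, beta = 2, 3
x = np.array([[1], [2], [3]])
y = np.array([[4], [5], [6]])

associative = np.array_equal((alpha * beta) * x, alpha * (beta * x))
left_dist = np.array_equal((alpha + beta) * x, alpha * x + beta * x)
right_dist = np.array_equal(x * (alpha + beta), x * alpha + x * beta)
dist_over_add = np.array_equal(alpha * (x + y), alpha * x + alpha * y)

print(associative, left_dist, right_dist, dist_over_add)
```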

## NUMPY MULTIPLICATIONS 

In `NumPy`, we compute scalar-vector multiplication with the `*` operator:

In [8]:
alpha = 2
x = np.array([[1],
             [2],
             [3]])

In [9]:
alpha * x

array([[2],
       [4],
       [6]])

## Linear combinations of vectors {.smaller}

There are only two legal operations with vectors in linear algebra: **addition** and **multiplication by numbers**. When we combine those, we get a **linear combination**.

$$
\alpha \bf{x} + \beta \bf{y} = 
\alpha
\begin{bmatrix}
x_1 \\ 
x_2
\end{bmatrix}+
\beta
\begin{bmatrix}
y_1 \\ 
y_2
\end{bmatrix}=
\begin{bmatrix}
\alpha x_1 + \beta y_1\\ 
\alpha x_2 + \beta y_2
\end{bmatrix}
$$

Consider $\alpha = 2$, $\beta = 3$, $\bf{x}=\begin{bmatrix}2 \\ 3\end{bmatrix}$, and $\bf{y}=\begin{bmatrix}4 \\ 5\end{bmatrix}$.

We obtain:

$$
\alpha \bf{x} + \beta \bf{y} = 
2
\begin{bmatrix}
2 \\ 
3
\end{bmatrix}+
3
\begin{bmatrix}
4 \\ 
5
\end{bmatrix}=
\begin{bmatrix}
2 \times 2 + 3 \times 4\\ 
2 \times 3 + 3 \times 5
\end{bmatrix}=
\begin{bmatrix}
16 \\
21
\end{bmatrix}
$$

Another way to express linear combinations you'll see often is with summation notation. Consider a set of vectors $x_1, ..., x_k$ and scalars $\beta_1, ..., \beta_k \in \mathbb{R}$, then:   

$$
\sum_{i=1}^k \beta_i x_i := \beta_1x_1 + ... + \beta_kx_k
$$

Note that $:=$ means "*is defined as*".

Linear combinations are the most fundamental operation in linear algebra. Nearly everything else in linear algebra is built from them. For instance, the prediction of a linear regression model is a linear combination of the feature vectors. **Fig. 2** shows an example of what adding two geometric vectors looks like, for intuition.

## NUMPY LINEAR COMBINATION 

In `NumPy`, we do linear combinations as:

In [10]:
a, b = 2, 3
x , y = np.array([[2],[3]]), np.array([[4], [5]])

In [11]:
a*x + b*y

array([[16],
       [21]])
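The summation form extends the same idea to $k$ vectors. A minimal sketch, assuming three made-up vectors and coefficients: the first version follows the sum literally, the second stacks the vectors as columns of a matrix and multiplies by the coefficient vector.

```python
import numpy as np

# Hypothetical coefficients and vectors for a k = 3 linear combination.
betas = [2, 3, -1]
vectors = [np.array([[2], [3]]),
           np.array([[4], [5]]),
           np.array([[1], [1]])]

# Literal translation of the sum: beta_1 x_1 + ... + beta_k x_k
combo = sum(b * v for b, v in zip(betas, vectors))

# Equivalent matrix form: columns are the vectors, times the coefficients.
X = np.hstack(vectors)                         # shape (2, 3)
combo_matrix = X @ np.array(betas).reshape(-1, 1)

print(combo.ravel(), combo_matrix.ravel())
```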

## Vector-vector multiplication: dot product {.smaller}

Vector-vector multiplication is commonly known as a **dot product** or **inner product**. The dot product of $\bf{x}$ and $\bf{y}$ is defined as: 

$$
\bf{x} \cdot \bf{y} :=
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}^T
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix} =
\begin{bmatrix}
x_1 & x_2
\end{bmatrix}
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix} =
x_1 \times y_1 + x_2 \times y_2 
$$

Where the $T$ superscript denotes the transpose of the vector. Transposing a vector just means to "flip" the column vector to a row vector counterclockwise. For instance:

$$
\bf{x} \cdot \bf{y} =
\begin{bmatrix}
-2 \\
2
\end{bmatrix}^T
\begin{bmatrix}
4 \\
-3
\end{bmatrix} =
\begin{bmatrix}
-2 & 2
\end{bmatrix}
\begin{bmatrix}
4 \\
-3
\end{bmatrix} =
-2 \times 4 + 2 \times (-3) = (-8) + (-6) = -14  
$$



## Example


To multiply two vectors with dimensions (rows=2, cols=1) in `NumPy`, we need to transpose the first vector and then use the `@` operator:

In [1]:
import numpy as np
x, y = np.array([[-2],[2]]), np.array([[4],[-3]])
print(x)
print(y)

[[-2]
 [ 2]]
[[ 4]
 [-3]]


In [3]:
print(x.T)

[[-2  2]]


In [2]:
x.T @ y

array([[-14]])
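Note that `x.T @ y` returns a 1x1 array rather than a plain number. The sketch below shows a few equivalent ways to obtain the scalar; which one to prefer is a matter of taste and of how the vectors are stored.

```python
import numpy as np

x = np.array([[-2], [2]])
y = np.array([[4], [-3]])

as_matrix = x.T @ y                      # 1x1 array, as in the cell above
as_scalar = as_matrix.item()             # extract the single entry
flat = np.dot(x.ravel(), y.ravel())      # dot product of the flattened 1-D arrays
manual = sum(xi * yi for xi, yi in zip(x.ravel(), y.ravel()))  # the definition

print(as_matrix.shape, as_scalar, flat, manual)
```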

# Vector space, span, and subspace

## Vector space

In its more general form, a **`vector space`**, also known as **`linear space`**, is a collection of objects that follow the rules defined for vectors in $\mathbb{R}^n$.   
More colloquially, a vector space is the set of proper vectors and all possible linear combinations of the vector set.   
In addition, vector addition and multiplication must follow these eight rules:   

1. `Commutativity`: $x + y = y + x$
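2. `Associativity`: $x + (y + z) = (x + y) + z$
3. `Unique zero vector`: $x + 0 = x$ $\forall$ $x$ 
4. `Additive inverse`: $\forall$ $x$ there is a unique vector $-x$ such that $x + (-x) = 0$
5. `Identity element of scalar multiplication`: $1x = x$
6. `Distributivity of scalar multiplication w.r.t. vector addition`: $\alpha(x + y) = \alpha x + \alpha y$
7. `Compatibility of scalar multiplication`: $\alpha(\beta x) = (\alpha \beta)x$
8. `Distributivity of scalar multiplication w.r.t. scalar addition`: $(\alpha + \beta)x = \alpha x + \beta x$

Here $x$, $y$, and $z$ are vectors, and $\alpha$ and $\beta$ are scalars.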


## Vector span {.smaller}

Consider the vectors $\bf{x}$ and $\bf{y}$ and the scalars $\alpha$ and $\beta$. If we take *all* possible linear combinations $\alpha \bf{x} + \beta \bf{y}$, we obtain the **span** of such vectors. 

<center> Fig. 5: Vector Span </center>

<center>
<img src="./images/b-vector-span.svg">
</center>
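One way to test numerically whether a vector lies in the span of others is to solve a least-squares problem and check that the fit is exact. The helper `in_span` is a hypothetical name introduced for this sketch, and the vectors are arbitrary examples.

```python
import numpy as np

def in_span(vectors, target, tol=1e-10):
    """Return (is_in_span, coefficients) for `target` against `vectors`."""
    A = np.column_stack(vectors)                  # vectors become columns
    t = np.asarray(target, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, t, rcond=None)
    # target is in the span exactly when the best fit reproduces it
    return bool(np.allclose(A @ coeffs, t, atol=tol)), coeffs

x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
target = np.array([3.0, 1.0])

both_ok, both_coeffs = in_span([x, y], target)    # x and y together span the plane
only_x_ok, _ = in_span([x], target)               # x alone spans only a line

print(both_ok, both_coeffs, only_x_ok)
```

Here `target` equals $2\bf{x} + 1\bf{y}$, so it lies in the span of both vectors but not in the span of $\bf{x}$ alone.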

## Vector subspaces

A **`vector subspace` (or linear subspace) is a vector space that lies within a larger vector space**. Consider a subset $S$ of a vector space. For $S$ to be a valid subspace it has to meet **three conditions**:

1. Contains the zero vector, $\bf{0} \in S$
2. Closure under multiplication, $\forall \alpha \in \mathbb{R}, \forall s \in S: \alpha s \in S$
3. Closure under addition, $\forall s_1, s_2 \in S: s_1 + s_2 \in S$


##

<center> Fig. 6: Vector subspaces </center>

<center>
<img src="./images/b-vector-subspace.svg">
</center>

## Example  - Part 1

Is the span of $\bf{x}=\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ a valid subspace of $\mathbb{R}^2$?   
Let's evaluate the span of $\bf{x}$ on the three conditions:

**`Contains the zero vector`**: it does. Remember that the span of a vector is the set of all linear combinations (here, all scalar multiples) of that vector. Therefore, we can simply multiply by $0$ to get $\begin{bmatrix}0 \\ 0 \end{bmatrix}$:

$$
\bf{x}\times 0=0
\begin{bmatrix}
1 \\ 
1 
\end{bmatrix}
=
\begin{bmatrix}
0 \\ 
0 
\end{bmatrix}
$$

## Example - Part 2

**`Closure under multiplication`** implies that if we take any vector in the span of $\bf{x}$ and multiply it by any real scalar $\alpha$, the resulting vector stays within the span of $\bf{x}$. Algebraically, it is easy to see: any vector in the span has the form $\beta \bf{x}$, and $\alpha (\beta \bf{x}) = (\alpha \beta) \bf{x}$ is again a multiple of $\bf{x}$.

**`Closure under addition`** implies that if we add together any two vectors in the span of $\bf{x}$, the resulting vector remains within the span of $\bf{x}$. Again, algebraically it is clear: $\alpha \bf{x} + \beta \bf{x} = (\alpha + \beta) \bf{x}$, which is a multiple of $\bf{x}$. In particular, there is no way to get to $\mathbb{R}^3$ or $\mathbb{R}^4$ or any space outside the two-dimensional plane by adding multiples of $\bf{x}$. 
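The three conditions can also be spot-checked numerically for the span of $\bf{x}$. In this sketch, `in_span_of_x` is a hypothetical helper that relies on the fact that a multiple of $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ has two equal entries; the random scalars are arbitrary, and a numerical check illustrates the argument rather than proving it.

```python
import numpy as np

x = np.array([[1], [1]])

def in_span_of_x(v):
    """True if v is a scalar multiple of [1, 1]^T, i.e. both entries are equal."""
    return bool(np.isclose(v[0, 0], v[1, 0]))

rng = np.random.default_rng(0)            # seeded for reproducibility
alpha, beta, gamma = rng.normal(size=3)   # three arbitrary real scalars
u, w = alpha * x, beta * x                # two vectors in the span of x

contains_zero = in_span_of_x(0 * x)       # condition 1
closed_mult = in_span_of_x(gamma * u)     # condition 2
closed_add = in_span_of_x(u + w)          # condition 3
outside = in_span_of_x(np.array([[1.0], [2.0]]))  # a vector not on the line

print(contains_zero, closed_mult, closed_add, outside)
```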