# EVeMa 2018

![title](imgs/evema2018.png)

# ESCUELA DE VERANILLO EN MACHINE LEARNING EVeMa

# Topic: MATRICES

# Felipe Meza

<br>

**Why matrices in machine learning?**

- Matrices are a foundational element of linear algebra.
- Matrices are used throughout machine learning to describe algorithms and processes, for example the input data variable (X) used when training an algorithm.
- Matrix operations are used in the description of many machine learning algorithms.

<img src="imgs/mnist2.png" alt="Drawing" style="width: 500px;"/>
<h4 align="center">Matrix representation of an input data variable.</h4> 

Linear algebra provides a compact way to express many operations at once, for example the linear equations in the following system:

\begin{array}{c}
4x_{1}-5x_{2}=-13\\
-2x_{1}+3x_{2}=9
\end{array}

The system above has the same number of equations as unknowns, so it has a unique solution if the equations are linearly independent (no equation is a linear combination of the others). In matrix notation, the system is expressed as:

\begin{equation}
A\,\vec{x}=b
\end{equation}

with

\begin{equation}
A=\begin{bmatrix}4 & -5\\
-2 & 3
\end{bmatrix},\qquad b=\begin{bmatrix}-13\\
9
\end{bmatrix}
\end{equation}

In [1]:
# Speaking of systems of equations: numpy.linalg is NumPy's linear algebra module,
# and it can solve linear systems written in matrix form, such as the one above.

import numpy as np

# Create the 2x2 coefficient matrix A as an array
A = np.array([[4, -5],[-2, 3]])

# Likewise, create the right-hand side vector b
b = np.array([-13, 9]) 

# np.linalg.solve returns the vector x that satisfies A x = b
print ('Solutions:\n',np.linalg.solve(A, b))

Solutions:
 [ 3.  5.]
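As a quick sanity check (a small sketch that is not part of the original notebook), substituting the computed solution back into the system should reproduce $b$ up to floating-point rounding:

```python
import numpy as np

A = np.array([[4, -5], [-2, 3]])
b = np.array([-13, 9])

x = np.linalg.solve(A, b)

# Multiply back: A @ x should equal b (up to rounding), so the residual is ~0
residual = A @ x - b
print("x =", x)
print("residual =", residual)
print("consistent:", np.allclose(A @ x, b))
```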


The following notation will be used in the course material:

- $A\in\mathbb{R}^{m\times n}$ denotes a matrix with $m$ rows and $n$ columns whose entries are all real numbers.

- $\vec{x}\in\mathbb{R}^{n\times1}=\mathbb{R}^{n}$ denotes a vector with $n$ entries. By convention, an $n$-dimensional vector is a matrix with $n$ rows and $1$ column, known as a **column vector**:

\begin{equation}
\overrightarrow{x}=\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}
\end{equation}

and the $i$-th element of the vector is denoted $x_{i}$. A row vector is then written using the transpose:

\begin{equation}
\overrightarrow{x}^{T}=\begin{bmatrix}x_{1} & x_{2} & \ldots & x_{n}\end{bmatrix}
\end{equation}

- The entry of the matrix $A$ in row $i$ and column $j$ is denoted $a_{i,j}$ (other common notations are $A_{ij}$, $A_{i,j}$, $A\left(i,j\right)$, etc.):

\begin{equation}
A=\begin{bmatrix}a_{1,1} & a_{1,2} & \ldots & a_{1,n}\\
a_{2,1} & a_{2,2} & \ldots & a_{2,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m,1} & a_{m,2} & \ldots & a_{m,n}
\end{bmatrix}
\end{equation}

<img src="imgs/matrixnm.png" alt="Drawing" style="width: 200px;"/>

The $j$-th column of $A$ is denoted $\vec{a}_{:,j}$ or $A_{:,j}$, so that $A$ can be written in terms of its column vectors as:

\begin{equation}
A=\begin{bmatrix}| & | & \ldots & |\\
\overrightarrow{a}_{:,1} & \overrightarrow{a}_{:,2} & \ldots & \overrightarrow{a}_{:,n}\\
| & | & \ldots & |
\end{bmatrix}
\end{equation}

and the $i$-th row of the matrix is denoted $\vec{a}_{i,:}^{T}$ or $A_{i,:}$, so that in terms of its row vectors $A$ is expressed as:

\begin{equation}
A=\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{m,:}^{T} & -
\end{bmatrix}
\end{equation}
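The notation above maps directly onto NumPy indexing, with one caveat: NumPy counts from 0 while the formulas count from 1. A minimal sketch, reusing the 3x4 matrix from the next cell purely as an example:

```python
import numpy as np

# A 3x4 example matrix (m = 3 rows, n = 4 columns)
A = np.array([[10, 80, 75, 85],
              [20, 90, 85, 95],
              [5, 40, 30, 15]])

m, n = A.shape           # (3, 4)

# NumPy indices start at 0, so the math entry a_{2,3} is A[1, 2]
a_23 = A[1, 2]           # 85

# Column j (math A_{:,j}) is A[:, j-1]; here the 2nd column
col_2 = A[:, 1]          # [80, 90, 40]

# Row i (math A_{i,:}) is A[i-1, :]; here the 3rd row
row_3 = A[2, :]          # [5, 40, 30, 15]

# A column vector in R^{n x 1} can be stored explicitly as an (n, 1) array;
# its transpose is then a (1, n) row vector
x = np.array([[1], [2], [3], [4]])
print(m, n, a_23, col_2, row_3, x.shape, x.T.shape)
```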

<br>

In [2]:
# How can we create matrices in python?
# There are several ways to do that, let's explore two common methods:

# 1- using square brackets [[]] as nested lists

# a 3x4 matrix, where 3 is the number of rows and 4 is the number of columns.

A = [[10,80,75,85],[20,90,85,95],[5,40,30,15]]

print(A)
print("\n")

# 2- using NumPy arrays

A1 = np.array([[10,80,75,85],[20,90,85,95],[5,40,30,15]])

print(A1)

[[10, 80, 75, 85], [20, 90, 85, 95], [5, 40, 30, 15]]


[[10 80 75 85]
 [20 90 85 95]
 [ 5 40 30 15]]


In [3]:
# Type validation of A

type(A)

list

In [4]:
# Type validation of A1

type(A1)

numpy.ndarray

In [9]:
# 3- using the matrix type in numpy
# (note: the NumPy documentation no longer recommends np.matrix; regular arrays are preferred)

A2 = np.matrix( ((2,3), (3, 5)) )
print(A2)
print("\n")

# Type validation of A2

type(A2)

[[2 3]
 [3 5]]




numpy.matrixlib.defmatrix.matrix

In [10]:
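# we can also create matrices from a range of values with numpy, specifying the size of the matrix
# (the star import below makes reshape, append, insert, delete, s_, etc. available without the np. prefix)

from numpy import * 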

B = range(16)

B = reshape(B,(4,4)) 

print(B) 

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [11]:
B1 = reshape(B,(2,8)) 

print(B1) 

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]]


In [12]:
B2 = reshape(B,(2,9)) 

print(B2) 

# the number of elements must match: 16 values cannot fill a 2x9 (18-element) matrix

ValueError: cannot reshape array of size 16 into shape (2,9)

<img src="imgs/matdim.png" alt="Drawing" style="width: 200px;"/>

In [13]:
# to access specific entries of a matrix we index it just like nested lists, giving the subindices
# in square brackets; for a matrix Z, in the form Z[row][col] (NumPy also accepts Z[row, col])

# let's remember matrix B
print(B)
print("\n")

# We are interested in the first row (0)
print(B[0])
print("\n")

# We are interested in the last row (3)
print(B[3])
print("\n")

# We are interested in the "1" that belongs to row (0) and column (1)
print(B[0][1])
print("\n")

# We are interested in the "10" that belongs to row (2) and column (2)
print(B[2][2])


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[0 1 2 3]


[12 13 14 15]


1


10


In [43]:
# Exercise: we are interested in the "14", which belongs to row (?) and column (?)
# Replace row and col with the correct indices, then uncomment the line below.

# print(B[row][col])


In [14]:
# Another way to access the values of a matrix is with negative indices, where -1 refers to the
# last element, -2 to the second-to-last (penultimate) one, and so on

# again B matrix
print(B) 
print("\n")

# we are interested in the last row
print(B[-1]) 
print("\n")

# we are interested in the last row, the penultimate value
print(B[-1][-2]) 
print("\n")

# we are interested in the penultimate row, the antepenultimate value
print(B[-2][-3]) 


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[12 13 14 15]


14


9


In [18]:
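# Exercise: now we are interested in the "5", which belongs to row (?) and column (?)
# Replace row and col with the correct (possibly negative) indices, then uncomment the line below.

# print(B[row][col])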


In [19]:
# let's remember the matrix B again

print (B)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [23]:
# It is also possible to slice a matrix with numpy, using the format start:stop:step
# Here we keep only the first 3 rows (:3) and the columns 0 and 1 ([0,1])

print(B[:3,[0,1]])

# :3 means rows 0, 1, 2 (the stop index 3 is excluded) and [0,1] selects columns 0 and 1

[[0 1]
 [4 5]
 [8 9]]


In [24]:
# here we slice matrix B, keeping only the last two rows and the two central columns

print(B[2:4,[1,2]])

# 2:4 implies rows 2-3 and [1,2] implies columns 1 and 2

[[ 9 10]
 [13 14]]


In [26]:
# it is also possible to insert elements into a matrix,
# for instance to add a complete new row at the end with "append"

B1 = append(B,[[16, 17, 18, 19]],0)

# the last argument is the axis: 0 appends along the rows (adds a row), 1 along the columns

print (B)
print("\n")
print (B1)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]


In [28]:
# it is also possible to insert a column; here "insert" adds it before index 4 (i.e., at the end)

B2 = insert(B,[4],[[100],[200],[300],[400]],axis=1) 

# axis=1 means the insertion happens along the columns (axis=0 would insert a row)

print (B2)

[[  0   1   2   3 100]
 [  4   5   6   7 200]
 [  8   9  10  11 300]
 [ 12  13  14  15 400]]


In [39]:
# Another common task is to remove elements from a matrix or array,
# for example, to delete a row use "delete", here row i = 2

B3 = delete(B2,[2],0)

# the last argument is the axis: 0 deletes rows, 1 deletes columns

print (B2)
print("\n")
print (B3)

[[  0   1   2   3 100]
 [  4   5   6   7 200]
 [  8   9  10  11 300]
 [ 12  13  14  15 400]]


[[  0   1   2   3 100]
 [  4   5   6   7 200]
 [ 12  13  14  15 400]]


In [40]:
# to delete a column also use delete, here column j = 4

B4 = delete(B2, [4], 1)

# the last argument is the axis: 1 deletes columns (0 would delete rows)

print (B2)
print("\n")
print (B4)

[[  0   1   2   3 100]
 [  4   5   6   7 200]
 [  8   9  10  11 300]
 [ 12  13  14  15 400]]


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [33]:
# to delete several columns at once we can pass a slice built with s_

B5 = delete(B2, s_[0::3], 1)

# the last argument is the axis: 1 means the slice selects columns

print (B2)
print("\n")
print (B5)


# s_[x::y] selects every y-th index starting at x; here s_[0::3] removes columns 0 and 3

[[  0   1   2   3 100]
 [  4   5   6   7 200]
 [  8   9  10  11 300]
 [ 12  13  14  15 400]]


[[  1   2 100]
 [  5   6 200]
 [  9  10 300]
 [ 13  14 400]]


In [34]:
# max and min: it is possible to extract the maximum and minimum values of a matrix

print (B)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [35]:
B.max()

15

In [36]:
B.min()

0

**The identity and diagonal matrices**

The identity matrix $I\in\mathbb{R}^{n\times n}$ is a square matrix whose diagonal entries are all ones and whose remaining entries are zero:

\begin{equation}
I_{i,j}=\begin{cases}
1 & i=j\\
0 & i\neq j
\end{cases}
\end{equation}

It is the neutral element of matrix multiplication: denoting by $I_{k}$ the $k\times k$ identity matrix, for every $A\in\mathbb{R}^{m\times n}$ we have:

\begin{equation}
A\,I_{n}=A=I_{m}\,A
\end{equation}

The identity matrix is a particular case of a diagonal matrix, i.e., a matrix whose off-diagonal elements are all 0, denoted $D=\textrm{diag}\left(d_{1},d_{2},\ldots,d_{n}\right)$ with:

\begin{equation}
D_{i,j}=\begin{cases}
d_{i} & i=j\\
0 & i\neq j
\end{cases}
\end{equation}

so then $I=\textrm{diag}\left(1,1,\ldots,1\right)$.


<img src="imgs/iden.png" alt="Drawing" style="width: 400px;"/>

In [37]:
# in numpy we can create an identity matrix by specifying its size n (number of rows = number of columns)

np.identity(5)

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
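The diagonal matrix $\textrm{diag}\left(d_{1},\ldots,d_{n}\right)$ and the neutral-element property can be checked in a few lines. This is a small sketch with arbitrary example values; note that a non-square $A$ needs an identity of a different size on each side:

```python
import numpy as np

# np.diag builds D = diag(d_1, ..., d_n) from a 1-D array
D = np.diag([2, 5, 7])

# A non-square A in R^{2x3}: A I_3 = A and I_2 A = A
A = np.array([[1, 2, 3], [4, 5, 6]])
right = A @ np.identity(3)
left = np.identity(2) @ A

print(D)
print(np.array_equal(right, A), np.array_equal(left, A))

# np.diag applied to a 2-D array extracts its diagonal instead
print(np.diag(D))        # [2 5 7]
```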

**The transpose of a matrix**

The transpose of a matrix is obtained by turning its rows into columns. Given a matrix $A\in\mathbb{R}^{m\times n}$, its transpose is written $A^{T}\in\mathbb{R}^{n\times m}$ and its entries are given by:

\begin{equation}
\left(A^{T}\right)_{i,j}=A_{j,i}.
\end{equation}

The following are properties of the transpose:

- $\left(A^{T}\right)^{T}=A$
- $\left(AB\right)^{T}=B^{T}A^{T}$
- $\left(A+B\right)^{T}=A^{T}+B^{T}$.


In [51]:
AZ = np.array([[1,2,3],[4,5,6]])
BZ = np.array([[7,8,9],[10,11,12]])
print(AZ)
print("\n")
#print(BZ)
print("\n")
trans=np.transpose(AZ)
print(trans)
print("\n")
print(np.transpose(trans))

[[1 2 3]
 [4 5 6]]




[[1 4]
 [2 5]
 [3 6]]


[[1 2 3]
 [4 5 6]]


In [75]:
# Recommended practice: validate the properties above.
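One possible way to validate the three transpose properties numerically is sketched below. The matrices are arbitrary examples; a third matrix `C` (our own choice) is used for the product rule because $A+B$ needs two matrices of the same shape while $AC$ needs compatible inner dimensions:

```python
import numpy as np

# Hypothetical test matrices with compatible shapes
A = np.array([[1, 2, 3], [4, 5, 6]])        # 2x3
B = np.array([[7, 8, 9], [10, 11, 12]])     # 2x3, same shape as A (for A + B)
C = np.array([[1, 0], [2, 1], [0, 3]])      # 3x2 (for the product A C)

# (A^T)^T = A
p1 = np.array_equal(A.T.T, A)
# (A C)^T = C^T A^T
p2 = np.array_equal((A @ C).T, C.T @ A.T)
# (A + B)^T = A^T + B^T
p3 = np.array_equal((A + B).T, A.T + B.T)

print(p1, p2, p3)   # True True True
```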

**Symmetric matrices**

A square matrix $A\in\mathbb{R}^{n\times n}$ is symmetric if $A=A^{T}$ and antisymmetric if $A=-A^{T}$. For any matrix $A\in\mathbb{R}^{n\times n}$ it is easy to show that $A+A^{T}$ is symmetric and $A-A^{T}$ is antisymmetric, so it follows that any square matrix can be written as the sum of a symmetric and an antisymmetric matrix:
    
\begin{equation}
A=\frac{1}{2}\left(A+A^{T}\right)+ \frac{1}{2}\left(A-A^{T}\right).
\end{equation}

The set of symmetric $n\times n$ matrices is denoted $\mathbb{S}^{n}$, so that $A\in\mathbb{S}^{n}$ if $A$ is symmetric. Symmetric matrices appear very frequently in pattern recognition and have a number of very useful properties that we will see later.



In [58]:
# example of a symmetric matrix: S equals its transpose

S = np.array([[1,1,-1],[1,2,0],[-1,0,5]])
print(S)
print("\n")
trans=np.transpose(S)
print(trans)

[[ 1  1 -1]
 [ 1  2  0]
 [-1  0  5]]


[[ 1  1 -1]
 [ 1  2  0]
 [-1  0  5]]


In [60]:
# example of an antisymmetric matrix: its transpose equals -AS

AS = np.array([[0,-7,8],[7,0,-1],[-8,1,0]])
print(AS)
print("\n")
trans=np.transpose(AS)
print(trans)

[[ 0 -7  8]
 [ 7  0 -1]
 [-8  1  0]]


[[ 0  7 -8]
 [-7  0  1]
 [ 8 -1  0]]


In [61]:
# checking the symmetric + antisymmetric decomposition
# B is our original matrix; adding both parts recovers B

print(B)
print("\n")
S = 0.5*(B + np.transpose(B)) + 0.5*(B - np.transpose(B)) 

print (S)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]
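The cell above only shows that the two parts add back to $B$. A short sketch (same $B$, rebuilt with `np.arange` so the block is self-contained) can also confirm that each part really is symmetric or antisymmetric:

```python
import numpy as np

B = np.arange(16).reshape(4, 4)

sym = 0.5 * (B + B.T)        # symmetric part
antisym = 0.5 * (B - B.T)    # antisymmetric part

print(np.allclose(sym, sym.T))            # symmetric: S = S^T
print(np.allclose(antisym, -antisym.T))   # antisymmetric: K = -K^T
print(np.allclose(sym + antisym, B))      # the two parts add back to B
print(np.allclose(np.diag(antisym), 0))   # the diagonal of an antisymmetric matrix is zero
```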


**The trace of a matrix**

The trace of a square matrix $A\in\mathbb{R}^{n\times n}$ denoted as $\textrm{tr}\left(A\right)$ is the sum of the elements in the diagonal of a matrix:

\begin{equation}
\textrm{tr}\left(A\right)=\sum_{i=1}^{n}A_{i,i}
\end{equation}

The trace has the following properties:

- $\textrm{tr}\left(A\right)=\textrm{tr}\left(A^{T}\right)$
- Additivity: $\textrm{tr}\left(A+B\right)=\textrm{tr}\left(A\right)+\textrm{tr}\left(B\right)$
- Homogeneity: for $t\in\mathbb{R}$, $\textrm{tr}\left(t\,A\right)=t\,\textrm{tr}\left(A\right)$
- For $A$ and $B$ square, $\textrm{tr}\left(A\,B\right)=\textrm{tr}\left(B\,A\right)$ (more generally, this holds whenever $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times m}$)




In [62]:
# trace examples

# case for identity matrix 
IDN = np.eye(3)
T_IDN = np.trace(IDN)
print (IDN)
print("\n")

print (T_IDN)
print("\n")

# recall matrix B
print (B)
print("\n")

# trace calculate

print(np.trace(B))

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]


3.0


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


30
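The trace properties listed above can also be spot-checked on random integer matrices. This is a sketch that assumes NumPy 1.17+ (for `np.random.default_rng`). A random check only illustrates the properties and does not prove them:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3))
B = rng.integers(-5, 5, size=(3, 3))
t = 2.5

print(np.trace(A) == np.trace(A.T))                      # tr(A) = tr(A^T)
print(np.trace(A + B) == np.trace(A) + np.trace(B))      # additivity
print(np.isclose(np.trace(t * A), t * np.trace(A)))      # homogeneity
print(np.trace(A @ B) == np.trace(B @ A))                # tr(AB) = tr(BA)

# tr(CD) = tr(DC) also holds for non-square C (2x3) and D (3x2)
C = rng.integers(-5, 5, size=(2, 3))
D = rng.integers(-5, 5, size=(3, 2))
print(np.trace(C @ D) == np.trace(D @ C))
```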


**Matrix product**

The product of two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$ is the matrix:

\begin{equation}
C=A\,B\in\mathbb{R}^{m\times p}
\end{equation}

where:

\begin{equation}
C_{i,j}=A_{i,1}B_{1,j}+\ldots+A_{i,n}B_{n,j}=\sum_{k=1}^{n}A_{i,k}\,B_{k,j}
\end{equation}

Note that for the matrix product to be defined, the number of columns of $A$ must equal the number of rows of $B$. The particular cases of the matrix product are examined next.


<img src="imgs/2x2.png" alt="Drawing" style="width: 200px;"/>

In [66]:
# (2x3) times (3x2) should give a 2x2 matrix, but * is element-wise on arrays, so this fails

a = np.array([[ 1, 2 ,3], [ 4, 5 ,6]])
b = np.array([[7, 8], [ 9, 10], [ 11, 12]])

print (a)
print("\n")
print (b)
print("\n")
print (a*b)

[[1 2 3]
 [4 5 6]]


[[ 7  8]
 [ 9 10]
 [11 12]]




ValueError: operands could not be broadcast together with shapes (2,3) (3,2) 

In [67]:
print (a.dot(b))

# numpy arrays are not matrices: the standard operators *, +, -, / work element-wise on arrays
# with .dot we obtain the expected 2x2 matrix product

[[ 58  64]
 [139 154]]


In [68]:
# alternatively, you can create objects of the matrix type directly and apply dot; the result is the same

x = np.matrix( ((1,2,3), (4,5,6)) )
y = np.matrix( ((7,8), (9, 10),(11,12)) )
np.dot(x,y)

matrix([[ 58,  64],
        [139, 154]])

In [69]:
# since x and y are of the matrix type, the operator * performs the matrix product
x*y

matrix([[ 58,  64],
        [139, 154]])

**Vector-vector product, or dot product**

Given two vectors $\overrightarrow{x},\overrightarrow{y}\in\mathbb{R}^{n}$, their **inner product** (or dot product) is defined in terms of the matrix product as follows:

\begin{equation}
\overrightarrow{x}\cdot\overrightarrow{y}=\overrightarrow{x}^{T}\:\overrightarrow{y}\in\mathbb{R}^{1}=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{n}\end{bmatrix}\begin{bmatrix}y_{1}\\
y_{2}\\
\vdots\\
y_{n}
\end{bmatrix}=\sum_{i=1}^{n}x_{i}\:y_{i}
\end{equation}

Note that the inner product is a special case of matrix multiplication; moreover, it always holds that:

\begin{equation}
\overrightarrow{x}^{T}\overrightarrow{y}=\overrightarrow{y}^{T}\overrightarrow{x}.
\end{equation}

The **outer product**, on the other hand, is defined for two vectors $\overrightarrow{x}\in\mathbb{R}^{m\times1}$ and $\overrightarrow{y}\in\mathbb{R}^{n\times1}$ (not necessarily of the same dimension) as:

\begin{equation}
\overrightarrow{x} \otimes \overrightarrow{y}=\vec{x}\:\vec{y}^{T}\in\mathbb{R}^{m\times n}=\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{m}
\end{bmatrix}\begin{bmatrix}y_{1} & y_{2} & \cdots & y_{n}\end{bmatrix}=\begin{bmatrix}x_{1}y_{1} & x_{1}y_{2} & \cdots & x_{1}y_{n}\\
x_{2}y_{1} & x_{2}y_{2} & \cdots & x_{2}y_{n}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m}y_{1} & x_{m}y_{2} & \cdots & x_{m}y_{n}
\end{bmatrix}
\end{equation}

The outer product makes it possible, for example, to build a matrix $A\in\mathbb{R}^{m\times n}$ whose columns are all equal to a vector $\vec{x}\in\mathbb{R}^{m}$, using the vector of ones $\overrightarrow{1}\in\mathbb{R}^{n}$:

\begin{equation}
\overrightarrow{x}\,\overrightarrow{1}^{T}=\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{m}
\end{bmatrix}\begin{bmatrix}1 & 1 & \cdots & 1\end{bmatrix}=\begin{bmatrix}| & | &  & |\\
\vec{x} & \vec{x} & \cdots & \vec{x}\\
| & | &  & |
\end{bmatrix}
\end{equation}
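In NumPy, 1-D arrays make the inner and outer products easy to compare. The sketch below uses arbitrary example vectors. `np.outer(x, z)` computes $\vec{x}\,\vec{z}^{T}$, and pairing it with a vector of ones reproduces the "repeat $\vec{x}$ in every column" construction above:

```python
import numpy as np

x = np.array([1, 2, 3])     # x in R^3
y = np.array([4, 5, 6])     # y in R^3

# Inner (dot) product x^T y = sum_i x_i y_i; it is symmetric in x and y
inner = x @ y               # 1*4 + 2*5 + 3*6 = 32
print(inner, y @ x)

# Outer product x z^T: the vectors may have different lengths (here 3 and 2)
z = np.array([1, -1])
outer = np.outer(x, z)      # 3x2 matrix with entries x_i * z_j
print(outer)

# Outer product with the vector of ones (n = 4): every column equals x
X_rep = np.outer(x, np.ones(4))
print(X_rep)
```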



In [72]:
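F = np.array([[0, 5], [2, -1]])

print(F)
print("\n")

# element-wise (Hadamard) product: each entry multiplied by itself
GG = np.multiply(F, F)

print(GG)
print("\n")

# matrix product F F
G = np.dot(F, F)
print(G)
print("\n")

# for 2-D arrays, np.inner sums over the last axes, so the result is F F^T
GI = np.inner(F, F)
print(GI)
print("\n")

# np.outer flattens its inputs to 1-D first: this is the 4x4 outer product of [0, 5, 2, -1] with itself
GO = np.outer(F, F)
print(GO)
print("\n")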

[[ 0  5]
 [ 2 -1]]


[[ 0 25]
 [ 4  1]]


[[10 -5]
 [-2 11]]


[[25 -5]
 [-5  5]]


[[ 0  0  0  0]
 [ 0 25 10 -5]
 [ 0 10  4 -2]
 [ 0 -5 -2  1]]




**Matrix-vector product**

Given a matrix $A\in\mathbb{R}^{m\times n}$ and a column vector $\overrightarrow{x}\in\mathbb{R}^{n\times1}$, their product is the vector $\overrightarrow{y}=A\,\overrightarrow{x}\in\mathbb{R}^{m\times1}$.

If the matrix $A$ is written by rows, then $A\,\overrightarrow{x}$ can be expressed as:

\begin{equation}
\vec{y}=A\,\overrightarrow{x}=\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{m,:}^{T} & -
\end{bmatrix}\,\overrightarrow{x}=\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{m,:}^{T} & -
\end{bmatrix}\,\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}=\begin{bmatrix}\vec{a}_{1,:}^{T}\:\vec{x}\\
\vec{a}_{2,:}^{T}\:\vec{x}\\
\vdots\\
\vec{a}_{m,:}^{T}\:\vec{x}
\end{bmatrix}
\end{equation}

In other words, the $i$-th entry of $\vec{y}$, $y_{i}$, equals the inner product of the $i$-th row of $A$, $\vec{a}_{i,:}^{T}$, with the vector $\overrightarrow{x}$.

Alternatively, if the $A$ matrix is written in the form of columns, the matrix-vector product can be expressed as:

\begin{equation}
\overrightarrow{y}=A\,\overrightarrow{x}=\begin{bmatrix}| & | & \ldots & |\\
\vec{a}_{:,1} & \vec{a}_{:,2} & \ldots & \vec{a}_{:,n}\\
| & | & \ldots & |
\end{bmatrix}\,\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}=\left[\vec{a}_{:,1}\right]x_{1}+\left[\vec{a}_{:,2}\right]x_{2}+\ldots+\left[\vec{a}_{:,n}\right]x_{n}.
\end{equation}

This is easily verified by transposing the product:

\begin{equation}
\overrightarrow{y}^{T}=\vec{x}^{T}\,A^{T}=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{n}\end{bmatrix}\,\begin{bmatrix}- & \vec{a}_{:,1}^{T} & -\\
- & \vec{a}_{:,2}^{T} & -\\
 & \vdots\\
- & \vec{a}_{:,n}^{T} & -
\end{bmatrix}=x_{1}\left[\vec{a}_{:,1}^{T}\right]+x_{2}\left[\vec{a}_{:,2}^{T}\right]+\ldots+x_{n}\left[\vec{a}_{:,n}^{T}\right].
\end{equation}

The above expresses the fact that the vector $\overrightarrow{y}$ is a **linear combination** of the columns of the matrix $A$, with coefficients given by the entries of the vector $\overrightarrow{x}$.
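Both views of $A\,\vec{x}$ (row-wise inner products and a column-wise linear combination) can be computed explicitly and compared with NumPy's built-in product. A minimal sketch, using the same values as the example cell that follows:

```python
import numpy as np

A = np.array([[5, 1, 3],
              [1, 1, 1],
              [1, 2, 1]])
x = np.array([1, 2, 3])

# Row view: y_i is the inner product of row i of A with x
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: y is a linear combination of the columns of A,
# with coefficients x_1, ..., x_n
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(y_rows, y_cols, A @ x)   # all three are [16 6 8]
```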

In [17]:
# matrix-vector example: a 3x3 matrix times a vector of length 3 gives a vector of length 3

a = np.array([[ 5, 1 ,3], [ 1, 1 ,1], [ 1, 2 ,1]])
b = np.array([1, 2, 3])

print (a.dot(b))


[16  6  8]


In [18]:
# in Python 3.5+ (with NumPy 1.10+) the @ operator performs the matrix product

print(a @ b)

[16  6  8]


**Matrix-matrix product**

The product of two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$, $C=A\,B\in\mathbb{R}^{m\times p}$, can be defined in terms of the rows of $A$ and the columns of $B$: **each entry $C_{i,j}$ is the inner product of row $i$ of $A$ and column $j$ of $B$**. Symbolically:


\begin{equation}
C=A\,B=\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{m,:}^{T} & -
\end{bmatrix}\,\begin{bmatrix}| & | & \ldots & |\\
\vec{b}_{:,1} & \vec{b}_{:,2} & \ldots & \vec{b}_{:,p}\\
| & | & \ldots & |
\end{bmatrix}=\begin{bmatrix}\vec{a}_{1,:}^{T}\vec{b}_{:,1} & \vec{a}_{1,:}^{T}\vec{b}_{:,2} & \cdots & \vec{a}_{1,:}^{T}\vec{b}_{:,p}\\
\vec{a}_{2,:}^{T}\vec{b}_{:,1} & \vec{a}_{2,:}^{T}\vec{b}_{:,2} & \cdots & \vec{a}_{2,:}^{T}\vec{b}_{:,p}\\
\vdots & \vdots & \ddots & \vdots\\
\vec{a}_{m,:}^{T}\vec{b}_{:,1} & \vec{a}_{m,:}^{T}\vec{b}_{:,2} & \cdots & \vec{a}_{m,:}^{T}\vec{b}_{:,p}
\end{bmatrix}
\end{equation}

Equivalently, writing $B$ by columns, each column of $C$ is a matrix-vector product:

\begin{equation}
C=A\,B=\begin{bmatrix}| & | & \ldots & |\\
A\vec{b}_{:,1} & A\vec{b}_{:,2} & \ldots & A\vec{b}_{:,p}\\
| & | & \ldots & |
\end{bmatrix}
\end{equation}

The last equality expresses the fact that the $j$-th column of $C$ is a linear combination of the columns of $A$, with coefficients given by the column vector $\vec{b}_{:,j}$.

The following properties of the matrix product are easily verified:

- Associativity: $\left(A\,B\right)C=A\left(B\,C\right)$.
- Distributivity: $A\left(B+C\right)=A\,B+A\,C$.
- Non-commutativity: in general, $A\,B\neq B\,A$.


In [76]:
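# first recall F

print (F)

# note: np.multiply is the element-wise (Hadamard) product, not the matrix product;
# the matrix product F F is np.dot(F, F) or F @ F
np.multiply(F, F)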

[[ 0  5]
 [ 2 -1]]


array([[ 0, 25],
       [ 4,  1]])

In [77]:
# Recommended practice: validate the properties above.
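A possible validation of the three matrix-product properties, using small integer matrices chosen only for illustration. Integer arithmetic is exact here, so `np.array_equal` can be used instead of a tolerance:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 3]])

assoc = np.array_equal((A @ B) @ C, A @ (B @ C))        # (AB)C = A(BC)
distrib = np.array_equal(A @ (B + C), A @ B + A @ C)    # A(B + C) = AB + AC
commutes = np.array_equal(A @ B, B @ A)                 # AB = BA?

print(assoc, distrib, commutes)   # True True False
```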

**The inverse matrix**

The inverse of a square matrix $A\in\mathbb{R}^{n\times n}$, denoted $A^{-1}$, is the unique matrix that satisfies:

\begin{equation}
A^{-1}A=I=A\,A^{-1}
\end{equation}

Note that not every matrix has an inverse: non-square matrices have no inverse by definition, and even some square matrices have no inverse.

- $A$ is said to be **invertible** (or non-singular) if $A^{-1}$ exists, which happens exactly when $A$ has **full rank**.
- If $A^{-1}$ does not exist, the matrix is said to be **not invertible** (or singular).

The following are the properties of the inverse, assuming that $A,B\in\mathbb{R}^{n\times n}$ are non-singular:


- $\left(A^{-1}\right)^{-1}=A$.
- $\left(A\,B\right)^{-1}=B^{-1}A^{-1}$.
- $\left(A^{-1}\right)^{T}=\left(A^{T}\right)^{-1}$


In [34]:
# a regular array is used (np.matrix is no longer recommended); the name M avoids shadowing numpy's "matrix"
M = np.array([[1, 4],[2, 0]])

inverse = np.linalg.inv(M)
print(inverse)

[[ 0.     0.5  ]
 [ 0.25  -0.125]]
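To go beyond printing the inverse, a short sketch can check the defining equation $A^{-1}A=I=A\,A^{-1}$ and the three listed properties. It also shows that NumPy refuses to invert a singular matrix. The second matrix `B` and the singular matrix `S` are our own examples:

```python
import numpy as np

A = np.array([[1, 4], [2, 0]])
B = np.array([[2, 1], [1, 1]])     # another non-singular matrix (det = 1)
A_inv = np.linalg.inv(A)
I = np.identity(2)

checks = {
    "A^-1 A = I = A A^-1": np.allclose(A_inv @ A, I) and np.allclose(A @ A_inv, I),
    "(A^-1)^-1 = A": np.allclose(np.linalg.inv(A_inv), A),
    "(AB)^-1 = B^-1 A^-1": np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv),
    "(A^-1)^T = (A^T)^-1": np.allclose(A_inv.T, np.linalg.inv(A.T)),
}
for name, ok in checks.items():
    print(name, ok)

# A singular matrix (second row = 2 * first row) has no inverse
S = np.array([[1, 2], [2, 4]])
singular_detected = False
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    singular_detected = True
print("singular matrix rejected:", singular_detected)
```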



**Range and null space of the matrix**

The **span** (generated space) of a set of vectors $\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\}$, $\vec{a}_{i}\in\mathbb{R}^{n}$, is the set of all vectors that can be expressed as a linear combination of those vectors:

\begin{equation}
\textrm{span}\left(\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\} \right)=\left\{ \vec{v}:\vec{v}=\sum_{i=1}^{m}x_{i}\vec{a}_{i},\qquad x_{i}\in\mathbb{R}\right\} .
\end{equation}

It can be shown that if the set $\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\}$, $\vec{a}_{i}\in\mathbb{R}^{n}$, contains $n$ **linearly independent** vectors (which requires $m\geq n$), then its span is the whole space:

\begin{equation}
\textrm{span}\left(\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\} \right)=\mathbb{R}^{n}.
\end{equation}


For example, the unit vectors $\hat{i}=\begin{bmatrix}1\\
0\\
0
\end{bmatrix}$, $\hat{j}=\begin{bmatrix}0\\
1\\
0
\end{bmatrix}$ and $\hat{k}=\begin{bmatrix}0\\
0\\
1
\end{bmatrix}$ are linearly independent, and it is easy to see that their linear combinations can generate any vector in the space $\mathbb{R}^{3}$. For example, the vector

$\vec{v}=\begin{bmatrix}3\\
5\\
7
\end{bmatrix}$ can be represented as:

\begin{equation}
\vec{v}=3\hat{i}+5\hat{j}+7\hat{k}=3\,\begin{bmatrix}1\\
0\\
0
\end{bmatrix}+5\,\begin{bmatrix}0\\
1\\
0
\end{bmatrix}+7\,\begin{bmatrix}0\\
0\\
1
\end{bmatrix}
\end{equation}

so $\vec{v}\in\textrm{span}\left(\left\{ \hat{i},\hat{j},\hat{k}\right\} \right)=\mathbb{R}^{3}$, in this case with $x_{1}=3$, $x_{2}=5$ and $x_{3}=7$.

**The projection of a vector** $\vec{y}\in\mathbb{R}^{n}$ onto the span of the set $\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\}$, $\vec{a}_{i}\in\mathbb{R}^{n}$, is the vector $\vec{v}\in\textrm{span}\left(\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\} \right)$ that is as close as possible to $\vec{y}$, measured for example with the Euclidean norm $\left\Vert \vec{v}-\vec{y}\right\Vert _{2}$. Formally:

\begin{equation}
\textrm{proj}\left(\vec{y};\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\} \right)=\textrm{argmin}_{\vec{v}\in\textrm{span}\left(\left\{ \vec{a}_{1},\vec{a}_{2},\ldots,\vec{a}_{m}\right\} \right)}\left\Vert \vec{v}-\vec{y}\right\Vert _{2}.
\end{equation}

On the other hand, the **column space** of a matrix $A\in\mathbb{R}^{m\times n}$, denoted $\mathcal{C}\left(A\right)$, is the span of the columns of $A$:

\begin{equation}
\mathcal{C}\left(A\right)=\left\{ \vec{v}\in\mathbb{R}^{m}:\vec{v}=A\,\vec{x},\;\vec{x}\in\mathbb{R}^{n}\right\} ,
\end{equation}

where we recall that the product $A\,\vec{x}$ is a linear combination of the columns of $A$ with coefficients given by $\vec{x}$:

\begin{equation}
A\,\overrightarrow{x}=\begin{bmatrix}| & | & \ldots & |\\
\vec{a}_{:,1} & \vec{a}_{:,2} & \ldots & \vec{a}_{:,n}\\
| & | & \ldots & |
\end{bmatrix}\,\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}=x_{1}\left[\vec{a}_{:,1}\right]+x_{2}\left[\vec{a}_{:,2}\right]+\ldots+x_{n}\left[\vec{a}_{:,n}\right],
\end{equation}

so the column space of $A$ equals:

\begin{equation}
\mathcal{C}\left(A\right)=\textrm{span}\left(\left\{ \vec{a}_{:,1},\vec{a}_{:,2},\ldots,\vec{a}_{:,n}\right\} \right)=\left\{ \vec{v}:\vec{v}=\sum_{i=1}^{n}x_{i}\vec{a}_{:,i},\qquad x_{i}\in\mathbb{R}\right\} .
\end{equation}

Assume that $A$ has **full rank** (i.e., rank $n$) and that $n<m$. To project a vector $\vec{y}\in\mathbb{R}^{m}$ onto the column space of $A$ we have:

\begin{equation}
\textrm{proj}\left(\vec{y};A\right)=\textrm{argmin}_{\vec{v}\in\mathcal{C}\left(A\right)}\left\Vert \vec{v}-\vec{y}\right\Vert _{2}=\textrm{argmin}_{\vec{x}}\sqrt{\left(A\:\vec{x}-\vec{y}\right)\cdotp\left(A\:\vec{x}-\vec{y}\right)}
\end{equation}

\begin{equation}
\Rightarrow\textrm{proj}\left(\vec{y};A\right)=\textrm{argmin}_{\vec{x}}\sqrt{\left(A\:\vec{x}-\vec{y}\right)^{T}\left(A\:\vec{x}-\vec{y}\right)}
\end{equation}

Finding the vector $\vec{x}$ that minimizes $\left(A\:\vec{x}-\vec{y}\right)^{T}\left(A\:\vec{x}-\vec{y}\right)$ is known as the **least-squares** problem. The projection objective is usually squared in this way, since squaring a nonnegative quantity does not change where its minimum is attained:

\begin{equation}
\textrm{argmin}_{\vec{v}\in\mathcal{C}\left(A\right)}\left\Vert \vec{v}-\vec{y}\right\Vert _{2}^{2}=\textrm{argmin}_{\vec{x}}\left(A\:\vec{x}-\vec{y}\right)\cdotp\left(A\:\vec{x}-\vec{y}\right).
\end{equation}

Using matrix calculus (setting the gradient of this objective with respect to $\vec{x}$ to zero), it can be shown that:

\begin{equation}
\textrm{proj}\left(\vec{y};A\right)=\textrm{argmin}_{\vec{v}\in\mathcal{C}\left(A\right)}\left\Vert \vec{v}-\vec{y}\right\Vert _{2}=A\,\left(A^{T}A\right)^{-1}A^{T}\vec{y}
\end{equation}
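The closed form above can be checked numerically. This sketch uses a hypothetical full-column-rank $3\times 2$ matrix $A$ and a target $\vec{y}$ chosen only for illustration. It compares the formula with `np.linalg.lstsq`, which solves the same least-squares problem without forming $\left(A^{T}A\right)^{-1}$ explicitly; with real data, that route is usually the numerically safer choice:

```python
import numpy as np

# Hypothetical full-column-rank A (m = 3 > n = 2) and a target y in R^3
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([6.0, 0.0, 0.0])

# Projection via the closed form A (A^T A)^{-1} A^T y
P = A @ np.linalg.inv(A.T @ A) @ A.T
proj = P @ y                              # here [5, 2, -1]

# Same projection via least squares: x* = argmin ||A x - y||, then proj = A x*
x_star, *_ = np.linalg.lstsq(A, y, rcond=None)

print(proj, A @ x_star)
print(A.T @ (y - proj))                   # ~0: the residual is orthogonal to every column of A
print(np.allclose(P @ P, P))              # projecting twice changes nothing
```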

For the case in which $A$ consists of a single column $\vec{a}\in\mathbb{R}^{m}$ (i.e., the span of a single vector), we obtain the special case of projecting one vector onto another:

\begin{equation}
\textrm{proj}\left(\vec{y};\vec{a}\right)=\frac{\vec{a}\,\vec{a}^{T}}{\vec{a}^{T}\,\vec{a}}\vec{y}
\end{equation}

Note that when the generating set has a single vector, the subspace it generates consists only of scalings of that vector, yet the projected vector keeps the original dimension (which is why this is called a projection onto a subspace). The figure below shows the projection of one vector onto another.


<img src="imgs/proy.png" alt="Drawing" style="width: 200px;"/>
<h4 align="center">Projection of vector $\vec{a}$ onto $\vec{b}$.</h4> 

        function proyectar
            % project v1 onto v2 and plot the vectors
            v1 = [3; 7];
            v2 = [9; 1];
            proy = proyectarVector(v1, v2);
            figure;
            plotv([proy v1]);
            figure;
            plotv([v2 v1]);
        end
        function proyec = proyectarVector(b, a)
            % projects b onto a: (a a' / a' a) * b
            coefMatricial = ((a * a') / (a' * a));
            proyec = coefMatricial * b;
        end
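A rough Python/NumPy equivalent of the MATLAB sketch above (our own translation, not the author's code) follows. The plotting calls are omitted, and the helper name `project_onto_vector` is hypothetical. It applies the single-vector formula $\frac{\vec{a}\,\vec{a}^{T}}{\vec{a}^{T}\vec{a}}\vec{y}$ and checks that the residual is orthogonal to the direction vector:

```python
import numpy as np

def project_onto_vector(b, a):
    """Hypothetical helper: project b onto the line spanned by a, i.e. (a a^T / a^T a) b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)   # column vector
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    coef_matrix = (a @ a.T) / (a.T @ a)             # rank-1 projection matrix
    return (coef_matrix @ b).ravel()

v1 = np.array([3.0, 7.0])
v2 = np.array([9.0, 1.0])
p = project_onto_vector(v1, v2)   # (34 / 82) * [9, 1]
print(p)                          # approx [3.7317 0.4146]
print(np.dot(v2, v1 - p))         # approx 0: the residual is orthogonal to v2
```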

The **null space** of a matrix $A\in\mathbb{R}^{m\times n}$ is the set of all vectors that, when multiplied by $A$, give the zero vector; it is denoted:

\begin{equation}
\mathcal{N}\left(A\right)=\left\{ \vec{x}\in\mathbb{R}^{n}:A\,\vec{x}=0\right\} 
\end{equation}
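One common way to compute a basis of $\mathcal{N}\left(A\right)$ numerically is through the singular value decomposition. The right singular vectors associated with zero singular values span the null space. The helper below is a hypothetical sketch with an ad hoc tolerance; SciPy also provides `scipy.linalg.null_space` for the same purpose:

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Hypothetical helper: orthonormal basis of N(A), one basis vector per column."""
    A = np.asarray(A, dtype=float)
    _, s, Vt = np.linalg.svd(A)          # full_matrices=True by default: Vt is n x n
    rank = int(np.sum(s > tol))          # number of (numerically) nonzero singular values
    return Vt[rank:].T                   # remaining right singular vectors span N(A)

# Rank-1 example: the second row is twice the first, so N(A) is 2-dimensional
A = np.array([[1, 2, 3],
              [2, 4, 6]])
N = null_space_basis(A)

print(N.shape)                 # (3, 2)
print(np.allclose(A @ N, 0))   # True: every basis vector satisfies A x = 0
```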



**Linear independence and matrix rank**

A set of vectors $\left\{ \vec{x}_{1},\vec{x}_{2},\ldots,\vec{x}_{n}\right\} \subset\mathbb{R}^{m}$ is said to be **linearly independent** if no vector in the set can be represented as a linear combination of the remaining vectors. Otherwise, if one of the vectors can be represented as a linear combination of the others, the vectors are **linearly dependent**, which is expressed as:

\begin{equation}
\vec{x}_{j}=\sum_{i\neq j}\alpha_{i}\vec{x}_{i}
\end{equation}

for some scalars $\alpha_{i}\in\mathbb{R}$; in that case the vector $\vec{x}_{j}\in\mathbb{R}^{m}$ is said to be linearly dependent on the vectors $\vec{x}_{i}$.

The **column rank** of a matrix $A\in\mathbb{R}^{m\times n}$ is the largest number of linearly independent columns of $A$; similarly, the **row rank** is the largest number of linearly independent rows of the matrix.

For any matrix $A\in\mathbb{R}^{m\times n}$ the row rank and the column rank are equal, so this common number of linearly independent rows (or columns) is simply called the **rank**, written $\textrm{rank}\left(A\right)$. It has the following properties:

- $\forall A\in\mathbb{R}^{m\times n}$, $\textrm{rank}\left(A\right)\leq\min\left(m,n\right)$, and if $\textrm{rank}\left(A\right)=\min\left(m,n\right)$, $A$ is said to have **full rank**.
- $\textrm{rank}\left(A\right)=\textrm{rank}\left(A^{T}\right)$
- $\textrm{rank}\left(A\,B\right)\leq\min\left(\textrm{rank}\left(A\right),\textrm{rank}\left(B\right)\right)$
- $\textrm{rank}\left(A+B\right)\leq\textrm{rank}\left(A\right)+\textrm{rank}\left(B\right)$

Example:

Observe the following matrix:

\begin{equation}
\begin{bmatrix}1 & 2 & -1 & 3 & -2\\
2 & 1 & 0 & 1 & 1\\
2 & 4 & -2 & 6 & -4\\
0 & 0 & 0 & 0 & 0\\
5 & 4 & -1 & 5 & 0
\end{bmatrix}
\end{equation}

It is easy to see that row $f_{3}=2f_{1}$ and that $f_{5}=2f_{2}+f_{1}$, while the null row $f_{4}$ is trivially a linear combination of any other row (with all coefficients equal to zero). Only $f_{1}$ and $f_{2}$ are linearly independent, so the rank of this matrix is 2.
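The row relations and the resulting rank can be confirmed with `np.linalg.matrix_rank`, which estimates the rank from the singular values. This is a minimal check on the same matrix. Computing the rank of the transpose as well illustrates that row rank equals column rank:

```python
import numpy as np

M = np.array([[1, 2, -1, 3, -2],
              [2, 1, 0, 1, 1],
              [2, 4, -2, 6, -4],
              [0, 0, 0, 0, 0],
              [5, 4, -1, 5, 0]])

print(np.array_equal(M[2], 2 * M[0]))          # f3 = 2 f1
print(np.array_equal(M[4], 2 * M[1] + M[0]))   # f5 = 2 f2 + f1
print(np.linalg.matrix_rank(M))                # 2
print(np.linalg.matrix_rank(M.T))              # 2: row rank = column rank
```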

# Complementary topics...

**Orthogonal matrices**

Two vectors $\vec{x},\vec{y}\in\mathbb{R}^{n}$ are orthogonal if $\vec{x}^{T}\vec{y}=0$. It is said that a vector $\vec{x}\in\mathbb{R}^{n}$ is normalized if $\left\Vert \vec{x}\right\Vert _{2}=1$.

A square matrix $U\in\mathbb{R}^{n\times n}$ is **orthogonal** if all its columns are orthogonal to each other. If, in addition, all its columns are normalized, the matrix is said to be **orthonormal**.

The following are properties of orthonormal matrices:
    
- For every orthonormal matrix $U\in\mathbb{R}^{n\times n}$, $U^{T}U=I=U\,U^{T}$, and since $U\,U^{-1}=I$, it follows that $U^{T}=U^{-1}$. If $U\in\mathbb{R}^{m\times n}$ with $n<m$ (not square) but its columns are orthonormal, then $U^{T}U=I$ but $U\,U^{T}\neq I$.

- For every orthonormal matrix $U\in\mathbb{R}^{n\times n}$ and vector $\vec{x}\in\mathbb{R}^{n}$, multiplying the vector by $U$ does not change its Euclidean norm:

\begin{equation}
\left\Vert U\,\vec{x}\right\Vert _{2}=\left\Vert \vec{x}\right\Vert _{2}
\end{equation}
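Norm preservation follows directly from $U^{T}U=I$, because $\left\Vert U\vec{x}\right\Vert _{2}^{2}=\vec{x}^{T}U^{T}U\,\vec{x}=\vec{x}^{T}\vec{x}=\left\Vert \vec{x}\right\Vert _{2}^{2}$. The sketch below checks these properties numerically. As an assumption for illustration, it obtains orthonormal matrices from the Q factor of `np.linalg.qr` applied to random matrices (the default reduced mode returns an $m\times n$ factor when $m>n$); the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# The Q factor of the QR decomposition of a random square matrix is orthonormal
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
I4 = np.eye(4)
print(np.allclose(U.T @ U, I4), np.allclose(U @ U.T, I4))   # True True
print(np.allclose(np.linalg.inv(U), U.T))                   # True: U^{-1} = U^T

# Multiplying by U leaves the Euclidean norm unchanged
x = rng.standard_normal(4)
print(np.linalg.norm(U @ x), np.linalg.norm(x))             # equal up to rounding

# Non-square case (m = 5 > n = 3): reduced QR returns a 5x3 U with orthonormal columns
U_tall, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(U_tall.shape)                                         # (5, 3)
print(np.allclose(U_tall.T @ U_tall, np.eye(3)))            # True
print(np.allclose(U_tall @ U_tall.T, np.eye(5)))            # False
```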

**Determinant of a matrix**

The determinant of a square matrix $A\in\mathbb{R}^{n\times n}$ is a function $\textrm{det}:\mathbb{R}^{n\times n}\rightarrow\mathbb{R}$, written $\textrm{det}\left(A\right)$. Before giving the formula that defines the determinant, we examine its geometric interpretation. Consider a matrix written in terms of its rows:

\begin{equation}
A=\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{n,:}^{T} & -
\end{bmatrix}
\end{equation}

Consider the set of points $S\subset\mathbb{R}^{n}$ formed by all linear combinations of the row vectors $\vec{a}_{i,:}$ whose coefficients satisfy $0\leq\alpha_{i}\leq1$, $i=1,\ldots,n$; formally:

\begin{equation}
S=\left\{ \vec{v}\in\mathbb{R}^{n}:\vec{v}=\sum_{i=1}^{n}\alpha_{i}\vec{a}_{i,:},\qquad0\leq\alpha_{i}\leq1,i=1,\ldots,n\right\} 
\end{equation}

The absolute value of the determinant of $A$, $\left|\textrm{det}\left(A\right)\right|$, is the "volume" of the set $S$.

For example, for the matrix $A\in\mathbb{R}^{2\times2}$:

\begin{equation}
A=\begin{bmatrix}1 & 3\\
3 & 2
\end{bmatrix}
\end{equation}

whose row vectors are given by:

\begin{equation}
\vec{a}_{1,:}=\begin{bmatrix}1\\
3
\end{bmatrix}\qquad\vec{a}_{2,:}=\begin{bmatrix}3\\
2
\end{bmatrix}
\end{equation}

The set $S$ is shown shaded in the figure below. Note that the "extreme" vertex $\vec{a}_{1,:}+\vec{a}_{2,:}=\begin{bmatrix}4\\ 5\end{bmatrix}$ is obtained with $\alpha_{1}=\alpha_{2}=1$. The determinant of a $2\times2$ matrix is defined as:

\begin{equation}
\textrm{det}\left(\begin{bmatrix}a & b\\
c & d
\end{bmatrix}\right)=a\,d-b\,c
\end{equation}

and for a general $n\times n$ matrix, the determinant is defined recursively (expanding along the first row) as:

\begin{equation}
\textrm{det}\left(A\right)=A_{1,1}\textrm{det}\left(A_{\backslash1,\backslash1}\right)-A_{1,2}\textrm{det}\left(A_{\backslash1,\backslash2}\right)+\ldots\pm A_{1,n}\textrm{det}\left(A_{\backslash1,\backslash n}\right)
\end{equation}

Equivalently, the expansion can be carried out along any row $i$ or any column $j$:

\begin{equation}
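\begin{aligned}
\textrm{det}\left(A\right) & =\sum_{i=1}^{n}\left(-1\right)^{i+j}A_{i,j}\,\textrm{det}\left(A_{\backslash i,\backslash j}\right) & & \textrm{(for any fixed }j\textrm{)}\\
 & =\sum_{j=1}^{n}\left(-1\right)^{i+j}A_{i,j}\,\textrm{det}\left(A_{\backslash i,\backslash j}\right) & & \textrm{(for any fixed }i\textrm{)}
\end{aligned}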
\end{equation}
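The recursive definition translates almost line by line into code. The sketch below is a hypothetical helper, `det_cofactor`, that expands along the first row and compares the result with `np.linalg.det`; the $3\times3$ test matrix is an arbitrary example. Note that 0-based indexing turns the sign $\left(-1\right)^{1+j}$ into `(-1) ** j`.

```python
import numpy as np

def det_cofactor(A):
    """Hypothetical helper: determinant by recursive cofactor expansion along row 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n):
        raise ValueError("A must be square")
    if n == 1:
        return A[0, 0]
    if n == 2:
        # Base case: ad - bc
        return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    total = 0.0
    for j in range(n):
        # Submatrix A_{\1,\j}: delete the first row and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # Sign (-1)^(1+j) in 1-based indexing is (-1)^j in 0-based indexing
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A2 = [[1, 3], [3, 2]]
A3 = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
print(det_cofactor(A2))                       # -7.0
print(det_cofactor(A3), np.linalg.det(A3))    # -54.0 and approximately -54
```

The recursion performs on the order of $n!$ multiplications, so it is only practical for small matrices; `np.linalg.det` instead uses an LU factorization.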

Note that the determinant is a linear combination of the determinants of the submatrices $A_{\backslash i,\backslash j}$, obtained by deleting row $i$ and column $j$ of $A$, each multiplied by the element $A_{i,j}$ and by the sign $\left(-1\right)^{i+j}$. For the example matrix $A=\begin{bmatrix}1 & 3\\ 3 & 2 \end{bmatrix}$, the determinant is $\textrm{det}\left(A\right)=1\cdot2-3\cdot3=-7$, and its absolute value $\left|\textrm{det}\left(A\right)\right|=7$ is the area of the parallelogram formed by the set of points $S$ (in $n$ dimensions, $S$ is called a parallelotope).

<img src="imgs/regEjemplo.png" alt="Drawing" style="width: 400px;"/>
<h4 align="center">Example region $S$.</h4> 
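The geometric interpretation can be checked independently of the determinant formula: the vertices of $S$ are $\vec{0}$, $\vec{a}_{1,:}$, $\vec{a}_{1,:}+\vec{a}_{2,:}$ and $\vec{a}_{2,:}$ (coefficients $\alpha$ at the corners of the unit square), and the area of that polygon can be computed with the shoelace formula. The helper `polygon_area` is a hypothetical name used only for this sketch.

```python
import numpy as np

def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon with vertices listed in order."""
    x, y = np.asarray(vertices, dtype=float).T
    # Sum over edges of x_k * y_{k+1} - x_{k+1} * y_k, wrapping around with np.roll
    return 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

A = np.array([[1.0, 3.0], [3.0, 2.0]])
a1, a2 = A[0], A[1]                  # row vectors a_{1,:} and a_{2,:}
origin = np.zeros(2)
# Corners of S for (alpha_1, alpha_2) = (0,0), (1,0), (1,1), (0,1)
vertices = [origin, a1, a1 + a2, a2]

print(polygon_area(vertices))        # 7.0
print(abs(np.linalg.det(A)))         # approximately 7
```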

The following are properties of the determinant function $\textrm{det}\left(A\right)$ for a square matrix $A\in\mathbb{R}^{n\times n}$:

- The volume of a unit hypercube is $\textrm{det}\left(I\right)=1$.
- Homogeneity: given a scalar $s\in\mathbb{R}$, multiplying a single row of $A$ by $s$ multiplies the determinant by $s$; consequently, $\textrm{det}\left(s\,A\right)=s^{n}\,\textrm{det}\left(A\right)$.
- $\textrm{det}\left(A\right)=\textrm{det}\left(A^{T}\right)$
- $\textrm{det}\left(A\,B\right)=\textrm{det}\left(A\right)\,\textrm{det}\left(B\right)$
- $\textrm{det}\left(A\right)=0$ if and only if $A$ is singular (not invertible); in that case $A$ does not have full rank, its columns are **linearly dependent**, and the set $S$ has zero volume, since one of the row vectors is a linear combination of the others and $S$ collapses into a lower-dimensional subspace.
- For invertible $A$, $\textrm{det}\left(A^{-1}\right)=1/\textrm{det}\left(A\right)$
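The properties in the list above can be verified numerically with `np.linalg.det`. This sketch uses random $3\times3$ matrices (invertible with probability 1; the seed and the scalar $s=2$ are arbitrary assumptions) and compares with `np.isclose`, because the determinant is computed in floating point. The singular matrix `S_mat` is a made-up example whose third row is the sum of the first two.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
s = 2.0
det = np.linalg.det

print(np.isclose(det(np.eye(n)), 1.0))                # det(I) = 1
print(np.isclose(det(s * A), s**n * det(A)))          # det(sA) = s^n det(A)
A_row = A.copy()
A_row[0] *= s                                         # scale only the first row by s
print(np.isclose(det(A_row), s * det(A)))             # determinant scales by s
print(np.isclose(det(A), det(A.T)))                   # det(A) = det(A^T)
print(np.isclose(det(A @ B), det(A) * det(B)))        # det(AB) = det(A) det(B)
print(np.isclose(det(np.linalg.inv(A)), 1 / det(A)))  # det(A^{-1}) = 1 / det(A)

# Row 3 = row 1 + row 2, so the rows are linearly dependent and A is singular
S_mat = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [5.0, 7.0, 9.0]])
print(det(S_mat), np.linalg.matrix_rank(S_mat))       # close to 0 (rounding) and rank 2
```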


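In [79]:
# Determinant examples
# np.linalg.det works in floating point (via an LU factorization),
# so the exact value 1*2 - 3*3 = -7 comes back with a tiny rounding error
aa = np.array([[1, 3], [3, 2]])
np.linalg.det(aa)

-7.0000000000000009

In [81]:
# A nested Python list is also accepted; it is converted to an array internally
FF = [[1, 3], [3, 2]]
np.linalg.det(FF)

-7.0000000000000009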

Authors: *Saul Calderon, Angel García, Blaz Meden, Felipe Meza, Juan Esquivel, Alcides Ramirez, Mauro Mendez, Manuel Zumbado*