# __Matrices__

# Basics

A matrix is described as having M rows and N columns. When given the shape of a matrix, the number of rows comes first, and the number of columns comes second. _Example:_ This is a 2x3 matrix. M = 2 and N = 3.

$ \begin{bmatrix}
-5 & 0 & 2 \\ 
 4 & 3 & 0
\end{bmatrix} $
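
As a quick check of the rows-first convention, the example matrix can be built in NumPy and its shape inspected. This is a minimal sketch; the variable names `A`, `M`, and `N` are chosen here for illustration.

```python
import numpy as np

# The 2x3 example matrix from above.
A = np.array([[-5, 0, 2],
              [ 4, 3, 0]])

# shape reports rows first, columns second.
M, N = A.shape
print(A.shape)  # (2, 3)
```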

# Matrix Multiplication

## Dot Product

When multiplying two matrices, the first has shape MxN and the second has shape NxP. The product is defined only when the number of columns of the first matrix equals the number of rows of the second, so the two Ns must match. The result is an MxP matrix. https://www.mathsisfun.com/algebra/matrix-multiplying.html

To compute one entry of the result, take a row of Matrix \#1 and a column of Matrix \#2, multiply the corresponding numbers, and sum the products. Repeat for every row and column pair.  

$ \begin{bmatrix}
1 & 2 & 3 \\ 
4 & 5 & 6
\end{bmatrix} \cdot \begin{bmatrix}
7 & 8 \\ 
9 & 10 \\
11 & 12
\end{bmatrix}$

$Row \ \#1 * Column \ \#1: (1, 2, 3) • (7, 9, 11) = 1×7 + 2×9 + 3×11 = 58$<br>
$Row \ \#1 * Column \ \#2: (1, 2, 3) • (8, 10, 12) = 1×8 + 2×10 + 3×12 = 64$<br>
$Row \ \#2 * Column \ \#1: (4, 5, 6) • (7, 9, 11) = 4×7 + 5×9 + 6×11 = 139$<br>
$Row \ \#2 * Column \ \#2: (4, 5, 6) • (8, 10, 12) = 4×8 + 5×10 + 6×12 = 154$<br>

$ \begin{bmatrix}
58 & 64 \\ 
139 & 154
\end{bmatrix}$
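
The worked example above can be reproduced with `np.dot`. This is a small sketch using plain `np.ndarray` objects; the names `A`, `B`, `C`, and `row1_col1` are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3): M = 2, N = 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # shape (3, 2): N = 3, P = 2

C = np.dot(A, B)                 # shape (2, 2): M x P
print(C)
# [[ 58  64]
#  [139 154]]

# Entry (i, j) of C is the dot product of row i of A with column j of B.
row1_col1 = np.dot(A[0, :], B[:, 0])
print(row1_col1)  # 58
```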

## Matrix by Column Vector Multiplication

Multiplication of a matrix by a column vector is a special case of matrix-by-matrix multiplication. The column vector can be treated as an Nx1 matrix, so the result is an Mx1 column vector.

$ \begin{bmatrix}
1 & 2 & 3 \\ 
4 & 5 & 6
\end{bmatrix} \cdot \begin{bmatrix}
7 \\ 
9 \\
11
\end{bmatrix}$

$Row \ \#1 * Column \ \#1: (1, 2, 3) • (7, 9, 11) = 1×7 + 2×9 + 3×11 = 58$<br>
$Row \ \#2 * Column \ \#1: (4, 5, 6) • (7, 9, 11) = 4×7 + 5×9 + 6×11 = 139$<br>

$ \begin{bmatrix}
58 \\ 
139 
\end{bmatrix}$
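
The same calculation in NumPy, as a sketch: the column vector is declared explicitly with shape (3, 1) so that the result keeps its column shape. The names `A`, `v`, and `result` are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3)
v = np.array([[7], [9], [11]])       # column vector, shape (3, 1)

result = np.dot(A, v)                # shape (2, 1)
print(result)
# [[ 58]
#  [139]]
```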

# Operators

## Python Operators

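`*` = element-wise product for `np.ndarray` operands. For `np.matrix` operands (used in the examples below), `*` is the matrix product.<br>
`**2` = element-wise square for `np.ndarray` operands.<br>
`np.dot` = dot product (matrix product for 2-D inputs).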
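
A short sketch contrasting the three operators on plain `np.ndarray` objects. The 2x2 matrices here are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B             # multiplies matching entries
squared = A ** 2                # squares each entry
matrix_product = np.dot(A, B)   # rows of A dotted with columns of B

print(elementwise)     # [[ 5 12]
                       #  [21 32]]
print(squared)         # [[ 1  4]
                       #  [ 9 16]]
print(matrix_product)  # [[19 22]
                       #  [43 50]]
```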

## Math Operators

$•$ = dot product.<br>
$||A||_F$ = Frobenius norm

# Application to Logistic Regression

## Lazy Programmer

The Logistic Regression course by Lazy Programmer Inc. at https://www.udemy.com/course/data-science-logistic-regression-in-python/ uses an N x D input matrix: N samples (rows) by D features (columns).

### Notation from Logistic Regression Notebook
$N = Number \ of \ samples $ <br>
$D = Number \ of \ dimensions \ (features) $ <br>
$\textbf{X} = N \times D \  matrix $ <br>
$\textbf{w} = D \times 1 \  matrix \ of \ weights \ (one \ per \ feature) $ <br>
$h(x) = hypothesis \ function $ <br>
$z = \textbf{w}^{\textbf{T}}\textbf{x}$ <br>

In [19]:
import numpy as np

In [20]:
X = np.matrix([[1,2,3],[3,2,1],[3,3,3],[4,4,4],[5,4,3]])
print('X')
print(X)
print('shape = {}, N = {} D = {}'.format(X.shape, X.shape[0], X.shape[1]))
print()
w = np.matrix([[10], [9], [8], [7], [6]])
print('w.T')
print(w.T)
print('shape = {}, N = {} D = {}'.format(w.T.shape, w.T.shape[0], w.T.shape[1]))
print()

product = w.T.dot(X)
print('Product')
print(product)
print('shape = {}, N = {} D = {}'.format(product.shape, product.shape[0], product.shape[1]))
print()
# X * w is not defined here: X is 5x3 and w is 5x1, so the inner dimensions (3 and 5) do not match.

X
[[1 2 3]
 [3 2 1]
 [3 3 3]
 [4 4 4]
 [5 4 3]]
shape = (5, 3), N = 5 D = 3

w.T
[[10  9  8  7  6]]
shape = (1, 5), N = 1 D = 5

Product
[[119 114 109]]
shape = (1, 3), N = 1 D = 3
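
The cell above is mainly a shape demonstration. As a hedged sketch of how $z = \textbf{w}^{\textbf{T}}\textbf{x}$ is usually computed for all samples at once, the code below assumes one weight per feature (a D x 1 weight vector) and assumes the hypothesis function is the logistic sigmoid. The weight values and the `sigmoid` helper are hypothetical and are not taken from the course.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, assumed here as the hypothesis h."""
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1, 2, 3],
              [3, 2, 1],
              [3, 3, 3],
              [4, 4, 4],
              [5, 4, 3]])             # N = 5 samples, D = 3 features
w = np.array([[0.1], [-0.2], [0.3]])  # hypothetical weights, shape (D, 1)

z = X.dot(w)      # shape (N, 1): one value of w^T x per sample
h = sigmoid(z)    # shape (N, 1): every value lies strictly between 0 and 1

print(z.shape, h.shape)  # (5, 1) (5, 1)
```

Computing `X.dot(w)` evaluates $\textbf{w}^{\textbf{T}}\textbf{x}$ for every row of `X` in a single matrix product, which avoids a Python loop over samples.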



## Professor Andrew Ng (Stanford)
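The deep learning course by Andrew Ng uses an $n_x \times m$ input matrix. Each column is one example, so this layout is the transpose of the N x D layout described above.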

$m =$ number of examples in the dataset.<br>
$n_{x} =$ input size <br>
$n_{y} =$ output size <br>
$X \in \mathbb{R}^{n_x \times m}$ = the input matrix <br>
$x^{(i)} \in \mathbb{R}^{n_x}$ = the i<sup>th</sup> example represented as a column vector.
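
A minimal sketch of the two layouts side by side, assuming the same five 3-feature samples used earlier. The names `X_rows`, `X_cols`, and `x_i` are illustrative only.

```python
import numpy as np

# N x D layout: one row per sample (5 samples, 3 features).
X_rows = np.array([[1, 2, 3],
                   [3, 2, 1],
                   [3, 3, 3],
                   [4, 4, 4],
                   [5, 4, 3]])

# n_x x m layout: one column per example.
X_cols = X_rows.T
n_x, m = X_cols.shape
print(n_x, m)  # 3 5

# The i-th example as a column vector of shape (n_x, 1). Slicing with
# i:i + 1 keeps the second axis instead of returning a shape (n_x,) array.
i = 0
x_i = X_cols[:, i:i + 1]
print(x_i.shape)  # (3, 1)
```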

## Several Techniques to Multiply

For `np.matrix` objects, the `np.dot` function, the `*` operator, and the `.dot` method all compute the same matrix product.

In [21]:
product = np.dot(w.T, X)
print('Product')
print(product)

Product
[[119 114 109]]


In [22]:
print('Product')
print(w.T*X)

Product
[[119 114 109]]


In [23]:
print('Product')
print(w.T.dot(X))

Product
[[119 114 109]]
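
With plain `np.ndarray` objects instead of `np.matrix`, the `@` operator (Python 3.5 and later) gives the same matrix product, while `*` attempts an element-wise product and fails for these shapes. The sketch below reuses the values of `X` and `w` from above; the flag `elementwise_failed` is an illustrative name.

```python
import numpy as np

X = np.array([[1, 2, 3], [3, 2, 1], [3, 3, 3], [4, 4, 4], [5, 4, 3]])
w = np.array([[10], [9], [8], [7], [6]])

# Matrix product of a (1, 5) array with a (5, 3) array.
product = w.T @ X
print(product)  # [[119 114 109]]

# On ndarrays, * broadcasts element-wise, and shapes (1, 5) and (5, 3)
# cannot be broadcast together.
try:
    w.T * X
    elementwise_failed = False
except ValueError:
    elementwise_failed = True
print(elementwise_failed)  # True
```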


# Warnings about Matrices in Python

## Rank-1 Array
An array with a shape like (5,) is a rank-1 array. It does not behave consistently as a row vector or a column vector, and transposing it has no effect. The solution is to declare a 1x5 or 5x1 matrix. A 5x1 is a column vector. A 1x5 is a row vector.

In [24]:
import numpy as np

a = np.random.randn(5)

In [25]:
print(a.shape)

(5,)


In [26]:
print(a.T.shape)

(5,)


In [27]:
print(np.dot(a,a.T))

3.915311471066201


In [28]:
a = np.random.randn(5,1)
print(a)

[[-1.12373755]
 [ 1.27613696]
 [-0.23290096]
 [ 1.17966469]
 [ 1.42182608]]


In [29]:
print(a.T)

[[-1.12373755  1.27613696 -0.23290096  1.17966469  1.42182608]]


In [30]:
print(np.dot(a, a.T))

[[ 1.26278608 -1.43404302  0.26171955 -1.32563351 -1.59775935]
 [-1.43404302  1.62852554 -0.29721352  1.50541371  1.81444481]
 [ 0.26171955 -0.29721352  0.05424286 -0.27474504 -0.33114466]
 [-1.32563351  1.50541371 -0.27474504  1.39160877  1.67727801]
 [-1.59775935  1.81444481 -0.33114466  1.67727801  2.02158939]]


In [34]:
# It is possible to reshape a rank-1 array to a matrix.
b = np.random.randn(5)
print("Rank-1 Array")
print(b)
print("Shape")
print(b.shape)
print()
b = b.reshape([5,1])
print("Matrix")
print(b)
print("Shape")
print(b.shape)

Rank-1 Array
[ 0.10233763  0.10351315 -1.92455824 -1.23169823 -0.36765966]
Shape
(5,)

Matrix
[[ 0.10233763]
 [ 0.10351315]
 [-1.92455824]
 [-1.23169823]
 [-0.36765966]]
Shape
(5, 1)
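
One way to guard against rank-1 surprises is to reshape explicitly and assert the shape before multiplying. This is a sketch; the seeded generator and the names `col`, `row`, `inner`, and `outer` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)     # rank-1 array, shape (5,)

col = a.reshape(-1, 1)         # column vector, shape (5, 1)
row = a.reshape(1, -1)         # row vector, shape (1, 5)

inner = np.dot(row, col)       # shape (1, 1): sum of squares
outer = np.dot(col, row)       # shape (5, 5): outer product

# A cheap check that catches an accidental rank-1 array early.
assert col.shape == (5, 1)
print(inner.shape, outer.shape)  # (1, 1) (5, 5)
```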


# Frobenius Norm

1. Take the element-wise square of matrix `A`.
1. Sum over the elements of resulting matrix `A**2`. This yields a scalar.
1. Take the square root of the result.

## Formula

<font size=5>
$\left \| A \right \|_{F} = \sqrt{\sum_{i} \sum_{j}|a_{ij}|^2}$
</font>

## Example

$A = \begin{bmatrix}
2 & -2 & 1\\ 
-1 & 3  & -1\\
2 & -4 & 1
\end{bmatrix}$<p>
    $\left \| A \right \|_{F} = \sqrt{ |2|^2 + |-2|^2 + |1|^2 + |-1|^2 + |3|^2 + |-1|^2 + |2|^2 + |-4|^2 + |1|^2 }$<p>
    $= \sqrt{41} \approx 6.40 $
    
https://s-mat-pcs.oulu.fi/~mpa/matreng/eem5_1-1.htm

## Numpy

In [13]:
import numpy as np
from numpy import linalg as LA

In [14]:
A = np.array([2, -2, 1, -1, 3, -1, 2, -4, 1])
A

array([ 2, -2,  1, -1,  3, -1,  2, -4,  1])

In [15]:
B = A.reshape((3, 3))
B

array([[ 2, -2,  1],
       [-1,  3, -1],
       [ 2, -4,  1]])

In [21]:
LA.norm(B, ord='fro')

6.4031242374328485

## Numpy without `linalg`

In [17]:
B_squared = B ** 2
B_squared

array([[ 4,  4,  1],
       [ 1,  9,  1],
       [ 4, 16,  1]], dtype=int32)

In [19]:
B_sum = np.sum(B_squared)
B_sum

41

In [20]:
B_F = np.sqrt(B_sum)
B_F

6.4031242374328485

# 1-Norm

The 1-norm is the maximum absolute column sum: sum the absolute values in each column, then take the largest of those sums.

## Formula
<p>
<font size=5>
$\left \| A \right \|_{1} = \max_{1 \le j \le n} \sum_{i=1}^{m}|a_{ij}|$
    </font>

## Example

 $\left \| A \right \|_{1} = max[ |2| + |-1| + |2|, |-2| + |3| + |-4|, |1| + |-1| + |1| ] = max[ 5, 9, 3 ] = 9$

## Numpy

In [22]:
LA.norm(B, ord=1)

9.0
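
The 1-norm can also be computed without `linalg`, following the definition directly. A minimal sketch, with `B` redefined so the block is self-contained; `column_sums` and `one_norm` are illustrative names.

```python
import numpy as np

B = np.array([[ 2, -2,  1],
              [-1,  3, -1],
              [ 2, -4,  1]])

# axis=0 sums down each column.
column_sums = np.abs(B).sum(axis=0)
one_norm = column_sums.max()

print(column_sums)  # [5 9 3]
print(one_norm)     # 9
```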

# $\infty$-Norm

The $\infty$-norm is the maximum absolute row sum: sum the absolute values in each row, then take the largest of those sums.

## Formula
<p>
<font size=5>
$\left \| A \right \|_{\infty} = \max_{1 \le i \le m} \sum_{j=1}^{n}|a_{ij}|$
    </font>

## Example

$\left \| A \right \|_{\infty} = max[ |2| + |-2| + |1|, |-1| + |3| + |-1|, |2| + |-4| + |1| ] = max[ 5, 5, 7 ] = 7$

## Numpy

In [26]:
LA.norm(B, ord=np.inf)

7.0
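
The $\infty$-norm can be computed without `linalg` in the same way, summing across rows instead of down columns. A minimal sketch, with `B` redefined so the block is self-contained; `row_sums` and `inf_norm` are illustrative names.

```python
import numpy as np

B = np.array([[ 2, -2,  1],
              [-1,  3, -1],
              [ 2, -4,  1]])

# axis=1 sums across each row.
row_sums = np.abs(B).sum(axis=1)
inf_norm = row_sums.max()

print(row_sums)  # [5 5 7]
print(inf_norm)  # 7
```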