# Matrix Multiplication

## Linear Algebra

An error term is sometimes associated with the first formula below, but we omit it here for convenience because it is unobservable and uncontrollable.


* $y_i = \theta^TX_i$
* $h_\theta(x) = \theta_0 + \theta_1x_1 +... +\theta_nx_n$
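
The two formulas describe the same computation: the written-out sum equals a dot product once a constant 1 is prepended to the features. A minimal sketch, assuming NumPy; the values of `theta` and `x` are hypothetical and chosen only for illustration:

```python
import numpy as np

theta = np.array([0.5, 2.0, -1.0])   # hypothetical theta_0, theta_1, theta_2
x = np.array([3.0, 4.0])             # hypothetical features x_1, x_2

# Written out term by term: theta_0 + theta_1 * x_1 + theta_2 * x_2
h_sum = theta[0] + theta[1] * x[0] + theta[2] * x[1]

# As a dot product, after prepending x_0 = 1 for the intercept
x_with_bias = np.concatenate([[1.0], x])
h_dot = theta @ x_with_bias

print(h_sum, h_dot)
```

Both expressions evaluate to the same number, which is why the intercept is usually folded into the feature vector.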

Sources

* https://www.investopedia.com/terms/e/errorterm.asp


### System of Linear Equations

Consider:
<pre>
4x1 - 2x2 +  x3 =  3
-x1 + 7x2 -  x3 = 10
 x1 +  x2 - 2x3 = -3
</pre>

Then
<pre>
 4 -2  1  | x1 |  3
-1  7 -1  | x2 | 10
 1  1 -2  | x3 | -3
</pre>
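
The matrix form above can be passed directly to a linear solver. A minimal sketch, assuming NumPy is available; the names `A`, `b`, and `solution` are chosen here for illustration:

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[4, -2, 1],
              [-1, 7, -1],
              [1, 1, -2]], dtype=float)
b = np.array([3, 10, -3], dtype=float)

# Solve A @ x = b for x = (x1, x2, x3)
solution = np.linalg.solve(A, b)
print(solution)

# Substitute back to confirm the solution satisfies every equation
print(np.allclose(A @ solution, b))
```

Substituting x1 = 1, x2 = 2, x3 = 3 into the three equations by hand gives the same check.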

#### Gaussian Elimination Method

* Multiply any row by any nonzero number
* Add any row (or a multiple of it) to another row
* Swap any two rows

#### Example

<pre>
x1 + x2 - 2x3 = -3
    8x2 - 3x3 = 7
       6.75x3 = 20.25
</pre>
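
The three row operations are enough to write a small solver. The sketch below is one possible implementation, not a reference one: the function name `gaussian_solve` is hypothetical, and it picks the largest available pivot at each step, so its intermediate rows may differ from the hand-worked example even though the final answer is the same.

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve A x = b by forward elimination and back substitution."""
    # Work on a float copy of the augmented matrix [A | b]
    M = np.hstack([np.asarray(A, dtype=float),
                   np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]

    for col in range(n):
        # Row swap: bring the row with the largest pivot into position
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]

        # Subtract multiples of the pivot row from the rows below it
        for row in range(col + 1, n):
            M[row] -= (M[row, col] / M[col, col]) * M[col]

    # Back substitution on the upper-triangular system
    x = np.zeros(n)
    for row in range(n - 1, -1, -1):
        x[row] = (M[row, -1] - M[row, row + 1:n] @ x[row + 1:]) / M[row, row]
    return x

A = [[4, -2, 1], [-1, 7, -1], [1, 1, -2]]
b = [3, 10, -3]
x = gaussian_solve(A, b)
print(x)
```

In practice `np.linalg.solve` is the better choice; the loop is only meant to make the row operations visible.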

Source:
* https://math.libretexts.org/Bookshelves/Linear_Algebra/Matrix_Algebra_with_Computational_Applications_%28Colbry%29/07%3A_04_Pre-Class_Assignment_-_Python_Linear_Algebra_Packages/7.1%3A_The_Syntax_for_Systems_of_Linear_Equations
* https://medium.com/datadenys/solving-systems-of-linear-equations-using-matrices-and-python-5e203a9ea146

### Vectors, Matrices, and Scalars

#### Vectors

* https://www.mathsisfun.com/algebra/scalar-vector-matrix.html (scroll down to see image)
* a vector has a magnitude, $\sqrt{x^2 + y^2}$, and a direction, e.g. $\angle 45^{\circ}$
* vector = ($x_1$, $x_2$) = $r \angle \theta$
* Note: r can be thought of as $\beta$ and $\angle\theta$ as $\alpha$ in $y=\alpha X + \beta$
* $\hat{i} = (1, 0)$ unit vector (1 step)
* $\hat{j} = (0, 1)$ unit vector (1 step)
* $\vec{v} = x_1\hat{i} + x_2\hat{j}$; $v$ is also represented as bold
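* vector addition is head to tail, or elementwise, i.e., $(x_{1i} + x_{2i})$ for each component $i$
* dot product is the sum of the elementwise products, $\sum{x_{1i} x_{2i}}$
* or, equivalently, $||x_1|| * ||x_2|| * cos(\theta)$, where $\theta$ is the angle between the two vectors
* if $\vec{x_1} \cdot \vec{x_2} = 0$ the vectors are orthogonal; for mean-centered data this means $x_1$ and $x_2$ are uncorrelated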
* rows (the numbers) in a dataframe (each row can be thought of as a sample) are examples of vectors
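
The vector facts above can be checked numerically. A short sketch, assuming NumPy; the vectors `v` and `w` are hypothetical examples, with `w` chosen to be orthogonal to `v`:

```python
import numpy as np

v = np.array([3.0, 3.0])    # example vector (x1, x2)
w = np.array([-2.0, 2.0])   # a second vector, orthogonal to v

# Magnitude r = sqrt(x1^2 + x2^2) and direction theta in degrees
r = np.linalg.norm(v)
theta = np.degrees(np.arctan2(v[1], v[0]))
print(r, theta)

# Vector addition is elementwise and returns a vector
print(v + w)

# Dot product: sum of elementwise products, returns a scalar
dot = np.dot(v, w)
print(dot, np.sum(v * w))

# Rearranging dot = ||v|| * ||w|| * cos(angle) gives the angle between them
cos_angle = dot / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.degrees(np.arccos(cos_angle)))
```

A dot product of zero corresponds to a 90 degree angle between the two vectors.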

#### Vector Equation

* Vector equations are used to represent the equation of a line or a plane in terms of the variables x, y, z.
* https://www.cuemath.com/algebra/vector-equation/
* $r = a + t\vec{AB}$
* https://www.w3schools.blog/cartesian-equation-and-vector-equation-of-a-line
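
The line equation $r = a + t\vec{AB}$ can be sketched in code as follows. The points `A` and `B` and the helper `point_on_line` are hypothetical, chosen only to show how the parameter $t$ moves along the line:

```python
import numpy as np

A = np.array([1.0, 2.0, 0.0])   # hypothetical point A (position vector a)
B = np.array([4.0, 6.0, 2.0])   # hypothetical point B
AB = B - A                      # direction vector from A to B

def point_on_line(t):
    """Return r = a + t * AB for a scalar parameter t."""
    return A + t * AB

print(point_on_line(0.0))   # t = 0 gives the point A
print(point_on_line(1.0))   # t = 1 gives the point B
print(point_on_line(0.5))   # t = 0.5 gives the midpoint of A and B
```

Values of `t` outside the interval from 0 to 1 give points on the same line beyond A or B.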


#### Matrix or Matrices

* A dataframe (numerical) can be considered a matrix (rows by cols matrix is convention used in this class)
* Matrix representation of graph: rows = [a, b, c]; cols = [a, b, c]
* [a, a] = 1 (diagonal of 1s with b,b, c,c ...)
* And, for example, all off diagonals such as [a, b] = 0 etc
* Graph matrix
* Covariance matrix
* Probability matrix
* Vectors: by convention, represented as column vectors (a matrix with a single column)

In [None]:
import pandas as pd

data = {'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [0, 0, 1]}
pd.DataFrame(data, index=['a', 'b', 'c'])

   a  b  c
a  1  0  0
b  0  1  0
c  0  0  1


#### Matrix Multiplication

* To multiply an m×n matrix by an n×p matrix, the inner dimensions (the two ns) must match, and the result is an m×p matrix
* Instead of an element-wise product, matrix multiplication is more like a generalization of the dot product. Specifically, if we have $C = AB$, then $c_{ij}$ is the dot product between the ith row of A and the jth column of B
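
The shape rule and the row-by-column definition can be verified directly. A minimal sketch, assuming NumPy; the matrices `A` and `B` are hypothetical examples:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3): m = 2, n = 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])        # shape (3, 2): n = 3, p = 2

C = A @ B                       # same as np.matmul(A, B)
print(C.shape)                  # (m, p)
print(C)

# Rebuild C entry by entry: c_ij = (row i of A) dot (column j of B)
C_manual = np.array([[np.dot(A[i, :], B[:, j]) for j in range(B.shape[1])]
                     for i in range(A.shape[0])])
print(np.array_equal(C, C_manual))
```

Swapping the operands gives `B @ A` with shape (3, 3), which shows that matrix multiplication is not commutative in general.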

Sources:
* https://towardsdatascience.com/a-complete-beginners-guide-to-matrix-multiplication-for-data-science-with-python-numpy-9274ecfc1dc6
* https://mathinsight.org/matrix_vector_multiplication
* https://timeseriesreasoning.com/contents/deep-dive-into-variance-covariance-matrices/
* Deep Learning Courses by the Lazy Programmer

#### Dot Product

* The numpy dot() function returns the dot product of two arrays. The result is the same as the matmul() function for one-dimensional and two-dimensional arrays
* the dot product of two same-size vectors is the sum of their elementwise products, $\sum{x_{1i} x_{2i}}$, and returns a scalar, whereas element-wise multiplication returns an array of the same shape
* Another approach: $||x|| * ||y|| * cos(\theta)$ (magnitude and direction or angle between vectors)
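
The difference between the element-wise product, the dot product, and `matmul` shows up in the shape of the result. A short sketch, assuming NumPy; the arrays are hypothetical examples:

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

elementwise = x1 * x2           # array of the same shape: [4, 10, 18]
dot = np.dot(x1, x2)            # scalar: 4 + 10 + 18 = 32
print(elementwise, dot)

# For 1-D and 2-D inputs, np.dot and np.matmul agree
M = np.array([[1, 2], [3, 4]])
N = np.array([[0, 1], [1, 0]])
print(np.array_equal(np.dot(M, N), np.matmul(M, N)))
print(np.dot(x1, x2) == np.matmul(x1, x2))
```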

Source:

* https://www.digitalocean.com/community/tutorials/numpy-matrix-multiplication#:~:text=The%20numpy%20dot()%20function,dimensional%20and%20two%2Ddimensional%20arrays.

#### Identity Matrix

* In linear algebra, the identity matrix of size n is the n x n square matrix with ones on the main diagonal and zeros off diagonal. It has unique properties, for example when the identity matrix represents a geometric transformation, the object remains unchanged by the transformation. In other contexts, it is analogous to multiplying by the number 1.
* An invertible square matrix multiplied by its inverse equals the identity matrix
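
Both properties can be checked with `np.eye`. A minimal sketch, assuming NumPy; the matrix `A` reuses the coefficients of the system of equations earlier in this section:

```python
import numpy as np

I = np.eye(3)                           # 3 x 3 identity matrix
A = np.array([[4.0, -2.0, 1.0],
              [-1.0, 7.0, -1.0],
              [1.0, 1.0, -2.0]])

# Multiplying by I on either side leaves A unchanged
print(np.array_equal(A @ I, A), np.array_equal(I @ A, A))

# A times its inverse gives I, up to floating-point rounding
print(np.allclose(A @ np.linalg.inv(A), I))
```

`np.allclose` is used for the second check because the computed inverse carries small rounding errors.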

Source:
* https://en.wikipedia.org/wiki/Identity_matrix

#### Matrix Inverse

* Gaussian Elimination Method can be used to get the inverse of a matrix
* Matrix inverse of $X$ is $X^{-1}$
* $X^{-1}X = XX^{-1} = I$
* Inverse Example: $8x = 6$ is like saying $x = 6/8 = 6*8^{-1}$
* $1/4 * 4 = 4^{-1} * 4 = 1$
* Fractional exponents serve as roots e.g. since $4^3 = 64$, then $64^{1/3} = \sqrt[3]{64} = 4$
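
The scalar analogy carries over to matrices, with the caveat that not every square matrix has an inverse. A sketch assuming NumPy; `X`, `b`, and `singular` are hypothetical examples:

```python
import numpy as np

# Scalar case: 8x = 6  ->  x = 8^{-1} * 6
print(8 ** -1 * 6)

# Matrix case: X v = b  ->  v = X^{-1} b
X = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # hypothetical invertible matrix
b = np.array([3.0, 5.0])
X_inv = np.linalg.inv(X)
v = X_inv @ b
print(v)

# X^{-1} X and X X^{-1} both give the identity
print(np.allclose(X_inv @ X, np.eye(2)), np.allclose(X @ X_inv, np.eye(2)))

# A matrix whose rows are multiples of each other has no inverse
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])   # second row is twice the first
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    print("no inverse:", err)
```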

#### Matrix Transpose

* Transposes rows to columns and vice versa
* A row vector gets transposed to a column vector
* Notation: matrix.T
* Used to satisfy the (m, n) * (n, p) shape requirement for matrix multiplication

In [None]:
# matmul
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 4, 5])
print(x)
y = np.array([[1], [3], [2], [3], [5]])
print(y)
print()
print('matrix(m, n) * matrix(n, p)')
print(np.matmul(x, y))

[1 2 3 4 5]
[[1]
 [3]
 [2]
 [3]
 [5]]

matrix(m, n) * matrix(n, p)
[50]


In [None]:
import numpy as np
import pandas as pd

def xy(r):
    return r.x * r.y

x = np.array([1, 2, 3, 4, 5])
print(x)
y = np.array([1, 3, 2, 3, 5])
print(y)
m_table = pd.DataFrame({'x': x, 'y': y})
print('table of sums')
N = len(x)

m_table['x^2'] = m_table['x'].apply(lambda x: x**2)
m_table['xy'] = m_table.apply(xy, axis=1)
print(m_table.to_string(index=False))
sums = list(m_table.sum())
print('sums:', sums)
print()

x = np.vstack([np.ones(len(x)).astype(int), x])
print('x')
print(x)

print('x.T')
print(x.T)
print()

print('y')
print(y)

# For a 1-D array, this returns an unchanged view of the original array, as a transposed vector is simply the same vector.
print('y.T')
print('no transform for single column: ', y.T)
print('reshape is used (-1, 1) row to col; (1, -1) col to row')
yreshape = y.reshape(-1, 1)
print(yreshape)
yreshape = yreshape.reshape(1, -1)
print(yreshape)
print()

print('matmul(x, x.T)')
print(np.matmul(x, x.T))
print('inverse of matmul(x, x.T)')
print(np.matrix(np.matmul(x, x.T)).I)
print('identity matrix')
print(np.matmul(np.matmul(x, x.T), np.matrix(np.matmul(x, x.T)).I).astype('int'))
print()

print('matmul(x, y.T)')
print(np.matmul(x, y.T))
print()

# coeffs = np.matmul(np.linalg.inv(np.matmul(x, x.T)), np.matmul(x, y.T))
coeffs = np.matmul(np.matrix(np.matmul(x, x.T)).I, np.matmul(x, y))
print('coeffs: ', coeffs)
print('np.matmul(np.linalg.inv(np.matmul(x, x.T)), np.matmul(x, y.T))')

[1 2 3 4 5]
[1 3 2 3 5]
table of sums
 x  y  x^2  xy
 1  1    1   1
 2  3    4   6
 3  2    9   6
 4  3   16  12
 5  5   25  25
sums: [15, 14, 55, 50]

x
[[1 1 1 1 1]
 [1 2 3 4 5]]
x.T
[[1 1]
 [1 2]
 [1 3]
 [1 4]
 [1 5]]

y
[1 3 2 3 5]
y.T
no transform for single column:  [1 3 2 3 5]
reshape is used (-1, 1) row to col; (1, -1) col to row
[[1]
 [3]
 [2]
 [3]
 [5]]
[[1 3 2 3 5]]

matmul(x, x.T)
[[ 5 15]
 [15 55]]
inverse of matmul(x, x.T)
[[ 1.1 -0.3]
 [-0.3  0.1]]
identity matrix
[[1 0]
 [0 1]]

matmul(x, y.T)
[14 50]

coeffs:  [[0.4 0.8]]
np.matmul(np.linalg.inv(np.matmul(x, x.T)), np.matmul(x, y.T))


#### Note

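* Consider $y = \alpha X + \beta$, where the constant $\beta$ is absorbed into $X$ as a row of ones and the coefficients are renamed to $\theta$
* So, $\theta X = y$
* If $X$ were square and invertible this would give $\theta = y X^{-1}$; here $X$ is 2 x 5, so multiply both sides by $X^T$ first: $\theta X X^T = y X^T$
* $X X^T$ is square (2 x 2), so $\theta = y X^T (X X^T)^{-1}$, or written as a column vector, $\theta^T = (X X^T)^{-1} X y^T$

and so we have
* np.matmul(np.linalg.inv(np.matmul(x, x.T)), np.matmul(x, y.T))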

#### Another Example

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 3, 5])
print('x: ', x)
print('y: ', y)

x:  [1 2 3 4 5]
y:  [1 3 2 3 5]


In [None]:
x = x.reshape(-1, 1)
x

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [None]:
X = np.append(arr = np.ones((5, 1)).astype(int), values = x, axis = 1)
print(X)
print(y)
print('weights = ', np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y))

[[1 1]
 [1 2]
 [1 3]
 [1 4]
 [1 5]]
[1 3 2 3 5]
weights =  [0.4 0.8]
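
Forming the inverse of $X^TX$ explicitly works for this small example; a least-squares solver reaches the same weights without computing the inverse. The sketch below assumes the same `x` and `y` values as above and swaps in `np.linalg.lstsq` as an alternative technique, not as the method used in these notes:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 3, 5])

# Design matrix: a column of ones (intercept) next to x
X = np.column_stack([np.ones(len(x)), x])

# Normal equation, as in the cell above
weights_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver; rcond=None selects NumPy's default cutoff
weights_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(weights_normal)
print(weights_lstsq)
```

Both approaches should agree on an intercept of about 0.4 and a slope of about 0.8 for this data.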


### The DataFrame as a Matrix

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

advertising = pd.read_csv('https://raw.githubusercontent.com/gitmystuff/INFO4050/main/Datasets/Advertising.csv', usecols=['TV', 'radio', 'newspaper', 'sales'])
X_train, X_test, y_train, y_test = train_test_split(
    advertising.drop('sales', axis=1),
    advertising['sales'],
    test_size=0.25,
    random_state=42)

X_train.insert(0, 'const', 1)
X = X_train
y = y_train

model = sm.OLS(y, X).fit()
print('StatsModel')
print(model.params)
print()
print('Numpy Linear Algebra')
print('weights = ', np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y))
np.matmul(np.linalg.inv(np.matmul(X.values.T, X.values)), np.matmul(X.values.T, y))

StatsModel
const        2.778303
TV           0.045434
radio        0.191457
newspaper    0.002568
dtype: float64

Numpy Linear Algebra
weights =  [2.77830346e+00 4.54335586e-02 1.91456536e-01 2.56809082e-03]


array([2.77830346e+00, 4.54335586e-02, 1.91456536e-01, 2.56809082e-03])