# 4. Linear algebra operations

In [1]:
import numpy

## Basic operations

Linear algebra is a branch of mathematics dealing with vector spaces. Linear algebra operations such as transposes, dot products and matrix multiplications are very useful when manipulating numeric datasets. Using them often allows us to avoid writing explicit loops, and thus to make our code more readable, more concise and faster to execute.

In the following steps we'll implement linear regression using `numpy` linear algebra operations. We will start with the functional form of the regression: how the predicted target values depend on the regression coefficients of the features (aka predictors or regressors).

Once we have the coefficients $\mathrm{\beta}$ and the matrix with the predictors $\mathrm{X}$ we can calculate predicted targets $\mathrm{\hat{y}}$ according to:

$$
\mathrm{\hat{y}} = \beta_0 + \mathrm{X}\mathrm{\beta}
$$

If we add an extra column of $1$s to the matrix $\mathrm{X}$ and include $\beta_0$ as the first element of $\mathrm{\beta}$, we can write this without a separate intercept term:

$$
\mathrm{\hat{y}} = \mathrm{X}\mathrm{\beta}
$$
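A minimal sketch, with small made-up numbers (the arrays `X`, `beta0` and `beta` below are illustrative, not taken from any dataset), checks that prepending a column of ones and putting the intercept first in the coefficient vector gives the same predictions as the form with an explicit $\beta_0$:

```python
import numpy

# Hypothetical data: 4 datapoints, 2 predictors
X = numpy.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0],
                 [7.0, 8.0]])
beta0 = 0.5                      # intercept
beta = numpy.array([2.0, -1.0])  # one coefficient per predictor

# Form 1: explicit intercept added to X times beta
y_hat_explicit = beta0 + X.dot(beta)

# Form 2: prepend a column of ones to X and put the intercept first in beta
X_ones = numpy.hstack([numpy.ones((X.shape[0], 1)), X])
beta_full = numpy.concatenate([[beta0], beta])
y_hat_ones = X_ones.dot(beta_full)

print(y_hat_ones)                                  # [0.5 2.5 4.5 6.5]
print(numpy.allclose(y_hat_explicit, y_hat_ones))  # True
```

The column of ones simply multiplies $\beta_0$ by 1 in every row, which is why the intercept can be folded into the matrix product.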

This operation is a matrix multiplication. Here the first matrix, $\mathrm{X}$, is $N\times M$, where $N$ is the number of datapoints (rows) and $M$ the number of predictors (including the extra column of ones). The second matrix, $\mathrm{\beta}$, is $M\times 1$, so it is a column vector.

Concretely, for each row $i$ of $\mathrm{X}$ we multiply its entries with the corresponding entries of $\beta$, and the sum of these products is the prediction $\hat{y}_i$ for that row. Written out explicitly:

$$
\mathrm{\hat{y}_i} = \sum_{m=1}^M X_{i,m} \beta_m
$$

The matrix multiplication version is much more concise. The same is true in numpy code: implementing this with the matrix multiplication function `numpy.dot` is shorter than writing explicit loops.
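To make the comparison concrete, here is a sketch that computes $\hat{y}_i = \sum_{m} X_{i,m}\beta_m$ once with explicit loops and once with `numpy.dot`. The random `X` and `beta` are placeholders; the two results should agree up to floating-point rounding:

```python
import numpy

numpy.random.seed(0)                     # seeded so the example is reproducible
N, M = 5, 3
X = numpy.random.uniform(-1, 1, (N, M))  # N datapoints, M predictors
beta = numpy.random.uniform(-1, 1, M)    # one coefficient per predictor

# Explicit double loop: y_hat[i] = sum over m of X[i, m] * beta[m]
y_hat_loop = numpy.zeros(N)
for i in range(N):
    for m in range(M):
        y_hat_loop[i] += X[i, m] * beta[m]

# The same computation as a single matrix-vector product
y_hat_dot = numpy.dot(X, beta)

print(numpy.allclose(y_hat_loop, y_hat_dot))  # True
```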


### Matrix multiplication / dot product

In numpy the dot product (aka scalar product) is treated as a special case of matrix multiplication. The numpy function `dot` is used for the simple dot product between vectors, for multiplying a matrix by a vector, and for multiplying two matrices.

The definition of dot product between two vectors $u$ and $v$ is:

$$\langle u, v\rangle = \sum_{i=1}^N u_i v_i$$

Other notations that you will come across for this operation are:
$u \cdot v$, $u^T v$.

In numpy we write
```python
numpy.dot(u, v)
```
or
```python
u.dot(v)
```
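For example, with two short vectors (chosen arbitrarily) both numpy spellings agree with the definition, $1\cdot4 + 2\cdot5 + 3\cdot6 = 32$, which can also be computed as an elementwise product followed by a sum:

```python
import numpy

u = numpy.array([1, 2, 3])
v = numpy.array([4, 5, 6])

print(numpy.dot(u, v))    # 32
print(u.dot(v))           # 32
print(numpy.sum(u * v))   # 32: elementwise product, then sum
```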

#### Exercise 4.1

Create two vectors of size 100 with random values between -10 and 10. Compute:
- elementwise product between them
- dot (scalar) product between them.


In [7]:
vector1 = numpy.random.uniform(-10, 10, 100)
vector2 = numpy.random.uniform(-10, 10, 100)

In [9]:
elementwise = vector1 * vector2
print(elementwise)
print()

print(numpy.sum(elementwise))
print()

dotproduct = numpy.dot(vector1, vector2)
print(dotproduct)
print(vector1.dot(vector2))

[ 71.89905849 -29.97089273  46.85974421  -0.92662335  11.48213135
  -1.82677259   2.4774272    0.88666023 -33.84688977  29.13192863
 -14.5315394  -68.48000924 -13.76156706 -22.38400592   7.4246125
   0.23607065  -1.5575897   54.94812788  44.63145533 -41.48684576
   3.80942059  19.94159908   1.58541186  50.50875734  12.15777918
  -4.79830196 -22.14356974 -23.75486891  24.9111864  -19.42273337
  27.19236953 -20.62450484  50.58068856   6.06303123 -22.95721795
  78.15514549  -6.21175438   6.92033924  60.17432251  32.04992027
  37.75887267  19.79863895  28.45387282   5.81796778   0.09236031
  -4.73340016   8.36653219   1.12055304  12.70354075  48.49062435
  -5.24966374  57.109445   -13.03395999  11.22561044 -17.42709249
   1.02872049  -5.78227579  35.49944583  49.5680693    8.22848225
  15.04297902  79.86328801   4.84077165  26.25358209 -10.24992539
  -5.43039007 -23.77158702 -25.96816011  -3.93948138  23.64052795
 -15.35472174 -59.76996267  61.96443482 -47.0067051   16.18959042
  10.211127

When multiplying two matrices, the number of columns in the first one needs to be equal to the number of rows in the second one. For matrices $A_{m\times n}$ and $B_{n \times p}$, the resulting matrix will be $C_{m \times p}$. It is defined as:

$$
C_{i,j} = \sum_{k=1}^n A_{i,k}B_{k,j}
$$
![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/eb/Matrix_multiplication_diagram_2.svg/470px-Matrix_multiplication_diagram_2.svg.png)

In `numpy` we simply use `dot`:

```python
numpy.dot(A, B)
```
or

```python
A.dot(B)
```
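The definition of $C_{i,j}$ translates directly into a triple loop. The sketch below spells it out in a hypothetical helper `matmul_loops` (not a numpy function) and checks it against `numpy.dot` on arbitrary random matrices. The loop version is only useful for understanding; `dot` is shorter and typically much faster:

```python
import numpy

def matmul_loops(A, B):
    """Multiply A (m x n) by B (n x p) using C[i, j] = sum over k of A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("number of columns of A must equal number of rows of B")
    C = numpy.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

numpy.random.seed(0)
A = numpy.random.random((3, 4))
B = numpy.random.random((4, 2))

C = matmul_loops(A, B)
print(C.shape)                             # (3, 2)
print(numpy.allclose(C, numpy.dot(A, B)))  # True
```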

#### Exercise 4.2

- Create a random matrix $A_{3\times 4}$ and another random matrix $B_{4 \times 2}$. Multiply AB.  

In [18]:
matrixA = numpy.random.random((3,4))
matrixB = numpy.random.random((4,2))

print(matrixA)
print(matrixA.shape)
print()
print(matrixB)
print(matrixB.shape)
print()

print(numpy.dot(matrixA, matrixB))
print(numpy.dot(matrixA, matrixB).shape)

[[0.65560961 0.42015041 0.10455039 0.11892487]
 [0.27546937 0.26843849 0.02512957 0.93561529]
 [0.07162822 0.23465553 0.34877245 0.16969463]]
(3, 4)

[[0.6371753  0.94749228]
 [0.06387346 0.72932339]
 [0.74600749 0.00136171]
 [0.6921111  0.42085953]]
(4, 2)

[[0.60487931 0.9778036 ]
 [0.85896494 0.8505804 ]
 [0.43826239 0.31089948]]
(3, 2)


- Create a random matrix $C_{3\times 3}$ and $D_{3\times3}$. Multiply CD. Multiply DC. Is matrix multiplication commutative?

In [12]:
matrixC = numpy.random.random((3,3))
matrixD = numpy.random.random((3,3))

print(matrixC)
print()
print(matrixD)
print()

print(numpy.dot(matrixC, matrixD))
print()
print(numpy.dot(matrixD, matrixC))

# matrix multiplication is not commutative: CD and DC differ in general


[[0.11265491 0.99124785 0.8593189 ]
 [0.36266017 0.62802979 0.90182206]
 [0.57762511 0.62624552 0.49789601]]

[[0.41416825 0.55967315 0.74647548]
 [0.44857138 0.18367017 0.76003362]
 [0.58020062 0.95007136 0.44800592]]

[[0.98988085 1.06152686 1.22245577]
 [0.95515623 1.17511681 1.1520623 ]
 [0.80902937 0.91134063 1.13021099]]

[[0.68081223 1.22951173 1.23229535]
 [0.55615813 1.03596341 0.92952138]
 [0.66869496 1.45235744 1.57843302]]


- Create an identity matrix $I_{3\times 3}$. Multiply IC, CI, DI, ID. What do you notice?

In [23]:
matrixI = numpy.eye(3)

print(matrixI)
print()

print(numpy.dot(matrixI, matrixC))
print()
print(numpy.dot(matrixC, matrixI))
print()
print(matrixI.dot(matrixC) == matrixC.dot(matrixI))

print()


print(numpy.dot(matrixD, matrixI))
print()
print(numpy.dot(matrixI, matrixD))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

[[0.11265491 0.99124785 0.8593189 ]
 [0.36266017 0.62802979 0.90182206]
 [0.57762511 0.62624552 0.49789601]]

[[0.11265491 0.99124785 0.8593189 ]
 [0.36266017 0.62802979 0.90182206]
 [0.57762511 0.62624552 0.49789601]]

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

[[0.41416825 0.55967315 0.74647548]
 [0.44857138 0.18367017 0.76003362]
 [0.58020062 0.95007136 0.44800592]]

[[0.41416825 0.55967315 0.74647548]
 [0.44857138 0.18367017 0.76003362]
 [0.58020062 0.95007136 0.44800592]]


In [25]:
# check whether matrices are approximately equal with numpy.allclose(u, v)

print(matrixI.dot(matrixC)==matrixC)
print()

print(numpy.allclose(matrixI.dot(matrixC), matrixC))

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

True


- What will be the result of multiplying a matrix $Z_{m\times n}$ by a matrix $O_{n \times p}$ all of whose entries are zero? Check your answer using some examples in `numpy`.

In [28]:
Z = numpy.random.normal(0,1,(4,3))
O = numpy.zeros((3,2))

Z.dot(O)

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

#### Exercise 4.3

- Create a random matrix $A_{3\times 4}$ and another random matrix $B_{2 \times 4}$. Can you transform one of them so that they can be multiplied? Try this in `numpy`.

In [26]:
rand_mat_a = numpy.random.random((3,4))
rand_mat_b = numpy.random.random((2,4))

# B is 2x4; its transpose B.T is 4x2, so (3x4)·(4x2) gives a 3x2 result
numpy.dot(rand_mat_a, rand_mat_b.T)

array([[0.93767016, 1.17331324],
       [1.0289241 , 1.03986877],
       [0.81818565, 1.3285594 ]])

## Transpose

We have already encountered matrix transpose. The mathematical notation for the transpose of matrix $A$ is $A^T$. Transposing a matrix simply means making the rows into columns and columns into rows. If $A$ is $m \times n$ then $A^T$ is $n \times m$. The values are:

$$A^T_{i,j} = A_{j,i}$$

In `numpy` the transpose is simply written `A.T`.
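A small illustration with a fixed $2 \times 3$ matrix (values chosen arbitrarily) shows both the shape change and the index swap $A^T_{i,j} = A_{j,i}$:

```python
import numpy

A = numpy.array([[1, 2, 3],
                 [4, 5, 6]])

print(A.shape)    # (2, 3)
print(A.T.shape)  # (3, 2)
print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]
print(A.T[2, 0] == A[0, 2])  # True: both are 3
```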

### Exercise 4.4

- Create a random $4 \times 5$ matrix and verify that the above equality holds for it.
- What would be the outcome of $(A^T)^T$? Check this in `numpy`.

In [36]:
mat4_4 = numpy.random.random((4,5))
print(mat4_4)
print(mat4_4.shape)
print()
print(mat4_4.T)
print(mat4_4.T.shape)

print()
for i in range(4):
    for j in range(5):
        print(i,j, mat4_4[i,j] == mat4_4.T[j,i])

[[0.95809782 0.95767047 0.92541031 0.23153246 0.71233035]
 [0.47883628 0.64685903 0.85492462 0.10493499 0.48859102]
 [0.44536807 0.90865784 0.9212215  0.28198822 0.9832023 ]
 [0.7839081  0.02105208 0.99672151 0.25755787 0.95125631]]
(4, 5)

[[0.95809782 0.47883628 0.44536807 0.7839081 ]
 [0.95767047 0.64685903 0.90865784 0.02105208]
 [0.92541031 0.85492462 0.9212215  0.99672151]
 [0.23153246 0.10493499 0.28198822 0.25755787]
 [0.71233035 0.48859102 0.9832023  0.95125631]]
(5, 4)

0 0 True
0 1 True
0 2 True
0 3 True
0 4 True
1 0 True
1 1 True
1 2 True
1 3 True
1 4 True
2 0 True
2 1 True
2 2 True
2 3 True
2 4 True
3 0 True
3 1 True
3 2 True
3 3 True
3 4 True


In [33]:
print((mat4_4.T).T)

print()
print(numpy.allclose(mat4_4.T.T, mat4_4))

[[8.41278646e-01 8.33606630e-01 8.69349008e-01 9.05895523e-01
  4.54945531e-01]
 [5.29361258e-01 2.82692566e-01 9.52271245e-02 6.95917076e-01
  3.73916163e-01]
 [4.54612356e-01 5.26531609e-01 3.34487212e-01 5.24233902e-01
  1.34005407e-01]
 [1.59038396e-01 9.53777125e-01 4.45535997e-01 8.41448669e-04
  3.03401956e-01]]

True


## Inverse

For scalar numbers the multiplicative inverse (aka reciprocal) of number $n$ is $\frac{1}{n}$, also written as $n^{-1}$. This inverse has certain properties, like:
- $n^{-1}n = 1$
- $(n^{-1})^{-1} = n$

There is an analogous concept for matrices. For a square matrix $A_{m \times m}$, its inverse is written $A^{-1}$ and it satisfies:

- $A^{-1}A = I$ where $I$ is the $m \times m$ identity matrix
- $(A^{-1})^{-1} = A$
- $(A^T)^{-1} = (A^{-1})^T$

Not all matrices are invertible: a matrix needs to be square, and its [determinant](https://en.wikipedia.org/wiki/Determinant) needs to be non-zero. There is a function to invert matrices in `scipy.linalg` called `inv`.
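A $2\times 2$ case can be checked by hand: for $A = \begin{pmatrix}4 & 7\\ 2 & 6\end{pmatrix}$ the determinant is $4\cdot 6 - 7\cdot 2 = 10$ and $A^{-1} = \frac{1}{10}\begin{pmatrix}6 & -7\\ -2 & 4\end{pmatrix}$. The sketch below uses `numpy.linalg.inv` and `numpy.linalg.det`, NumPy's own counterparts of the `scipy.linalg` functions, so that it runs without SciPy; the matrices are arbitrary examples. A matrix whose second row is a multiple of the first has determinant zero and cannot be inverted:

```python
import numpy

A = numpy.array([[4.0, 7.0],
                 [2.0, 6.0]])
print(numpy.linalg.det(A))   # approximately 10
A_inv = numpy.linalg.inv(A)
print(A_inv)                 # approximately [[0.6, -0.7], [-0.2, 0.4]]
print(numpy.allclose(A_inv.dot(A), numpy.eye(2)))  # True

# Singular matrix: the second row is twice the first, so its determinant is zero
S = numpy.array([[1.0, 2.0],
                 [2.0, 4.0]])
print(numpy.linalg.det(S))   # zero
try:
    numpy.linalg.inv(S)
except numpy.linalg.LinAlgError as err:
    print("not invertible:", err)
```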


In [38]:
from scipy.linalg import inv

A = numpy.random.uniform(0,1,(3,3))

print(A)
print()
print(inv(A))

[[0.13189452 0.57748782 0.98073518]
 [0.46598057 0.2173346  0.42080987]
 [0.6876802  0.81984538 0.63122127]]

[[-1.04987823  2.22051533  0.15087855]
 [-0.02401854 -2.98665154  2.0283985 ]
 [ 1.17497936  1.46000787 -1.21467599]]


### Exercise 4.5

Verify the three properties of the matrix inverse listed above for a random $m \times m$ numpy matrix.

In [44]:
print(numpy.dot(inv(A), A))
print(numpy.allclose(inv(A).dot(A), numpy.eye(3)))
print()

print(inv(inv(A)))
print(numpy.allclose(inv(inv(A)), A ))
print()



print(inv(A.T))
print()
print(inv(A).T)
print(numpy.allclose(inv(A.T), inv(A).T))
print()



[[ 1.00000000e+00  6.32949196e-17 -2.41419601e-16]
 [ 1.70400560e-16  1.00000000e+00 -1.40437144e-17]
 [-4.05736850e-17  5.09995450e-17  1.00000000e+00]]
True

[[0.13189452 0.57748782 0.98073518]
 [0.46598057 0.2173346  0.42080987]
 [0.6876802  0.81984538 0.63122127]]
True

[[-1.04987823 -0.02401854  1.17497936]
 [ 2.22051533 -2.98665154  1.46000787]
 [ 0.15087855  2.0283985  -1.21467599]]

[[-1.04987823 -0.02401854  1.17497936]
 [ 2.22051533 -2.98665154  1.46000787]
 [ 0.15087855  2.0283985  -1.21467599]]
True



## Ordinary Least Squares formula for Linear Regression

We are now ready to implement the formula which can be used to find the coefficients of linear regression:

$$\hat\beta = (X^TX)^{-1}X^Ty$$

Remember that $X$ has $N$ rows corresponding to the $N$ datapoints, and $M$ columns corresponding to the $M$ predictors (including the column of ones for the intercept). The formula defines the vector $\hat\beta$ with the $M$ regression coefficients.
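For reference, one standard way to arrive at this formula is sketched below, assuming $X^TX$ is invertible (i.e. the columns of $X$ are linearly independent): minimise the sum of squared residuals $S(\beta)$ and set its gradient with respect to $\beta$ to zero.

$$
\begin{aligned}
S(\beta) &= (y - X\beta)^T (y - X\beta) = \sum_{i=1}^N \Big(y_i - \sum_{m=1}^M X_{i,m}\beta_m\Big)^2 \\
\nabla_\beta S(\beta) &= -2X^T(y - X\beta) = 0 \\
X^TX\hat\beta &= X^Ty \\
\hat\beta &= (X^TX)^{-1}X^Ty
\end{aligned}
$$

The third line is known as the normal equations; multiplying both sides by $(X^TX)^{-1}$ gives the formula above.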

We will apply this formula to the winequality dataset.

Previously we loaded this dataset into a structured array.


In [46]:
# Load winequality as a structured array
data = numpy.genfromtxt("winequality-red.csv", names=True, delimiter=';')
# The result is a 1-D structured array: one record per row, one named field per column
print(data.shape)

(1599,)


In [51]:
# Convert structured array into array of numeric values
Xy = data.view((data.dtype[0], len(data.dtype)))  # view() reinterprets the same memory: each record of 12 float64 fields becomes a row of 12 floats
print(Xy.shape)

print(data.dtype[0])
print(len(data.dtype))

(1599, 12)
float64
12


In [48]:
# Extract X and y from Xy
X = Xy[:,:-1]
y = Xy[:,-1:]
print(X.shape)
print(y.shape)
print(X)

(1599, 11)
(1599, 1)
[[ 7.4    0.7    0.    ...  3.51   0.56   9.4  ]
 [ 7.8    0.88   0.    ...  3.2    0.68   9.8  ]
 [ 7.8    0.76   0.04  ...  3.26   0.65   9.8  ]
 ...
 [ 6.3    0.51   0.13  ...  3.42   0.75  11.   ]
 [ 5.9    0.645  0.12  ...  3.57   0.71  10.2  ]
 [ 6.     0.31   0.47  ...  3.39   0.66  11.   ]]


In [49]:
X_new = numpy.hstack([ numpy.ones((1599,1)), X ])
print(X_new)

[[ 1.     7.4    0.7   ...  3.51   0.56   9.4  ]
 [ 1.     7.8    0.88  ...  3.2    0.68   9.8  ]
 [ 1.     7.8    0.76  ...  3.26   0.65   9.8  ]
 ...
 [ 1.     6.3    0.51  ...  3.42   0.75  11.   ]
 [ 1.     5.9    0.645 ...  3.57   0.71  10.2  ]
 [ 1.     6.     0.31  ...  3.39   0.66  11.   ]]


### Exercise 4.6

Implement a function `fit` which takes a matrix of predictors and a vector of targets, and returns the vector of regression coefficients computed according to the OLS formula.
Apply this function to the winequality data.

$$(X^T X)^{-1} X^T y$$

In [61]:
def fit(X, y):
    
    beta = inv(X.T.dot(X)).dot(X.T).dot(y)
    return beta
    
    

print(fit(X_new, y))
print(fit(X_new, y).shape)

[[ 2.19652084e+01]
 [ 2.49905527e-02]
 [-1.08359026e+00]
 [-1.82563948e-01]
 [ 1.63312698e-02]
 [-1.87422516e+00]
 [ 4.36133331e-03]
 [-3.26457970e-03]
 [-1.78811638e+01]
 [-4.13653144e-01]
 [ 9.16334413e-01]
 [ 2.76197699e-01]]
(12, 1)
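One way to sanity-check an OLS implementation is to generate synthetic data from known coefficients and confirm they are recovered. The sketch below does that with a self-contained copy of the same formula (named `fit_ols` here so it does not clash with `fit`, and using `numpy.linalg.inv` instead of `scipy.linalg.inv`). It also compares against `numpy.linalg.lstsq`, which solves the least-squares problem without forming $(X^TX)^{-1}$ explicitly and is generally the more numerically robust choice. The true coefficients and noise level are arbitrary:

```python
import numpy

def fit_ols(X, y):
    """OLS coefficients via the closed-form formula (X^T X)^{-1} X^T y."""
    return numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

numpy.random.seed(0)
N = 200
beta_true = numpy.array([[1.5], [2.0], [-3.0]])  # intercept first, then 2 slopes

# Design matrix: a column of ones followed by two random predictors
X = numpy.hstack([numpy.ones((N, 1)), numpy.random.uniform(-1, 1, (N, 2))])
y = X.dot(beta_true) + numpy.random.normal(0, 0.01, (N, 1))  # small noise

beta_hat = fit_ols(X, y)
beta_lstsq, *_ = numpy.linalg.lstsq(X, y, rcond=None)

print(beta_hat.ravel())                      # close to [1.5, 2.0, -3.0]
print(numpy.allclose(beta_hat, beta_lstsq))  # True
```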


### Exercise 4.7

Implement a function `predict` which takes a vector of coefficients and a matrix of predictors, and returns the predicted targets according to the regression formula (see the beginning of the notebook). Apply this function to the coefficients from the previous exercise and the winequality data.

In [74]:
def predict(beta, X):
    
    return X.dot(beta)
    
y_pred = predict(fit(X_new, y), X_new)

print(y_pred)
print(len(y_pred))


[[5.03285045]
 [5.13787974]
 [5.20989474]
 ...
 [5.94304255]
 [5.47075621]
 [6.00819633]]
1599


### Exercise 4.8

Define the following two functions to quantify how well the regression is able to predict the targets:
- `mse` - mean squared error, defined as the mean of the squared difference between each prediction and the corresponding true target: $$MSE(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^N (y_i-\hat{y}_i)^2$$
- `mae` - mean absolute error, defined as the mean of the absolute difference between each prediction and the corresponding true target: $$MAE(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^N \left|y_i-\hat{y}_i\right|$$

Check how well your regression functions predict the targets in winequality according to these error measures.

In [75]:
#mse

def mse(ytrue, ypred):
    
    return  numpy.square(ytrue - ypred).mean() 
    
mse(y, y_pred)

0.41676716722140794

In [76]:
#mae

def mae(ytrue, ypred):
    
    return  numpy.abs(ytrue - ypred).mean() 
    
mae(y, y_pred)

0.5004899634722085
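A hand-checkable example helps confirm both functions. With $y = (3, 5, 2)$ and $\hat y = (2.5, 5, 4)$ the errors are $0.5, 0, -2$, so $MSE = (0.25 + 0 + 4)/3 \approx 1.4167$ and $MAE = (0.5 + 0 + 2)/3 \approx 0.8333$. The sketch repeats the two definitions so it runs on its own; the numbers are made up:

```python
import numpy

def mse(ytrue, ypred):
    return numpy.square(ytrue - ypred).mean()

def mae(ytrue, ypred):
    return numpy.abs(ytrue - ypred).mean()

y_true = numpy.array([3.0, 5.0, 2.0])
y_hat = numpy.array([2.5, 5.0, 4.0])

print(mse(y_true, y_hat))  # approximately 1.4167
print(mae(y_true, y_hat))  # approximately 0.8333
```

Squaring weights large errors more heavily, so MSE is more sensitive to outliers than MAE: here the single error of 2 contributes 4 of the 4.25 total. Both arguments should also have the same shape; subtracting a 1-D array of length $N$ from an $N \times 1$ column would broadcast to an $N \times N$ matrix and silently give a wrong result.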

### Exercise 4.9

- Load the iris data and extract the first three columns into a predictor matrix, and the fourth column into a target vector. Apply the functions `fit` and `predict` to this data, and check the MSE and MAE of your predictions.

In [81]:
dtype = [('sepallength', 'float64'), ('sepalwidth', 'float64'), ('petallength', 'float64'), ('petalwidth', 'float64'), ('species', 'U10')]
irisa = numpy.loadtxt('irisa.txt', dtype=dtype)
irisb = numpy.loadtxt('irisb.txt', dtype=dtype)
irisc = numpy.loadtxt('irisc.txt', dtype=dtype)

iris = numpy.hstack([irisa, irisb, irisc])
print(iris.shape)

(150,)


In [79]:
# The 'species' field is text (U10), so a whole record cannot be viewed as
# float64 values (that attempt raises a ValueError). Instead, stack the four
# numeric fields as the columns of a plain 2-D float array.
numeric_fields = iris.dtype.names[:4]
Xy = numpy.column_stack([iris[name] for name in numeric_fields])
print(Xy.shape)

print(iris.dtype[0])
print(len(iris.dtype))