- This notebook serves as reference material and as an introduction to the linear algebra covered in the
[Deep Learning book](https://www.deeplearningbook.org/)

- We will go through the concepts and implement the mathematical aspects using NumPy (a Python library)

### <b>2.1 Basics:</b>

In [1]:
import numpy as np 
import math

In [2]:
scalar = [1] # a single number; it is wrapped in a list here, so its shape is (1,) rather than ()
scalar

[1]

In [3]:
vector = [1,2,3] #array of numbers
vector

[1, 2, 3]

In [5]:
matrix = [[1,2],[3,4],[5,6]] #it is a 2d array of numbers
matrix

[[1, 2], [3, 4], [5, 6]]

In [6]:
tensor = [[[1,2]
          ,[3,4]]
         ,[[5,6]
          ,[7,8]]]
tensor #array with more than 2 axes

[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

In [9]:
print(f"Shape of the scalar : {np.array(scalar).shape}\n Shape of vector : {np.array(vector).shape}\nShape of matrix : {np.array(matrix).shape}\nShape of Tensor:  {np.array(tensor).shape}")

Shape of the scalar : (1,)
 Shape of vector : (3,)
Shape of matrix : (3, 2)
Shape of Tensor:  (2, 2, 2)


- Rank : number of dimensions or axes an object has
- Shape : number of elements across each dimension

In [11]:
x = np.array([1,2,3])
x.ndim # ndim is the NumPy array attribute that gives the rank (number of axes)

1

In [14]:
x1 = np.array([[1,2,3],[4,5,6]])
x1.shape #tells us the shape
# here we have 2 rows and 3 columns

(2, 3)

### <b>Conventions and accessing examples:</b>

- A[i,j]
- A[:,j]
- A[i,:]
- A[:,:]

In [15]:
A = np.arange(12).reshape(3,4)
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [17]:
A[1,3] # element at row index 1, column index 3 (indices start at 0)

7

In [19]:
A[:,2] # ":" selects the entire axis, so this is the column at index 2

array([ 2,  6, 10])

In [20]:
A[:,:] # all the elements

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [21]:
A[2,:] # row at index 2 (the third row), all columns

array([ 8,  9, 10, 11])

In [68]:
b =5
A = np.array([[1,1],[1,1]])
A+b

array([[6, 6],
       [6, 6]])

- How does the above operation take place?
- This is called <b>broadcasting</b>.
- The scalar 'b' is implicitly copied to many locations.
- Simply put, 'b' is added to every element of A, as if it were first copied into a matrix with the same shape as A.
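
To make the implicit copying visible, here is a small sketch that uses np.broadcast_to to show the stretched operand, and then repeats the idea with a vector added to every row of a matrix. The names b_stretched, M and row are only illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

A = np.array([[1, 1], [1, 1]])
b = 5

# Scalar case: conceptually, b is stretched to the shape of A before adding.
b_stretched = np.broadcast_to(b, A.shape)
scalar_sum = A + b

# Vector case: a length-2 vector is added to every row of a (3, 2) matrix.
M = np.arange(6).reshape(3, 2)
row = np.array([10, 20])
row_sum = M + row

print(b_stretched)
print(scalar_sum)
print(row_sum)
```

NumPy does not actually allocate the stretched copy during A + b; np.broadcast_to only returns a read-only view that shows what the operand looks like after broadcasting.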

### <b>2.2 Multiplying matrices and vectors:</b>

The matrix product is written C=AB. Note that in NumPy, A*B means element-wise multiplication and not the matrix product.

In [22]:
A = np.array([[1,2],[3,4],[5,6]])
B = np.array([[2,4,6],[1,2,3]])
C = A*B # this tries to do element wise multiplication

ValueError: operands could not be broadcast together with shapes (3,2) (2,3) 

In [23]:
C = np.matmul(A,B) # matmul refers to matrix multiplication
C

array([[ 4,  8, 12],
       [10, 20, 30],
       [16, 32, 48]])

In [24]:
#Addition of arrays
print(B+C) # raises an error because shapes (2,3) and (3,3) cannot be broadcast together

ValueError: operands could not be broadcast together with shapes (2,3) (3,3) 

In [27]:
# A(BC) == (AB)C, the associative property of matrix multiplication
np.matmul(A,np.matmul(B,C)) == np.matmul(np.matmul(A,B),C)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [28]:
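# matrix multiplication is not commutative in general
# here AB has shape (3,3) and BA has shape (2,2), so they cannot even be compared element-wise
# however, the dot product of two vectors is commutative
np.matmul(A,B) == np.matmul(B,A)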

False

In [35]:
#Transpose of a matrix
# transpose mirrors a matrix across its main diagonal (rows become columns)
print(f"A={A}\nTranspose of A={np.transpose(A)}")

A=[[1 2]
 [3 4]
 [5 6]]
Transpose of A=[[1 3 5]
 [2 4 6]]


#### Matrix multiplication functions in NumPy

In [42]:
C_ = A@B
C1 = np.dot(A,B)
C2 = np.matmul(A,B)

In [55]:
C1 == C2

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

- This might suggest that the dot product and matrix multiplication are the same thing,
  which is not true in general.
- Let's dive a little further.

In [40]:
C_,C1,C2

(array([[ 4,  8, 12],
        [10, 20, 30],
        [16, 32, 48]]),
 array([[ 4,  8, 12],
        [10, 20, 30],
        [16, 32, 48]]),
 array([[ 4,  8, 12],
        [10, 20, 30],
        [16, 32, 48]]))

- All three functions are used for matrix multiplication (are they?)

In [43]:
M1 = np.array([[1,2],[3,4]])
M2 = np.array([[5,6],[7,8]])
M3 = np.multiply(M1,M2)
M3

array([[ 5, 12],
       [21, 32]])

- At first glance it might seem that np.multiply does the same thing.
- It does not: np.multiply multiplies element-wise (1×5, 2×6, 3×7, 4×8), whereas the matrix product of M1 and M2 would be [[19, 22], [43, 50]].
- So how do all these functions differ from one another, and which of them are just different names for the same operation?

np.multiply is the NumPy function equivalent to the * operator on arrays: both multiply element-wise.

In [101]:
A = np.array([1,2,3])
B = np.array([4,5,6])
np.multiply(A,B)

array([ 4, 10, 18])

- np.dot: this function computes the dot product of two arrays.
- If the arrays are 2D or higher, the size of the last axis of the first array must match the size of the second-to-last axis of the second array (for 2D arrays: the columns of the first and the rows of the second).

In [44]:
np.dot(1,2)

2

In [45]:
np.dot([1,2],[3,4])

11

In [63]:
np.dot([1,2],[2,3,4])

ValueError: shapes (2,) and (3,) not aligned: 2 (dim 0) != 3 (dim 0)

- Hence np.dot works on two vectors only when they have the same length; for matrices, the inner dimensions must match.

The difference between matrix multiplication and dot products needs to be understood.

In dot product
- It is an algebraic operation that takes two same-sized vectors and returns a single number.

In matrix multiplication
- This is a matrix version of a dot product.
- Simply the dot product repeated as many times as needed.
- Result of a matrix multiplication is a matrix.
- Each element of the resulting matrix is the dot product of a row of the first matrix and a column of the second.
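
As a rough illustration of "the dot product repeated as many times as needed", the sketch below builds a matrix product one element at a time and compares it with the @ operator. The name C_manual is only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
B = np.array([[2, 4, 6], [1, 2, 3]])     # shape (2, 3)

# Element (i, j) of the product is the dot product of row i of A and column j of B.
C_manual = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C_manual[i, j] = np.dot(A[i, :], B[:, j])

print(C_manual)
print(np.array_equal(C_manual, A @ B))
```

The explicit loops are only there to expose the definition; in practice np.matmul or @ does the same work far faster.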

In [70]:
A = [[1,2],[3,4],[5,6],[7,8]]
B = [[1,2,3],[4,5,6]]
C = np.matmul(A,B)
C

array([[ 9, 12, 15],
       [19, 26, 33],
       [29, 40, 51],
       [39, 54, 69]])

In [71]:
C = np.dot(A,B)
C

array([[ 9, 12, 15],
       [19, 26, 33],
       [29, 40, 51],
       [39, 54, 69]])

- From the above two cells we can infer that both <b>np.dot</b> and <b>np.matmul</b> give the same results here
- Which means they are the same !
- But they aren't ...

<b>matmul differs from dot in two important ways:</b>

- <b>Multiplication by scalars is not allowed</b>; use * instead.

- Stacks of matrices are broadcast together as if the matrices were elements

In [76]:
a = np.ones([9, 5, 7, 4])
# axis sizes: 9 along axis 0, 5 along axis 1, 7 along axis 2, 4 along axis 3
c = np.ones([9, 5, 4, 3])
np.dot(a,c).shape == np.matmul(a,c).shape

False

In [80]:
print(np.dot(a,c).shape)
print(np.matmul(a,c).shape)

(9, 5, 7, 9, 5, 3)
(9, 5, 7, 3)


- Here we see that np.dot produces a 6D array whereas np.matmul produces a 4D array.
- np.matmul treats the last two axes as matrices and broadcasts over the leading axes (here a 9x5 stack of (7,4) by (4,3) products). This is usually what we want when multiplying stacks of matrices, so matmul is the one to use in that situation.

Summing it up
- '*' == ```np.multiply``` != ```np.dot``` != ```np.matmul```
- ```np.multiply``` needs to be used with ```np.sum``` to perform dot product
- ```np.dot``` can be used for dot product and matrix multiplication however not recommended for matrix multiplication
- ```np.matmul``` or ```@``` can be used for matrix multiplication
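
The summary above can be checked with a short sketch on two vectors. The variable names are illustrative, and the try/except only records whether np.matmul accepts a scalar operand.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

elementwise = u * v                        # same as np.multiply(u, v)
dot_via_sum = np.sum(np.multiply(u, v))    # element-wise product, then sum
dot_direct = np.dot(u, v)                  # dot product of two 1-D arrays
dot_at = u @ v                             # @ on two 1-D arrays is also the dot product

# np.matmul does not accept scalar operands; use * for scaling instead.
try:
    np.matmul(u, 2)
    matmul_accepts_scalar = True
except (ValueError, TypeError):
    matmul_accepts_scalar = False

print(elementwise, dot_via_sum, dot_direct, dot_at, matmul_accepts_scalar)
```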

### 2.3 Identity and Inverse Matrices

In [103]:
A = np.array([[1,2],[3,4]])
Identity = np.matmul(np.invert(A),A) # np.invert computes the bitwise NOT of each element, not the matrix inverse, so this is not the identity
Identity 

array([[-11, -16],
       [-19, -28]])

In [105]:
Identity = np.matmul(np.linalg.inv(A),A) 
Identity # not exactly the identity matrix because of floating-point rounding, but very close to it

array([[1.0000000e+00, 4.4408921e-16],
       [0.0000000e+00, 1.0000000e+00]])
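
Because of this rounding, an exact == comparison against the identity is unreliable. A minimal sketch of a tolerance-based check with np.allclose follows, together with np.linalg.solve as a way to solve Ax = b without forming the inverse explicitly. The vector b is an arbitrary example value chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Compare with the identity up to a small tolerance instead of using ==.
product = np.linalg.inv(A) @ A
is_identity = np.allclose(product, np.eye(2))

# Solve A x = b directly; b is an arbitrary illustrative right-hand side.
b = np.array([5, 6])
x = np.linalg.solve(A, b)

print(is_identity)
print(x)
```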

### <b>2.4 Linear Dependence and Span:</b>

For A^-1 to exist, the equation Ax = b must have exactly one solution for every value of b. That solution is
x = inv(A)*b

The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors

In [49]:
#linear combination
v1 = np.array([0, 0, 1])
v2 = np.array([1,0,0])
v3 = np.array([0, 1, 0])
X = np.column_stack([v1,v2,v3]) # the vectors are the columns of X
y = np.array([1,2,3])
scalars = np.linalg.solve(X,y) # finds c such that c[0]*v1 + c[1]*v2 + c[2]*v3 = y
scalars

array([3., 1., 2.])
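
Whether a set of vectors spans the whole space depends on whether they are linearly independent. One way to check this numerically, offered here as a sketch rather than the book's method, is np.linalg.matrix_rank: the rank counts the linearly independent columns. The matrices below are illustrative examples.

```python
import numpy as np

# Three independent vectors as columns: they span all of R^3.
independent = np.column_stack([[0, 0, 1], [1, 0, 0], [0, 1, 0]])

# The second column is twice the first, so the columns are linearly dependent.
dependent = np.column_stack([[1, 2, 3], [2, 4, 6], [0, 1, 0]])

rank_independent = np.linalg.matrix_rank(independent)
rank_dependent = np.linalg.matrix_rank(dependent)

print(rank_independent, rank_dependent)
```

A square matrix has an inverse only when its rank equals its size, so the second matrix is not invertible.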

### <b>2.5 Norm:</b>

The size of a vector is measured using a function called a norm.
- The default norm is L2 (Euclidean norm), denoted by ||x||
- The maximum norm is the absolute value of the element with the largest magnitude in the vector
- The Frobenius norm is used to measure the size of a matrix

In [112]:
A = np.array([1,2,3])
norm1 = np.linalg.norm(A) # by default it takes the L2 norm; we can also specify the norm with ord
norm2 = np.linalg.norm(A,ord=np.inf)
#norm3 = np.linalg.norm(A,ord='fro') #results in error
# as frobenius norm is calculated for matrices and not vectors
print(norm1)
print(norm2)
#print(norm3)

3.7416573867739413
3.0


In [113]:
B = np.array([[1,2,3],[4,5,6]])
norm3 = np.linalg.norm(B,ord="fro")
print(norm3)

9.539392014169456
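
For reference, the three norms used above can also be written out from their definitions. The sketch below recomputes them by hand; the *_manual names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
B = np.array([[1, 2, 3], [4, 5, 6]])

l2_manual = np.sqrt(np.sum(x ** 2))    # square root of the sum of squares
max_manual = np.max(np.abs(x))         # largest absolute value among the elements
fro_manual = np.sqrt(np.sum(B ** 2))   # L2 norm applied to all matrix entries

print(l2_manual, max_manual, fro_manual)
```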


### <b>2.6 Special Kinds of matrices and Vectors</b>

- Diagonal matrix
- Symmetric matrix - a matrix that is equal to its own transpose
- Unit matrix
- Orthonormal matrix
- Orthogonal matrix

In [114]:
np.ones([2,2])

array([[1., 1.],
       [1., 1.]])

In [115]:
I = np.identity(4)
I

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [116]:
np.diag(I)

array([1., 1., 1., 1.])

- Two vectors x and y are orthogonal to each other if xTy=0
- If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.
- An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal.
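
These definitions can be checked numerically. The sketch below uses a 2D rotation matrix as an assumed example of an orthogonal matrix (it is not taken from the notebook) and verifies that its columns are orthonormal and that its transpose acts as its inverse.

```python
import numpy as np

theta = np.pi / 4
# A rotation matrix, used here as an example of an orthogonal matrix.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

columns_orthogonal = np.isclose(Q[:, 0] @ Q[:, 1], 0.0)      # x^T y = 0
unit_norms = np.allclose(np.linalg.norm(Q, axis=0), 1.0)     # each column has norm 1
is_orthogonal = np.allclose(Q.T @ Q, np.eye(2))              # Q^T Q = I
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)    # Q^-1 = Q^T

print(columns_orthogonal, unit_norms, is_orthogonal, inverse_is_transpose)
```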

### <b>2.7 Eigen Decomposition:</b>
- A matrix is decomposed into a set of eigenvalues and eigenvectors.
- Av = λv , where v is the eigenvector and λ is the eigenvalue.

In [117]:
A = np.random.rand(5,5)
eig_val = np.linalg.eigvals(A)
eig_val #eigenvalue

array([ 2.25928832+0.j       ,  0.64688344+0.j       ,
       -0.10298774+0.1777758j, -0.10298774-0.1777758j,
        0.11169926+0.j       ])

In [118]:
eig_val,eig_vec = np.linalg.eig(A)
eig_vec #eigen vector

array([[ 0.3559964 +0.j        ,  0.54323152+0.j        ,
         0.10042369-0.12050135j,  0.10042369+0.12050135j,
         0.00462908+0.j        ],
       [ 0.33774481+0.j        , -0.39532763+0.j        ,
         0.68924997+0.j        ,  0.68924997-0.j        ,
        -0.83210198+0.j        ],
       [ 0.38237906+0.j        , -0.27375315+0.j        ,
        -0.6082103 +0.14308088j, -0.6082103 -0.14308088j,
        -0.37034549+0.j        ],
       [ 0.52283658+0.j        , -0.66226292+0.j        ,
         0.1009537 -0.13286063j,  0.1009537 +0.13286063j,
         0.14307614+0.j        ],
       [ 0.58277198+0.j        ,  0.18730356+0.j        ,
         0.26582772+0.10690699j,  0.26582772-0.10690699j,
         0.38724449+0.j        ]])
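
To connect the output of np.linalg.eig back to the definition Av = λv, the sketch below checks the equation for each eigenpair and rebuilds the matrix from its decomposition. A small symmetric matrix is used as an assumed example so that the eigenvalues are real; the names checks and A_rebuilt are illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
eig_val, eig_vec = np.linalg.eig(A)

# Each column of eig_vec is an eigenvector paired with the eigenvalue at the same index.
checks = [
    np.allclose(A @ eig_vec[:, k], eig_val[k] * eig_vec[:, k])
    for k in range(len(eig_val))
]

# Rebuild A as V diag(lambda) V^-1.
A_rebuilt = eig_vec @ np.diag(eig_val) @ np.linalg.inv(eig_vec)

print(checks)
print(np.allclose(A_rebuilt, A))
```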

### <b>2.8 Singular Value Decomposition:</b>
- If a matrix is not square, its eigendecomposition is not defined, so we use the singular value decomposition instead.

In [5]:
A = np.arange(24).reshape(4,6)
A

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In [9]:
U,s,vh = np.linalg.svd(A,full_matrices=True) #full_matrices is optional
print("U=",U,"\n","s=",s,"\n","vh=",vh)

U= [[-0.09979291 -0.83068729 -0.44166769 -0.32392846]
 [-0.32237784 -0.44280078  0.34744824  0.76110428]
 [-0.54496277 -0.05491426  0.63010661 -0.55042317]
 [-0.76754771  0.33297225 -0.53588715  0.11324735]] 
 s= [6.56235089e+01 4.18987869e+00 2.92322821e-15 8.22506300e-16] 
 vh= [[-0.33965997 -0.36609381 -0.39252765 -0.41896148 -0.44539532 -0.47182916]
 [ 0.6390936   0.40151391  0.16393422 -0.07364547 -0.31122516 -0.54880485]
 [ 0.66952298 -0.55398193 -0.18599188 -0.28491678 -0.00387801  0.35924562]
 [ 0.0332819  -0.42916808  0.29364362  0.01097679  0.64737838 -0.55611261]
 [ 0.12437969  0.31988869 -0.80745574  0.02972545  0.46146314 -0.12800122]
 [-0.10654323  0.33369434  0.21504253 -0.85840897  0.26962913  0.1465862 ]]
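
As a sanity check on the three factors, the sketch below rebuilds A from U, s and vh. Since s is returned as a 1-D array, it first has to be placed on the diagonal of a matrix with the same shape as A. The names Sigma, A_rebuilt and numerical_rank are illustrative, and the 1e-10 threshold is an arbitrary tolerance chosen for this example.

```python
import numpy as np

A = np.arange(24).reshape(4, 6)
U, s, vh = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of a (4, 6) matrix.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

A_rebuilt = U @ Sigma @ vh

# Singular values that are numerically zero indicate a rank-deficient matrix.
numerical_rank = int(np.sum(s > 1e-10))

print(np.allclose(A_rebuilt, A))
print(numerical_rank)
```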


### <b>2.9 Moore-Penrose Pseudoinverse:</b>
- Used in place of the inverse for matrices that are not invertible, including non-square matrices.
- The inverse is used to solve a system of equations.
- Therefore, by using the pseudoinverse we can find a value that is as close as possible to a solution.

In [17]:
M = np.arange(6).reshape(2,3)
M

array([[0, 1, 2],
       [3, 4, 5]])

In [18]:
#moore penrose pseudo inverse
M1 = np.linalg.pinv(M)
M1

array([[-0.77777778,  0.27777778],
       [-0.11111111,  0.11111111],
       [ 0.55555556, -0.05555556]])

In [21]:
M1.dot(M) # not the identity: M is 2x3, so pinv(M) times M is only a projection matrix; M.dot(M1) gives the 2x2 identity (up to rounding)

array([[ 0.83333333,  0.33333333, -0.16666667],
       [ 0.33333333,  0.33333333,  0.33333333],
       [-0.16666667,  0.33333333,  0.83333333]])
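
A sketch of how the pseudoinverse is used to solve a system Mx = b when M is not square. The right-hand side b is an arbitrary example value, and np.linalg.lstsq is included only as a cross-check, since for a system with more unknowns than equations it also returns the minimum-norm solution.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
M_pinv = np.linalg.pinv(M)

# M has more columns than rows and full row rank, so M @ pinv(M) is the 2x2 identity.
right_identity = np.allclose(M @ M_pinv, np.eye(2))

# Use the pseudoinverse to pick one solution of M x = b (b is illustrative).
b = np.array([1.0, 2.0])
x = M_pinv @ b
solves_system = np.allclose(M @ x, b)

# Cross-check against NumPy's least-squares solver.
x_lstsq = np.linalg.lstsq(M, b, rcond=None)[0]

print(right_identity, solves_system)
print(x, x_lstsq)
```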

### <b>2.10 Trace Operator:</b>
- The trace is the sum of all the elements on the main diagonal of a square matrix.
- The trace of a matrix is equal to the trace of its transpose.

In [26]:
M2 = np.arange(9).reshape(3,3)
M2

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [27]:
trace1 = M2.trace()
trace1

12
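
Both statements about the trace can be confirmed in a couple of lines. In this sketch the names trace_from_diag and trace_of_transpose are illustrative.

```python
import numpy as np

M2 = np.arange(9).reshape(3, 3)

trace_from_diag = np.sum(np.diag(M2))   # 0 + 4 + 8
trace_of_transpose = np.trace(M2.T)     # transposing keeps the diagonal in place

print(trace_from_diag, trace_of_transpose)
```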

### <b>2.11 Determinant:</b>
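- It is equal to the product of all the eigenvalues of the matrix.
- The absolute value of the determinant tells us how much multiplication by the matrix expands or contracts space; a determinant of 0 means the matrix is singular (not invertible).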

In [28]:
A = np.ones(4).reshape(2,2)
A

array([[1., 1.],
       [1., 1.]])

In [30]:
A_det = np.linalg.det(A)
A_det

0.0
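
The link between the determinant and the eigenvalues can be checked numerically. The sketch below uses a small triangular matrix as an assumed example (its eigenvalues are its diagonal entries, 2 and 3) and then repeats the check on the all-ones matrix from the cell above.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
det_A = np.linalg.det(A)
eig_product = np.prod(np.linalg.eigvals(A))   # 2 * 3

# The all-ones matrix has a zero eigenvalue, so its determinant is 0.
ones = np.ones((2, 2))
det_ones = np.linalg.det(ones)
eig_product_ones = np.prod(np.linalg.eigvals(ones))

print(det_A, eig_product)
print(det_ones, eig_product_ones)
```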

### <b>2.12 Principal Component Analysis:</b>

- Dimensions refers to the number of features in a dataset.
- The aim of PCA is to reduce the number of dimensions in the dataset while also retaining important information.

Steps to calculate PCA:
- Normalization of the data
- Computing the covariance matrix
- Calculating the eigenvectors and eigenvalues
- Computing the Principal Components
- Reprojecting the data

Let us try to manually calculate PCA using numpy

In [32]:
M = np.arange(12).reshape(4,3) #defining the data 
M

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [34]:
M_mean = np.mean(M.T, axis=1) # calculate the mean of each column of M (equivalent to np.mean(M, axis=0))
M_mean

array([4.5, 5.5, 6.5])

In [35]:
M_center = M - M_mean #centering the columns by subtracting column means
M_center

array([[-4.5, -4.5, -4.5],
       [-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5],
       [ 4.5,  4.5,  4.5]])

In [36]:
M_cov = np.cov(M_center.T)# covariance matrix of the centered matrix
M_cov

array([[15., 15., 15.],
       [15., 15., 15.],
       [15., 15., 15.]])

In [37]:
eig_val,eig_vec = np.linalg.eig(M_cov) # finding the eigenvalues and eigenvectors
print(f"Eigenvalues={eig_val}\nEigenvectors={eig_vec}")

Eigenvalues=[ 4.50000000e+01 -9.86076132e-32 -1.56070767e-15]
Eigenvectors=[[ 5.77350269e-01 -8.21529965e-17  7.46872987e-01]
 [ 5.77350269e-01 -7.07106781e-01 -6.59155859e-01]
 [ 5.77350269e-01  7.07106781e-01 -8.77171288e-02]]


In [41]:
# projecting the centered data onto the eigenvectors (the principal components)
M_red = eig_vec.T.dot(M_center.T)
M_red.T # transposing the matrix for easier understanding

array([[-7.79422863e+00,  1.66533454e-16, -6.24500451e-16],
       [-2.59807621e+00,  5.55111512e-17, -2.08166817e-16],
       [ 2.59807621e+00, -5.55111512e-17,  2.08166817e-16],
       [ 7.79422863e+00, -1.66533454e-16,  6.24500451e-16]])

- Calculating PCA using sklearn

In [42]:
from sklearn.decomposition import PCA
pca = PCA(3) # creating an instance
pca.fit(M)
M_red1 = pca.transform(M)
M_red1

array([[ 7.79422863e+00, -1.69309011e-15,  2.77555756e-16],
       [ 2.59807621e+00, -6.38378239e-16,  1.66533454e-16],
       [-2.59807621e+00,  6.38378239e-16, -1.66533454e-16],
       [-7.79422863e+00,  1.69309011e-15, -2.77555756e-16]])

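- Here we see that both methods give the same reduced data, up to a sign flip of the first component and floating-point noise in the remaining ones. The sign of an eigenvector is arbitrary, so the flip does not change the result in any meaningful way.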
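
The manual calculation above keeps all three components, so the data is rotated but not yet reduced. A minimal sketch of the actual reduction step follows, under two assumptions that are not part of the cells above: np.linalg.eigh is swapped in for np.linalg.eig (the covariance matrix is symmetric, and eigh returns real eigenvalues in ascending order), and k = 1 component is kept. The names order, k, M_red and explained are illustrative.

```python
import numpy as np

M = np.arange(12).reshape(4, 3).astype(float)

# Center the columns and compute the covariance matrix of the features.
M_center = M - M.mean(axis=0)
M_cov = np.cov(M_center, rowvar=False)

# eigh is an assumption here: it suits symmetric matrices and sorts eigenvalues ascending.
eig_val, eig_vec = np.linalg.eigh(M_cov)

# Reorder so the direction with the largest variance comes first.
order = np.argsort(eig_val)[::-1]
eig_val = eig_val[order]
eig_vec = eig_vec[:, order]

# Keep only the first k principal components.
k = 1
M_red = M_center @ eig_vec[:, :k]

# Fraction of the total variance captured by the kept components.
explained = eig_val[:k].sum() / eig_val.sum()

print(M_red)
print(explained)
```

For this dataset all of the variance lies along a single direction, so one component is enough; the sign of the projected values may differ between runs or libraries for the reason given above.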