# [Comprehensive guide to linear algebra](https://www.analyticsvidhya.com/blog/2017/05/comprehensive-guide-to-linear-algebra/)



## 1. Motivation – Why learn Linear Algebra?

Here are four scenarios that showcase why learning Linear Algebra is important.
 
#### Scenario 1
![](flw.PNG)

What do you see when you look at the image above? You most likely said flower and leaves, which is not too difficult. But if I ask you to write that logic so that a computer can do the same for you, it will be a very difficult task.

You probably know that computers of today are designed to process only 0 and 1. So how can an image such as the one above, with multiple attributes like colour, be stored in a computer? This is achieved by storing the pixel intensities in a construct called a matrix. This matrix can then be processed to identify colours etc.

So any operation which you want to perform on this image would likely use Linear Algebra and matrices at the back end.

#### Scenario 2
If you are somewhat familiar with the Data Science domain, you might have heard the word “XGBOOST”, an algorithm employed most frequently by winners of Data Science competitions. It stores numeric data in the form of a matrix to give predictions, which enables XGBOOST to process data faster and provide more accurate results. Moreover, not just XGBOOST but various other algorithms use matrices to store and process data.

#### Scenario 3
Deep Learning, the new buzzword in town, employs matrices to store inputs such as images, speech or text to give state-of-the-art solutions to these problems. Weights learned by a Neural Network are also stored in matrices. Below is a graphical representation of weights stored in a matrix.

![](wts.PNG)

#### Scenario 4
Another active area of research in Machine Learning is dealing with text, and the most common techniques employed are Bag of Words, Term Document Matrix etc. All these techniques work in a very similar manner: they count how often words occur in documents (or compute something similar) and store these frequency counts in matrix form to perform tasks like semantic analysis, language translation, language generation etc.
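To make the idea of storing word counts in a matrix concrete, here is a minimal sketch. The three documents and the variable names are made up for illustration, and the whitespace tokenisation is an assumption; real pipelines usually do more preprocessing.

```python
import numpy as np

# Three hypothetical documents
docs = ["the cat sat", "the dog sat", "the cat saw the dog"]

# Vocabulary: every distinct word, in alphabetical order
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per word; each entry is a word count
counts = np.array([[doc.split().count(word) for word in vocab] for doc in docs])

print(vocab)
print(counts)
```

Each row is a numeric description of one document, so the documents can now be compared or fed to an algorithm using ordinary matrix operations.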

***

![](2.PNG)


If we add an extra variable z, the effort needed to find the solution increases tremendously. Now imagine having 10 variables and 10 equations. Solving 10 equations simultaneously can prove to be tedious and time-consuming. Now dive into data science, where we have millions of data points.

Matrices are used to solve large systems of linear equations.
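As a rough illustration, a system of three equations can be handed to NumPy as a coefficient matrix and a right-hand side. The system below is a hypothetical example chosen for this sketch, not one taken from the figure above.

```python
import numpy as np

# Hypothetical system:
#    x +  y +  z =  6
#         2y + 5z = -4
#   2x + 5y -  z = 27
coefficients = np.array([[1.0, 1.0, 1.0],
                         [0.0, 2.0, 5.0],
                         [2.0, 5.0, -1.0]])
constants = np.array([6.0, -4.0, 27.0])

# np.linalg.solve finds x, y, z in one call
solution = np.linalg.solve(coefficients, constants)
print(solution)
```

The same call works unchanged for 10 or 1,000 equations, as long as the coefficient matrix is square and invertible.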

Basically, a linear equation in three variables represents a plane. More technically, a plane is a flat geometric object which extends up to infinity.

As in the case of a line, finding solutions to a system of linear equations in 3 variables means finding the intersection of the corresponding planes. Now can you imagine in how many ways a set of three planes can intersect? Let me help you out. There are 4 possible cases:

1. No intersection at all.
2. Planes intersect in a line.
3. They can intersect in a plane.
4. All the three planes intersect at a point.
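The four cases can be told apart numerically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The sketch below assumes exactly three planes written as `A @ [x, y, z] = b`; the function name `classify_planes` is a hypothetical helper, not a library routine.

```python
import numpy as np

def classify_planes(A, b):
    """Describe how three planes A @ [x, y, z] = b intersect."""
    A = np.asarray(A, dtype=float)
    augmented = np.column_stack([A, np.asarray(b, dtype=float)])
    rank_a = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(augmented)
    if rank_a < rank_aug:
        return "no common intersection"   # the equations contradict each other
    if rank_a == 3:
        return "point"                    # unique solution
    if rank_a == 2:
        return "line"                     # one free variable
    return "plane"                        # all three planes coincide

print(classify_planes(np.eye(3), [1, 2, 3]))
print(classify_planes([[1, 0, 0], [0, 1, 0], [1, 1, 0]], [1, 2, 3]))
print(classify_planes([[1, 1, 1], [2, 2, 2], [3, 3, 3]], [1, 2, 3]))
print(classify_planes([[1, 1, 1], [1, 1, 1], [1, 1, 1]], [1, 2, 3]))
```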

![](mat1.PNG)

### Terms related to matrix

![](terms.PNG)



In [1]:

'''
Find Transpose of Matrix in Python
'''
 
import numpy as np
#create a 3*3 matrix
A= np.arange(21,30).reshape(3,3)

#print the matrix
print('\n\nMatrix\n\n')
print(A)

print('\n\nTranspose of Matrix\n\n')
print(A.transpose())



Matrix


[[21 22 23]
 [24 25 26]
 [27 28 29]]


Transpose of Matrix


[[21 24 27]
 [22 25 28]
 [23 26 29]]


In [2]:
# Alternate way to calculate transpose
A.T

array([[21, 24, 27],
       [22, 25, 28],
       [23, 26, 29]])

## 3. Matrix multiplication

In [3]:
A=np.arange(21,30).reshape(3,3)
A

array([[21, 22, 23],
       [24, 25, 26],
       [27, 28, 29]])

In [4]:
B=np.arange(31,40).reshape(3,3)
B

array([[31, 32, 33],
       [34, 35, 36],
       [37, 38, 39]])

In [5]:
A.dot(B)   # matrix multiplication AxB

array([[2250, 2316, 2382],
       [2556, 2631, 2706],
       [2862, 2946, 3030]])

In [6]:
B.dot(A)   # matrix multiplication BxA

array([[2310, 2406, 2502],
       [2526, 2631, 2736],
       [2742, 2856, 2970]])

#### Properties of matrix multiplication
1. Matrix multiplication is associative provided the given matrices are compatible for multiplication i.e.
    `ABC =  (AB)C = A(BC)`
    
2. Matrix multiplication is not commutative, i.e. in general AB and BA are not equal.
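Both properties can be checked quickly with the matrices used above. The third matrix `C` is an extra, arbitrary matrix introduced only for this check.

```python
import numpy as np

A = np.arange(21, 30).reshape(3, 3)
B = np.arange(31, 40).reshape(3, 3)
C = np.arange(1, 10).reshape(3, 3)   # arbitrary third matrix for the check

# Associativity: the grouping of the products does not matter
is_associative = np.array_equal((A @ B) @ C, A @ (B @ C))

# Commutativity: swapping the order generally changes the result
is_commutative = np.array_equal(A @ B, B @ A)

print(is_associative, is_commutative)
```

The `@` operator is equivalent to `.dot()` for these 2-D arrays.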


Matrix multiplication is used in linear and logistic regression when we compute the value of the output variable from the feature matrix and a parameter vector.
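A minimal sketch of that computation follows. The feature values and the parameter vector are made-up numbers, and the column of ones for the intercept is an assumption of this example.

```python
import numpy as np

# Hypothetical design matrix: a column of ones (intercept) and one feature
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
theta = np.array([0.5, 2.0])   # hypothetical parameter vector

# Linear regression: one matrix-vector product gives every prediction at once
linear_predictions = X @ theta

# Logistic regression: the same product, passed through the sigmoid function
probabilities = 1.0 / (1.0 + np.exp(-linear_predictions))

print(linear_predictions)
print(probabilities)
```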


## 4. Representing equations in matrix form

![](rep.PNG)
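As a small worked illustration (the system is a hypothetical example, not necessarily the one in the figure), three equations in three unknowns can be packed into one matrix equation: a coefficient matrix times the vector of unknowns equals the vector of constants.

$$
\begin{aligned}
x + y + z &= 6 \\
2y + 5z &= -4 \\
2x + 5y - z &= 27
\end{aligned}
\quad\Longleftrightarrow\quad
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 5 \\ 2 & 5 & -1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} 6 \\ -4 \\ 27 \end{bmatrix}
$$

Each row of the matrix holds the coefficients of one equation, and a missing variable (such as x in the second equation) becomes a 0 entry.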

## 5. Solving the Problem

There are two methods to solve matrix equations:
1. Row Echelon Form
2. Inverse of a Matrix

### 5.1 Row Echelon form

- Solving the equations by the substitution method can prove to be tedious and time-consuming.
- Our first method introduces a neater and more systematic approach, in which we manipulate the original equations step by step to find the solution.


There are two conditions which have to be fulfilled by any manipulation to be valid.
1. Manipulation should preserve the solution i.e. solution should not be altered on imposing the manipulation.
2. Manipulation should be reversible.

So, what are those manipulations?
1. We can swap the order of equations.
2. We can multiply both sides of equations by any non-zero constant ‘c’.
3. We can multiply an equation by any non-zero constant and then add it to another equation.

These points will become clearer once you go through the algorithm and practice it. The basic idea is to eliminate variables in successive equations until the coefficients form an upper triangular matrix.
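The elimination idea can be sketched in a few lines of Python. This is a teaching sketch under simplifying assumptions (a square system with a unique solution); the function name `solve_by_elimination` and the test system are hypothetical, and in practice `np.linalg.solve` is the better choice.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Reduce [A | b] to upper triangular form, then back-substitute."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    M = np.hstack([A, b])          # augmented matrix
    n = M.shape[0]
    for col in range(n):
        # Manipulation 1: swap rows so the largest available pivot is on the diagonal
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular; no unique solution")
        M[[col, pivot]] = M[[pivot, col]]
        # Manipulation 3: subtract a multiple of the pivot row from each row below
        for row in range(col + 1, n):
            factor = M[row, col] / M[col, col]
            M[row, col:] -= factor * M[col, col:]
    # Back substitution, starting from the last equation
    x = np.zeros(n)
    for row in range(n - 1, -1, -1):
        known = M[row, row + 1:n] @ x[row + 1:]
        x[row] = (M[row, -1] - known) / M[row, row]
    return x

A = [[1, 1, 1], [0, 2, 5], [2, 5, -1]]
b = [6, -4, 27]
print(solve_by_elimination(A, b))
```

Row swapping is used here to pick the largest pivot, which keeps the arithmetic numerically stable; scaling a row by a constant (manipulation 2) is not needed for this variant.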

[Udacity Linear algebra: Rules for manipulating matrix equations](https://www.youtube.com/watch?v=0-GaihnICmo&index=17&list=PLAwxTw4SYaPlH16rY8KgDwciMZPxCnCX_)



![](eqn1.PNG)

![](eqn2.PNG)

![](eqn3.PNG)

### 5.2 Inverse of a Matrix

For solving a large number of equations in one go, the inverse is used. Don’t panic if you are not familiar with the inverse. We will do a good amount of work on all the required concepts. Let’s start with a few terms and operations.

__Determinant of a Matrix__ – The concept of determinant is applicable to square matrices only. I will lead you to the generalised expression of determinant in steps. To start with, let’s take a 2*2 matrix  A.

$$ A = \begin{bmatrix}a & b \\c & d \end{bmatrix} $$

For now, just focus on 2*2 matrix. The expression of determinant of the matrix A will be:

`det(A) = ad - bc`

Now take a 3*3 matrix ‘B’ and find its determinant.

$$ B = \begin{bmatrix}a & b & c\\ d & f & g \\ h & i & j \end{bmatrix} $$

![](det.PNG)
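One common way to compute such determinants is cofactor expansion along the first row, sketched below. The function name `det_by_expansion` and the sample matrix are hypothetical; this recursive approach is only practical for small matrices, and `np.linalg.det` should be preferred otherwise.

```python
import numpy as np

def det_by_expansion(M):
    """Determinant by cofactor expansion along the first row."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    if n == 2:
        return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]   # ad - bc
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        total += (-1) ** j * M[0, j] * det_by_expansion(minor)
    return total

sample = np.array([[2, 0, 1],
                   [1, 3, 2],
                   [1, 1, 4]])
print(det_by_expansion(sample))
print(np.linalg.det(sample))
```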



In [7]:
arr = np.arange(100,116).reshape(4,4)
arr

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [112, 113, 114, 115]])

In [8]:
np.linalg.det(arr)

-2.958228394578808e-31
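The tiny value above is a floating-point approximation of zero. Each row of `arr` is the previous row plus 4 in every entry, so the rows are linearly dependent and the exact determinant is 0. A quick way to confirm this is to compare against zero with a tolerance and to look at the rank, as in this sketch:

```python
import numpy as np

arr = np.arange(100, 116).reshape(4, 4)

# Compare with a tolerance instead of testing for exact equality with 0
is_singular = np.isclose(np.linalg.det(arr), 0.0)

# A 4x4 matrix is invertible only if its rank is 4
rank = np.linalg.matrix_rank(arr)

print(is_singular, rank)
```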

![](minor.PNG)

![](cofactor.PNG)

![](adj.PNG)

![](inverse.PNG)
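The chain minor, cofactor, adjoint, inverse can be traced in code. The sketch below assumes the standard definition (inverse = adjoint divided by determinant); the function name `inverse_by_adjugate` and the sample matrix are hypothetical, and `np.linalg.inv` is what you would normally call.

```python
import numpy as np

def inverse_by_adjugate(M):
    """Inverse of a small square matrix via minors, cofactors and the adjoint."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    det = np.linalg.det(M)
    if np.isclose(det, 0.0):
        raise ValueError("matrix is singular and has no inverse")
    cofactors = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            # Minor: determinant of M with row i and column j removed
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            cofactors[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    adjoint = cofactors.T          # adjoint = transpose of the cofactor matrix
    return adjoint / det

sample = np.array([[2, 0, 1],
                   [1, 3, 2],
                   [1, 1, 4]])
print(inverse_by_adjugate(sample))
```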

### 5.3 Power of matrices

![](power.PNG)
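Assuming the figure refers to repeated multiplication of a square matrix by itself, NumPy provides `np.linalg.matrix_power` for this. The matrix below is an arbitrary example.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])

# A^3 means A multiplied by itself three times
cubed = np.linalg.matrix_power(A, 3)
by_hand = A @ A @ A

# Note: A ** 3 is different, it cubes each entry separately
elementwise = A ** 3

print(cubed)
print(elementwise)
```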

In [9]:
#create an array arr1
arr1 = np.arange(5,21).reshape(4,4)
arr1

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [10]:
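# Finding inverse of a matrix
# Caution: arr1 is singular (each row is the previous row plus a constant), so it has
# no true inverse; the huge values printed below are floating-point artifacts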
np.linalg.inv(arr1)

array([[-6.77233027e+13,  6.09509724e+14, -1.01584954e+15,
         4.74063119e+14],
       [-3.26200575e+15,  3.83765382e+15,  2.11070960e+15,
        -2.68635767e+15],
       [ 6.72718140e+15, -9.50383681e+15, -1.17387058e+15,
         3.95052599e+15],
       [-3.39745235e+15,  5.05667327e+15,  7.90105198e+13,
        -1.73823144e+15]])
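Entries of the order of 1e15 are a warning sign: `arr1` is singular, so the result above is not a usable inverse. A rough sketch of how to detect this and fall back to the pseudoinverse is shown below; the `rcond=1e-10` cutoff is an arbitrary choice for this example.

```python
import numpy as np

arr1 = np.arange(5, 21).reshape(4, 4)

# A very large condition number means the matrix is singular or nearly so
condition_number = np.linalg.cond(arr1)

# The Moore-Penrose pseudoinverse exists even when the inverse does not
pseudo_inverse = np.linalg.pinv(arr1, rcond=1e-10)

# Defining property of the pseudoinverse: A @ A+ @ A equals A
property_holds = np.allclose(arr1 @ pseudo_inverse @ arr1, arr1)

print(condition_number)
print(property_holds)
```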

## 6. Application of inverse in Data Science

The inverse is used to calculate the parameter vector by the normal equation in linear regression.

The data set used here (`baseball.csv`, loaded below) describes different variables of different baseball teams, originally to predict whether a team makes it to the playoffs or not. For now, to make it a regression problem, suppose we are interested in predicting OOBP from the rest of the variables. So, ‘OOBP’ is our target variable. To solve this problem using linear regression, we have to find the parameter vector. If you are familiar with the normal equation method, you will know that this requires matrices. Let's proceed and denote our independent variables below as matrix ‘X’.

To find the final parameter vector (θ), assuming our initial function is parameterised by θ and X, all you have to do is find the inverse of $(X^T X)$, which can be accomplished very easily using the code shown below.

First of all, let me make the Linear Regression formulation easier for you to comprehend.

$f_θ(X)= θ^TX$, where θ is the parameter vector we wish to calculate and X is the column vector of features or independent variables.
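For reference, here is a short derivation of the normal equation. In this block (an assumption of the sketch) $X$ denotes the matrix whose rows are the individual feature vectors and $y$ the vector of target values; minimising the squared error and setting its gradient to zero gives:

$$
J(\theta) = \lVert X\theta - y \rVert^2, \qquad
\nabla_\theta J = 2X^T(X\theta - y) = 0
\;\Longrightarrow\; X^T X\,\theta = X^T y
\;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y
$$

The last step requires $X^T X$ to be invertible, which is why the inverse appears in this method.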

In [11]:
import pandas as pd
import numpy as np

In [15]:
df = pd.read_csv("baseball.csv")
Df1 = df.head(14)
Df1

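| | Team | League | Year | RS | RA | W | OBP | SLG | BA | Playoffs | RankSeason | RankPlayoffs | G | OOBP | OSLG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARI | NL | 2012 | 734 | 688 | 81 | 0.328 | 0.418 | 0.259 | 0 | | | 162 | 0.317 | 0.415 |
| 1 | ATL | NL | 2012 | 700 | 600 | 94 | 0.32 | 0.389 | 0.247 | 1 | 4.0 | 5.0 | 162 | 0.306 | 0.378 |
| 2 | BAL | AL | 2012 | 712 | 705 | 93 | 0.311 | 0.417 | 0.247 | 1 | 5.0 | 4.0 | 162 | 0.315 | 0.403 |
| 3 | BOS | AL | 2012 | 734 | 806 | 69 | 0.315 | 0.415 | 0.26 | 0 | | | 162 | 0.331 | 0.428 |
| 4 | CHC | NL | 2012 | 613 | 759 | 61 | 0.302 | 0.378 | 0.24 | 0 | | | 162 | 0.335 | 0.424 |
| 5 | CHW | AL | 2012 | 748 | 676 | 85 | 0.318 | 0.422 | 0.255 | 0 | | | 162 | 0.319 | 0.405 |
| 6 | CIN | NL | 2012 | 669 | 588 | 97 | 0.315 | 0.411 | 0.251 | 1 | 2.0 | 4.0 | 162 | 0.305 | 0.39 |
| 7 | CLE | AL | 2012 | 667 | 845 | 68 | 0.324 | 0.381 | 0.251 | 0 | | | 162 | 0.336 | 0.43 |
| 8 | COL | NL | 2012 | 758 | 890 | 64 | 0.33 | 0.436 | 0.274 | 0 | | | 162 | 0.357 | 0.47 |
| 9 | DET | AL | 2012 | 726 | 670 | 88 | 0.335 | 0.422 | 0.268 | 1 | 6.0 | 2.0 | 162 | 0.314 | 0.402 |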


In [17]:
# We are just taking 6 features to calculate θ.
X = Df1[['RS', 'RA', 'W', 'OBP','SLG','BA']]
X

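| | RS | RA | W | OBP | SLG | BA |
|---|---|---|---|---|---|---|
| 0 | 734 | 688 | 81 | 0.328 | 0.418 | 0.259 |
| 1 | 700 | 600 | 94 | 0.32 | 0.389 | 0.247 |
| 2 | 712 | 705 | 93 | 0.311 | 0.417 | 0.247 |
| 3 | 734 | 806 | 69 | 0.315 | 0.415 | 0.26 |
| 4 | 613 | 759 | 61 | 0.302 | 0.378 | 0.24 |
| 5 | 748 | 676 | 85 | 0.318 | 0.422 | 0.255 |
| 6 | 669 | 588 | 97 | 0.315 | 0.411 | 0.251 |
| 7 | 667 | 845 | 68 | 0.324 | 0.381 | 0.251 |
| 8 | 758 | 890 | 64 | 0.33 | 0.436 | 0.274 |
| 9 | 726 | 670 | 88 | 0.335 | 0.422 | 0.268 |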


In [18]:
Y=Df1['OOBP']
Y

0     0.317
1     0.306
2     0.315
3     0.331
4     0.335
5     0.319
6     0.305
7     0.336
8     0.357
9     0.314
10    0.337
11    0.339
12    0.310
13    0.310
Name: OOBP, dtype: float64

In [19]:
#Converting X to matrix
X = np.asmatrix(X)

In [20]:
X

matrix([[7.34e+02, 6.88e+02, 8.10e+01, 3.28e-01, 4.18e-01, 2.59e-01],
        [7.00e+02, 6.00e+02, 9.40e+01, 3.20e-01, 3.89e-01, 2.47e-01],
        [7.12e+02, 7.05e+02, 9.30e+01, 3.11e-01, 4.17e-01, 2.47e-01],
        [7.34e+02, 8.06e+02, 6.90e+01, 3.15e-01, 4.15e-01, 2.60e-01],
        [6.13e+02, 7.59e+02, 6.10e+01, 3.02e-01, 3.78e-01, 2.40e-01],
        [7.48e+02, 6.76e+02, 8.50e+01, 3.18e-01, 4.22e-01, 2.55e-01],
        [6.69e+02, 5.88e+02, 9.70e+01, 3.15e-01, 4.11e-01, 2.51e-01],
        [6.67e+02, 8.45e+02, 6.80e+01, 3.24e-01, 3.81e-01, 2.51e-01],
        [7.58e+02, 8.90e+02, 6.40e+01, 3.30e-01, 4.36e-01, 2.74e-01],
        [7.26e+02, 6.70e+02, 8.80e+01, 3.35e-01, 4.22e-01, 2.68e-01],
        [5.83e+02, 7.94e+02, 5.50e+01, 3.02e-01, 3.71e-01, 2.36e-01],
        [6.76e+02, 7.46e+02, 7.20e+01, 3.17e-01, 4.00e-01, 2.65e-01],
        [7.67e+02, 6.99e+02, 8.90e+01, 3.32e-01, 4.33e-01, 2.74e-01],
        [6.37e+02, 5.97e+02, 8.60e+01, 3.17e-01, 3.74e-01, 2.52e-01]])

In [21]:
#taking transpose of X and assigning it to x
x= np.transpose(X)

In [22]:
#finding multiplication
T= x.dot(X)

#inverse of T - provided it is invertible otherwise we use pseudoinverse
inv=np.linalg.inv(T)

#calculating θ
theta=(inv.dot(X.T)).dot(Y)

theta

matrix([[-2.59951184e-04,  1.53102522e-04, -6.98603725e-05,
          6.57605847e-01,  4.00767706e-01,  1.07696279e-01]])
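Forming the inverse explicitly works here, but solving the linear system directly or using a least-squares routine is usually more robust. The sketch below compares the three approaches on synthetic data, since `baseball.csv` may not be at hand; `true_theta` and the random features are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data: 50 samples, 3 features
X = rng.normal(size=(50, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta

# 1. Normal equation with an explicit inverse (as in the cell above)
theta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# 2. Same equation, solved as a linear system without forming the inverse
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# 3. Least-squares solver, which also copes with a singular X^T X
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_inverse, theta_solve, theta_lstsq)
```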

Imagine if you had to solve this set of equations without using linear algebra. Let me remind you that this data set is less than even 1% of the original data set. Finding the parameter vector without linear algebra would take a lot of time and effort, and could sometimes even be impossible.

One major drawback of the normal equation method is that it is computationally very costly when the number of features is large. The reason is that if there are ‘n’ features, the matrix $X^TX$ is of order $n*n$ and inverting it costs time of order $O(n*n*n)$. Generally, the normal equation method is applied when the number of features is at most of the order of 1,000 or 10,000. Data sets with a larger number of features are handled with the help of another method called Gradient Descent.
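For comparison, here is a minimal sketch of batch gradient descent for linear regression. The learning rate, the number of steps and the synthetic data are arbitrary choices for this illustration, and the function name `gradient_descent` is hypothetical.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, steps=500):
    """Minimise the mean squared error of X @ theta - y by gradient descent."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(steps):
        gradient = X.T @ (X @ theta - y) / m   # gradient of half the mean squared error
        theta -= learning_rate * gradient
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta

theta = gradient_descent(X, y)
print(theta)
```

Each step only needs matrix-vector products, so no $n*n$ matrix has to be inverted.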

## 7. Eigenvalues and Eigenvectors

Eigenvectors find a lot of applications in different domains like computer vision, physics and machine learning. 

The concept of eigenvectors is the backbone of the Principal Component Analysis (PCA) algorithm.
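To hint at how PCA uses eigenvectors, the sketch below finds the main direction of a small synthetic 2-D data set from the eigenvectors of its covariance matrix. The data and variable names are made up for this illustration, and library implementations of PCA typically differ in the details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D points scattered closely around the line y = x
t = rng.normal(size=200)
data = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

# Centre the data and compute its covariance matrix
centered = data - data.mean(axis=0)
covariance = np.cov(centered, rowvar=False)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# The eigenvector with the largest eigenvalue is the first principal component
principal_direction = eigenvectors[:, -1]

# Projecting onto it reduces each 2-D point to a single number
projected = centered @ principal_direction

print(eigenvalues)
print(principal_direction)
```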

![](eigen1.PNG)

![](eigen2.PNG)

[Calculating eigen values and eigen vectors](https://www.youtube.com/watch?v=BbvCa87U15M)

Code for finding eigenvalues and eigenvectors in Python

In [23]:
#create an array
arr = np.arange(1,10).reshape(3,3)
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [24]:
#finding the Eigenvalue and Eigenvectors of arr
np.linalg.eig(arr)

(array([ 1.61168440e+01, -1.11684397e+00, -9.75918483e-16]),
 array([[-0.23197069, -0.78583024,  0.40824829],
        [-0.52532209, -0.08675134, -0.81649658],
        [-0.8186735 ,  0.61232756,  0.40824829]]))
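The first array holds the eigenvalues and the second holds the eigenvectors as columns, in matching order. A quick sketch to confirm the defining relation $Av = \lambda v$ for each pair:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
values, vectors = np.linalg.eig(arr)

# Column i of `vectors` is the eigenvector belonging to values[i]
checks = [np.allclose(arr @ vectors[:, i], values[i] * vectors[:, i])
          for i in range(len(values))]

print(checks)
```

The third eigenvalue is zero up to rounding error, which agrees with `arr` being a singular matrix.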

##  Eigen values and Eigen vectors



#### Applications of eigenvalues and eigenvectors in real life

Eigenvalues and eigenvectors prove enormously useful for understanding linear maps. Let's take an example: suppose you want to change the perspective of a painting. If you scale the x direction by a different factor than the y direction (say x -> 3x while y -> 2y), you simulate a change of perspective. This is roughly what happens if you look at a scene from close to the ground as opposed to from higher in the air: objects appear distorted and foreshortened. A change in perspective of this kind is really just a linear map applied to a set of points (the painting): every x distance is multiplied by one value and every y distance by a different value. You can capture this process in a matrix. The directions that the matrix only stretches without rotating (here, the x and y axes) are its eigenvectors, and the stretch factors (3 and 2) are the corresponding eigenvalues.
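The scaling example (x -> 3x, y -> 2y) can be written as a 2x2 matrix and inspected with NumPy. This is a small sketch; the sample point is arbitrary.

```python
import numpy as np

# Scaling map: x -> 3x, y -> 2y
S = np.array([[3.0, 0.0],
              [0.0, 2.0]])

values, vectors = np.linalg.eig(S)

# An arbitrary point of the "painting" and its image under the map
point = np.array([1.0, 1.0])
mapped = S @ point

print(values)    # the two stretch factors
print(vectors)   # columns are the directions that are only stretched
print(mapped)
```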

If the mapping isn't linear, we're out of the realm of the eigenvector and into the realm of the tensor. So eigenvectors do well with linear mappings, but not with nonlinear mappings. In the case of nonlinear mappings, the fixed points in the eigenvector matrix would be replaced with functions that can take on many different values.  

Eigenvectors pop up in the study of the spread of infectious diseases, vibrations and heat transfer because these processes can generally be modelled by linear functions. Diseases tend to spread slowly, heat spreads gradually, and vibrations propagate gradually. Diseases, vibrations and heat patterns also tend to spread by contact, so you typically don't get oddball discontinuous patterns of disease spread, vibration propagation or heat transfer. This means that heat patterns, vibration propagation and disease spreading can be simulated reasonably well by linear maps from one set of points to another set of points.


Nonlinear discontinuous systems like explosions or the orbits of a 3-body system in a gravitational field or a star undergoing gravitational collapse don't lend themselves to simple linear maps. As a result, eigenvectors do not offer a good way of describing those systems. For those kinds of nonlinear systems, you typically need tensors instead of linear maps.

Very important:

- [10 Powerful Applications of Linear Algebra in Data Science (with Multiple Resources)](https://www.analyticsvidhya.com/blog/2019/07/10-applications-linear-algebra-data-science/)