# Linear Algebra

If you are already familiar with the following concepts, you can move on to the next notebook:
>Reference http://www.deeplearningbook.org/contents/linear_algebra.html

    1. Scalars, Vectors, Matrices and Tensors
    2. Multiplication of matrices and vectors
    3. Identity and Inverse Matrices
    4. Linear Dependence and span
    5. Norms
    6. Special Matrices and Vectors
    7. Eigen Decomposition
    8. Singular Value Decomposition
    9. The Moore-Penrose Pseudoinverse
    10. Trace Operator
    11. Example: Principal Component Analysis
    
   -------------------
   

In [None]:
#library imports
import numpy as np

### 1.1 Scalars

Scalars are single numbers:
>Examples
```
    A = 10
    B = -10
    C = 0.5
```

In [None]:
#scalar example in python
## Your Code here
scalar_a  = None # change it from None to scalar value 10.5
## End
print("Printing a Scalar Value:", scalar_a)

>Expected Output: Printing a Scalar Value: 10.5

### 1.2 Vectors

Vectors are arrays of numbers (scalars).

We will use **numpy** (a Python library for fast numerical computation on arrays) to operate on n-dimensional arrays.

>Examples
```
A = [0,10,-2.5,...,21]   A ROW VECTOR
B = [[0]               
     [10]
     [-2.5]              A COLUMN VECTOR
     ...
     ...
     [21]]
```

>Note: the reference for any numpy function can be found at https://docs.scipy.org or by searching the web.


In [None]:
#vector example in python
## Your Code here
vector_a  = None # make a new row vector using np.array([0,10,-2.5,21])
vector_b  = None # make a new column vector using np.array([[0],[10],[-2.5],[21]])
## End
print("Printing a Row Vector Value:\n", vector_a)
print("Printing a Column Vector Value:\n", vector_b)

>Expected Output:
```
    Printing a Row Vector Value:
     [  0.   10.   -2.5  21. ]
    Printing a Column Vector Value:
     [[  0. ]
     [ 10. ]
     [ -2.5]
     [ 21. ]]
```

### 1.3 Matrices

 A matrix is a 2-D array of numbers.
 Dimensions are denoted by N_rows x N_cols
 
 where,

     N_rows is number of rows in a matrix
     N_cols is number of columns in a matrix

 >Examples 
 ```
 A = [[  0.   10.   -2.5  21. ]
     [  0.   10.   -2.5  21. ]
     [  0.   10.   -2.5  21. ]]
 ```

In [None]:
# matrix example in python
## Your Code here
matrix_a = None #use np.array([[0,10,-9.5,21],[-9,10,-2.5,61],[5,10,-2.5,21]]) try different dimension and values
## End
print("Printing a 3 x 4 Matrix:\n",matrix_a)

>Expected Output:
```
Printing a 3 x 4 Matrix:
 [[  0.   10.   -9.5  21. ]
 [ -9.   10.   -2.5  61. ]
 [  5.   10.   -2.5  21. ]]
```

### 1.4 Tensors

Tensors are arrays with more than two dimensions.

As an example, you can imagine a 3-D object as a tensor made of 2-D matrices stacked one behind another.

In [None]:
# Tensor example in python ------ run this cell
np.random.seed(1) # fixes the seed so the same random values are generated on every run
tensor_a = np.random.randn(3,2,4) # creates an n-dimensional array of values drawn from the standard normal distribution
# It can also be created using np.array. See the reference provided.
print("Printing a tensor:\n",tensor_a)

By now you would have guessed that scalars are a special case of vectors, vectors are a special case of matrices, and matrices are a special case of tensors.

Let's have a look at some matrix operations.

#### Transpose of a matrix

The transpose of a matrix is its mirror image across the main diagonal, which runs from the top-left element: rows become columns and columns become rows.

Example:

    A = [0 1                      transpose(A) = [0 1 2
         1 1                                      1 1 3]
         2 3]
    


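#### Python Broadcasting
Numpy provides many advanced operations on matrices.

Broadcasting is a convenient way to apply an operation between arrays of different shapes, for example adding a scalar to every element of a matrix.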

Example

```
    A is a matrix =             [0 1 2
                                 2 1 4]
                     
    A_broadcast   = A + 3    =  [3 4 5
                                 5 4 7]
                                 
```

Many such operations are available in numpy; see the reference provided.
    
    

In [None]:
#Transpose Example run this cell
matrix_b = np.array([[0,1],[1,1],[2,3]])
matrix_b_transpose = matrix_b.T # .T is a numpy array attribute that returns the transpose of the array
matrix_b_broadcasted = matrix_b - 5
print("Matrix")
print(matrix_b)
print("\nMatrix Transpose")
print(matrix_b_transpose)
print("\nMatrix Broadcasted")
print(matrix_b_broadcasted)

<br></br><br></br>

### 2. Multiplication of matrices and vectors

To multiply two matrices (the matrix product, computed in numpy with `np.dot`), the number of columns of the first matrix must equal the number of rows of the second matrix.

Example:

    A = matrix of dimension 4x5
    B = matrix of dimension 5x3
    C = A.B returns C as a matrix of dimension 4x3
        where,
        C[i,j] = sum(A[i,k]*B[k,j]) over k
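
The summation rule can be spelled out with explicit loops. The sketch below is only illustrative: the helper name `matmul_loops` and the matrices `P` and `Q` are made up for this example, and in practice `np.dot` does the same job much faster.

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply two 2-D arrays using C[i,j] = sum over k of A[i,k] * B[k,j]."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    # the number of columns of A must match the number of rows of B
    assert n_inner == n_inner_b, "inner dimensions must match"
    C = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                C[i, j] += A[i, k] * B[k, j]
    return C

P = np.arange(20).reshape(4, 5)       # 4x5 matrix
Q = np.arange(15).reshape(5, 3)       # 5x3 matrix
R = matmul_loops(P, Q)
print(R.shape)                        # (4, 3)
print(np.allclose(R, np.dot(P, Q)))   # True
```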


In [None]:
# multiplication example
A = np.array([[1,2,1],[1,0,1]])# dimension 2x3
print("Dimensions of A",A.shape)# ndarray.shape return dimensions of a matrix
B = np.array([[1,2],[3,4],[5,6]])# dimension 3x2
print("Dimensions of B",B.shape)
## Your Code here
C = None # use np.dot(A,B) which returns a dot product of A and B of 2x2 dimension
## End
print("Matrix C is:\n",C)


>Expected Output:
```
Dimensions of A (2, 3)
Dimensions of B (3, 2)
Matrix C is:
 [[12 16]
 [ 6  8]]
```    


The matrix product has some useful properties. Let's have a look.

#### Matrix Product Properties

**A.(B+C) = A.B + A.C** >Distributive 

**A.(B.C) = (A.B).C** >Associative 

**A.B != B.A** (in general) >Not Commutative 

**(A.B)<sup>T</sup> = B<sup>T</sup>.A<sup>T</sup>**

> You can verify the properties given above in the cell below

In [None]:
# Try verifying above properties here:
## Your Code here
A = None
B = None
C = None

## End

<br></br><br></br>
### 3. Identity and Inverse Matrices

**Identity Matrix (I)** > A square matrix (n x n) with ones on the diagonal and zeros everywhere else.

**A.I = A** > Where I is the identity matrix

**A<sup>-1</sup>.A = I** > Inverse property of a matrix

> **NOTE** Square matrices have No.-rows = No.-cols ==> NxN

In [None]:
# Run this cell to check above properties
np.random.seed(2)
matA = np.random.randn(3,3)
matA_inv = np.linalg.inv(matA) # computes the inverse of a square matrix
matI = np.dot(matA_inv,matA)# note: extremely small numbers may appear in place of zeros due to floating-point error
matB = np.dot(matA,matI)
print("Printing A:\n",matA)
print("\nPrinting A_inv:\n",matA_inv)
print("\nVerifying property 2 Inverse Multiplication:\n",matI)
print("\nVerifying property 1 Identity Multiplication:\n",matB)

<br></br><br></br>
### 4. Linear Dependence and span

A linear system of equations, with n equations in n unknown variables, can be written as:

**A.X = B** >>Linear combination of the columns of A, weighted by X

where,
X is a vector of unknown variables and A is a square matrix. We can multiply both sides by the inverse of A to get X.
    
**X = A<sup>-1</sup>.B**

> NOTE: A square matrix does not always have an inverse. A matrix without one is called singular, which happens when its columns are linearly dependent. Please read Section 2.4 of the reference at http://www.deeplearningbook.org/contents/linear_algebra.html

The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.
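
Linear dependence can be checked numerically through the matrix rank: if the rank is smaller than the number of columns, some column lies in the span of the others and the matrix has no inverse. The sketch below uses two made-up matrices, `M_indep` and `M_dep`, purely for illustration.

```python
import numpy as np

# Columns are linearly independent: rank equals the number of columns
M_indep = np.array([[2.0, 1.0],
                    [5.0, 9.0]])
# Second column is 2 times the first column: linearly dependent, so singular
M_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0]])

rank_indep = np.linalg.matrix_rank(M_indep)
rank_dep = np.linalg.matrix_rank(M_dep)
print(rank_indep)  # 2
print(rank_dep)    # 1

# Trying to invert the singular matrix raises LinAlgError
try:
    np.linalg.inv(M_dep)
    inverted = True
except np.linalg.LinAlgError:
    inverted = False
print(inverted)    # False
```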

In [None]:
#Linear Combination problem
#suppose you have two equations  2x + y = 1 and 5x + 9y = -4 how will you solve using matrices
## Your Code here
A =None #Hint: try to convert above two equation in form of matrix multiplication A.X = B
B = None
A_inv = np.linalg.inv(A)
X = None
#End
print("The values of variables are:\nX = ",X[0],",Y = ",X[1])

>Expected Output
```
The values of variables are:
X =  [ 1.] ,Y =  [-1.]
```

<br></br><br></br>
### 5. Norms
Norms measure the size or length of a vector. In other words, a norm is a function that maps a vector to a non-negative value.

The most commonly used norm definition is as follows:

**||X||<sub>p</sub> = [sum<sub>i</sub>(|X<sub>i</sub>|<sup>p</sup>)]<sup>1/p</sup>**
>If p = 2 it is called the Euclidean norm. If this norm is one for a vector then the vector is called a unit vector

The norm function can also be described as a function that satisfies these conditions:
    1. f(X) = 0 --> X=0
    2. f(X+Y) <= f(X) + f(Y)
    3. f(aX) = |a|f(X) for all values of a
    
Examples:
 ```
    max_norm(x) = max(|x|)
    l1_norm(x)  = sum(|x|)
    ```

To measure the size of a matrix: **Frobenius norm**

**||A||<sub>F</sub> = sqrt(sum<sub>i,j</sub>(a<sub>i,j</sub><sup>2</sup>))**
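
These norms map directly onto numpy. The sketch below computes each one by hand on made-up values and compares it with `np.linalg.norm`, whose `ord` argument selects the norm. The Frobenius norm is the square root of the sum of the squared entries.

```python
import numpy as np

x = np.array([3.0, -4.0])

l1_norm = np.sum(np.abs(x))            # |3| + |-4| = 7
l2_norm = np.sqrt(np.sum(x ** 2))      # sqrt(9 + 16) = 5
max_norm = np.max(np.abs(x))           # max(3, 4) = 4

# np.linalg.norm gives the same values through its `ord` argument
assert np.isclose(l1_norm, np.linalg.norm(x, ord=1))
assert np.isclose(l2_norm, np.linalg.norm(x, ord=2))
assert np.isclose(max_norm, np.linalg.norm(x, ord=np.inf))

# Frobenius norm of a matrix
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro_norm = np.sqrt(np.sum(M ** 2))     # sqrt(1 + 4 + 9 + 16) = sqrt(30)
assert np.isclose(fro_norm, np.linalg.norm(M, ord='fro'))
print(l1_norm, l2_norm, max_norm, fro_norm)
```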

----------------------------------------

**Determinant of a Matrix**
> To map a square matrix to a scalar value we use the determinant of the matrix. See https://en.wikipedia.org/wiki/Determinant/ for an easy-to-follow explanation of how to calculate it.
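
As a small illustration, the determinant of a 2x2 matrix can be computed by hand and compared with `np.linalg.det`. The matrix `M` below is just an example value.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [5.0, 9.0]])
# For a 2x2 matrix [[a, b], [c, d]] the determinant is a*d - b*c
det_by_hand = M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]   # 2*9 - 1*5 = 13
det_numpy = np.linalg.det(M)
print(det_by_hand, det_numpy)
assert np.isclose(det_by_hand, det_numpy)
```

A determinant of zero means the matrix is singular and cannot be inverted.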

In [None]:
# Norm try and check with numpy l2_norm --> ||x|| , p = 2
V = np.array([1,6,-1,2])
np_norm = np.linalg.norm(V)
## Calculate l2_norm = np.sqrt(np.sum(np.square(V)))
## Your Code here
my_norm = None
#End
print("numpy norm",np_norm)
print("your norm",my_norm)

### 6. Special Matrices and Vectors

**Diagonal Matrix**

A matrix in which all elements other than the diagonal elements are zero.
Example:

    A = [2 0 0
         0 5 0
         0 0 3]
         
**Symmetric Matrix**

A matrix that is equal to its own transpose.
Example:

    A = [2  0 -1
         0  3  2
        -1  2  3]
        
**Orthogonal Matrix**

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal, which implies that its inverse is its transpose:

A<sup>-1</sup> = A<sup>T</sup>

A.A<sup>T</sup>= I
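
These special matrices are easy to check numerically. The sketch below uses a 2-D rotation matrix as an example of an orthogonal matrix; the angle and the names `Q`, `S` and `D` are arbitrary choices for this illustration.

```python
import numpy as np

theta = np.pi / 6  # 30 degrees, an arbitrary angle chosen for the example
# A 2-D rotation matrix is a standard example of an orthogonal matrix
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

is_orthogonal = np.allclose(np.dot(Q, Q.T), np.eye(2))
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)
print(is_orthogonal, inverse_is_transpose)  # True True

# A symmetric matrix equals its own transpose
S = np.array([[2, 0, -1],
              [0, 3,  2],
              [-1, 2, 3]])
is_symmetric = np.array_equal(S, S.T)
print(is_symmetric)  # True

# np.diag builds a diagonal matrix from its diagonal entries
D = np.diag([2, 5, 3])
print(D)
```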

### 7. Eigen Decomposition

When working with larger datasets, matrix decomposition can be very useful for extracting meaningful information.

One of the most widely used kinds of matrix decomposition is called eigen-decomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues.

>Applicable only for Square Matrices

**Eigenvector, Eigenvalue ?**
An eigenvector of a square matrix **A** is a non-zero vector **v** that, when multiplied by A, is only scaled in magnitude and stays on the same line

    A.v = k*v, 

where **k** is a constant known as eigenvalue corresponding to right eigenvector **v**

one can also find a left Eigenvector as follows

    u.A = j*u, 

where **j** is a constant known as eigenvalue corresponding to left eigenvector **u**

>NOTE: If we move everything in the first equation to one side we get (A-k.I).v = 0. For a non-zero v this requires **determinant(A-k.I) = 0**
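
For a 2x2 matrix this determinant condition is a quadratic in k, so it can be solved directly. The sketch below uses a made-up symmetric matrix `M` and the fact that, for a 2x2 matrix, determinant(M-k.I) = k^2 - trace(M)*k + determinant(M). The roots are then compared with numpy's eigenvalue routine.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of k^2 - trace(M)*k + det(M), here k^2 - 4k + 3
coeffs = [1.0, -np.trace(M), np.linalg.det(M)]
roots = np.sort(np.roots(coeffs))
eigvals = np.sort(np.linalg.eigvals(M))
print(roots)    # approximately [1. 3.]
print(eigvals)  # approximately [1. 3.]
assert np.allclose(roots, eigvals)

# Each eigenvalue makes M - k*I singular, so its determinant is numerically zero
for k in eigvals:
    assert abs(np.linalg.det(M - k * np.eye(2))) < 1e-9
```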

<br></br>

**Decomposition ?**

Say we have a set of linearly independent eigenvectors {v1,v2,...,vn} and respective eigenvalues {k1,k2,...,kn} of matrix A.
Then we can stack them in matrix form so that:

**A = V.diag(K).V<sup>-1</sup>**, where V is the matrix whose columns are the eigenvectors and diag(K) is a diagonal matrix containing the eigenvalues.



> Example:
```
A = [0  1        
    -2 -3]
by solving determinant(A-k.I) = 0 ==> k = -1 or -2 which are eigen values of A
by putting back the eigen values into first equation we can get a set of two eigen vectors of unit magnitude
v1 = [0.7071
      -0.7071]
v2 = [-0.4472
      0.8944]     
```
For a quick tutorial look here: http://lpsa.swarthmore.edu/MtrxVibe/EigMat/MatrixEigen.html

Not every matrix can be decomposed into eigenvalues and eigenvectors. In some cases, the decomposition exists but involves complex rather than real numbers.

Specifically, every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues.
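
The decomposition can be verified numerically. The sketch below reuses the example matrix from above under the name `A_example`, checks A.v = k*v for each pair, and rebuilds the matrix from V.diag(K).V<sup>-1</sup>. The symmetric matrix `S` is a made-up value, and `np.linalg.eigh` is numpy's routine for symmetric matrices.

```python
import numpy as np

A_example = np.array([[0.0, 1.0],
                      [-2.0, -3.0]])
# Columns of eig_vectors are the (right) eigenvectors
eig_values, eig_vectors = np.linalg.eig(A_example)

# Check A.v = k*v for every eigenvalue/eigenvector pair
for k, v in zip(eig_values, eig_vectors.T):
    assert np.allclose(np.dot(A_example, v), k * v)

# Rebuild A from V.diag(K).V^-1
A_rebuilt = np.dot(np.dot(eig_vectors, np.diag(eig_values)),
                   np.linalg.inv(eig_vectors))
print(np.allclose(A_rebuilt, A_example))  # True

# For a real symmetric matrix, eigh returns real eigenvalues
# and orthonormal eigenvectors
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
s_values, s_vectors = np.linalg.eigh(S)
print(np.allclose(np.dot(s_vectors, s_vectors.T), np.eye(2)))  # True
```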

In [None]:
#Right Eigen example run this cell
A = np.array([[0,1],[-2,-3]])
w, v = np.linalg.eig(A) # used to compute eigen values and vectors for a matrix
print("matrix A:\n", A)
print("Eigen Values:\n",w)
print("Eigen Vectors:\n",v)

<br>
### 8. Singular Value Decomposition

The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values.
>Applicable to all real matrices

According to SVD every real matrix can be decomposed as follows:

**A = U.D.V<sup>T</sup>**

Where, 
```
A is a m x n matrix, 
U is a m x m matrix, 
D is a m x n matrix and 
V is a n x n matrix.

U and V are both Orthogonal Matrices
D is a diagonal Matrix

The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors.
```
By now you may have guessed that the two decompositions are related: for a real symmetric positive semi-definite matrix, the eigen decomposition and the singular value decomposition coincide.
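
The shapes listed above can be checked by rebuilding a matrix from its factors. In the sketch below, `M` is an example matrix. Note that `np.linalg.svd` returns the singular values as a 1-D array and returns V already transposed, so the m x n matrix D has to be assembled by hand.

```python
import numpy as np

M = np.array([[2.0, 4.0],
              [1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])            # 4 x 2 matrix
# s is a 1-D array of singular values, Vt is V transposed
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Place the singular values on the diagonal of an m x n matrix
D = np.zeros(M.shape)
D[:len(s), :len(s)] = np.diag(s)

M_rebuilt = np.dot(np.dot(U, D), Vt)
print(U.shape, D.shape, Vt.shape)               # (4, 4) (4, 2) (2, 2)
print(np.allclose(M_rebuilt, M))                # True
print(np.allclose(np.dot(U.T, U), np.eye(4)))   # True, U is orthogonal
```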

>**Calculation**:
>>To gain more perspective on the calculation of these matrices look here: http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

The visualization of an SVD, **A = UΣV<sup>T</sup>**, can be presented as a transformation of data in the following manner:
<img src = "linalg/svd.png">

In [None]:
#Python SVD example
A = np.array([[2,4],[1,0],[0,0],[0,0]])
# np.linalg.svd returns U, the singular values as a 1-D array, and V transposed
u, d, v = np.linalg.svd(A, full_matrices=True, compute_uv=True)# see the numpy reference for complete details about this function
print("The matrix:\n",A)
print("\n\nThe U matrix:\n",u)
print("\nThe singular values (diagonal of D):\n",d)
print("\nThe V transpose matrix:\n",v)


### 9. The Moore-Penrose Pseudoinverse
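An inverse exists only for square matrices, but there can be cases where A is not a square matrix and we still want to solve:
>A.x = y | x = B.y

A matrix B that plays this role is called a pseudoinverse of A. The Moore-Penrose pseudoinverse is the standard choice; when no exact solution exists it gives the least-squares solution.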
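
Numpy computes the Moore-Penrose pseudoinverse with `np.linalg.pinv`. The sketch below applies it to a made-up non-square system (3 equations, 2 unknowns) and compares the result with numpy's least-squares solver.

```python
import numpy as np

# Tall system: 3 equations, 2 unknowns (values made up for the example)
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

M_pinv = np.linalg.pinv(M)          # pseudoinverse of a 3x2 matrix is 2x3
x = np.dot(M_pinv, y)
print(M_pinv.shape)                 # (2, 3)
print(x)                            # approximately [1. 2.]

# pinv(M).M is the identity here because the columns of M are independent
assert np.allclose(np.dot(M_pinv, M), np.eye(2))
# The result matches the least-squares solution
x_lstsq = np.linalg.lstsq(M, y, rcond=None)[0]
assert np.allclose(x, x_lstsq)
```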

### 10. Trace Operator
The sum of all diagonal elements of a square matrix is defined as the trace of that matrix.

>Example 
```
A = [0 4 5
     5 6 11
     6 5 3]
trace(A) = 0 + 6 + 3 = 9
```
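
In numpy the trace is available as `np.trace`. The sketch below reuses the example matrix under the name `M` and also checks two standard identities: the trace equals the sum of the eigenvalues, and the Frobenius norm equals sqrt(trace(M.M<sup>T</sup>)).

```python
import numpy as np

M = np.array([[0, 4, 5],
              [5, 6, 11],
              [6, 5, 3]])
trace_value = np.trace(M)
print(trace_value)  # 9

# The trace equals the sum of the eigenvalues (imaginary parts cancel out)
assert np.isclose(trace_value, np.sum(np.linalg.eigvals(M)).real)
# Frobenius norm written with the trace: ||M||_F = sqrt(trace(M.M^T))
assert np.isclose(np.linalg.norm(M, 'fro'), np.sqrt(np.trace(np.dot(M, M.T))))
```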

### 11. Example : Principal Component Analysis
Principal Component Analysis (PCA) is a machine learning algorithm which can be derived using only the linear algebra knowledge that we have gained so far.

Suppose we have a collection of m points {x(1), . . . , x(m)} in R<sup>n</sup> and we want to apply lossy compression to these points to reduce the memory needed to store them.

Lossy compression means storing the points in a way that requires less memory but may lose some precision. We want to lose as little precision as possible.
Since the data is n-dimensional, we can reduce the number of dimensions by eliminating the less meaningful ones.

In other words, PCA compresses a large amount of data down to the essence of the original data.

----------------------
**Reducing dimensions using the covariance matrix**

Before we jump to PCA let's see some basic statistical parameters:

For a dataset A = {x1,x2,...,xn} of collected data points we define:

**Mean(x<sub>m</sub>) = sum<sub>i</sub>(x<sub>i</sub>)/n** gives the average of the data

**Variance(v<sub>A</sub>) = sum<sub>i</sub>((x<sub>i</sub>-x<sub>m</sub>)<sup>2</sup>)/n** tells how spread out the data is.

**Standard Deviation(SD<sub>A</sub>) = sqrt(v<sub>A</sub>)**

Say there are two datasets, A as above and B = {y1,y2,...,yn}; then we can define the covariance:

**Covariance(C<sub>AB</sub>) = sum<sub>i</sub>((x<sub>i</sub>-x<sub>m</sub>)*(y<sub>i</sub>-y<sub>m</sub>))/n** tells how A and B vary with respect to each other. In other words, it measures the linear relationship between the two datasets. If C is close to zero the datasets are linearly uncorrelated; if its magnitude is large they vary strongly together (in the same direction when C is positive, in opposite directions when C is negative).
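
These quantities can be computed directly from the formulas. The sketch below uses two small made-up datasets and compares the hand-written results with numpy. Note that `np.cov` divides by n - 1 by default, so `bias=True` is passed to divide by n as in the formulas here.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # y = 2x, so strongly related

mean_x = np.sum(x) / len(x)                             # 3.0
var_x = np.sum((x - mean_x) ** 2) / len(x)              # 2.0
sd_x = np.sqrt(var_x)
mean_y = np.mean(y)                                     # 6.0
cov_xy = np.sum((x - mean_x) * (y - mean_y)) / len(x)   # 4.0

print(mean_x, var_x, sd_x, cov_xy)
assert np.isclose(var_x, np.var(x))
# bias=True makes np.cov divide by n instead of n - 1
assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])
```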


>Note: If you want to know more about these statistical parameters, search the web for the parameter name.

Now,

Suppose you have a 6-dimensional dataset **X** with the following dimensions: *d1,d2,d3,d4,d5,d6.*

We are going to define a covariance matrix between these dimensions:

C_mat = <br>[<br>C<sub>11</sub>,C<sub>12</sub>,........,C<sub>16</sub><br>
C<sub>21</sub>,C<sub>22</sub>,........,C<sub>26</sub><br>
. . . . . . . . . . . . . .<br>
. . . . . . . . . . . . . .<br>
C<sub>61</sub>,C<sub>62</sub>,........,C<sub>66</sub><br>
]

In this matrix, if C<sub>ij</sub> has a large magnitude for a pair of dimensions, those dimensions are correlated and carry redundant information that can be removed.

Now,
compute the eigenvectors and eigenvalues of the covariance matrix.

Arrange them in descending order of eigenvalue and remove the unwanted (smallest) eigenvalues and their eigenvectors.

Multiply the transpose of the resulting eigenvector matrix by the data matrix to get the transformed data.
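
The steps above can be sketched on a small made-up 2-D dataset that is reduced to 1 dimension. This is only an illustration under stated assumptions: the data, the seed and all variable names are invented, and `np.linalg.eigh` is used in place of `np.linalg.eig` because a covariance matrix is symmetric.

```python
import numpy as np

np.random.seed(0)
# Made-up 2-D data, one sample per column; the second row nearly follows the first
t = np.random.randn(50)
toy = np.vstack([t, 2 * t + 0.1 * np.random.randn(50)])   # shape (2, 50)

# Shift the data so that its mean is the origin
centered = toy - toy.mean(axis=1, keepdims=True)
cov = np.cov(centered)                       # 2 x 2 covariance matrix
vals, vecs = np.linalg.eigh(cov)             # eigh: cov is symmetric
order = np.argsort(vals)[::-1]               # largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]

top = vecs[:, :1]                            # keep one principal direction
projected = np.dot(top.T, centered)          # shape (1, 50)
print(projected.shape)
print(vals[0] / vals.sum())                  # share of variance kept, close to 1
```

Because the two rows are almost perfectly correlated, a single direction retains nearly all of the variance.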


-------------------

**Reducing dimensions using SVD:**

Start by calculating:

X ==> U.D.V<sup>T</sup>

Y = U<sup>T</sup>.X

Now since U and V are orthogonal matrices (U<sup>T</sup>.U = I and V<sup>T</sup>.V = I), Y = U<sup>T</sup>.X = D.V<sup>T</sup>, and you can see that:

Z = Y.Y<sup>T</sup> 
<br>Z = U<sup>T</sup>.X.(U<sup>T</sup>.X)<sup>T</sup>
<br>Z = (D.V<sup>T</sup>).(V.D<sup>T</sup>)
<br>Z = D.D<sup>T</sup>
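
This result can be checked numerically. The sketch below uses random made-up data `X_demo` and confirms that Z is diagonal with the squared singular values on its diagonal, and that these values are also the eigenvalues of X.X<sup>T</sup>.

```python
import numpy as np

np.random.seed(1)
X_demo = np.random.randn(3, 10)        # made-up data: 3 dimensions x 10 samples
# s holds the singular values in descending order
U, s, Vt = np.linalg.svd(X_demo, full_matrices=False)

Y = np.dot(U.T, X_demo)
Z = np.dot(Y, Y.T)

# Z should be diagonal with the squared singular values on its diagonal
print(np.allclose(Z, np.diag(s ** 2)))   # True

# X.X^T = U.(D.D^T).U^T, so the squared singular values are its eigenvalues
eig_xxT = np.sort(np.linalg.eigvalsh(np.dot(X_demo, X_demo.T)))[::-1]
print(np.allclose(eig_xxT, s ** 2))      # True
```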

If you have understood the SVD calculation, you can see that Z is proportional to the covariance matrix of the data expressed in the basis of the left singular vectors U (for mean-centred X). These singular vectors are called the principal components.

Now,
to reduce the dimensions, arrange the diagonal values of Z in descending order, along with the corresponding vectors in U.

Decide how many vectors to keep by checking what percentage each diagonal value of Z contributes to the trace of Z; this indicates how much of the data is represented by the corresponding vector in U. Remove the unwanted vectors.
Now that you have your new principal component vectors, calculate:

X_new = U<sub>k</sub><sup>T</sup>.X, where U<sub>k</sub> contains only the kept columns of U
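
The selection step can be sketched as follows. Everything here is an assumption made for illustration: the data is random, the 95% threshold is arbitrary, and the names `share`, `n_keep`, `U_keep` and `X_reduced` are invented. The data is projected onto the kept vectors, as in the `pca_svd` function in this notebook.

```python
import numpy as np

np.random.seed(2)
# Made-up data: 3 dimensions x 40 samples with very different spreads per dimension
X_demo = np.random.randn(3, 40) * np.array([[5.0], [1.0], [0.1]])
X_demo = X_demo - X_demo.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(X_demo, full_matrices=False)
z_diag = s ** 2                        # diagonal of Z = D.D^T, already descending
share = z_diag / np.sum(z_diag)        # contribution of each vector to trace(Z)
cumulative = np.cumsum(share)

# Keep the smallest number of vectors whose shares add up to at least 95%
# (the 95% threshold is an arbitrary choice for this example)
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
U_keep = U[:, :n_keep]
X_reduced = np.dot(U_keep.T, X_demo)   # shape (n_keep, 40)
print(share, n_keep, X_reduced.shape)
```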


We are going to create two classes of 3-dimensional data and apply dimensionality reduction to them:

In [None]:
# run this cell to generate data for compression
#Example Principal Component Analysis run this cell:
%pylab inline
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d

# Generating data
np.random.seed(107) # random seed for consistency
mu_vec1 = np.array([0,0,0]) # mean value for creation of data
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]]) # covariance value for creation of data
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20).T # data for class one
assert class1_sample.shape == (3,20), "class_1 dimension error"

# similarly for the second class
mu_vec2 = np.array([2,2,2])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20).T
assert class2_sample.shape == (3,20), "class_2 dimension error"


## plotting the generated data
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(111, projection='3d')
plt.rcParams['legend.fontsize'] = 10   
ax.plot(class1_sample[0,:], class1_sample[1,:], class1_sample[2,:], 'o', markersize=8, color='blue', alpha=0.7, label='class1')
ax.plot(class2_sample[0,:], class2_sample[1,:], class2_sample[2,:], 's', markersize=8, alpha=0.7, color='green', label='class2')
plt.title('Datapoints of class 1 and class 2')
ax.legend(loc='upper right')
plt.show()


#joining the datasets to get a new single data set
data = np.concatenate((class1_sample, class2_sample), axis=1)
assert data.shape == (3,40), "joining dimensional error"

### PCA_Covariance
>Try with the Covariance method

In [None]:
#Complete this function to apply pca using covariance matrix
def pca_cov(data):
    #Compute the mean vector
    mean_x = np.mean(data[0,:])
    mean_y = np.mean(data[1,:])
    mean_z = np.mean(data[2,:])
    mean_vector = np.array([[mean_x],[mean_y],[mean_z]])
    
    #shifting data to set mean as origin
    data = data-mean_vector

    #computing covariance matrix
    cov_mat = np.cov([data[0,:],data[1,:],data[2,:]])


    #computing eigen values and vectors
    ## Your Code here
    eig_val, eig_vec = None #Compute eigen vectors and values using -->  np.linalg.eig(cov_mat)
    #End
    #sorting according to Decreasing eigen values
    key = np.argsort(eig_val)[::-1]
    eig_val, eig_vec = eig_val[key], eig_vec[:, key]

    #reducing 3 dimensional data to 2 dimensions
    pca = eig_vec[:,:2]

    #calculating new transformed data
    ## Your Code here
    X_pca = None #calculate dot product of the new vector and data --> np.dot(pca.T,data)
    #End
    return X_pca

### PCA_SVD
>Try with the SVD method

In [None]:
#Complete this function to apply pca using svd
def pca_svd(data):
    #Compute the mean vector
    mean_x = np.mean(data[0,:])
    mean_y = np.mean(data[1,:])
    mean_z = np.mean(data[2,:])
    mean_vector = np.array([[mean_x],[mean_y],[mean_z]])
    
    #shifting data to set mean as origin
    data = data-mean_vector

    #computing svd
    ## Your Code here
    u, d, v = None # calculate the svd of data using np.linalg.svd(data, full_matrices=1, compute_uv=1)
    #End
    ut = u.T

    #computing y and z
    y = np.dot(u.T,data)
    z = np.dot(y,y.T)

    #sorting according to decreasing diagonal values of z (squared singular values)
    idx = np.diag(z)
    key = np.argsort(idx)[::-1]
    u_new = ut[key,:] # rows of ut are the singular vectors, so reorder the rows
    u_new = u_new[0:2,:]

    #calculating new transformed data
    X_pca = np.dot(u_new,data)
    return X_pca


### PCA scikit-learn library
>Try with the built-in scikit-learn PCA class

In [None]:
#Comparison with sklearn.decomposition library
from sklearn.decomposition import PCA as PCAskl

sklearn_pca = PCAskl(n_components=2)
sklearn_transf = sklearn_pca.fit_transform(data.T)




#plotting data
plt.plot(sklearn_transf[0:20,0],sklearn_transf[0:20,1], 'o', markersize=7, color='blue', alpha=0.5, label='class1')
plt.plot(sklearn_transf[20:40,0], sklearn_transf[20:40,1], 's', markersize=7, color='green', alpha=0.5, label='class2')
plt.xlabel('pca-1')
plt.ylabel('pca-2')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.legend()
plt.title('Transformed samples: Sk_learn library')
plt.show()


# plotting the covariance pca
X_pca = pca_cov(data[:])
plt.plot(X_pca[0,0:20], X_pca[1,0:20], 'o', markersize=7, color='blue', alpha=0.5, label='class1')
plt.plot(X_pca[0,20:40], X_pca[1,20:40], 's', markersize=7, color='green', alpha=0.5, label='class2')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.xlabel('pca-1')
plt.ylabel('pca-2')
plt.legend()
plt.title('Transformed samples: Covariance Method')
plt.show()


# plotting the svd pca
X_pca = pca_svd(data[:])
plt.plot(X_pca[0,0:20], X_pca[1,0:20], 'o', markersize=7, color='blue', alpha=0.5, label='class1')
plt.plot(X_pca[0,20:40], X_pca[1,20:40], 's', markersize=7, color='green', alpha=0.5, label='class2')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.xlabel('pca-1')
plt.ylabel('pca-2')
plt.legend()
plt.title('Transformed samples: SVD Method')
plt.show()

As you can see, the data has been compressed correctly by all three methods; the only difference is the orientation of the projection, because the sign of each principal component is arbitrary.
You can also use PCA for applications such as face recognition.

>**Congratulations on completing the first tutorial, you are doing great**

In the next part we will discuss Probability and Information Theory.