<h1>Linear Algebra for Machine Learning</h1>

<li> <b> <a href="#what-is-linear-algebra">What is Linear Algebra? </a> </b></li>
<li> <b> <a href="#why-should-you-learn-linear-algebra"> Why should you learn Linear Algebra? </a> </b></li>
<li> <b> <a href="#our-approach-to-linear-algebra"> Our Approach to Linear Algebra </a></b></li>
<li> <b> <a href="#linear-algebra-basics"> Linear Algebra Basics </a></b></li>
    <ul> 
        <li>  <em> <a href="#vectors"> Vectors </a> </em> </li>   
        <li>  <em> <a href="#matrices"> Matrices </a> </em></li>       
        <li>  <em> <a href="#operations-on-vectors-and-matrices"> Operations on Vectors & Matrices </a> </em> </li>     
        <li>  <em> <a href="#dot-product"> Dot Product </a></em></li>                   
    </ul>


<div id="what-is-linear-algebra"/><h2>  What is Linear Algebra </h2>

Linear Algebra is helpful whenever you need to work with huge volumes of data that are similar in nature. Data Science and Machine Learning happen to be two of the areas where we deal with high volumes of homogeneous data. However, they are not the only areas that use Linear Algebra. Digital Signal Processing, Computer Graphics and Structural Engineering are some of the other engineering fields that use Linear Algebra very heavily.

<div id="why-should-you-learn-linear-algebra" /><h2> Why should you learn Linear Algebra </h2>

Everybody starting off with Machine Learning should have at least a fair idea of Linear Algebra. In addition, Linear Algebra is almost exclusively the run-time math engine behind Deep Learning.

For starters, here are some use cases where Linear Algebra is used in ML.

<li> <b>Image Processing</b> </li>
An image is represented in the computer as a sequence of numbers. Each number in the sequence could represent the intensity or the color of a pixel. So, just to represent this kind of data, you need matrices. And any further processing of this data (like transformations, summations etc.) requires linear algebra.
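
As a rough illustration of the idea, here is a minimal sketch that treats a tiny grayscale image as a NumPy matrix. The 3 x 3 pixel values are made up for the example (0 is black, 255 is white), and the variable names are our own.

```python
import numpy as np

# A hypothetical 3x3 grayscale image: each entry is a pixel intensity
# (0 = black, 255 = white). The values are invented for illustration.
image = np.array([[  0, 128, 255],
                  [ 64, 128, 192],
                  [255, 128,   0]], dtype=np.uint8)

# Transformation: mirror the image left to right by reversing the column order.
mirrored = image[:, ::-1]

# Transformation: swap rows and columns (matrix transpose).
transposed = image.T

# Summation: total and average intensity over the whole image.
total_intensity = image.sum()
mean_intensity = image.mean()

print(mirrored)
print(total_intensity, mean_intensity)
```

A color image would typically add a third dimension for the color channels, but the same kind of array operations apply.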

<li> <b> Linear Regression </b> </li>
Linear regression is an ML technique used to fit (approximate) data points with a line or a plane. For example, if you want to find out how the median price of a house depends on parameters like crime rate, pollution, tax rates etc., simple linear algebra techniques like matrix inverse, matrix transpose and dot product can solve the problem.

<pre>
     crim zn indus chas   nox    rm  age    dis rad tax ptratio      b lstat medv
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7
</pre>
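
To give a feel for how inverse, transpose and dot product come together, here is a minimal sketch that fits `medv` from just two of the columns above (`rm` and `lstat`) using the normal equation. The choice of columns and the variable names are our own assumptions, and six rows are far too few for a meaningful model; the point is only to show the matrix operations.

```python
import numpy as np

# rm, lstat and medv columns copied from the six rows shown above.
rm    = np.array([6.575, 6.421, 7.185, 6.998, 7.147, 6.430])
lstat = np.array([4.98, 9.14, 4.03, 2.94, 5.33, 5.21])
medv  = np.array([24.0, 21.6, 34.7, 33.4, 36.2, 28.7])

# Design matrix: a column of ones (for the intercept) followed by the two features.
X = np.column_stack([np.ones_like(rm), rm, lstat])
y = medv

# Normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Fitted values for the same six rows.
predicted = X.dot(theta)

print(theta)
print(predicted)
```

In practice, `np.linalg.lstsq` or `np.linalg.solve` is usually preferred over an explicit inverse for numerical stability, but the explicit form mirrors the three operations named in the text.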

<li> <b> Language Processing </b> </li>
Sparse matrices are used extensively in language processing to represent word counts. With the samples along the x-axis and the words being counted along the y-axis, even for a small data set the size of the matrix could get as big as 10000 x 10000. However, most of the matrix is just zeros (hence the name sparse matrix). So, some techniques from linear algebra are used to work with these matrices efficiently and avoid the space/time cost that these huge matrices would otherwise incur.


<img src="./pics/sparse_matrix.png"/>
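
Here is a small sketch of the idea, assuming three made-up one-line documents. It builds the dense word-count matrix, measures how much of it is zero, and then keeps only the non-zero entries as (row, column, value) triples, which is the basic trick behind sparse storage. The documents and variable names are hypothetical.

```python
import numpy as np

# Three hypothetical one-line documents (the samples).
docs = ["the cat sat", "the dog barked", "a bird sang"]

# Vocabulary: every distinct word, sorted, mapped to a column index.
vocab = sorted({word for doc in docs for word in doc.split()})
column = {word: j for j, word in enumerate(vocab)}

# Dense word-count matrix: one row per document, one column per word.
counts = np.zeros((len(docs), len(vocab)), dtype=int)
for i, doc in enumerate(docs):
    for word in doc.split():
        counts[i, column[word]] += 1

# Fraction of entries that are zero.
sparsity = 1 - np.count_nonzero(counts) / counts.size

# Coordinate form: keep only the positions and values of the non-zero entries.
rows, cols = np.nonzero(counts)
values = counts[rows, cols]

print(counts.shape, sparsity)
print(list(zip(rows.tolist(), cols.tolist(), values.tolist())))
```

Even in this toy case more than half of the entries are zero. Libraries such as `scipy.sparse` provide ready-made sparse formats built on the same principle.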

There are many more examples that we will see later, such as
<li> Principal Component Analysis ( PCA ) for dimensionality reduction of higher dimensional data</li>
<li> Singular Value Decomposition ( SVD ) for processing sparse matrices in recommender systems and NLP</li>
<li> One Hot Encoding for encoding categorical data to numeric data.</li>

Not to mention deep learning, of course.
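
As a quick preview of the last item, here is a minimal sketch of one hot encoding using rows of an identity matrix. The color categories are made up for the example, and this is only one of several ways to do it.

```python
import numpy as np

# A hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Distinct categories in sorted order: ['blue', 'green', 'red'].
categories = sorted(set(colors))

# Integer position of each value within the category list.
index = np.array([categories.index(c) for c in colors])

# Row i of the identity matrix is the one hot vector for category i.
one_hot = np.eye(len(categories), dtype=int)[index]

print(categories)
print(one_hot)
```

Each row of `one_hot` has exactly one 1, in the column that corresponds to that row's category.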


<div id="our-approach-to-linear-algebra"/><h2> Our approach to learn Linear Algebra </h2>

Linear algebra is a very broad subject that has applications across a wide variety of science and engineering areas. It is generally taught as a formal course over at least one semester. However, we will take a shallow approach and learn just enough Linear Algebra to understand and solve the particular ML problem at hand.

For example, we will not learn about SVD until we come to Naive Bayes or some other Language Processing algorithms. Similarly, we will learn the inverse and transpose of a matrix only when we solve linear regression problems.

For now, we will just be learning the basics of Linear Algebra.

<div id="linear-algebra-basics"/> <h2> Linear Algebra Basics </h2>

We will cover the fundamental building blocks of linear algebra, which are vectors and matrices. Higher dimensional data structures like tensors are also important, but for now, we will make do with just vectors and matrices.

<div id="vectors"/> <h3> Vectors </h3>

Before we understand a vector, we have to understand what a <em> scalar </em> is. Let's take a simple example. 

Say a bunch of students are being home schooled and are trying to decide which teacher to select. Each teacher has a specific skill set and has proven to boost the performance of students by a certain percentage. Let's take the simplest case here. A single student, "Ajay", has an average CGPA of 3.0, and a teacher t1 can boost the performance by 20%.

What is the expected CGPA of Ajay after being coached by teacher t1?

<img src="./pics/scalar-x-scalar.png"/>

That's right - 3.6. It is a simple multiplication of two numbers. How do you program it?

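In [6]:
student_1 = 3.0   # Ajay's current CGPA
teacher_1 = 1.2   # a 20% boost is a multiplier of 1.2

# floating-point multiplication may not give exactly 3.6, hence the rounding
student_1 = student_1 * teacher_1
round( student_1, 2 )

3.6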

Say the student Ajay has 3 subjects: Math, Physics and Chemistry. What will be the individual performance boost for each of the subjects?

<img src="./pics/vector-x-scalar.png"/>

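This seems simple enough as well. However, there is a concept called Broadcasting at work here. Although the multiplication seems intuitive enough, we have to understand how it actually happens: the single teacher multiplier is stretched (broadcast) so that it lines up with each of the three subject scores, and the multiplication is then done element by element.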
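
Here is a minimal sketch of that vector-times-scalar case. The three subject scores are made up for the example (the picture above may use different numbers), and `np.broadcast_to` is used only to make the stretching visible.

```python
import numpy as np

# Hypothetical scores for Ajay in Math, Physics and Chemistry.
ajay = np.array([3.0, 3.5, 2.5])
boost = 1.2   # teacher t1 improves performance by 20%

# NumPy broadcasts the scalar: it behaves as if it were [1.2, 1.2, 1.2].
boosted = ajay * boost

# The same multiplication with the scalar stretched out explicitly.
stretched = np.broadcast_to(boost, ajay.shape)
boosted_explicit = ajay * stretched

print(stretched)
print(np.round(boosted, 2))
```

Both forms give the same result (roughly 3.6, 4.2 and 3.0); NumPy simply saves us from building the repeated array ourselves.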

In [None]:
import numpy as np

# each row is a student, each column is a subject
students = np.array([  [0.9,0.7,0.6],
                       [0.8,0.6,0.5],
                       [0.7,0.6,0.8]])

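In [17]:
# each row is a subject, each column is a teacher
teachers = np.array( [ [0.6,0.8],
                       [0.9,0.7],
                       [0.7,0.9]])

In [18]:
# (3 x 3) dot (3 x 2) gives a 3 x 2 matrix: one row per student, one column per teacher
scoring = students.dot(teachers)
scoring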

array([[1.59, 1.75],
       [1.37, 1.51],
       [1.52, 1.7 ]])

In [6]:
# add up each column: total score per teacher across all students
scoring.sum(axis=0)

array([4.48, 4.96])
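
To see what `dot` is doing in the cell above, here is a sketch that rebuilds the same product by hand. Each entry is one row of `students` multiplied element by element with one column of `teachers` and then summed. The loop and the names `entry_00` and `manual` are ours, for illustration only.

```python
import numpy as np

students = np.array([[0.9, 0.7, 0.6],
                     [0.8, 0.6, 0.5],
                     [0.7, 0.6, 0.8]])
teachers = np.array([[0.6, 0.8],
                     [0.9, 0.7],
                     [0.7, 0.9]])

# Entry (0, 0): first row of students times first column of teachers, summed.
# 0.9*0.6 + 0.7*0.9 + 0.6*0.7 = 0.54 + 0.63 + 0.42
entry_00 = (students[0, :] * teachers[:, 0]).sum()

# Build the whole product with explicit loops over rows and columns.
manual = np.zeros((students.shape[0], teachers.shape[1]))
for i in range(students.shape[0]):
    for j in range(teachers.shape[1]):
        manual[i, j] = (students[i, :] * teachers[:, j]).sum()

print(round(entry_00, 2))
print(np.allclose(manual, students.dot(teachers)))
```

The number of columns in the first matrix (3) has to match the number of rows in the second (3); the result takes its row count from the first matrix and its column count from the second.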

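In [22]:
# a single 20% boost, stored as a one-element array
teacher1 = np.array( [1.2])

In [23]:
# broadcasting: the one multiplier is applied to every entry of students
boost = students * teacher1
boost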

array([[1.08, 0.84, 0.72],
       [0.96, 0.72, 0.6 ],
       [0.84, 0.72, 0.96]])

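In [24]:
# a different boost for each of the three subjects
teacher_subject = np.array([1.2, 1.3, 1.1])

In [25]:
# broadcasting: the row of three multipliers is applied to every row of students
boost = students * teacher_subject
boost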

array([[1.08, 0.91, 0.66],
       [0.96, 0.78, 0.55],
       [0.84, 0.78, 0.88]])
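
The last cell works because the shapes (3, 3) and (3,) are compatible: NumPy treats the 1-D array as a single row and repeats it for every row of `students`. The sketch below makes that stretching explicit and, as a contrast, shows how to scale each row instead by giving the multipliers the shape (3, 1). The `per_student` multipliers are hypothetical and not part of the example above.

```python
import numpy as np

students = np.array([[0.9, 0.7, 0.6],
                     [0.8, 0.6, 0.5],
                     [0.7, 0.6, 0.8]])
teacher_subject = np.array([1.2, 1.3, 1.1])

# The 1-D array is treated as one row and repeated for every row of students.
stretched = np.broadcast_to(teacher_subject, students.shape)
by_column = students * teacher_subject      # same as students * stretched

# Hypothetical per-student multipliers: shape (3, 1) scales each ROW instead.
per_student = np.array([1.2, 1.3, 1.1])[:, np.newaxis]
by_row = students * per_student

print(stretched)
print(np.round(by_column, 2))
print(np.round(by_row, 2))
```

The general rule is that shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.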