## Motivations

Linear algebra and matrices are a fundamental aspect of data science models and problems, including image processing, deep learning, NLP, and PCA. You will encounter matrices _many_ times in your career as a data scientist!

### Steal your archrival's thunder!

For now, the goal is to beat your archrival to the punch. You overheard him say that the secret to pre-processing an image for machine learning is to solve a system of equations.

Here's the set of equations that you spotted on your rival's screen:

\begin{equation}
\begin{bmatrix}
2 & -1 & 4 & 6 & 3 \\
4 & 7 & 1 & 1 & 12 \\
9 & 14 & 2 & 2 & 6 \\
1 & 1 & 1 & 2 & 17 \\
-3 & -2 & -6 & 12 & -5
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4 \\
x_5
\end{bmatrix}
=
\begin{bmatrix}
3 \\
15 \\
20 \\
2 \\
-6
\end{bmatrix}
\end{equation}

How can we solve this system quickly?

## Beginning with Plain Old Algebra

Let's start with a one-variable "system" before moving on to two-, three-, or many-variable systems.

Consider a one-variable system such as $2X = 10$.

How do we solve this?

Now consider a two-variable system:

$2X + 4Y = 10 \\
X + 4Y = 7$

### Solution through Substitution
We _could_ solve this system by taking the first equation, solving it for X, and then plugging the result into the second:

$2X + 4Y = 10$. <br/> Thus: $\\ 2X = 10 - 4Y \\ X = 5 - 2Y$.

Plugging in to the second equation, we have:

$5 - 2Y + 4Y = 7$. <br/> Thus: $\\ 5 + 2Y = 7 \\ 2Y = 2 \\ Y = 1$.

Plugging this back into the first equation, we have:

$2X + 4 = 10$.  <br/> Thus: $\\ 2X = 6 \\ X = 3$.

And we have our solutions:  $X = 3, Y = 1$.

But substitution quickly becomes tedious and hard to systematize as the number of variables grows. There is a better way:

### Solution through Elimination

A quicker route is to subtract the second equation from the first:

If $2X + 4Y = 10$ and $X + 4Y = 7$,
then $(2X - X) + (4Y - 4Y) = 10 - 7$, i.e. $X = 3$. Then we can subtract this equation ($X + 0Y = 3$) from $X + 4Y = 7$, yielding $4Y = 4$, i.e. $Y = 1$.

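We can represent this in matrix form using the equations as our rows. The first two columns correspond to the variables $X$ and $Y$, and the last column holds the right-hand sides (this is called an _augmented matrix_):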


$\begin{bmatrix}
2 & 4 & 10 \\
1 & 4 & 7
\end{bmatrix}$

$\rightarrow \begin{bmatrix}
1 & 0 & 3 \\
1 & 4 & 7
\end{bmatrix}$

$\rightarrow \begin{bmatrix}
1 & 0 & 3 \\
0 & 4 & 4
\end{bmatrix}$

$\rightarrow \begin{bmatrix}
1 & 0 & 3 \\
0 & 1 & 1
\end{bmatrix}$

This is the matrix way of saying that $X = 3$ and $Y = 1$.
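The three row operations above can be replayed on a NumPy array, which makes each step easy to check. This is a minimal sketch; the variable name ```aug``` (for the augmented matrix) is our own choice.

```python
import numpy as np

# Augmented matrix: columns are X, Y, and the right-hand side.
# Floats are used so that the division in the last step is exact.
aug = np.array([[2, 4, 10],
                [1, 4, 7]], dtype=float)

aug[0] = aug[0] - aug[1]   # R1 <- R1 - R2, giving [1, 0, 3]
aug[1] = aug[1] - aug[0]   # R2 <- R2 - R1, giving [0, 4, 4]
aug[1] = aug[1] / 4        # R2 <- R2 / 4,  giving [0, 1, 1]

print(aug)
```

The final array matches the last matrix above: $X = 3$, $Y = 1$.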

There are lots of strategies in linear algebra for "reducing" a matrix to a form where there are ones down the main diagonal and zeroes everywhere else (except the rightmost column), because such a matrix represents a list of "already solved" equations: <br/>
$X_1 + 0X_2 + \dots + 0X_n = b_1 \\
0X_1 + X_2 + 0X_3 + \dots + 0X_n = b_2 \\
\vdots \\
0X_1 + \dots + 0X_{n-1} + X_n = b_n$
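One classic such strategy is Gauss-Jordan elimination: repeatedly scale a row so its diagonal entry is 1, then subtract multiples of it from every other row. Below is a minimal sketch assuming a square, invertible coefficient matrix; ```gauss_jordan_solve``` is a hypothetical helper written for illustration, not a library function. It also adds partial pivoting (swapping in the row with the largest available pivot), a common refinement not discussed above.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve A x = b by reducing [A | b] until the left block is the identity.

    Hypothetical helper for illustration. Assumes A is square and invertible.
    """
    aug = np.hstack([np.asarray(A, dtype=float),
                     np.asarray(b, dtype=float).reshape(-1, 1)])  # [A | b]
    n = aug.shape[0]

    for col in range(n):
        # Partial pivoting: move the row with the largest absolute entry in
        # this column (on or below the diagonal) into the diagonal position.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("matrix is singular (or nearly so)")
        aug[[col, pivot]] = aug[[pivot, col]]

        # Scale the pivot row so that its diagonal entry becomes 1.
        aug[col] = aug[col] / aug[col, col]

        # Subtract multiples of the pivot row to zero out this column elsewhere.
        for row in range(n):
            if row != col:
                aug[row] = aug[row] - aug[row, col] * aug[col]

    # The left block is now the identity, so the last column is the solution.
    return aug[:, -1]

x = gauss_jordan_solve([[2, 4], [1, 4]], [10, 7])
print(x)  # X = 3, Y = 1 (as floats)
```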

## From Scalars to Vectors

A _scalar_ is simply a single value: any real number can be a scalar.

A _vector_ has both a magnitude and a direction. In a two-dimensional Cartesian coordinate system, a vector $\vec{v}$ is generally specified by its x- and y-components, $v_x$ and $v_y$.

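In that case:

- The magnitude of $\vec{v}$ is given by $\|\vec{v}\| = \sqrt{v_x^2 + v_y^2}$
- The direction of $\vec{v}$ is given by $\theta = \tan^{-1}\left(\frac{v_y}{v_x}\right)$, which holds as written when $v_x > 0$; in other quadrants the angle must be adjusted (for example, by using the two-argument arctangent).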
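Both formulas have direct NumPy counterparts: ```np.linalg.norm()``` for the magnitude and ```np.arctan2()```, the two-argument arctangent, for the direction. A small sketch with made-up example vectors, showing why the quadrant matters:

```python
import numpy as np

v = np.array([3.0, 4.0])               # example vector: v_x = 3, v_y = 4
magnitude = np.linalg.norm(v)          # sqrt(3**2 + 4**2) = 5.0
direction = np.arctan2(v[1], v[0])     # angle in radians, about 0.93 (53.13 degrees)

# For a vector in the second quadrant, the one-argument arctangent
# returns an angle in the wrong quadrant; arctan2 does not.
w = np.array([-3.0, 4.0])
naive = np.arctan(w[1] / w[0])         # about -0.93 rad (fourth quadrant)
correct = np.arctan2(w[1], w[0])       # about 2.21 rad (second quadrant)

print(magnitude, np.degrees(direction), naive, correct)
```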

## Vector Arithmetic

### Vector Addition

Vector addition is simple: just add the x-components together and the y-components together:

$(8, 14) + (7, 6) = (15, 20)$

In [2]:
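# Code it!
import numpy as np

# Consider the vectors (8, 14) and (7, 6). Let's try using Python
# to add them together.

vec_1 = (8, 14)
vec_2 = (7, 6)

# Result is False: for tuples, + concatenates, giving (8, 14, 7, 6).
vec_1 + vec_2 == (15, 20)

# Converting to NumPy arrays gives element-wise addition instead.
np.array(vec_1) + np.array(vec_2)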


array([15, 20])

What happened? Check with a partner to make sure you understand how Python interpreted our code here. Why did we use '==' instead of '='?

In [13]:
# Try typing 'vec_1.' and then pressing TAB. What options do we have here?
# vec_1 is still a tuple at this point, so the only (non-dunder) options
# are 'count' and 'index'.

Base Python is not particularly good for non-scalar arithmetic. This is one of many places where NumPy can come in very handy!

In [14]:
# Let's try this again, but this time we'll use NumPy arrays:

import numpy as np

vec_1, vec_2 = np.array([8, 14]), np.array([7, 6])
vec_1 + vec_2

array([15, 20])

### Vector Multiplication

Is base Python any better for vector _multiplication_?

In [19]:
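# Try multiplying the vectors (4, 14) and (8, 6), stored as tuples:

# vec_1 = (4, 14)
# vec_2 = (8, 6)

# vec_1 * vec_2
# This does not work: Python raises
# TypeError: can't multiply sequence by non-int of type 'tuple'
# (left commented out so that vec_1 and vec_2 stay the NumPy arrays above)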

What happened? Why did we get an error?

In fact, there are multiple ways of understanding the notion of vector multiplication. All are potentially useful, but the one that will likely be of most use to us is the *dot-product*, which is defined as follows:


\begin{equation}
\begin{bmatrix}
a & b \\
\end{bmatrix}
\cdot
\begin{bmatrix}
c \\
d
\end{bmatrix}
=
ac + bd
\end{equation}

The dot-product is the sum of the pairwise products of the vectors' entries.
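To see that definition at work, here is a small sketch that writes out the sum of pairwise products explicitly and compares it with NumPy's built-in versions (```np.dot()```, the ```.dot()``` method, and the ```@``` operator), using the vectors (8, 14) and (7, 6) from earlier:

```python
import numpy as np

a = np.array([8, 14])
b = np.array([7, 6])

# Sum of the pairwise products, written out: 8*7 + 14*6
manual = sum(x * y for x, y in zip(a, b))

# The same quantity computed by NumPy, three equivalent ways.
print(manual, np.dot(a, b), a.dot(b), a @ b)   # all equal 140
```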

In [20]:
# Now that we've got the vectors stored as NumPy arrays, let's once again
# try typing 'vec_1.' and then pressing TAB.

# Now we have many options! Notice that one of these options is 'dot'.
# This is our dot-product!

vec_1.dot(vec_2)


140

In [32]:
# Use the .dot() method to calculate the dot-product of our two vectors:

# Your code here!



#### Cross-Products

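The cross-product is defined for three-dimensional vectors. Two-dimensional vectors can be treated as three-dimensional vectors with a zero z-component, in which case the cross-product points purely in the z-direction; for 2-D inputs, ```numpy.cross()``` returns just that z-component as a scalar.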

Cross-multiplying vectors is much like multiplying polynomials: each component of each vector must be multiplied by each component of the other vector. The extra bit to remember is how the cross-products of the unit vectors work:

Typically we write:

$\hat{i}$: the unit vector in the x-direction <br/>
$\hat{j}$: the unit vector in the y-direction <br/>
$\hat{k}$: the unit vector in the z-direction

**The cross-product of each unit vector with itself is 0.**

Also:

$\large\hat{i}\times\hat{j} = \hat{k}$ <br/>
$\large\hat{j}\times\hat{k} = \hat{i}$ <br/>
$\large\hat{k}\times\hat{i} = \hat{j}$

Order matters! If we multiply in the other direction, we get (-) signs:

$\large\hat{j}\times\hat{i} = -\hat{k}$ <br/>
$\large\hat{k}\times\hat{j} = -\hat{i}$ <br/>
$\large\hat{i}\times\hat{k} = -\hat{j}$
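These unit-vector rules can be checked directly with ```np.cross()```, representing $\hat{i}$, $\hat{j}$, and $\hat{k}$ as 3-D arrays. A quick sketch:

```python
import numpy as np

i_hat = np.array([1, 0, 0])
j_hat = np.array([0, 1, 0])
k_hat = np.array([0, 0, 1])

# The cyclic order i -> j -> k gives positive results...
print(np.cross(i_hat, j_hat), np.cross(j_hat, k_hat), np.cross(k_hat, i_hat))
# ...reversing the order flips the sign...
print(np.cross(j_hat, i_hat), np.cross(k_hat, j_hat), np.cross(i_hat, k_hat))
# ...and each unit vector crossed with itself is the zero vector.
print(np.cross(i_hat, i_hat))
```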

In general, the cross-product of two non-parallel vectors is _orthogonal_ to the plane determined by those two vectors. (If the vectors are parallel, the cross-product is the zero vector.)

**Examples**:

- $(2, 3)\times(4, 5) = (2\hat{i} + 3\hat{j})\times(4\hat{i} + 5\hat{j}) \\ = (2)(4)(\hat{i}\times\hat{i}) + (2)(5)(\hat{i}\times\hat{j}) + (3)(4)(\hat{j}\times\hat{i}) + (3)(5)(\hat{j}\times\hat{j}) \\= 10\hat{k} - 12\hat{k} \\= -2\hat{k}$
<br/>

- $(3, 4, 5)\times(-1, -2, -1) = (3\hat{i} + 4\hat{j} + 5\hat{k})\times(-\hat{i} -2\hat{j} -\hat{k}) \\= \hat{i}(-4 + 10) + \hat{j}(-5 + 3) + \hat{k}(-6 + 4) \\= 6\hat{i} - 2\hat{j} -2\hat{k}$

You can use ```numpy.cross()``` for the cross-product of two vectors.

In [3]:
# Use ```numpy.cross()``` to calculate the cross-product of
# (2, 3) and (3, 5). Does the answer make sense to you?
# (NumPy 2.0 and later emit a DeprecationWarning for 2-D inputs like these.)

vec_1 = np.array([2,3])
vec_2 = np.array([3,5])

np.cross(vec_1, vec_2)

array(1)

In [4]:
# Use ```numpy.cross()``` to calculate the cross-product of
# (2, 3, 1) and (3, 5, 4). Does the answer make sense to you?

vec_a = np.array([2,3,1])
vec_b = np.array([3,5,4])

np.cross(vec_a,vec_b)


array([ 7, -5,  1])
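The orthogonality claim can also be checked numerically: a vector is orthogonal to another exactly when their dot-product is 0. This sketch reuses the worked example $(3, 4, 5)\times(-1, -2, -1)$ from above:

```python
import numpy as np

a = np.array([3, 4, 5])
b = np.array([-1, -2, -1])

c = np.cross(a, b)   # the worked example above: 6i - 2j - 2k
print(c)

# Orthogonality check: the dot-product of c with each factor should be 0.
print(np.dot(c, a), np.dot(c, b))
```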

## Higher Dimensions: From Vectors to Matrices

To handle systems of several equations at once, we can use _matrices_ to express ourselves. Suppose we had a two-variable system:

\begin{align}
a_{1,1}x_1 + a_{1,2}x_2 = c_1 \\
a_{2,1}x_1 + a_{2,2}x_2 = c_2
\end{align}

We can write this as:

$A\vec{x} = \vec{c}$,

where now $\vec{x}$ is the _vector_ $(x_1, x_2)$ and $\vec{c}$ is the _vector_ $(c_1, c_2)$.

Similarly, $A$ is the _matrix_ of coefficients that describe our system:
\begin{equation} A = 
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{bmatrix}
\end{equation}

and

\begin{equation}
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix} =
\begin{bmatrix}
c_1 \\
c_2
\end{bmatrix}
\end{equation}
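As a concrete instance, the earlier system $2X + 4Y = 10$, $X + 4Y = 7$ has $A = \begin{bmatrix} 2 & 4 \\ 1 & 4 \end{bmatrix}$ and $\vec{c} = (10, 7)$. In NumPy, ```@``` is the matrix-multiplication operator (equivalent to ```.dot()``` for 2-D arrays), so multiplying $A$ by the solution found earlier should reproduce $\vec{c}$:

```python
import numpy as np

# Coefficient matrix and right-hand side for 2X + 4Y = 10, X + 4Y = 7.
A = np.array([[2, 4],
              [1, 4]])
c = np.array([10, 7])

x = np.array([3, 1])   # the solution found earlier: X = 3, Y = 1

# A @ x combines each row of A with x, reproducing the left-hand sides.
print(A @ x)                       # should equal c
print(np.array_equal(A @ x, c))
```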

## Different Ways to Multiply

Just as there were different notions of "multiplication" for vectors, so too there are different notions of multiplication for matrices.

### Hadamard Product
The Hadamard product, for example, is analogous to matrix addition, and proceeds element-wise:

\begin{equation}
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{bmatrix}
\circ
\begin{bmatrix}
b_{1,1} & b_{1,2} \\
b_{2,1} & b_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
a_{1,1}\times b_{1,1} & a_{1,2}\times b_{1,2} \\
a_{2,1}\times b_{2,1} & a_{2,2}\times b_{2,2}
\end{bmatrix}
\end{equation}

Note that Hadamard multiplication requires that the matrices being multiplied have the same dimensions.

In [5]:
# The Hadamard product is very easy in NumPy: Just use *!

# Use NumPy to calculate the Hadamard product of
# [[1, 2], [1, 2]] and [[3, 4], [5, 9]]

# Your code here!

mat_a = np.array([[1,2],[1,2]])
mat_b = np.array([[3,4],[5,9]])

mat_a * mat_b

array([[ 3,  8],
       [ 5, 18]])

### Dot-Product
Very often when people talk about multiplying matrices, they mean the dot-product (also called the matrix product):

\begin{equation}
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{bmatrix}
\times
\begin{bmatrix}
b_{1,1} & b_{1,2} \\
b_{2,1} & b_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
a_{1,1}\times b_{1,1} + a_{1,2}\times b_{2,1} & a_{1,1}\times b_{1,2} + a_{1,2}\times b_{2,2} \\
a_{2,1}\times b_{1,1} + a_{2,2}\times b_{2,1} & a_{2,1}\times b_{1,2} + a_{2,2}\times b_{2,2}
\end{bmatrix}
\end{equation}

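The entry in row $i$ and column $j$ of the product is found by multiplying the entries of row $i$ of the left matrix by the corresponding entries of column $j$ of the right matrix and adding up the results. In other words, each entry is the dot-product of a row with a column, just like the product we calculated above with our two vectors!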

Note that matrix dot-multiplication is NOT commutative! In general, $AB \neq BA$.
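The row-by-column rule can be written out with explicit loops and compared against NumPy's ```@``` operator. In this sketch, ```matmul_by_hand``` is a hypothetical helper for illustration, and the matrices are arbitrary examples chosen so that the non-commutativity is easy to see ($B$ swaps columns when it sits on the right, and swaps rows when it sits on the left):

```python
import numpy as np

def matmul_by_hand(A, B):
    """Hypothetical helper: entry (i, j) is the dot-product of row i of A
    with column j of B."""
    m, n = A.shape
    if B.shape[0] != n:
        raise ValueError("columns of A must equal rows of B")
    p = B.shape[1]
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(matmul_by_hand(A, B))   # [[2, 1], [4, 3]]: same as A @ B
print(B @ A)                  # [[3, 4], [1, 2]]: a different matrix
```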

#### A note about vectors and matrices

Strictly speaking, non-commutativity applies to vectors as well. Above, we multiplied the _row_-vector $(a, b)$ by the _column_-vector $(c, d)$. A row-vector is simply a matrix with only one row; a column-vector is simply a matrix with only one column. What would be the result of multiplying the column-vector $(c, d)$ on the left by the row-vector $(a, b)$ on the right?

Ans.:

\begin{equation}
\begin{bmatrix}
c \\
d
\end{bmatrix}
\times
\begin{bmatrix}
a & b
\end{bmatrix}
=
\begin{bmatrix}
ca & cb \\
da & db
\end{bmatrix}
\end{equation}
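In NumPy, the distinction between row- and column-vectors shows up as array shape: a row-vector is a ```(1, 2)``` array and a column-vector is a ```(2, 1)``` array. The sketch below uses arbitrary example values for $a, b, c, d$; the column-times-row result is what NumPy calls the outer product (```np.outer()```):

```python
import numpy as np

a, b = 2, 3   # example entries for the row-vector (a, b)
c, d = 5, 7   # example entries for the column-vector (c, d)

row = np.array([[a, b]])        # shape (1, 2)
col = np.array([[c], [d]])      # shape (2, 1)

print(row @ col)                # 1x1 matrix: [[a*c + b*d]] = [[31]]
print(col @ row)                # 2x2 matrix: [[c*a, c*b], [d*a, d*b]]
print(np.outer([c, d], [a, b])) # the same 2x2 matrix via NumPy's outer product
```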

#### End of note

Observe also that in order to be able to perform the dot product on two matrices A and B, the number of columns of A must equal the number of rows of B.

Also, the number of rows of the _product_ matrix will equal the number of rows of A, and the number of columns of the product matrix will equal the number of columns of B.
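Both shape rules are easy to check in NumPy, which raises a ```ValueError``` when the inner dimensions do not match. A small sketch with arrays of ones (only the shapes matter here):

```python
import numpy as np

A = np.ones((2, 3))   # 2 rows, 3 columns
B = np.ones((3, 4))   # 3 rows, 4 columns

product = A @ B
print(product.shape)  # (2, 4): rows of A by columns of B

# Reversing the order fails: B has 4 columns but A has only 2 rows.
shape_error = None
try:
    B @ A
except ValueError as err:
    shape_error = err
print("ValueError:", shape_error)
```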

In order to solve an equation like $A\vec{x} = \vec{c}$ for $\vec{x}$, we can't very well divide $\vec{c}$ by $A$! But there is a notion of matrix _inversion_ that is relevant here, which is analogous to multiplicative inversion. If we have an equation like $2x = 10$, we can simply multiply both sides by the multiplicative inverse of the coefficient of $x$, viz. $2^{-1}$. And here the point, of course, is that $2^{-1} \times 2 = 1$.

In the higher-dimensional case, provided that $A$ is square and invertible, we can left-multiply both sides by the _inverse matrix_ of $A$, denoted $A^{-1}$. The point here is that the dot-product $A^{-1}A = I$, where $I$ is the identity matrix containing 1's along the main diagonal (upper-left to lower-right) and 0's everywhere else. Since $I\vec{x} = \vec{x}$, this gives $\vec{x} = A^{-1}\vec{c}$.
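For a small invertible matrix we can try this directly with ```np.linalg.inv()```. This sketch uses the earlier system $2X + 4Y = 10$, $X + 4Y = 7$; expect tiny floating-point rounding in the results:

```python
import numpy as np

A = np.array([[2.0, 4.0],
              [1.0, 4.0]])   # coefficients of 2X + 4Y = 10, X + 4Y = 7
c = np.array([10.0, 7.0])

A_inv = np.linalg.inv(A)

print(A_inv @ A)   # the identity matrix (up to floating-point rounding)
print(A_inv @ c)   # x = A^{-1} c, i.e. X = 3, Y = 1
```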

Using NumPy arrays, dot-multiply the matrices
\begin{bmatrix}
3 & 2 \\
5 & 7
\end{bmatrix}

and

\begin{bmatrix}
2 & 4 \\
3 & 10
\end{bmatrix}

in the code-cell below. Remember that you need square brackets around the whole array!

In [8]:
# Your code here!

mat_1 = np.array([[3,2],[5,7]])
mat_2 = np.array([[2,4],[3,10]])

mat_1.dot(mat_2)

array([[12, 32],
       [31, 90]])

## Tensors

Sometimes you will encounter _tensors_ in your work. A tensor generalizes a matrix in the same way that a matrix generalizes a vector: a vector has one representational dimension and a matrix has two, so if you need an object with three or more representational dimensions, you're talking about a tensor. A three-dimensional tensor has rows (that run from left to right), columns (that run from top to bottom), and _tubes_ (that run from front to back).
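In NumPy, vectors, matrices, and tensors are all just arrays; the number of representational dimensions is the array's ```ndim```. A quick sketch with arbitrary example values:

```python
import numpy as np

v = np.array([1, 2, 3])              # vector: 1 dimension
M = np.array([[1, 2], [3, 4]])       # matrix: 2 dimensions
T = np.arange(24).reshape(2, 3, 4)   # tensor: 3 dimensions, 2 * 3 * 4 = 24 entries

for name, arr in [("vector", v), ("matrix", M), ("tensor", T)]:
    print(name, arr.ndim, arr.shape)
```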

## Typical Data Science Problems

Consider a typical dataset and the associated multiple linear regression problem. We have many observations (rows), each of which consists of a set of values both for the predictors (columns, i.e. the independent variables) and for the target (the dependent variable).

We can think of the values of the independent variables as our matrix $A$ of coefficients and of the values of the dependent variable as our output vector $\vec{c}$.

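The task here is, in effect, to solve for $\vec{\beta}$ in $A\vec{\beta} = \vec{c}$, except that in general $A$ will have many more rows (observations) than columns (predictors). Such a system is _overdetermined_: $A$ is not square, so it has no inverse, and usually no $\vec{\beta}$ satisfies every equation exactly. This is why we won't in general be computing matrix inverses (they're computationally expensive, anyway), and why the problem calls not for a direct solution but for an optimization: in our case, finding the best-fit line, i.e. the $\vec{\beta}$ that minimizes the sum of squared errors.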

Using $z$ for our independent variables and $y$ for our dependent variable, we have:


\begin{equation}
\beta_1\begin{bmatrix}
z_{1,1} \\
\vdots \\
z_{m,1}
\end{bmatrix} +
\dots + \beta_n\begin{bmatrix}
z_{1,n} \\
\vdots \\
z_{m,n}
\end{bmatrix} = \begin{bmatrix}
y_1 \\
\vdots \\
y_m
\end{bmatrix}
\end{equation}
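To make the optimization concrete, here is a minimal sketch using ```numpy.linalg.lstsq```, which returns the $\vec{\beta}$ minimizing the sum of squared errors $\|Z\vec{\beta} - \vec{y}\|^2$. The four data points are made up for illustration, and the column of ones (which gives the model an intercept) is a common modeling choice we are assuming, not something specified above.

```python
import numpy as np

# Illustrative data (made up for this sketch): one predictor x and a target y.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 7.0])

# Design matrix: a column of ones (for the intercept) and the predictor column.
# 4 rows, 2 columns: more equations than unknowns, so no exact solution.
Z = np.column_stack([np.ones_like(x), x])

# Least-squares solution of Z @ beta (approximately) = y.
beta, residuals, rank, sing_vals = np.linalg.lstsq(Z, y, rcond=None)
print(beta)   # [intercept, slope]
```

For this data the fitted intercept and slope come out to 0.9 and 1.9. Solving the _normal equations_ $Z^TZ\vec{\beta} = Z^T\vec{y}$ with ```np.linalg.solve()``` gives the same $\vec{\beta}$ here, but ```lstsq``` is generally the more numerically robust choice.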

## Using NumPy to Solve a System of Linear Equations

NumPy's ```linalg``` module has a ```.solve()``` function that you can use to solve a system of linear equations!

In particular, it solves for the vector $\vec{x}$ in the equation $A\vec{x} = \vec{b}$. You should know that, "under the hood", ```.solve()``` does NOT compute the inverse matrix $A^{-1}$; instead it calls the LAPACK routine ```gesv```, which uses an LU factorization. This Stack Overflow thread has a helpful discussion: https://stackoverflow.com/questions/31256252/why-does-numpy-linalg-solve-offer-more-precise-matrix-inversions-than-numpy-li

And check out the documentation for ```.solve()``` here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html

In [10]:
# Code it!

# Use the .solve() method to solve your rival's system and
# steal his thunder!

X = np.array([[2,-1,4,6,3], [4,7,1,1,12],
            [9,14,2,2,6], [1,1,1,2,17],
            [-3,-2,-6,12,-5]])

y = np.array([3,15,20,2,-6])


In [12]:
np.linalg.solve(X,y)

array([-14.51240389,   9.46394893,   8.31031481,   1.49992746,
        -0.2506891 ])
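A quick way to check the answer (a sketch of our own, not part of the original exercise) is to plug it back into the system: multiplying the coefficient matrix by the solution should reproduce the right-hand side, up to floating-point rounding, which ```np.allclose()``` tolerates.

```python
import numpy as np

X = np.array([[2, -1, 4, 6, 3], [4, 7, 1, 1, 12],
              [9, 14, 2, 2, 6], [1, 1, 1, 2, 17],
              [-3, -2, -6, 12, -5]])
y = np.array([3, 15, 20, 2, -6])

sol = np.linalg.solve(X, y)

# Plug the solution back in: X @ sol should reproduce y (up to rounding).
print(np.allclose(X @ sol, y))
```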

## (Bonus) Broadcasting

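If you try to do arithmetic with arrays of different but compatible shapes, NumPy _broadcasts_ the smaller array: it behaves as though the smaller array had been replicated to match the larger one, without actually copying the data. Two shapes are compatible when, comparing their dimensions from right to left, each pair is either equal or one of them is 1 (missing dimensions count as 1). Read about broadcasting here: https://machinelearningmastery.com/broadcasting-with-numpy-arrays/.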
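A short sketch of broadcasting in action, with arbitrary example values: a row is added to every row, a column to every column, a scalar to every entry, and an incompatible shape raises a ```ValueError```.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,): added to every row
col = np.array([[100], [200]])   # shape (2, 1): added to every column

print(M + row)   # [[11, 22, 33], [14, 25, 36]]
print(M + col)   # [[101, 102, 103], [204, 205, 206]]
print(M + 5)     # a scalar is broadcast to every entry

# Shapes (2, 3) and (2,) clash: the trailing dimensions are 3 and 2.
broadcast_error = None
try:
    M + np.array([1, 2])
except ValueError as err:
    broadcast_error = err
print("ValueError:", broadcast_error)
```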