In [2]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate
import time
%matplotlib notebook 
# For plotting. Don't include this if you submit
                     # a Jupyter Notebook to Gradescope.

# Linear algebra: vectors, matrices, and systems of equations.

Throughout this class we have seen several examples of vectors and matrices, though we have not used that terminology. Instead, we have been using the more general word *array.* 

For example, when we were finding the minimum of a 2D function using gradient descent, we had an initial guess, e.g., `p0 = np.array([6,4])`. We can create this in python and check its shape using the following code.

In [None]:
# Create p0
p0 = np.array([6,4])
# Print the shape of p0, using p0.shape
print(p0.shape)

Notice that the *shape* of this is `(2,)`. That means it has 2 entries. Remember that we can take the *norm* of this array using the formula $\|p_0\| = \sqrt{6^2 + 4^2},$ which comes from the Pythagorean theorem. We can also rewrite this as $\|p_0\|^2 = 6^2 + 4^2$.
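
As a quick check of the norm formula, here is a minimal sketch that compares the hand formula with NumPy's built-in `np.linalg.norm`. The variable names `norm_by_formula` and `norm_by_numpy` are only for illustration.

```python
import numpy as np

p0 = np.array([6, 4])

# Norm from the Pythagorean formula, written out by hand.
norm_by_formula = np.sqrt(6**2 + 4**2)

# Norm computed by NumPy (the default for a 1D array is the same formula).
norm_by_numpy = np.linalg.norm(p0)

print(norm_by_formula, norm_by_numpy)
print(np.isclose(norm_by_formula, norm_by_numpy))
```

Both numbers should agree, and squaring either one recovers $6^2 + 4^2$.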

Now instead of thinking about $p_0$ as an array, we are going to call it a *vector.* A vector is the same as a 1-dimensional array, except that vectors come in two forms: *column vectors* and *row vectors.* For example, we could say
$$ p_0 = \begin{pmatrix} 6 \\ 4\end{pmatrix}, \qquad \text{ or } \qquad p_0 = \begin{pmatrix} 6 & 4 \end{pmatrix}.$$
The first way of writing it is as a column vector, the second is as a row vector. We use this language because the first example is a single column and the second example is a single row.

So far we haven't added anything new. We are just using different language *to store information* (vectors, like arrays, hold information: data, variables, etc.). An important new characteristic of vectors is how we multiply them. For example, what is $p_0\cdot p_0$? **For vectors (and matrices, as we will see later), we can only multiply if the dimensions match up.**

To make this concrete, define
$$ p_0 = \begin{pmatrix} 6 \\ 4\end{pmatrix}, \qquad \text{ and } \qquad q_0 = \begin{pmatrix} 6 & 4 \end{pmatrix}.$$
$p_0$ and $q_0$ hold the same information but are different because one is a column and one is a row. Let's define them in python.

In [6]:
p_0 = np.array([[6], [4]])
q_0 = np.array([[6,4]])

# Print the shape of each
print(p_0.shape)
print(q_0.shape)

(2, 1)
(1, 2)


Notice the difference. Before, when we defined arrays, we used just one set of brackets. Now we need to say what goes in the rows and what goes in the columns. We read the size of $p_0$ as "2 rows, 1 column" and the size of $q_0$ as "1 row, 2 columns."

**The important rule about matrix and vector multiplication is that you can only multiply two things, $x$ and $y$, if their *inner* dimensions agree.** In other words, we **can** multiply $q_0 \cdot p_0$ because the dimensions are $(1,2) \times (2,1)$. The *inner dimension* of 2 agrees. To do this in python, we use `@`.

In [9]:
answer =  q_0 @ p_0
print(answer)

[[52]]


I will demonstrate how to do this multiplication by hand in class. **You will not need to do extensive multiplication by hand, but it will be necessary to know how it works when we talk about *systems of equations.*** 

Notice that the answer is $52 = 36 + 16 = 6^2 + 4^2 = \|p_0\|^2$. Indeed, $\|p_0\|^2 = p_0^\intercal p_0$, where the $.^\intercal$ operator is called the *transpose* and works by changing rows to columns and columns to rows. Note that $q_0 = p_0^\intercal$ and $p_0 = q_0^\intercal.$
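
Here is a small sketch that checks these claims in python. It uses the `.T` attribute for the transpose and also tries a product whose inner dimensions do *not* agree. The names `norm_squared` and `mismatch_raised` are only for illustration.

```python
import numpy as np

p_0 = np.array([[6], [4]])   # column vector, shape (2, 1)
q_0 = np.array([[6, 4]])     # row vector, shape (1, 2)

# The transpose swaps rows and columns, so p_0.T should equal q_0.
print(np.array_equal(p_0.T, q_0))

# p_0^T p_0 is a 1x1 array; .item() pulls out the single number inside.
norm_squared = (p_0.T @ p_0).item()
print(norm_squared)

# (2, 1) @ (2, 1) is not allowed: the inner dimensions 1 and 2 disagree.
mismatch_raised = False
try:
    p_0 @ p_0
except ValueError as err:
    mismatch_raised = True
    print("Cannot multiply:", err)
```

The last part shows why $p_0 \cdot p_0$ does not make sense as written for two column vectors: we need $p_0^\intercal p_0$ instead.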

### Systems of linear equations
Okay, so why does this all matter? *Linear algebra allows us to work compactly and efficiently with large sets of data. This is primarily due to how multiplication is defined.* The first place this usually becomes helpful is in solving *systems of linear equations.* 

A linear system is a system of one or more equations in which every unknown variable appears only linearly. For example,
$$
\begin{align*}
x + 3y - z &= 0 ,\\
z &= 12,\\
12z - x &= 0,
\end{align*}
$$
is linear because no term contains a product of the unknowns $x$, $y$, and $z$. In contrast,
$$
\begin{align*}
x + 3y - z &= 0,\\
xz &= 12, \\
12z - x &= 0
\end{align*}
$$
is not a linear system of equations because of the $xz$ term. 

Linear systems can always be written compactly using linear algebra in the form $Ax = b$. For example,
$$
\begin{align*}
x + 3y - z &= 0,\\
z &= 12,\\
12z - x &= 0,
\end{align*}
$$
can be written as $Aw = b$ where
$$
A = \begin{pmatrix} 1 & 3 & -1 \\ 0 & 0 & 1 \\ -1 & 0 & 12 \end{pmatrix},  \qquad w = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad b = \begin{pmatrix}0 \\ 12 \\ 0 \end{pmatrix}.
$$
(Note that I am using the variable $w$ to represent *the column vector of unknowns* instead of $x$, since one of our unknowns is called $x$.) The variable $A$ is called a *matrix.* It has 3 rows and 3 columns. We can think about it as 3 column vectors with 3 elements each, placed side by side, or as 3 row vectors with 3 elements each, stacked on top of one another.

In order to understand this equation, and in particular the $Aw$ term, we need to know how matrix-vector multiplication works. We can do this by thinking about each equation on its own. The first equation, $x + 3y - z = 0$, can be written as the vector-vector product
$$ \begin{pmatrix} 1 & 3 & -1 \end{pmatrix} \begin{pmatrix}x \\ y \\ z\end{pmatrix} = 0.$$
You should take a minute to verify this yourself. The second equation, $z = 12$, is the same as
$$ 
\begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 12.
$$
Finally, $12z - x = 0$ is the same as 
$$
\begin{pmatrix}
-1 & 0 & 12 
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0.
$$
So now we can see how the matrix is defined: each row of $A$ holds the coefficients of $x,~y,$ and $z$ from one equation. We can also see how matrix-vector multiplication works: each row of $A$ times the *vector of unknowns* $w$ gives one entry of the result. The result of the matrix-vector multiplication should be the right-hand side of the three equations above:
$$
b = \begin{pmatrix}
0 \\ 12 \\ 0
\end{pmatrix}.
$$
I will demonstrate by hand again how the matrix-vector multiplication works. You should practice it.
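
If you want to check your hand computations, here is a minimal sketch of the row-by-row idea in python. The vector `v` is a made-up trial vector (it is *not* the solution of the system), and `by_rows` is just an illustrative name.

```python
import numpy as np

A = np.array([[1, 3, -1], [0, 0, 1], [-1, 0, 12]])
v = np.array([[1], [2], [3]])  # hypothetical trial vector, chosen only for practice

# Each entry of the product is one row of A times the column v.
by_rows = np.array([[A[i, :] @ v[:, 0]] for i in range(3)])
print(by_rows)

# The @ operator does all three row-times-column products at once.
print(A @ v)
```

For this `v`, the first entry is $1\cdot 1 + 3\cdot 2 - 1\cdot 3 = 4$, which you can verify by hand before running the code.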

You can do matrix-vector multiplication in python using `@`. For example, we could compute $Ab$ (not related to the $Aw = b$ above) because $A$ has dimensions $3\times 3$ and $b$ has dimensions $3\times 1$. The output will have dimensions $3 \times 1$:

In [10]:
# Define A
A = np.array([[1, 3, -1], [0, 0, 1], [-1, 0, 12]])
# Define b
b = np.array([[0],[12],[0]])

# Print the shape of A and b
print(A.shape)
print(b.shape)

# Calculate the matrix-vector product A @ b
answer = A@b
print(answer.shape)
print(answer)

(3, 3)
(3, 1)
(3, 1)
[[36]
 [ 0]
 [ 0]]


But we don't really want to compute $Ab$. What we really want to know when we write $Aw = b$ is **what is w?** What are my $x,~y,$ and $z$? We solve equations like this in python using `np.linalg.solve(A, b)`.

In [11]:
# Solve for w
w = np.linalg.solve(A, b)
# Print w
print(w)

[[144.]
 [-44.]
 [ 12.]]
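
It is good practice to check a computed solution by plugging it back into the equation. The sketch below does that with `np.allclose`, which compares two arrays up to rounding error. This is only a sanity check, not part of the solve itself.

```python
import numpy as np

A = np.array([[1, 3, -1], [0, 0, 1], [-1, 0, 12]])
b = np.array([[0], [12], [0]])

# Solve A w = b for the vector of unknowns w.
w = np.linalg.solve(A, b)

# Plug w back in: A @ w should reproduce b (up to rounding error).
print(A @ w)
print(np.allclose(A @ w, b))
```

You can also check by hand: $z = 12$, then $12z - x = 0$ gives $x = 144$, and $x + 3y - z = 0$ gives $y = -44$.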


### Systems of linear ODEs
On the homework, we have a different type of equation, an ordinary differential equation (ODE), but it is still *linear*. That means that we can write it with matrix-vector multiplication. Coding Problem 3e, 
$$x''(t) - \mu x'(t) + x(t) = 0$$
can be written as (you should check this!)
$$
\begin{align*}
x'(t) &= y(t)\\
y'(t) &= \mu y(t) - x(t).
\end{align*}
$$
Let's see if we can extract the vectors and matrix! First off, I will define my *vector of unknowns*:
$$
w = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}.
$$
**This is the thing I want to solve for.** What else do I have? I have *the derivative of the $w$ vector,*
$$
w'(t) = \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix}.
$$
So I can write this system of equations as
$$
w'(t) = A w(t),
$$
but what is $A$? We will have to do some matrix-vector multiplication to check. We need
$$
w'(t) = \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} = A w(t) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x(t) \\ y(t)\end{pmatrix}.
$$
I'll give you a minute to think about this.

What you should find is that 
$$
A = \begin{pmatrix} 
0 & 1 \\ -1 & \mu
\end{pmatrix}.
$$
Using this $A$ matrix will help us with the numerical methods: it helps simplify the notation (see the homework, for example!).
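
One way to convince yourself that this $A$ is right is to compare $Aw$ with the right-hand sides of the two equations for some sample values. In the sketch below, `mu = 200` matches the value used later in these notes, while the values of `x` and `y` are made up just for the check.

```python
import numpy as np

mu = 200  # same value used later in these notes
A = np.array([[0, 1], [-1, mu]])

x, y = 2.0, 1.0  # hypothetical sample values, chosen only for this check
w = np.array([[x], [y]])

# Right-hand side from the matrix-vector product.
from_matrix = A @ w

# Right-hand side written directly from x' = y and y' = mu*y - x.
from_equations = np.array([[y], [mu * y - x]])

print(from_matrix)
print(np.allclose(from_matrix, from_equations))
```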

## Solving linear systems

We gave one example above about solving linear systems. But let's think about an example that matters more to us. This comes from Homework 7 Coding Problem 3(i). In it, we are using Backward-Euler on the system defined above, $$x''(t) - \mu x'(t) + x(t) = 0,$$ except written as a first-order linear system:
$$
w'(t) = A w(t),
$$
where $A$ is defined above. Notice that we can rewrite this as $w'(t) = f(w)$. What is $f(w)$? Well $f(w) = Aw$: it is matrix-vector multiplication, giving **a column vector.** Applying the Backward-Euler formula to the ODE gives
$$
w_{n+1} = w_n + \Delta t f(w_{n+1}) = w_n + \Delta t A w_{n+1}.
$$
We can rewrite this as $$w_n = w_{n+1} - \Delta t A w_{n+1} = (1 - \Delta t A) w_{n+1},$$
(almost). Why *almost*? Because what is 1 minus a matrix? For that matter, what is $\Delta t A$? First, know that **when we multiply a matrix by a number (scalar), we get a matrix again: every entry is multiplied by that number.** For example,
$$
\Delta t A = \begin{pmatrix} 0 & \Delta t \\ -\Delta t & \mu \Delta t\end{pmatrix}.
$$
So how about $1 - \Delta t A$? **The $1$ we want here is called *the identity matrix*, $I$.** It is defined by
$$
I = \begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}.
$$
It is called *the identity matrix* because $Iz = z$ for any vector $z$. Let's check that:
$$
I z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
$$
(I'll check this by hand in class, you should too). 
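
Both facts are easy to check in python. This is a minimal sketch, assuming the values `dt = 0.01` and `mu = 200` used in the example later in these notes. The vector `z` is made up for the check.

```python
import numpy as np

dt = 0.01
mu = 200
A = np.array([[0, 1], [-1, mu]])

# Multiplying by a scalar multiplies every entry of the matrix.
print(dt * A)

# np.eye(2) builds the 2x2 identity matrix.
I = np.eye(2)
z = np.array([[3.0], [-7.0]])  # hypothetical vector, any values would work

# The identity matrix leaves z unchanged.
print(I @ z)
print(np.array_equal(I @ z, z))
```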

This means we need to solve $(I - \Delta t A)w_{n+1} = w_n$ for $w_{n+1}$ (because $w_n$ is known) at each time step. For simplicity, we'll define $C = I - \Delta t A$ and write this as $C w_{n+1} = w_n$. 

Let's do an example. On the homework we have initial condition $$w_0 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$ To get $w_1$, we need to solve $C w_1 = w_0.$ Let's do that in python.

In [5]:
# Define w0
w0 = np.array([[2], [0]])
# Define C
# To do so, first define I using np.eye(2)
I = np.eye(2)
# Then define dt = 0.01 
dt = 0.01
# To define A, we need to define mu = 200
mu = 200
A = np.array([[0, 1], [-1, mu]])
# Now we can define C from these variables.
C = I - dt*A
# print C to see what it looks like
print(C)

# Now solve! 
w1 = np.linalg.solve(C, w0)
# print w1
print(w1)

[[ 1.   -0.01]
 [ 0.01 -1.  ]]
[[2.00020002]
 [0.020002  ]]


Now we have $w_1$ and can find $w_2$! I'll leave that to you on the homework. First, I want to show you that even though we have been careful about the vector dimensions (defining them as columns), python would actually work just fine here if we used only 1D arrays. Let's redo the example above, except with a regular 1D array for `w0`.

In [6]:
# Redefine w0
w0 = np.array([2, 0])

# Solve, just as before
w1 = np.linalg.solve(C, w0)
# print w1
print(w1)

[2.00020002 0.020002  ]


Notice that *this looks like a row vector* but in reality it's not even a vector, it's that 1D *array* that python uses. 
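
To see the difference explicitly, the sketch below solves the same system both ways and prints the shapes. The names `w1_flat` and `w1_col` are only for illustration. If you ever need to turn the 1D array into a column vector, `reshape(-1, 1)` is one way to do it.

```python
import numpy as np

dt, mu = 0.01, 200
A = np.array([[0, 1], [-1, mu]])
C = np.eye(2) - dt * A

# Right-hand side as a 1D array: the answer is a 1D array.
w1_flat = np.linalg.solve(C, np.array([2, 0]))

# Right-hand side as a column vector: the answer is a column vector.
w1_col = np.linalg.solve(C, np.array([[2], [0]]))

print(w1_flat.shape)
print(w1_col.shape)

# reshape(-1, 1) turns the 1D array into a column with the same entries.
print(np.allclose(w1_flat.reshape(-1, 1), w1_col))
```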

## Data matrices
In the activity on Wednesday we will explore how matrices can be used to store different pieces of information. I'll introduce another one right now. We will use this data matrix frequently next week when we develop and employ a *movie recommendation algorithm* based on your movie ratings.

I will be posting a survey on Canvas on Wednesday. If you complete it by Monday next week **you will get 5 points of extra credit for this class.** In order to get the extra credit, your responses need to be serious and follow the instructions! Submitting the survey with no information provided will not earn the extra credit.

As an example, we will have a list of movies and a score (out of 5) from each person for each movie they have seen. For instance, if the 4 movies we are rating are *The Batman*, *Shrek*, *Spider-Man: No Way Home*, and *Pulp Fiction*, each person will have a score for those movies. For example, a student in the class, *Adnan*, may provide the following ratings:
$$ \text{Adnan} = [3, 4, 2, 1] $$
meaning that they rate The Batman 3, Shrek 4, etc. Suppose we also have the results for 4 more students:
$$ 
\begin{align*}
\text{Bing} &= [5, 1, 3, 1]\\
\text{Catherine} &= [1, 1, 2, 4]\\
\text{Ding} &= [3, 3, 3, 3]\\
\text{Eric} &= [2, 1, 4, 4]
\end{align*}
$$
We can put all 5 students' ratings into a matrix of shape $5 \times 4$. Each row holds one person's scores for the 4 movies:
$$
A = \begin{pmatrix}
3 & 4 & 2 & 1 \\
5 & 1 & 3 & 1\\
1 & 1 & 2 & 4\\
3 & 3 & 3 & 3\\
2 & 1 & 4 & 4
\end{pmatrix}.
$$
This data matrix will be the beginning of our *movie-recommendation algorithm.*
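
Here is a minimal sketch of how this data matrix could be stored in python. I call it `ratings` here (instead of $A$) just to avoid clashing with the matrices above. The column average is only one example of something we can compute from it, not part of the recommendation algorithm itself.

```python
import numpy as np

# Rows are students, columns are movies:
# The Batman, Shrek, Spider-Man: No Way Home, Pulp Fiction
ratings = np.array([
    [3, 4, 2, 1],  # Adnan
    [5, 1, 3, 1],  # Bing
    [1, 1, 2, 4],  # Catherine
    [3, 3, 3, 3],  # Ding
    [2, 1, 4, 4],  # Eric
])

print(ratings.shape)

# One student's ratings are a row; one movie's ratings are a column.
print(ratings[0, :])   # Adnan's ratings
print(ratings[:, 1])   # everyone's rating of Shrek

# Average score for each movie (average down each column).
movie_means = ratings.mean(axis=0)
print(movie_means)
```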

## Matrices as an *operator* or *linear map*

We have seen that we can multiply vectors by matrices. We've also seen that this represents a function, e.g., $w'(t) = Aw(t) = f(w(t))$ in the ODE example above. 

Matrices represent *linear* transformations, meaning that they have a nice geometrical 