# Perspectives on matrix multiplication


Everyone seems to learn how to multiply matrices ([matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication)) in high school.  
We compute the product $C=AB$ of an $m \times n$ matrix $A$ with an $n \times p$ matrix $B$ to produce an $m \times p$ matrix $C$.

Did you ever wonder why "matmul" has such a fancy definition?

When we add matrices we add elements.  Why couldn't matmul be just as easy?

## Compare Elementwise Multiply

Of course the elementwise multiply is doable but never seems to be quite as important:

(I'll bet your high school teacher never mentioned elementwise multiply!)


In [9]:
A=[1 2
   3 4]
B=[1 2
   3 4]
@show(A.*B)    # Elementwise times is the "dot star"
@show(A*B);    # Matmul is just the "star"

A .* B = [1 4; 9 16]
A * B = [7 10; 15 22]


For square $n \times n$ matrices, elementwise multiply requires $n^2$ operations, while matmul requires about $2n^3$. (Think $n^2$ dot products, each requiring $n$ mults and almost $n$ adds.)
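Spelling out that count, under the assumption that each dot product of length $n$ is computed with $n$ multiplications and $n-1$ additions:
$$
\underbrace{n^2}_{\mbox{entries of } C} \times \bigl( n + (n-1) \bigr) = 2n^3 - n^2 \approx 2n^3 \quad \mbox{for large } n.
$$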

## Raising the Abstraction

Why is matmul defined this way?  We will find out later in the course when we begin to understand that a matrix represents a linear transformation, and matmul is the natural representation of the composition of transformations.  It is only then you can understand the true nature of matrix multiplication.  (Bet your high school teacher never told you that!)

One of our goals in 18.06 is to sometimes stop thinking of matrices as arrays of numbers, and more as wholistic objects.

Abstractly, the rules for matrix multiplication are determined once you define how to multiply matrices by vectors $Ax$, the central [linear operation](https://en.wikipedia.org/wiki/Linear_map) of 18.06, by requiring that multiplication be [associative](https://en.wikipedia.org/wiki/Associative_property).  That is, we require:
$$
A(Bx)=(AB)x
$$
for all matrices $A$ and $B$ and all vectors $x$.  The expression $A(Bx)$ involves only matrix × vector (computing $y=Bx$ then $Ay$), and requiring this to equal $(AB)x$ actually uniquely defines the matrix–matrix product $AB$.
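One way to see why this requirement pins down $AB$ (a short sketch, writing $e_j$ for the vector with a 1 in position $j$ and zeros elsewhere, and $b_j$ for column $j$ of $B$; both symbols are introduced only for this aside): multiplying any matrix by $e_j$ picks out its $j$-th column, so
$$
(\mbox{column } j \mbox{ of } AB) = (AB)e_j = A(Be_j) = A b_j .
$$
Every column of $AB$ is therefore forced to be $A$ times the corresponding column of $B$.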

## Perspective 1 (high school!): rows × columns

Regardless of how you derive it, the end result is the familiar definition that you take **dot products of rows of A with columns of B** to get the product $C$.  For example:
$$
\begin{pmatrix}
 -14 &   5 & 10 \\
  \color{red}{-5} & -20 & 10 \\
  -6 &  10 &  6
\end{pmatrix} =
\begin{pmatrix}
 2 & -1 & 5 \\
  \color{red}{3} &  \color{red}{4} & \color{red}{4} \\
 -4 & -2 & 0
\end{pmatrix}
\begin{pmatrix}
\color{red}{1}  & 0 & -2 \\
  \color{red}{1} & -5 &  1 \\
 \color{red}{-3} &  0 &  3
\end{pmatrix}
$$
where we have highlighted the entry $\color{red}{-5 = 3 \times 1 + 4 \times 1 + 4 \times (-3)}$ (second row of $A$ ⋅ first column of $B$).

This can be written out as the formula
$$
c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}
$$
in terms of the entries of the matrices, e.g. $c_{ij}$ is the entry in row $i$, column $j$ of $C$, assuming $A$ has $n$ columns and $B$ has $n$ rows.

Essentially all matrix multiplications in practice are done with a version of this formula — at least, with the same operations, but often the *order* in which you multiply/add individual numbers is re-arranged.

**In this notebook, we will explore several ways to *think* about these operations by re-arranging their order.**

In [18]:
A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]
C = A * B

3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6

In [29]:
## You can write your own little program if you want to be sure you understand the algorithm:

function my_own_matmul(A,B)
   m,n1 = size(A)
   n2,p = size(B)
   if n1≠n2 error("No good, n1=$(n1) ≠ n2=$(n2)") end
    
   C = [  A[i,:] ⋅ B[:,j] for i=1:m, j=1:p ]  # Matrix of dot products (explained below)
       
end
        

my_own_matmul (generic function with 1 method)

In [30]:
my_own_matmul(A,B)

3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6

In [31]:
my_own_matmul( rand(3,3), rand(2,3))

LoadError: No good, n1=3 ≠ n2=2

Because matrix multiplication is generally [not commutative](https://en.wikipedia.org/wiki/Commutative_property), $AB$ and $BA$ give *different* matrices:

In [32]:
A*B - B*A

3×3 Array{Int64,2}:
 -24   2   5
  12   3  25
  12  13  21

If we want, we can compute the individual dot products in Julia too.   For example, let's compute $c_{2,1} = -5$ (the entry in the 2nd row and first column of $C$, or `C[2,1]` in Julia) by taking the dot product of the second row of $A$ with the first column of $B$.

To extract rows and columns of a matrix, Julia supports a syntax for "array slicing" pioneered by Matlab.  The second row of $A$ is `A[2,:]`, and the first column of `B` is `B[:,1]`:

In [33]:
A[2,:] # 2nd row of A

3-element Array{Int64,1}:
 3
 4
 4

In [34]:
B[:,1] # 1st column of B

3-element Array{Int64,1}:
  1
  1
 -3

Now we can compute $c_{2,1}$ by their dot product via the `dot` function:

In [35]:
dot(A[2,:], B[:,1])

-5

In [36]:
A[2,:] ⋅ B[:,1]

-5

This matches $c_{2,1}$ from above, or `C[2,1]` in Julia:

In [6]:
C[2,1]

-5

In [7]:
A[2,:]' * B[:,1]

-5

## The summation $c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$ directly in code

In [49]:
function matmul_ijk0(A,B)
   m,n = size(A)
   n2,p = size(B)
   if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
   
   C = fill(0,m,p) # m x p "zeros" matrix
    
   for i=1:m
     for j=1:p
        for k=1:n
          C[i,j] = C[i,j] + A[i,k]*B[k,j] 
            end
        end
    end
    return C  
end
      

matmul_ijk0 (generic function with 1 method)

In [50]:
matmul_ijk0(A,B)

3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6

## You like all those indices i, j, k (I don't always), but you hate those three "for" loops?

In [51]:
function matmul_ijk(A,B)
   m,n = size(A)
   n2,p = size(B)
   if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
   
   C = fill(0,m,p) # m x p "zeros" matrix
    
   for i=1:m, j=1:p, k=1:n
          C[i,j] += A[i,k]*B[k,j]   # shorthand for C[i,j] = C[i,j] + A[i,k]*B[k,j] 
   end
    
   return C  
end

matmul_ijk (generic function with 1 method)

In [52]:
matmul_ijk(A,B)

3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6

## Perspective 2: matrix × columns

$AB$ can be viewed as multiplying each *column* of $B$ by $A$ on the *left*.

For example, let's multiply $A$ by the first column of $B$:

In [8]:
A * B[:,1]

3-element Array{Int64,1}:
 -14
  -5
  -6

This is the first column of $C$!  If we do this to *all* the columns of $B$, we get $C$:

In [9]:
[ A*B[:,1]  A*B[:,2]  A*B[:,3] ] == C

true

Equivalently, each column of $B$ specifies a [linear combination](https://en.wikipedia.org/wiki/Linear_combination) of *columns* of $A$ to produce the columns of $C$.   So, **if you want to rearrange the *columns* of a matrix, multiply it by another matrix on the *right***.
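To make the linear-combination reading concrete, here is a small self-contained check (a sketch assuming Julia 1.x; the names `combo` and `j` are only for illustration). It builds column `j` of the product directly as a weighted sum of the columns of `A`, with the weights taken from column `j` of `B`:

```julia
A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]

j = 1  # which column of the product to build

# Weighted sum of the columns of A; the weights are the entries of column j of B.
combo = B[1,j]*A[:,1] + B[2,j]*A[:,2] + B[3,j]*A[:,3]

combo == A * B[:,j]   # compare with "matrix × column"
```

Changing `j` to 2 or 3 builds the other columns of the product the same way.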

For example, let's do the transformation that *flips the sign of the first column of $A$* and *swaps the second and third columns*.

In [10]:
A * [ -1  0  0
       0  0  1
       0  1  0  ]

3×3 Array{Int64,2}:
 -2  5  -1
 -3  4   4
  4  0  -2

As another example, let's swap the first two columns:

In [11]:
A * [ 0 1 0
      1 0 0
      0 0 1 ]

3×3 Array{Int64,2}:
 -1   2  5
  4   3  4
 -2  -4  0

In [58]:
function matmul_jik(A,B)
   m,n = size(A)
   n2,p = size(B)
   if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
   
   C = fill(0,m,p) # m x p "zeros" matrix
    
   for j=1:p, i=1:m, k=1:n
          C[i,j] += A[i,k]*B[k,j]   # shorthand for C[i,j] = C[i,j] + A[i,k]*B[k,j] 
   end
    
  ## recognize that the i,k loop above is really just a matrix times vector
  ## for j=1:p
  ##   C[:,j] = A * B[:,j]
  ## end
    
   return C  
end

matmul_jik (generic function with 2 methods)

A lot of students are perplexed.  They wonder how it could be legal to reorder in this way. 
It might take working through a few examples by hand to realize that from the perspective
of C[i,j], the same sum is accumulated in the same order, but the order in which the different elements of C finish may vary. This little Julia demo may help with this understanding.

In [53]:
function matmul_ijk(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for i=1:n, j=1:n, k=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

function matmul_jik(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for j=1:n, i=1:n, k=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

function matmul_ikj(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for i=1:n, k=1:n, j=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

function matmul_kij(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for k=1:n, i=1:n, j=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

function matmul_jki(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for j=1:n, k=1:n, i=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

function matmul_kji(a,b,stop)
    step=0
    n=size(a,1)
    c=zeros(a)
    for k=1:n, j=1:n, i=1:n  
        if step==stop;  return(c); end
        c[i,j] +=  a[i,k] * b[k,j]
        step+=1
    end
    c
end

matmul_kji (generic function with 1 method)

In [55]:
using Interact

In [57]:
n=5
o=fill(1,n,n)
@manipulate for stop=0:n^3
    matmul_ijk(o,o,stop)
end

5×5 Array{Int64,2}:
 5  5  5  5  5
 5  5  5  5  5
 5  5  2  0  0
 0  0  0  0  0
 0  0  0  0  0

In [72]:
## Perhaps a more fair matmul???
n=5
o=fill(1,n,n)
@manipulate for stop=0:n^3
    matmul_kij(o,o,stop)
end

5×5 Array{Int64,2}:
 3  3  3  3  3
 3  3  3  3  3
 3  3  2  2  2
 2  2  2  2  2
 2  2  2  2  2

In [59]:
n=5
o=fill(1,n,n)
@manipulate for stop=0:n^3
    matmul_jik(o,o,stop)
end

5×5 Array{Int64,2}:
 5  5  5  0  0
 5  5  5  0  0
 5  5  2  0  0
 5  5  0  0  0
 5  5  0  0  0
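As a non-interactive complement to the sliders above, the following sketch checks that all six loop nestings produce the same final matrix. It assumes Julia 1.x, and `matmul_in_order` is a hypothetical helper written for this aside, not one of the functions defined earlier:

```julia
# Hypothetical helper: run the update
#   C[i,j] += A[i,k] * B[k,j]
# with the three loops nested in the order given by `order`,
# e.g. order = (:k, :i, :j) puts k outermost and j innermost.
function matmul_in_order(A, B, order)
    m, n = size(A)
    n2, p = size(B)
    n == n2 || error("No good, n=$n ≠ n2=$n2")
    ranges = Dict(:i => 1:m, :j => 1:p, :k => 1:n)
    C = zeros(promote_type(eltype(A), eltype(B)), m, p)
    for x in ranges[order[1]], y in ranges[order[2]], z in ranges[order[3]]
        # Map the loop variables back to the roles i, j, k.
        idx = Dict(order[1] => x, order[2] => y, order[3] => z)
        i, j, k = idx[:i], idx[:j], idx[:k]
        C[i,j] += A[i,k] * B[k,j]
    end
    return C
end

A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]

orders = [(:i,:j,:k), (:i,:k,:j), (:j,:i,:k), (:j,:k,:i), (:k,:i,:j), (:k,:j,:i)]
all_agree = all(matmul_in_order(A, B, o) == A * B for o in orders)
```

The dictionary lookups make this slow; it is meant only to show that the nesting order does not change the result.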

## A more wholistic matrix × column view

In [69]:
function matmul_jik_whole(A,B,stop)
    step=0
    n=size(A,1)
    C=zeros(A)
    for j=1:n
        if step==stop;  return(C); end
        C[:,j] +=  A * B[:,j]
        step+=1
    end
    C
end

matmul_jik_whole (generic function with 1 method)

In [70]:
n=5
o=fill(1,n,n)
@manipulate for stop=0:n
    matmul_jik_whole(o,o,stop)
end

5×5 Array{Int64,2}:
 5  5  0  0  0
 5  5  0  0  0
 5  5  0  0  0
 5  5  0  0  0
 5  5  0  0  0

## Perspective 3: rows × matrix

$AB$ can be viewed as multiplying each *row* of $A$ by the matrix $B$ on the *right*.  Multiplying a [row vector](https://en.wikipedia.org/wiki/Row_and_column_vectors) by a matrix on the right produces another row vector.

For example, here is the first row of $A$:

In [12]:
A[1,:]

3-element Array{Int64,1}:
  2
 -1
  5

Whoops, slicing a matrix in Julia produces a 1d array, which is interpreted as a column vector, no matter how you slice it.  We can't multiply a column vector by a matrix $B$ on the *right* — that operation is not defined in linear algebra (the dimensions don't match up).  Julia will give an error if we try it:

In [13]:
A[1,:] * B

LoadError: DimensionMismatch("matrix A has dimensions (3,1), matrix B has dimensions (3,3)")

To get a row vector we must [transpose](https://en.wikipedia.org/wiki/Transpose) it.  In linear algebra, the transpose of a vector $x$ is usually denoted $x^T$.   In Julia, the transpose is `x.'`.

If we omit the `.` and just write `x'` it is the [complex-conjugate of the transpose](https://en.wikipedia.org/wiki/Conjugate_transpose), sometimes called the *adjoint*, often denoted $x^H$ (in matrix textbooks), $x^*$ (in pure math), or $x^\dagger$ (in physics).  For real-valued vectors (no complex numbers), the conjugate transpose is the same as the transpose, and correspondingly we usually just do `x'` for real vectors.
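The difference only shows up with complex entries. A minimal sketch, assuming Julia 1.x, where the plain transpose is written `transpose(x)` (the `x.'` spelling above comes from older Julia versions); the vectors `x` and `r` are made up for illustration:

```julia
x = [1+2im, 3-1im]   # a complex column vector

xt = transpose(x)    # plain transpose: entries unchanged
xh = x'              # conjugate transpose (adjoint): imaginary parts change sign

r = [2, -1, 5]       # a real vector: the two operations agree
transpose(r) == r'
```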

In [60]:
A[1,:]'

1×3 RowVector{Int64,Array{Int64,1}}:
 2  -1  5

Now, let's multiply this by $B$, which should give the first *row* of $C$:

In [61]:
A[1,:]' * B

1×3 RowVector{Int64,Array{Int64,1}}:
 -14  5  10

Yup!

Note that if we multiply a row vector by a matrix on the *left*, it doesn't really make sense.  Julia will give an error:

In [62]:
B * A[1,:]'

LoadError: DimensionMismatch("matrix A has dimensions (3,3), matrix B has dimensions (1,3)")

If we multiply *all* the rows of $A$ by $B$ on the right, we get $C$ again:

In [17]:
[ A[1,:]'*B 
  A[2,:]'*B
  A[3,:]'*B ] == C

true

Equivalently, each row of $A$ specifies a linear combination of *rows* of $B$ to produce the rows of $C$.   So, **if you want to rearrange the *rows* of a matrix, multiply it by another matrix on the *left***.
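The same kind of check works for rows. In this sketch (assuming Julia 1.x; `combo` and `i` are illustrative names), row `i` of the product is built as a weighted sum of the rows of `B`, with the weights taken from row `i` of `A`. Slices are 1d arrays, so both sides are compared as plain vectors:

```julia
A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]

i = 1  # which row of the product to build

# Weighted sum of the rows of B; the weights are the entries of row i of A.
combo = A[i,1]*B[1,:] + A[i,2]*B[2,:] + A[i,3]*B[3,:]

combo == (A * B)[i,:]
```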

For example, let's do the transformation that *subtracts the first row of $B$ from the second row, adds three times the first row to the third row, and leaves the first row untouched*.  This is one of the steps of Gaussian elimination!

In [18]:
[ 1 0 0
  -1 1 0
  3 0 1 ] * B

3×3 Array{Int64,2}:
 1   0  -2
 0  -5   3
 0   0  -3

## Perspective 4: columns × rows

The key to this perspective is to observe:

* elements in column $k$ of $A$ only multiply elements in row $k$ of $B$
* a column times a row vector, sometimes denoted $xy^T$, is an [outer product](https://en.wikipedia.org/wiki/Outer_product) and produces a "rank-1" *matrix*

(See [this excellent paper by Gil Strang](http://mth1007.mathappl.polymtl.ca/MultFactMatrStrang.pdf) for more on this perspective applied to linear algebra. You will be in a better position to understand this at the end of 18.06, however.)

For example, here is column 1 of $A$ times row 1 of $B$:

In [19]:
A[:,1] * B[1,:]'

3×3 Array{Int64,2}:
  2  0  -4
  3  0  -6
 -4  0   8

If we do this for all three rows and columns and add them up, we get $C$:

In [20]:
A[:,1] * B[1,:]' + A[:,2] * B[2,:]' + A[:,3] * B[3,:]' == C

true

So, from this perspective, we could write:

$$
AB = \sum_{k=1}^3 (\mbox{column } k \mbox{ of } A) (\mbox{row } k \mbox{ of } B) = \sum_{k=1}^3 A[:,k] \, B[k,:]^T
$$

where in the last expression we have used Julia notation for slices.
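For matrices of any compatible size the same sum can be written as a one-liner. This is a sketch assuming Julia 1.x (where `rank` lives in the `LinearAlgebra` standard library); `outer_sum` is just an illustrative name:

```julia
using LinearAlgebra   # for rank

A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]

n = size(A, 2)   # number of columns of A = number of rows of B

# Each term is a column (3-vector) times a row (1×3): a 3×3 rank-1 matrix.
outer_sum = sum(A[:,k] * B[k,:]' for k in 1:n)

outer_sum == A * B           # the rank-1 pieces add up to the product
rank(A[:,1] * B[1,:]') == 1  # a single piece has rank 1
```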

## Perspective 5: submatrix blocks × blocks

It turns out that all of the above are special cases of a more general rule, by which we can break up a matrix into "submatrix" blocks and multiply the blocks.  Rows, columns, etc. are just blocks of different shapes.
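As a minimal illustration, here is one particular way to cut the two 3×3 matrices from earlier into 2×2 grids of blocks and multiply them block by block. The split (first two rows/columns versus the last one) and the block names `A11`, `B12`, and so on are choices made for this sketch, which assumes Julia 1.x:

```julia
A = [ 2  -1  5
      3   4  4
     -4  -2  0]
B = [ 1   0  -2
      1  -5   1
     -3   0   3]

# Split rows and columns into a leading block of 2 and a trailing block of 1.
# Using ranges (3:3 rather than 3) keeps every block a 2d matrix.
p, q = 1:2, 3:3
A11, A12, A21, A22 = A[p,p], A[p,q], A[q,p], A[q,q]
B11, B12, B21, B22 = B[p,p], B[p,q], B[q,p], B[q,q]

# Multiply as if the blocks were numbers, keeping the left/right order.
C11 = A11*B11 + A12*B21
C12 = A11*B12 + A12*B22
C21 = A21*B11 + A22*B21
C22 = A21*B12 + A22*B22

C_blocks = [C11 C12
            C21 C22]
C_blocks == A * B
```

The only requirement is that the column split of `A` matches the row split of `B`, so that every block product is defined.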

## Gaussian elimination: towards the wholistic view A=LU through elimination matrices

Let's look more closely at the process of Gaussian elimination in matrix form, using the matrix from lecture 1.

In [73]:
A = [1 3  1
     1 1 -1
     3 11 6]

3×3 Array{Int64,2}:
 1   3   1
 1   1  -1
 3  11   6

Gaussian elimination produces the matrix $U$, which we can compute in Julia as in lecture 1:

In [76]:
# LU factorization (Gaussian elimination) of the matrix A, 
# passing the option Val{false} (which will go away) to prevent row re-ordering
L, U = lu(A, Val{false}) 
U # just show U

3×3 Array{Float64,2}:
 1.0   3.0   1.0
 0.0  -2.0  -2.0
 0.0   0.0   1.0

Now, let's go through **Gaussian elimination in matrix form**, by **expressing the elimination steps as matrix multiplications.**  In Gaussian elimination, we make linear combinations of *rows* to cancel elements below the pivot, and we now know that this corresponds to multiplying on the *left* by some *elimination matrix* $E$.
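A single row operation of this kind is the identity matrix with one extra off-diagonal entry. The sketch below assumes Julia 1.x (where the identity `I` comes from the `LinearAlgebra` standard library); `row_op` is a hypothetical helper named for this aside, and the multiplier 2 is an arbitrary illustration:

```julia
using LinearAlgebra   # for the identity `I`

# Hypothetical helper: the n×n matrix that, multiplied on the left,
# adds c times row j to row i and leaves every other row alone.
function row_op(n, i, j, c)
    E = Matrix{Int}(I, n, n)   # start from the identity
    E[i,j] = c                 # one off-diagonal entry encodes the operation
    return E
end

A = [1 3  1
     1 1 -1
     3 11 6]

row_op(3, 3, 1, 2) * A   # adds 2 × (row 1) to row 3; rows 1 and 2 are unchanged
```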

The first step is to eliminate in the first column of $A$.  The pivot is the 1 in the upper-left-hand corner.  For this $A$, we need to:

1. Leave the first row alone.
2. Subtract the first row from the second row to get the new second row.
3. Subtract $3 \times {}$ the first row from the third row to get the new third row.

This corresponds to multiplying $A$ on the left by the matrix `E1`.  As above (in the "row × matrix" picture), the three rows of `E1` correspond exactly to the three row operations listed above:

In [84]:
E1 = [ 1 0 0
      -1 1 0
      -3 0 1]

3×3 Array{Int64,2}:
  1  0  0
 -1  1  0
 -3  0  1

In [88]:
Int.(inv(E1))  ## What does this mean?

3×3 Array{Int64,2}:
 1  0  0
 1  1  0
 3  0  1

In [78]:
E1*A

3×3 Array{Int64,2}:
 1   3   1
 0  -2  -2
 0   2   3

As desired, this introduced zeros below the diagonal in the first column.  Now, we need to eliminate the 2 below the diagonal in the *second* column of `E1*A`.  Our new pivot is $-2$ (in the second row), and we just add the second row of `E1*A` to the third row to make the new third row.

This corresponds to multiplying on the left by the matrix `E2`, which leaves the first two rows alone and makes the new third row by adding the second and third rows:

In [79]:
E2 = [1 0 0
      0 1 0
      0 1 1]

3×3 Array{Int64,2}:
 1  0  0
 0  1  0
 0  1  1

In [80]:
E2*E1*A

3×3 Array{Int64,2}:
 1   3   1
 0  -2  -2
 0   0   1

As expected, this is upper triangular, and in fact the same as the `U` matrix returned by the Julia `lu` function above:

In [81]:
E2*E1*A == U

true

Thus, we have arrived at the formula:
$$
\underbrace{E_2 E_1}_E A = U
$$
Notice that we multiplied $A$ by the elimination matrices from *right to left* in the order of the steps: it is $E_2 E_1 A$, *not* $E_1 E_2 A$.  Because matrix multiplication is generally [not commutative](https://en.wikipedia.org/wiki/Commutative_property), $E_2 E_1$ and $E_1 E_2$ give *different* matrices:

In [82]:
E2*E1

3×3 Array{Int64,2}:
  1  0  0
 -1  1  0
 -4  1  1

In [83]:
E1*E2

3×3 Array{Int64,2}:
  1  0  0
 -1  1  0
 -3  1  1

Notice, furthermore, that the matrices $E_1$ and $E_2$ are both *lower-triangular matrices*.  This is a consequence of the structure of Gaussian elimination (assuming no row re-ordering): we always add multiples of the pivot row to rows *below* it, never *above* it.

The *product* of lower-triangular matrices is always lower-triangular too.  (In homework, you will explore a similar property for upper-triangular matrices.)  Consequently, the product $E = E_2 E_1$ is lower-triangular, and Gaussian elimination can be viewed as yielding $EA=U$ where $E$ is lower triangular and $U$ is upper triangular.
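A short argument for the triangular-product claim, sketched in the entrywise notation from earlier (here $S$ and $T$ stand for any two $n \times n$ lower-triangular matrices, names used only for this aside): lower-triangular means the entry in row $i$, column $j$ is zero whenever $j > i$. In the product
$$
(ST)_{ij} = \sum_{k=1}^n s_{ik} t_{kj},
$$
a term can be nonzero only if $k \le i$ (from $S$) and $j \le k$ (from $T$), that is, only if $j \le k \le i$. When $j > i$ no such $k$ exists, so every term vanishes and $(ST)_{ij} = 0$.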

## Inverse elimination: LU factors

However, in practice, it turns out to be more useful to write this as $A= E^{-1} U$, where $E^{-1}$ is the [inverse of the matrix](http://mathworld.wolfram.com/MatrixInverse.html) $E$.  We will have more to say about matrix inverses later in 18.06, but for now we just need to know that it is the matrix that **reverses the steps** of Gaussian elimination, taking us back from $U$ to $A$.  Computing matrix inverses is laborious in general, but in this particular case it is easy.   We just need to *reverse the steps one by one* starting with the *last* elimination step and working back to the *first* one.  

Hence, we need to reverse (invert) $E_2$ *first* on $U$, and *then* reverse (invert) $E_1$: $A = E_1^{-1} E_2^{-1} U$.  But reversing an individual elimination step like $E_2$ is easy: we just **flip the signs below the diagonal**, so that wherever we *added* the pivot row we *subtract* and vice-versa.  That is:
$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}^{-1} =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}
$$
(The last elimination step was adding the second row to the third row, so we reverse it by *subtracting* the second row from the third row of $U$.)
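Why flipping the signs works, sketched for a single elimination step (the symbols $v$ and $e_j$ are introduced only for this aside): a matrix that adds multiples of row $j$ to the rows below it can be written as $I + v e_j^T$, where $e_j$ has a 1 in position $j$ and zeros elsewhere, and $v$ holds the multipliers, with zeros in positions $1, \ldots, j$. Then
$$
(I + v e_j^T)(I - v e_j^T) = I + v e_j^T - v e_j^T - v \, (e_j^T v) \, e_j^T = I,
$$
because $e_j^T v$ is the $j$-th entry of $v$, which is zero. For $E_2$ above, $j = 2$ and $v = (0, 0, 1)^T$.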

Julia can compute matrix inverses for us with the `inv` function.  (It doesn't know the trick of flipping the sign, which only works for very special matrices, but it can compute it the "hard way" so quickly (for such a small matrix) that it doesn't matter.)   Of course that gives the same result:

In [30]:
inv(E2)

3×3 Array{Float64,2}:
 1.0   0.0  0.0
 0.0   1.0  0.0
 0.0  -1.0  1.0

Similarly for $E_1$:

In [31]:
inv(E1)

3×3 Array{Float64,2}:
 1.0  0.0  0.0
 1.0  1.0  0.0
 3.0  0.0  1.0

If we didn't make any mistakes, then $E_1^{-1} E_2^{-1} U$ should give $A$, and it does:

In [32]:
inv(E1)*inv(E2)*U == A

true

We call the *inverse* elimination matrix $L = E^{-1} = E_1^{-1} E_2^{-1}$.  Since the inverses of each elimination matrix were lower-triangular (with flipped signs), their product $L$ is also lower triangular:

In [33]:
L = inv(E1)*inv(E2)

3×3 Array{Float64,2}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0

As mentioned above, this is the same as the inverse of $E = E_2 E_1$:

In [34]:
inv(E2*E1)

3×3 Array{Float64,2}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0

The final result, therefore, is that Gaussian elimination (without row swaps) can be viewed as a *factorization* of the original matrix $A$
$$
A = LU
$$
into a **product of lower- and upper-triangular matrices**.  (Furthermore, although we didn't comment on this above, $L$ is always 1 along its diagonal.)  This factorization is called the [LU factorization](https://en.wikipedia.org/wiki/LU_decomposition) of $A$.  (It's why we used the `lu` function in Julia above.)  When a computer performs Gaussian elimination, what it computes are the $L$ and $U$ factors.

What this accomplishes is to break a complicated matrix $A$ into **much simpler pieces** $L$ and $U$.  It may not seem at first that $L$ and $U$ are *that* much simpler than $A$, but they are: lots of operations that are very difficult with $A$, like solving equations or computing the determinant, become *easy* once you know $L$ and $U$.
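To give a flavor of "easy", here is a sketch (assuming Julia 1.x) that uses the $L$ and $U$ found above to solve $Ax = b$ by two triangular sweeps and to read off the determinant. The helper names `forward_substitute` and `back_substitute` and the right-hand side `b` are made up for this illustration; they are not part of Julia or of the notebook:

```julia
using LinearAlgebra   # for diag

L = [1  0 0
     1  1 0
     3 -1 1]
U = [1  3  1
     0 -2 -2
     0  0  1]
A = L * U             # reproduces the matrix from lecture 1

# Solve L*c = b from the top row down (L has ones on its diagonal).
function forward_substitute(L, b)
    n = length(b)
    c = zeros(n)
    for i in 1:n
        s = float(b[i])
        for k in 1:i-1
            s -= L[i,k] * c[k]
        end
        c[i] = s
    end
    return c
end

# Solve U*x = c from the bottom row up, dividing by each pivot.
function back_substitute(U, c)
    n = length(c)
    x = zeros(n)
    for i in n:-1:1
        s = c[i]
        for k in i+1:n
            s -= U[i,k] * x[k]
        end
        x[i] = s / U[i,i]
    end
    return x
end

b = [5, 1, 20]                                   # illustrative right-hand side
x = back_substitute(U, forward_substitute(L, b)) # A*x = L*(U*x) = b
detA = prod(diag(U))                             # det(L) = 1, so det(A) = product of pivots
```

Each sweep touches every entry of a triangular matrix once, so it costs on the order of $n^2$ operations rather than the $n^3$ of elimination itself.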