---
title: Matrices and Vectors
subject:  Linear Algebraic Systems
subtitle: The building blocks of linear algebra
short_title: Matrices and Vectors
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: matrices, vectors, matrix arithmetic, matrix multiplication
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

## Reading
Material related to this page, as well as additional exercises, can be found in ALA Ch. 1.2, LAA Ch. 2.1, and ILA Ch. 2.4.  These notes are mostly based on ALA Ch. 1.2 and LAA Ch. 2.1.

## Learning Objectives

By the end of this page, you should know:
- what matrices and vectors are
- what arithmetic operations are allowed when working with matrices and vectors
- how to perform arithmetic operations on matrices and vectors
- how to represent linear systems of equations using matrices and vectors

## Matrices and Vectors
A _matrix_ is a rectangular array of numbers.  For example
\begin{equation}
\begin{bmatrix} 1 & 0 & 3 \\ -2 & 4 & 1 \end{bmatrix}, \quad \begin{bmatrix} \pi & 0 \\ e & \frac{1}{2} \\ -1 & 0.83 \\ \sqrt{5} & -\frac{4}{7} \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}
\end{equation}
are all examples of matrices.  We use the notation
\begin{equation}
\label{matrix}
A = 
\begin{bmatrix} 
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\end{equation}
for a generic matrix $A$ of size $m \times n$ (read "$m$ by $n$"), where $m$ denotes the number of _rows_ in $A$ and $n$ denotes the number of _columns_[^cols].  Therefore, the preceding examples of matrices have respective sizes $2 \times 3$, $4 \times 2$, and $1 \times 3$.  A matrix is _square_ if $m=n$, i.e., it has the same number of rows as columns.  A _column vector_ is an $m \times 1$ matrix, while a _row vector_ is a $1 \times n$ matrix.  While these might seem like they are the same thing, they very much are not!  Column vectors end up playing a much more important role in our story, and so whenever we just say "vector" we will always mean a column vector.  A $1 \times 1$ matrix, which has a single entry, is both a column and row vector, and as we'll see later, behaves like an ordinary _scalar_ number.

The number that lies in the $i$th row and $j$th column of $A$ is called the $(i,j)$ _entry_ of $A$, and is denoted by $a_{ij}$.  The row index always appears first and the column index second.  Each column of $A$ is an $m \times 1$ vector, which we denote by $\vv a_1, \dots, \vv a_n$. It will often be convenient to write a matrix in terms of its columns:
```{math}
:label: colmat
A = \begin{bmatrix} \vv a_1 & \vv a_2 & \cdots & \vv a_n \end{bmatrix}
```
```{note}
We will consistently use bold face lower case letters to denote vectors, and ordinary capital letters to denote matrices.
```
[^cols]: It is not a coincidence that $n$ is also the letter that we used for the number of unknowns in a [linear equation](./021-linsys-gauss.ipynb#lin-eq)! 


[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=022-linsys-matvec.ipynb)

In [None]:
# Constructing matrices and vectors

import numpy as np

# a matrix A of size 2x3
A = np.array([[1, 2, 3],
             [4, 5, 6]])
# a matrix b of size 1x4 (row-vector)
b = np.array([[0, -1, 1, 3]])
# a matrix c of size 3x1 (column-vector)
c = np.array([[0],
              [1],
              [2]])
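
The size and entry notation above carries over to NumPy with one catch: NumPy indices start at 0, so the entry $a_{ij}$ lives at `A[i-1, j-1]`.  The following is a minimal sketch of this, reusing the $2 \times 3$ matrix from the cell above (redefined here so the snippet runs on its own); the variable names `a_12` and `a_2` are just illustrative.

```python
import numpy as np

# Same 2x3 matrix as in the cell above, redefined so this snippet is self-contained
A = np.array([[1, 2, 3],
              [4, 5, 6]])

m, n = A.shape      # (number of rows, number of columns)
a_12 = A[0, 1]      # entry a_{12}: row 1, column 2 in math notation
a_2 = A[:, [1]]     # second column of A, kept as a 2x1 column vector

print("size:", m, "x", n)
print("a_12 =", a_12)
print("second column:\n", a_2)
```

Indexing with the list `[1]` rather than the integer `1` keeps the result two-dimensional, so `a_2` is a genuine $2 \times 1$ column vector rather than a 1-D array.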

## Matrix Arithmetic

Matrix arithmetic involves three basic operations: _matrix addition_, _scalar multiplication_, and _matrix multiplication_. 

### Matrix Addition
First we define _addition_ of matrices.  You are allowed to add two matrices only if they are of the _same size_, and matrix addition is performed entry-wise.  For example
$$
\bm 1 & 2 \\ -1 & 0\em + \bm 3 & -5 \\ 2 & 1 \em = \bm 4 & -3 \\ 1 & 1 \em.
$$
More generally, if $A$ and $B$ are $m \times n$ matrices, then their sum $C = A+ B$ is the $m \times n$ matrix whose entries are given by $c_{ij} = a_{ij} + b_{ij}$ for $i=1,\dots,m$ and $j=1,\dots,n$.  When defined, matrix addition behaves just like ordinary addition.  It is commutative, so $A + B = B + A$, and associative, so $A + (B + C) = (A+ B) + C = A + B + C$.

````{exercise}  Matrix addition
:label: matrix-addition-ex1

For (a)-(e), we are given some matrices. If they can be added together, then find their sum. Otherwise, indicate that they can't be added.

a. $\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$ and $\begin{bmatrix} 3 & -5 \\ 2 & 1 \end{bmatrix}$

b. $\begin{bmatrix} 5 \end{bmatrix}$ and $\begin{bmatrix} -3 \end{bmatrix}$

c. $\begin{bmatrix} 1&2\\3&4\\5&6 \end{bmatrix}$ and $\begin{bmatrix} 2&4\\6&8\\10&12 \end{bmatrix}$

d. $\begin{bmatrix} 1 & 2 \end{bmatrix}$ and $\begin{bmatrix} 1\\3 \end{bmatrix}$

e. $\begin{bmatrix} 1\\2 \end{bmatrix}$ and $\begin{bmatrix} 1\\3 \end{bmatrix}$ and $\begin{bmatrix} 2\\5 \end{bmatrix}$

```{solution} matrix-addition-ex1
:class: dropdown

a. $\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} + \begin{bmatrix} 3 & -5 \\ 2 & 1 \end{bmatrix} = \bm 1+3&2-5\\-1+2&0+1 \em = \bm 4&-3\\1&1 \em$

b. $\begin{bmatrix} 5 \end{bmatrix} + \begin{bmatrix} -3 \end{bmatrix} = \bm 5 -3 \em = \bm -2 \em$

c. $\begin{bmatrix} 1&2\\3&4\\5&6 \end{bmatrix} + \begin{bmatrix} 2&4\\6&8\\10&12 \end{bmatrix} = \bm 1 + 2 & 2 + 4\\ 3 + 6 & 4 + 8\\5 + 10& 6 + 12 \em = \bm 3&6\\9&12\\15&18 \em$

d. They can't be added. The left matrix is a 1-by-2 matrix, whereas the right matrix is a 2-by-1 matrix, meaning they have different dimensions. 

e. $\begin{bmatrix} 1\\2 \end{bmatrix} + \begin{bmatrix} 1\\3 \end{bmatrix} + \begin{bmatrix} 2\\5 \end{bmatrix} = \bm 1 + 1 + 2 \\ 2 + 3 + 5 \em = \bm 4 \\ 10 \em$

```
````

In [None]:
## Matrix addition

B = np.array([[2, 4, 6],
             [8, 10, 12]])
add = A + B # adding two matrices
# this is a 1-D array of size 3 in Python, which is different from the column-vector c created above
vec = np.array([-1, -2, 3]) 
# notice how a 1-D array adds to a 2-D array in Python: vec is added to every row of A
# Try A+c and observe what happens
add_vec = A + vec 
add_vec
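
A word of caution that goes with the cell above: NumPy's `+` follows broadcasting rules, which are more permissive than matrix addition.  The sketch below checks commutativity and associativity on small matrices (the matrix `C` is an arbitrary choice for illustration), and then shows that NumPy will happily "add" a $1 \times 2$ row vector and a $2 \times 1$ column vector, even though that sum is not defined mathematically.  Comparing shapes explicitly is one simple way to guard against this.

```python
import numpy as np

A = np.array([[1, 2],
              [-1, 0]])
B = np.array([[3, -5],
              [2, 1]])
C = np.array([[0, 1],
              [1, 0]])   # arbitrary third matrix, for illustration only

total = A + B                                            # entry-wise sum
commutes = np.array_equal(A + B, B + A)                  # A + B = B + A
associates = np.array_equal(A + (B + C), (A + B) + C)    # A + (B + C) = (A + B) + C

# A 1x2 row vector and a 2x1 column vector have different sizes,
# so their matrix sum is undefined ...
row = np.array([[1, 2]])
col = np.array([[1], [3]])
same_size = row.shape == col.shape

# ... but NumPy broadcasts them to a 2x2 array instead of raising an error
broadcast = row + col

print(total)
print(commutes, associates, same_size, broadcast.shape)
```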

### Scalar multiplication
A _scalar_ is a fancy name for an ordinary number.  For now, we restrict ourselves to scalars, vectors, and matrices with _real_ entries, but we will eventually extend these ideas to complex numbers and matrices with complex entries.  Although they are technically not the same thing, we will treat the $1 \times 1$ matrix $[c]$ and the scalar $c \in \R$[^notation] interchangeably, that is to say as an ordinary number, so we will drop the brackets. _Scalar multiplication_ takes a scalar $c$ and an $m \times n$ matrix $A$ and computes the $m \times n$ matrix $B = cA$ by multiplying each entry of $A$ by $c$.  For example:
$$
c = 3, \quad A = \bm 1 & 2 \\ -1 & 0 \em, \quad cA = 3\bm 1 & 2 \\ -1 & 0 \em=\bm 3 & 6 \\ -3 & 0\em.
$$
In general, if $B = cA$, then $b_{ij}=ca_{ij}$ for each entry $i=1,\dots,m$ and $j=1,\dots,n$.

[^notation]: Remember that we write $x \in S$ to mean that the element $x$ lives in the set $S$.  In this example, $c \in \R$ means that the element $c$ lives in the real line $\R$, and hence $c$ is a scalar.

````{exercise}  Scalar multiplication
:label: scalar-multiplication-ex1

For (a)-(e), compute the scalar-matrix product.

a. $5\bm 1 & 2 \\ 3 & 4\em$

b. $0\bm 1 & 1 \\ 2 & 3 \em$

c. $\frac 1 2 \bm 2 & 4 \\ 6 & 5 \em$

d. $\sqrt 2 \bm 1 & \sqrt 2 \\ 0 & \pi \em$

e. $i \bm 1 & i \\ -i & -1 \em$ where $i = \sqrt{-1}$

```{solution} scalar-multiplication-ex1
:class: dropdown

a. $5\bm 1 & 2 \\ 3 & 4\em = \bm 5\cdot 1 & 5\cdot 2 \\ 5\cdot 3 & 5\cdot 4\em = \bm 5 & 10 \\ 15 & 20\em$

b. $0\bm 1 & 1 \\ 2 & 3 \em = \bm 0\cdot 1 & 0\cdot 1 \\ 0\cdot 2 & 0\cdot 3 \em = \bm 0 & 0\\ 0 & 0\em$

c. $\frac 1 2 \bm 2 & 4 \\ 6 & 5 \em = \bm 2/2 & 4/2 \\ 6/2 & 5/2 \em = \bm 1 & 2 \\ 3 & 5/2 \em$

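d. $\sqrt 2 \bm 1 & \sqrt 2 \\ 0 & \pi \em = \bm \sqrt 2 \cdot 1 & \sqrt 2 \cdot \sqrt 2 \\ \sqrt 2 \cdot 0 & \sqrt 2 \cdot \pi \em = \bm \sqrt 2 & 2 \\ 0 & \sqrt 2 \pi \em$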

e. $i \bm 1 & i \\ -i & -1 \em = \bm i\cdot 1 & i\cdot i \\ i \cdot (-i) & i \cdot (-1) \em= \bm i & -1 \\1 & -i \em$

```
````

In [None]:
# scalar multiplication
scalar = 3
scalar_mult = scalar*A
scalar_mult
# Try vec*A and notice the difference


### Summary
Using the definitions and properties above, it is not too hard, albeit a bit tedious, to show that the usual rules of algebra apply to sums and scalar multiples of matrices, as the following theorem shows.
```{prf:theorem}
:label: matalg-thm
Let $A$, $B$, and $C$ be matrices of the same size, say $m \times n$, and let $r$ and $s$ be scalars.  Then\
a. $A + B = B + A$\
b. $(A+B) + C = A + (B + C)$\
c. $A + 0 = A$\
d. $r(A+B) = rA + rB$\
e. $(r+s)A = rA + sA$\
f. $r(sA)=(rs)A$
```
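
A numerical spot check is no substitute for a proof, but it is a quick way to convince yourself that the rules in the theorem behave as stated.  The sketch below tests each property on one pair of $2 \times 2$ integer matrices; the matrices and the scalars `r` and `s` are arbitrary choices made for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [-1, 0]])
B = np.array([[3, -5],
              [2, 1]])
C = np.array([[0, 1],
              [1, 0]])
Z = np.zeros_like(A)   # the 2x2 zero matrix
r, s = 3, -2           # arbitrary scalars

checks = {
    "a": np.array_equal(A + B, B + A),
    "b": np.array_equal((A + B) + C, A + (B + C)),
    "c": np.array_equal(A + Z, A),
    "d": np.array_equal(r * (A + B), r * A + r * B),
    "e": np.array_equal((r + s) * A, r * A + s * A),
    "f": np.array_equal(r * (s * A), (r * s) * A),
}
print(checks)
```

Integer entries are used so that the comparisons are exact; with floating-point entries, `np.allclose` would be the safer comparison.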

## Matrix-vector multiplication
A fundamental idea in linear algebra is to define the product of a matrix $A$ and a vector $\vv x$ as a _linear combination_ of the columns of $A$ weighted by the coefficients in the vector $\vv x$.[^matmul]  We use this as an opportunity to remind you of the definition of a linear combination of vectors, and then use it to define matrix-vector multiplication.  The idea of a linear combination is fundamental, and we will revisit it several times over the course of the semester from many different perspectives.

[^matmul]: You may have seen other definitions of matrix-vector multiplication before, for example, based on the sum of products across rows.  This definition is equivalent to the one introduced in this section, but we will find it less convenient for our purposes.

```{prf:definition} Linear Combinations of Vectors
:label: lincombo
Given a collection of $n \times 1$ column vectors $\vv v_1, \vv v_2, \dots, \vv v_p$ and a collection of scalars $c_1,c_2,\dots, c_p$, the vector $\vv y$ defined by
$$
\vv y = c_1 \vv v_1 + c_2 \vv v_2 + \cdots + c_p \vv v_p
$$
is called a _linear combination_ of $\vv v_1, \vv v_2, \dots, \vv v_p$ with _weights_ $c_1,c_2,\dots, c_p$.
```

The weights $c_i$ in a linear combination can be any real numbers, including zero.  For example, some linear combinations of $\vv v_1$ and $\vv v_2$ are
$$
\pi \vv v_1 + \vv v_2, \quad \frac{1}{2}\vv v_1 (=\frac{1}{2}\vv v_1 + 0 \vv v_2), \quad \vv 0(=0\vv v_1 + 0 \vv v_2).
$$

````{exercise}  Linear Combinations of Vectors 
:label: linear-combinations-ex1 

For (a)-(c), simplify the linear combination of vectors. 

a. $4\bm 1 \\ 2\em - 1 \bm 3 \\ 3\em + 2 \bm 0 \\ 2\em$ 

b. $0\bm 1238 \\ 34643 \em + 1 \bm  3 \\ 4 \em$ 

c. $\pi \bm 2 \\ \sqrt 2 \em + \sqrt 3 \bm \sqrt 2 \\ 1 \em$ 

```{solution} linear-combinations-ex1 
:class: dropdown 

a. $4\bm 1 \\ 2\em - 1 \bm 3 \\ 3\em + 2 \bm 0 \\ 2\em = \bm 4(1) - 1(3) + 2(0) \\ 4(2) - 1(3) + 2(2) \em = \bm 1 \\ 9\em$ 

b. $0\bm 1238 \\ 34643 \em + 1 \bm  3 \\ 4 \em = \bm 0(1238) + 1(3) \\ 0(34643) + 1(4) \em = \bm 3 \\ 4 \em$ 

c. $\pi \bm 2 \\ \sqrt 2 \em + \sqrt 3 \bm \sqrt 2 \\ 1 \em = \bm 2\pi + \sqrt 3\sqrt 2 \\ \pi \sqrt 2 + \sqrt 3\em = \bm 2\pi + \sqrt 6 \\ \pi \sqrt 2 + \sqrt 3 \em$ 
``` 
````
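
Linear combinations translate directly into NumPy: scale each vector and add the results.  As a small sketch, the helper below (`linear_combination` is a hypothetical name, not a NumPy function) computes $c_1 \vv v_1 + \cdots + c_p \vv v_p$ and is applied to part (a) of the exercise above.

```python
import numpy as np

def linear_combination(weights, vectors):
    """Return c_1 v_1 + ... + c_p v_p for equally sized column vectors."""
    if len(weights) != len(vectors):
        raise ValueError("need exactly one weight per vector")
    result = np.zeros_like(vectors[0])
    for c, v in zip(weights, vectors):
        result = result + c * v
    return result

# Part (a) of the exercise: 4 v1 - 1 v2 + 2 v3
v1 = np.array([[1], [2]])
v2 = np.array([[3], [3]])
v3 = np.array([[0], [2]])

y = linear_combination([4, -1, 2], [v1, v2, v3])
print(y)
```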

With [](#lincombo) in hand, we introduce a (possibly new-to-you) definition of matrix-vector multiplication as a linear combination of the columns of a matrix.
````{prf:definition} Matrix-Vector Multiplication
If $A$ is an $m \times n$ matrix, with columns $\vv a_1, \dots, \vv a_n$, and if $\vv x$ is an $n \times 1$ vector, then the _product of $A$ and $\vv x$_, denoted $A\vv x$, is the [linear combination](#lincombo) of the columns of $A$ using the corresponding entries in $\vv x$ as weights:
```{math}
:label: mat-vec
A\vv x = \bm \vv a_1 & \vv a_2 & \cdots & \vv a_n\em\bm x_1 \\ x_2 \\ \vdots \\ x_n\em = x_1 \vv a_1 + x_2 \vv a_2 + \cdots + x_n \vv a_n.
```
Note that $A\vv x$ is defined only if the number of columns of $A$ equals the number of entries in $\vv x$.
````

````{exercise}  Matrix-Vector Multiplication
:label: matrix-vector-multiplication-ex1 

For (a)-(d), simplify the matrix-vector products if they exist; otherwise, indicate that the product doesn't exist.

a. $\bm 1 & 1 \\ 1 & 0 \em \bm 13\\ 8 \em$

b. $\bm 2 & 4 & 1 \\ 0 & 3 & -1 \\ 5 & -2 & 2 \em \bm 1\\2\\3 \em$

c. $\bm 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \em \bm 2\\ 1 \em$

d. $\bm 2 & 3 \\ 4 & 5 \\ 6 & 7 \em \bm 1 \\ 2 \\ 0 \em$

```{solution} matrix-vector-multiplication-ex1 
:class: dropdown 

a. $\bm 1 & 1 \\ 1 & 0 \em \bm 13\\ 8 \em = 13\bm 1 \\ 1 \em + 8\bm 1 \\ 0 \em = \bm 1(13) + 1(8) \\ 1(13) + 0(8)\em = \bm 21 \\ 13\em$

b. $\bm 2 & 4 & 1 \\ 0 & 3 & -1 \\ 5 & -2 & 2 \em \bm 1\\2\\3 \em = 1 \bm 2\\0\\5\em + 2\bm 4\\3\\-2\em + 3\bm 1\\-1\\2\em = \bm 13 \\ 3\\7 \em$

c. $\bm 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \em \bm 2\\ 1 \em = 2 \bm 1\\ 3\\ 5\\ 7\em + 1 \bm 2\\ 4\\ 6\\ 8\em = \bm 2(1) + 1(2) \\ 2(3) + 1(4) \\ 2(5) + 1(6) \\ 2(7) + 1(8) \em = \bm 4\\ 10\\ 16\\ 22 \em$

d. Doesn't exist, because the matrix has 2 columns, whereas the vector has 3 entries.

``` 
````

In [None]:
# matrix vector multiplication
product = A @ c # multiplying matrix A with column-vector c
product
# Try A @ vec and notice the difference in the size of the result

## Linear Systems in Matrix-Vector Notation
A general linear system of $m$ equations in $n$ unknowns takes the form
\begin{equation}
\label{gen-linsys}
\begin{array}{cccl}
a_{11} x_1 + a_{12} x_2 +& \cdots &+ a_{1n} x_n  = & b_1,\\
a_{21} x_1 + a_{22} x_2 +& \cdots &+ a_{2n} x_n  = & b_2,\\
\vdots & \vdots && \vdots \\
a_{m1} x_1 + a_{m2} x_2 +& \cdots &+ a_{mn} x_n  = & b_m,\\
\end{array}
\end{equation}
which we rewrite compactly as
```{math}
:label: matvec-eq
A \vv x = \vv b.
```
Equation [](#matvec-eq) is composed of three basic ingredients: the $m \times n$ _coefficient matrix_ $A$, with entries $a_{ij}$ as in [](#matrix), the column vector $\vv x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n\end{bmatrix}$ containing the _unknowns_ or _variables_, and the column vector $\vv b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$ containing the _right-hand sides_.  As you can see, it is a bit unwieldy to write column vectors inline, and so we will often equivalently write them as $\vv x = (x_1, x_2, \dots, x_n)$ and $\vv b = (b_1, b_2, \dots, b_m)$ instead.

```{warning}
Both $(1,2,3)$ and $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ represent the same $3 \times 1$ column vector.  However, $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ is a _different_ $1 \times 3$ row vector!  $(1,2,3)$ is understood to be a column vector because of the round brackets and the commas, whereas $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ is understood to be a row vector because of the square brackets and no commas.  It's a bit tricky, but you'll get used to it in no time!
```

Revisiting linear system [](./021-linsys-gauss.ipynb#simple-linsys00), we see that the coefficient matrix $A$, the unknown vector $\vv x$, and the right hand side vector $\vv b$ can be read off as
$$
A = \begin{bmatrix} 1 & 2 & 1 \\
2 & 6 & 1 \\
1 & 1 & 4 \end{bmatrix}, \quad \vv x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \vv b = \begin{bmatrix} 2 \\ 7 \\ 3 \end{bmatrix}.
$$

If a variable doesn't appear in an equation, then the corresponding matrix entry is 0.


````{exercise}  Writing linear systems in matrix-vector notation
:label: matrix-vector-notation-ex1 

Write the following system of linear equations in matrix-vector notation:

\begin{align*}
2x + 3y - z + 4w &= 10 \\
3x - 2y + 5z - w &= 5 \\
x + 4y - 3z + 2w &= -3 \\
\end{align*}

```{solution} matrix-vector-notation-ex1 
:class: dropdown 

The system in matrix-vector notation is:

\begin{align*}
    \bm 2 & 3 & -1 & 4 \\3 & -2 & 5 & -1 \\1 & 4 & -3 & 2 \\ \em\bm x \\y \\z \\w \\\em =\bm 10 \\5 \\-3 \\\em
\end{align*}

``` 
````

### Consequences on Systems of Linear Equations
Our definition of $A\vv x$ leads directly to the following very useful fact.
```{important}
The equation $A\vv x = \vv b$ has a solution if and only if $\vv b$ is a linear combination of the columns of $A$.
```
When we introduce vector spaces in the next chapter, we will revisit this fact and see that it gives a very clear picture of the _geometry_ of systems of linear equations, and will allow us to answer questions like "Given a matrix $A$, for what right hand sides $\vv b$ does the equation $A\vv x=\vv b$ have a solution?" and "When does $A\vv x = \vv b$ have no solution? Infinitely many solutions?"
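
To see this fact in action, the sketch below takes the $3 \times 3$ coefficient matrix $A$ and right-hand side $\vv b$ read off above, solves $A\vv x = \vv b$, and then confirms that the entries of the solution are exactly the weights that combine the columns of $A$ into $\vv b$.  It assumes the use of `np.linalg.solve`, which only applies to square, invertible coefficient matrices; that is a convenience for this example, not a general method for arbitrary systems.

```python
import numpy as np

# Coefficient matrix and right-hand side read off from the linear system above
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 6.0, 1.0],
              [1.0, 1.0, 4.0]])
b = np.array([[2.0], [7.0], [3.0]])

# Solve A x = b (valid here because A is square and invertible)
x = np.linalg.solve(A, b)

# Rebuild b as a linear combination of the columns of A with weights x_1, x_2, x_3
combo = x[0, 0] * A[:, [0]] + x[1, 0] * A[:, [1]] + x[2, 0] * A[:, [2]]

print("x =", x.ravel())
print("columns combine to b:", np.allclose(combo, b))
```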

### Properties of the Matrix-Vector Product $A\vv x$
The facts in the next theorem are important, and will be used often throughout this semester.  The proof is straightforward, and relies on the definition of $A\vv x$ and how we defined matrix addition and scalar multiplication above (remember that vectors are a special kind of matrix!).

````{prf:theorem} Properties of $A \vv x$
:label: matvec-thm
(matvec-linearity)=
If $A$ is an $m \times n$ matrix, $\vv u$ and $\vv v$ are $n \times 1$ vectors, and $c$ is a scalar, then:\
a. $A(\vv u + \vv v) = A\vv u + A\vv v$\
b. $A(c \vv u) = c(A \vv u)$
```{prf:proof} Proof of [](#matvec-thm)
:class: dropdown
To keep things simple, we take $n=3$ so that $A=\bm \vv a_1 & \vv a_2 & \vv a_3\em$, and $\vv u$ and $\vv v$ are $3 \times 1$ vectors.  The proof of the general case is similar.  For $i=1,2,3$ let $u_i$ and $v_i$ be the $i$th entries in $\vv u$ and $\vv v$, respectively, i.e., $\vv u = (u_1, u_2, u_3)$ and $\vv v = (v_1, v_2, v_3)$.  

To prove statement (a), we compute $A(\vv u + \vv v)$ as a linear combination of the columns of $A$ using the entries in $\vv u + \vv v$ as weights:
\begin{eqnarray}
A(\vv u + \vv v)&=\bm \vv a_1 & \vv a_2 & \vv a_3\em\bm u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \em \\
& = (u_1 + v_1)\vv a_1 + (u_2 + v_2)\vv a_2 + (u_3 + v_3)\vv a_3 \\
& = (u_1\vv a_1 + u_2 \vv a_2 + u_3\vv a_3) + (v_1\vv a_1 + v_2 \vv a_2 + v_3\vv a_3) \\
& = A\vv u + A\vv v.
\end{eqnarray}
To go from line 1 to 2, we used the definition of matrix-vector multiplication, taking a linear combination of the columns of $A$ weighted by the entries $(u_i + v_i)$ of $\vv u + \vv v$.  From line 2 to 3, we used the property that scalar multiplication is distributive to split apart the terms depending on the $u_i$ from those depending on the $v_i$.  Finally, from line 3 to 4, we used the definition of matrix-vector multiplication again to go from linear combinations of the columns of $A$ to matrix-vector products.

To prove statement (b), we compute $A(c\vv u)$ as a linear combination of the columns of $A$ using the entries in $c\vv u$ as weights:
\begin{eqnarray}
A(c\vv u)&=\bm \vv a_1 & \vv a_2 & \vv a_3\em\bm cu_1 \\ cu_2 \\ cu_3 \em \\
& = c(u_1\vv a_1) + c(u_2\vv a_2) + c(u_3\vv a_3) \\
& = c(u_1 \vv a_1 + u_2 \vv a_2 + u_3\vv a_3) \\
& = c(A\vv u).
\end{eqnarray}
To go from line 1 to 2, we used the definition of matrix-vector multiplication, taking a linear combination of the columns of $A$ weighted by the entries $cu_i$ of $c\vv u$, and rewrote each term $(cu_i)\vv a_i$ as $c(u_i \vv a_i)$.  From line 2 to 3, we used the property that scalar multiplication is distributive to factor out a $c$ from every term.  Finally, from line 3 to 4, we used the definition of matrix-vector multiplication again to go from a linear combination of the columns of $A$ to a matrix-vector product.
```
````
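
The two linearity properties are easy to spot check numerically.  The sketch below does so for one $2 \times 3$ matrix; the specific entries of `A`, `u`, `v` and the scalar `c` are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
u = np.array([[1], [0], [-2]])
v = np.array([[3], [1], [4]])
c = 3

additive = np.array_equal(A @ (u + v), A @ u + A @ v)   # property (a)
homogeneous = np.array_equal(A @ (c * u), c * (A @ u))  # property (b)

print(additive, homogeneous)
```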

#### A Geometric Perspective
This perspective on matrix-vector products gives us a very geometric interpretation of what is going on.  Given an $n \times 1$ vector $\vv x$, an $m \times n$ matrix $A$ maps $\vv x$ to an $m \times 1$ vector $A\vv x$, i.e., the map[^mapsto] $\vv x \mapsto A\vv x$ is a function that transforms $n \times 1$ vectors into $m \times 1$ vectors.  We will revisit this idea in much greater detail in the next chapter on _vector spaces_, but this idea will help us gain some intuition about some of the rules of matrix-matrix multiplication that we will see shortly.

[^mapsto]: The symbol $\mapsto$ means "maps to."  So when we see $\vv x \mapsto A\vv x$, this is mathematical notation for a function that takes in $\vv x$ and maps it to $A\vv x$.

## Matrix multiplication

When a matrix $B$ multiplies a vector $\vv x$, it transforms $\vv x$ into the vector $B \vv x$.  If this vector is then multiplied in turn by a matrix $A$, the resulting vector is $A(B\vv x)$.  We can think of the vector $A(B\vv x)$ as being produced from $\vv x$ by a _composition_ of mappings: first we apply the transformation $\vv x \mapsto B \vv x$ to the vector $\vv x$, and then we apply the transformation $\vv y \mapsto A \vv y$ to the vector $\vv y = B \vv x$.  Our goal is to represent this composite mapping as multiplication by a single matrix, denoted by $AB$, so that
\begin{equation}
\label{matmul}
A(B\vv x) = (AB)\vv x.
\end{equation}
This idea is illustrated in [](#matmul-fig).

```{figure} ../figures/02-matmul-AB.png
:label: matmul-fig
:alt: Multiplication by $B$ and then $A$ and multiplication by $AB$.
:align: center

Multiplication by $B$ and then $A$ and multiplication by $AB$. Figure 3 from Ch 2.1 of ALA.
```
Fortunately, we already have all of the tools we need to do this!  Suppose that $A$ is $m \times n$, $B$ is $n \times p$, and $\vv x$ is a $p \times 1$ vector.  Denote the columns of $B$ by $\vv b_1,\dots,\vv b_p$ and the entries in $\vv x$ by $x_1,\dots, x_p$.  Then
$$
B\vv x = x_1\vv b_1 + \cdots + x_p\vv b_p.
$$
Now by the [linearity of multiplication by $A$](#matvec-linearity), we have
\begin{eqnarray}
A(B\vv x) &= A(x_1\vv b_1) + \cdots + A(x_p\vv b_p) \\
& = x_1A\vv b_1 + \cdots + x_pA\vv b_p.
\end{eqnarray}
We see that the vector $A(B\vv x)$ is a linear combination of the vectors $A\vv b_1,\dots,A\vv b_p$ weighted by the entries $x_1,\dots,x_p$ of $\vv x$.  We can rewrite this in matrix-vector notation as
$$
A(B\vv x) = \bm A\vv b_1 & A\vv b_2 & \cdots & A\vv b_p\em \vv x.
$$
Therefore multiplying $\vv x$ by $\bm A\vv b_1 & A\vv b_2 & \cdots & A\vv b_p\em$ directly transforms $\vv x$ into $A(B\vv x)$: we found the matrix we were looking for!
````{prf:definition}
:label: matmul-def1

If $A$ is an $m \times n$ matrix, and if $B$ is an $n \times p$ matrix with columns $\vv b_1,\dots,\vv b_p$, then the matrix product $AB$ is the $m \times p$ matrix whose columns are $A\vv b_1, A\vv b_2,\dots, A\vv b_p$, that is,
```{math}
:label: matmul-def
AB = A\bm \vv b_1 & \vv b_2 & \cdots & \vv b_p\em = \bm A\vv b_1 & A\vv b_2 & \cdots & A\vv b_p\em.
```

Note that the product $AB$ is only defined if the number of rows of $B$ is the same as the number of columns of $A$, as otherwise the matrix-vector product $A\vv b_i$ is not defined.
````
This definition makes equation [](#matmul) true for all possible $p \times 1$ vectors $\vv x$ and also helps us understand that _multiplication of matrices corresponds to a composition of transformations_.
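
As a sanity check of the composition idea, the sketch below builds $AB$ column by column as $\bm A\vv b_1 & \cdots & A\vv b_p\em$, compares it with NumPy's `A @ B`, and verifies that applying $B$ and then $A$ to a vector gives the same result as applying $AB$ once.  The matrices and the vector are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2x3
B = np.array([[-1, -2],
              [3, 2],
              [1, -1]])          # 3x2
x = np.array([[2], [-1]])        # 2x1

# Columns of AB are A b_1, ..., A b_p
AB_by_columns = np.hstack([A @ B[:, [j]] for j in range(B.shape[1])])

matches_builtin = np.array_equal(AB_by_columns, A @ B)
composition_ok = np.array_equal(A @ (B @ x), (A @ B) @ x)

print(AB_by_columns)
print(matches_builtin, composition_ok)
```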

:::{prf:example}  Matrix Multiplication, [](#matmul-def1)
:label: matmul-ex1

Let's compute the matrix product

\begin{align*}
\bm 1&2&3\\4&5&6\\7&8&9 \em \bm -1&0&1\\2&-2&3\\4&5&-3\em
\end{align*}

using [](#matmul-def1):

\begin{align*}
 &\bm 1&2&3\\4&5&6\\7&8&9 \em \bm -1&0&1\\2&-2&3\\4&5&-3\em \\
 &= \bm \bm 1&2&3\\4&5&6\\7&8&9 \em \bm -1\\2\\4\em & \bm 1&2&3\\4&5&6\\7&8&9 \em \bm 0\\-2\\5\em & \bm 1&2&3\\4&5&6\\7&8&9 \em \bm 1\\3\\-3\em \em\\
 &= \bm \bm 15 \\ 30 \\ 45 \em & \bm 11 \\ 20 \\ 29 \em & \bm -2 \\ 1 \\ 4 \em \em\\
 &= \bm 15 & 11 & -2 \\ 30 & 20 & 1 \\ 45 & 29 & 4 \em
\end{align*}

:::

In [None]:
## Matrix Matrix multiplication
C = np.array([[-1, -2],
              [3, 2],
              [1, -1]])
A_C = A @ C
D = np.array([[3, 4, 2],
              [5, 1, 2],
              [-1, 0, 3],
              [0, 0, 1]])
b_D = b @ D
print("A_C: ", A_C, "\nb_D: ", b_D)

### Row-by-Column and Column-by-Row Definition of Matrix Multiplication
Although we will work with the definition of matrix multiplication introduced above, it will sometimes be more convenient to work with alternative definitions.  We assume that you have seen these definitions before, so we introduce them only briefly here, and leave verifying their equivalence as an exercise.

```{prf:definition} Row-by-Column Matrix Multiplication
:label: rc-matmul
Let $A$ be an $m \times n$ matrix, and $B$ an $n \times p$ matrix.  Let $A$ have rows $\vv r^A_i$ and $B$ have columns $\vv b_i$, i.e.,
$$
A = \bm \vv r^A_1 \\ \vv r^A_2 \\ \vdots \\ \vv r^A_m \em, \quad B = \bm \vv b_1 & \vv b_2 & \cdots & \vv b_p\em.
$$
Then if we define $C=AB$ to be their $m \times p$ matrix product, we have that the elements $c_{ij}$ of $C$ are given by the product of the $i$th row of $A$ and the $j$th column of $B$:
$$
c_{ij}=\vv r^A_i \vv b_j, \quad i=1,\dots,m,\, j=1,\dots,p.
$$
```

As an example, this tells us an alternative way to compute the matrix-vector product $A\vv x$.  Again, assuming that the rows of $A$ are $\vv r^A_i$, we have that the $i$th element of the product $A\vv x$ is given by the row-vector column-vector product $(A\vv x)_i= \vv r^A_i \vv x$.

:::{prf:example}  Matrix Multiplication, [](#rc-matmul)
:label: rc-matmul-example

Let's compute the matrix product

\begin{align*}
\bm 1&2&3\\4&5&6\\7&8&9 \em \bm -1&0&1\\2&-2&3\\4&5&-3\em
\end{align*}

using [](#rc-matmul) (this is the same product as in [](#matmul-ex1)):

\begin{align*}
 &\bm 1&2&3\\4&5&6\\7&8&9 \em \bm -1&0&1\\2&-2&3\\4&5&-3\em \\
 &= \bm 
    1(-1)+ 2(2) +3(4)&1(0) +2(-2) +3(5)&1(1)+ 2(3)+ 3(-3)\\
    4(-1)+ 5(2)+ 6(4)&4(0) +5(-2) +6(5)&4(1) +5(3) +6(-3)\\
    7(-1) +8(2)+ 9(4)&7(0) +8(-2) +9(5)&7(1) +8(3)+ 9(-3)
 \em\\
  &= \bm 15 & 11 & -2 \\ 30 & 20 & 1 \\ 45 & 29 & 4 \em
\end{align*}

This is the same result we got in [](#matmul-ex1).

:::

```{prf:definition} Column-by-Row Matrix Multiplication
:label: cr-matmul
Let $A$ be an $m \times n$ matrix, and $B$ an $n \times p$ matrix.  Let $A$ have columns $\vv a_i$ and $B$ have rows $\vv r^B_i$, i.e.,
$$
A = \bm \vv a_1 & \vv a_2 & \cdots & \vv a_n\em, \quad B = \bm \vv r^B_1 \\ \vv r^B_2 \\ \vdots \\ \vv r^B_n \em.
$$
Then we can compute their $m \times p$ matrix product $C=AB$ by adding together the products of the columns of $A$ with the rows of $B$:
$$
C=\vv a_1 \vv r^B_1 + \vv a_2 \vv r^B_2 + \cdots + \vv a_n \vv r^B_n.
$$
```

Try to calculate the product in [](#matmul-ex1) and [](#rc-matmul-example) using [](#cr-matmul), and confirm that you get the same result! 
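
If you want to check your hand computation, the sketch below implements all three definitions of the matrix product in a few lines of NumPy and compares them on the product from the two examples above.  The variable names are illustrative, and the loops are written for clarity rather than speed.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.array([[-1, 0, 1],
              [2, -2, 3],
              [4, 5, -3]])
m, n = A.shape
p = B.shape[1]

# Column definition: the columns of AB are A b_1, ..., A b_p
by_columns = np.hstack([A @ B[:, [j]] for j in range(p)])

# Row-by-column: c_ij is (row i of A) times (column j of B)
row_by_col = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])

# Column-by-row: sum over k of (column k of A) times (row k of B)
col_by_row = sum(A[:, [k]] @ B[[k], :] for k in range(n))

print(by_columns)
print(np.array_equal(by_columns, row_by_col), np.array_equal(by_columns, col_by_row))
```

Each term `A[:, [k]] @ B[[k], :]` in the column-by-row sum is an $m \times 1$ column times a $1 \times p$ row, and so is itself a full $m \times p$ matrix.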

```{note}
While [](#rc-matmul) is probably most familiar to you, we'll see that depending on the setting, it is sometimes more convenient to work with one definition of matrix-matrix multiplication rather than another.  For example, [](#cr-matmul) is kind of annoying to deal with when computing matrix-matrix products by hand (it involves a lot of writing!).  But that's ok, because while you will learn how to compute things by hand for small examples, our goal in this class is to get you solving big problems using computers, and it turns out, computers are really good at dealing with something like [](#cr-matmul).
```

### Properties of Matrix Multiplication
We use $I_m$ to denote the $m \times m$ identity matrix.
```{prf:definition}
:label: identity
The _identity matrix $I_m$_ is the $m \times m$ matrix
$$
I_m = \bm 
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\em.
$$
The entries along the _main diagonal_, which runs from top left to bottom right, are equal to 1, and all other entries are equal to 0.  The identity matrix satisfies $I_m \vv x = \vv x$ for all $m \times 1$ vectors $\vv x$.
```

The following theorem lists standard properties of matrix multiplication. 
```{prf:theorem} Matrix Multiplication Properties
:label: matmul-thm
Let $A$ be an $m \times n$ matrix, and let $B$ and $C$ have sizes for which the indicated sums and products are defined.  Then\
\begin{equation}
\begin{array}{ll}
\text{a. } A(BC)=(AB)C & \text{(associative law of multiplication)}\\
\text{b. } A(B+C)=AB + AC &\text{(left distributive law)}\\
\text{c. } (B+C)A=BA + CA &\text{(right distributive law)}\\
\text{d. } r(AB)=A(rB) &\text{for any scalar $r$} \\
\text{e. } I_mA=A=AI_n &\text{(identity for matrix multiplication)} \\
\end{array}
\end{equation}
```
We omit the proof of this theorem, but comment on some of its implications.  The associative and distributive laws of [](#matalg-thm) and [](#matmul-thm) tell us that pairs of parentheses in matrix expressions can be inserted and deleted in the same way as in algebra for ordinary numbers.  This is why we will usually write $ABC$ for the product that can be computed as either $A(BC)$ or $(AB)C$: it does not matter how we group the matrices when computing the product, so long as the left-to-right order of the matrices is preserved.
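
Since we omit the proof, a numerical spot check may be reassuring.  The sketch below tests each property on small integer matrices whose sizes make every product defined; the matrices `A`, `B`, `C`, `D` and the scalar `r` are arbitrary illustrative choices, and `D` plays the role of the third factor in the associative law.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                 # 2x3
B = np.array([[-1, -2], [3, 2], [1, -1]]) # 3x2
C = np.array([[0, 1], [1, 0], [2, 2]])    # 3x2
D = np.array([[1, 1], [0, 2]])            # 2x2
r = 3
I2 = np.eye(2, dtype=int)
I3 = np.eye(3, dtype=int)

checks = {
    "associative": np.array_equal(A @ (B @ D), (A @ B) @ D),
    "left distributive": np.array_equal(A @ (B + C), A @ B + A @ C),
    "right distributive": np.array_equal((B + C) @ A, B @ A + C @ A),
    "scalar": np.array_equal(r * (A @ B), A @ (r * B)),
    "identity": np.array_equal(I2 @ A, A) and np.array_equal(A @ I3, A),
}
print(checks)
```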

```{warning} Matrix Multiplication is *NOT* Commutative
The left-to-right order in matrix products is critical because $AB$ and $BA$ are usually not the same.  In fact, both products are defined only when $A$ is $m \times n$ and $B$ is $n \times m$, and they have the same size only when $A$ and $B$ are square matrices of the same size; even then, $AB$ and $BA$ usually differ.  Given what we've seen so far, this shouldn't be surprising: the columns of $AB$ are linear combinations of the columns of $A$, whereas the columns of $BA$ are constructed from the columns of $B$.  We emphasize the position of the factors in the product $AB$ by saying $A$ is _right-multiplied_ by $B$ or that $B$ is _left-multiplied_ by $A$.  If $AB=BA$, we say that $A$ and $B$ _commute_ with each other.
```

```{prf:example} TODO: mat-mul commute/don't commute
write me
```

```{warning} Cancellation Laws *DO NOT* hold
If $AB = AC$, then it is _not_ true in general that $B=C$.
```
```{warning} $AB=0$
If the product $AB$ is the zero matrix, you _cannot_ conclude in general that either $A=0$ or $B=0$.
```
We will get a better understanding later of why these properties, which hold for multiplication of ordinary numbers, break down for matrix multiplication, but for now, it is still important that you are aware of these quirks.
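
Small concrete examples make these three warnings easier to remember.  The $2 \times 2$ matrices in the sketch below are hypothetical values picked to expose each quirk; they are not special in any other way.

```python
import numpy as np

# Non-commutativity and a zero product with nonzero factors
A = np.array([[0, 1],
              [0, 0]])
B = np.array([[1, 0],
              [0, 0]])
AB = A @ B    # the zero matrix, although neither A nor B is zero
BA = B @ A    # equals A, so AB != BA

# Failure of cancellation: PQ = PR even though Q != R
P = np.array([[1, 0],
              [0, 0]])
Q = np.array([[1, 2],
              [3, 4]])
R = np.array([[1, 2],
              [5, 6]])
same_product = np.array_equal(P @ Q, P @ R)
same_factor = np.array_equal(Q, R)

print("AB =\n", AB, "\nBA =\n", BA)
print("PQ == PR:", same_product, " Q == R:", same_factor)
```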

## Block Matrices

You may not have known it at the time, but we've already been working with examples of block matrices.  For example, when we wrote a matrix $A$ in terms of its columns $\vv a_i$ as $A = \bm \vv a_1 & \vv a_2 & \cdots & \vv a_n \em$, what we are really saying is that we are going to _partition_ the matrix $A$ into _blocks_ (or sub-matrices) $\vv a_i$ such that we can build $A$ by concatenating or stacking the column vectors $\vv a_i$ together appropriately.  It turns out, we can generalize this idea: the blocks can be arbitrarily sized matrices, and we can stack them together in any way that we'd like as long as dimensions match up.  We'll work through a couple of examples that give you the basic idea of what we mean.

### Building Block Matrices
Suppose we have a scalar $c$,[^myref] a $2 \times 1$ column vector $\vv c$, a $1 \times 3$ row vector $\vv r$, a $2 \times 2$ matrix $A$, and a $2 \times 3$ matrix $B$.  What are some _block matrices_ we can build out of these pieces?
- **Block vectors:** We can vertically stack column vectors with other column vectors, and horizontally stack row vectors with other row vectors. For example, all of the following are perfectly legitimate block vectors:
  $$ \bm \vv c \\ c \\ \vv c\em, \quad \bm \vv r & \vv r & c & \vv r\em.$$
- **Block matrices:** We can stack elements both horizontally and vertically as long as dimensions match up.  In the following examples, we include element dimensions as subscripts to emphasize dimension compatibility:
  $$ \bm \begin{array}{cc} A_{2\times2} & \vv c_{2 \times 1}\end{array} \\ \vv r_{1\times 3}\em, \quad \bm A_{2\times 2} & \vv c_{2\times 1} & B_{2\times 3} & A_{2\times 2}\em. $$ The first example is a little tricky, because the top _block row_ has two elements $\bm A_{2\times2} & \vv c_{2 \times 1}\em$, which is a $2 \times 3$ matrix, whereas the second _block row_ $\vv r$ has one element and is a $1 \times 3$ matrix.  But, this is still ok, because the dimensions match, and we end up with a $3 \times 3$ matrix.
- **Stacks on stacks:** You can put block matrices as blocks within other matrices, so long as dimensions agree.  For example, if we define $D = \bm A_{2\times2} & \vv c_{2 \times 1}\em$ we can write $$ \bm D_{2\times 3} \\ \vv r_{1\times 3}\em,$$ which is a much cleaner and easier expression to read.
```{tip}
When stacking elements horizontally, they all need to have the same number of rows.  For example, if we look at the first dimension index of $A_{2\times 2}$, $\vv c_{2\times 1}$, and $B_{2\times 3}$ we quickly see that they all have 2 rows, and so we are free to stack them horizontally in any way that we want.  Similarly, when stacking elements vertically, we need to make sure that they all have the same number of columns.  We can check this quickly by looking at the second dimension index.
```


```{prf:example} TODO: Block Matrices and Vectors
Give the above things numerical values, and write out what they actually end up being.
```
[^myref]: Remember that a scalar $c$ is both a row vector and a column vector!
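
To make the stacking rules concrete, here is a minimal sketch that assigns numerical values to the pieces above and builds each block vector and block matrix with `np.hstack` and `np.vstack`.  All numerical values are made up for illustration, and the names `c_scalar`, `c_vec`, and `D` are just labels for this snippet; the scalar is stored as a $1 \times 1$ array so that it can be stacked like any other block.

```python
import numpy as np

# Hypothetical values for the pieces described above
c_scalar = np.array([[5]])            # scalar c, stored as a 1x1 matrix
c_vec = np.array([[1], [2]])          # 2x1 column vector
r = np.array([[7, 8, 9]])             # 1x3 row vector
A = np.array([[1, 2], [3, 4]])        # 2x2 matrix
B = np.array([[1, 0, 1], [0, 1, 0]])  # 2x3 matrix

# Block vectors
block_col = np.vstack([c_vec, c_scalar, c_vec])   # 5x1
block_row = np.hstack([r, r, c_scalar, r])        # 1x10

# Block matrices: first build the top block row D = [A c], then stack r below it
D = np.hstack([A, c_vec])                         # 2x3
stacked = np.vstack([D, r])                       # 3x3
wide = np.hstack([A, c_vec, B, A])                # 2x8

print(stacked)
print(block_col.shape, block_row.shape, wide.shape)
```

Both functions raise a `ValueError` if the pieces do not line up, for example when trying `np.hstack([A, r])`, since $A$ has 2 rows and $\vv r$ has only 1.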


In [None]:
## Building block matrices.
# Remember: A is of size 2x3
A_1 = np.array([[-1, -1, 0],
                [-3, 4, 2]])
A_2 = np.eye(4) # identity matrix of size 4x4
A_3 = np.zeros((4, 2)) # a matrix of size 4x2 with all entries zero
A_block = np.block([[A, A_1],
                    [A_2, A_3]])
A_3_1 = np.zeros((3, 2))
A_4 = np.array([[-3, 3]])
A_block_1 = np.block([[A, A_1],
                    [A_2, np.block([[A_3_1], [A_4]])]])
print("A_block: \n", A_block, "\nA_block_1: \n", A_block_1)

In [None]:
## Another method of building block matrices
# hstack: horizontal stack, vstack: vertical stack
A_block_alter = np.vstack((np.hstack((A, A_1)), np.hstack((A_2, A_3))))
A_block_alter

### Block Matrix Multiplication

Suppose I am given two block matrices,
$$
M_1 = \bm A & B \\ C & D\em, \quad M_2 = \bm E & F \\ G & H \em,
$$
and to make things simple, we will assume that all of the _block elements_ $A,B,C,D,E,F,G,H$ are $2 \times 2$.  How should we compute the _block matrix product_ $M_1 M_2$?

The good news is that all of the rules for matrix-matrix multiplication we saw above extend to the block matrix setting, with the usual caveats that dimensions need to be compatible so that the various product terms exist.  For example, if we let $K_1$ and $K_2$ be the _block columns_ of $M_2$, i.e.,
$$
 K_1 = \bm E \\ G\em, \quad K_2 = \bm F \\ H\em,
$$
then we can follow our [first definition of matrix-matrix multiplication](#matmul-def) and write
$$
M_1M_2 = \bm M_1 K_1 & M_1 K_2 \em.
$$
Now, if we turn our attention to computing $M_1 K_1$, we can again follow our definition of [matrix-vector multiplication](#mat-vec) to compute
$$
M_1 K_1 = \bm A & B \\ C & D\em\bm E \\ G\em = \bm A \\ C \em E + \bm B \\ D\em G = \bm AE + BG \\ CE + DG \em.
$$
In the same way we compute
$$
M_1 K_2 = \bm A & B \\ C & D\em\bm F \\ H\em = \bm A \\ C \em F + \bm B \\ D\em H = \bm AF + BH \\ CF + DH \em,
$$
to get
(blockmatmul)=
$$ M_1 M_2 = \bm M_1 K_1 & M_1 K_2 \em = \bm AE + BG & AF + BH \\ CE + DG & CF + DH \em.$$

```{prf:example} TODO: Block Matrix Multiplication
Give the above things numerical values, and write out what they actually end up being.  Compare what happens when you do block matrix multiplication versus "normal" matrix multiplication.
```

```{warning}
Although block matrix multiplication follows the same "rules" as regular matrix multiplication, there are some things we need to be careful about.  First, dimensions of $A,B,C,D,E,F,G,H$ need to be such that all of the products we need to compute exist, i.e., the products $AE, BG, AF, BH, CE, DG, CF, DH$ need to exist.  Second, because in general $AB \neq BA$, i.e., matrix multiplication does not commute in general, the order matters.  For example, I cannot replace the $(1,1)$ block of $M_1 M_2$, which is given by $AE + BG$, with $EA + BG$!
```
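
The block product formula is easy to test numerically.  The sketch below fills the eight $2 \times 2$ blocks with random integers (an arbitrary choice, using a fixed seed so the run is repeatable), assembles $M_1$ and $M_2$ with `np.block`, and compares the ordinary product $M_1 M_2$ with the matrix assembled from the blockwise formula.  It also checks that swapping the order in the $(1,1)$ block, as cautioned against above, gives a different answer for this particular draw.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for repeatability only
# Eight random 2x2 integer blocks with entries between -3 and 3
A, B, C, D, E, F, G, H = (rng.integers(-3, 4, size=(2, 2)) for _ in range(8))

M1 = np.block([[A, B],
               [C, D]])
M2 = np.block([[E, F],
               [G, H]])

# Assemble the product block by block, keeping the left-to-right order
blockwise = np.block([[A @ E + B @ G, A @ F + B @ H],
                      [C @ E + D @ G, C @ F + D @ H]])

formula_ok = np.array_equal(M1 @ M2, blockwise)

# Swapping the order in the (1,1) block is generally wrong
order_matters = not np.array_equal(A @ E + B @ G, E @ A + B @ G)

print(formula_ok, order_matters)
```

The value of `order_matters` depends on the random draw, since some pairs of matrices do happen to commute, whereas `formula_ok` should be `True` for any choice of blocks.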