# 18.06 pset 1 solutions

## Problem 1 (10 points)

In linear algebra, it is critically important to **think about the shapes** (size) of matrices and vectors, and whether operations make sense.   You can multiply $AB$ if $A$ is $m \times n$ ($m$ rows and $n$ columns) and $B$ is $n \times p$ — the "middle" dimensions need to match up.  You can also add matrices of *equal* sizes, or multiply them by scalars.  Multiplying $Ax$, an $m \times n$ matrix $A$ by an $n$-component vector, can be thought of as a special case of this rule if you think of $x$ as an $n \times 1$ "matrix".  On exams, it is common for people to panic and start writing down nonsense, and an easy way to catch this is to make sure that the operations you are writing down **have the correct shapes**.

**(a)**

If $A$ is a $3\times 4$ matrix, $B$ is $4\times 5$, $x$ is $4 \times 1$ (a 4-component column vector) and $r$ is $1 \times 3$ (a 3-component [row vector](https://en.wikipedia.org/wiki/Row_and_column_vectors)), **which of the following make sense** and (for those that make sense) **what is the shape of the result**?

1. $A^2 = AA$, $AB$, and/or $BA$?
2. $3x + A$ and/or $3x + x$?
3. $Ax$, $Bx$, $Ar$, $Br$, $xA$, $xB$, $rA$, and/or $rB$?
4. $xx$, $xr$, $rx$

**Check your answers** in Julia with some random matrices and vectors.  (Julia should give the expected shape for operations that make sense, or print an error for operations that don't.)

### Solution

1. $A^2$ is the product of two $(3\times 4)$ matrices, which doesn't make sense because the middle dimensions (4 and 3) don't match. $AB$ is the product of $(3\times4)$ and $(4\times5)$ matrices, which makes sense because the middle dimensions (4) match, and so the result is a $3\times5$ matrix. $BA$ is the product of a $(4\times 5)$ and $(3\times4)$ matrix, which doesn't make sense because the middle dimensions (5 and 3) don't match.
2. $3x+A$ doesn't make sense. $3x$ is a $4\times1$ matrix, while $A$ is a $3\times 4$ matrix. They cannot be added. $3x+x$ makes sense. $3x$ and $x$ are both $4\times1$ matrices, so the result is a $4\times1$ matrix. ($3x+x=4x$).
3. $Ax$ makes sense. It is a $3\times1$ matrix (3-component column vector). $rA$ also makes sense. It is a $1\times4$ matrix (4-component row vector). $Bx$, $Ar$, $Br$, $xA$, $xB$, and $rB$ do not make sense, because in each case the middle dimensions don't match.
4. $xx$ and $rx$ do not make sense. However, $xr$ makes sense. It is a $4\times 3$ matrix.

Here are some of the results of these operations in Julia for some random matrices of the specified sizes.  Notice that the **operations that don't make sense produce an error message**, and the **other operations produce the expected shapes**.
    

In [1]:
A = rand(3,4);   # random 3×4 matrix
B = rand(4,5);   # random 4×5 matrix
x = rand(4,1);   # random 4×1 column vector
r = rand(1,3);   # random 1×3 row vector

In [2]:
A^2

LoadError: DimensionMismatch("A has dimensions (3,4) but B has dimensions (3,4)")

In [3]:
A*B

3×5 Array{Float64,2}:
 0.989144  0.913517  0.730817  1.06519   0.694219
 0.721819  0.673202  0.700648  0.887384  0.446857
 1.51989   1.05441   1.24618   0.927179  0.93389 

In [4]:
B*A

LoadError: DimensionMismatch("A has dimensions (4,5) but B has dimensions (3,4)")

In [5]:
3*x+A

LoadError: DimensionMismatch("dimensions must match")

In [6]:
3*x+x

4×1 Array{Float64,2}:
 3.04997 
 1.39693 
 0.349336
 1.66889 

In [7]:
A*x

3×1 Array{Float64,2}:
 0.691722
 0.633115
 0.57986 

In [8]:
B*x

LoadError: DimensionMismatch("A has dimensions (4,5) but B has dimensions (4,1)")

In [9]:
A*r

LoadError: DimensionMismatch("A has dimensions (3,4) but B has dimensions (1,3)")

In [10]:
B*r

LoadError: DimensionMismatch("A has dimensions (4,5) but B has dimensions (1,3)")

In [11]:
x*A

LoadError: DimensionMismatch("A has dimensions (4,1) but B has dimensions (3,4)")

In [12]:
x*B

LoadError: DimensionMismatch("A has dimensions (4,1) but B has dimensions (4,5)")

In [13]:
r*A

1×4 Array{Float64,2}:
 0.405276  0.69771  0.537122  0.376162

In [14]:
r*B

LoadError: DimensionMismatch("A has dimensions (1,3) but B has dimensions (4,5)")

In [15]:
x*x

LoadError: DimensionMismatch("A has dimensions (4,1) but B has dimensions (4,1)")

In [16]:
x*r

4×3 Array{Float64,2}:
 0.573176   0.187251   0.106616 
 0.262524   0.085764   0.0488321
 0.0656502  0.0214473  0.0122116
 0.313632   0.10246    0.0583387

In [17]:
r*x

LoadError: DimensionMismatch("A has dimensions (1,3) but B has dimensions (4,1)")
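
The shape rule from the problem statement can also be checked programmatically. The following is only a minimal sketch, not part of the original solution: `product_shape` is a hypothetical helper that applies the "middle dimensions must match" rule, and the loop compares its prediction with what Julia actually does for a few of the products above.

```julia
# Hypothetical helper (not part of the pset): shape of a product from the
# "middle dimensions must match" rule, or `nothing` if the product is undefined.
product_shape(sa, sb) = sa[2] == sb[1] ? (sa[1], sb[2]) : nothing

A = rand(3,4); B = rand(4,5); x = rand(4,1); r = rand(1,3)
ops = [("A*B", A, B), ("B*A", B, A), ("A*x", A, x), ("r*A", r, A),
       ("x*r", x, r), ("r*x", r, x), ("x*x", x, x)]

for (name, P, Q) in ops
    predicted = product_shape(size(P), size(Q))
    # Julia throws a DimensionMismatch when the product is undefined
    actual = try
        size(P * Q)
    catch err
        err isa DimensionMismatch ? nothing : rethrow()
    end
    println(name, ": predicted ", predicted, ", Julia gives ", actual)
end
```

For every product in the list, the predicted shape and the shape Julia reports should agree, with `nothing` marking the products that don't make sense.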

**(b)**

Later in the class, we will spend a lot of time on the [transpose Aᵀ](https://en.wikipedia.org/wiki/Transpose) of a matrix $A$, but for now you can think of it as just *swapping rows and columns*: if $A$ is $m \times n$, then $A^T$ is $n \times m$, and if $x$ is an $m$-component column vector (or an $m \times 1$ matrix) then $x^T$ is a $1\times m$ row vector.

So, for the examples above, $A^T$ is $4 \times 3$, $x^T$ is $1\times 4$, and $r^T$ is $3\times 1$.

For these matrices, which of the following make sense, and (for those that make sense) what is the shape of the result?

1. $A^T A$ and/or $AA^T$
2. $x^T x$ and/or $x x^T$

**Check your answers** by trying them out in Julia.  For real matrices and vectors, $A^T$ is written `A'` in Julia and $x^T$ is written `x'`.

### Solution 

1. These both make sense. $A^T A$ is a $4\times4$ matrix, while $AA^T$ is a $3\times3$ matrix (note that these are both square matrices).
2. These both make sense. $x^T x$ is a $1\times1$ matrix. $xx^T$ is a $4\times4$ matrix.

These are the results of these operations in Julia:

In [18]:
A'*A

4×4 Array{Float64,2}:
 0.280945  0.504378  0.411894  0.331684
 0.504378  1.00248   0.583885  0.508483
 0.411894  0.583885  1.06925   0.891146
 0.331684  0.508483  0.891146  0.795849

In [19]:
A*A'

3×3 Array{Float64,2}:
 0.901894  0.792469  0.769983
 0.792469  0.787165  0.59628 
 0.769983  0.59628   1.45946 

In [20]:
x'*x

1×1 Array{Float64,2}:
 0.88506

In [21]:
x*x'

4×4 Array{Float64,2}:
 0.581394   0.266288  0.0665915   0.318129 
 0.266288   0.121964  0.0305      0.145708 
 0.0665915  0.0305    0.00762724  0.0364377
 0.318129   0.145708  0.0364377   0.174074 

## Problem 2 (10 points)

**(a)** Give an exact count (a formula in terms of $m,n,p$) of the number of scalar multiplications required to compute the matrix product $AB$, where $A$ is an $m \times n$ matrix ($m$ rows and $n$ columns) and $B$ is an $n \times p$ matrix ($n$ rows and $p$ columns).

**(b)** Give an exact count (a formula in terms of $m$) of the number of scalar multiplications required to compute the matrix product $Ax$, where $A$ is an $m \times m$ matrix and $x$ is an $m$-component vector.   Explain how this is equivalent to your answer from part (a) in the special case …………………?

**(c)** Computing $ABx$ can be done by $(AB)x$ (first multiplying $AB$ and then multiplying by $x$) or by $A(Bx)$ (first multiplying $Bx$), because matrix multiplication is [associative](https://en.wikipedia.org/wiki/Associative_property).  If $A$ and $B$ are $1000 \times 1000$ matrices and $x$ is a 1000-component vector, explain why your answers from (a) and (b) imply that *one* of these ways of computing $ABx$ is *much* faster than the other way.

Try it out in Julia with some random matrices and compare the results to your prediction based on (a) and (b).

### Solution 

**(a)** If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, then $AB$ is an $m\times p$ matrix. Each entry of $AB$ is obtained as the product of a $1\times n$ row vector from $A$ with an $n\times1$ column vector from $B$, so it requires $n$ scalar multiplications. The total number of scalar multiplications required to compute $AB$ is then the number of entries in $AB$ (which is $mp$) multiplied by $n$. So the total number of scalar multiplications is $mnp$.
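
This count can be confirmed by instrumenting a naive triple-loop product. The code below is a sketch for illustration only: `counted_matmul` is a hypothetical name, and Julia's built-in `*` does not work this way internally (it calls optimized library routines).

```julia
# Naive matrix product that also counts scalar multiplications.
# `counted_matmul` is a hypothetical name used only for this illustration.
function counted_matmul(A, B)
    m, n = size(A)
    n2, p = size(B)
    n == n2 || throw(DimensionMismatch("middle dimensions $n and $n2 differ"))
    C = zeros(m, p)
    nmul = 0
    for i in 1:m, j in 1:p      # one pass per entry of C (m*p entries)
        for k in 1:n            # n multiplications per entry
            C[i, j] += A[i, k] * B[k, j]
            nmul += 1
        end
    end
    return C, nmul
end

A = rand(3, 4); B = rand(4, 5)
C, nmul = counted_matmul(A, B)
println(nmul == 3 * 4 * 5)      # true: m*n*p multiplications
println(C ≈ A * B)              # true: agrees with Julia's built-in product
```

Passing an $m \times m$ matrix and an $m \times 1$ matrix to the same function gives a count of $m^2$, the special case $n = m$, $p = 1$.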

**(b)** If $A$ is an $m\times m$ matrix and $x$ is an $m$-component column vector, then $Ax$ will be an $m$-component column vector. Each entry in this column vector is obtained by taking the product of a $1\times m$ row vector from $A$ with the $m\times1$ column vector $x$, which requires $m$ scalar multiplications. Since there are $m$ entries, the total number of scalar multiplications required to compute $Ax$ is $m^2$. Note that this is the special case of **(a)** where $n=m$ and $p=1$.

**(c)** From part **(a)**, we know that the multiplication $AB$ will require $1000^3 = 10^9$ scalar multiplications, and then the multiplication $(AB)x$ will require $1000^2 = 10^6$ more. Alternatively, the multiplications $Bx$ and $A(Bx)$ will each require $1000^2 = 10^6$ scalar multiplications. Calculating $ABx$ as $(AB)x$ (first multiplying $AB$ and then multiplying by $x$) will therefore require $10^9+10^6$ scalar multiplications, whilst calculating it as $A(Bx)$ (first multiplying $Bx$) will require only $2\times 10^6$, about 500 times fewer. We therefore expect $A(Bx)$ to be much faster.
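
The arithmetic behind these counts is easy to reproduce. This is a small sketch using the formulas from (a) and (b); the variable names are chosen here for illustration.

```julia
n = 1000
cost_AB_then_x = n^3 + n^2   # (AB)x: matrix-matrix product, then matrix-vector
cost_Bx_then_A = 2 * n^2     # A(Bx): two matrix-vector products
println(cost_AB_then_x)                   # 1001000000
println(cost_Bx_then_A)                   # 2000000
println(cost_AB_then_x / cost_Bx_then_A)  # 500.5
```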

We can check this using Julia:

In [29]:
A = rand(1000,1000)  # random 1000×1000 matrix (entries in [0,1))
B = rand(1000,1000)
x = rand(1000);       # random 1000-component vector

In [30]:
# technicality: turn off multi-threading to make it a bit
# easier to interpret benchmark timings:
using LinearAlgebra   # provides BLAS in Julia 1.x
BLAS.set_num_threads(1)

# time each way 3 times, and look at the smallest times to reduce noise
println("timing (A*B)*x:")
@time (A*B)*x
@time (A*B)*x
@time (A*B)*x
println("timing A*(B*x):")
@time A*(B*x)
@time A*(B*x)
@time A*(B*x);

timing (A*B)*x:
  0.071565 seconds (7 allocations: 7.637 MiB)
  0.092033 seconds (7 allocations: 7.637 MiB, 18.99% gc time)
  0.070474 seconds (7 allocations: 7.637 MiB)
timing A*(B*x):
  0.001242 seconds (6 allocations: 16.031 KiB)
  0.000905 seconds (6 allocations: 16.031 KiB)
  0.000681 seconds (6 allocations: 16.031 KiB)


As expected, $A(Bx)$ is faster than $(AB)x$. However, it is not faster by the factor of ~500 that you might naively expect from counting arithmetic operations; comparing the smallest timings above, the speedup is only about a factor of 100. Computers are complicated, and things like memory access have a big impact on performance. Understanding precisely what is going on here would be the subject of a computer-architecture course, but basically a matrix-matrix multiplication like $AB$ can use memory much more efficiently than a matrix-vector multiplication, which makes up some of the difference (but not enough to make $(AB)x$ the faster option).

## Problem 3 (10 points)

(From Strang, section 2.2, problem 14.) Consider Gaussian elimination on the following system of equations:

$$
2x + 5y + z = 0 \\
4x + dy + z = 2 \\
y - z = 3
$$

(Write your solution in matrix form.)

* What number $d$ forces you to do a row exchange during elimination, and what (non-singular) triangular system do you obtain for that $d$?
* What value of $d$ would make this system singular (no third pivot, i.e. no way to get a triangular system with 3 nonzero values on the diagonal)?

### Solution

In matrix form, the system of equations is: $$\begin{pmatrix} 2 & 5 & 1 \\ 4 & d & 1 \\ 0 & 1 & -1\end{pmatrix}\begin{pmatrix} x \\ y \\ z\end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 3\end{pmatrix}.$$  Now consider performing Gaussian elimination on the associated augmented $3 \times 4$ matrix:
$$\left(\begin{array}{ccc|c}  
 2 & 5 & 1 & 0\\
 4 & d & 1 & 2\\
 0 & 1 & -1 & 3
\end{array}\right).
$$
Subtracting twice the first row from the second to eliminate the (2,1) entry gives:
$$\left(\begin{array}{ccc|c}  
 2 & 5 & 1 & 0\\
 0 & d - 10 & -1 & 2\\
 0 & 1 & -1 & 3
\end{array}\right).
$$

**(i)** We need to do a row exchange (of the second and third rows) if the (2,2) entry is 0, i.e. $d - 10 = 0$, i.e. $d = 10$.

When $d = 10$, we have the augmented matrix:
$$\left(\begin{array}{ccc|c}  
 2 & 5 & 1 & 0\\
 0 & 0 & -1 & 2\\
 0 & 1 & -1 & 3
\end{array}\right).
$$  Exchanging the second and third rows gives:
$$\left(\begin{array}{ccc|c}  
 2 & 5 & 1 & 0\\
 0 & 1 & -1 & 3 \\
 0 & 0 & -1 & 2
\end{array}\right),
$$ a nonsingular triangular system corresponding to the system of linear equations
$$
2x + 5y + z = 0 \\
y - z = 3 \\
-z = 2.
$$

**(ii)** The system will be singular when there is no way to get a third pivot. This occurs exactly when the second and third rows of the $3 \times 3$ matrix above (ignoring the constants on the right) are scalar multiples of one another; as the (2,3) and (3,3) entries both equal $-1$, this happens exactly when $d - 10 = 1$, i.e. when $d = 11$.
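
Both answers can be checked numerically. The sketch below is an illustration rather than part of the original solution: `coeffs` and `eliminate_first_column` are hypothetical names, and the function performs only the first elimination step described above (subtracting twice the first row from the second).

```julia
# Coefficient matrix of the system as a function of d (hypothetical helper)
coeffs(d) = [2 5 1; 4 d 1; 0 1 -1]

# First elimination step: subtract 2×(row 1) from row 2
function eliminate_first_column(U)
    U = float.(U)              # work on a floating-point copy
    U[2, :] -= 2 * U[1, :]
    return U
end

U10 = eliminate_first_column(coeffs(10))
println(U10[2, 2])                  # 0.0: zero pivot, so rows 2 and 3 must be exchanged

U11 = eliminate_first_column(coeffs(11))
println(U11[2, :] == U11[3, :])     # true: rows 2 and 3 coincide, so there is no third pivot

# For d = 10 the system is nonsingular; back-substitution by hand gives (-1.5, 1, -2)
println(coeffs(10) \ [0, 2, 3] ≈ [-1.5, 1, -2])   # true
```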

## Problem 4 (10 points)

(From Strang, section 2.2, problem 11.)

A system of linear equations Ax=b cannot have *exactly two* solutions. An easy way to see why: if two vectors x and y≠x are two solutions (i.e. Ax=b and Ay=b), what is another solution? (Hint: x+y is almost right.)

### Solution

$Ax=b$ and $Ay=b$, so $A(x+y)=Ax+Ay=2b$.  [The key property that $A(x+y)=Ax+Ay$ is called *linearity*, and is what makes matrix multiplication a part of *linear* algebra.]   But we want $b$ on the right-hand side, so we can just divide both sides by 2: $A((x+y)/2) = b$, so $(x+y)/2$ is a solution.  (Since $x \ne y$, this is a *new* solution, halfway between $x$ and $y$.)

In fact, there are infinitely many solutions: anything on the line connecting $x$ and $y$.   Let $z = tx + (1-t)y$ for any number $t$.  Then $z$ lies on the line connecting $x$ and $y$, and in fact as $t$ varies over all real numbers the vector $z$ traverses this entire line (check this on paper with your favorite vectors $x$ and $y$ in the plane!).  Then $z$ is another solution, again thanks to linearity: $$Az = A(tx + (1-t)y) = tAx + (1 - t)Ay = tb + (1 - t)b = b.$$
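
To see this concretely, here is a small sketch with a made-up singular system (the matrix and vectors below are not from the pset; they are chosen only so that two distinct solutions are easy to write down).

```julia
# Hypothetical example (not from the pset): a singular system with two known solutions
A = [1 1; 2 2]
b = [2, 4]
x = [2, 0]
y = [0, 2]
println(A * x == b, " ", A * y == b)   # true true

# Every point z = t*x + (1-t)*y on the line through x and y also solves Az = b
for t in (-1.0, 0.5, 3.0)
    z = t * x + (1 - t) * y
    println("t = ", t, ": A*z ≈ b is ", A * z ≈ b)
end
```

The case $t = 0.5$ is the midpoint $(x+y)/2$ found above.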

## Problem 5 (15 points)

Suppose we want to solve $Ax=b$ for **more than one right-hand side** $b$.  For example, suppose
$$
A = \begin{pmatrix} 1 & 6 & -3 \\ -2 & 3 & 4 \\ 1 & 0 & -2 \end{pmatrix}
$$
and we want to solve *both* $Ax_1 = b_1$ and $Ax_2 = b_2$ for the right-hand sides:
$$
b_1 = \begin{pmatrix} 7 \\ 3 \\ 0 \end{pmatrix}, \; b_2 = \begin{pmatrix} 0 \\ -2 \\ 1 \end{pmatrix}.
$$

**(a)**

Show that solving *both* $Ax_1 = b_1$ and $Ax_2 = b_2$ is equivalent to solving $AX = B$ where $X$ is an unknown matrix (of what shape?) and $B$ is a given matrix on the right-hand side.   Give $B$ explicitly, and relate $X$ to your desired solutions $x_1$ and $x_2$.

(Hint: think about the "matrix × columns" viewpoint of matrix multiplication.)

**(b)**

Solve your $AX=B$ equation by forming the augmented matrix $\begin{pmatrix} A & B\end{pmatrix}$, reducing it to upper-triangular form (once), and doing back-substitution (twice) to obtain $X$ and hence $x_1$ and $x_2$.

**(c)**

You can solve $AX = B$ in Julia by the code `X = A \ B`.  The matrix $A$ is given below in Julia.   Enter the matrix $B$, compute `X = A \ B`, and verify that it matches the answer you computed by hand in (b).

### Solution

**(a)** The matrix product $AX$, where $A$ is a $3\times 3$ matrix and $X$ is a $3\times 2$ matrix, is a $3\times 2$ matrix. Recall from lecture 2 that the first column of $AX$ is obtained by taking the product $A$(first column of $X$), while the second column of $AX$ is obtained by taking the product $A$(second column of $X$).

Solving $Ax_1 = b_1$ and $Ax_2 = b_2$ is therefore equivalent to solving $AX=B$, where the columns of $B$ are $b_1$ and $b_2$: $$B = \begin{pmatrix} 7 & 0 \\ 3 & -2 \\ 0 & 1 \end{pmatrix},$$ and $$X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{pmatrix},$$ where we have defined $$ x_1 = \begin{pmatrix} x_{11}  \\ x_{21} \\ x_{31}  \end{pmatrix}, \; x_2 = \begin{pmatrix} x_{12}  \\ x_{22} \\ x_{32}  \end{pmatrix}.$$
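
The "matrix × columns" viewpoint used here can be sanity-checked numerically. In this sketch `X` is just an arbitrary random $3\times 2$ matrix standing in for the unknown, not the solution itself.

```julia
A = [1 6 -3; -2 3 4; 1 0 -2]
X = rand(3, 2)                          # any 3×2 matrix will do
println((A * X)[:, 1] ≈ A * X[:, 1])    # true: column 1 of AX is A times column 1 of X
println((A * X)[:, 2] ≈ A * X[:, 2])    # true: column 2 of AX is A times column 2 of X
```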

**(b)** We can form the augmented matrix $\begin{pmatrix} A & B\end{pmatrix}$:
$$ \left(\begin{array}{ccc|cc}  
 1 & 6 & -3 & 7 & 0\\
 -2 & 3 & 4 & 3 & -2\\
 1 & 0 & -2 & 0 & 1
\end{array}\right).
$$
We can then add $2$ times the first row to the second row, and subtract the first row from the third row, to yield:
$$ \left(\begin{array}{ccc|cc}  
 1 & 6 & -3 & 7 & 0\\
 0 & 15 & -2 & 17 & -2\\
 0 & -6 & 1 & -7 & 1
\end{array}\right).
$$
and then we can add $2/5$ times the second row to the third row to yield:
$$ \left(\begin{array}{ccc|cc}  
 1 & 6 & -3 & 7 & 0\\
 0 & 15 & -2 & 17 & -2\\
 0 & 0 & 1/5 & -1/5 & 1/5
\end{array}\right).
$$
This gives us two triangular systems that we can solve by back-substitution. The first is
$$
x_{11} + 6x_{21} -3x_{31} = 7\\
15x_{21} - 2x_{31} = 17\\
1/5 x_{31} = -1/5.
$$
which has the solution $x_{11} = -2, x_{21} = 1, x_{31} = -1$. The second is 
$$
x_{12} + 6x_{22} -3x_{32} = 0\\
15x_{22} - 2x_{32} = -2\\
1/5 x_{32} = 1/5.
$$
which has the solution $x_{12} = 3, x_{22} = 0, x_{32} = 1$.

Therefore, the solution of $AX =B$ is 
$$X = \begin{pmatrix} -2 & 3 \\ 1 & 0 \\ -1 & 1 \end{pmatrix},$$
or equivalently
$$ x_1 = \begin{pmatrix} -2  \\ 1 \\ -1 \end{pmatrix}, \; x_2 = \begin{pmatrix} 3  \\ 0 \\ 1  \end{pmatrix}$$
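
The hand computation above can be replayed step by step in exact rational arithmetic. This is a sketch for checking the arithmetic, with hypothetical names `aug` (the augmented matrix) and `backsubstitute`; it hard-codes the three row operations used above rather than implementing general Gaussian elimination.

```julia
# Back-substitution for an upper-triangular system U*x = c.
# `backsubstitute` is a hypothetical helper written for this illustration.
function backsubstitute(U, c)
    n = length(c)
    x = zeros(eltype(U), n)
    for i in n:-1:1
        s = c[i]
        for j in i+1:n
            s -= U[i, j] * x[j]    # subtract the already-known terms
        end
        x[i] = s / U[i, i]         # divide by the pivot
    end
    return x
end

# Augmented matrix (A | B) with exact rational entries
aug = Rational{Int}[1 6 -3 7 0; -2 3 4 3 -2; 1 0 -2 0 1]

aug[2, :] += 2 * aug[1, :]          # row 2 + 2×(row 1)
aug[3, :] -= aug[1, :]              # row 3 − row 1
aug[3, :] += (2 // 5) * aug[2, :]   # row 3 + (2/5)×(row 2)

U = aug[:, 1:3]                     # upper-triangular part, shared by both systems
x1 = backsubstitute(U, aug[:, 4])
x2 = backsubstitute(U, aug[:, 5])
println(x1 == [-2, 1, -1])   # true
println(x2 == [3, 0, 1])     # true
```

Elimination is done once on the whole augmented matrix, and only the back-substitution is repeated for each right-hand side.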

**(c)** We can also solve $AX = B$ in Julia, as shown below:

In [27]:
A =  [ 1  6 -3
     -2  3  4
      1  0 -2 ];
B = [ 7 0 
      3 -2
      0 1 ];

In [28]:
X = A \ B   # solve AX = B for X

3×2 Array{Float64,2}:
 -2.0  3.0
  1.0  0.0
 -1.0  1.0

which is exactly what we found in part **(b)**. 