# Linear Algebra 

This notebook provides a brief overview of the following concepts in Linear Algebra:

- <a href='#Vectors and Matricies'>Vectors and Matrices</a>
- <a href='#Matrix Operations'>Matrix Operations</a>
- <a href='#Vector Spaces'>Vector Spaces</a>
- <a href='#The Determinant'>The Determinant</a>
- <a href='#Inverses'>Inverses</a>
- <a href='#Eigenvectors and Eigenvalues'>Eigenvectors and Eigenvalues</a>
- <a href='#Diagonalisation'>Diagonalisation</a>
- <a href='#Idempotentness'>Idempotentness</a>
- <a href='#Quadratic Forms'>Quadratic Forms</a>
- <a href='#Matrix Calculus'>Matrix Calculus</a>


There are many better resources on the internet for understanding the proofs and theory behind these concepts. This notebook simply aims to use R to run simulations of each of the concepts covered in order to gain a more practical and intuitive understanding of what is being done and why. We use lots of examples along the way to help keep the content (which can definitely be a bit dry at times) interesting.

### Vectors and Matrices <a id='Vectors and Matricies'></a>



Anyone who has ever used Microsoft Excel has some understanding of a **matrix**. OK, that's not quite true, but it's not far from the truth either. Anyone who has ever used Microsoft Excel *with only numbers in each cell* has some understanding of a matrix. What about **vectors**? Well, anyone who has ever used a column in Microsoft Excel *with only numbers in each cell* has some understanding of a vector. Not so sure? Well, let's look at some examples.

$V=\pmatrix{1  \cr 2  \cr 3 \cr}$ is an example of a vector with dimensions $3 \times 1$ and $M=\pmatrix{1 & 2  \cr 4 & 5   \cr 7 & 8  \cr}$ is an example of a matrix with dimensions $3 \times 2$ (the $n \times k$ notation refers to the number of rows of a matrix/vector followed by the number of columns).

The first obvious question is: why is V a vector but M a matrix? The answer is that a vector is any array with dimensions $n \times 1$, so it essentially forms a *column*, like a column in Excel. What about M? M is just a collection of columns, and because it has more than one column it is called a matrix. Matrices (the plural of matrix) come in all shapes and sizes, as we'll see in the next section. Matrices are going to get a lot more of our love and attention for the next little while, but don't forget about vectors. In the section after next (<a href='#Vector Spaces'>Vector Spaces</a>) they're going to be extremely important.
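
As a quick sketch of the example above in R (the names `V` and `M` simply mirror the notation in the text), `matrix()` can build both objects and `dim()` reports the number of rows followed by the number of columns:

```r
# The 3 x 1 vector V and the 3 x 2 matrix M from the example above
V <- matrix(c(1, 2, 3), ncol = 1)
M <- matrix(c(1, 2,
              4, 5,
              7, 8), nrow = 3, byrow = TRUE)

dim(V)  # rows, then columns: 3 1
dim(M)  # rows, then columns: 3 2
```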



I guess the last key thing we need to cover here is ***who cares about matrices***?? One answer: economists. When we do econometrics we use data to try to find trends in an economy. Very often this data will be structured as observations and variables. Observations are the things providing us with our data. They could be countries, people, counties, businesses and loads of other things. Variables are the information they are providing us with. If people responding to a survey declare their income this would be stored under a variable named 'Income'. Countries produce statistics on their inflation levels, each of which could be stored in a variable named 'Inflation'. There are countless examples of these kinds of observation/variable combinations, and very often this is how the data we use in econometrics is structured. Hopefully the Excel example is starting to make more sense.

Matrix algebra is so key because in econometrics we need to be able to multiply whole rows, columns and datasets to find coefficients which help us understand trends in the data. None of this is done by hand, but to understand why R, Stata, SAS or any other software gives us the result it does, it is crucial we understand what is going on under the hood. After all, lots of this software is so easy to use that a smart 10 year old could run it. It takes Linear Algebra to understand why it works, how to fix it when it breaks and all the traps that we can easily fall into without noticing. So with that out of the way, let's get started.





### Matrix Operations  <a id='Matrix Operations'></a>



We talk about matrices under the topic of Linear Algebra because matrices are an algebraic structure. Most of us remember rules from our high school algebra courses like $a+b=b+a$ and $ab=ba$, but the trouble with matrices is that these rules don't always hold. Let's have a better look at what does and doesn't hold and when.

There are really only two operations we need to consider: addition and multiplication. We can get all the rest from these two. The key with addition is that as long as our matrices (or vectors) have exactly the same dimensions then matrix addition is just regular addition for each point. So: 

$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$ + 
$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$=
$\pmatrix{ 2&  2& 2 \cr   2& 2 & 2 \cr 2 &  2 &  2\cr}$

and 

$\pmatrix{ 1&  4& 4 \cr   2& 8 & 1 \cr 7 &  0 &  3\cr 8 &  9 &  3\cr 2 &  5 &  3\cr}$ + 
$\pmatrix{ 6&  5& 2 \cr   3& 8 & 2 \cr 3 &  5 &  6\cr 0 &  1 &  3\cr 2 &  2 &  3\cr}$ = 
$\pmatrix{ 7&  9& 6 \cr   5& 16 & 3 \cr 10 &  5 &  9\cr 8 &  10 &  6\cr 4 &  7 &  6\cr}$

but 

$\pmatrix{ 1&  4& 4 \cr   2& 8 & 1 \cr 7 &  0 &  3\cr 8 &  9 &  3\cr 2 &  5 &  3\cr}$ + 
$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$ is not defined, as these matrices do not have the same dimensions.

Subtraction is just the opposite of addition, so: 

$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$ - 
$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$=
$\pmatrix{ 0&  0& 0 \cr   0& 0 & 0 \cr 0 &  0 &  0\cr}$

and 

$\pmatrix{ 1&  4& 4 \cr   2& 8 & 1 \cr 7 &  0 &  3\cr 8 &  9 &  3\cr 2 &  5 &  3\cr}$ - 
$\pmatrix{ 6&  5& 2 \cr   3& 8 & 2 \cr 3 &  5 &  6\cr 0 &  1 &  3\cr 2 &  2 &  3\cr}$ = 
$\pmatrix{ -5&  -1& 2 \cr   -1& 0 & -1 \cr 4 &  -5 &  -3\cr 8 &  8 &  0\cr 0 &  3 &  0\cr}$

but 

$\pmatrix{ 1&  4& 4 \cr   2& 8 & 1 \cr 7 &  0 &  3\cr 8 &  9 &  3\cr 2 &  5 &  3\cr}$ - 
$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$ is not defined, as once again these matrices do not have the same dimensions.

If this seems tedious then we should let R do more of the work for us:

We define $A=\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr} $, $B=\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$

by:

In [1]:
A<-matrix(c(1,1,1,
            1,1,1,
            1,1,1),nrow=3, byrow=T) 
B<-matrix(c(1,1,1,
            1,1,1,
            1,1,1),nrow=3, byrow=T)

Then $\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr} $+ $\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$ is simply:

In [2]:
A+B

0,1,2
2,2,2
2,2,2
2,2,2


Doing the same for the next two we define $C=\pmatrix{ 1&  4& 4 \cr   2& 8 & 1 \cr 7 &  0 &  3\cr 8 &  9 &  3\cr 2 &  5 &  3\cr}$ , $D=\pmatrix{ 6&  5& 2 \cr   3& 8 & 2 \cr 3 &  5 &  6\cr 0 &  1 &  3\cr 2 &  2 &  3\cr}$, so: 

In [3]:
C<-matrix(c(1,4,4,
            2,8,1,
            7,0,3,
            8,9,3,
            2,5,3),nrow=5, byrow=T) 
D<-matrix(c(6,5,2,
            3,8,2,
            3,5,6,
            0,1,3,
            2,2,3),nrow=5, byrow=T) 

In [4]:
C+D

0,1,2
7,9,6
5,16,3
10,5,9
8,10,6
4,7,6


And the subtraction and invalid dimensions results follow in a predictable way:

In [5]:
A-B
C-D

0,1,2
0,0,0
0,0,0
0,0,0


0,1,2
-5,-1,2
-1,0,-1
4,-5,-3
8,8,0
0,3,0


So everything we expected held (running A+C would give us an error, but feel free to try it for yourself). What about multiplication? Multiplication is a little trickier because it doesn't work 'point-by-point' - we can't just multiply each entry with its corresponding entry like we did for addition. Instead we find the entries in our product matrix (the result of multiplying two matrices) by multiplying the rows of the first matrix with the columns of the second and adding the products, so:

$\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr} \times \pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr}$=
$\pmatrix{ 3&  3& 3 \cr   3& 3 & 3 \cr 3 &  3 &  3\cr}$ We can check this with R:

In [6]:
A%*%B

0,1,2
3,3,3
3,3,3
3,3,3


So the obvious question is why all 3's? What is going on? All we do when multiplying a matrix is take the first row of our left matrix, so $\pmatrix{ 1&  1& 1 \cr}$  and the first column of our right matrix $\pmatrix{ 1 \cr  1 \cr 1 \cr}$ and then multiply the first element with first element, the second with the second ... and sum the results. So because $1\times1+1\times1+1\times1=3$ all our entries will be 3's!
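
To make the row-times-column rule concrete, here is a minimal sketch that builds a product one entry at a time and compares it with R's built-in `%*%`. The helper name `multiply_by_hand` is made up for this illustration and is not part of base R.

```r
# Hypothetical helper: multiply two matrices entry by entry
multiply_by_hand <- function(X, Y) {
  stopifnot(ncol(X) == nrow(Y))          # every row entry needs a partner
  out <- matrix(0, nrow = nrow(X), ncol = ncol(Y))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(ncol(Y))) {
      # row i of X times column j of Y, then add up the products
      out[i, j] <- sum(X[i, ] * Y[, j])
    }
  }
  out
}

ones <- matrix(1, nrow = 3, ncol = 3)    # the all-ones matrix used above
multiply_by_hand(ones, ones)             # every entry is 1*1 + 1*1 + 1*1 = 3
all.equal(multiply_by_hand(ones, ones), ones %*% ones)
```

The double loop is slow compared with `%*%`, so it is only meant to show what the operator is doing.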

Let's see another example to make it a bit clearer:

$\pmatrix{ 3&  1 \cr   5& 4  \cr} \times 
 \pmatrix{ 1&  2 \cr   3& 1  \cr}$ can be solved by taking the first row $\pmatrix{ 3&  1 }$ and the first column $\pmatrix{ 1 \cr  3 }$, and adding the products, so $3\times1+1\times3=6$, so the product matrix must look like $\pmatrix{ 6&   \cr   &   \cr}$. Then we multiply the first row $\pmatrix{ 3&  1 }$ by the second column $\pmatrix{ 2 \cr  1 }$, so $3\times2+1\times1=7$ to get our second entry $\pmatrix{ 6& 7  \cr   &   \cr}$. Following this logic we can quickly get to   $\pmatrix{ 6& 7  \cr  17 &   \cr}$ and then $\pmatrix{ 6& 7  \cr 17  &  14 \cr}$, which is our answer! What does R have to say?

In [7]:
C<-matrix(c(3,1,
            5,4),nrow=2, byrow=T) 
D<-matrix(c(1,2,
            3,1),nrow=2, byrow=T)
C%*%D

0,1
6,7
17,14


Exactly as expected! It might seem like an obvious point but clearly this method requires us to have a pair of numbers for each of the individual multiplication operations. For example if we tried to find $\pmatrix{ 1&  1& 1 \cr   1& 1 & 1 \cr 1 &  1 &  1\cr} \times \pmatrix{ 3&  1 \cr   5& 4  \cr}$ then our first operation would come from $\pmatrix{ 1&  1& 1 \cr}$  and $\pmatrix{ 3 \cr  5 }$, which if we tried to solve we'd want to find $1\times3+1\times5+1\times$ which clearly makes no sense because there is no third number for the last 1 to be multiplied by. What does R say?



In [8]:
A<-matrix(c(1,1,1,
            1,1,1,
            1,1,1),nrow=3, byrow=T) 
C<-matrix(c(3,1,
            5,4),nrow=2, byrow=T) 

If we ran: 

In [9]:
#A %*% C#

R would return "Error in A %*% C: non-conformable arguments", which is its way of saying we can't do this. Why not? For exactly the reason we just saw: the number of elements in each row of the left matrix is different from the number of elements in each column of the right matrix. We can generalise this condition by saying we need the dimensions of the matrices M and N to be $m \times k$ and $k \times n$. So matrices of dimensions $1 \times 4$ and $4 \times 6$, or for example $6 \times 3$ and $3 \times 1312312$, are totally fine, but $1 \times 4$ and $3 \times 6$, or $6 \times 4$ and $3 \times 1312312$, are not.
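
The dimension rule can be checked before multiplying. The sketch below uses a made-up helper, `can_multiply`, and placeholder matrices filled with zeros (only their shapes matter here). `tryCatch()` is used so that the error message can be inspected without halting the notebook.

```r
# Hypothetical helper: can X %*% Y be computed?
can_multiply <- function(X, Y) ncol(X) == nrow(Y)

X1 <- matrix(0, nrow = 1, ncol = 4)
Y1 <- matrix(0, nrow = 4, ncol = 6)
Y2 <- matrix(0, nrow = 3, ncol = 6)

can_multiply(X1, Y1)   # TRUE:  1 x 4 times 4 x 6
can_multiply(X1, Y2)   # FALSE: 1 x 4 times 3 x 6
dim(X1 %*% Y1)         # an m x k times a k x n matrix is m x n, here 1 x 6

# Capture the error text instead of stopping
msg <- tryCatch(X1 %*% Y2, error = function(e) conditionMessage(e))
msg
```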

Hopefully this makes a little more sense now, but if not then remember R is your friend. Try out a few examples yourself!
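
One experiment worth trying picks up the earlier remark about $a+b=b+a$ and $ab=ba$. The sketch below reuses the two $2\times 2$ matrices from the worked example, renamed `G` and `H` here (names chosen only for this illustration) so that nothing defined above is overwritten.

```r
G <- matrix(c(3, 1,
              5, 4), nrow = 2, byrow = TRUE)
H <- matrix(c(1, 2,
              3, 1), nrow = 2, byrow = TRUE)

identical(G + H, H + G)       # TRUE: addition doesn't care about order
G %*% H                       # rows 6 7 and 17 14, as in the worked example
H %*% G                       # rows 13 9 and 14 7: a different matrix
identical(G %*% H, H %*% G)   # FALSE: multiplication does care about order
```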

There are a few more operations we can do with matrices but we'll introduce these as we need them.

### Vector Spaces<a id='Vector Spaces'></a>



A heads up about the rest of this notebook: a lot of these concepts are a bit scary-sounding (WTH is an eigenvector??). Vector spaces may sound scary too, but the good news is that in a sense you are reading this on one. That's right, any 2-D object can be thought of as a part of the vector space $\mathbb{R}^2$. How? Well, every point on the 2-D object (let's use this screen, for example) can be thought of as a pair of an x-coordinate and a y-coordinate. Because of this, these coordinates can be written as a $2\times1$ vector $\pmatrix{x \cr y}$. A vector space is just the mathematical concept behind what we already know as actual space! If this kind of makes sense then hopefully you won't be too shocked to hear that if you look up from your screen and start looking around then you'll be staring at a part of the vector space $\mathbb{R}^3$, because every point in your room/library/wherever can be written as a $3\times1$ vector $\pmatrix{x \cr y \cr z}$. 

OK, let me be clear about something: your laptop screen isn't *actually* a part of the vector space $\mathbb{R}^2$ and your room isn't *actually* a part of $\mathbb{R}^3$. They're just analogies. But the reason we use them is that they're extremely useful in getting us to think about what vectors actually are. This is especially useful because if we understand $\mathbb{R}^2$ and $\mathbb{R}^3$ well enough then we can start thinking about $\mathbb{R}^4$, $\mathbb{R}^5$ and $\mathbb{R}^n$, where n could be any whole number (one billion dimensions, anyone?). Clearly any value of n larger than 3 does not accommodate a visual analogy but the maths continues in the exact same way. What's more, if we want to do econometrics then every observation in our sample will get its own dimension, so 17,732 people in a survey means we need to be comfortable working with the vector space $\mathbb{R}^{17732}$. This might sound impossible now, but if we get a solid understanding of what vector spaces are then this will be pretty manageable - one day even easy!

So what do we actually mean when we say vector space? Well, to start with, a vector space is a set. Sets are just collections of objects and we introduced them back in the notebook 'Important Distributions I'. So vector spaces are sets, but they're special sets. They're special because they satisfy two extremely important properties: **closure** under **pointwise addition** and closure under **scalar multiplication**. Closure means that the result of our operation *stays in* the set. So for example the set of all the positive whole numbers 1, 2, 3 etc (AKA natural numbers) is closed under addition because for any two positive whole numbers like 2 and 3, when we add them together we get a third (5) which is still a positive whole number! The set of the positive 'half' numbers $\frac{1}{2},\frac{3}{2},\frac{5}{2}$ etc is *not* closed under addition because clearly $\frac{1}{2}+\frac{3}{2}=2$ and 2 isn't a 'half number'.

This same idea extends to our vector spaces. In a vector space, say $\mathbb{R}^3$, *any* vector added to *any* other vector gives us a third vector which will still be in the vector space. For example $\pmatrix{1 \cr 1 \cr 1}+\pmatrix{2 \cr 2 \cr 2}=\pmatrix{3 \cr 3 \cr 3}$, and $\pmatrix{3 \cr 3 \cr 3}$ is in $\mathbb{R}^3$ (because it is a vector of dimension $3\times1$ with real entries)! When we say pointwise addition we simply mean addition like we defined it for matrices in the last section. 

The same concept applies to scalar multiplication. Scalar multiplication is just multiplying a vector by a constant (a scalar), which could be any real (non-imaginary) number. So $5\times \pmatrix{1 \cr 1 \cr 1}$ is a valid form of scalar multiplication, which gives the result $\pmatrix{5 \cr 5 \cr 5}$ - a vector in $\mathbb{R}^3$.
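
Both closure examples can be reproduced in a couple of lines. This is only a sketch: it checks the two specific calculations from the text rather than proving closure in general, and the names `v` and `w` are chosen just for this illustration.

```r
v <- c(1, 1, 1)
w <- c(2, 2, 2)

v + w   # 3 3 3: pointwise addition, still three real entries
5 * v   # 5 5 5: scalar multiplication, still three real entries

# Both results have the same length as the vectors we started with
length(v + w) == 3 && length(5 * v) == 3
```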

If you're wondering when we're going to start using R in this section then sit tight - we're building up the questions and ideas which R makes *much much* easier to deal with than using pen and paper. Back to vector spaces. The reason vector spaces are so important is because vectors are so important, and the reason vectors are so important is because matrices are so important. Matrices are just collections of vectors, because each column in a matrix of dimension $n\times k$ is a vector of dimension $n\times 1$. A really important question in linear algebra is whether any of the columns/vectors in a matrix are linear combinations of other columns/vectors. By linear combinations we mean just that: a combination which is linear. So the vector $\pmatrix{1 \cr 2}$ is a linear combination of $\pmatrix{1 \cr 0}$ and $\pmatrix{0 \cr 1}$ because $\pmatrix{1 \cr 0}+2\pmatrix{0 \cr 1}= \pmatrix{1 \cr 2}$.


The reason this is so important is a little abstract but it's essentially about 'new information'. We're trying to figure out whether adding a new column to our matrix gives us any new information, or whether we already knew what is being added. Let's take an econometric example: say we add the distance in kilometres of the respondents' commutes to our regression. If we knew nothing about their commutes before we did this then we could be adding new information which could help us draw conclusions about the sample. But if we already knew the distance of their commutes in miles then we've learned nothing! That's because one mile is approximately 1.6 kilometres, so our column/vector/variable 'Commute in Kilometres', which could look like:  $\pmatrix{13.53\cr 43.3\cr1.4\cr9.32\cr\vdots \cr 0}$ 

is just $1.6$ times our column/vector/variable 'Commute in Miles', or $1.6 \times \pmatrix{8.45\cr 27.06\cr0.87\cr5.83\cr\vdots \cr 0}$.

What has this got to do with vector spaces? Three things: **basis**, **rank** and **span**. A basis of a vector space is a set of vectors such that any other vector in the vector space can be written as a linear combination of the basis vectors, with no redundant vectors in the set (we make this precise below). It should be pretty easy to see that in $\mathbb{R}^2$ the vectors $\pmatrix{1 \cr 0},\pmatrix{0 \cr 1}$ do the job, because any vector in $\mathbb{R}^2$ (say $\pmatrix{23 \cr 4}$) can be written as a linear combination of $\pmatrix{1 \cr 0},\pmatrix{0 \cr 1}$ (in this case $23\times\pmatrix{1 \cr 0}+4\times\pmatrix{0 \cr 1}$). So $\pmatrix{1 \cr 0},\pmatrix{0 \cr 1}$ form a basis of $\mathbb{R}^2$. Likewise in $\mathbb{R}^3$, $\pmatrix{1 \cr 0\cr 0},\pmatrix{0 \cr 1\cr 0},\pmatrix{0 \cr 0\cr 1}$ form a valid basis, and in $\mathbb{R}^n$, so do the vectors $\pmatrix{1 \cr 0 \cr \vdots \cr 0},\pmatrix{0 \cr 1 \cr \vdots \cr 0},...,\pmatrix{0 \cr 0 \cr \vdots \cr 1}$ (there are n of these).

The key thing to note here is that these aren't the *only* possible bases (plural of basis) for these vector spaces. The reason that they are bases at all is that, as well as reaching every vector, none of them can be written as a linear combination of the others. There is no real number $\alpha$ such that $\alpha \times \pmatrix{1 \cr 0}=\pmatrix{0 \cr 1}$, and there are no real numbers $\beta \text{ and } \gamma$ such that $\pmatrix{1 \cr 0\cr 0}=\beta\pmatrix{0 \cr 1\cr 0}+\gamma\pmatrix{0 \cr 0\cr 1}$. Vectors like this are called **linearly independent**. So if we want to find a basis of $\mathbb{R}^n$ all we need to do is find n vectors which are linearly independent. The easiest way to do that is to use the n vectors of the form $\pmatrix{1 \cr 0 \cr \vdots \cr 0},\pmatrix{0 \cr 1 \cr \vdots \cr 0},...,\pmatrix{0 \cr 0 \cr \vdots \cr 1}$, which have the special name the 'standard basis vectors'.
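
To see that the standard basis is not the only option, here is a small sketch with a different pair of linearly independent vectors in $\mathbb{R}^2$. The pair $\pmatrix{1 \cr 1},\pmatrix{1 \cr -1}$ is picked purely for illustration, and `solve()` (base R's linear-system solver, which appears again in the Inverses section) is used to find the weights.

```r
# Hypothetical alternative basis of R^2, stored as the columns of B_alt
B_alt  <- cbind(c(1, 1), c(1, -1))
target <- c(23, 4)

# solve() finds the weights a and b with a*(1,1) + b*(1,-1) = (23,4)
weights <- solve(B_alt, target)
weights                                             # 13.5 and 9.5

# Rebuilding the target from the weighted basis vectors gives 23 4
weights[1] * B_alt[, 1] + weights[2] * B_alt[, 2]
```

So the same vector has weights 23 and 4 in the standard basis but 13.5 and 9.5 in this one.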

What about rank?

The rank of a matrix is the number of linearly independent columns/vectors it contains. So the matrix $\pmatrix{1 &0& 0 \cr 0 & 1 & 0\cr 0 & 0 &1}$ (which is just the matrix of the vectors $\pmatrix{1 \cr 0\cr 0}, \pmatrix{0 \cr 1\cr 0}, \pmatrix{0 \cr 0\cr 1}$) has rank 3, because it has 3 linearly independent columns.

The matrix $\pmatrix{1 &0& 0 & 0 & 2 & 4 & 0 & 5 &1 & 4 & 0 &5\cr 0 & 1 &0 & 4 & 4 &17 & 3 &11 & 8 & 6 &1 & 7\cr 0 & 0 &1 & 4 & 0 &5 &7 &5 &2 &4 &4 &1}$ can only have rank 3: it contains the standard basis vectors of $\mathbb{R}^3$, so every other column is a linear combination of those three and can't add to the count of linearly independent columns!

It is extremely tedious to find the rank of a matrix with pen and paper, so this is where R helps us a lot. Take the matrix $\pmatrix{ 3&  1 \cr   5& 4  \cr}$. R finds its rank much faster than we can!

In [10]:
a<-c(3,1,
     5,4)
C<-matrix(a,nrow=2, byrow=T) 

In [11]:
qr(C)$rank
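
The same `qr()` call connects back to the two earlier points about 'new information'. The sketch below uses made-up commute figures (one column an exact multiple of the other, using the rough fact that a mile is about 1.6 kilometres) and a hypothetical wide matrix `W` that contains the standard basis vectors of $\mathbb{R}^3$.

```r
# Hypothetical commute data: the second column is an exact multiple of the first
miles      <- c(8.45, 27.06, 0.87, 5.83, 0)
kilometres <- 1.6 * miles
commute    <- cbind(miles, kilometres)
qr(commute)$rank   # 1: two columns, but only one carries information

# Hypothetical 3 x 5 matrix: the standard basis vectors plus two extra columns
W <- cbind(diag(3), c(4, 4, 4), c(2, 0, 7))
qr(W)$rank         # 3: the extra columns add nothing new
```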

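The span of a matrix is the set of all vectors that can be written as linear combinations of its columns, and the rank tells us how many dimensions that set has. So a $2 \times 2$ matrix of rank 2 spans $\mathbb{R}^2$ and a $17732 \times 17732$ matrix of rank 17732 spans $\mathbb{R}^{17732}$. 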

This was quite a theoretical section but it is essential for us to understand these concepts in order to do econometrics. 

 ### The Determinant<a id='The Determinant'></a>



Once again we are confronted with a strange sounding concept, but the level of understanding we are looking for isn't too hard to grasp. What's more it has a pretty useful link with our previous section (especially the idea of rank) and leads us nicely to another very useful concept: inverses. So what is the determinant of a matrix? It's a number. What kind of a number? Well that depends on the size of the matrix. The first thing to note is that we can only find determinants of square matrices (of dimensions $n\times n$). The second is that the determinant of a 1$\times$1 matrix is just itself. So det$\pmatrix{5}=5$. The third thing is that the determinant of a 2$\times$2 matrix is the top-left element multiplied by the bottom-right element minus the bottom-left element multiplied by the top-right element. So:

det$\pmatrix{4 &4 \cr 2 &7}=28-8=20$

That's the story of the determinant for matrices of dimensions $1\times 1$ and $2\times 2$, but what about larger matrices? Well to calculate the determinant of any matrix of size $3\times 3$ or larger we split the matrix into $2\times 2$ matrices and then use those to find the determinant of the larger matrix. How do we split our matrix up? Well in a $3\times 3$, for each entry we just ignore the row and column that entry is in and find the determinant of the $2\times 2$ matrix we are left with. So for: 

$\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$ we start by ignoring the first row and column, or considering $\pmatrix{ &  &  \cr   & 2 &6 \cr  &  6 &  5\cr}$ and then finding the determinant of the underlying 2$\times$2 matrix $\pmatrix{ 2 &6 \cr   6 &  5\cr}$ (which is $2\times5-6\times6=-26$).

These 'mini-determinants' are called **minors** and each element in the matrix has one. Attaching a sign of either 1 or -1 to each minor, based on where the element falls in the matrix: $\pmatrix{ 1&  -1& 1 \cr   -1& 1 &-1 \cr 1 &  -1 &  1\cr}$, gives that element's **cofactor**. To find the determinant we pick any single row or column, multiply each element in it by its cofactor, and sum the results! We aren't going to do a fully worked example here because it's not particularly instructive, but this is the process (one of many) of finding the determinant of a $3\times 3$ matrix. 
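
For readers who would still like to see the numbers, here is a minimal sketch of the process in R, expanding along the first row of the $3\times 3$ matrix above. The names `M3`, `minor` and `signs` are made up for this illustration, and the $2\times 2$ minors are computed with base R's `det()`.

```r
M3 <- matrix(c(3, 4, 5,
               3, 2, 6,
               9, 6, 5), nrow = 3, byrow = TRUE)

# Minor of entry (i, j): determinant of what is left after deleting row i and column j
minor <- function(X, i, j) det(X[-i, -j, drop = FALSE])

# Expand along the first row: entry * sign * minor, then add up
signs <- c(1, -1, 1)
terms <- sapply(1:3, function(j) M3[1, j] * signs[j] * minor(M3, 1, j))
terms        # approximately -78, 156 and 0
sum(terms)   # approximately 78
```

Expanding along any other row or column, with the matching signs, gives the same total.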

We repeat this process for $4\times 4$ matrices by using the $3\times 3$ minors and for $5\times 5$ matrices by using the $4\times 4$ minors. Obviously this gets ***REALLY TEDIOUS*** really quickly, which is why R is so helpful.




In [12]:
a<-c(3,4,5,
     3,2,6,
     9,6,5)
C<-matrix(a,nrow=3, byrow=T) 

In [13]:
det(C)

R gives us a slightly strange answer because of how it stores numbers but we know it means 78 as we have done no division and used only whole numbers. This is considerably faster than finding the determinant of $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$ by hand!
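
The 'slightly strange answer' is floating-point rounding noise. A small sketch of how to look at it and tidy it up, using the illustrative name `M3` for the same matrix (the exact digits printed may differ between machines):

```r
M3 <- matrix(c(3, 4, 5,
               3, 2, 6,
               9, 6, 5), nrow = 3, byrow = TRUE)
d <- det(M3)

print(d, digits = 17)      # may show tiny rounding noise around 78
round(d)                   # 78 once the noise is rounded away
isTRUE(all.equal(d, 78))   # TRUE: equal up to numerical tolerance
```

`all.equal()` compares numbers up to a small tolerance, which is usually a safer test than `==` for computed values like this.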

The crucial reason we need to care about the determinant is that it tells us about the rank of our matrix. Any square matrix with linearly dependent columns, or a rank less than its number of columns, will have a determinant of 0. Matrices with a determinant of 0 are called singular matrices. This makes it extremely easy to figure out if we have linearly dependent columns in our matrix. For example because det$\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}=78$, $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$ has a rank of 3. We can check this:

In [14]:
qr(C)$rank

How did we know? We saw it was non-singular as its determinant wasn't 0 (it was 78). This means it has no linearly dependent columns, so its number of linearly independent columns is just its number of columns. Clearly $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$ has 3 columns, so it also has 3 linearly independent columns. And because the rank of a matrix is just the number of linearly independent columns it must have rank 3! When a matrix has no linearly dependent columns (which, as we've just seen, is the same as having its rank equal to its number of columns) we say it has 'full column rank'. 

So what about the matrix $\pmatrix{ 3&  4& 4 \cr   3& 2 &2 \cr 9 &  6 &  6\cr}$? It obviously has one linearly dependent column and therefore it doesn't have full column rank. This makes it a singular matrix and as a result we should expect its determinant to be 0.

In [15]:
a<-c(3,4,4,
     3,2,2,
     9,6,6)
C<-matrix(a,nrow=3, byrow=T) 
det(C)

This is exactly the result we get! So we now know that whenever we want to know whether a matrix has linearly dependent columns we just check whether its determinant is 0!
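
One practical caveat: computed determinants are floating-point numbers, so a singular matrix may return something tiny rather than exactly 0. The sketch below uses a made-up helper, `is_singular`, with an arbitrary absolute tolerance (an assumption for illustration; a sensible threshold depends on the scale of the entries) and cross-checks the answer against the rank.

```r
# Hypothetical helper: treat a determinant within `tol` of zero as zero
is_singular <- function(X, tol = 1e-8) abs(det(X)) < tol

full_rank <- matrix(c(3, 4, 5,
                      3, 2, 6,
                      9, 6, 5), nrow = 3, byrow = TRUE)
deficient <- matrix(c(3, 4, 4,
                      3, 2, 2,
                      9, 6, 6), nrow = 3, byrow = TRUE)

is_singular(full_rank)   # FALSE
is_singular(deficient)   # TRUE
qr(full_rank)$rank       # 3: full column rank
qr(deficient)$rank       # 2: one column repeats another
```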

### Inverses<a id='Inverses'></a>



Inverses don't sound scary and they're not - they're exactly what we expect them to be. There are only two real rules to keep in mind about inverses. The first is that only square matrices have inverses, or are *invertible*. The second is that not all square matrices are invertible. Which ones aren't? Matrices that have linearly dependent columns! So any square non-singular matrix (one with a determinant not equal to 0) can be inverted. These are called invertible matrices for obvious reasons. Before we can go further we have to introduce **the identity matrix** and the **transpose** operation.

The identity matrix is essentially the matrix equivalent of the number 1, in that multiplying a matrix by it leaves the matrix unchanged. The identity matrix is a square matrix with 1s along its top-left/bottom-right diagonal and 0s everywhere else. Each square dimension $n\times n$ has its own identity matrix, but they all follow this structure. Here are a few examples:

$I_2=\pmatrix{1 &0 \cr 0 &  1\cr}$, $I_3= \pmatrix{ 1&  0& 0 \cr   0& 1 &0 \cr 0 &  0 &  1\cr}$, $I_n= \pmatrix{1&0&  \cdots& 0 \cr 0&1&  \cdots& 0 \cr  \vdots& \vdots& \ddots &\vdots \cr 0&0 &  \cdots &  1\cr}$

We can also see that  $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}\times \pmatrix{ 1&  0& 0 \cr   0& 1 &0 \cr 0 &  0 &  1\cr}= \pmatrix{3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$

In [16]:
C<-matrix(c(3,4,5,
            3,2,6,
            9,6,5),nrow=3, byrow=T) 
I<-matrix(c(1,0,0,
            0,1,0,
            0,0,1),nrow=3, byrow=T) 

In [17]:
C%*%I

0,1,2
3,4,5
3,2,6
9,6,5


The second concept is the transpose of a matrix. Transposition is essentially changing all of the rows of a matrix into columns and all of the columns into rows. The rows and columns both keep their position, so the 1st row is the 1st column, the 2nd row the 2nd column etc. Unlike inversion, any matrix of any dimensions (including vectors) can be transposed. For example: $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}^{ T}= \pmatrix{ 3&  3& 9 \cr   4& 2 &6 \cr 5 &  6 &  5\cr}$  and $\pmatrix{ 3  \cr   3 \cr 9   \cr}^{ T}= \pmatrix{ 3&  3& 9 \cr}$
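
In R the transpose is the base function `t()`. A short sketch of the two examples above, with the names `M3` and `v3` chosen just for this illustration:

```r
M3 <- matrix(c(3, 4, 5,
               3, 2, 6,
               9, 6, 5), nrow = 3, byrow = TRUE)
v3 <- matrix(c(3, 3, 9), ncol = 1)

t(M3)                     # rows become columns
dim(t(v3))                # a 3 x 1 column becomes a 1 x 3 row
identical(t(t(M3)), M3)   # transposing twice gets us back where we started
```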

Now that we have these concepts down we can define the inverse of a matrix $M$ (which is invertible and has dimensions $n\times n$) as the matrix $M^{-1}$ such that: 

$M^{-1}M=I_n$. In order to find this matrix $M^{-1}$ we take the transpose of the matrix of cofactors (which we introduced in the last section), which gives us the **adjoint** matrix. Then we multiply the adjoint matrix by $\dfrac{1}{det(M)}$ to find the inverse. This final step shows us why non-singularity is a crucial condition. If our matrix were singular it would have a determinant of 0, so $\dfrac{1}{det(M)}$ would involve dividing by 0 and the inverse could not be computed.
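
Although the arithmetic is tedious by hand, R can carry out the recipe step by step. The sketch below is illustrative only: `cofactor`, `cof`, `adjoint` and `M3_inv` are names invented here, and the result is compared with base R's `solve()`, the built-in inverse function.

```r
M3 <- matrix(c(3, 4, 5,
               3, 2, 6,
               9, 6, 5), nrow = 3, byrow = TRUE)

# Cofactor of entry (i, j): signed determinant of M3 with row i and column j removed
cofactor <- function(X, i, j) (-1)^(i + j) * det(X[-i, -j, drop = FALSE])

n   <- nrow(M3)
cof <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) cof[i, j] <- cofactor(M3, i, j)

adjoint <- t(cof)               # transpose of the matrix of cofactors
M3_inv  <- adjoint / det(M3)    # multiply by 1 / det(M)

all.equal(M3_inv, solve(M3))    # TRUE: matches R's built-in inverse
round(M3_inv %*% M3, 10)        # the 3 x 3 identity matrix
```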

This method is also very tedious so we skip any worked examples and go straight to using R to calculate our inverses for us:

In [18]:
C<-matrix(c(3,4,5,
            3,2,6,
            9,6,5),nrow=3, byrow=T) 
D<-solve(C)

In [19]:
D

0,1,2
-0.3333333,0.1282051,0.17948718
0.5,-0.3846154,-0.03846154
0.0,0.2307692,-0.07692308


In [20]:
round(C%*%D,10)

0,1,2
1,0,0
0,1,0
0,0,1


So the D matrix calculated above is the inverse of $\pmatrix{ 3&  4& 5 \cr   3& 2 &6 \cr 9 &  6 &  5\cr}$! If we tried to invert $D=\pmatrix{ 6&  5& 2 \cr   3& 8 & 2 \cr 3 &  5 &  6\cr 0 &  1 &  3\cr 2 &  2 &  3\cr}$ we would get an error as it is not a square matrix, just as we would if we tried to invert $\pmatrix{ 3&  4& 4 \cr   3& 2 &2 \cr 9 &  6 &  6\cr}$, as it is singular.

Inverses and transposes are really important in econometrics for lots of things, including the formula for the vector of OLS coefficients: $\hat{\beta} =(X'X)^{-1}X'Y$. This formula probably won't mean much to you right now but once you've finished the notebook Econometrics you'll know it better than some of your cousins.
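
As a preview, the formula can already be typed into R using nothing but the transpose, the inverse and matrix multiplication. Everything in this sketch is made up for illustration: the data are simulated, and the true intercept (2) and slope (3) are arbitrary choices. The result is compared with `lm()`, R's built-in regression function.

```r
set.seed(1)                      # hypothetical simulated data
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

X <- cbind(1, x)                 # a column of 1s for the intercept, then x
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat                         # intercept and slope estimates

coef(lm(y ~ x))                  # R's built-in regression, for comparison
```

The two sets of estimates agree up to rounding.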

### Eigenvectors and Eigenvalues<a id='Eigenvectors and Eigenvalues'></a>



Alright, now we've reached peak-scary: the Eigenvectors and Eigenvalues! The first thing to note is that Eigen comes from the German for 'proper'. Nothing scary about that, at least not if you've kept up with history for the last 75 years. Now what are they and why are they useful? Eigenvalues are the roots of an important polynomial called the **characteristic polynomial**. For an $n \times n$ matrix $M$ the characteristic polynomial is det$(M-\lambda I_n)$, and we find the eigenvalues by solving det$(M-\lambda I_n)=0$. Let's see an example:

If $M=\pmatrix{4 &4 \cr 2 &7}$, then $(M-\lambda I_2)$ is $\pmatrix{4-\lambda &4 \cr 2 &7-\lambda}$. Setting its determinant to zero gives $(4-\lambda)(7-\lambda)-8=0$, which has solutions 

In [21]:
C<-matrix(c(4,4,
            2,7),nrow=2,byrow=T)
eigen(C)$values

Which are its eigenvalues! These are kind of ugly numbers so we'll do another example to reassure ourselves that eigenvalues can be friendly too. 

If $M=\pmatrix{6 &0 \cr 0 &8}$, then $(M-\lambda I_2)$ is $\pmatrix{6-\lambda &0 \cr 0 &8-\lambda}$. Setting its determinant to zero gives $(6-\lambda)(8-\lambda)-0\times0=0$, or simply $(6-\lambda)(8-\lambda)=0$, which has solutions 

In [22]:
C<-matrix(c(6,0,
            0,8),nrow=2,byrow=T)
eigen(C)$values
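
As a cross-check, both characteristic polynomials can be solved directly with base R's `polyroot()`, which takes the coefficients in increasing order of power. Expanding the brackets by hand, $(4-\lambda)(7-\lambda)-8=\lambda^2-11\lambda+20$, and for the diagonal matrix (whose off-diagonal product is 0) $(6-\lambda)(8-\lambda)=\lambda^2-14\lambda+48$. The names `roots1` and `roots2` are chosen just for this sketch.

```r
# lambda^2 - 11*lambda + 20: coefficients listed from the constant term upwards
roots1 <- Re(polyroot(c(20, -11, 1)))
sort(roots1, decreasing = TRUE)
eigen(matrix(c(4, 4,
               2, 7), nrow = 2, byrow = TRUE))$values   # the same two numbers

# lambda^2 - 14*lambda + 48
roots2 <- Re(polyroot(c(48, -14, 1)))
sort(roots2, decreasing = TRUE)                         # approximately 8 and 6
```

`polyroot()` returns complex numbers, so `Re()` is used to keep only the real parts, which is safe here because both polynomials have real roots.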

Of course not all our eigenvalues will be this nice. The huge advantage of doing this in R is that working with more than 2 dimensions by hand is really time consuming, so R makes life a lot easier. We still haven't actually justified why we're learning about eigenvalues so let's do that now. Eigenvalues are another way to think about rank. A square matrix is singular exactly when at least one of its eigenvalues is 0, and for the matrices we look at here (more precisely, for any diagonalisable matrix, a term we define in the Diagonalisation section) the number of non-zero eigenvalues is equal to the rank. To see this let's consider the singular matrix:

$M=\pmatrix{1 &323.1 \cr 0 &0}$


In [23]:
C<-matrix(c(1,323.1,
            0,0),nrow=2,byrow=T)
eigen(C)$values

Because it is a singular matrix it has linearly dependent columns. In this case it just has 1 linearly dependent column, so it has 1 eigenvalue equal to 0. By contrast $M=\pmatrix{1 &323.1 & 7 &21 \cr 0 &0&0&0 \cr 0 &0&0&0 \cr 0 &0&0&0 \cr}$ has 3 linearly dependent columns so we should expect 3 eigenvalues to be 0:

In [24]:
C<-matrix(c(1,323.1,7, 21,
            0,0, 0, 0,
            0,0,0,0,
            0,0,0,0),nrow=4,byrow=T)
eigen(C)$values

Ta-da! So what are eigenvectors then? Eigenvectors are the non-zero vectors $x$ which solve the equation $Mx=\lambda x$, where $\lambda$ is one of our eigenvalues. Each unique eigenvalue will 'create' eigenvectors which are linearly independent of those created by the other eigenvalues (a standard result which we won't prove here). We can use R to show this:

In [25]:
C<-matrix(c(6,0,
            0,8),nrow=2,byrow=T)
eigen(C)$values
eigen(C)$vectors

0,1
0,-1
1,0
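
The defining equation $Mx=\lambda x$ can also be checked directly for each eigenvalue/eigenvector pair. This sketch uses the earlier 'ugly' example so that the check isn't trivial; the names `M2` and `e` are chosen just for this illustration.

```r
M2 <- matrix(c(4, 4,
               2, 7), nrow = 2, byrow = TRUE)
e  <- eigen(M2)

# For each pair, M2 %*% x should equal lambda * x (up to rounding)
for (k in 1:2) {
  x      <- e$vectors[, k]
  lambda <- e$values[k]
  print(all.equal(as.numeric(M2 %*% x), lambda * x))
}
```

Both comparisons print `TRUE`.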


Clearly 8 and 6 are two unique eigenvalues, so they are associated with two linearly independent eigenvectors. This is a rule: unique eigenvalues mean linearly independent eigenvectors. However this is where we have to be careful. If there are repeated eigenvalues then we could have linearly dependent eigenvectors or not; it depends. For example:

In [26]:
C<-matrix(c(-1,0,1,
            3,0,-3,
            1,0,-1),nrow=3,byrow=T)
eigen(C)$values
eigen(C)$vectors

0,1,2
-0.3015113,0,0.7071068
0.904534,1,0.0
0.3015113,0,0.7071068


So this is an example of repeated eigenvalues but linearly independent eigenvectors. Now for another example:

In [27]:
C<-matrix(c(0,1,0,
            0,0,1,
            2,-5,4),nrow=3,byrow=T)
eigen(C)$values
eigen(C)$vectors

0,1,2
-0.2182179,-0.5773503,0.5773503
-0.4364358,-0.5773503,0.5773503
-0.8728716,-0.5773503,0.5773503


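The eigenvalues here are 2, 1 and 1. R gives us almost whole numbers for the repeated pair, but we should interpret these eigenvalues as 1 and 1. However unlike the previous example we now have linearly dependent eigenvectors: the last two columns are the same vector up to a sign, so this matrix only has two linearly independent eigenvectors.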
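
Because the numerical eigenvectors of a matrix like this can be delicate, a more robust way to see the difference between the two examples is to count how many independent eigenvectors a repeated eigenvalue has. For an $n\times n$ matrix $M$ and eigenvalue $\lambda$ that count is $n$ minus the rank of $M-\lambda I$. The helper name `n_indep_eigvecs` is an assumption made for this sketch.

```r
# Hypothetical helper: number of linearly independent eigenvectors
# belonging to the eigenvalue lambda, computed as n - rank(M - lambda * I)
n_indep_eigvecs <- function(M, lambda) {
  n <- nrow(M)
  n - qr(M - lambda * diag(n))$rank
}

# First example: eigenvalue 0 is repeated
A1 <- matrix(c(-1, 0,  1,
                3, 0, -3,
                1, 0, -1), nrow = 3, byrow = TRUE)

# Second example: eigenvalue 1 is repeated
A2 <- matrix(c(0,  1, 0,
               0,  0, 1,
               2, -5, 4), nrow = 3, byrow = TRUE)

k1 <- n_indep_eigvecs(A1, 0)
k2 <- n_indep_eigvecs(A2, 1)
c(first_example = k1, second_example = k2)
```

The first matrix has two independent eigenvectors for its repeated eigenvalue, while the second has only one, which matches what the eigenvector tables above suggest.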

We've gotten through a bit here but that's all we really need to know about eigenvalues and eigenvectors for now. As with all of this material there is a ***LOT*** more to be said about this topic but we'll avoid getting too far into the weeds until a later notebook.

 ### Diagonalisation<a id='Diagonalisation'></a>



A matrix is diagonalisable if it can be converted into a diagonal matrix of its eigenvalues. Mathematically this means that a matrix $M$ of dimension $m\times m$ has $m$ linearly independent eigenvectors. Given that we spent some time on it just before, we know that this means all matrices with distinct eigenvalues will be diagonalisable and some with repeated eigenvalues will be too.

Practically this means that for the matrix $M$ there exists a matrix $P$ such that $P^{-1}MP=\pmatrix{\lambda_1&0&  \cdots& 0 \cr 0&\lambda_2&  \cdots& 0 \cr  \vdots& \vdots& \ddots &\vdots \cr 0&0 &  \cdots &  \lambda_m\cr}$ where $\lambda_1,\lambda_2,\ldots,\lambda_m$ are $M$'s eigenvalues.

This might look like new material you have to learn but it really isn't. The $P$ matrix is simply the matrix whose columns are the eigenvectors we have been calculating above, and $P^{-1}$ is just its inverse. Let's see an example:

Consider the matrix $\pmatrix{-1 &-1 &1\cr 0 &-2 &1 \cr 0& 0&-1}$

We see its eigenvalues and eigenvectors are:

In [28]:
C<-matrix(c(-1,-1,1,
            0,-2,1,
            0,0,-1),nrow=3,byrow=T)
eigen(C)$values
eigen(C)$vectors



0,1,2
0.7071068,1,0.0
0.7071068,0,0.7071068
0.0,0,0.7071068


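Any non-zero multiple of an eigenvector is still an eigenvector, so we can rescale each column to whole numbers. So we know $P=\pmatrix{1 &1 &0\cr 1 &0 &1 \cr 0& 0&1}$, which we can find the inverse of using R: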


In [32]:
P<-matrix(c(1,1,0,
            1,0,1,
            0,0,1),nrow=3,byrow=T)
P_inv<-solve(P)
P_inv

0,1,2
0,1,-1
1,-1,1
0,0,1


So $P^{-1}=\pmatrix{0 &1 &-1\cr 1 &-1 &1 \cr 0& 0&1}$


And therefore: $\pmatrix{0 &1 &-1\cr 1 &-1 &1 \cr 0& 0&1}\pmatrix{-1 &-1 &1\cr 0 &-2 &1 \cr 0& 0&-1}\pmatrix{1 &1 &0\cr 1 &0 &1 \cr 0& 0&1}=\pmatrix{-2 &0 &0\cr 0 &-1 &0 \cr 0& 0&-1}$

Which we can double check by matrix multiplication:


In [34]:
P_inv%*%C%*%P

0,1,2
-2,0,0
0,-1,0
0,0,-1


R is cool!
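
The rescaling to whole numbers is only for readability. As a sketch of the same check without it, the eigenvector matrix returned by `eigen()` can be used as $P$ directly. The names `V` and `D_check` are illustrative.

```r
C <- matrix(c(-1, -1,  1,
               0, -2,  1,
               0,  0, -1), nrow = 3, byrow = TRUE)
e <- eigen(C)

# Use the (unit length) eigenvectors from eigen() as the columns of P
V <- e$vectors
D_check <- solve(V) %*% C %*% V

# Round away floating point noise before printing
round(D_check, 10)
```

The diagonal of `D_check` should hold the eigenvalues in the same order as `e$values`, with zeros (up to rounding error) everywhere else.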

### Idempotentness<a id='Idempotentness'></a>



Idempotent sounds scary but it really isn't. All it means is that $MM=M$. Clearly all the identity matrices are idempotent but there are also lots of useful idempotent matrices in econometrics. One which we will introduce here is the *residual maker*: $M_X=I_n-X(X'X)^{-1}X'$, where $X$ is a matrix with $n$ rows whose columns are linearly independent (so that $X'X$ can be inverted).

It is an idempotent matrix because:

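$(I_n-X(X'X)^{-1}X')(I_n-X(X'X)^{-1}X')$

$=I_n-X(X'X)^{-1}X'-X(X'X)^{-1}X'+X(X'X)^{-1}X'X(X'X)^{-1}X'$

$=I_n-X(X'X)^{-1}X'-X(X'X)^{-1}X'+X(X'X)^{-1}X'$

$=I_n-X(X'X)^{-1}X'$

The third line follows because $(X'X)^{-1}X'X$ in the middle of the last term is just an identity matrix.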

This wizardry is called idempotentness (more commonly written as idempotence) and there are many more matrices like this but we'll leave it here for now.
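
In the spirit of the rest of the notebook, here is a small simulation of the residual maker. It is a sketch under stated assumptions: the sizes `n` and `k`, the seed, and the random data are all made up for illustration, and the comparison with `lm()` is only there to show where the name comes from.

```r
set.seed(1)
n <- 6
k <- 2

# Made-up data: a column of ones plus k random columns
X <- cbind(1, matrix(rnorm(n * k), nrow = n))
y <- rnorm(n)

# Residual maker: M_X = I_n - X (X'X)^{-1} X'
M_X <- diag(n) - X %*% solve(crossprod(X)) %*% t(X)

# Idempotent: multiplying M_X by itself gives M_X back
is_idempotent <- isTRUE(all.equal(M_X %*% M_X, M_X))

# Applying M_X to y gives the residuals from regressing y on X
fit <- lm(y ~ X - 1)
makes_residuals <- isTRUE(all.equal(as.vector(M_X %*% y),
                                    unname(residuals(fit))))

c(is_idempotent, makes_residuals)
```

`crossprod(X)` is base R's shorthand for `t(X) %*% X`, and `y ~ X - 1` stops `lm()` adding a second intercept, since `X` already contains a column of ones.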

### Quadratic Forms<a id='Quadratic Forms'></a>





Matrices and vectors can be multiplied together using the matrix notation we've spent this notebook perfecting, but the same products can also be written using simple sigma notation. For example, consider the simple case where $a=\pmatrix{1 & 1 &  1 & 1 & 1} , b=\pmatrix{1 \cr 1  \cr 1 \cr 1 \cr 1\cr}$. Then:

$ab=\pmatrix{1 & 1 &  1 & 1 & 1}\pmatrix{1 \cr 1  \cr 1 \cr 1 \cr 1\cr}=5$


If we replace these numbers by variables such as $a=\pmatrix{a_1 & a_2 &  a_3 & a_4 & a_5} , b=\pmatrix{b_1 \cr b_2  \cr b_3 \cr b_4 \cr b_5\cr}$ then 

$ab=\pmatrix{a_1 & a_2 &  a_3 & a_4 & a_5}\pmatrix{b_1 \cr b_2  \cr b_3 \cr b_4 \cr b_5\cr}=\sum\limits_{i=1}^{5}a_ib_i$
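
A quick sketch of the same identity in R, with made-up numbers in place of the $a_i$ and $b_i$: the matrix product of a row vector and a column vector equals the sum of the element-wise products.

```r
# Made-up values for a_1..a_5 and b_1..b_5
a_vals <- c(2, 4, 6, 8, 10)
b_vals <- c(1, 3, 5, 7, 9)

# Matrix notation: (1 x 5) row vector times (5 x 1) column vector
a <- matrix(a_vals, nrow = 1)
b <- matrix(b_vals, ncol = 1)
matrix_product <- as.numeric(a %*% b)

# Sigma notation: sum over i of a_i * b_i
sigma_sum <- sum(a_vals * b_vals)

c(matrix_product, sigma_sum)
```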

We can use this notation to make life simpler for us when working with quadratic expressions. For example any vector of variables $x=\pmatrix{x_1 \cr x_2 \cr  \vdots \cr x_n}$ can be turned into a scalar by multiplying it on both sides by a symmetric matrix $A$ as follows:


$x'Ax=\pmatrix{x_1 & x_2 &  \cdots & x_n}\pmatrix{a_{11}&a_{12}&  \cdots& a_{1n} \cr a_{21}&a_{22}&  \cdots& a_{2n} \cr  \vdots& \vdots& \ddots &\vdots \cr a_{n1}&a_{n2} &  \cdots &  a_{nn}\cr}\pmatrix{x_1 \cr x_2 \cr  \vdots \cr x_n}=\sum\limits_{j=1}^{n}\sum\limits_{i=1}^{n}x_ix_ja_{ij}$

The scalar $x'Ax$ is what we call a quadratic form.
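
To see that the matrix product and the double sum really are the same thing, here is a sketch with a made-up $2\times 2$ symmetric matrix and a made-up vector. The numbers are purely illustrative.

```r
# Made-up symmetric matrix and vector
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
x <- c(1, 2)

# Matrix notation: x'Ax
q_matrix <- as.numeric(t(x) %*% A %*% x)

# Sigma notation: sum over i and j of x_i * x_j * a_ij
q_sum <- 0
for (i in seq_along(x)) {
  for (j in seq_along(x)) {
    q_sum <- q_sum + x[i] * x[j] * A[i, j]
  }
}

c(q_matrix, q_sum)
```

Written out by hand this is $2x_1^2+2x_1x_2+3x_2^2$, which is why the expression is called quadratic.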


Again this might seem useless but will come in handy in econometrics.

### Matrix Calculus<a id='Matrix Calculus'></a>

The last essential topic we need to cover before we can dive into our first econometrics notebook (at last) is how to differentiate functions of vectors and matrices (we don't cover integration here). Differentiating a function of a vector of variables like $f(x)=f(\pmatrix{x_1 \cr x_2 \cr  \vdots \cr x_n})$ may seem ridiculously complex but it's actually not so bad if we just think about what we're doing.

The first thing to realise is that $\dfrac{\mathrm d}{\mathrm d x} f(x) $ IS NOT THE SAME AS $\dfrac{\partial}{\partial  x} f(x) $

The 'straight' $\mathrm d$ is the symbol for a regular derivative of a function of one variable, whereas the 'curly'  $\partial$ is the symbol for a partial derivative: the derivative of a function of several variables with respect to one of them, holding the others fixed. So to differentiate $f(x)$ from before we need to use partial derivatives, one for each variable in the vector $x$. What does this look like? Well:

$\dfrac{\partial}{\partial  x} f(x) =\pmatrix{\dfrac{\partial f(x)}{\partial  x_1}  \cr \dfrac{\partial f(x)}{\partial  x_2} \cr  \vdots \cr \dfrac{\partial f(x)}{\partial  x_n} }$
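
To make the vector of partial derivatives concrete, the sketch below approximates it numerically for the quadratic form $f(x)=x'Ax$ by nudging one variable at a time. The helper `num_grad`, the step size `h`, and the example numbers are all assumptions made for illustration. For a symmetric $A$ the exact answer is $2Ax$, a standard result that is not derived in this notebook, so it is used here only as a comparison.

```r
# Made-up symmetric matrix defining f(x) = x'Ax
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
f <- function(x) as.numeric(t(x) %*% A %*% x)

# Hypothetical helper: central finite differences, one variable at a time
num_grad <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(i) {
    step <- replace(numeric(length(x)), i, h)  # h in position i, 0 elsewhere
    (f(x + step) - f(x - step)) / (2 * h)
  })
}

x0 <- c(1, 2)
approx_grad <- num_grad(f, x0)
exact_grad  <- as.vector(2 * A %*% x0)

rbind(approx_grad, exact_grad)
```

Each entry of `approx_grad` changes only one $x_i$ and leaves the others alone, which is exactly what the partial derivative in the matching row of the vector above describes.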

This is useful background for the next notebook but we won't dwell on it here. So good news, we're done! Now that you've got a good grasp of matrix algebra, head over to the Econometrics notebook to get started on using R for econometrics.