# Markov matrices

A matrix $A$ is a **Markov matrix** if

* Its entries are all $\ge 0$
* The entries of each **column** **sum to 1**

Typically, a Markov matrix's entries represent **transition probabilities** from one state to another.

For example, consider the $2 \times 2$ Markov matrix:

In [1]:
A = [0.9 0.2
     0.1 0.8]

2×2 Array{Float64,2}:
 0.9  0.2
 0.1  0.8

Let us suppose that this represents the fraction of people switching majors each year between math and English literature.

Let
$$
x = \begin{pmatrix} m \\ e \end{pmatrix}
$$

represent the number of math majors $m$ and English majors $e$.  Suppose that each year, 10% of math majors and 20% of English majors switch majors.  After one year, the new number of math and English majors is:

$$
\begin{aligned}
m' &= 0.9 m + 0.2 e \\
e' &= 0.1 m + 0.8 e
\end{aligned}
$$

But this is equivalent to a matrix multiplication!  i.e. the vector $x'$ of majors after one year is

$$
x' = A x \,
$$

Note that the two Markov properties are critical: we never have negative numbers of majors (or negative probabilities), and the probabilities must sum to 1 (the net number of majors is not changing: we're not including new students or people that graduate in this silly model).
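As a quick numerical check of the update rule, here is a minimal sketch in Julia 1.x syntax. The starting enrollment of 120 math and 30 English majors is made up for illustration, and `ismarkov` is a hypothetical helper (not a library function) that tests the two defining properties.

```julia
A = [0.9 0.2
     0.1 0.8]

# Hypothetical helper: nonnegative entries and every column summing to 1
# (up to floating-point roundoff).
ismarkov(M) = all(M .>= 0) && all(isapprox.(sum(M, dims=1), 1))

x = [120, 30]    # made-up numbers of math and English majors
x1 = A * x       # enrollment after one year

println(ismarkov(A))            # both Markov properties hold
println(x1)                     # approximately [114, 36]
println(sum(x), " ", sum(x1))   # the total is conserved (up to roundoff)
```

Here $0.9 \cdot 120 + 0.2 \cdot 30 = 114$ and $0.1 \cdot 120 + 0.8 \cdot 30 = 36$, and the total stays at 150.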

## Eigenvalues of Markov matrices

There are two key questions about Markov matrices that can be answered by analysis of their eigenvalues:

* Is there a **steady state**?
  - i.e. is there an $x_0 \ne 0$ such that $A x_0 = x_0$?
  - i.e. is there an eigenvector $x_0$ with eigenvalue $\lambda_0 = 1$?

* Does the system **tend toward a steady state?**
  - i.e. does $A^n x \to \mbox{multiple of } x_0$ as $n \to \infty$?
  - i.e. is $\lambda = 1$ the **largest** $|\lambda|$?
  
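The answer to the first question is **YES** for **any Markov** matrix $A$.  The answer to the second is **YES** whenever all entries of $A$ are $> 0$.  (For a general Markov matrix, $\lambda = 1$ is still a largest $|\lambda|$, but other eigenvalues with $|\lambda| = 1$ can prevent convergence, e.g. for the matrix that simply swaps two states.)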

In [16]:
eigvals(A)

2-element Array{Float64,1}:
 1.0
 0.7

To see why, the key idea is to write the columns-sum-to-one property of Markov matrices in linear-algebra terms.  It is equivalent to the statement:

$$
\underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \end{pmatrix}}_{o^T} A = o^T
$$

since this is just the operation that sums all of the rows of $A$.  Equivalently, if we transpose both sides:

$$
A^T o = o
$$

i.e. $o$ is an eigenvector of $A^T$ (called a **left eigenvector of A**) with eigenvalue $\lambda = 1$.

But since $A$ and $A^T$ have the **same eigenvalues** (they have the same characteristic polynomial $\det (A - \lambda I) = \det (A^T - \lambda I)$ because transposes don't change determinants), this means that $A$ **also has an eigenvalue 1** but, in general, with a **different eigenvector**.
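The determinant step can be spelled out in one line, using only $I^T = I$ and the fact that $\det (B^T) = \det (B)$ for any square matrix $B$:

$$
\det (A^T - \lambda I) = \det \left( (A - \lambda I)^T \right) = \det (A - \lambda I)
$$

so the two characteristic polynomials have identical roots.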

In [18]:
o = [1,1]
o' * A

1×2 Array{Float64,2}:
 1.0  1.0

In [19]:
A' * o

2-element Array{Float64,1}:
 1.0
 1.0

The eigenvector of $A$ with eigenvalue $1$ must be a basis for $N(A - I)$:

In [21]:
A - 1*I

2×2 Array{Float64,2}:
 -0.1   0.2
  0.1  -0.2

By inspection, $A - I$ is singular here: the second column is -2 times the first.  So, $x_0 = (2,1)$ is a basis for its nullspace, and is the steady state:

In [23]:
(A - I) * [2,1]

2-element Array{Float64,1}:
 5.55112e-17
 5.55112e-17
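The same steady state can also be recovered numerically rather than by inspection. The sketch below assumes Julia 1.x, where `eigen` comes from the `LinearAlgebra` standard library; the rescaling to a total of 3 is just a choice that makes the result match $(2,1)$.

```julia
using LinearAlgebra

A = [0.9 0.2
     0.1 0.8]

F = eigen(A)                       # eigenvalues in F.values, eigenvectors in the columns of F.vectors
k = argmin(abs.(F.values .- 1))    # index of the eigenvalue closest to 1
v = F.vectors[:, k]                # a basis vector for N(A - I), with arbitrary scale and sign
x0 = 3 * v / sum(v)                # rescale so that the components sum to 3

println(x0)                        # approximately [2, 1]
println(norm((A - I) * x0))        # approximately 0 (roundoff only)
```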

Let's check if some arbitrary starting vector $(3,0)$ tends towards the steady state:

In [24]:
using Interact
@manipulate for n in slider(0:100,value=0)
    A^n * [3,0]
end

2-element Array{Float64,1}:
 3.0
 0.0

Yes!  In fact, it tends to exactly $(2,1)$, because the other eigenvalue is $< 1$ (and hence that eigenvector component decays exponentially fast).
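To make the decay explicit: $(3,0) = (2,1) + (1,-1)$, and $(1,-1)$ is an eigenvector of $A$ with eigenvalue $0.7$, so $A^n (3,0) = (2,1) + 0.7^n (1,-1)$. A non-interactive sketch of the same experiment (the particular values of `n` are arbitrary) compares the two sides:

```julia
A = [0.9 0.2
     0.1 0.8]

x = [3, 0]
for n in (0, 1, 5, 20, 50)
    direct  = A^n * x                     # repeated multiplication by A
    formula = [2, 1] + 0.7^n * [1, -1]    # steady state plus the decaying eigenvector term
    println(n, "  ", direct, "  ", maximum(abs.(direct - formula)))
end
```

The last column printed is the largest discrepancy between the two, which should be at the level of roundoff.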

An interesting property is that the **sum of the vector components is conserved** when we multiply by a Markov matrix.  Given a vector $x$, $o^T x$ is the sum of its components.  But $o^T A = o^T$, so:

$$
o^T A x = o^T x = o^T A^n x
$$

for any $n$!  This is why $(3,0)$ must tend to $(2,1)$, and not to any other multiple of $(2,1)$, because both of them sum to 3.  (The "number of majors" is conserved in this problem.)
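A sketch that checks this conservation law numerically (Julia 1.x; the random starting vector and the powers tried are arbitrary choices):

```julia
A = [0.9 0.2
     0.1 0.8]

o = [1, 1]     # vector of ones, so o' * x is the sum of the components of x
x = rand(2)    # arbitrary starting vector

for n in (1, 10, 100)
    println(n, "  ", o' * (A^n * x), "  ", o' * x)   # the two sums agree up to roundoff
end
```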

## Why no eigenvalues > 1?

Why are all $|\lambda| \le 1$ for a Markov matrix?

The key fact is that the **product AB of two Markov matrices A and B is also Markov**.  Reasons:

* If $A$ and $B$ have nonnegative entries, $AB$ does as well: matrix multiplication uses only $\times$ and $+$, and can't introduce a minus sign.

* If $o^T A = o^T$ and $o^T B = o^T$ (both have columns summing to 1), then $o^T AB = o^T B = o^T$: the columns of $AB$ sum to 1.

For example, $A^n$ is a Markov matrix for any $n$ if $A$ is Markov.
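The closure property is easy to test on random examples. In this sketch (Julia 1.x syntax), `ismarkov` and `randmarkov` are hypothetical helpers written for illustration, not library functions:

```julia
# Hypothetical helpers (not library functions).
ismarkov(M) = all(M .>= 0) && all(isapprox.(sum(M, dims=1), 1))
randmarkov(n) = (M = rand(n, n); M ./ sum(M, dims=1))   # normalize each column to sum to 1

A = randmarkov(4)
B = randmarkov(4)

println(ismarkov(A), " ", ismarkov(B))   # both true by construction
println(ismarkov(A * B))                 # the product is Markov too
println(ismarkov(A^10))                  # and so is any power
```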

Now, if there were an eigenvalue with $|\lambda| > 1$, the matrix $A^n$ would have to *blow up exponentially* as $n\to \infty$ (since the matrix times that eigenvector, or any vector with a nonzero component of that eigenvector, would blow up).  But since $A^n$ is Markov, all of its entries must be between 0 and 1.  It can't blow up!  So we must have all $|\lambda| \le 1$.

In [25]:
A^100

2×2 Array{Float64,2}:
 0.666667  0.666667
 0.333333  0.333333

(In fact, $A^n$ is pretty boring for large $n$: it just takes in any vector and redistributes it to the steady state.)
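One way to see this: for large $n$, every column of $A^n$ approaches the steady state scaled to sum to 1, here $(2/3, 1/3)$, so $A^n$ approaches the rank-1 matrix $\frac{1}{3} x_0 o^T$. A quick sketch of that check:

```julia
A = [0.9 0.2
     0.1 0.8]

x0 = [2, 1]
o  = [1, 1]
limit = x0 * o' / sum(x0)    # rank-1 matrix whose columns are both (2/3, 1/3)

println(A^100)
println(limit)
println(A^100 * [3, 0])      # redistributes (3, 0) to approximately (2, 1)
```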

## Can there be more than one steady state?

We have just shown that we have *at least one* eigenvalue $\lambda = 1$, and that *all* eigenvalues satisfy $|\lambda| \le 1$.  But can there be *more than one* independent eigenvector with $\lambda = 1$?

**Yes!** For example, the **identity matrix** $I$ is a Markov matrix, and *all* of its eigenvectors have eigenvalue $1$.  Since $Ix = x$ for *any* $x$, *every vector is a steady state* for $I$!

But this does not usually happen for *interesting* Markov matrices coming from real problems.  In fact, there is a theorem:

* If all the entries of a Markov matrix are $> 0$ (not just $\ge 0$), then *exactly one* of its eigenvalues is $\lambda = 1$ (that eigenvalue has "multiplicity 1": $N(A-I)$ is one-dimensional), and **all other eigenvalues have** $|\lambda| < 1$.  There is a *unique steady state* (up to an overall scale factor).

I'm not going to prove this in 18.06, however.
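The positivity assumption does matter, though. As an illustration (this example is an addition, not part of the argument above), the matrix that swaps two states is Markov but has zero entries; its eigenvalues are $\pm 1$, and $P^n x$ oscillates instead of converging. The sketch assumes Julia 1.x with the `LinearAlgebra` standard library.

```julia
using LinearAlgebra

P = [0.0 1.0
     1.0 0.0]          # Markov (columns sum to 1), but not all entries are > 0

println(eigvals(P))    # -1 and 1: two eigenvalues with |λ| = 1

x = [3, 0]
println(P^100 * x)     # even power: back to (3, 0)
println(P^101 * x)     # odd power: (0, 3), so there is no limit
```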

## Another example

Let's generate a random $5 \times 5$ Markov matrix:

In [5]:
M = rand(5,5) # random entries in [0,1)

5×5 Array{Float64,2}:
 0.272777  0.953171  0.646512  0.287063  0.505887
 0.638533  0.295136  0.241793  0.933168  0.274634
 0.701478  0.562611  0.96985   0.815157  0.509676
 0.724704  0.309826  0.459468  0.30807   0.752348
 0.780111  0.806479  0.677558  0.572014  0.82533 

In [6]:
sum(M,1) # not Markov yet

1×5 Array{Float64,2}:
 3.1176  2.92722  2.99518  2.91547  2.86788

In [7]:
M = M ./ sum(M,1)

5×5 Array{Float64,2}:
 0.0874957  0.325623  0.215851   0.098462  0.176398 
 0.204815   0.100825  0.0807274  0.320074  0.0957622
 0.225006   0.1922    0.323803   0.279597  0.177719 
 0.232455   0.105843  0.153403   0.105667  0.262337 
 0.250228   0.27551   0.226216   0.196199  0.287785 

In [8]:
sum(M,1)

1×5 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0

In [9]:
eigvals(M)

5-element Array{Complex{Float64},1}:
         1.0+0.0im     
   -0.105326+0.132886im
   -0.105326-0.132886im
    0.122489+0.0im     
 -0.00626055+0.0im     

In [10]:
abs.(eigvals(M))

5-element Array{Float64,1}:
 1.0       
 0.169565  
 0.169565  
 0.122489  
 0.00626055

In [13]:
x = rand(5)
x = x / sum(x)

5-element Array{Float64,1}:
 0.281819 
 0.100349 
 0.231969 
 0.31349  
 0.0723724

In [14]:
M^100 * x

5-element Array{Float64,1}:
 0.178866
 0.152502
 0.241941
 0.178764
 0.247927

In [15]:
λ, X = eig(M)
X[:,1] / sum(X[:,1])

5-element Array{Complex{Float64},1}:
 0.178866-0.0im
 0.152502-0.0im
 0.241941-0.0im
 0.178764-0.0im
 0.247927-0.0im
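For reference, the whole experiment can be condensed into one script. This is a sketch in Julia 1.x syntax, where `sum(M, dims=1)` and `eigen` take the place of the `sum(M,1)` and `eig` calls in the cells above; since the matrix is random, the numbers will differ from run to run.

```julia
using LinearAlgebra

M = rand(5, 5)
M = M ./ sum(M, dims=1)            # normalize the columns so that M is Markov

x = rand(5)
x = x / sum(x)                     # starting vector whose components sum to 1

F = eigen(M)
k = argmin(abs.(F.values .- 1))    # locate the eigenvalue λ = 1
v = F.vectors[:, k]
steady = real.(v / sum(v))         # steady state scaled to sum to 1; any imaginary part is roundoff

println(maximum(abs.(F.values)))             # approximately 1: no eigenvalue exceeds it in magnitude
println(maximum(abs.(M^100 * x - steady)))   # approximately 0: M^100 * x has reached the steady state
```

Because the starting vector and the normalized eigenvector both sum to 1, conservation of the sum means they must agree in the limit, not merely be parallel.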