# Markov processes

A Markov process can be used to describe the dependency between consecutive data samples. This is done using the state-transition matrix 
$$\boldsymbol{P}=\left[\begin{array}{ccc}
P_{11} & P_{12} & P_{13}\\
P_{21} & P_{22} & P_{23}\\
P_{31} & P_{32} & P_{33}
\end{array}\right]$$
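Because of the Markov property, the probability of an entire state sequence factors into an initial probability times a product of one-step transition probabilities, $P[X(0)=x_{i_0},\ldots,X(K)=x_{i_K}] = P[X(0)=x_{i_0}]\prod_{k=1}^{K} P_{i_{k-1} i_k}$. Below is a minimal Python sketch of that factorization. It assumes that states are encoded as integer indices $0,\ldots,n-1$ and that the first state is given (initial probability 1) unless an initial distribution is passed. The helper name `sequence_probability` is hypothetical, and the matrix values are borrowed from the Land of Oz example below.

```python
import numpy

# Example transition matrix (values borrowed from the Land of Oz example)
P = numpy.array([[0.5, 0.25, 0.25],
                 [0.5, 0.0, 0.5],
                 [0.25, 0.25, 0.5]])

def sequence_probability(P, states, p0=None):
    """Probability of observing `states` (hypothetical helper).

    If p0 is None the first state is treated as given (probability 1);
    otherwise p0[states[0]] is used as the initial probability.
    """
    prob = 1.0 if p0 is None else float(p0[states[0]])
    # Markov property: each factor depends only on the previous state
    for prev, nxt in zip(states[:-1], states[1:]):
        prob *= P[prev, nxt]
    return prob

# N -> R -> R -> S : 0.5 * 0.5 * 0.25
print(sequence_probability(P, [1, 0, 0, 2]))  # 0.0625
```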


## Markov model for the weather of the Land of Oz

Suppose the weather in the Land of Oz is always in one of $n=3$ states: $x_{1}=R$ (rainy weather), $x_{2}=N$ (nice weather), and $x_{3}=S$ (snowy weather). The weather is modelled as a stationary Markov process, characterized by the state-transition matrix
$$ \boldsymbol{P}=\left[\begin{array}{ccc}
1/2 & 1/4 & 1/4 \\
1/2 & 0 & 1/2 \\
1/4 & 1/4 & 1/2
\end{array}\right] $$ 
where element $(i,j)$ is the probability $$ P_{ij} = P[X(k)=x_{j} \mid X(k-1)=x_{i}] $$ that the next state (at time $k$) is $x_{j}$, given that the current state (at time $k-1$) is $x_i$.

Note that each row of this matrix is a probability distribution: row $i$ lists the probabilities of the possible next states, given the current state $x_i$. Hence, $$ \sum_{j=1}^n P_{ij} = 1 \qquad \forall i.$$ A matrix with nonnegative entries whose rows sum to one is called a row-stochastic matrix.
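Because row $i$ is the conditional distribution of the next state given $x_i$, the entries $P_{ij}$ can be estimated from an observed state sequence. One way is to count the transitions $x_i \to x_j$ and normalize each row, which is the maximum-likelihood estimate. The sketch below assumes integer-coded states. `estimate_transition_matrix` is a hypothetical helper, and the short observed sequence is made up for illustration.

```python
import numpy

def estimate_transition_matrix(states, n):
    """Row-normalized transition counts (hypothetical helper)."""
    counts = numpy.zeros((n, n))
    # Count every observed transition i -> j
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never left stay zero instead of dividing by 0
    return numpy.divide(counts, row_sums,
                        out=numpy.zeros_like(counts), where=row_sums > 0)

# Made-up observed sequence over R=0, N=1, S=2
observed = [1, 0, 0, 2, 2, 1, 0, 2]
P_hat = estimate_transition_matrix(observed, 3)
print(P_hat)
print(P_hat.sum(axis=1))  # every visited row sums to 1
```

Normalizing each row is exactly what makes the estimate row-stochastic.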

Define the matrix $\boldsymbol{P}$ in numpy and show that it is a row stochastic matrix.

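```python
import numpy

# State-transition matrix: rows = current state (R, N, S), columns = next state
matrix = numpy.array([[.5, .25, .25],
                      [.5, 0, .5],
                      [.25, .25, .5]])
print(matrix)

# Row sums: each must equal 1 for a row-stochastic matrix
sums = matrix.sum(axis=1)
print(sums)
```

```
[[0.5  0.25 0.25]
 [0.5  0.   0.5 ]
 [0.25 0.25 0.5 ]]
[1. 1. 1.]
```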
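The cell above only checks the row sums. A row-stochastic matrix must also be square with nonnegative entries, and floating-point row sums are best compared with a tolerance. Below is a small sketch of such a check. The helper name `is_row_stochastic` and its default tolerance are assumptions.

```python
import numpy

def is_row_stochastic(P, atol=1e-12):
    """True if P is square, nonnegative and every row sums to 1 (hypothetical helper)."""
    P = numpy.asarray(P, dtype=float)
    return (P.ndim == 2 and P.shape[0] == P.shape[1]
            and bool(numpy.all(P >= 0))
            and bool(numpy.allclose(P.sum(axis=1), 1.0, atol=atol)))

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
print(is_row_stochastic(P))                          # True
print(is_row_stochastic(P.T))                        # False: the columns of P do not all sum to 1
print(is_row_stochastic([[1.5, -0.5], [0.5, 0.5]]))  # False: negative entries
```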


## Generate a realization of the weather of the land of Oz

Generate and print a realization of 20 steps of the weather in the Land of Oz, starting from nice weather. The printed list contains the initial state followed by the 20 sampled states (R = 0, N = 1, S = 2).

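```python
from random import random

# Integer codes of the states (row/column indices of `matrix`)
RAINY_WEATHER = 0
NICE_WEATHER = 1
SNOWY_WEATHER = 2

def pretty_print(x):
    """Print the name of state x."""
    if x == RAINY_WEATHER:
        print("Rainy Weather")
    elif x == NICE_WEATHER:
        print("Nice Weather")
    else:
        print("Snowy Weather")

def realization(amount):
    """Return the initial state followed by `amount` sampled states."""
    current_weather = NICE_WEATHER
    result = [current_weather]
    for _ in range(amount):
        # Inverse-CDF sampling: subtract the transition probabilities of the
        # current row until the uniform draw r becomes negative
        r = random()
        for j, val in enumerate(matrix[current_weather]):
            r -= val
            if r < 0:
                result.append(j)
                current_weather = j
                break
    return result

print(realization(20))
```

```
[1, 2, 2, 1, 0, 0, 0, 0, 0, 2, 2, 1, 0, 0, 2, 1, 2, 0, 1, 0, 1]
```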
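The inner loop of `realization` implements inverse-CDF sampling by hand. NumPy's `Generator.choice` can instead draw the next state directly from the current row of $\boldsymbol{P}$ through its `p` argument. The sketch below assumes the same integer coding (R = 0, N = 1, S = 2). `realization_rng` and its `seed` parameter are hypothetical additions for reproducibility. Like `realization`, it returns the initial state followed by the sampled states.

```python
import numpy

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
NICE_WEATHER = 1

def realization_rng(steps, start=NICE_WEATHER, seed=None):
    """Initial state followed by `steps` sampled states (hypothetical helper)."""
    rng = numpy.random.default_rng(seed)
    states = [start]
    for _ in range(steps):
        # Next state drawn with the probabilities of the current row of P
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

print(realization_rng(20, seed=42))
```

Because $P_{22}=0$, nice weather is never followed by nice weather in any realization. This gives an easy sanity check on either sampler.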


## Probability to be in a state

To determine the entropy of a Markov source, we need the probability of being in each state. Using the Markov process generator defined above, generate a realization of 100000 steps and estimate the probabilities of rainy, nice, and snowy weather. This is a finite-sample estimate of the probability vector $\boldsymbol{\pi}$ introduced in the theory.

Note that different realizations give similar estimates of the state probabilities (try running the cell several times and compare the results).

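```python
import collections

# Long realization: 100 000 transitions, i.e. 100 001 states
result_100_000 = realization(100_000)

# Number of visits to R, N and S (sorted by state index 0, 1, 2)
counts = [count for state, count in sorted(collections.Counter(result_100_000).items())]
print(counts)

# Relative frequencies: finite-sample estimate of pi
print(", ".join(f"{c / len(result_100_000):.3f}" for c in counts))
```

```
[39706, 20015, 40280]
0.397, 0.200, 0.403
```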
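For long runs, the counting step can be vectorized with `numpy.bincount`. The sketch below is self-contained. It re-implements the sampler with a seeded NumPy generator and `numpy.searchsorted` on the cumulative row sums, which is the same inverse-CDF idea as `realization`. The helper name `simulate` and the seed are assumptions. The relative frequencies should be close to the stationary vector $\boldsymbol{\pi}=[0.4,\,0.2,\,0.4]$ obtained from the eigenvectors in the next section.

```python
import numpy

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
cdf = numpy.cumsum(P, axis=1)  # row-wise cumulative transition probabilities

def simulate(steps, start=1, seed=0):
    """Start state followed by `steps` sampled states (hypothetical helper)."""
    u = numpy.random.default_rng(seed).random(steps)  # uniforms in [0, 1)
    states = numpy.empty(steps + 1, dtype=int)
    states[0] = start
    for k in range(steps):
        # First column whose cumulative probability exceeds u[k];
        # min() guards against rows whose sum rounds to slightly less than 1
        j = int(numpy.searchsorted(cdf[states[k]], u[k], side="right"))
        states[k + 1] = min(j, len(P) - 1)
    return states

states = simulate(100_000)
# Relative visit frequencies of R, N, S (state indices 0, 1, 2)
freq = numpy.bincount(states, minlength=3) / len(states)
print(freq)  # close to [0.4 0.2 0.4]; the exact digits depend on the seed
```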


## Properties of a stochastic matrix

For a row-stochastic matrix such as $\boldsymbol{P}$, the largest eigenvalue is $\lambda_{1}=1$, and all eigenvalues satisfy $|\lambda_i|\le 1$. Suppose only the largest eigenvalue equals one, or mathematically $1=\lambda_{1}>|\lambda_{2}|\ge\cdots\ge|\lambda_n|$. Then the stationary distribution is unique: the eigenvalue $\lambda_1=1$ and its corresponding left eigenvector $\boldsymbol{W}_{[1,:]}$ (the first row of $\boldsymbol{W}$) are both unique, the eigenvector up to a scaling factor. Here $\boldsymbol{W}$ and $\boldsymbol{D}$ come from the eigendecomposition
$$\boldsymbol{P}=\boldsymbol{W}^{-1}\boldsymbol{D}\boldsymbol{W}$$
where the rows of $\boldsymbol{W}$ are the *left* eigenvectors of $\boldsymbol{P}$ and $\boldsymbol{D}$ is the diagonal matrix of the corresponding eigenvalues.

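First, compute the eigenvalues and eigenvectors with the `numpy.linalg.eig` function. Since `eig` solves the *right* eigenvalue problem $\boldsymbol{A}\boldsymbol{v}=\lambda\boldsymbol{v}$, we apply it to the transposed probability matrix. A right eigenvector of $\boldsymbol{P}^T$ is a left eigenvector of $\boldsymbol{P}$, because $\boldsymbol{P}^T\boldsymbol{v}=\lambda\boldsymbol{v}$ is equivalent to $\boldsymbol{v}^T\boldsymbol{P}=\lambda\boldsymbol{v}^T$.

Verify that exactly one eigenvalue equals 1 and print the eigenvectors. Note that `eig` returns the eigenvectors as the *columns* of its second output (and don't forget the transpose operation).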

```python
eig_vals, eig_vectors = numpy.linalg.eig(numpy.transpose(matrix))
print(f"eigenvalues: {eig_vals}")
print(f"eigenvectors: {eig_vectors}")
```

```
eigenvalues: [ 1.    0.25 -0.25]
eigenvectors: [[-6.66666667e-01 -7.07106781e-01  4.08248290e-01]
 [-3.33333333e-01 -1.01150648e-16 -8.16496581e-01]
 [-6.66666667e-01  7.07106781e-01  4.08248290e-01]]
```
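The decomposition $\boldsymbol{P}=\boldsymbol{W}^{-1}\boldsymbol{D}\boldsymbol{W}$ can be checked numerically. `numpy.linalg.eig` applied to $\boldsymbol{P}^T$ returns a matrix $\boldsymbol{V}$ whose columns are right eigenvectors of $\boldsymbol{P}^T$, so $\boldsymbol{P}^T\boldsymbol{V}=\boldsymbol{V}\boldsymbol{D}$. Transposing gives $\boldsymbol{V}^T\boldsymbol{P}=\boldsymbol{D}\boldsymbol{V}^T$, so $\boldsymbol{W}=\boldsymbol{V}^T$ has the left eigenvectors as its rows. The minimal sketch below assumes that $\boldsymbol{P}$ is diagonalizable, which holds here because its three eigenvalues are distinct.

```python
import numpy

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])

# numpy.linalg.eig on P^T: P^T V = V D, eigenvectors in the columns of V
eig_vals, V = numpy.linalg.eig(P.T)
W = V.T                     # rows of W are left eigenvectors of P: W P = D W
D = numpy.diag(eig_vals)

# Rebuild P from its eigendecomposition P = W^{-1} D W
P_rebuilt = numpy.linalg.inv(W) @ D @ W
print(numpy.allclose(P_rebuilt, P))  # True

# Every row w of W satisfies w P = lambda w
print(all(numpy.allclose(w @ P, lam * w) for lam, w in zip(eig_vals, W)))  # True
```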


Note that one eigenvalue equals 1, while all others satisfy $|\lambda_i|<1$ for $i=2,\ldots,n$.

Check that the eigenvector corresponding to the largest eigenvalue is not a probability vector. It does not satisfy $0 \le p_i \le 1$ and $$ \sum_{i=1}^n p_i = 1, $$ but is instead scaled to unit quadratic (Euclidean) norm: $$ \sum_{i=1}^n w_i^2 = 1. $$

```python
# Pick the eigenvector (column) whose eigenvalue is closest to 1
idx = numpy.argmin(numpy.abs(eig_vals - 1))
eig_vect_1 = eig_vectors[:, idx]
normal_sum = sum(eig_vect_1)              # sum of the entries
quad_sum = sum(e**2 for e in eig_vect_1)  # sum of the squared entries
print(f"sum of entries: {normal_sum}")
print(f"sum of squared entries: {quad_sum}")
```

```
sum of entries: -1.6666666666666665
sum of squared entries: 0.9999999999999999
```


An eigenvector can be scaled by an arbitrary nonzero constant. Determine the scaling that turns this eigenvector into a probability vector, denoted mathematically by $\boldsymbol{\pi}$.

```python
scalar = 1 / normal_sum
pi_vect = scalar * eig_vect_1
pi_vect_sum = sum(pi_vect)
print(f"π-probability vector = {pi_vect} with sum: {pi_vect_sum}")
```

```
π-probability vector = [0.4 0.2 0.4] with sum: 1.0
```
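Instead of an eigendecomposition, $\boldsymbol{\pi}$ can also be obtained by solving the linear system $\boldsymbol{\pi}\boldsymbol{P}=\boldsymbol{\pi}$ together with $\sum_i\pi_i=1$. Transposed, the first condition reads $(\boldsymbol{P}^T-\boldsymbol{I})\boldsymbol{\pi}^T=\boldsymbol{0}$. Because the rows of $\boldsymbol{P}$ sum to one, the equations of this system add up to zero, so one of them is redundant and can be replaced by the normalization. The sketch below assumes that the stationary distribution is unique, so the resulting system is non-singular.

```python
import numpy

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
n = P.shape[0]

A = P.T - numpy.eye(n)  # (P^T - I) pi^T = 0
A[-1, :] = 1.0          # replace the redundant last equation by sum(pi) = 1
b = numpy.zeros(n)
b[-1] = 1.0

pi = numpy.linalg.solve(A, b)
print(pi)  # approximately [0.4 0.2 0.4]
```

This route avoids picking and rescaling an eigenvector. The eigenvector route, on the other hand, also exposes the other eigenvalues, which govern the speed of convergence.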


## Entropy of a Markov process

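The entropy (rate) of a stationary and ergodic Markov process equals
$$ H(X) = -\sum_{i=1}^{n}\sum_{j=1}^{n}\pi_{i}P_{ij}\log_2 P_{ij}, $$
in bits per symbol when the logarithm is taken in base 2, with the convention $0\log 0 = 0$. Equivalently, $H(X)=\sum_{i=1}^{n}\pi_i H(X\mid x_i)$, where $H(X\mid x_i)=-\sum_{j=1}^{n}P_{ij}\log_2 P_{ij}$ is the entropy of row $i$. Determine the entropy of the considered Markov process.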
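As a worked check of the formula by hand, take $\boldsymbol{\pi}=[0.4,\,0.2,\,0.4]$ from the previous section and base-2 logarithms. Since $-\tfrac12\log_2\tfrac12 = -\tfrac14\log_2\tfrac14 = \tfrac12$, every nonzero term $-P_{ij}\log_2 P_{ij}$ equals $0.5$ bit, so

$$
\begin{aligned}
H(X\mid x_1) &= -\left(\tfrac12\log_2\tfrac12+\tfrac14\log_2\tfrac14+\tfrac14\log_2\tfrac14\right)=\tfrac12+\tfrac12+\tfrac12=1.5,\\
H(X\mid x_2) &= -\left(\tfrac12\log_2\tfrac12+0+\tfrac12\log_2\tfrac12\right)=\tfrac12+\tfrac12=1,\\
H(X\mid x_3) &= -\left(\tfrac14\log_2\tfrac14+\tfrac14\log_2\tfrac14+\tfrac12\log_2\tfrac12\right)=1.5,\\
H(X) &= 0.4\cdot 1.5+0.2\cdot 1+0.4\cdot 1.5 = 1.4 \text{ bits per symbol}.
\end{aligned}
$$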

```python
from math import log2

P = matrix
# Element-wise -P_ij log2 P_ij, using the convention 0 log 0 = 0
PlogP = numpy.array([[0 if x == 0 else -x * log2(x) for x in row] for row in P])
print(f"-P log2 P matrix:\n{PlogP}")

# H(X|x_i): entropy of row i (sum over the next states j)
HXxi = PlogP.sum(axis=1)
print(f"H(X|xi) = {HXxi}")

# Weight each conditional entropy with the stationary probability pi_i
Hx = sum(HXxi * pi_vect)
print(f"H(X) = {Hx}")
```

```
-P log2 P matrix:
[[0.5 0.5 0.5]
 [0.5 0.  0.5]
 [0.5 0.5 0.5]]
H(X|xi) = [1.5 1.  1.5]
H(X) = 1.4
```
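The list comprehension above can be replaced by a vectorized NumPy expression. The sketch below assumes base-2 logarithms and the convention $0\log 0=0$. `markov_entropy_rate` is a hypothetical helper name, and $\boldsymbol{\pi}=[0.4,\,0.2,\,0.4]$ is hard-coded from the eigenvector computation above.

```python
import numpy

def markov_entropy_rate(P, pi):
    """H(X) = -sum_i sum_j pi_i P_ij log2 P_ij with 0 log 0 = 0 (hypothetical helper)."""
    P = numpy.asarray(P, dtype=float)
    # Zeros are replaced by 1 inside the log, so their term is 0 * log2(1) = 0
    plogp = P * numpy.log2(numpy.where(P > 0, P, 1.0))
    row_entropies = -plogp.sum(axis=1)  # H(X | x_i) for every current state x_i
    return float(numpy.dot(pi, row_entropies))

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
pi = numpy.array([0.4, 0.2, 0.4])
print(markov_entropy_rate(P, pi))  # approximately 1.4
```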


## Convergence to the asymptotic probability

The vector $\boldsymbol{\pi}$ gives the probability $p(x_i)$ that the Markov process is in state $x_i$. The assumption that the Markov process is stationary and ergodic implies that only the largest eigenvalue equals one. As verified above, all other eigenvalues satisfy $|\lambda_i|<1$. As a result, the state distribution always converges to $\boldsymbol{\pi}$, irrespective of the probability distribution at the initial time $k=0$.

To illustrate this, start from an arbitrary probability distribution and perform 5 successive steps of the Markov process (each step multiplies the row vector by $\boldsymbol{P}$). Then compare the final probability vector with the asymptotic one.

```python
prob_vect = numpy.array([0.7, 0.1, 0.2])
amount_of_steps = 5
for i in range(amount_of_steps):
    prob_vect = numpy.dot(prob_vect, matrix)
    print(f"Iteration {i+1}: \nprobability vector {prob_vect}")
```

```
Iteration 1: 
probability vector [0.45  0.225 0.325]
Iteration 2: 
probability vector [0.41875 0.19375 0.3875 ]
Iteration 3: 
probability vector [0.403125  0.2015625 0.3953125]
Iteration 4: 
probability vector [0.40117187 0.19960937 0.39921875]
Iteration 5: 
probability vector [0.40019531 0.20009766 0.39970703]
```
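The speed of convergence is governed by the second-largest eigenvalue modulus, here $|\lambda_2|=|\lambda_3|=0.25$. The deviation from $\boldsymbol{\pi}$ therefore shrinks by roughly a factor of 4 per step, which matches the iterations above. Equivalently, every row of $\boldsymbol{P}^k$ approaches $\boldsymbol{\pi}$. The sketch below checks both, with $\boldsymbol{\pi}=[0.4,\,0.2,\,0.4]$ hard-coded from the eigenvector computation. The number of steps is an arbitrary choice.

```python
import numpy

P = numpy.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
pi = numpy.array([0.4, 0.2, 0.4])

# Rows of P^k approach pi for large k, whatever the starting state
print(numpy.linalg.matrix_power(P, 20))

# Euclidean distance to pi after each step, from the same initial vector
p = numpy.array([0.7, 0.1, 0.2])
errors = []
for k in range(1, 11):
    p = p @ P
    errors.append(numpy.linalg.norm(p - pi))
ratios = numpy.array(errors[1:]) / numpy.array(errors[:-1])
print(ratios)  # each ratio is close to |lambda_2| = 0.25
```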
