## Markov Decision Process
A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
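
To make the definition concrete, here is a minimal sketch of the pieces an MDP is usually described with: states, actions, transition probabilities, and rewards. The two states, the two actions, the `step` helper, and every number below are hypothetical and chosen only for illustration; none of them come from this notebook.

```python
import random

# Hypothetical toy MDP: every name and number here is an illustrative assumption.
# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "rested": {
        "work": [(0.8, "tired", 10.0), (0.2, "rested", 10.0)],
        "nap": [(1.0, "rested", 0.0)],
    },
    "tired": {
        "work": [(1.0, "tired", 2.0)],
        "nap": [(0.9, "rested", 0.0), (0.1, "tired", 0.0)],
    },
}

def step(state, action, rng=random):
    """Sample (next_state, reward) for taking `action` in `state`."""
    outcomes = transitions[state][action]
    weights = [p for p, _, _ in outcomes]
    # The environment picks one outcome at random, weighted by its probability.
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

state = "rested"
for action in ["work", "work", "nap"]:
    next_state, reward = step(state, action)
    print(f"{state} --{action}--> {next_state} (reward {reward})")
    state = next_state
```

The decision maker chooses the action, while the random draw inside `step` chooses among the possible outcomes. That is the "partly random, partly controlled" split described above.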

### Stochastic model that assumes the Markov property
A stochastic model is a process in which the state depends on previous states in a non-deterministic way. A stochastic process has the Markov property if the conditional probability distribution of its future states depends only on the present state, not on the sequence of states that preceded it.
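
Written out for a discrete-time process $X_0, X_1, \dots$ (this notation is introduced here for illustration and is not taken from the notebook), the Markov property says that the next state depends on the history only through the current state:

```latex
P(X_{t+1} = x \mid X_t = x_t, X_{t-1} = x_{t-1}, \dots, X_0 = x_0)
  = P(X_{t+1} = x \mid X_t = x_t)
```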

In [12]:
from IPython.display import Image
img = 'Markov_Decision_Process.png'
Image(url=img)

## Hidden Markov Models (HMM)
A hidden Markov model is a class of probabilistic model that lets us infer a sequence of unknown (hidden) variables from a set of observed variables. A simple example of an HMM is predicting the weather (the hidden variable) from the type of clothes that someone wears (the observed variable).
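
As a rough sketch of the weather and clothes example, the block below runs the forward algorithm on a two-state HMM to estimate the current weather from a sequence of observed outfits. The state names, the observation names, the `forward` helper, and all probabilities are assumptions made up for this illustration.

```python
# Hypothetical HMM: all states, observations, and probabilities are illustrative assumptions.
hidden_states = ["sunny", "rainy"]
start_prob = {"sunny": 0.6, "rainy": 0.4}
trans_prob = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
emit_prob = {
    "sunny": {"t-shirt": 0.8, "coat": 0.2},
    "rainy": {"t-shirt": 0.1, "coat": 0.9},
}

def forward(observations):
    """Return, per hidden state, the joint probability of ending in that state
    and having seen the whole observation sequence."""
    alpha = {s: start_prob[s] * emit_prob[s][observations[0]] for s in hidden_states}
    for obs in observations[1:]:
        alpha = {
            s: emit_prob[s][obs]
            * sum(alpha[prev] * trans_prob[prev][s] for prev in hidden_states)
            for s in hidden_states
        }
    return alpha

alpha = forward(["coat", "coat", "t-shirt"])
total = sum(alpha.values())  # likelihood of the observed sequence
posterior = {s: a / total for s, a in alpha.items()}
print(posterior)  # belief about today's weather given the outfits seen
```

The weather is never observed directly here. Only the clothes are, and the transition and emission tables connect the two.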

In [14]:
img = 'winter.jpg'
Image(url=img)

In [2]:
# install networkx (the %pip magic installs into the running kernel's environment)
%pip install networkx

Note: you may need to restart the kernel to use updated packages.


In [3]:
# import Libraries
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt


In [5]:
# create state space and initial state probabilities

states = ['sleeping', 'eating', 'pooping']
pi = [0.35, 0.35, 0.3]
state_space = pd.Series(pi, index=states, name='states')

print(state_space)

sleeping    0.35
eating      0.35
pooping     0.30
Name: states, dtype: float64


In [6]:
print(state_space.sum())

1.0


In [7]:
# create the transition matrix
# entry (i, j) is the probability of moving from state i (row) to state j (column)
# the matrix has shape (M x M), where M is the number of states; each row sums to 1

q_df = pd.DataFrame(columns=states, index=states)
q_df.loc[states[0]] = [0.4, 0.2, 0.4]
q_df.loc[states[1]] = [0.45, 0.45, 0.1]
q_df.loc[states[2]] = [0.45, 0.25, .3]

print(q_df)

         sleeping eating pooping
sleeping      0.4    0.2     0.4
eating       0.45   0.45     0.1
pooping      0.45   0.25     0.3


In [8]:
q = q_df.values
print('\n', q, q.shape, '\n')
print(q_df.sum(axis=1))


 [[0.4 0.2 0.4]
 [0.45 0.45 0.1]
 [0.45 0.25 0.3]] (3, 3) 

sleeping    1.0
eating      1.0
pooping     1.0
dtype: float64
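
Because every row of the transition matrix sums to 1, multiplying the initial distribution by the matrix gives the state distribution one step later, and repeating that gives the distribution after any number of steps. The sketch below re-enters the notebook's values as float arrays so that it runs on its own; the helper name `distribution_after` is an assumption introduced here.

```python
import numpy as np

states = ['sleeping', 'eating', 'pooping']
pi = np.array([0.35, 0.35, 0.3])      # initial state probabilities
q = np.array([[0.40, 0.20, 0.40],     # transition matrix, one row per origin state
              [0.45, 0.45, 0.10],
              [0.45, 0.25, 0.30]])

def distribution_after(n_steps):
    """State distribution after n_steps transitions, starting from pi."""
    return pi @ np.linalg.matrix_power(q, n_steps)

for n in (1, 5, 50):
    print(n, distribution_after(n).round(4))
```

For this particular matrix, the printed distributions should stop changing noticeably as `n` grows, which is one way to see the chain settle toward a stationary distribution.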


In [9]:
from pprint import pprint 
# create a function that maps transition probability dataframe 
# to markov edges and weights

def _get_markov_edges(Q):
    edges = {}
    for col in Q.columns:
        for idx in Q.index:
            edges[(idx,col)] = Q.loc[idx,col]
    return edges

edges_wts = _get_markov_edges(q_df)
pprint(edges_wts)

{('eating', 'eating'): 0.45,
 ('eating', 'pooping'): 0.1,
 ('eating', 'sleeping'): 0.45,
 ('pooping', 'eating'): 0.25,
 ('pooping', 'pooping'): 0.3,
 ('pooping', 'sleeping'): 0.45,
 ('sleeping', 'eating'): 0.2,
 ('sleeping', 'pooping'): 0.4,
 ('sleeping', 'sleeping'): 0.4}


In [10]:
# create graph object
G = nx.MultiDiGraph()

# nodes correspond to states
G.add_nodes_from(states)
print(f'Nodes:\n{G.nodes()}\n')

 

Nodes:
['sleeping', 'eating', 'pooping']



In [11]:
# edges represent transition probabilities
for k, v in edges_wts.items():
    tmp_origin, tmp_destination = k[0], k[1]
    G.add_edge(tmp_origin, tmp_destination, weight=v, label=v)
print('Edges:')
pprint(G.edges(data=True))  

Edges:
OutMultiEdgeDataView([('sleeping', 'sleeping', {'weight': 0.4, 'label': 0.4}), ('sleeping', 'eating', {'weight': 0.2, 'label': 0.2}), ('sleeping', 'pooping', {'weight': 0.4, 'label': 0.4}), ('eating', 'sleeping', {'weight': 0.45, 'label': 0.45}), ('eating', 'eating', {'weight': 0.45, 'label': 0.45}), ('eating', 'pooping', {'weight': 0.1, 'label': 0.1}), ('pooping', 'sleeping', {'weight': 0.45, 'label': 0.45}), ('pooping', 'eating', {'weight': 0.25, 'label': 0.25}), ('pooping', 'pooping', {'weight': 0.3, 'label': 0.3})])
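
One way to read the weighted edges is as the rules of a random walk: from the current state, follow an outgoing edge with probability equal to its weight. The sketch below samples such a walk using only the standard library. The edge weights are copied from the output above, while the `sample_path` helper and its `seed` parameter are hypothetical names introduced for this example.

```python
import random

# Edge weights copied from the output above: (origin, destination) -> probability.
edges_wts = {
    ('sleeping', 'sleeping'): 0.4, ('sleeping', 'eating'): 0.2, ('sleeping', 'pooping'): 0.4,
    ('eating', 'sleeping'): 0.45, ('eating', 'eating'): 0.45, ('eating', 'pooping'): 0.1,
    ('pooping', 'sleeping'): 0.45, ('pooping', 'eating'): 0.25, ('pooping', 'pooping'): 0.3,
}

def sample_path(start, n_steps, seed=None):
    """Sample a sequence of n_steps + 1 states, starting at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Outgoing edges of the current state and their probabilities.
        targets = [dst for (src, dst) in edges_wts if src == current]
        weights = [edges_wts[(current, dst)] for dst in targets]
        path.append(rng.choices(targets, weights=weights, k=1)[0])
    return path

print(sample_path('sleeping', 10, seed=0))
```

Passing a fixed `seed` makes the sampled path reproducible, which helps when comparing runs.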

