MARKOV DECISION PROCESS - Finding the optimal policy

  • The aim here is to implement an MDP planning algorithm. The class accepts an MDP object and returns the optimal policy for that MDP.
  • An MDP object consists of a set of states, a set of actions, rewards, and transition probabilities.
  • The approach used to find the optimal policy is value iteration. The underlying principle behind value iteration is that the utility of being in a state is the immediate reward of that state plus the expected discounted utility of the states reachable from it, assuming an optimal policy is followed thereafter. The utilities are calculated using the Bellman equation: U(s) = R(s) + γ · max_a Σ_{s'} P(s'|s,a) U(s')
  • These utilities are updated iteratively until the largest change in any state's utility drops below a chosen threshold; a minimal sketch of this loop follows the list.
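For concreteness, here is a minimal sketch of the value-iteration loop. The MDP interface it assumes (`states`, `actions`, `gamma`, `R(s)`, and `P(s2, s, a)` for the transition probability) is invented for illustration and may not match the actual MDP object defined in this repository.

```python
def value_iteration(mdp, epsilon=1e-6):
    """Return the optimal policy for `mdp` via value iteration (sketch).

    Assumes `mdp` exposes: states, actions, gamma, R(s), and
    P(s2, s, a) = probability of landing in s2 after taking a in s.
    These names are assumptions, not the repository's confirmed API.
    """
    U = {s: 0.0 for s in mdp.states}  # initial utility estimates
    while True:
        delta = 0.0
        for s in mdp.states:
            # Bellman update: U(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')
            best = max(
                sum(mdp.P(s2, s, a) * U[s2] for s2 in mdp.states)
                for a in mdp.actions
            )
            new_u = mdp.R(s) + mdp.gamma * best
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u  # in-place (Gauss-Seidel) update; a synchronous sweep also works
        if delta < epsilon:  # utilities have converged
            break
    # Extract the policy: in each state, pick the action that
    # maximizes the expected utility of the successor states.
    return {
        s: max(
            mdp.actions,
            key=lambda a: sum(mdp.P(s2, s, a) * U[s2] for s2 in mdp.states),
        )
        for s in mdp.states
    }
```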

solver.py: Contains code for finding the optimal policy

test.py: Contains code to create MDP objects to test the solver on
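As an illustration of how the pieces might fit together, the following hypothetical snippet builds a tiny two-state MDP and runs the sketch above on it. The `SimpleMDP` class is invented for this example and does not necessarily match the objects created in test.py.

```python
class SimpleMDP:
    # Hypothetical two-state MDP for illustration only.
    states = ["sunny", "rainy"]
    actions = ["walk", "drive"]
    gamma = 0.9

    def R(self, s):
        return 1.0 if s == "sunny" else -1.0

    def P(self, s2, s, a):
        # Dummy transition model: "drive" makes staying in the
        # current state more likely than "walk" does.
        stay = 0.8 if a == "drive" else 0.5
        return stay if s2 == s else 1.0 - stay

policy = value_iteration(SimpleMDP())
print(policy)  # e.g. {'sunny': 'drive', 'rainy': 'walk'}
```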
