Value Iteration algorithm for Markov Decision Processes. The value iteration algorithm iteratively estimates the value of each state, s, based on Bellman's equation.

MarSH-Up/MDPs_Value-Iteration

⛰ Value Iteration Algorithm for Markov Decision Processes

The content of this repository served as an assignment project for the course Probabilistic Graphical Models at INAOE, taken as a student of the Master of Science in Computer Science. All the resources presented in the versions of this code were obtained from the class book, which you can find in the references section.

This application of the algorithm and information is for educational purposes only.

Description:

Implement the value iteration algorithm to solve discrete Markov Decision Processes.

Professor:

Student Involved:

Instructions

  1. Download the repository's files
  2. Verify that your compiler supports at least C++14
  3. Call the functions described in the documentation

The following algorithms are based on the documentation provided by the professor. The book used as a reference is listed at the end of this file.

  1. The value iteration algorithm consists of iteratively estimating the value of each state, s, based on Bellman's equation. The next image shows the pseudocode used to create this project.
  2. The policy iteration algorithm also iteratively estimates the value of each state, s, based on Bellman's equation, with the main difference that we store the policy at each iteration. This allows us to compare iteration (t) with iteration (t-1): if the policy is unchanged, the process finishes, which gives a computational speed advantage at a storage cost. Image 2 shows the pseudocode used to create this project.
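The value iteration update above can be sketched as follows. This is a minimal, self-contained illustration of the Bellman update, assuming an MDP described by transition probabilities P[s][a][s'] and rewards R[s][a]; the actual class and parameter names used in this repository may differ (see its documentation).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Value iteration sketch: repeatedly apply the Bellman optimality update
//   V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s,a,s') * V(s') ]
// until the largest change across states falls below epsilon.
// P, R, and the function name are illustrative, not the repository's API.
std::vector<double> valueIteration(
    const std::vector<std::vector<std::vector<double>>>& P,
    const std::vector<std::vector<double>>& R,
    double gamma, double epsilon) {
    const std::size_t nStates = R.size();
    std::vector<double> V(nStates, 0.0);
    double delta;
    do {
        delta = 0.0;
        for (std::size_t s = 0; s < nStates; ++s) {
            double best = -1e18;
            for (std::size_t a = 0; a < R[s].size(); ++a) {
                double q = R[s][a];  // immediate reward for (s, a)
                for (std::size_t s2 = 0; s2 < nStates; ++s2)
                    q += gamma * P[s][a][s2] * V[s2];  // discounted lookahead
                if (q > best) best = q;
            }
            delta = std::max(delta, std::fabs(best - V[s]));
            V[s] = best;
        }
    } while (delta > epsilon);  // stop once values have converged
    return V;
}
```

On a two-state toy MDP where action 1 in state 0 earns reward 1 and moves to an absorbing state, the values converge to V(0) = 1 and V(1) = 0.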

Examples

The class needs to be called as the figure indicates:

  • Object Creation

We used two examples to confirm the algorithm's functionality: "The robot path", from the book, and "The bear travel", from an online blog called Towards Data Science (link in the references).

  • The robot path

  • The bear travel

  • Let's start with the robot path example. Consider figure 1 as a grid to complete; our code needs the parameters defined in the description, so the next image shows what we mean.

    • First, consider that the enumeration of the states is as follows:
      • States
    • We pass the parameters as matrices, as you can see in the next figure:
      • Parameters Descriptions
    • Then, running the functions, you would see the following results:
    • Policy_Iteration
    • Policy
    • You would then follow this path:
    • Policy
  • Now let's solve the second example; here we just show the images of each function used and the results:

    • States
    • Parameters Descriptions
    • Policy_Iteration
    • Policy_Iteration
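The path shown above comes from reading off the policy, i.e. the best action in each state once the values have converged. A hedged sketch of this extraction step, using the same illustrative P[s][a][s'] and R[s][a] layout as before (the names are hypothetical, not the repository's API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy policy extraction: in each state, pick the action maximizing the
// one-step lookahead R(s,a) + gamma * sum_{s'} P(s,a,s') * V(s'),
// where V holds the converged state values.
std::vector<std::size_t> greedyPolicy(
    const std::vector<std::vector<std::vector<double>>>& P,
    const std::vector<std::vector<double>>& R,
    const std::vector<double>& V, double gamma) {
    std::vector<std::size_t> policy(R.size(), 0);
    for (std::size_t s = 0; s < R.size(); ++s) {
        double best = -1e18;
        for (std::size_t a = 0; a < R[s].size(); ++a) {
            double q = R[s][a];
            for (std::size_t s2 = 0; s2 < V.size(); ++s2)
                q += gamma * P[s][a][s2] * V[s2];
            if (q > best) { best = q; policy[s] = a; }  // keep the argmax action
        }
    }
    return policy;
}
```

Following the arrows of this policy from the start state traces out the path shown in the figures.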

References
