Python implementation of the Value Iteration algorithm for computing optimal policies of Markov Decision Processes. This repo was made for a reinforcement learning course at ENSTA ParisTech.

glimow/python_value_iteration

ROB311: Utility function of a simple MDP

This repository contains an MDP utility function for ROB311's project at ENSTA ParisTech. It is split into two files:

  • value_iteration.py contains a lightly unit-tested implementation of the Value Iteration algorithm. It is heavily inspired by the one in chapter 17 of Russell and Norvig's "Artificial Intelligence: A Modern Approach", with a tweak in the while-loop condition to match the course's version. It can compute any MDP's utility function as long as its transition matrices are available.
  • test.py contains a test of our Value Iteration algorithm on the 2x3 MDP problem seen in class.
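For reference, the algorithm can be sketched as follows. This is an illustrative sketch, not the exact API of value_iteration.py: the function signature, the (actions, states, states) transition-matrix layout, and the per-state reward vector are all assumptions made for the example.

```python
import numpy as np

def value_iteration(transitions, rewards, gamma=0.999, threshold=0.01):
    """Compute each state's utility by value iteration.

    transitions: array of shape (n_actions, n_states, n_states),
        where transitions[a, s, t] is P(t | s, a).
    rewards: array of shape (n_states,), the reward of each state.
    Returns the utility vector and the number of sweeps performed.
    """
    n_states = transitions.shape[1]
    utility = np.zeros(n_states)
    n_iterations = 0
    while True:
        # Bellman update: expected utility of each action in each state,
        # then the best action per state.
        q_values = transitions @ utility          # shape (n_actions, n_states)
        new_utility = rewards + gamma * q_values.max(axis=0)
        n_iterations += 1
        # Stop once no state's utility changed by more than the threshold.
        if np.max(np.abs(new_utility - utility)) < threshold:
            return new_utility, n_iterations
        utility = new_utility
```

The stopping condition here (max utility change below the threshold) is the simple variant; the repo notes its while-loop condition is tweaked to match the course's version, which may differ.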

Installation and usage

To install the dependencies (numpy) with pip:

pip3 install -r requirements.txt

To run unit tests:

python3 value_iteration.py

To run TP2's 3x2 problem:

python3 test.py

Questions

  1. It takes 6010 iterations for the utility to converge with gamma=0.999 and threshold=0.01.
  2. With gamma=0.1, it takes only 4 iterations to converge. We observe that the resulting policy is the same.
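The large gap between those iteration counts follows from value iteration being a geometric contraction: each sweep shrinks the utility change by roughly a factor of gamma. A minimal one-state illustration (this toy problem and helper are made up for the example; they are not the course's MDP or this repo's code):

```python
def iterations_to_converge(gamma, threshold=0.01, reward=1.0):
    """Count Bellman updates of a one-state, one-action MDP
    (u <- reward + gamma * u) until the change drops below threshold."""
    u, n = 0.0, 0
    while True:
        new_u = reward + gamma * u
        n += 1
        if abs(new_u - u) < threshold:
            return n
        u = new_u

# The utility change after sweep n is gamma**(n-1) * reward, so a small
# gamma converges in a handful of sweeps while gamma near 1 needs thousands.
```

On this toy problem, gamma=0.1 converges in 4 sweeps while gamma=0.999 needs several thousand, mirroring the contrast observed above (the exact counts depend on the MDP's rewards).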
