# Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

This repository contains the code used for the experiments presented in the extended abstract *Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space* at RLDM 2022 (see https://arxiv.org/abs/2205.14098, and also https://arxiv.org/abs/2110.07409 for a theoretical discussion of the geometry of the optimization problem). It includes an implementation of the presented method for reward optimization in state-action space (ROSA) as well as the two baselines used for comparison.
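The core idea is to optimize the expected reward over discounted state-action frequencies rather than over policy parameters. As a rough orientation, the sketch below shows one possible way such a formulation could be set up with JuMP and Ipopt; the problem data, the deterministic observation mechanism, and the exact constraint form are illustrative assumptions and do not reproduce the implementation in `utilities.jl`.

```julia
# Illustrative sketch only (not the repo's ROSA implementation):
# reward optimization over state-action frequencies for a POMDP with
# deterministic observations, using made-up problem data.
using JuMP, Ipopt

nS, nA = 4, 2
γ = 0.9
r = rand(nS, nA)                              # instantaneous rewards r(s, a)
P = rand(nS, nS, nA); P ./= sum(P, dims=1)    # P[s′, s, a]: transition probabilities
μ = fill(1 / nS, nS)                          # initial state distribution
obs = [1, 1, 2, 2]                            # deterministic observation of each state

model = Model(Ipopt.Optimizer)
@variable(model, τ[1:nS, 1:nA] >= 0)          # discounted state-action frequencies

# Flow (stationarity) constraints of the discounted occupancy measure.
@constraint(model, [s′ = 1:nS],
    sum(τ[s′, a] for a in 1:nA) ==
    (1 - γ) * μ[s′] + γ * sum(P[s′, s, a] * τ[s, a] for s in 1:nS, a in 1:nA))

# Observation consistency: states with the same observation must induce the
# same conditional action distribution (a polynomial constraint on τ).
for s in 1:nS, s′ in 1:nS, a in 1:nA
    if s < s′ && obs[s] == obs[s′]
        @constraint(model,
            τ[s, a] * sum(τ[s′, b] for b in 1:nA) ==
            τ[s′, a] * sum(τ[s, b] for b in 1:nA))
    end
end

@objective(model, Max, sum(r[s, a] * τ[s, a] for s in 1:nS, a in 1:nA))
optimize!(model)
```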

Overview of the contents:

- `utilities.jl`: Contains implementations of basic functions such as the reward, as well as implementations of the solution of the Bellman constrained program (BCP) proposed by Amato et al. (see http://people.csail.mit.edu/camato/publications/OptimalPOMDP-aimath05.pdf) and of reward optimization in state-action space (ROSA), both relying on the interior-point solver Ipopt (see https://coin-or.github.io/Ipopt/). It further contains code for the generation of random solvable mazes (see also https://rosettacode.org/wiki/Maze_generation; a generic sketch of such a generator appears after this list) as well as code for the automated generation of the transition and observation matrices of the model.
- Code for the experiments: The code for the experiments can be found in the Julia notebooks `ROSA_discount_fixed.ipynb`, `ROSA_size_fixed.ipynb`, etc.
- Mazes used for computations: The mazes used in the experiments, which were generated by the `maze()` function provided in `utilities.jl`, can be found in the `mazes` folder. Here, `mazesn.csv` contains a list of the solvable mazes with 2n^2 - 1 states.
- Code used for plotting: Not provided for now.
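For illustration, here is a generic depth-first ("recursive backtracker") maze generator in the spirit of the Rosetta Code page linked above; it is not the `maze()` function from `utilities.jl` and is only meant to show the idea. For an n × n cell grid, a perfect maze has n² cell states plus n² − 1 carved passages, which matches the 2n² − 1 free states mentioned above.

```julia
# Illustrative recursive-backtracker maze generator (not the repo's maze()).
using Random

function generate_maze(n::Int; rng = Random.default_rng())
    grid = falses(2n - 1, 2n - 1)           # true = free cell, false = wall
    cell(i, j) = (2i - 1, 2j - 1)           # cell (i, j) on the wall grid
    visited = falses(n, n)
    stack = [(1, 1)]
    visited[1, 1] = true
    grid[cell(1, 1)...] = true
    while !isempty(stack)
        (i, j) = stack[end]
        # unvisited neighbours of the current cell
        nbrs = [(i + di, j + dj) for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 1 <= i + di <= n && 1 <= j + dj <= n && !visited[i + di, j + dj]]
        if isempty(nbrs)
            pop!(stack)                     # dead end: backtrack
        else
            (i2, j2) = rand(rng, nbrs)
            # carve the passage between (i, j) and (i2, j2)
            grid[cell(i, j)[1] + (i2 - i), cell(i, j)[2] + (j2 - j)] = true
            grid[cell(i2, j2)...] = true
            visited[i2, j2] = true
            push!(stack, (i2, j2))
        end
    end
    return grid
end

maze_grid = generate_maze(4)                # 7×7 grid with 2*4^2 - 1 = 31 free cells
@assert count(maze_grid) == 2 * 4^2 - 1
```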