Learning Intention Reconsideration Strategies using Reinforcement Learning on Markov Decision Processes
# mdp-plan-revision

Read a more detailed description of the conceptual underpinnings and experimental results in the following paper:

Intention Reconsideration as Metareasoning (Marc van Zee, Thomas Icard), In Bounded Optimality and Rational Metareasoning NIPS 2015 Workshop, 2015.

## Summary

This project implements an agent situated in a Markov Decision Process (MDP).

A Markov Decision Process in this software

The agent can compute the optimal policy using value iteration.

Optimal policy (green arrows) computed using value iteration
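To make the idea concrete, here is a minimal, self-contained sketch of value iteration on a toy deterministic MDP (a 1-D corridor of states with a single goal). All names and parameters here (`GRID_SIZE`, `GOAL`, `GAMMA`, `THETA`) are illustrative assumptions, not taken from this repository's code:

```java
import java.util.Arrays;

// Minimal value-iteration sketch on a toy 1-D corridor MDP.
// The agent can move left or right; reaching the goal yields reward 1.
public class ValueIterationDemo {
    static final int GRID_SIZE = 5;   // states 0..4 in a row
    static final int GOAL = 4;        // terminal goal state
    static final double GAMMA = 0.9;  // discount factor
    static final double THETA = 1e-6; // convergence threshold

    public static double[] valueIteration() {
        double[] v = new double[GRID_SIZE];
        double delta;
        do {
            delta = 0.0;
            for (int s = 0; s < GRID_SIZE; s++) {
                if (s == GOAL) continue; // terminal state keeps value 0
                double best = Double.NEGATIVE_INFINITY;
                // actions: move left (-1) or right (+1), clipped at the borders
                for (int a : new int[]{-1, +1}) {
                    int next = Math.min(GRID_SIZE - 1, Math.max(0, s + a));
                    double reward = (next == GOAL) ? 1.0 : 0.0;
                    best = Math.max(best, reward + GAMMA * v[next]);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
        } while (delta > THETA);
        return v;
    }

    public static void main(String[] args) {
        // Values decay geometrically with distance from the goal.
        System.out.println(Arrays.toString(valueIteration()));
    }
}
```

The resulting values decrease by a factor of `GAMMA` per step of distance from the goal; the greedy policy with respect to them (always move toward the goal) corresponds to the green arrows in the screenshot above.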

The MDP changes over time, and the agent can respond to this change by either acting (i.e. executing the optimal action according to its current policy) or thinking (i.e. computing a new policy). The task is to learn the best metareasoning strategy, i.e. deciding when to think and when to act, based on the characteristics of the environment.
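The think-vs-act loop can be sketched as follows. This is a hedged illustration of the control structure only: the strategy names ("bold", "cautious") and the `MetaStrategy` interface are our own illustrative conventions, not the repository's API:

```java
// Sketch of the think-vs-act control loop: at each step a metareasoning
// strategy decides whether to recompute the policy (think) or execute
// the current policy's action (act).
interface MetaStrategy {
    boolean shouldThink(boolean worldChangedSincePlanning);
}

public class ControlLoopDemo {
    // "Bold" strategy: never reconsider; act until the plan runs out.
    public static final MetaStrategy BOLD = changed -> false;
    // "Cautious" strategy: replan whenever the world has changed.
    public static final MetaStrategy CAUTIOUS = changed -> changed;

    public static int countThinkSteps(MetaStrategy s, boolean[] changeTrace) {
        int thinks = 0;
        for (boolean changed : changeTrace) {
            if (s.shouldThink(changed)) thinks++; // think: recompute the policy
            // else: act, i.e. execute the current policy's action (omitted)
        }
        return thinks;
    }

    public static void main(String[] args) {
        boolean[] trace = {false, true, false, true, true};
        System.out.println(countThinkSteps(BOLD, trace));     // 0
        System.out.println(countThinkSteps(CAUTIOUS, trace)); // 3
    }
}
```

The interesting strategies lie between these two extremes: thinking has a cost, so the learned strategy must trade off the value of an updated policy against the time lost by replanning.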

This general setup is quite complex, so we have simplified the environment (i.e. the MDP) to the Tileworld environment. It consists of an agent situated on a grid: the agent can move up, down, left, or right and has to fill holes, which means it has to reach specific states in the grid. It cannot move through obstacles.
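A Tileworld instance can be encoded as a small character grid with a legal-move check. The character codes below are our own illustrative convention, not necessarily the one used in this repository:

```java
// Illustrative Tileworld encoding: '.' free cell, '#' obstacle,
// 'H' hole (goal state), 'A' agent.
public class TileworldDemo {
    static final char[][] GRID = {
        {'A', '.', '#'},
        {'.', '#', '.'},
        {'.', '.', 'H'},
    };

    // A move is legal if it stays on the grid and does not enter an obstacle.
    public static boolean legalMove(int row, int col) {
        return row >= 0 && row < GRID.length
            && col >= 0 && col < GRID[0].length
            && GRID[row][col] != '#';
    }

    public static void main(String[] args) {
        System.out.println(legalMove(0, 1));  // right of the agent: true
        System.out.println(legalMove(0, 2));  // obstacle: false
        System.out.println(legalMove(-1, 0)); // off the grid: false
    }
}
```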

Tileworld in MDP representation

In the above screenshot, the agent is represented by a yellow state, obstacles are represented by black states, goal states by larger green states, and "normal" grid states by red states. In order to simplify the Tileworld visualization, we have developed an alternative one:

Tileworld in simplified representation

Note that this is still an MDP: We have only simplified the visualization. This allows us to visualize larger Tileworld scenarios easily:

A larger Tileworld scenario in the simplified representation

We then develop several metareasoning strategies that the agent can use. See the paper referenced above for a detailed description of the conceptual underpinnings and experimental results.