ondrejbiza/mdp_abstraction

MDP Minimization

This repository contains algorithms for minimizing Markov Decision Processes (MDPs), given a fully specified deterministic MDP as input. The minimal MDP and the original MDP can be equivalent under either the notion of bisimulation or the notion of MDP homomorphism.

Environments

Two example environments are included.

Pick

(Figure: the states of the Pick environment.)

A simple environment with 5 states and 4 actions. The goal is to pick up the puck in a 2x2 grid. The states are illustrated above, and the actions are simply PICK for each of the 4 possible positions in the grid. If the PICK action is executed at the coordinate where the puck is located, the environment transitions into state 5 and a reward of 1 is awarded. Otherwise, the environment stays in the same state and the reward is 0.
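The transition and reward tables described above are small enough to write out directly. Here is a minimal sketch in Python, assuming state i (1-4) means the puck is at grid cell i, state 5 is the terminal "picked" state, and action j means PICK at cell j; the dict-based encoding is illustrative, not necessarily the repository's actual format:

```python
# Pick MDP sketch: 5 states, 4 actions.
# State i (1-4) = puck at cell i; state 5 = puck picked (absorbing).
# Action j = PICK at grid cell j.
STATES = [1, 2, 3, 4, 5]
ACTIONS = [1, 2, 3, 4]

P = {}  # (state, action) -> next state
R = {}  # (state, action) -> reward
for s in STATES:
    for a in ACTIONS:
        if s == a:  # PICK at the puck's cell: success
            P[(s, a)], R[(s, a)] = 5, 1
        else:       # wrong cell, or already done: nothing changes
            P[(s, a)], R[(s, a)] = s, 0

print(P[(2, 2)], R[(2, 2)])  # -> 5 1 (picking at the puck's cell)
```

Note that state 5 falls into the `else` branch for every action, which makes it absorbing with reward 0, as described above.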

Redundant Pick

This environment is the same as the Pick environment, except that there are 4 redundant states 11, 12, 13, and 14 that behave in the same way as states 1, 2, 3, and 4.

(Figure: the states of the Redundant Pick environment.)

Minimization

Bisimulation

Bisimulation (Givan 2012) considers two states to be equivalent if, roughly speaking, they behave the same under every action.

Pick:

Bisimulation places each state in its own block, so the Pick environment is already minimal:

state partition = {{1}, {2}, {3}, {4}, {5}}
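Such a partition can be computed by standard partition refinement: start with all states in one block and repeatedly split apart states whose reward or next-state block differs for some action. A sketch for deterministic MDPs, with my own function and variable names (the repository's implementation may differ), demonstrated on Redundant Pick:

```python
def bisimulation_partition(states, actions, P, R):
    """Coarsest bisimulation partition of a deterministic MDP.

    P maps (state, action) -> next state; R maps (state, action) -> reward.
    """
    blocks = [frozenset(states)]  # start with a single block
    while True:
        block_of = {s: i for i, b in enumerate(blocks) for s in b}
        # Signature of a state: per-action (reward, next-state block).
        sig = {s: tuple((R[s, a], block_of[P[s, a]]) for a in actions)
               for s in states}
        # Split each block by signature.
        groups = {}
        for s in states:
            groups.setdefault((block_of[s], sig[s]), set()).add(s)
        new_blocks = [frozenset(g) for g in groups.values()]
        if len(new_blocks) == len(blocks):  # no block was split: fixpoint
            return set(new_blocks)
        blocks = new_blocks

# Redundant Pick: states 11-14 copy the behavior of states 1-4.
states = [1, 2, 3, 4, 5, 11, 12, 13, 14]
actions = [1, 2, 3, 4]
P = {(s, a): (5 if a == s % 10 else s) for s in states for a in actions}
R = {(s, a): (1 if a == s % 10 else 0) for s in states for a in actions}
print(bisimulation_partition(states, actions, P, R))
# five blocks: {1, 11}, {2, 12}, {3, 13}, {4, 14}, {5}
```

Running the same function on plain Pick yields the five singleton blocks listed above, since every state reacts differently to the four PICK actions.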

Redundant Pick:

Bisimulation does not offer much help with abstracting states, but it can find redundant states:

state partition = {{1, 11}, {2, 12}, {3, 13}, {4, 14}, {5}}

Homomorphism

MDP homomorphism (Ravindran 2014) is more lenient than bisimulation: two states can be equivalent even if they behave differently under some action, as long as we can find a mapping between the actions of equivalent states.

Pick:

Homomorphism partitions the state space in a more useful way:

state partition = {{1, 2, 3, 4}, {5}}

Redundant Pick:

state partition = {{1, 2, 3, 4, 11, 12, 13, 14}, {5}}
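For intuition, the Pick partition above corresponds to a tiny 2-state, 2-action quotient MDP. Below is a sketch assuming the state map sends states 1-4 to an abstract state "puck" and state 5 to "done", and the state-dependent action map sends the matching PICK to "hit" and every other PICK to "miss"; all names here are mine, not the repository's:

```python
# Ground Pick MDP (state i = puck at cell i, state 5 = done).
STATES, ACTIONS = [1, 2, 3, 4, 5], [1, 2, 3, 4]
P = {(s, a): (5 if a == s else s) for s in STATES for a in ACTIONS}
R = {(s, a): (1 if a == s else 0) for s in STATES for a in ACTIONS}

def f(s):
    """State map of the homomorphism."""
    return "done" if s == 5 else "puck"

def g(s, a):
    """State-dependent action map: only the matching PICK is a "hit"."""
    return "hit" if a == s else "miss"

# The 2-state, 2-action quotient MDP the partition induces.
ABSTRACT_P = {("puck", "hit"): "done", ("puck", "miss"): "puck",
              ("done", "hit"): "done", ("done", "miss"): "done"}
ABSTRACT_R = {("puck", "hit"): 1, ("puck", "miss"): 0,
              ("done", "hit"): 0, ("done", "miss"): 0}

# Homomorphism conditions: dynamics and rewards commute with (f, g).
for s in STATES:
    for a in ACTIONS:
        assert f(P[s, a]) == ABSTRACT_P[f(s), g(s, a)]
        assert R[s, a] == ABSTRACT_R[f(s), g(s, a)]
print("homomorphism conditions hold")
```

The action map is what bisimulation lacks: state 1 reacts to PICK-1 the way state 2 reacts to PICK-2, so remapping actions per state lets all four puck positions collapse into one abstract state.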

Iterative Homomorphism

There are cases where we want to construct the minimal MDP even though we do not have a full specification of the environment. To test this, I started with an empty MDP and iteratively added new entries to the transition function (P) and the reward function (R) until I arrived at the original MDP. At each step, I ran the homomorphism algorithm on the partial MDP.

Find the results in results/pick_homomorphism_iterative.txt.

Setup

Install Python 3.

Usage

# Minimize Pick using bisimulation
python -m scripts.bisim.pick

# Minimize Redundant Pick using bisimulation
python -m scripts.bisim.redundant_pick

# Minimize Pick using MDP homomorphism
python -m scripts.homo.pick

# Minimize Redundant Pick using homomorphism
python -m scripts.homo.redundant_pick