Skip to content
Demo of Q learning and Sarsa-lambda learing
C++ C
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
COPYING
README
SConstruct
discrete.cpp
discrete.h
maze.cpp
maze.h
maze_app.cpp
maze_app.h
q_agent.cpp
q_agent.h
rl_agent.cpp
rl_agent.h
rl_demo.cpp
sarsa_lambda_agent.cpp
sarsa_lambda_agent.h
sdl_app.cpp
sdl_app.h

README

													REINFORCEMENT LEARNING DEMO

This is a short demo of two different reinforcement learning algorithms in a
simple maze world.  The red squares give negative reward (-1) and the green
squares give positive reward (+3).  The goal of the agents is to find an optimal
policy.  The left grid shows a Q-learning agent and the right shows a
SARSA-lambda agent.  This demo was put together for a short presentation I did
on reinforcement learning.

BUILDING

There is a SConstruct file so this can be built using Scons.  If you have Scons
installed, simply type 

scons

This demo requires boost libraries, SDL, and OpenGL be installed.  


RUNNING

Simply execute the rl_demo file after building.  The demo will run in fullscreen
mode.  To exit, you can press Q or <esc>.  


BUGS

I am aware of none, though I have only tested this on a couple of machines, all
with Ubuntu 10.10 as the OS.  If you find bugs, let me know.  There are many
additional features that could be added, and certainly improvements can be made.
I wrote this as a simple demo for a single presentation, but have published it
in the hopes that someone else might find it useful.  If you do, or have
suggestions for improvements, let me know.

Here are a few I can think of:

	1) Add command line options for parameters of the maze problem.  E.G. size,
	number of red squares, number of green squares, stochasticity of actions

	2) Add text to the display that gives info about the algorithm.  E.G. the name
	of the algorithm, the average reward earned, etc.

One issue is that occasionally the randomly generated maze produces a maze which
is too simple.  For example, a green square is adjacent to the start position,
so the optimal policy is learned immediately and nothing interesting happens.
It would be nice to to either avoid these mazes altogether or allow the user to
regenerate mazes without restarting.
You can’t perform that action at this time.