GitHub - jbryan/rl_demo: Demo of Q-learning and SARSA-lambda learning

COPYING
README
SConstruct
discrete.cpp
discrete.h
maze.cpp
maze.h
maze_app.cpp
maze_app.h
q_agent.cpp
q_agent.h
rl_agent.cpp
rl_agent.h
rl_demo.cpp
sarsa_lambda_agent.cpp
sarsa_lambda_agent.h
sdl_app.cpp
sdl_app.h

													REINFORCEMENT LEARNING DEMO

This is a short demo of two different reinforcement learning algorithms in a
simple maze world.  The red squares give negative reward (-1) and the green
squares give positive reward (+3).  The goal of the agents is to find an optimal
policy.  The left grid shows a Q-learning agent and the right shows a
SARSA-lambda agent.  This demo was put together for a short presentation I did
on reinforcement learning.
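
For readers new to the two algorithms, here is a minimal sketch of the textbook
tabular update rules.  It is not taken from q_agent.cpp or
sarsa_lambda_agent.cpp; the names QTable, kNumActions, q_learning_update, and
sarsa_lambda_update are hypothetical, and the trace handling assumes
accumulating eligibility traces.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tabular layout: one row of action values per state.
constexpr std::size_t kNumActions = 4;  // e.g. up, down, left, right
using QTable = std::vector<std::array<double, kNumActions>>;

// One Q-learning step (off-policy): the target bootstraps from the best
// action value in the next state, whatever action is actually taken next.
void q_learning_update(QTable& q, std::size_t s, std::size_t a, double reward,
                       std::size_t s_next, double alpha, double gamma) {
    assert(s < q.size() && s_next < q.size() && a < kNumActions);
    const double best_next =
        *std::max_element(q[s_next].begin(), q[s_next].end());
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
}

// One SARSA(lambda) step (on-policy): the TD error uses the action a_next
// that the agent actually chose, and is applied to every state-action pair
// in proportion to its eligibility trace.
void sarsa_lambda_update(QTable& q, QTable& trace, std::size_t s,
                         std::size_t a, double reward, std::size_t s_next,
                         std::size_t a_next, double alpha, double gamma,
                         double lambda) {
    assert(q.size() == trace.size());
    assert(s < q.size() && s_next < q.size());
    assert(a < kNumActions && a_next < kNumActions);
    const double delta = reward + gamma * q[s_next][a_next] - q[s][a];
    trace[s][a] += 1.0;  // accumulating trace for the pair just visited
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = 0; j < kNumActions; ++j) {
            q[i][j] += alpha * delta * trace[i][j];
            trace[i][j] *= gamma * lambda;  // decay all traces
        }
    }
}
```

In this sketch alpha is the learning rate, gamma the discount factor, and
lambda the trace decay.  With lambda set to 0 the SARSA update touches only the
pair just visited, which is ordinary one-step SARSA.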

BUILDING

There is an SConstruct file, so this can be built using SCons.  If you have
SCons installed, simply type

scons

This demo requires the Boost libraries, SDL, and OpenGL to be installed.


RUNNING

After building, simply run the rl_demo executable.  The demo runs in fullscreen
mode.  To exit, press Q or Esc.


BUGS

I am aware of none, though I have only tested this on a couple of machines, all
with Ubuntu 10.10 as the OS.  If you find bugs, let me know.  There are many
additional features that could be added, and certainly improvements can be made.
I wrote this as a simple demo for a single presentation, but have published it
in the hopes that someone else might find it useful.  If you do, or have
suggestions for improvements, let me know.

Here are a few improvements I can think of:

	1) Add command line options for parameters of the maze problem, e.g. size,
	number of red squares, number of green squares, stochasticity of actions.

	2) Add text to the display that gives info about the algorithm, e.g. the name
	of the algorithm, the average reward earned, etc.

One issue is that the random maze generator occasionally produces a maze that
is too simple.  For example, a green square may be adjacent to the start
position, so the optimal policy is learned immediately and nothing interesting
happens.  It would be nice to either avoid these mazes altogether or allow the
user to regenerate mazes without restarting.
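
One possible way to detect such mazes is sketched below.  This is not code from
maze.cpp; the Cell and Grid types and the is_too_simple function are
hypothetical, and the check uses plain Manhattan distance, which assumes the
agent moves one square at a time in four directions with nothing blocking the
way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cell types matching the demo's red and green squares.
enum class Cell { Empty, Red, Green };

// Hypothetical grid: row-major storage with width * height entries.
struct Grid {
    std::size_t width;
    std::size_t height;
    std::vector<Cell> cells;

    Cell at(std::size_t x, std::size_t y) const { return cells[y * width + x]; }
};

// Returns true when some green square lies within min_steps moves of the
// start position, measured as Manhattan distance.
bool is_too_simple(const Grid& g, std::size_t start_x, std::size_t start_y,
                   std::size_t min_steps) {
    assert(g.cells.size() == g.width * g.height);
    for (std::size_t y = 0; y < g.height; ++y) {
        for (std::size_t x = 0; x < g.width; ++x) {
            if (g.at(x, y) != Cell::Green) continue;
            const std::size_t dx = x > start_x ? x - start_x : start_x - x;
            const std::size_t dy = y > start_y ? y - start_y : start_y - y;
            if (dx + dy <= min_steps) return true;
        }
    }
    return false;
}
```

A generator could call is_too_simple with min_steps set to 1 to reject mazes
with a green square next to the start, and draw a new random maze whenever the
check returns true.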