Demo of Q-learning and SARSA-lambda learning
REINFORCEMENT LEARNING DEMO

This is a short demo of two reinforcement learning algorithms in a simple maze world. Red squares give a negative reward (-1) and green squares give a positive reward (+3). Each agent's goal is to find an optimal policy. The left grid shows a Q-learning agent and the right grid shows a SARSA-lambda agent.

This demo was put together for a short presentation I gave on reinforcement learning.

BUILDING

There is a SConstruct file, so this can be built using SCons. If you have SCons installed, simply type

    scons

This demo requires the Boost libraries, SDL, and OpenGL to be installed.

RUNNING

Simply execute the rl_demo file after building. The demo will run in fullscreen mode. To exit, press Q or <esc>.

BUGS

I am aware of none, though I have only tested this on a couple of machines, all running Ubuntu 10.10. If you find bugs, let me know.

There are many additional features that could be added, and certainly improvements can be made. I wrote this as a simple demo for a single presentation, but have published it in the hope that someone else might find it useful. If you do, or have suggestions for improvements, let me know. Here are a few I can think of:

1) Add command line options for parameters of the maze problem, e.g. size, number of red squares, number of green squares, stochasticity of actions.
2) Add text to the display that gives info about the algorithm, e.g. the name of the algorithm, the average reward earned, etc.

One issue is that the random generator occasionally produces a maze that is too simple. For example, if a green square is adjacent to the start position, the optimal policy is learned immediately and nothing interesting happens. It would be nice to either avoid these mazes altogether or allow the user to regenerate mazes without restarting.
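
One possible way to avoid the too-simple mazes mentioned above is to regenerate until no green square touches the start. The sketch below is in Python for brevity and is not taken from the demo's source. The cell codes, the start position at (0, 0), the four-way notion of "adjacent", and all function names are assumptions made for illustration.

```python
import random

# Hypothetical cell codes; the demo's real maze representation is not shown here.
EMPTY, RED, GREEN = 0, 1, 2


def random_maze(size, n_red, n_green, rng):
    """Place red and green squares at random, keeping the start cell (0, 0) empty."""
    cells = [(r, c) for r in range(size) for c in range(size) if (r, c) != (0, 0)]
    chosen = rng.sample(cells, n_red + n_green)
    grid = [[EMPTY] * size for _ in range(size)]
    for i, (r, c) in enumerate(chosen):
        grid[r][c] = RED if i < n_red else GREEN
    return grid


def is_too_simple(grid, start=(0, 0)):
    """True if a green square is directly above, below, left or right of the start."""
    size = len(grid)
    r0, c0 = start
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = r0 + dr, c0 + dc
        if 0 <= r < size and 0 <= c < size and grid[r][c] == GREEN:
            return True
    return False


def generate_interesting_maze(size, n_red, n_green, seed=None, max_tries=1000):
    """Regenerate random mazes until one passes the adjacency check."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        grid = random_maze(size, n_red, n_green, rng)
        if not is_too_simple(grid):
            return grid
    raise RuntimeError("no suitable maze found; try a larger grid or fewer green squares")
```

Calling generate_interesting_maze(6, 4, 3) would return a 6x6 grid with four red and three green squares, none of the green ones next to the start. The same is_too_simple check could also back a "regenerate maze" key, so the user can ask for a new maze without restarting.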