REINFORCEMENT LEARNING DEMO
This is a short demo of two different reinforcement learning algorithms in a
simple maze world. The red squares give negative reward (-1) and the green
squares give positive reward (+3). The goal of the agents is to find an optimal
policy. The left grid shows a Q-learning agent and the right shows a
SARSA-lambda agent. This demo was put together for a short presentation I did
on reinforcement learning.
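As a rough illustration of how the two agents differ, here is a minimal sketch of the standard tabular update rules. It is not taken from the demo's source: the QTable type, the function names, the choice of four actions, and the use of accumulating eligibility traces are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tabular representation: q[state][action].
// None of these names come from the demo's source.
constexpr std::size_t kNumActions = 4;  // up, down, left, right (assumed)
using QTable = std::vector<std::vector<double>>;

// Largest action value available in a state.
double max_q(const QTable& q, std::size_t state) {
    return *std::max_element(q[state].begin(), q[state].end());
}

// Q-learning (off-policy): bootstrap from the best action in the next state.
void q_learning_update(QTable& q, std::size_t s, std::size_t a, double reward,
                       std::size_t s_next, double alpha, double gamma) {
    assert(s < q.size() && s_next < q.size() && a < kNumActions);
    const double td_error = reward + gamma * max_q(q, s_next) - q[s][a];
    q[s][a] += alpha * td_error;
}

// SARSA(lambda) (on-policy): bootstrap from the action actually chosen next,
// and spread the TD error over recently visited pairs via eligibility traces.
void sarsa_lambda_update(QTable& q, QTable& trace, std::size_t s, std::size_t a,
                         double reward, std::size_t s_next, std::size_t a_next,
                         double alpha, double gamma, double lambda) {
    assert(q.size() == trace.size());
    assert(s < q.size() && s_next < q.size());
    assert(a < kNumActions && a_next < kNumActions);
    const double td_error = reward + gamma * q[s_next][a_next] - q[s][a];
    trace[s][a] += 1.0;  // accumulating trace for the pair just visited
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = 0; j < kNumActions; ++j) {
            q[i][j] += alpha * td_error * trace[i][j];
            trace[i][j] *= gamma * lambda;  // decay every trace
        }
    }
}
```

With this maze's rewards, stepping onto a green square (+3) from a state whose values are all zero moves the Q-learning estimate by alpha * 3, while SARSA(lambda) also nudges every recently visited state-action pair in proportion to its trace.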
BUILDING
There is a SConstruct file, so this can be built using SCons. If you have SCons
installed, simply type
    scons
This demo requires the Boost libraries, SDL, and OpenGL to be installed.
RUNNING
After building, simply run the rl_demo executable. The demo runs in fullscreen
mode. To exit, press Q or Esc.
BUGS
I am aware of none, though I have only tested this on a couple of machines, all
with Ubuntu 10.10 as the OS. If you find bugs, let me know. There are many
additional features that could be added, and certainly improvements can be made.
I wrote this as a simple demo for a single presentation, but have published it
in the hopes that someone else might find it useful. If you do, or have
suggestions for improvements, let me know.
Here are a few improvements I can think of:
1) Add command-line options for the parameters of the maze problem, e.g. size,
number of red squares, number of green squares, stochasticity of actions.
2) Add text to the display that gives info about the algorithm, e.g. the name
of the algorithm, the average reward earned, etc.
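For the first item, one possible shape for the option handling is sketched below. Nothing here exists in the demo today: the MazeOptions struct, its defaults, the flag names (--size, --red, --green, --noise), and the assumption of a square maze are all hypothetical. Since the demo already depends on Boost, Boost.Program_options would be another reasonable route; this sketch sticks to the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical maze parameters; names and defaults are illustrative only.
struct MazeOptions {
    int size = 10;              // side length of an (assumed) square maze
    int red_squares = 5;
    int green_squares = 2;
    double action_noise = 0.1;  // probability that an action goes astray
};

// Number of cells in a square maze; callers must pass a positive size.
int cell_count(const MazeOptions& opts) {
    assert(opts.size > 0);
    return opts.size * opts.size;
}

// Parse "--flag value" pairs into opts. Returns false on an unknown flag,
// a missing or non-numeric value, or a combination that is out of range.
bool parse_options(const std::vector<std::string>& args, MazeOptions& opts) {
    for (std::size_t i = 0; i < args.size(); i += 2) {
        if (i + 1 >= args.size()) return false;  // flag without a value
        const std::string& flag = args[i];
        const std::string& value = args[i + 1];
        try {
            if (flag == "--size") opts.size = std::stoi(value);
            else if (flag == "--red") opts.red_squares = std::stoi(value);
            else if (flag == "--green") opts.green_squares = std::stoi(value);
            else if (flag == "--noise") opts.action_noise = std::stod(value);
            else return false;  // unknown flag
        } catch (const std::exception&) {
            return false;  // std::stoi / std::stod rejected the value
        }
    }
    if (opts.size <= 0 || opts.red_squares < 0 || opts.green_squares < 0) return false;
    if (opts.action_noise < 0.0 || opts.action_noise > 1.0) return false;
    // Leave at least one free cell for the start position.
    return opts.red_squares + opts.green_squares < cell_count(opts);
}
```

From main, the arguments could be collected with std::vector<std::string> args(argv + 1, argv + argc) and passed to parse_options, falling back to the defaults when no flags are given.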
One issue is that the random maze generator occasionally produces a maze that
is too simple. For example, if a green square is adjacent to the start position,
the optimal policy is learned immediately and nothing interesting happens.
It would be nice to either avoid these mazes altogether or allow the user to
regenerate mazes without restarting.
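One way to avoid such mazes would be to reject and regenerate any maze with a green square too close to the start. The sketch below assumes a row-major grid and uses Manhattan distance, ignoring any obstacles. The Maze struct, the Cell enum, and both function names are hypothetical and do not reflect the demo's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical cell types and grid layout; the demo's real types may differ.
enum class Cell { Empty, Red, Green };

struct Maze {
    int width;
    int height;
    std::vector<Cell> cells;  // row-major, width * height entries
    int start_x;
    int start_y;

    Cell at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return cells[static_cast<std::size_t>(y * width + x)];
    }
};

// A maze counts as "too simple" here if any green square lies fewer than
// min_distance Manhattan steps from the start. min_distance = 2 rejects
// exactly the case of a green square adjacent to the start.
bool is_too_simple(const Maze& maze, int min_distance) {
    for (int y = 0; y < maze.height; ++y) {
        for (int x = 0; x < maze.width; ++x) {
            const int distance =
                std::abs(x - maze.start_x) + std::abs(y - maze.start_y);
            if (maze.at(x, y) == Cell::Green && distance < min_distance) {
                return true;
            }
        }
    }
    return false;
}

// Call the generator until it yields an acceptable maze or max_tries is hit.
// The returned maze may still be too simple if every attempt failed.
template <typename Generator>
Maze generate_interesting_maze(Generator&& generate, int min_distance,
                               int max_tries) {
    Maze maze = generate();
    for (int i = 1; i < max_tries && is_too_simple(maze, min_distance); ++i) {
        maze = generate();
    }
    return maze;
}
```

The same generate_interesting_maze call could be bound to a key press to let the user regenerate the maze without restarting.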