# The Maze Environment for Search #
The Maze Environment is a 2-D grid world embedded in a 3-D simulation. Grid locations
correspond to potential intersections in the maze; the maze is
randomly generated, contains no loops, and has a single shortest path to the goal.
Various search and learning agents are implemented in the Maze
environment, as well as a first-person search setting.
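For reference, one common way to build such a maze is to carve a random spanning tree over the grid: by construction, a spanning tree leaves exactly one path between any two cells and therefore no loops. The sketch below illustrates the idea in Python (randomized depth-first carving); it shows the general technique and is not necessarily OpenNERO's actual generator.

```python
import random

def generate_maze(rows, cols, seed=None):
    """Carve a random spanning tree over a rows x cols grid (randomized DFS)."""
    rng = random.Random(seed)
    open_walls = set()   # pairs of adjacent cells whose shared wall was removed
    visited = set()

    def neighbors(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)

    def carve(cell):
        visited.add(cell)
        nbrs = list(neighbors(cell))
        rng.shuffle(nbrs)
        for nxt in nbrs:
            if nxt not in visited:
                open_walls.add(frozenset((cell, nxt)))  # knock down the wall
                carve(nxt)

    carve((0, 0))  # start carving from the lower-left corner
    return open_walls

maze = generate_maze(8, 8, seed=42)  # 8x8 matches the coarse maze resolution
```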
## What the display means ##
### Search experiments ###
- Red cube - goal position (opposite the starting position)
- Yellow marker - next location the agent is going to
- Blue marker - past locations the agent has already expanded, i.e. whose successors have already been generated
- Green markers - generated (but not yet expanded) locations the agent may return to
- White markers - found path
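In terms of the underlying algorithm, the blue markers correspond to the closed set of expanded nodes, the green markers to the open set (the frontier), and the white markers to the path reconstructed from parent pointers once the goal is found. The breadth-first sketch below labels these sets explicitly; the function and variable names are for illustration only and are not OpenNERO's API.

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    frontier = deque([start])      # green markers: generated, not yet expanded
    parent = {start: None}
    expanded = set()               # blue markers: already expanded
    while frontier:
        cell = frontier.popleft()  # yellow marker: the next location to visit
        if cell == goal:
            path = []              # white markers: the found path
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return list(reversed(path))
        expanded.add(cell)
        for nxt in successors(cell):
            if nxt not in parent:  # generate each location at most once
                parent[nxt] = cell
                frontier.append(nxt)
    return None  # no path found
```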
### Reinforcement learning experiments ###
- Yellow/Green cube - location of a (discrete) state. Green cubes represent states which have a non-zero Q-value. When a yellow cube turns green, it means the agent knows of a path from that location to the goal.
- Blue cube - Q-value of an action in that state: the direction from the yellow cube identifies the direction of the action (N, S, E, or W), and the distance from the yellow cube corresponds to the Q-value itself (the further away the blue cube, the higher the corresponding Q-value)
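As a rough illustration of this display, the sketch below places one cube per action in that action's direction, at a distance proportional to its Q-value. The function name and scaling rule are assumptions made for illustration, not OpenNERO's actual drawing code.

```python
DIRECTIONS = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}

def q_marker_offsets(q_values, max_offset=1.0):
    """Map each action's Q-value to a (dx, dy) offset for its blue cube."""
    top = max(q_values.values())
    scale_base = top if top > 0 else 1.0   # all-zero Q-values keep the cubes at the center
    offsets = {}
    for action, q in q_values.items():
        dx, dy = DIRECTIONS[action]
        scale = max_offset * (q / scale_base)   # a further cube means a higher Q-value
        offsets[action] = (dx * scale, dy * scale)
    return offsets

print(q_marker_offsets({'N': 0.0, 'S': 0.2, 'E': 0.8, 'W': 0.4}))
```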
## Controls ##
The controls can be easily redefined, but in general, the following keys should work:
- F1 - help (opens the browser to show this page)
- A - move camera left
- D - move camera right
- W - move camera forward
- S - move camera back
- Q - pan camera left
- E - pan camera right
- R - tilt camera up
- F - tilt camera down
- space bar - recenter camera to origin
- ESC - exit the currently running mod
- Mouse Scroll - zoom in or zoom out
- Z - zoom in
- C - zoom out
## User Interface ##
The pull-down menu lists the different types of agents available for the Maze:
### Search experiments ###
- Depth First Search - starts the depth first search agent
- Breadth First Search - starts the breadth first search agent
A* search with three different types of visualizations (see the A* sketch after this list):
- Single Agent A* Search - the agent has to navigate through the maze both to make progress and to back-track in order to move on
- Teleporting A* Search - the agent can search for solutions faster by teleporting to the next open node instead of having to backtrack
- Front A* Search - the agent can spawn several new agents when faced with different alternatives; these agents mark the front of the search
- First Person Control - use the arrow keys to try to solve the maze yourself!
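All three A* variants above visualize the same underlying search: best-first expansion ordered by f(n) = g(n) + h(n), where g is the number of steps taken and h is the Manhattan distance to the goal. The sketch below shows that search on grid coordinates; it is illustrative and not OpenNERO's implementation.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, successors):
    open_heap = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    parent = {start: None}
    best_g = {start: 0}
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # reconstruct the path back to the start
                path.append(cell)
                cell = parent[cell]
            return list(reversed(path))
        for nxt in successors(cell):
            new_g = g + 1            # each move between adjacent cells costs 1
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + manhattan(nxt, goal), new_g, nxt))
    return None  # the goal is unreachable
```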
### Reinforcement learning experiments ###
- Q-Learning, coarse and fine - the agent learns from a reinforcement signal using the off-policy Q-learning algorithm (tabular, with no function approximation; the coarse version uses an 8x8 location table and the fine version a 64x64 one). See the Q-learning sketch after this list.
- First Person Control, coarse and fine - use the arrow keys to try to solve the maze yourself! The coarse version corresponds to the search agents and the coarse Q-learning agent, and the fine version corresponds to the fine Q-learning agent.
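For reference, the tabular Q-learning update is Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') - Q(s,a)). The sketch below applies it to a coarse 8x8 state grid with an epsilon-greedy behavior policy (which is what the Exploit/Explore slider adjusts); the environment interface (reset/step) and parameter values are assumptions made for illustration, not OpenNERO's actual API.

```python
import random

ACTIONS = ['N', 'S', 'E', 'W']

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = {}  # (state, action) -> value; missing entries are treated as 0.0

    def greedy(state):
        return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy behavior policy: explore with probability epsilon
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy(state)
            next_state, reward, done = env.step(action)
            # off-policy target: reward plus the discounted greedy value of the next state
            target = reward if done else reward + gamma * q.get((next_state, greedy(next_state)), 0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return q
```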
## Additional Controls ##
The control panel includes additional controls:
- The Exploit/Explore Slider - this is applicable only to the learning methods such as Sarsa and Q-Learning. Because these methods start out knowing nothing about the best actions and learn from experience, they face an exploration-exploitation trade-off during learning: they have to decide how much of the time to do the best thing they currently know how to do (exploit) and how much of the time to seek new experience (explore). The exploit/explore slider lets you make this decision for them - slide it to the right to encourage exploration, and slide it to the left to see what the best learned policy so far looks like. The slider will appear as soon as you start running Q-learning.
- The Speedup Slider - this slider controls another trade-off: between displaying the simulation slowly enough to see the robot's animations and movements from cell to cell, and running it as quickly as the computer running OpenNERO can handle. The Speedup slider is particularly useful when running the learning agents, because they may require a large amount of experience before finding the optimal path through the maze. To progress through the learning faster, slide the Speedup slider to the right.
- The Starting Offset Slider - By default, the search agent starts at the lower left corner of the maze. Using this slider, you can have the agent start at a location that is closer to the target. For example, if you set the starting offset to 8, the agent will start at a random cell whose Manhattan distance from the lower left corner is equal to 8 (see the sampling sketch after this list).
- Generate New Maze Button - this button allows you to mix things up by generating a new random maze. Some mazes take longer than others, and some are more suited to particular search techniques.
- Pause/Continue - Pause will temporarily suspend the execution of the algorithm; the button changes to Continue and hitting it will resume execution.
- Start/Reset - Start will begin running the selected algorithm; the button changes to Reset and hitting it will terminate the algorithm.
- Help - will get you to this page.
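As a concrete illustration of how a starting cell at a given offset could be chosen (an assumption about the mechanism, not OpenNERO's actual code), the sketch below enumerates all cells whose Manhattan distance from the lower left corner equals the offset and picks one uniformly at random.

```python
import random

def random_start(offset, rows=8, cols=8, seed=None):
    """Pick a random cell whose Manhattan distance from (0, 0) equals offset."""
    rng = random.Random(seed)
    candidates = [(r, c) for r in range(rows) for c in range(cols) if r + c == offset]
    if not candidates:
        raise ValueError("offset is too large for this maze")
    return rng.choice(candidates)

print(random_start(8))  # e.g. (3, 5), since 3 + 5 == 8
```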