Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Using Reinforcement Learning to solve mazes of arbitrary sizes. Use the variables mx and my in the Python code to configure the mazes' size in blocks and use img_dx and img_dy to specify the amount of pixels of the output image. Here is an example maze featuring 40x40 blocks in 500x500 pixels:

Maze Sample Image

Main Files

  • Generates a random maze and then uses a Uniform Random Walk Policy and the Iterative Policy Evaluation algorithm to solve the maze. The empty maze and the solution is stored as a result a series of PNG images.
  • Generates a random maze and then uses Q-Learning to solve the maze. This version of the solution also utilizes Python's multi-processing capabilities to speed-up image saving. The results are a numbered series of PNG files. Assumptions to make it more "realistic" than the "godmode" dynamic programming version that you find in the other source

Helper Files

  • Helper class taken from the Udemy RL course: It represents a Grid World consisting basically of two dictionaries: one containing all possible actions at each position of the grid and one containing the rewards used for the RL algorithm.
  • A set of helper functions to generate random mazes in the form of two dimensional lists and to save them as PNG files.

How to Create Animated Mazes

The animated mazes shown at have been generated using The output are many PNG files named mrunner_2_*.png. If you have ImageMagick installed, you can use the following commands to create an Animated GIF:

convert -loop 0 -delay 10 mrunner_2_*.png animated.gif
rm *.png

The more advanced could also be used to generate animated mazes. But by default, it is generating "heatmaps" of the Q-Learning's "learning walk" instead. If you want to generate animated mazes instead, then this line of code has to be changed: Move it inside the for-loop and use i instead of episode as format variable.

You can’t perform that action at this time.