Q-learning implementation for Machine Learning Course

Reinforcement learning

A. About

This is a homework assignment for the Machine Learning course I am currently taking at my university.

The homework simulates a robot in a rectangular grid learning to navigate within a certain area of the grid. The robot has only 4 sensors describing its position as the distances between its position and the walls. The sensors are limited: they can report a maximum distance of D.

The robot must navigate counter-clockwise (in trigonometric orientation) within a small corridor contained between distances d1 and d2 from the nearest wall. In order to learn this, it receives various rewards depending on its position.

The assignment is done in Python.
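
To make the setup concrete, here is a minimal sketch of how such sensor readings could be computed, assuming a grid indexed from 0; the grid size, D, and all names are illustrative and not taken from this repository:

    # A sketch of the sensor model described above: four readings, one per wall,
    # each capped at the maximum range D. The grid size, D and the function name
    # are assumptions for this example, not identifiers from ql.py.
    GRID_W, GRID_H = 20, 10   # assumed grid dimensions
    D = 5                     # assumed maximum sensor range

    def read_sensors(x, y):
        """Return capped distances to the left, right, bottom and top walls."""
        raw = (x, GRID_W - 1 - x, y, GRID_H - 1 - y)
        return tuple(min(d, D) for d in raw)

    print(read_sensors(2, 7))   # -> (2, 5, 5, 2) with the values above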

B. Usage

Run ./ql.py to start the main part of the application. The GUI is fairly simple to use.

The plot shows a red point at the zero level and a green dot for the total reward the robot has accumulated in the current simulation epoch.

Save plots if you want to compare the effect of the parameters, the differences between SARSA and Q-learning, or between ε-greedy and softmax action selection.
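
For reference, the two update rules being compared have their standard textbook form sketched below; the names (Q, s, a, r, alpha, gamma) are generic and are not taken from this repository's code:

    # Q is assumed to be a dict mapping (state, action) pairs to values.
    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Off-policy: bootstrap on the best action available in the next state.
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        # On-policy: bootstrap on the action that was actually chosen next.
        Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])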

You can compare the saved plots by using:

./ql.py cmp file1 file2 ...

where each file is a saved plot. This will output a simple table with all data.

If you want to see a nice plot, use the cmp.sh script:

./cmp.sh out_file file1 file1_title file2 file2_title ...

C. Some implementation details

The robot and the world are implemented as two separate classes. The rewards are given according to the sensor values.

At each position, the robot must choose between going forward, turning left, or turning right. Each selection is based on the rewards received up to that point.
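
A sketch of how such a selection could look with the two strategies mentioned in section B, assuming a Q-table keyed by (state, action); the names are illustrative only:

    import math
    import random

    ACTIONS = ("forward", "left", "right")   # the three moves described above

    def epsilon_greedy(Q, state, epsilon=0.1):
        # With probability epsilon explore, otherwise pick the best-valued move.
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def softmax_choice(Q, state, temperature=1.0):
        # Sample a move with probability proportional to exp(Q / temperature).
        weights = [math.exp(Q[(state, a)] / temperature) for a in ACTIONS]
        return random.choices(ACTIONS, weights=weights, k=1)[0]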

The rewards are given as follows (a code sketch follows the list):

  • -100 if the robot is closer than d1 to a wall or is facing the exterior of the grid
  • -50 if it is farther than d2 from the nearest wall
  • a value between -15 and 5 depending on how close the robot's heading is to the desired orientation
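
A minimal sketch of this scheme, assuming the state is summarised by the distance to the nearest wall, a flag for facing the grid's exterior, and a normalised orientation error in [0, 1]; the function name and the parameter defaults are placeholders, not values from this repository:

    def reward(nearest_wall_dist, facing_exterior, orientation_error, d1=1, d2=3):
        if nearest_wall_dist < d1 or facing_exterior:
            return -100
        if nearest_wall_dist > d2:
            return -50
        # Inside the corridor: scale linearly from 5 (perfectly aligned with the
        # desired counter-clockwise direction) down to -15 (completely misaligned).
        return 5 - 20 * orientation_error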