kristjankorjus edited this page Nov 25, 2014 · 59 revisions


Initial ideas

The overall algorithm is written down in the file algorithm.m (in semi-pseudocode).


Linux (or Mac)


  • We use the 32-bit version of Python 2.7
  • Requirements:
    • Pillow (image processing library), NumPy and SciPy
      • sudo apt-get install python-pil python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
    • Theano
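The requirements above can be sanity-checked from Python before going further. The following is a minimal sketch, not part of the repository: the helper names `check_modules` and `python_bits` are hypothetical, and the import names (`PIL`, `numpy`, `scipy`, `theano`) are assumed to be the usual ones for Pillow, NumPy, SciPy and Theano.

```python
import importlib
import struct
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Any failure during import is treated as "not usable".
            status[name] = False
    return status


def python_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s, %d-bit" % (sys.version.split()[0], python_bits()))
    required = ["PIL", "numpy", "scipy", "theano"]
    for name, ok in sorted(check_modules(required).items()):
        print("%s: %s" % (name, "OK" if ok else "MISSING"))
```

Running the script prints one line per requirement; anything reported as MISSING still needs to be installed.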

Atari framework

  • Libraries needed:
    • sudo apt-get install libsdl-gfx1.2-dev libsdl-image1.2-dev libsdl1.2-dev
    • sudo apt-get install imagemagick (might be needed for the .show() function)
  • Installing Arcade Learning Environment (ALE)
    • Run ./ from the root directory of the repository
    • ALE will be compiled in ./libraries/ale
    • The ALE executable is: ./libraries/ale/ale
    • ROMs are stored under ./libraries/ale/roms
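After the build, the locations listed above can be checked with a few lines of Python. This is only a sketch under the assumption that the paths are exactly those given in this section; the function name `ale_install_status` is hypothetical and not part of the repository.

```python
import os


def ale_install_status(repo_root="."):
    """Check the ALE paths named in this wiki, relative to repo_root."""
    ale_dir = os.path.join(repo_root, "libraries", "ale")
    binary = os.path.join(ale_dir, "ale")   # ./libraries/ale/ale
    roms = os.path.join(ale_dir, "roms")    # ./libraries/ale/roms
    return {
        # The executable must exist and be marked executable.
        "binary": os.path.isfile(binary) and os.access(binary, os.X_OK),
        # The ROM directory must exist (it may still be empty).
        "roms": os.path.isdir(roms),
    }


if __name__ == "__main__":
    for item, ok in sorted(ale_install_status().items()):
        print("%s: %s" % (item, "found" if ok else "missing"))
```

Run it from the root directory of the repository; both entries should be reported as found once ALE has compiled.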

Installing cuda-convnet2 (Convnet Branch)

The convnet branch uses the cuda-convnet2 code written by Alex Krizhevsky. To run this code you MUST have an NVIDIA CUDA-capable GPU of compute capability 3.5. The cuda-convnet2 page says the compute capability has to be at least 3.5, but we have found that 5.0 (Maxwell architecture) does not work (NVIDIA is fixing the bug at the time of writing).

To begin with, set up your computer to use the graphics card. Download and install the CUDA Toolkit and the driver for the GPU. We have had success with Toolkit 6.0 ( We strongly advise following the installation instructions (as well as the pre-installation and post-installation steps) in the Getting Started Guide (

Once you have succeeded in running a "hello_world" program in CUDA, you can move on to installing convnet2. Download and install the dependencies and check out the code with git as described in

Change the environment variables in the file in the main directory; normally you only have to change the location of your CUDA installation. Then run "./build.sh" and convnet2 should compile in a few minutes. This is not enough, however: to run our code you will need to tweak the convnet2 code a bit, as described in the following section.

Modifying cuda-convnet2 code (Convnet Branch)

It seems that certain parts of the cuda-convnet2 code are designed to work with images that have either 1 input channel (grayscale) or 3 channels (RGB). We, however, have 4 input channels (the 4 frames). To accommodate our case without getting assertion errors, one needs to change the file at:


In line 2023 replace:
numImgColors <= 3 with numImgColors <= 4

and in line 2059 replace:
if (numFilterColors > 3) with if (numFilterColors > 4)

This should let the system deal with 4 input channels. After making the modifications, recompile the program by running "./" in the "cuda-convnet2/" folder.
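To illustrate where the 4 input channels come from, here is a minimal sketch of keeping the 4 most recent frames together as one network input. It is not the repository's code: the class name `FrameStack` and the oldest-first channel order are assumptions made for the example.

```python
from collections import deque

NUM_FRAMES = 4  # the 4 frames that form the 4 input channels


class FrameStack(object):
    """Keeps only the most recent NUM_FRAMES frames (hypothetical helper)."""

    def __init__(self, num_frames=NUM_FRAMES):
        # A bounded deque drops the oldest frame automatically.
        self.frames = deque(maxlen=num_frames)

    def push(self, frame):
        self.frames.append(frame)

    def is_full(self):
        """True once enough frames have been seen to fill every channel."""
        return len(self.frames) == self.frames.maxlen

    def channels(self):
        """Frames as a list, oldest first; one frame per input channel."""
        return list(self.frames)
```

Each pushed frame can be any object (for example a 2-D grayscale array); the stack is only usable as network input once `is_full()` returns True.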

Comments about the algorithm


  1. We convert the Atari NTSC 128-color palette to RGB using this table. (This might be different in the original paper.)
  2. We DO NOT use the formula 0.21*R + 0.71*G + 0.07*B to convert RGB to grayscale; we use 0.5*R + 0.5*G + 0.5*B instead. (link) (This might be different in the original paper.)
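The two grayscale formulas mentioned above can be compared with a small sketch. The function names are hypothetical and the code is not taken from the repository. Note that the weights 0.5, 0.5, 0.5 sum to 1.5, so the result can exceed 255 for bright pixels; the clipping variant below is an assumption added for illustration, not something the wiki states.

```python
LUMINOSITY_WEIGHTS = (0.21, 0.71, 0.07)  # the formula this wiki says is NOT used
WIKI_WEIGHTS = (0.5, 0.5, 0.5)           # the formula this wiki says IS used


def to_gray(rgb, weights):
    """Weighted sum of the R, G and B components of one pixel."""
    r, g, b = rgb
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def to_gray_clipped(rgb, weights, max_value=255):
    """Same as to_gray, but capped at max_value (assumed, for illustration)."""
    return min(max_value, to_gray(rgb, weights))


if __name__ == "__main__":
    pixel = (200, 120, 40)
    print("luminosity: %.2f" % to_gray(pixel, LUMINOSITY_WEIGHTS))
    print("wiki:       %.2f" % to_gray(pixel, WIKI_WEIGHTS))
```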

Basic tests to the system

  • Issue #7 (make sure that learning changes the weight values at all):
    make sure that learning_rate > 0 and uncomment the last line in "NeuralNet.train()", which prints some parameters after every training event.
  • Issue #10 (make sure that different inputs give different outputs):
    the easiest way to check this (without writing a specific function) is to set learning_rate = 0.0, so that the network weights do not change (you can verify that as in Issue #7), and uncomment the line print "estimated q", estimated_Q in "NeuralNet.train()". Starting from the second frame, when we already have two different images to compare, the Q-values estimated during minibatch training should take more than one distinct value (at the second frame they have 2 possible values, at the third frame 3 possible values).
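The two checks above could also be automated instead of read off printed output. The sketch below shows the idea on a toy one-output linear model that stands in for "NeuralNet.train()"; every name in it (`weights_changed`, `count_distinct`, `toy_train_step`) is hypothetical, and the real network would have to expose its weights and estimated Q-values in some comparable form.

```python
def weights_changed(before, after, tol=0.0):
    """Issue #7 style check: did any weight move by more than tol?"""
    return any(abs(a - b) > tol for a, b in zip(before, after))


def count_distinct(q_rows, ndigits=6):
    """Issue #10 style check: number of distinct Q-value rows in a minibatch."""
    return len(set(tuple(round(q, ndigits) for q in row) for row in q_rows))


def toy_train_step(weights, inputs, target, learning_rate):
    """One squared-error gradient step for a linear model (toy stand-in)."""
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = prediction - target
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]


if __name__ == "__main__":
    w = [0.1, -0.2]
    # learning_rate > 0: the weights should move.
    print(weights_changed(w, toy_train_step(w, [1.0, 2.0], 1.0, 0.1)))
    # learning_rate = 0.0: the weights should stay put.
    print(weights_changed(w, toy_train_step(w, [1.0, 2.0], 1.0, 0.0)))
```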

Possible differences from the original article

  • Preprocessing: color to grayscale
  • Gradient descent learning rate
  • Gradient descent regularisation: none, L1, L2 or both
  • Momentum in RMSProp
  • Initialization of the network: weights (mean, std) + biases
  • What to do with initial frames which do not have previous memory?
  • Death has no penalty?
  • Implementation of the error: we have zeros in all non-taken actions
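The last point, zeros in all non-taken actions, can be sketched as follows. This is an illustration only: it assumes the standard Q-learning target (reward plus discounted maximum next Q-value), assumes the target is just the reward on a terminal transition, and uses a hypothetical function name and sign convention (target minus estimate). None of these details are confirmed by this wiki.

```python
def q_error_vector(estimated_q, action, reward, next_q, gamma, terminal=False):
    """Error per action: non-zero only for the action that was taken."""
    if terminal:
        target = reward                      # assumed: no bootstrap at episode end
    else:
        target = reward + gamma * max(next_q)  # assumed standard Q-learning target
    errors = [0.0] * len(estimated_q)        # zeros for all non-taken actions
    errors[action] = target - estimated_q[action]
    return errors


if __name__ == "__main__":
    # Three actions, action 1 was taken; gamma = 0.5 is an arbitrary example value.
    print(q_error_vector([1.0, 2.0, 3.0], 1, 1.0, [0.0, 4.0, 2.0], 0.5))
```

With this layout only the output unit of the taken action receives a gradient signal; the other outputs are left untouched by that training sample.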

Random comments
