
Functional Approximation based Q-learning agent for playing the Enduro Atari car game, part of my MSc in Machine Learning and Neuroinformatics (Python 2.7)


jameswilsenach/RL_Enduro


Notes:

This repository is an archive of my Python 2.7 implementation of a Q-learning agent for the Enduro Atari car game. In this assignment, students were required to implement a learning rule and a linear function approximation to the Q-function. The notebook Functional_Approximation_QAgent_Plots.ipynb shows the per-episode training outcomes of the linear approximator. In general, learning was highly stochastic, and more effective strategies that cover more of the parameter space should be employed, such as manifold learning or possibly an adaptation of AlphaGo's self-play approach, racing the car against multiple copies of itself.
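The learning rule mentioned above can be sketched as a standard Q-learning update applied to a linear approximator. This is a minimal illustration of the technique, not the repository's actual implementation; all names and hyperparameter values here are illustrative.

```python
import numpy as np

def q_value(weights, features):
    """Linear approximation: Q(s, a) is the dot product of a per-action
    weight vector with the state feature vector phi(s)."""
    return float(np.dot(weights, features))

def q_update(weights, features, reward, next_q_max, alpha=0.01, gamma=0.9):
    """One gradient step of the Q-learning rule for a linear approximator:
    w <- w + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)."""
    td_error = reward + gamma * next_q_max - q_value(weights, features)
    return weights + alpha * td_error * features

# Toy usage with a 3-dimensional feature vector
w = np.zeros(3)
phi = np.array([1.0, 0.5, -0.2])
w = q_update(w, phi, reward=1.0, next_q_max=0.0)
```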

Reinforcement Learning: Coursework

This repository contains the same agent interface for playing the Enduro game as the one used in coursework 1 (an updated version can be found here); however, the sensing capabilities of the agent have been extended. Instead of just sensing the environment grid, the agent can now sense the road and the other cars in pixel coordinates, as well as its own speed. The main difference is in the sense function, which now has the following prototype:

def sense(self, road, cars, speed, grid)

These new sensory signals will help you quickly construct more complex state spaces, compared to the ones based only on the environment grid, which you may need for the function approximation based agent.
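One way to use these signals is to flatten them into a feature vector inside sense. The sketch below is purely illustrative (the feature choices and the 160-pixel fallback are assumptions, not part of the coursework interface):

```python
def build_features(road, cars, speed, grid):
    """Sketch of turning the extended sensory signals into a feature
    vector for a linear Q-function. Feature choices are illustrative."""
    x, y, w, h = cars['self']
    agent_centre = x + w / 2.0
    # Horizontal distance to the nearest opponent, or a placeholder
    # roughly equal to the frame width when no opponents are visible.
    if cars['others']:
        nearest = min(abs((ox + ow / 2.0) - agent_centre)
                      for ox, oy, ow, oh in cars['others'])
    else:
        nearest = 160.0
    return [speed / 50.0, agent_centre, nearest]

# Toy usage with made-up rectangles
cars = {'self': (80, 150, 8, 10), 'others': [(60, 100, 8, 10)]}
features = build_features(None, cars, 25, None)
```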

Road grid

The road grid is a 2-dimensional array which contains [x, y] points in pixel coordinates corresponding to the corners of the road cells used to construct the environment grid. These are the coordinates used to draw the white grid on top of the road in the game frames. There are 11x10 cells in the environment grid, so there are 12x11 points stored in the road grid. The first dimension of the road grid corresponds to the horizontal lines, while the second dimension corresponds to the intersection points along a horizontal line. Thus, to access the pixel coordinates of the top-left corner of the furthest, leftmost road cell, you would access road[0][0].
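The indexing described above can be checked with a mock road grid of the documented 12x11 shape (the coordinate values here are made up; the real grid comes from the game frame):

```python
# Mock road grid: 12 horizontal lines, 11 intersection points per line,
# each point an [x, y] pair in pixel coordinates (values are illustrative).
road = [[[col * 10, row * 15] for col in range(11)] for row in range(12)]

# 11x10 cells imply 12x11 corner points.
assert len(road) == 12 and len(road[0]) == 11

# Top-left corner of the furthest, leftmost road cell.
top_left = road[0][0]
```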

Cars

The cars argument is a dictionary which contains two keys 'self' and 'others'. cars['self'] returns a rectangle as a tuple (x, y, w, h) which represents the agent location and size in the game frame. x, y are the top-left corner pixel coordinates of the rectangle and its size is w, h. cars['self'] is visualised as the green rectangle overlayed on the game frame. cars['others'] is a list of tuples which countains the same information for each opponent present. If there are no opponents on the road, then cars['others'] is an empty list. The information in cars['others'] is visualised as red rectangles around the opponent.
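A common use of these rectangles is to test which opponents are in the agent's lane. This sketch uses a simple axis-aligned overlap test on the x axis; the helper name and the example coordinates are illustrative:

```python
def horizontal_overlap(self_rect, other_rect):
    """True if two (x, y, w, h) rectangles overlap along the x axis."""
    x1, _, w1, _ = self_rect
    x2, _, w2, _ = other_rect
    return x1 < x2 + w2 and x2 < x1 + w1

# Toy usage: one opponent directly ahead, overlapping horizontally.
cars = {'self': (80, 150, 10, 12), 'others': [(78, 100, 10, 12)]}
ahead = [o for o in cars['others'] if horizontal_overlap(cars['self'], o)]
```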

Speed

This is a single scalar in the range [-50, 50] representing the speed of the agent relative to the opponents: -50 means the agent has just collided, and 50 means the agent is moving as fast as possible.
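For use as an input to a function approximator, the raw speed is often rescaled to a unit range. A one-line sketch (the normalisation choice is an assumption, not part of the interface):

```python
def normalise_speed(speed):
    """Map the relative speed from [-50, 50] to [0, 1]."""
    return (speed + 50.0) / 100.0
```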

Environment grid

The grid argument is the same environment grid that you have already used during the first coursework.
