Some footage of the self-driving car agent.
The agent was trained by sampling from an experience replay buffer, which it populated by running simulations.
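As a minimal sketch of such a buffer, assuming uniform sampling of (state, action, reward, next state, done) tuples; the class and parameter names here are illustrative, not taken from this project's code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        # Oldest experiences are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly sample a minibatch for a Q-learning update.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```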
In particular, to fill the replay buffer with a variety of experiences, the exploration rate was linearly annealed from
As input, the agent received a
There was a total upper-bound reward of
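As a rough illustration of the linear exploration schedule described above; the start value, end value, and annealing horizon below are placeholders, since the exact numbers are omitted here:

```python
def epsilon(step, eps_start=1.0, eps_end=0.05, anneal_steps=50_000):
    """Linearly anneal the exploration rate from eps_start to eps_end."""
    fraction = min(step / anneal_steps, 1.0)  # clamp at 1 after the horizon
    return eps_start + fraction * (eps_end - eps_start)
```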
Some modifications included:

- changing the number of sensors (5, 11, or 21);
- changing the length of the sensors (100, 150, 200, or 300);
- changing the steering scheme (steering the car by a continuous angle, versus moving a fixed distance left, right, or up, versus moving continuously left, right, or up);
- changing the depth of the network approximating the Q-function.

Moreover, elements from imitation learning were used: human "expert" experience was recorded, stored as an array, and added to the experience replay buffer. Both replay buffer spiking (adding the human experience into the buffer and then filling the rest with the agent's own experiences; see the sketch below) and pre-training on the human experience (so that the Q-values update in accordance with it) were tried.
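Replay buffer spiking can be sketched roughly as follows, reusing the `ReplayBuffer` from the earlier sketch; the file name `human_demos.npy` and the tuple format are assumptions, not taken from the project:

```python
import numpy as np

def spike_replay_buffer(buffer, demos_path="human_demos.npy"):
    """Seed the replay buffer with recorded human transitions before training.

    demos_path points to a hypothetical array of (state, action, reward,
    next_state, done) tuples saved from the expert driving sessions.
    """
    demos = np.load(demos_path, allow_pickle=True)
    for state, action, reward, next_state, done in demos:
        buffer.add(state, action, reward, next_state, done)
    # The rest of the buffer is then filled by the agent's own rollouts.
```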
Finally, a greedy agent hard-coded with if/else rules was used as a baseline.
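A baseline of this kind might look like the following sketch; the three-sensor layout, the distance threshold, and the action names are illustrative assumptions:

```python
def greedy_baseline_action(sensor_readings, threshold=50):
    """Hard-coded baseline: steer away from the nearest obstacle.

    Assumes sensor_readings = [left, front, right] distances; the exact
    sensor layout and threshold are illustrative, not from the project.
    """
    left, front, right = sensor_readings
    if front < threshold:
        # Obstacle ahead: turn toward the more open side.
        return "LEFT" if left > right else "RIGHT"
    return "FORWARD"  # clear ahead: keep moving
```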
Code references: