retro-agents

Overview

This is a repository containing agents used for OpenAI's Retro Contest. The competition was to create the best agent for Sonic the Hedgehog games using only 1 million timesteps and 12 hours of time on a VM with 6 E5-2690v3 cores, 56GB of RAM, and a single K80 GPU.

Below is a video of my agent on a custom stage that finished training. You can see that the agent found a glitch in the stage and used it to its favor.

Ideas

For a more detailed discussion of ideas used, please check the writeup.

Tested Ideas

These are ideas that have been implemented and tested. You can read the results here.

Sampling

Dual Sampling: Some uniformly, Some with priority

Collection

Threshold with Buffer TD-error Average
Threshold with Overall TD-error Average
Threshold with Average of Buffer and Overall TD-error Average
Threshold with Buffer TD-error Exponential Average
Stochastic Threshold with Buffer TD-error Average
Threshold with Minimum TD-error in Buffer
Stochastic Threshold with Maximum TD-error
Stochastic Threshold with (Maximum - Minimum) TD-error

Deletion

Minimum TD-error Deletion
Stochastic TD-error Deletion
TD-error Delta Deletion
Stochastic Deletion with TD-error deltas

Miscellaneous

Rainbow DQN without Dueling
Double Replay Memory: 1 Short Term (Episodic) Replay Memory, 1 Long Term (Lasting) Replay Memory
No Minimum Buffer Size

Untested Ideas

These are ideas I could not test because there was not enough time to implement.

Sampling

Double Sampling: Two filters for sampling
URB variants of tested ideas (for comparison)
Variants of TD-error prioritization shown in Prioritized Experience Replay paper

Collection

Threshold with Decaying Buffer TD-error Average
Overall TD-error Exponential Average
Non TD-error prioritization shown in Prioritized Experience Replay paper

Deletion

Uniform Deletion
Staleness penalty for Deletion
No deletion: Flush Replay Buffer periodically
Sigmoid Stochastic Deletion with TD-error deltas

Miscellaneous

Encode states and compute similarities between states in replay memory
Double Replay Memory: 2 Long Term Replay Memory
Using JERK to fill replay memory of Rainbow DQN
Periodic Recalculation of all errors in replay memory
Hyperparameter Tuning
Vary hyperparameters for parts of training phase
Cyclic NoisyNet: Add additional noise periodically
Subtract time from reward in early training phase to penalize wasting timesteps

Name		Name	Last commit message	Last commit date
Latest commit History 265 Commits
agents		agents
results		results
.gitignore		.gitignore
README.md		README.md
video.gif		video.gif
video.mp4		video.mp4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

retro-agents

Overview

Ideas

Tested Ideas

Sampling

Collection

Deletion

Miscellaneous

Untested Ideas

Sampling

Collection

Deletion

Miscellaneous

About

Releases

Packages

Languages

seungjaeryanlee/retro-agents

Folders and files

Latest commit

History

Repository files navigation

retro-agents

Overview

Ideas

Tested Ideas

Sampling

Collection

Deletion

Miscellaneous

Untested Ideas

Sampling

Collection

Deletion

Miscellaneous

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages