Human-level control through deep reinforcement learning

This repository implements the notable paper: Human-level control through deep reinforcement learning.

The paper is widely known for a video clip in which the trained agent outplays human players by a large margin. It uses a deep neural network, known as a Deep Q-Network (DQN), to map complex visual input to optimal actions.
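As a rough illustration of how a Deep Q-Network turns its outputs into actions and learning targets, the sketch below shows epsilon-greedy action selection and the one-step Q-learning target in plain Python. The function names `select_action` and `td_target` are hypothetical and are not taken from this repository, and the Q-values are passed in as plain lists rather than produced by a network. The discount factor of 0.99 follows the paper; whether this repository uses the same value is an assumption.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: pick a random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full agent, `q_values` and `next_q_values` would come from the network's forward pass on stacked game frames, and the target would feed the loss reported in the training logs.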

Features

Uses TensorFlow 2 with performance optimizations
Simple structure
Easy to reproduce

Model Structure

Requirements

The default running environment is assumed to be CPU-only. If you want to run this repo on a GPU machine, replace tensorflow with tensorflow-gpu in the package list.

How to install

`virtualenv`

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

How to run

You can run an Atari 2600 game with main.py. The environment must be a NoFrameskip environment from the gym package.

$ python main.py --help
usage: main.py [-h] [--env ENV] [--train] [--play PLAY]
               [--log_interval LOG_INTERVAL]
               [--save_weight_interval SAVE_WEIGHT_INTERVAL]

Atari: DQN
optional arguments:
  -h, --help            show this help message and exit
  --env ENV             Should be NoFrameskip environment
  --train               Train agent with given environment
  --play PLAY           Play with a given weight directory
  --log_interval LOG_INTERVAL
                        Interval of logging stdout
  --save_weight_interval SAVE_WEIGHT_INTERVAL
                        Interval of saving weights
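For readers who want to see how such a command-line interface could be declared, here is a minimal argparse sketch that mirrors the help text above. It is not the repository's actual main.py: the function name `build_parser` and all default values are assumptions made for illustration, while the flag names and help strings are copied from the usage output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the help text above (defaults are assumed, not confirmed)."""
    parser = argparse.ArgumentParser(prog="main.py", description="Atari: DQN")
    parser.add_argument("--env", default="BreakoutNoFrameskip-v4",  # assumed default
                        help="Should be NoFrameskip environment")
    parser.add_argument("--train", action="store_true",
                        help="Train agent with given environment")
    parser.add_argument("--play", default=None,
                        help="Play with a given weight directory")
    parser.add_argument("--log_interval", type=int, default=100,  # assumed default
                        help="Interval of logging stdout")
    parser.add_argument("--save_weight_interval", type=int, default=1000,  # assumed default
                        help="Interval of saving weights")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using `action="store_true"` for `--train` matches the usage line, where `--train` takes no value while `--play` expects a path.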

Example 1: Train BreakoutNoFrameskip-v4

$ python main.py --env BreakoutNoFrameskip-v4 --train

Example 2: Play PongNoFrameskip-v4 with trained weights

$ python main.py --env PongNoFrameskip-v4 --play ./log/[LOGDIR]/weights

Example 3: Control log & save interval

$ python main.py --env BreakoutNoFrameskip-v4 --train --log_interval 100 --save_weight_interval 1000

Results

This implementation has been verified to work well on Atlantis, Boxing, Breakout, and Pong. TensorBoard summaries are located in ./archive. TensorBoard shows the following information:

Average Q value
Epsilon (for exploration)
Latest 100 avg reward (clipped)
Loss
Reward (clipped)
Test score
Total frames

$ tensorboard --logdir=./archive/
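Several of the logged quantities above are simple to compute. The sketch below shows one plausible way to produce the clipped reward, the latest-100 average, and a linearly annealed epsilon. The names `clip_reward`, `RunningAverage`, and `linear_epsilon` are hypothetical and do not come from this repository. The clipping range and the schedule (1.0 to 0.1 over the first million frames) follow the paper; that the repository uses the same values, and that the average is taken over the last 100 episodes, are assumptions.

```python
from collections import deque


def clip_reward(reward: float) -> float:
    """Clip a raw game reward to the range [-1, 1]."""
    return max(-1.0, min(1.0, float(reward)))


class RunningAverage:
    """Average over the most recent `window` values (older values are dropped)."""

    def __init__(self, window: int = 100):
        self.values = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record a value and return the current windowed average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


def linear_epsilon(frame: int, start: float = 1.0, end: float = 0.1,
                   decay_frames: int = 1_000_000) -> float:
    """Anneal epsilon linearly from `start` to `end`, then hold it at `end`."""
    fraction = min(frame / decay_frames, 1.0)
    return end + (1.0 - fraction) * (start - end)
```

Clipping keeps the scale of the learning signal comparable across games with very different score ranges, which is why the clipped reward is logged separately from the test score.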

A single RTX 2080 Ti was used for the results below. (Thanks to @JKeun for providing his computation resources.)

Atlantis

Boxing

Breakout

Pong

BibTeX

@article{mnih2015humanlevel,
  added-at = {2015-08-26T14:46:40.000+0200},
  author = {Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A. and Veness, Joel and Bellemare, Marc G. and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K. and Ostrovski, Georg and Petersen, Stig and Beattie, Charles and Sadik, Amir and Antonoglou, Ioannis and King, Helen and Kumaran, Dharshan and Wierstra, Daan and Legg, Shane and Hassabis, Demis},
  biburl = {https://www.bibsonomy.org/bibtex/2fb15f4471c81dc2b9edf2304cb2f7083/hotho},
  description = {Human-level control through deep reinforcement learning - nature14236.pdf},
  interhash = {eac59980357d99db87b341b61ef6645f},
  intrahash = {fb15f4471c81dc2b9edf2304cb2f7083},
  issn = {00280836},
  journal = {Nature},
  keywords = {deep learning toread},
  month = feb,
  number = 7540,
  pages = {529--533},
  publisher = {Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.},
  timestamp = {2015-08-26T14:46:40.000+0200},
  title = {Human-level control through deep reinforcement learning},
  url = {http://dx.doi.org/10.1038/nature14236},
  volume = 518,
  year = 2015
}

Author

Jihoon Kim (@jihoonerd)
