
Acer #231 (Open)
wants to merge 49 commits into base: master

Changes from all commits
49 commits
ce4a56c
__init__ file for vec
random-user-x Jun 19, 2018
9fbb6cb
Added functions to __init__. Derived from OpenAI baselines
random-user-x Jul 23, 2018
5ecac5a
small change
random-user-x Jul 23, 2018
930eb8a
Added common folder
random-user-x Jul 23, 2018
46a9b9d
added tile_images
random-user-x Jul 23, 2018
ad5709e
Added basic functions from OpenAI Baselines
random-user-x Jul 23, 2018
878d07a
small changes
random-user-x Jul 23, 2018
fe32696
Added Runners
random-user-x Jul 24, 2018
a92cf39
Simple Example added.
random-user-x Jul 25, 2018
93fd8f0
Episodic Memory added
random-user-x Jul 26, 2018
004ccda
Merge branch 'master' of https://github.com/keras-rl/keras-rl into HEAD
random-user-x Jul 26, 2018
997f846
Merge branch 'addEpisodicMemory' into synchronousMultiAgent
random-user-x Jul 26, 2018
20001a4
Added Softmax Policy
random-user-x Aug 1, 2018
946641c
focus on ACER
random-user-x Aug 4, 2018
38bbf0a
Added without memory version acer
random-user-x Aug 7, 2018
abd489d
Final version working
random-user-x Aug 8, 2018
6a4ea61
remove unnecessary files
random-user-x Aug 8, 2018
fefa373
Example ACER Working without memory and trust region
random-user-x Aug 8, 2018
94ac539
Working without memory and trust region
random-user-x Aug 8, 2018
c4c3e95
Added ACER
random-user-x Aug 8, 2018
162086f
Added Softmax Policy
random-user-x Aug 8, 2018
ad81d3c
Added trust region
random-user-x Aug 8, 2018
234a318
Single agent version working.
random-user-x Aug 8, 2018
adabca7
Removing previous work
random-user-x Aug 8, 2018
11437e7
Minor
random-user-x Aug 8, 2018
3ed73f7
Minor changes
random-user-x Aug 8, 2018
62c52e8
Minor Changes. Remove render for synchronous agents.
random-user-x Aug 14, 2018
f7aa96c
Add reference
random-user-x Aug 14, 2018
92eaaf7
Small changes
random-user-x Aug 14, 2018
c5b26d8
Add seed to multiprocessing
random-user-x Aug 14, 2018
2a351d6
minor
random-user-x Aug 14, 2018
46553ae
Merge branch 'MPI' into ACER
random-user-x Aug 14, 2018
1d64a48
Merge branch 'master' of https://github.com/keras-rl/keras-rl into ACER
random-user-x Aug 14, 2018
598d6c0
Working
random-user-x Aug 15, 2018
731e66c
Final version of acer working.
random-user-x Aug 15, 2018
9571b6e
Example changed.
random-user-x Aug 15, 2018
dca3257
Readme
random-user-x Aug 15, 2018
6d67c93
Make acer folder for better understanding
random-user-x Aug 15, 2018
8594f88
Added test function support to acer
random-user-x Aug 16, 2018
e6dc145
Make testing fast
random-user-x Aug 16, 2018
a43ace2
Test is fast. Things are working
random-user-x Aug 16, 2018
f9f786f
Style changes
random-user-x Aug 16, 2018
3893133
Style changes
random-user-x Aug 16, 2018
3420feb
Revert changes
random-user-x Aug 17, 2018
ea83149
Refactor codes
random-user-x Aug 17, 2018
d1face0
Metric changed to None
random-user-x Aug 17, 2018
c9f44d5
Change in test model definition. Made a new function
random-user-x Aug 17, 2018
2936f1d
Introduce argparse
random-user-x Aug 17, 2018
dd0ce06
add nenvs to callbacks
random-user-x Aug 17, 2018
6 changes: 6 additions & 0 deletions README.md
@@ -34,11 +34,17 @@ As of today, the following algorithms have been implemented:
- [x] Cross-Entropy Method (CEM) [[7]](http://learning.mpi-sws.org/mlss2016/slides/2016-MLSS-RL.pdf), [[8]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.81.6579&rep=rep1&type=pdf)
- [x] Dueling network DQN (Dueling DQN) [[9]](https://arxiv.org/abs/1511.06581)
- [x] Deep SARSA [[10]](http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf)
- [x] Sample Efficient Actor-Critic (ACER) [[5]](https://arxiv.org/abs/1611.01224)
- [ ] Asynchronous Advantage Actor-Critic (A3C) [[5]](http://arxiv.org/abs/1602.01783)
- [ ] Proximal Policy Optimization Algorithms (PPO) [[11]](https://arxiv.org/abs/1707.06347)

You can find more information on each agent in the [doc](http://keras-rl.readthedocs.io/en/latest/agents/overview/).

### Note

The current version of ACER supports simple toy environments; Atari support is planned.
You can create synchronous environments with `make_gym_env` from `cmd_util.py` in the `common` folder.
See `acer_cartpole.py` in the `examples` folder for a complete example.
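For instance, a minimal sketch of setting up the synchronous environments (the values here are only illustrative; the full setup lives in the example script):

```python
from rl.common.cmd_util import make_gym_env

nenvs = 4                                      # number of parallel environments
env = make_gym_env('CartPole-v1', nenvs, 123)  # env id, number of envs, seed
```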

## Installation

112 changes: 112 additions & 0 deletions examples/acer_cartpole.py
@@ -0,0 +1,112 @@
import os
# Force CPU execution (can be faster for this small model).
# Remove these two lines if you want to use the GPU.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import warnings
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

import numpy as np
import gym

from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Activation, Input, ReLU
from keras.optimizers import Adam

from rl.agents import ACERAgent
from rl.agents.acer.episode_memory import EpisodeMemory
from rl.policy import SoftmaxPolicy
from rl.common.cmd_util import make_gym_env
from rl.callbacks import FileLogger, ModelIntervalCheckpoint

# TODO : Add support for atari
# The current implementation supports simple toy games.

ENV_NAME = 'CartPole-v1'

# Define the number of environments and steps
nenvs = 4
nsteps = 50

# make_gym_env creates synchronous environments.
# It only supports actor-critic frameworks.

env = make_gym_env(ENV_NAME, nenvs, 123)
np.random.seed(123)
env.seed(123)

# Action is discrete
nb_actions = env.action_space.n
obs_shape = env.observation_space.shape

# Define the model-building function
def model_fn(inp, name='inputs'):
    inps = Input(tensor=inp, name=name)

    # Define your model here.

    # Note: parameter sharing is not supported yet, so define two separate
    # parallel sub-networks for the actor and critic.

    x_actor = Dense(32, activation='relu')(inps)
    x_actor = Dense(16)(x_actor)
    x_actor = ReLU(max_value=80.)(x_actor)

    x_critic = Dense(32, activation='relu')(inps)
    x_critic = Dense(16, activation='relu')(x_critic)

    # Actor and critic outputs of the model
    actor_output = Dense(nb_actions, activation='softmax')(x_actor)
    critic_output = Dense(nb_actions, activation='linear')(x_critic)

    # Input list to the model
    inputs = [inps]

    # Output list to the model
    outputs = [critic_output, actor_output]

    model = Model(inputs=inputs, outputs=outputs)
    return model, inputs, outputs

# Policy of the actor model.
policy = SoftmaxPolicy()

# Experience memory of the agent
memory = EpisodeMemory(nsteps, 50000)
agent = ACERAgent(memory, model_fn, nb_actions, obs_shape, policy=policy, nenvs=nenvs, nsteps=nsteps)

# Define the optimizer to be used
opt = Adam(lr=0.00005, clipvalue=10.)

# Currently compile() does not support metrics.
agent.compile(opt)

mode = 'train'
if mode == 'train':
    # Okay, now it's time to learn something! We capture the interrupt exception so that
    # training can be prematurely aborted. Notice that you can use the built-in Keras callbacks!
    weights_filename = 'acer_{}_weights.h5f'.format(ENV_NAME)
    checkpoint_weights_filename = 'acer_' + ENV_NAME + '_weights_{step}.h5f'
    log_filename = 'acer_{}_log.json'.format(ENV_NAME)
    callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=1000)]
    callbacks += [FileLogger(log_filename, interval=5000)]
    agent.fit(env, callbacks=callbacks, nb_steps=50000, log_interval=10000)

    # After training is done, we save the final weights one more time.
    agent.save_weights(weights_filename, overwrite=True)

    # Finally, evaluate our algorithm for 10 episodes.
    env = gym.make(ENV_NAME)
    agent.test(env, nb_episodes=10, visualize=False)
elif mode == 'test':
    weights_filename = 'acer_{}_weights.h5f'.format(ENV_NAME)
    # if args.weights:
    #     weights_filename = args.weights
    agent.load_weights(weights_filename)
    env = gym.make(ENV_NAME)
    agent.test(env, nb_episodes=10, visualize=False)
# print (abc.losses)
1 change: 1 addition & 0 deletions rl/agents/__init__.py
@@ -1,4 +1,5 @@
from __future__ import absolute_import
from .acer import ACERAgent
from .dqn import DQNAgent, NAFAgent, ContinuousDQNAgent
from .ddpg import DDPGAgent
from .cem import CEMAgent
1 change: 1 addition & 0 deletions rl/agents/acer/__init__.py
@@ -0,0 +1 @@
from .acer import ACERAgent
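With this one-line `__init__.py` (together with the import added to `rl/agents/__init__.py` above), the agent is importable from the package root, which is how the example script imports it:

```python
from rl.agents import ACERAgent
```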