A proxy app for EXARL
- Python 3.7
- Additional python packages are defined in the setup.py
- This document assumes you are running at the top directory
├── setup.py : Python setup file with requirements files
├── config : folder containing configurations
└── agent_cfg : agent configuration folder
└── model_cfg : model configuration folder
└── env_cfg : env configuration folder
└── workflow_cfg : workflow configuration folder
└── learner_cfg.json : learner configuration
├── exarl : folder with EXARL code
└── __init__.py : make base classes visible
├── base : folder containing EXARL base classes
└── __init__.py : make base classes visible
└── agent_base.py : agent base class
└── env_base.py : environment base class
└── workflow_base.py : workflow base class
└── learner_base.py : learner base class
├── driver : folder containing RL MPI steering scripts
└── driver.py : Run scipt
├── candlelib : folder containing library for CANDLE functionality
├── agents : folder containing EXARL agents and registration scripts
└── __init__.py : agent registry
└── registration.py : script to handle registration
├── agent_vault : folder containing agents
└── __init__.py : script to make agents visible
└── <RLagent>.py : RL agents (such as DQN, DDPG, etc.)
├── envs : folder containing EXARL environments
└── __init__.py : environment registry
├── env_vault : folder containing environments
└── __init__.py : script to make environments visible
└── <RLenv>.py : RL environments (physics simulations, interfaces to experiments, etc.)
├── workflows : folder containing EXARL workflows and registration scripts
└── __init__.py : workflow registry
└── registration.py : script to handle registration
├── workflow_vault : folder containing workflows
└── __init__.py : script to make workflows visible
└── <RLworkflow>.py : RL workflows (such as SEED, IMPALA, etc.)
├── utils : folder containing utilities
└── __init__.py : make classes and functions visible
└── candleDriver.py : Supporting CANDLE script
└── analyze_reward.py : script for plotting results
└── log.py : central place to set logging levels
└── profile.py : provides function decorators for profiling, timing, and debugging
- Pull code from repo
git clone --recursive https://github.com/lanl/miniRL.git
cd miniRL
- Install dependencies for miniRL:
pip install -e . --user
Configuration files such as exarl/config/learner_cfg.json are searched for in the
following directories:
- (current working directory)/exarl/config
- ~/.exarl/config
- (site-packages dir)/exarl/config
If you would like to run miniRL from outside the source directory, you may install the config files with exarl or copy them into miniRL's config directory in your home directory like so:
$ mkdir -p ~/.exarl/config
$ cd EXARL
$ cp config/* ~/.exarl/configCANDLE functionality is built into miniRL
- Add/modify the learner parameters in
miniRL/exarl/config/learner_cfg.json
E.g.:-
{
"agent": "DQN-v0",
"env": "ExaCartPoleStatic-v0",
"workflow": "async",
"n_episodes": 10,
"n_steps": 10,
"output_dir": "./exa_results_dir"
}
- Add/modify the agent parameters in
miniRL/exarl/config/agent_cfg/<AgentName>.json
E.g.:-
{
"gamma": 0.75,
"epsilon": 1.0,
"epsilon_min" : 0.01,
"epsilon_decay" : 0.999,
"learning_rate" : 0.001,
"batch_size" : 5,
"tau" : 0.5
}
Currently, DQN agent takes either MLP or LSTM as model_type.
- Add/modify the model parameters in
miniRL/exarl/config/model_cfg/<ModelName>.json
E.g.:-
{
"dense" : [64, 128],
"activation" : "relu",
"optimizer" : "adam",
"out_activation" : "linear",
"loss" : "mse"
}
- Add/modify the environment parameters in
miniRL/exarl/config/env_cfg/<EnvName>.json
E.g.:-
{
"worker_app": "./exarl/envs/env_vault/cpi.py"
}
- Add/modify the workflow parameters in
miniRL/exarl/config/workflow_cfg/<WorkflowName>.json
E.g.:-
{
"process_per_env": "1"
}
- Please note the agent, model, environment, and workflow configuration file (json file) name must match the agent, model, environment, and workflow ID specified in
miniRL/exarl/config/learner_cfg.json.
E.g.:-miniRL/exarl/config/agent_cfg/DQN-v0.json,miniRL/exarl/config/model_cfg/MLP.json,miniRL/exarl/config/env_cfg/ExaCartPole-v1.json, andminiRL/exarl/config/workflow_cfg/async.json
- Existing environment can be paired with an available agent
- The following script is provided for convenience:
miniRL/exarl/driver/__main__.py
from mpi4py import MPI
import utils.analyze_reward as ar
import time
import exarl as erl
import mpi4py.rc
mpi4py.rc.threads = False
mpi4py.rc.recv_mprobe = False
# MPI communicator
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
# Get run parameters using CANDLE
# run_params = initialize_parameters()
# Create learner object and run
exa_learner = erl.ExaLearner(comm)
# Run the learner, measure time
start = time.time()
exa_learner.run()
elapse = time.time() - start
# Compute and print average time
max_elapse = comm.reduce(elapse, op=MPI.MAX, root=0)
elapse = comm.reduce(elapse, op=MPI.SUM, root=0)
if rank == 0:
print("Average elapsed time = ", elapse / size)
print("Maximum elapsed time = ", max_elapse)
# Save rewards vs. episodes plot
ar.save_reward_plot()
- Write your own script or modify the above as needed
- Run the following command:
mpiexec -np <num_parent_processes> python exarl/driver/__main__.py --<run_params>=<param_value>
- To obtain the parameters from JSON file/set in terminal using CANDLE, use the following lines:
import exarl.utils.candleDriver as cd
cd.run_params # dictionary containing all parameters
- Individual parameters are accessed using the corresponding key
E.g.-
self.search_method = cd.run_params['search_method']
self.gamma = cd.run_params['gamma']
- miniRL uses OpenAI gym environments
- The ExaEnv class in
miniRL/exarl/env_base.pyinherits from OpenAI GYM Wrapper class for including added functionality. - Environments inherit from gym.Env
Example:-
class envName(gym.Env):
...
- Register the environment in
EXARL/exarl/envs/__init__.py
from gym.envs.registration import register
register(
id='fooEnv-v0',
entry_point='envs.env_vault:FooEnv',
)
-
The id variable will be passed to exarl.make() to call the environment
-
The file
miniRL/exarl/env/env_vault/__init__.pyshould include
from exarl.envs.env_vault.foo_env import FooEnv
where miniRL/exarl/envs/env_vault/foo_env.py is the file containing your envirnoment
- The following example illustrates using the C function of computing the value of PI in miniRL
computePI.h:
#define MPICH_SKIP_MPICXX 1
#define OMPI_SKIP_MPICXX 1
#include <mpi.h>
#include <stdio.h>
#ifdef __cplusplus
extern "C" {
#endif
extern void compute_pi(int, MPI_Comm);
#ifdef __cplusplus
}
#endif
computePI.c:
#include <stdio.h>
#include <mpi.h>
double compute_pi(int N, MPI_Comm new_comm)
{
int rank, size;
MPI_Comm_rank(new_comm, &rank);
MPI_Comm_size(new_comm, &size);
double h, s, x;
h = 1.0 / (double) N;
s = 0.0;
for(int i=rank; i<N; i+=size)
{
x = h * ((double)i + 0.5);
s += 4.0 / (1.0 + x*x);
}
return (s * h);
}
- Compile the C/C++ code and create a shared object (*.so file)
- Create a python wrapper (Ctypes wrapper is shown)
computePI.py:
from mpi4py import MPI
import ctypes
import os
_libdir = os.path.dirname(__file__)
if MPI._sizeof(MPI.Comm) == ctypes.sizeof(ctypes.c_int):
MPI_Comm = ctypes.c_int
else:
MPI_Comm = ctypes.c_void_p
_lib = ctypes.CDLL(os.path.join(_libdir, "libcomputePI.so"))
_lib.compute_pi.restype = ctypes.c_double
_lib.compute_pi.argtypes = [ctypes.c_int, MPI_Comm]
def compute_pi(N, comm):
comm_ptr = MPI._addressof(comm)
comm_val = MPI_Comm.from_address(comm_ptr)
myPI = _lib.compute_pi(ctypes.c_int(N), comm_val)
return myPI
- In your environment code, just import the function and use it regularly
test_computePI.py:
from mpi4py import MPI
import numpy as np
import pdb
import computePI as cp
def main():
comm = MPI.COMM_WORLD
myrank = comm.Get_rank()
nprocs = comm.Get_size()
if myrank == 0:
N = 100
else:
N = None
N = comm.bcast(N, root=0)
num = 4
color = int(myrank/num)
newcomm = comm.Split(color, myrank)
mypi = cp.compute_pi(N, newcomm)
pi = newcomm.reduce(mypi, op=MPI.SUM, root=0)
newrank = newcomm.rank
if newrank==0:
print(pi)
if __name__ == '__main__':
main()
- miniRL extends OpenAI gym's environment registration to agents
- Agents inherit from exarl.ExaAgent
Example:-
import exarl
class agentName(exarl.ExaAgent):
def __init__(self, env, is_learner):
...
- Agents must include the following functions:
get_weights() # get target model weights
set_weights() # set target model weights
train() # train the agent
update() # update target model
action() # Next action based on current state
load() # load weights from memory
save() # save weights to memory
monitor() # monitor progress of learning
- Register the agent in
miniRL/exarl/agents/__init__.py
from exarl.agents.registration import register, make
register(
id='fooAgent-v0',
entry_point='exarl.agents.agent_vault:FooAgent',
)
-
The id variable will be passed to exarl.make() to call the agent
-
The file
miniRL/exarl/agents/agent_vault/__init__.pyshould include
from exarl.agents.agent_vault.foo_agent import FooAgent
where miniRL/agents/agent_vault/foo_agent.py is the file containing your agent
- miniRL also extends OpenAI gym's environment registration to workflows
- Workflows inherit from exarl.ExaWorkflow
Example:-
class workflowName(exarl.ExaWorkflow):
...
- Workflows must include the following functions:
run() # run the workflow
- Register the workflow in
miniRL/exarl/workflows/__init__.py
from exarl.agents.registration import register, make
register(
id='fooWorkflow-v0',
entry_point='exarl.workflows.workflow_vault:FooWorkflow',
)
-
The id variable will be passed to exarl.make() to call the agent
-
The file
miniRL/exarl/workflows/workflow_vault/__init__.pyshould include
from exarl.workflows.workflow_vault.foo_workflow import FooWorkflow
where miniRL/workflows/workflow_vault/foo_workflow.py is the file containing your workflow
- Base classes are provided for agents, environments, workflows, and learner in the directory
miniRL/exarl/ - Users can inherit from the correspoding agent, environment, and workflow base classes
- Function decorators are provided for debugging, timing, and profiling miniRL.
- Debugger captures the function signature and return values.
- Timer prints execution time in seconds.
- Either line_profiler or memory_profiler can be used for profiling the code.
- Profiler can be selected in
learner_cfg.jsonor using the command line argument--profile. - Options for profiling are
line,mem, ornone.
- Profiler can be selected in
- Function decorators can be used as shown below:
from exarl.utils.profile import *
@DEBUG
def my_func(*args, **kwargs):
...
@TIMER
def my_func(*args, **kwargs):
...
@PROFILE
def my_func(*args, **kwargs):
...
- Profiling results are written to:
results_dir + '/Profile/<line/memory>_profile.txt.
@misc{EXARL,
author = {Vinay Ramakrishnaiah, Malachi Schram, Jamal Mohd-Yusof, Sayan Ghosh, Yunzhi Huang, Ai Kagawa, Christine Sweeney, Shinjae Yoo},
title = {Easily eXtendable Architecture for Reinforcement Learning (EXARL)},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/exalearn/EXARL}},
}
If you have any questions or concerns regarding EXARL, please contact Vinay Ramakrishnaiah (vinayr@lanl.gov) and/or Malachi Schram (Malachi.Schram@pnnl.gov).