`Trainer` gets episodes through sampling to train the policy. In Garage, `Trainer` uses a `Sampler` to perform sampling. The `Sampler` manages `Worker`s and assigns sampling tasks to them, namely doing rollouts with agents and environments. You can also implement your own sampler and worker. The following introduces the existing samplers and workers in Garage.
A `Sampler` is responsible for assigning sampling jobs to workers. Garage currently has two types of `Sampler`s:

- `LocalSampler`, the default sampler, which runs workers serially in the main process. With this sampler, all sampling tasks run in the same thread.
- `RaySampler`, which uses the Ray framework to run distributed workers in parallel. `RaySampler` can run workers not only on different CPUs, but also on different machines across a network.
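To make the serial style concrete, here is a minimal sketch of what a `LocalSampler`-style loop does: every rollout runs one after another in the calling process. This is an illustration only, not Garage's actual implementation; `ToyEnv`, `ToyPolicy`, and `serial_sample` are made up for this example.

```python
class ToyEnv:
    """Hypothetical environment: counts steps, episode ends after 3."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done  # observation, reward, done


class ToyPolicy:
    """Hypothetical policy: always returns action 0."""

    def get_action(self, obs):
        return 0


def serial_sample(policy, envs):
    """Run one rollout per env, strictly in sequence (single thread)."""
    episodes = []
    for env in envs:  # each "worker" takes its turn in this process
        obs = env.reset()
        rewards = []
        done = False
        while not done:
            obs, reward, done = env.step(policy.get_action(obs))
            rewards.append(reward)
        episodes.append(rewards)
    return episodes


episodes = serial_sample(ToyPolicy(), [ToyEnv() for _ in range(4)])
print(len(episodes), sum(episodes[0]))  # 4 episodes, each with return 3.0
```

A parallel sampler such as `RaySampler` dispatches the same per-environment rollout work to separate processes instead of looping over it in one thread.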
A `Worker` is the basic unit that performs a rollout per step. In parallel samplers, each worker typically runs on one exclusive CPU. For most algorithms, Garage provides two kinds of workers, `DefaultWorker` and `VecWorker`. A few algorithms (RL2 and PEARL) use custom workers specific to that algorithm.

- `DefaultWorker`, the default worker. It works with a single agent/policy and a single environment in one step.
- `VecWorker`, the vectorized worker, which runs multiple instances of the simulation on a single CPU. `VecWorker` can compute a batch of actions from a policy across multiple environments to reduce the overhead of sampling (e.g. feeding forward a neural network).
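The overhead reduction comes from batching: one policy call over all observations replaces one call per environment. The following sketch (assuming NumPy; the toy linear "policy" `W` is made up and is not Garage's `VecWorker`) shows that the batched computation gives the same actions as the per-environment loop.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))              # toy linear "policy": obs (8,) -> action (4,)
observations = rng.standard_normal((12, 8))  # one observation per simulated environment

# Unvectorized: one policy call per environment (12 separate calls).
actions_loop = np.stack([W @ obs for obs in observations])

# Vectorized: a single batched call covering all 12 environments at once.
actions_batch = observations @ W.T

assert np.allclose(actions_loop, actions_batch)
print(actions_batch.shape)  # (12, 4): one action per environment
```

For a real neural-network policy, the batched forward pass amortizes per-call overhead (framework dispatch, kernel launches) across all environments, which is exactly what `n_envs` controls below.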
A sampler can be constructed either from a worker factory (the class that can construct workers), or from parameters directly.
```python
from garage.sampler import LocalSampler, WorkerFactory

env = ...
policy = ...
worker_factory = WorkerFactory(max_episode_length=100,
                               is_tf_worker=True,
                               n_workers=4)
sampler = LocalSampler.from_worker_factory(worker_factory=worker_factory,
                                           agents=policy,
                                           envs=env)
...
```
In the above example, we first construct a worker factory, which will construct 4 workers for the sampler, and the maximum length of episodes collected by these workers will be 100. Note that for policies using the TensorFlow framework, we need to set `is_tf_worker` to `True`. Here we don't choose a type of worker explicitly, so the factory will construct `DefaultWorker` by default. With the worker factory, we then construct a `LocalSampler` with the policies and environments that will be used in sampling.
Sometimes we want to construct a sampler directly.
```python
from garage.sampler import RaySampler, VecWorker

env = ...
policy = ...
sampler = RaySampler(agents=policy,
                     envs=env,
                     # params below are for the worker
                     max_episode_length=100,
                     is_tf_worker=True,
                     n_workers=4,
                     worker_class=VecWorker,
                     worker_args=dict(n_envs=12))
```
In this example, we construct a `RaySampler` directly, and the sampler will construct 4 `VecWorker`s when sampling. We also set the level of vectorization (i.e. the number of environments simulated in one step) to 12 by passing `n_envs` in `worker_args` for the `VecWorker`.
Once we have constructed the sampler in the launcher, we just need to pass it to the algorithm object. The algorithm will have a field named `sampler` to store the sampler, and the trainer will then be able to use it for sampling.
```python
env = ...
sampler = ...
algo = TRPO(...,
            sampler=sampler,
            ...)
trainer.setup(algo=algo, env=env)
```
This page was authored by Ruofu Wang (@yeukfu).