Propose new Sampler API and port Ray Sampler #881

krzentner · 2019-09-16T19:17:54Z

I'm uploading this pull request now to discuss this design, not with the intent of merging it right now.

However, I've ported the ray sampler to the new design to demonstrate that this port is not difficult from the sampler side. The ray unit test still passes (after also being ported to the new API).

avnishn

What I like about this ray sampler, @krzentner, is that I guess we no longer need a separate off-policy ray sampler and on-policy ray sampler. This design is definitely easier to work with, as we won't have to go back and change it. The only thing I am confused about: if the init is deprecated, are you saying that construct is going to be the new init of the sampler?

avnishn · 2019-09-16T22:18:25Z

src/garage/sampler/base.py

+        self.env = env
+
+    @classmethod
+    def construct(cls, config: SamplerConfig, agent, env) -> 'Sampler':


if we have this construct that takes a config, can we get rid of passing the algo to the init, @krzentner?

Yeah, the goal of this change is actually to let us avoid the Samplers ever having direct access to Algorithms. They only get SamplerConfigs.

... if the init is deprecated, are you saying that construct is going to be the new init of the sampler?

The intent is actually to have algorithms ask the runner to create samplers. This is because the runner generally knows the correct / default values for init_worker_fn and update_agent_fn. But yeah, the runner will construct Samplers by calling construct, at least for now. In the future, we could refactor things so that __init__ is used by the runner instead, but we might not want to do that, since there's a general aversion to making __init__ part of APIs.
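
A minimal sketch of the flow described above, where a runner builds the config and then calls construct. The SamplerConfig fields shown here, the Runner class, and its make_sampler method are assumptions made for illustration; they are not the definitions in this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SamplerConfig:
    """Hypothetical stand-in; the real fields are defined in the PR."""
    n_workers: int
    init_worker_fn: Callable[[int], None]
    update_agent_fn: Callable[[Any, Any], None]


class Sampler:
    def __init__(self, config, agent, env):
        self.config = config
        self.agent = agent
        self.env = env

    @classmethod
    def construct(cls, config: SamplerConfig, agent, env) -> 'Sampler':
        # Subclasses could override this to start worker processes, etc.
        return cls(config, agent, env)


class Runner:
    """Hypothetical runner that knows the default worker/agent functions."""

    def make_sampler(self, sampler_cls, agent, env, n_workers=2):
        config = SamplerConfig(
            n_workers=n_workers,
            init_worker_fn=lambda worker_number: None,
            update_agent_fn=lambda agent, update: None,
        )
        # The sampler only ever sees the config, never the algorithm.
        return sampler_cls.construct(config, agent, env)


sampler = Runner().make_sampler(Sampler, agent='policy', env='env')
```

Because the algorithm only talks to the runner, a Sampler subclass can change how it is constructed without any algorithm code changing.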

ryanjulian · 2019-09-19T21:45:55Z

src/garage/sampler/base.py

+        self.env = env
+
+    @classmethod
+    def construct(cls, config, agent, env):


do you envision replacing the constructor with this after the API change is finished?

Maybe? Having the constructor be part of the API is generally avoided in languages where types aren't first class (since it's not meaningful there), so it's considered somewhat unusual in other languages. However, it seems fairly natural in Python. I don't have a strong preference here.

ryanjulian

I'm usually suspicious of C++-style "config types everywhere" in Python, but this is a very nice design for this specific problem.

Questions:

Would this design allow us to get rid of framework-specific samplers classes (e.g. those in tf/samplers)? That would be a big win IMO.
How well would this generalize to a multi-environment sampler, assuming we don't want to just wrap multiple environments in one round-robin environment? Do you have some thoughts on what that would look like?
nit: Are there better names than SamplerConfig? Perhaps not?

krzentner · 2019-09-20T00:00:26Z

Would this design allow us to get rid of framework-specific samplers classes (e.g. those in tf/samplers)?

I designed it with that in mind. For the case of off-policy algorithms, we still need a better way of incorporating exploration strategies, but that could be handled in this design too (by using a different rollout function).
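
As a rough illustration of handling exploration "by using a different rollout function", the sketch below wraps a plain rollout with an epsilon-greedy one. The rollout signature, the toy environment step function, and make_epsilon_greedy_rollout are all hypothetical and only meant to show the shape of the idea.

```python
import random


def rollout(agent, env_step, max_path_length):
    """Plain rollout: always takes the action the agent proposes."""
    obs, path = 0, []
    for _ in range(max_path_length):
        action = agent(obs)
        obs = env_step(obs, action)
        path.append((obs, action))
    return path


def make_epsilon_greedy_rollout(epsilon, actions, rng):
    """Build a rollout function that explores with probability epsilon."""

    def exploring_rollout(agent, env_step, max_path_length):
        def noisy_agent(obs):
            # With probability epsilon, replace the agent's action.
            if rng.random() < epsilon:
                return rng.choice(actions)
            return agent(obs)

        return rollout(noisy_agent, env_step, max_path_length)

    return exploring_rollout


# Toy agent and environment, just to exercise both rollout functions.
agent = lambda obs: 1
env_step = lambda obs, action: obs + action

greedy_path = rollout(agent, env_step, 3)
explore = make_epsilon_greedy_rollout(1.0, [-1], random.Random(0))
exploring_path = explore(agent, env_step, 3)
```

The sampler itself would not need to know which of the two functions it was given.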

How well would this generalize to a multi-environment sampler, assuming we don't want to just wrap multiple environments in one round-robin environment? Do you have some thoughts on what that would look like?

The environment update function provides a reasonably simple way of handling multiple environments. It receives the worker number, which means we could also handle different workers managing different environments for efficiency. The only major assumption of this design which might introduce efficiency or ergonomics issues is that we transmit the same environment update to all workers. I don't think this is a major concern, and there are ways we can work around it if necessary.
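
One way this could look, sketched under assumptions: every worker receives the same env_update (here assumed to be a list of candidate environments) and uses its worker number to pick its own. The exact signature of the environment update function in the PR may differ.

```python
def env_update_fn(worker_number, env, env_update):
    """Return the environment a worker should use after an update.

    The same env_update is sent to every worker. In this sketch it is
    assumed to be a list of environments, and each worker selects one
    by its worker number.
    """
    if env_update is None:
        # No update was sent, so keep the current environment.
        return env
    return env_update[worker_number % len(env_update)]


env_update = ['env_a', 'env_b', 'env_c']
assigned = [env_update_fn(w, 'old_env', env_update) for w in range(4)]
unchanged = env_update_fn(0, 'old_env', None)
```

The cost of this approach is the one mentioned above: the full list is transmitted to every worker even though each worker uses only one entry.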

Are there better names than SamplerConfig? Perhaps not?

The main reason I don't like the term config here is that this object really provides additional behavior (mostly) not data. This pattern (of having an object with a bunch of functions in it to act like a poor-man's trait pattern) has been used in other places. In JavaScript, it's sometimes called the revealing constructor pattern (although JavaScript has object literals, making it somewhat more natural). We could also think of this as injecting new behavior into the Sampler, since it decouples the Sampler from the agent and environment types. Despite these related techniques, I don't really know what else to call this. SamplerExtension, SamplerBehavior, and SamplerOperations are reasonable options, I suppose, but not better than SamplerConfig. We could focus more on how this object interacts with workers, and call it a WorkerConfig, UnderlyingWorker, WorkerBehavior, or WorkerImpl.

Alternatively, we could turn this more into a "proper" worker abstraction, albeit one that doesn't know how to distribute itself. Then we could think of this as an inversion of control in the worker interface (and call this a Worker, InnerWorker, WorkerImpl, or WorkerCore). I'll think about these options a little more.

codecov · 2019-09-27T02:46:46Z

Codecov Report

Merging #881 into master will decrease coverage by 0.02%.
The diff coverage is 84.26%.

@@            Coverage Diff             @@
##           master     #881      +/-   ##
==========================================
- Coverage   86.43%   86.41%   -0.03%     
==========================================
  Files         161      164       +3     
  Lines        7780     7874      +94     
  Branches      984      993       +9     
==========================================
+ Hits         6725     6804      +79     
- Misses        870      883      +13     
- Partials      185      187       +2

Impacted Files	Coverage Δ
src/garage/sampler/utils.py	`82.35% <ø> (ø)`	⬆️
src/garage/sampler/batch_sampler.py	`100% <ø> (ø)`	⬆️
src/garage/tf/samplers/ray_sampler.py	`90.9% <100%> (ø)`	⬆️
src/garage/tf/samplers/batch_sampler.py	`41.93% <33.33%> (-6.35%)`	⬇️
src/garage/sampler/worker_factory.py	`71.42% <71.42%> (ø)`
src/garage/sampler/sampler.py	`78.57% <78.57%> (ø)`
src/garage/sampler/worker.py	`83.52% <83.52%> (ø)`
src/garage/sampler/ray_sampler.py	`96.42% <97.87%> (+6.25%)`	⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b2b701a...3c5d766.

ryanjulian · 2019-09-30T22:28:28Z

src/garage/sampler/base.py

+            config(SamplerConfig): Configuration which specifies how to
+                initialize workers, update agents and environments, and perform
+                rollouts.
+            agents(Agent or [Agent]): Agent(s) to use to perform rollouts. It


we try to use PEP484-ish notation here, so it would be Agent or List[Agent]
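
For reference, a small sketch of how the docstring notation and the corresponding type hint could line up. The Agent placeholder class and the as_agent_list helper are hypothetical, and the expansion of a single agent to one entry per worker is an assumption, not necessarily what the PR does.

```python
from typing import List, Union


class Agent:
    """Hypothetical placeholder for an agent/policy type."""


def as_agent_list(agents: Union[Agent, List[Agent]],
                  n_workers: int) -> List[Agent]:
    """Normalize the agents argument to one agent per worker.

    Args:
        agents(Agent or List[Agent]): Agent(s) to use to perform rollouts.
        n_workers(int): The number of workers to use.

    Returns:
        List[Agent]: One agent per worker.

    """
    if isinstance(agents, list):
        if len(agents) != n_workers:
            raise ValueError('expected one agent per worker')
        return agents
    # A single agent is shared (by reference) across all workers.
    return [agents] * n_workers


single = Agent()
expanded = as_agent_list(single, 3)
```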

ryanjulian · 2019-09-30T22:30:31Z

src/garage/sampler/config.py

+    return x
+
+
+class SamplerConfig:


SamplingStrategy? SamplingPlan?

WorkerFactory? The name is still up for discussion, of course.

ryanjulian · 2019-09-30T22:46:41Z

src/garage/sampler/config.py

+        seed(int): The seed to use to initialize random number generators.
+        n_workers(int): The number of workers to use.
+        max_path_length(int): The maximum length paths which will be sampled.
+        worker_init_fn((int, SamplerConfig) -> None): Function to run in


worker number is an odd thing to be carrying around/passing everywhere, but I understand the intent. passing SamplerConfig to worker_init_fn and rollout, sometimes in addition to worker_number seems an odd asymmetry to me.

more importantly, who gets a SamplerConfig and who gets a worker_number (or both) i find a little hard to predict from just reading the API.

what if you split the roles here into SamplerConfig, which is global/initialization, and SamplingContext, which is generated by Sampler and created/managed per-worker? for instance, the context could include the random seed this worker should use (rather than expecting the worker to derive it from the global seed), the worker number, perhaps which resources this worker is allowed to use, perhaps prototype Policy/Env objects to which env/agent updates are applied, etc.

per-worker functions (rollout, worker_init_fn, env_update_fn, agent_update_fn) would always/only have access to SamplingContext, not SamplerConfig.

I ended up doing something like this (WorkerFactory and Worker). I do think it makes sense to rename these though, since they're not exactly workers in the sense most people probably expect.
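
A sketch of the split suggested in this thread, with a global SamplerConfig and a per-worker SamplingContext generated by the sampler. This is not the WorkerFactory/Worker code from the PR; the make_contexts helper and the seed + worker_number rule are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SamplerConfig:
    """Global settings, fixed when the sampler is created."""
    seed: int
    n_workers: int
    max_path_length: int


@dataclass(frozen=True)
class SamplingContext:
    """Per-worker view, generated by the sampler."""
    worker_number: int
    seed: int
    max_path_length: int


def make_contexts(config: SamplerConfig) -> List[SamplingContext]:
    # The sampler derives each worker's seed, so workers never need to
    # see the global seed or the global config.
    return [
        SamplingContext(worker_number=i,
                        seed=config.seed + i,
                        max_path_length=config.max_path_length)
        for i in range(config.n_workers)
    ]


def worker_init_fn(context: SamplingContext) -> random.Random:
    """Per-worker function: only has access to its own context."""
    return random.Random(context.seed)


contexts = make_contexts(
    SamplerConfig(seed=10, n_workers=3, max_path_length=100))
rngs = [worker_init_fn(c) for c in contexts]
```

With this shape, it is predictable from the signature alone which functions see global state and which see only per-worker state.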

ryanjulian · 2019-10-09T22:22:08Z

Overall this is looking great and I think a definite improvement over what we currently have.

What are your thoughts on a rollout strategy? Should we merge this now and refactor the existing samplers as we go? Or run one giant refactor PR? I took a look at our sampler coverage and it's pretty OK, so I think either could be a viable option.

krzentner · 2019-11-22T02:30:02Z

What are your thoughts on a rollout strategy? Should we merge this now and refactor the existing samplers as we go? Or run one giant refactor PR? I took a look at our sampler coverage and it's pretty OK, so I think either could be a viable option.

Definitely several PRs. We need to at least make the other samplers minimally follow this API before we refactor the runner and algorithms (which is the real target of this refactoring).

ryanjulian · 2019-11-22T02:50:36Z

src/garage/sampler/worker.py

+        """Update an agent, assuming it implements garage.tf.policies.Policy.
+
+        Args:
+            agent_update(Dict[str, np.array]): Parameters to agent, which


you specified object in the interface, so does this DefaultWorker specifically do agent updates with dicts?

Yep. I've added a comment.
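
To make the distinction concrete: the interface accepts any object as an agent update, while the default worker interprets it as a dict of named parameters. The sketch below assumes a toy policy with a set_param_values method and uses plain lists in place of arrays; both classes are hypothetical and are not the ones in worker.py.

```python
from typing import Dict, List


class ToyPolicy:
    """Hypothetical policy that stores named parameters."""

    def __init__(self):
        self.params: Dict[str, List[float]] = {'w': [0.0, 0.0], 'b': [0.0]}

    def set_param_values(self, values):
        unknown = set(values) - set(self.params)
        if unknown:
            raise KeyError(f'unknown parameters: {sorted(unknown)}')
        for name, value in values.items():
            self.params[name] = list(value)


class DefaultWorker:
    """Hypothetical default worker: agent updates are parameter dicts."""

    def __init__(self, agent):
        self.agent = agent

    def update_agent(self, agent_update):
        # The general interface allows any object; this worker only
        # understands a dict mapping parameter names to values.
        if not isinstance(agent_update, dict):
            raise TypeError('DefaultWorker expects a dict of parameters')
        self.agent.set_param_values(agent_update)


worker = DefaultWorker(ToyPolicy())
worker.update_agent({'w': [0.5, -0.5]})
```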

ryanjulian

This is a solid improvement over what we have. Please consider my comments (no need to wait for me to respond) but after that LGTM.

ryanjulian · 2019-11-22T02:58:50Z

@avnishn using the "request changes" button blocks this PR until you approve the updated PR. please re-review or remove your request.

avnishn

It seems that my previous questions were answered. Apologies for the late review. Looks good to me.

krzentner requested review from ryanjulian, zequnyu, avnishn and a team September 16, 2019 19:17

avnishn requested changes Sep 16, 2019

krzentner commented Sep 17, 2019

ryanjulian reviewed Sep 19, 2019

ryanjulian reviewed Sep 30, 2019

ryanjulian reviewed Sep 30, 2019

ryanjulian reviewed Sep 30, 2019

ryanjulian reviewed Sep 30, 2019

krzentner force-pushed the new-sampler-api branch from bc94257 to c2b57df Compare October 8, 2019 01:37