Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose new Sampler API and port Ray Sampler #881

Merged
merged 1 commit into from Dec 28, 2019
Merged

Conversation

krzentner
Copy link
Contributor

I'm uploading this pull request now to discuss this design, not with the intent of merging it right now.

However, I've ported the ray sampler to the new design to demonstrate that this port is not difficult from the sampler side. The ray unit test still passes (after also being ported to the new API).

Copy link
Member

@avnishn avnishn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about this ray sampler @krzentner is now I guess we don't have to have an off policy ray sampler vs on policy ray sampler. This design is definitely easier to work with as we wont have to go back and change it. The only thing I am confused about is if the init is deprecated, are you saying that construct is going to be the new init of the sampler?

self.env = env

@classmethod
def construct(cls, config: SamplerConfig, agent, env) -> 'Sampler':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have this construct that takes a config, can we get rid of passing the algo to the init @krzentner

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the goal of this change is actually to let us avoid the Samplers ever having direct access to Algorithms. They only get SamplerConfigs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... if the init is deprecated, are you saying that construct is going to be the new init of the sampler?

The intent is actually to have algorithms ask the runner to create samplers. This is because the runner generally knows the correct / default values for init_worker_fn and update_agent_fn. But yeah, the runner will construct Samplers by calling construct, at least for now. In the future, we could refactor things so that __init__ is used by the runner instead, but we might not want to do that, since there's a general aversion to making __init__ part of APIs.

src/garage/sampler/base.py Outdated Show resolved Hide resolved
self.env = env

@classmethod
def construct(cls, config, agent, env):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you envision replacing the constructor with this after the API change is finished?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? Having the constructor be part of the API is generally avoided in languages where types aren't first class (since it's not meaningful there), so it's considered somewhat unusual in other languages. However, it seems fairly natural in Python. I don't have a strong preference here.

Copy link
Member

@ryanjulian ryanjulian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually suspicious of C++-style "config types everywhere" in python, but this is a very nice design for this specific problem.

Questions:

  1. Would this design allow us to get rid of framework-specific samplers classes (e.g. those in tf/samplers)? That would be a big win IMO.
  2. How well would this generalize to a multi-environment sampler, assuming we don't want to just wrap multiple environments in one round-robin environment? Do you have some thoughts on what that would look like?
  3. nit: Are there better names than SamplerConfig? Perhaps not?

@krzentner
Copy link
Contributor Author

Would this design allow us to get rid of framework-specific samplers classes (e.g. those in tf/samplers)?

I designed it with that in mind. For the case of off-policy algorithms, we still need a better way of incorporating exploration strategies, but that could be handled in this design too (by using a different rollout function).

How well would this generalize to a multi-environment sampler, assuming we don't want to just wrap multiple environments in one round-robin environment? Do you have some thoughts on what that would look like?

The environment update function provides a reasonably simple way of handling multiple environments. It receives the worker number, which means we could also handle different workers managing different environments for efficiency. The only major assumption of this design which might introduce efficiency or ergonomics issues is that we transmit the same environment update to all workers. I don't think this is a major concern, and there's ways we can work around it if necessary.

Are there better names than SamplerConfig? Perhaps not?

The main reason I don't like the term config here is that this object really provides additional behavior (mostly) not data. This pattern (of having an object with a bunch of functions in it to act like a poor-man's trait pattern) has been used in other places. In JavaScript, it's sometimes called the revealing constructor pattern (although JavaScript has object literals, making it somewhat more natural). We could also think of this as injecting new behavior into the Sampler, since it decouples the Sampler from the agent and environment types. Despite these related techniques, I don't really know what else to call this. SamplerExtension, SamplerBehavior, and SamplerOperations are reasonable options, I suppose, but not better than SamplerConfig. We could focus more on how this object interacts with workers, and call it a WorkerConfig, UnderlyingWorker, WorkerBehavior, or WorkerImpl.

Alternatively, we could turn this more into a "proper" worker abstraction, albeit one that doesn't know how to distribute itself. Then we could think of this as an inversion of control in the worker interface (and call this a Worker, InnerWorker, WorkerImpl, or WorkerCore). I'll think about these options a little more.

@codecov
Copy link

codecov bot commented Sep 27, 2019

Codecov Report

Merging #881 into master will decrease coverage by 0.02%.
The diff coverage is 84.26%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #881      +/-   ##
==========================================
- Coverage   86.43%   86.41%   -0.03%     
==========================================
  Files         161      164       +3     
  Lines        7780     7874      +94     
  Branches      984      993       +9     
==========================================
+ Hits         6725     6804      +79     
- Misses        870      883      +13     
- Partials      185      187       +2
Impacted Files Coverage Δ
src/garage/sampler/utils.py 82.35% <ø> (ø) ⬆️
src/garage/sampler/batch_sampler.py 100% <ø> (ø) ⬆️
src/garage/tf/samplers/ray_sampler.py 90.9% <100%> (ø) ⬆️
src/garage/tf/samplers/batch_sampler.py 41.93% <33.33%> (-6.35%) ⬇️
src/garage/sampler/worker_factory.py 71.42% <71.42%> (ø)
src/garage/sampler/sampler.py 78.57% <78.57%> (ø)
src/garage/sampler/worker.py 83.52% <83.52%> (ø)
src/garage/sampler/ray_sampler.py 96.42% <97.87%> (+6.25%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2b701a...3c5d766. Read the comment docs.

config(SamplerConfig): Configuration which specifies how to
intialize workers, update agents and environments, and perform
rollouts.
agents(Agent or [Agent]): Agent(s) to use to perform rollouts. It
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we try to use PEP484-ish notation here, so it would be Agent or List[Agent]

return x


class SamplerConfig:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SamplingStrategy? SamplingPlan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkerFactory? The name is still up for discussion, of course.

seed(int): The seed to use to intialize random number generators.
n_workers(int): The number of workers to use.
max_path_length(int): The maximum length paths which will be sampled.
worker_init_fn((int, SamplerConfig) -> None): Function to run in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker number is an odd thing to be carrying around/passing everywhere, but I understand the intent. passing SamplerConfig to worker_init_fn and rollout, sometimes in addition to worker_number seems an odd asymmetry to me.

more importantly, who gets a SamplerConfig and who gets a worker_number (or both) i find a little hard to predict from just reading the API.

what if you split the roles here into SamplerConfig which is global/initialization and SamplingContext which is generated by Sampler and created/managed per-worker. for instance, the context could include the random seed this worker should use (rather than expecting the worker to derive it from the global seed), the worker number, perhaps which resources this worker is allowed to use, perhaps a prototype Policy/Env objects to which env/agent updates to be applied, etc.

per-worker functions (rollout, worker_init_fn, env_update_fn, agent_update_fn would always/only have access to SamplingContext, not SamplerConfig).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing something like this (WorkerFactory and Worker). I do think it makes sense to rename these though, since they're not exactly workers in the sense most people probably expect.

@ryanjulian
Copy link
Member

Overall this is looking great and I think a definite improvement over what we currently have.

What are your thoughts on a rollout strategy? Should we this now and refactor the existing as we go? Or run one giant refactor PR? I look a look at our sampler coverage and it's pretty OK, so I think either could be viable options.

@krzentner
Copy link
Contributor Author

krzentner commented Nov 22, 2019

What are your thoughts on a rollout strategy? Should we this now and refactor the existing as we go? Or run one giant refactor PR? I look a look at our sampler coverage and it's pretty OK, so I think either could be viable options.

Definitely several PRs. We need to at least make the other samplers minimally follow this API before we refactor the runner and algorithms (which is the real target of this refactoring).

"""Update an agent, assuming it implements garage.tf.policies.Policy.

Args:
agent_update(Dict[str, np.array]): Parameters to agent, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you specified object in the interface, so does this DefaultWorker specifically do agent updates with dicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I've added a comment.

Copy link
Member

@ryanjulian ryanjulian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a solid improvement over what we have. Please consider my comments (no need to wait for me to respond) but after that LGTM.

@ryanjulian
Copy link
Member

@avnishn using the "request changes" button blocks this PR until you approve the updated PR. please re-review or remove your request.

Copy link
Member

@avnishn avnishn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that my previous questions were answered. Apologize for the late review. Looks good to me

@krzentner krzentner merged commit dcf6020 into master Dec 28, 2019
@krzentner krzentner deleted the new-sampler-api branch December 28, 2019 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants