New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propose new Sampler API and port Ray Sampler #881
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I like about this ray sampler @krzentner is now I guess we don't have to have an off policy ray sampler vs on policy ray sampler. This design is definitely easier to work with as we wont have to go back and change it. The only thing I am confused about is if the init is deprecated, are you saying that construct is going to be the new init of the sampler?
src/garage/sampler/base.py
Outdated
self.env = env | ||
|
||
@classmethod | ||
def construct(cls, config: SamplerConfig, agent, env) -> 'Sampler': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have this construct that takes a config, can we get rid of passing the algo to the init @krzentner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the goal of this change is actually to let us avoid the Sampler
s ever having direct access to Algorithm
s. They only get SamplerConfig
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... if the init is deprecated, are you saying that construct is going to be the new init of the sampler?
The intent is actually to have algorithms ask the runner to create samplers. This is because the runner generally knows the correct / default values for init_worker_fn
and update_agent_fn
. But yeah, the runner will construct Samplers by calling construct
, at least for now. In the future, we could refactor things so that __init__
is used by the runner instead, but we might not want to do that, since there's a general aversion to making __init__
part of APIs.
src/garage/sampler/base.py
Outdated
self.env = env | ||
|
||
@classmethod | ||
def construct(cls, config, agent, env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you envision replacing the constructor with this after the API change is finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? Having the constructor be part of the API is generally avoided in languages where types aren't first class (since it's not meaningful there), so it's considered somewhat unusual in other languages. However, it seems fairly natural in Python. I don't have a strong preference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually suspicious of C++-style "config types everywhere" in python, but this is a very nice design for this specific problem.
Questions:
- Would this design allow us to get rid of framework-specific samplers classes (e.g. those in tf/samplers)? That would be a big win IMO.
- How well would this generalize to a multi-environment sampler, assuming we don't want to just wrap multiple environments in one round-robin environment? Do you have some thoughts on what that would look like?
- nit: Are there better names than SamplerConfig? Perhaps not?
I designed it with that in mind. For the case of off-policy algorithms, we still need a better way of incorporating exploration strategies, but that could be handled in this design too (by using a different rollout function).
The environment update function provides a reasonably simple way of handling multiple environments. It receives the worker number, which means we could also handle different workers managing different environments for efficiency. The only major assumption of this design which might introduce efficiency or ergonomics issues is that we transmit the same environment update to all workers. I don't think this is a major concern, and there's ways we can work around it if necessary.
The main reason I don't like the term config here is that this object really provides additional behavior (mostly) not data. This pattern (of having an object with a bunch of functions in it to act like a poor-man's trait pattern) has been used in other places. In JavaScript, it's sometimes called the revealing constructor pattern (although JavaScript has object literals, making it somewhat more natural). We could also think of this as injecting new behavior into the Sampler, since it decouples the Sampler from the agent and environment types. Despite these related techniques, I don't really know what else to call this. Alternatively, we could turn this more into a "proper" worker abstraction, albeit one that doesn't know how to distribute itself. Then we could think of this as an inversion of control in the worker interface (and call this a |
Codecov Report
@@ Coverage Diff @@
## master #881 +/- ##
==========================================
- Coverage 86.43% 86.41% -0.03%
==========================================
Files 161 164 +3
Lines 7780 7874 +94
Branches 984 993 +9
==========================================
+ Hits 6725 6804 +79
- Misses 870 883 +13
- Partials 185 187 +2
Continue to review full report at Codecov.
|
src/garage/sampler/base.py
Outdated
config(SamplerConfig): Configuration which specifies how to | ||
intialize workers, update agents and environments, and perform | ||
rollouts. | ||
agents(Agent or [Agent]): Agent(s) to use to perform rollouts. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we try to use PEP484-ish notation here, so it would be Agent or List[Agent]
src/garage/sampler/config.py
Outdated
return x | ||
|
||
|
||
class SamplerConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SamplingStrategy? SamplingPlan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WorkerFactory? The name is still up for discussion, of course.
src/garage/sampler/config.py
Outdated
seed(int): The seed to use to intialize random number generators. | ||
n_workers(int): The number of workers to use. | ||
max_path_length(int): The maximum length paths which will be sampled. | ||
worker_init_fn((int, SamplerConfig) -> None): Function to run in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worker number is an odd thing to be carrying around/passing everywhere, but I understand the intent. passing SamplerConfig
to worker_init_fn
and rollout
, sometimes in addition to worker_number
seems an odd asymmetry to me.
more importantly, who gets a SamplerConfig
and who gets a worker_number
(or both) i find a little hard to predict from just reading the API.
what if you split the roles here into SamplerConfig
which is global/initialization and SamplingContext
which is generated by Sampler
and created/managed per-worker. for instance, the context could include the random seed this worker should use (rather than expecting the worker to derive it from the global seed), the worker number, perhaps which resources this worker is allowed to use, perhaps a prototype Policy/Env objects to which env/agent updates to be applied, etc.
per-worker functions (rollout
, worker_init_fn
, env_update_fn
, agent_update_fn
would always/only have access to SamplingContext, not SamplerConfig).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up doing something like this (WorkerFactory
and Worker
). I do think it makes sense to rename these though, since they're not exactly workers in the sense most people probably expect.
bc94257
to
c2b57df
Compare
Overall this is looking great and I think a definite improvement over what we currently have. What are your thoughts on a rollout strategy? Should we this now and refactor the existing as we go? Or run one giant refactor PR? I look a look at our sampler coverage and it's pretty OK, so I think either could be viable options. |
20f0743
to
1c9ecea
Compare
Definitely several PRs. We need to at least make the other samplers minimally follow this API before we refactor the runner and algorithms (which is the real target of this refactoring). |
"""Update an agent, assuming it implements garage.tf.policies.Policy. | ||
|
||
Args: | ||
agent_update(Dict[str, np.array]): Parameters to agent, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you specified object
in the interface, so does this DefaultWorker
specifically do agent updates with dicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I've added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a solid improvement over what we have. Please consider my comments (no need to wait for me to respond) but after that LGTM.
@avnishn using the "request changes" button blocks this PR until you approve the updated PR. please re-review or remove your request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that my previous questions were answered. Apologize for the late review. Looks good to me
1c9ecea
to
2a55bd1
Compare
2a55bd1
to
d419fb9
Compare
d419fb9
to
289b56c
Compare
289b56c
to
3c5d766
Compare
I'm uploading this pull request now to discuss this design, not with the intent of merging it right now.
However, I've ported the ray sampler to the new design to demonstrate that this port is not difficult from the sampler side. The ray unit test still passes (after also being ported to the new API).