New worker for PEARL #1124

lywong92 · 2020-01-08T13:13:49Z

This adds a new worker for PEARL sampling. PEARL needs the worker to store context and to resample the policy's belief when obtaining samples.

codecov · 2020-01-08T15:11:39Z

Codecov Report

Merging #1124 into master will increase coverage by 0.04%.
The diff coverage is 97.56%.

@@            Coverage Diff             @@
##           master    #1124      +/-   ##
==========================================
+ Coverage   88.00%   88.04%   +0.04%     
==========================================
  Files         188      189       +1     
  Lines        8885     8926      +41     
  Branches     1124     1131       +7     
==========================================
+ Hits         7819     7859      +40     
  Misses        862      862              
- Partials      204      205       +1

Impacted Files	Coverage Δ
src/garage/torch/algos/pearl.py	`97.56% <97.56%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e26a7f6...c918641.

src/garage/replay_buffer/meta_replay_buffer.py

src/garage/replay_buffer/multi_task_replay_buffer.py

src/garage/sampler/in_place_sampler.py

src/garage/torch/policies/deterministic_mlp_policy.py

ryanjulian · 2020-01-20T07:06:15Z

Assuming everything passes muster, I'm OK merging this now if KR is okay with it. Clearly, unifying sampling for meta-RL is an area we need to work on.

avnishn

Based on the comments and the code similarities, I'm not sure we need a new meta-RL replay buffer. Perhaps a refactor onto an existing replay buffer can come later if we decide it is needed.
What is the reasoning behind a multi-task replay buffer for PEARL? I saw your comment:

The algorithm keeps a buffer for each task, and I think this looks cleaner. I just followed the implementation on oyster, but I suppose I can keep a list of buffers in the algo without this buffer class.

I'm with you on keeping a list of buffers in the algorithm as opposed to a separate multi-task replay buffer class.
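To illustrate the "list of buffers" option discussed above, here is a minimal sketch. `SimpleReplayBuffer` is a hypothetical stand-in written for this example, not garage's replay buffer API, and the transition dictionaries are placeholders.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical fixed-capacity buffer (an assumption, not garage's API)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)

    def add(self, transition):
        self._storage.append(transition)

    def sample(self, batch_size):
        # Sample without replacement, capped at the number stored.
        count = min(batch_size, len(self._storage))
        return random.sample(list(self._storage), count)

    def __len__(self):
        return len(self._storage)


num_tasks = 3
# One independent buffer per task, indexed by task id. The algorithm
# owns this list directly; no multi-task wrapper class is involved.
task_buffers = [SimpleReplayBuffer(capacity=100) for _ in range(num_tasks)]

for task_id in range(num_tasks):
    for step in range(5):
        task_buffers[task_id].add({'task': task_id, 'step': step})

# Sampling for one task only touches that task's buffer.
batch = task_buffers[1].sample(batch_size=4)
```

The trade-off is that per-task bookkeeping (clearing, sizing, sampling) lives in the algorithm rather than behind a buffer class.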

Can you create your own rollout function, separate from the sampler/utils rollout function?
It seems that you have modified the return type of the rollout function and added some other features. A question I find myself asking is: "Will the added functionality be used by algorithms besides PEARL?"

src/garage/sampler/pearl_sampler.py

avnishn · 2020-01-20T19:08:25Z

src/garage/sampler/pearl_sampler.py

+
+            # save the latent context that generated this trajectory
+            if accum_context:
+                path['context'] = self.policy.z.detach().cpu().numpy()


Does self.policy.z ever update during the sampling loop? If not, can we store this value once instead of querying it every time?
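The suggestion amounts to hoisting the read out of the loop. A minimal sketch follows; `ToyPolicy`, its `context()` method, and `collect_paths` are hypothetical names invented for illustration, and the approach is only valid under the assumption that `z` is not resampled or updated inside the loop.

```python
class ToyPolicy:
    """Hypothetical stand-in for a policy that holds a latent context z."""

    def __init__(self):
        self.z = [0.1, -0.3]
        self.reads = 0  # counts how often the context is fetched

    def context(self):
        # Stands in for a conversion such as z.detach().cpu().numpy().
        self.reads += 1
        return list(self.z)


def collect_paths(policy, num_paths, accum_context=True):
    # Read the context once, before the loop. This assumes z does not
    # change while the paths are being collected.
    context = policy.context() if accum_context else None
    paths = []
    for index in range(num_paths):
        path = {'index': index}
        if accum_context:
            # Copy so each path owns its own context record.
            path['context'] = list(context)
        paths.append(path)
    return paths


policy = ToyPolicy()
paths = collect_paths(policy, num_paths=3)
```

If `z` is resampled between trajectories, the read has to stay inside the loop.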

src/garage/sampler/utils.py

ryanjulian · 2020-03-10T21:37:09Z

src/garage/sampler/__init__.py

+    'WorkerFactory',
+    'Worker',
+    'DefaultWorker',
+    'PEARLSampler',


Can this be achieved by adding a PEARLWorker class, @krzentner?

Anson does a similar pattern for RL2: He implements an RL2 worker which lives in garage.tf.algos.rl2 rather than the central sampler package.

It can definitely be achieved with that.
Alternatively, just put z in agent_infos. It's there for a reason.
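As a rough sketch of the `agent_infos` alternative: the policy returns its latent context alongside each action, and the rollout collects it like any other per-step info. `ToyContextPolicy` and this `rollout` function are hypothetical and written only for this example; the `(action, agent_info)` return shape is assumed here.

```python
class ToyContextPolicy:
    """Hypothetical policy that reports its latent context z per step."""

    def __init__(self, z):
        self.z = list(z)

    def get_action(self, observation):
        # Toy action rule; a real policy would run a network here.
        action = sum(self.z) + observation
        # The context travels with the action as per-step agent info.
        return action, {'context': list(self.z)}


def rollout(policy, observations):
    actions = []
    agent_infos = {}
    for observation in observations:
        action, info = policy.get_action(observation)
        actions.append(action)
        # Accumulate each info key into a per-step list.
        for key, value in info.items():
            agent_infos.setdefault(key, []).append(value)
    return {'actions': actions, 'agent_infos': agent_infos}


path = rollout(ToyContextPolicy([1.0, 2.0]), [0.0, 1.0, 2.0])
```

With this shape, the sampler needs no PEARL-specific code to carry the context.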

ryanjulian · 2020-03-10T21:37:39Z

src/garage/sampler/pearl_sampler.py

@@ -0,0 +1,156 @@
+# pylint: disable=unnecessary-pass


You can deal with this by just omitting pass from your empty methods; a docstring on its own is a valid method body.
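For example, a method whose body is only a docstring is valid Python, so the `pass` (and the pylint disable) can go. The class and method names below are hypothetical, chosen for illustration rather than taken from the PR.

```python
import abc


class BaseSampler(abc.ABC):
    """Hypothetical sampler base class used only for this example."""

    @abc.abstractmethod
    def start_worker(self):
        """Initialize the sampler."""

    def shutdown_worker(self):
        """Terminate workers if necessary. Intentionally a no-op here."""


class NoOpSampler(BaseSampler):
    """Concrete subclass showing the base class is usable."""

    def start_worker(self):
        """Report that the sampler has started."""
        return 'started'


sampler = NoOpSampler()
result = sampler.start_worker()
```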

ryanjulian · 2020-03-11T18:07:12Z

src/garage/sampler/__init__.py

+    'WorkerFactory',
+    'Worker',
+    'DefaultWorker',
+    'PEARLWorker',


Please put this class in src/garage/torch/algos/pearl.py, since it is only used by the PEARL algorithm.

Its import path will be from garage.torch.algos.pearl import PEARLWorker

ryanjulian · 2020-03-11T18:08:06Z

Please address my comments before submitting. @krzentner should also probably take a look.

krzentner · 2020-03-12T22:43:12Z

src/garage/torch/algos/pearl.py

+            pass
+        self._agent_infos['context'] = [self.agent.z.detach().cpu().numpy()
+                                        ] * self._max_path_length
+        self.agent.sample_from_belief()


It might make more sense to put this at the start, since agent updates happen in between calls to this method.
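A minimal sketch of the suggested ordering, under the assumption that the agent's belief is updated between rollouts. `ToyAgent`, `ToyWorker`, and `update_belief` are hypothetical names for this example; only `sample_from_belief` and `z` mirror names from the snippet above.

```python
import random


class ToyAgent:
    """Hypothetical agent with a Gaussian belief over a scalar context."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.belief_mean = 0.0
        self.z = 0.0

    def update_belief(self, mean):
        # Stands in for a training update between rollouts.
        self.belief_mean = mean

    def sample_from_belief(self):
        self.z = self._rng.gauss(self.belief_mean, 1.0)


class ToyWorker:
    """Hypothetical worker that records the context used for a path."""

    def __init__(self, agent, max_path_length):
        self.agent = agent
        self.max_path_length = max_path_length

    def rollout(self):
        # Resample first, so z reflects any belief update that happened
        # since the previous call to rollout().
        self.agent.sample_from_belief()
        context = self.agent.z
        # ... environment stepping would go here ...
        return {'agent_infos': {'context': [context] * self.max_path_length}}


agent = ToyAgent()
worker = ToyWorker(agent, max_path_length=4)
agent.update_belief(100.0)  # update happens between rollouts
path = worker.rollout()
```

Sampling at the end of the method instead would leave the next rollout using a `z` drawn before the intervening update.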

krzentner

Mostly looks good. There's one small thing you should check, but feel free to submit.

lywong92 requested review from ryanjulian and krzentner January 8, 2020 13:13

lywong92 requested a review from a team as a code owner January 8, 2020 13:13

lywong92 changed the title from "New sampler for PEARL" to "New sampler and buffers for PEARL" Jan 9, 2020