
Add value function for torch #1317

Merged · 1 commit merged into master on May 1, 2020
Conversation

yonghyuc
Contributor

Implement GaussianMLPValueFunction for PyTorch algorithms.

The value function computes its loss and returns it to the algorithm, which uses it to update the value function's weights.
An algorithm now has two optimizers: one for the policy and one for the value function.
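
For context, here is a minimal sketch of that two-optimizer flow in plain PyTorch, using placeholder networks and a simple MSE value loss rather than the actual GaussianMLPValueFunction interface added in this PR (names and shapes are illustrative assumptions):

```python
import torch
from torch import nn

# Placeholder networks -- illustrative only, not the garage classes in this PR.
obs_dim, act_dim = 4, 2
value_function = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
policy_mean = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))

# The algorithm holds one optimizer for the policy and one for the value function.
policy_optimizer = torch.optim.Adam(policy_mean.parameters(), lr=3e-4)
vf_optimizer = torch.optim.Adam(value_function.parameters(), lr=3e-4)

# Fake batch of collected samples.
obs = torch.randn(64, obs_dim)
actions = torch.randn(64, act_dim)
advantages = torch.randn(64)
returns = torch.randn(64)

# Policy update: maximize the surrogate objective with the policy optimizer.
dist = torch.distributions.Normal(policy_mean(obs), 1.0)
policy_loss = -(dist.log_prob(actions).sum(-1) * advantages).mean()
policy_optimizer.zero_grad()
policy_loss.backward()
policy_optimizer.step()

# Value-function update: the value function computes its loss (here a plain MSE
# against the empirical returns) and its own optimizer updates its weights.
vf_loss = ((value_function(obs).squeeze(-1) - returns) ** 2).mean()
vf_optimizer.zero_grad()
vf_loss.backward()
vf_optimizer.step()
```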

@yonghyuc yonghyuc requested a review from a team as a code owner April 18, 2020 05:16
@codecov

codecov bot commented Apr 18, 2020

Codecov Report

Merging #1317 into master will increase coverage by 0.00%.
The diff coverage is 98.44%.


@@           Coverage Diff           @@
##           master    #1317   +/-   ##
=======================================
  Coverage   91.50%   91.50%           
=======================================
  Files         218      220    +2     
  Lines       10970    10975    +5     
  Branches     1322     1324    +2     
=======================================
+ Hits        10038    10043    +5     
- Misses        675      676    +1     
+ Partials      257      256    -1     
Impacted Files Coverage Δ
src/garage/torch/algos/trpo.py 91.66% <81.81%> (-8.34%) ⬇️
src/garage/torch/algos/maml.py 97.04% <100.00%> (+0.22%) ⬆️
src/garage/torch/algos/maml_ppo.py 100.00% <100.00%> (ø)
src/garage/torch/algos/maml_trpo.py 100.00% <100.00%> (ø)
src/garage/torch/algos/maml_vpg.py 100.00% <100.00%> (ø)
src/garage/torch/algos/ppo.py 100.00% <100.00%> (ø)
src/garage/torch/algos/vpg.py 100.00% <100.00%> (ø)
src/garage/torch/optimizers/__init__.py 100.00% <100.00%> (ø)
src/garage/torch/optimizers/optimizer_wrapper.py 100.00% <100.00%> (ø)
src/garage/torch/value_functions/__init__.py 100.00% <100.00%> (ø)
... and 12 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 652db03...2cfc032.

Contributor

@krzentner krzentner left a comment

This is definitely a solid change in the right direction. Having said that, I think the ValueFunction interface should be much smaller. Remember: it's easier to add new methods than to delete old ones.
Aside from that and some minor details, I'm quite happy with this change.

@@ -63,7 +61,8 @@ def __init__(self,
         self._meta_evaluator = meta_evaluator
         self._policy = policy
         self._env = env
-        self._value_function = value_function
+        self._value_function = copy.deepcopy(inner_algo._value_function)
+        self._lame_vf_state = self._value_function.state_dict()
Contributor

Can you call it old instead of lame?

Contributor Author

@yonghyuc yonghyuc Apr 20, 2020

@naeioi Can you check this?

Contributor

Conceptually it's the same, it's just that "lame" sounds a little unprofessional. Either way is okay, I guess.

Member

it's just confusing. use old or previous

Member

Sorry I am late to the party. I made this name. This is not an old or previous value function, but a value function without any training. Now I think initial_vf_state is the most appropriate name.

Contributor

I see. If it gets used in multiple places, I suppose I'm fine with it being fixed in a later PR. _initial_vf_state is definitely a better name.

Member

Can you change this to _initial_vf_state ?

src/garage/torch/algos/vpg.py
advs_flat)
logger.log('Policy loss: {}'.format(policy_loss))

batch_dataset = BatchDataset((obs_flat, returns_flat),
Member

why was this necessary for minibatching the value function, but not the policy?

Contributor Author

@yonghyuc yonghyuc Apr 20, 2020

I thought that for TRPO, the cg_iters field of ConjugateGradientOptimizer works very similarly to max_optimization_epochs.
Is it okay to use minibatching with ConjugateGradientOptimizer?
If so, should it use the same minibatch dataset but with different optimization iteration counts (cg_iters and max_optimization_epochs)?
Or should there be a single field covering both cg_iters and max_optimization_epochs?

Member

you can't use minibatching with ConjugateGradientOptimizer, or rather it would be kind of useless. Minibatching only works with SGD-based optimization algorithms, but a CG optimizer doesn't use SGD to choose parameters.

Contributor Author

Yes, so I used minibatching only for the value function because its default optimizer is Adam. When I ran the benchmark, I got better results with minibatching and optimization_epochs.
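
For reference, a minimal sketch of that kind of minibatched Adam update for the value function; max_optimization_epochs and minibatch_size mirror the names in the discussion, but the network, data, and loss below are placeholders rather than the PR's implementation:

```python
import torch
from torch import nn

# Placeholder value function and training data -- illustrative only.
value_function = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(value_function.parameters(), lr=2.5e-4)

obs = torch.randn(1000, 4)
returns = torch.randn(1000)

max_optimization_epochs = 10
minibatch_size = 64

for _ in range(max_optimization_epochs):
    # Shuffle once per epoch, then take an Adam step per minibatch.
    perm = torch.randperm(obs.shape[0])
    for start in range(0, obs.shape[0], minibatch_size):
        idx = perm[start:start + minibatch_size]
        loss = ((value_function(obs[idx]).squeeze(-1) - returns[idx]) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```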

Member

what if i use the non-default ConjugateGradientOptimizer? will it still minibatch? wouldn't that be incorrect?

Contributor

In the tensorflow branch, we have our own optimizer class partly for this reason -- so that users can pass either a minibatching optimizer (which does batching inside of our own optimizer class), or the ConjugateGradientOptimizer. The real solution for us is probably to do the same for the pytorch branch, but that seems like significant additional complexity for this change. I think the simplest change which wouldn't be wrong would be to not use minibatching with Adam for the time being (even though that makes performance worse), and open an issue to create a minibatching Adam optimizer for pytorch in another PR.

Member

i saw in the CHANGELOG that Torch 1.5 started making it easier to add custom optimizers, but didn't look in detail.

Member

@ryanjulian ryanjulian left a comment

Great PR. See my comments and KR's (we are in agreement).

Overall this is like 80% there, but the intention of this PR is actually to totally remove the ValueFunction as a "special" primitive and make it act just like any other network in garage, so "predict" should not be necessary (for instance).

this will allow us to put the ValueFunction in the computation graph of the algorithm, which allows us to make a lot of things cleaner, and also enables us to do some interesting things.

@ryanjulian
Member

What do the MuJoCo3M benchmarks look like?

@yonghyuc yonghyuc force-pushed the add_torch_gmvf branch 2 times, most recently from a0d74c2 to 7fbc0a4 Compare April 20, 2020 07:57
@yonghyuc
Contributor Author

This is the PPO benchmark result for Mujoco1M (5.5e+5).

[Benchmark plots: HalfCheetah-v2, Hopper-v2, InvertedDoublePendulum-v2, InvertedPendulum-v2, Reacher-v2, Swimmer-v2, Walker2d-v2]

@yonghyuc
Contributor Author

This is the TRPO benchmark result for Mujoco1M (5.5e+5).

[Benchmark plots: HalfCheetah-v2, Hopper-v2, InvertedDoublePendulum-v2, InvertedPendulum-v2, Reacher-v2, Swimmer-v2, Walker2d-v2]

Contributor

@krzentner krzentner left a comment

Overall, looks good to me. I still think we should avoid using the term "lame," since it's unprofessional.

@@ -432,7 +483,7 @@ def process_samples(self, itr, paths):
         for path in paths:
             if 'returns' not in path:
                 path['returns'] = tu.discount_cumsum(path['rewards'],
-                                                     self.discount)
+                                                     self.discount).copy()
Member

why copy this?

Contributor Author

@yonghyuc yonghyuc Apr 24, 2020

If I don't, I get this error:
E ValueError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.
This is because we reverse the order of the output with [::-1] in the discount_cumsum function, so I need to copy the array.

You can also check here

Member

How did it work before? What changed?

Member

Can you use some other numpy function to get a reversed view, other than negative indexing?

When you call copy(), you break the gradient path. If your returns have differentiable components, e.g. a differentiable reward augmentation, your augmented rewards will no longer be differentiable.

Contributor Author

This error is from here.
Previously, there was only LinearFeatureBaseline, which doesn't need to convert a numpy.array to a torch.Tensor. But now we need to convert the numpy.array to a torch.Tensor for GaussianMLPValueFunction.

Contributor Author

@yonghyuc yonghyuc Apr 24, 2020

According to this post, the problem is in PyTorch: PyTorch doesn't support a numpy.array with negative strides.
This is one solution that doesn't use .copy():

torch.Tensor(tu.discount_cumsum(path['rewards'], self.discount)[::-1]).flip(-1)

As for the gradient path, the return values from tu.discount_cumsum are a numpy.array, not a torch.Tensor, so I think there is no gradient path through these values yet.

Also, I'm curious: we use rewards from the environment to compute the return values; is there a module or function that computes gradients for the rewards?
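
A small sketch reproducing the issue and the workarounds discussed above (the exact error wording varies across PyTorch versions; the arrays here are placeholders):

```python
import numpy as np
import torch

rewards = np.arange(5, dtype=np.float32)
reversed_view = rewards[::-1]  # negative-stride view, no data is copied

# Converting a negatively-strided array raises a ValueError about negative strides.
try:
    torch.from_numpy(reversed_view)
except ValueError as err:
    print(err)

# Workaround 1: copy into a contiguous array (what this PR does with .copy()).
t1 = torch.from_numpy(reversed_view.copy())

# Workaround 2: make the array contiguous explicitly.
t2 = torch.from_numpy(np.ascontiguousarray(reversed_view))

# Workaround 3: the flip(-1) trick mentioned above -- hand PyTorch the
# positively-strided view and reverse it on the torch side instead.
t3 = torch.from_numpy(reversed_view[::-1]).flip(-1)

assert torch.equal(t1, t2) and torch.equal(t1, t3)
```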

@yonghyuc
Contributor Author

yonghyuc commented Apr 27, 2020

I added a new OptimizerWrapper class.
The OptimizerWrapper takes an optimizer type (torch.optim.Optimizer) and other parameters for minibatching. It behaves like a torch.optim.Optimizer (zero_grad, step), but it also has a get_minibatch function that provides batched data to the algorithm (sketched below).

The overall process is:

  • the algo gets a minibatch dataset from the OptimizerWrapper
  • OptimizerWrapper.zero_grad() -> inner torch.optim.Optimizer.zero_grad()
  • the algo computes the loss and backpropagates it
  • OptimizerWrapper.step() -> inner torch.optim.Optimizer.step()

This PR contains many changes, so I will list only the ones related to OptimizerWrapper:

  1. OptimizerWrapper
  2. Usage in BenchmarkPPO
  3. Changes in VPG
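
A rough sketch of the behavior described above: an inner torch.optim optimizer plus minibatch iteration on top. The constructor signature and the compute_loss call in the usage comment are illustrative assumptions, not necessarily the exact code in this PR:

```python
import torch


class OptimizerWrapper:
    """Wrap a torch.optim optimizer and add minibatch iteration (sketch)."""

    def __init__(self, optimizer_cls, module, max_optimization_epochs=1,
                 minibatch_size=None, **optimizer_args):
        self._optimizer = optimizer_cls(module.parameters(), **optimizer_args)
        self._max_optimization_epochs = max_optimization_epochs
        self._minibatch_size = minibatch_size

    def get_minibatch(self, *inputs):
        """Yield shuffled minibatches of `inputs` for every optimization epoch."""
        n = inputs[0].shape[0]
        batch_size = self._minibatch_size or n
        for _ in range(self._max_optimization_epochs):
            perm = torch.randperm(n)
            for start in range(0, n, batch_size):
                idx = perm[start:start + batch_size]
                yield tuple(x[idx] for x in inputs)

    def zero_grad(self):
        self._optimizer.zero_grad()

    def step(self):
        self._optimizer.step()


# Hypothetical usage inside an algorithm, following the process listed above:
#
#   vf_optimizer = OptimizerWrapper(torch.optim.Adam, value_function,
#                                   max_optimization_epochs=10,
#                                   minibatch_size=64, lr=2.5e-4)
#   for obs_mb, returns_mb in vf_optimizer.get_minibatch(obs, returns):
#       vf_optimizer.zero_grad()
#       loss = value_function.compute_loss(obs_mb, returns_mb)
#       loss.backward()
#       vf_optimizer.step()
```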

@yonghyuc yonghyuc force-pushed the add_torch_gmvf branch 2 times, most recently from c1a5687 to 4326ba8 Compare April 28, 2020 22:34
@yonghyuc
Contributor Author

@naeioi Could you check the changes and how they affect the MAML algorithm?
I changed MAML to follow these changes, but I am worried that I missed something.
You can check the main changes here.
Thanks!

@krzentner Could you check this PR for me? Thanks!

@ryanjulian ryanjulian requested review from a team and zequnyu and removed request for a team April 29, 2020 22:10
@@ -264,49 +247,3 @@ def run_garage_tf(env, seed, log_dir):
     dowel_logger.remove_all()

     return tabular_log_file
-
-
-def run_baselines(env, seed, log_dir):
Member

@zequnyu zequnyu Apr 29, 2020

Do we want to delete this? This should be the only place we could locate the baselines benchmarking code.

Contributor Author

It looks like the openai.baseline version does not support TensorFlow 2.0.

@@ -218,59 +215,3 @@ def run_garage(env, seed, log_dir):
     dowel_logger.remove_all()

     return tabular_log_file
-
-
-def run_baselines(env, seed, log_dir):
Member

Same here

Contributor Author

It looks like the openai.baseline version does not support TensorFlow 2.0.

Member

@naeioi naeioi left a comment

LGTM. This is an important change! Please fix my minor comment.

@@ -63,7 +61,8 @@ def __init__(self,
         self._meta_evaluator = meta_evaluator
         self._policy = policy
         self._env = env
-        self._value_function = value_function
+        self._value_function = copy.deepcopy(inner_algo._value_function)
+        self._lame_vf_state = self._value_function.state_dict()
Member

Can you change this to _initial_vf_state ?

@ryanjulian
Member

@Mergifyio rebase

@mergify
Contributor

mergify bot commented Apr 30, 2020

Command rebase: success

Branch has been successfully rebased

@ryanjulian
Member

@Mergifyio rebase

@mergify
Contributor

mergify bot commented May 1, 2020

Command rebase: success

Branch has been successfully rebased

@mergify mergify bot merged commit e09e6dc into master May 1, 2020
@mergify mergify bot deleted the add_torch_gmvf branch May 1, 2020 19:40