[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes #16531

sven1977 · 2021-06-18T09:32:39Z

This PR fixes the following problems:

- CQL's BC loss term has been simplified (it now uses the existing SquashedGaussian distribution) and changed to match BCTrainer's loss (rllib/agents/marwil/marwil.py).
- A learning test for CQL on Pendulum-v0 has been added for both tf and torch. It covers both the BC and the CQL phase and reads from an expert (SAC-generated) output file.
- Action normalization has been fixed (and enabled by default) in all trainers, so all PG (and PG+Q) algos can learn inside a normalized action space (roughly -1.0 to 1.0). Actions are unsquashed (and clipped for safety) only right before they are sent back to the env. This allows, for example, PPO to learn in a distorted Pendulum env with an action space of Box(low=300.0, high=500.0). A test case for PPO learning in such an env has been added for both tf and torch.
- A new config key, actions_in_input_normalized, has been added for the case where the actions in an offline file have not been normalized yet (i.e. they still have their original env/behavior-policy values). If it is False (AND normalize_actions=True), the offline reader normalizes the stored actions according to the given action space, so the algo can learn inside the normalized space.
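A minimal sketch of a BC loss term built on a tanh-squashed Gaussian, for readers unfamiliar with the idea. It assumes the policy outputs a per-dimension mean and log-std, that actions live in the normalized range (-1, 1), and that the BC term is the negative mean log-likelihood of the dataset actions (as in MARWIL with beta=0). The function names are hypothetical; this is the textbook tanh-Gaussian log-likelihood, not RLlib's SquashedGaussian code.

```python
import numpy as np


def squashed_gaussian_logp(actions, mean, log_std, eps=1e-6):
    """Log-likelihood of `actions` in (-1, 1) under tanh(N(mean, exp(log_std)))."""
    # Keep actions strictly inside (-1, 1) so arctanh stays finite.
    actions = np.clip(actions, -1.0 + eps, 1.0 - eps)
    # Undo the tanh squashing to recover the pre-squash Gaussian sample.
    u = np.arctanh(actions)
    std = np.exp(log_std)
    # Diagonal Gaussian log-density of u, per action dimension.
    gauss_logp = -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * np.log(2.0 * np.pi)
    # Change-of-variables correction for a = tanh(u): log|da/du| = log(1 - a^2).
    log_det = np.log(1.0 - actions ** 2 + eps)
    # Sum over action dims to get one log-prob per sample.
    return np.sum(gauss_logp - log_det, axis=-1)


def bc_loss(actions, mean, log_std):
    """Behavior-cloning loss: negative mean log-likelihood of the dataset actions."""
    return -np.mean(squashed_gaussian_logp(actions, mean, log_std))
```

Minimizing this loss pushes the policy's squashed distribution toward the expert's actions. Reusing one distribution class for both sampling and this log-likelihood avoids keeping a second, hand-written BC formula in sync with it.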
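A minimal sketch of the unsquash-and-clip step applied before actions go back to the env. It assumes the policy's normalized action (roughly [-1, 1]) is mapped linearly onto a Box space's [low, high] and then clipped. The helper name `unsquash_and_clip` is hypothetical, and the Box(low=300.0, high=500.0) bounds are taken from the distorted-Pendulum example in the description.

```python
import numpy as np


def unsquash_and_clip(action, low, high):
    """Map a normalized action (~[-1, 1]) onto [low, high], then clip for safety."""
    action = np.asarray(action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Linear map: -1 -> low, +1 -> high.
    unsquashed = low + (action + 1.0) * (high - low) / 2.0
    # An unbounded policy distribution (e.g. a plain Gaussian) can sample
    # outside [-1, 1], so clip to the env's bounds before stepping the env.
    return np.clip(unsquashed, low, high)


if __name__ == "__main__":
    # Distorted Pendulum example: Box(low=300.0, high=500.0).
    print(unsquash_and_clip([-1.0, 0.0, 1.0, 1.5], 300.0, 500.0))
    # -> [300. 400. 500. 500.]
```

Because the policy only ever sees the [-1, 1] range, its initial outputs (centered near 0) land in the middle of the env's range instead of far outside a space like [300, 500].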
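A minimal sketch of how the two config flags could interact when reading offline actions. The keys `normalize_actions` and `actions_in_input_normalized` come from the description; the functions `normalize_action` and `prepare_offline_actions` are hypothetical stand-ins, not RLlib's offline reader code. The normalization is assumed to be the exact inverse of the linear unsquash map.

```python
import numpy as np


def normalize_action(action, low, high):
    """Map an env-space action in [low, high] into the normalized range [-1, 1]."""
    action = np.asarray(action, dtype=np.float64)
    # Inverse of the linear unsquash map: low -> -1, high -> +1.
    return (action - low) * 2.0 / (high - low) - 1.0


def prepare_offline_actions(actions, low, high, normalize_actions=True,
                            actions_in_input_normalized=False):
    """Hypothetical reader-side hook mirroring the two config flags."""
    if normalize_actions and not actions_in_input_normalized:
        # The file stores raw env/behavior-policy actions: bring them into [-1, 1].
        return normalize_action(actions, low, high)
    # Either the file is already normalized, or the algo learns in env space.
    return np.asarray(actions, dtype=np.float64)


if __name__ == "__main__":
    # Raw Pendulum-like actions from a Box(300.0, 500.0) behavior policy:
    # maps 300 -> -1, 400 -> 0, 500 -> 1.
    print(prepare_offline_actions([300.0, 400.0, 500.0], 300.0, 500.0))
```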

Why are these changes needed?

Related issue number

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(


sven1977 added 30 commits June 10, 2021 18:39

- 47c91c1 wip
- 649103c wip
- fbfbd5b Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- aa472ca fix and LINT.
- 281654d wip.
- 8c049fe wip.
- a052fcd Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- 9d675af wip.
- 8dfbec9 fix
- eaa6afb fix
- dc6a774 fix
- 6406687 Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- e0b6311 wip.
- 46e84fc wip.
- 2835b56 wip.
- 4351570 wip.
- 265454a wip.
- ab79eac wip.
- 3fb411d Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- 9443460 wip.
- 230adee Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- f2b4c20 wip.
- 6e4037c wip.
- 45fb626 wip.
- 2fd6ff7 wip.
- e2d0378 wip.
- 2ad07aa Merge branch 'master' of https://github.com/ray-project/ray into policy_support_add_and_delete
- 97ca8dc wip.
- b5e9542 wip.
- 967fd1e wip

sven1977 added 29 commits June 20, 2021 22:42

- ca44258 wip
- 2b859e0 wip
- 6baa539 wip
- f33b8b1 Merge branch 'master' of https://github.com/ray-project/ray into cql_fix_bc_loss_term
- 4598be1 Merge branch 'policy_support_add_and_delete' into cql_fix_bc_loss_term (Conflicts: rllib/evaluation/sampler.py)
- a7bf42e LINT
- 9cb1d60 wip.
- 503d538 LINT.
- 9024118 fixes.
- 13fa9aa fix and lint
- c28b096 fix and lint
- 6b62aab fix.
- 7f40479 fix and lint
- 9307aca wip.
- 42d8c5d Merge branch 'master' of https://github.com/ray-project/ray into cql_fix_bc_loss_term
- 3b08816 wip
- c079185 wip
- 15492ab Merge branch 'master' of https://github.com/ray-project/ray into cql_fix_bc_loss_term
- 4e5de74 Merge branch 'master' of https://github.com/ray-project/ray into cql_fix_bc_loss_term
- a8ab846 wip
- 28e949f wip
- e43ae8f wip
- 3a0f859 wip
- ca9f092 wip
- 6013492 wip
- 7507cc9 Merge branch 'master' of https://github.com/ray-project/ray into cql_fix_bc_loss_term
- 5fb5f80 wip
- 7b3e1ba wip
- b443510 fix

sven1977 merged commit 53206dd into ray-project:master Jun 30, 2021
