Implement observation and reward scaling wrappers #28

arnupretorius · 2021-04-12T14:39:14Z

Best practice advice:

Make sure everything is reasonably scaled.

Rule of thumb:

Observations: Normalize everything to mean 0, standard deviation 1.
Reward: If you control it, scale it to a reasonable value.
Compute the statistics across ALL your data so far.
Look at all observations and rewards and make sure there aren't any extreme outliers.
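
A minimal sketch of the rule of thumb above, in plain Python with no dependencies. It is not Mava's or PettingZoo's API: the names `RunningStats`, `ObservationNormalizer`, and `scale_reward` are hypothetical, and the clip value of 10 is an assumed default. Statistics are accumulated over all data seen so far (Welford's algorithm), observations are shifted and scaled per dimension, and rewards are only divided by their running standard deviation and then clipped to tame outliers.

```python
import math


class RunningStats:
    """Running mean and variance of a scalar stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation over everything seen so far.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / self.count)

    def normalize(self, x, eps=1e-8):
        # eps avoids division by zero while the std is still 0.
        return (x - self.mean) / (self.std + eps)


class ObservationNormalizer:
    """Hypothetical helper: one RunningStats per observation dimension."""

    def __init__(self, size):
        self.stats = [RunningStats() for _ in range(size)]

    def __call__(self, obs):
        # Update the statistics first, then normalize with the new values.
        for stat, value in zip(self.stats, obs):
            stat.update(value)
        return [stat.normalize(v) for stat, v in zip(self.stats, obs)]


def scale_reward(reward, stats, clip=10.0):
    """Divide by the running std (no mean shift), then clip outliers."""
    stats.update(reward)
    scale = stats.std if stats.std > 0 else 1.0  # no scaling until std is known
    return max(-clip, min(clip, reward / scale))


if __name__ == "__main__":
    normalizer = ObservationNormalizer(size=2)
    reward_stats = RunningStats()
    for obs, reward in [([1.0, 10.0], 100.0), ([3.0, 10.0], -50.0)]:
        print(normalizer(obs), scale_reward(reward, reward_stats))
```

In an environment wrapper, the same objects would be kept per agent and applied to each observation and reward returned by a step. Dividing rewards by the standard deviation without subtracting the mean keeps the sign of each reward intact, which is one common design choice rather than the only one.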

arnupretorius · 2021-04-14T06:17:37Z

For inspiration:

KaleabTessera · 2021-04-28T10:35:11Z

Currently investigating whether we can just use the PZ observation and reward scaling wrappers.


arnupretorius self-assigned this Apr 12, 2021

arnupretorius assigned KaleabTessera and unassigned arnupretorius Apr 19, 2021

KaleabTessera mentioned this issue Apr 30, 2021

Feature/env preprocess wrappers #94

Merged

arnupretorius closed this as completed in #94 May 7, 2021

KaleabTessera pushed a commit that referenced this issue May 17, 2021

Merge pull request #28 from mava-team/bugfix/update-docs-with-correct…

a321af9

…-dependencies docs: Fix dependencies.
