Version 0.1.5 #17

md-enlite · 2021-05-20T08:21:45Z

Features:

Adds documentation for run_context
Renames the simulated environment interface step_without_observation to fast_step
Adds seeding to environments, models and trainers
Initial commit of the Maze Python API
Adds an ExportGifWrapper
Adds network architecture visualizations to Tensorboard Images
Adds incremental min/max stats
Adds categorical (support-based) value networks
Adds value transformations

EnliteAI Bot added 26 commits May 7, 2021 01:25

RL-604: initial complete version of clone from stack

0dfa679

(Issue RL-604 - Consider entire wrapper stack in clone_from of SimulatedEnvs)

RL-604: cleaned up wrappers and unit tests

07cd97a

(Issue RL-604 - Consider entire wrapper stack in clone_from of SimulatedEnvs)

RL-604: Added sphinx fix for cyclic imports and type hints

c1365b2

(Issue RL-604 - Consider entire wrapper stack in clone_from of SimulatedEnvs)

RL-604: Added sphinx fix for cyclic imports and type hints

52b6200

(Issue RL-604 - Consider entire wrapper stack in clone_from of SimulatedEnvs)

FIX: adds MultiDiscrete distribution to docs

64fbfed

FIX: updated docs

89c4b60

FIX: updated docs

186c903

RL-578: Extended docs by RunContext-based examples. Fixed issues with passing of Python objects to trainers/runners. Rewrote plain Python training example.

e3bbef3

(Issue RL-578 - Add pure-Python API layer for more accessible training)

FIX: Add seeding to serialized torch policy

a28b536

RL-621: fixes memory leak when cloning the env_context

3eac48a

(Issue RL-621 - Stats logging of mcts rollouts in AlphaZero)

RL-631: Allow random policy to sample actions without explicit actor ID in single sub-step envs

1648d75

(Issue RL-631 - Fix: Allow random policy to sample actions without explicit actor ID in single sub-step envs)

RL-630: Rollout evaluator: Clear stats from unfinished episodes from previous evaluation run

001e74a

(Issue RL-630 - Fix: Rollout evaluation statistics from previous unfinished episodes carry over)

RL-630: Set more consistent defaults for eval_repeats corresponding to default eval concurrency in local setup

3838b52

(Issue RL-630 - Fix: Rollout evaluation statistics from previous unfinished episodes carry over)

FIX: base output directory name on microseconds (%f) to avoid collisions in the output directories

c0e7eec
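The idea behind this fix can be sketched in a few lines. This is only an illustration, not the code from the commit: the function name `output_dir_name` and the exact directory pattern are hypothetical, and only the use of the `%f` (microseconds) format code is taken from the commit message.

```python
from datetime import datetime


def output_dir_name(now: datetime) -> str:
    """Build a timestamped directory name (hypothetical pattern).

    Including %f (zero-padded microseconds) keeps two runs that start
    within the same second from mapping to the same directory.
    """
    return now.strftime("%Y-%m-%d/%H-%M-%S-%f")


# Two runs started within the same second, one microsecond apart.
first = datetime(2021, 5, 20, 8, 21, 45, 1)
second = datetime(2021, 5, 20, 8, 21, 45, 2)

# With second resolution only, both names collide.
assert first.strftime("%H-%M-%S") == second.strftime("%H-%M-%S")
# With microseconds they differ.
assert output_dir_name(first) != output_dir_name(second)
```

Microsecond resolution makes collisions unlikely rather than impossible; jobs launched in parallel could still need an additional unique suffix.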

RL-580: Added extensions for discounted tasks

fdc80cb

value normalization with min/max stats; discounted value bootstrapping (Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)
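For readers unfamiliar with min/max value normalization in tree search, the following is a minimal sketch of how incremental min/max statistics are commonly kept. It is an assumption about the general technique, not the implementation added in this commit; the class name `MinMaxStats` and its methods are hypothetical.

```python
class MinMaxStats:
    """Track the smallest and largest value seen so far (hypothetical helper)."""

    def __init__(self) -> None:
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, value: float) -> None:
        """Fold one new value estimate into the running bounds."""
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value: float) -> float:
        """Rescale to [0, 1] once the bounds span a non-empty range."""
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        # Fewer than two distinct values seen: return the value unchanged.
        return value


stats = MinMaxStats()
for q_value in (2.0, 10.0):
    stats.update(q_value)
print(stats.normalize(6.0))
```

Normalizing with observed bounds lets a search algorithm compare value estimates on a common scale when returns are unbounded, as in discounted, infinite horizon tasks.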

RL-580: added value function scaling (transformed Bellman operator)

8ed9385

(Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)
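The commit message does not state which value transformation is used. As an illustration only, the sketch below assumes the invertible scaling h(x) = sign(x)(sqrt(|x| + 1) - 1) + eps * x that is commonly paired with a transformed Bellman operator. The function names and the value of `EPS` are hypothetical.

```python
import math

EPS = 0.001  # hypothetical choice of the linear term's coefficient


def transform(x: float, eps: float = EPS) -> float:
    """h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x; compresses large targets."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def inverse_transform(y: float, eps: float = EPS) -> float:
    """Closed-form inverse of transform, obtained by solving for sqrt(|x| + 1)."""
    root = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(root * root - 1.0, y)


# Round trip: targets are learned in transformed space and mapped back.
for target in (-250.0, -1.0, 0.0, 3.5, 1000.0):
    assert math.isclose(inverse_transform(transform(target)), target, abs_tol=1e-6)
```

Under this assumption, a bootstrapped target would be computed as `transform(reward + gamma * inverse_transform(predicted_value))`, so the network only ever predicts values in the compressed space.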

RL-580: adds categorical value function prediction

7df67d3

(Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)
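A categorical (support-based) value head predicts a distribution over a fixed set of support values instead of a single scalar. The sketch below shows one common way to build training targets for such a head, a "two-hot" projection onto the two neighbouring support atoms. It is an assumption about the general technique, not the code added in this commit, and the function names are hypothetical.

```python
from typing import List


def two_hot(value: float, support: List[float]) -> List[float]:
    """Spread a scalar over its two neighbouring atoms of a sorted support.

    Assumes `support` is strictly increasing and has at least two atoms.
    Values outside the support range are clipped to its ends.
    """
    value = min(max(value, support[0]), support[-1])
    probs = [0.0] * len(support)
    for i in range(len(support) - 1):
        low, high = support[i], support[i + 1]
        if low <= value <= high:
            weight = (value - low) / (high - low)
            probs[i] = 1.0 - weight
            probs[i + 1] = weight
            break
    return probs


def expected_value(probs: List[float], support: List[float]) -> float:
    """Recover the scalar value as the expectation over the support."""
    return sum(p * s for p, s in zip(probs, support))


target = two_hot(0.25, [-1.0, 0.0, 1.0])
print(target, expected_value(target, [-1.0, 0.0, 1.0]))
```

The projection preserves the scalar in expectation, so the categorical prediction can be converted back to a value estimate wherever the search needs one.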

RL-580: adds categorical critic training

b3b5f7b

adds critic evaluation; adds list of wrappers to exclude; fixes Q value min/max normalization (Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)

RL-580: makes num_candidates optional in Policy interface

2c5d716

(Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)

RL-580: adds test for gym env clone from

6657e15

(Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)

RL-580: refactored value transformation into dedicated class

47882fc

(Issue RL-580 - AlphaZero for discounted, infinite horizon Tasks)

RL-635: replaces step_without_observation with fast_step interface

a68ada9

(Issue RL-635 - step_without_observation -> fast_step interface)

RL-637: Fix and improve trajectory record convenience accessors to actions etc.

c6d90cf

(Issue RL-637 - Fix and improve trajectory record convenience accessors to actions etc.)

RL-637: Use dummy test env instead of cartpole in rollout evaluator test

4083472

(Issue RL-637 - Fix and improve trajectory record convenience accessors to actions etc.)

RL-637: Increase timeouts for run context tests

8be929d

(Issue RL-637 - Fix and improve trajectory record convenience accessors to actions etc.)

FIX: fix failing tests in wheel (add maze/test/*.yml to MANIFEST)

8f1c1a7

md-enlite merged commit 953a2e5 into main May 20, 2021
