fix(env): remove reentrant state reward scaling
Previously, re-entering an already visited state produced a negative reward that scaled with the number of times (n) the state had been entered, up to a maximum. This was meant to penalize the agent for nonsensical behavior, such as repeatedly commuting the same nodes back and forth and then solving with one move left. The reentrant state penalty itself works as expected, but with large time windows the scaling produces large negative rewards that skew the mean episode reward. Removing the scaling allows large time windows (e.g. 128 steps) without episode rewards becoming very large in magnitude.
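A minimal sketch of the difference between the old scaled penalty and the new flat penalty follows. It is not the environment's actual code: the function names, the penalty value `PENALTY`, the cap `MAX_SCALE`, and the choice to scale by the number of re-entries (visits minus one) are all hypothetical assumptions made for illustration.

```python
from collections import Counter

# Hypothetical constants: the real environment's values are not given in the commit.
PENALTY = -0.1   # assumed base penalty for re-entering a state
MAX_SCALE = 10   # assumed cap on the scaling factor (old behavior only)


def scaled_penalty(visits: int) -> float:
    """Old behavior (assumed form): penalty grows with re-entries, up to a cap."""
    if visits <= 1:
        return 0.0  # first visit is never penalized
    return PENALTY * min(visits - 1, MAX_SCALE)


def flat_penalty(visits: int) -> float:
    """New behavior (assumed form): the same fixed penalty for every re-entry."""
    return PENALTY if visits > 1 else 0.0


def episode_penalty(states, penalty_fn) -> float:
    """Sum the reentrant-state penalties over one episode's sequence of states."""
    counts = Counter()
    total = 0.0
    for state in states:
        counts[state] += 1
        total += penalty_fn(counts[state])
    return total


if __name__ == "__main__":
    # An agent oscillating between two states for a 128-step window.
    oscillating = ["A", "B"] * 64
    print(round(episode_penalty(oscillating, scaled_penalty), 2))
    print(round(episode_penalty(oscillating, flat_penalty), 2))
```

With these assumed constants, the oscillating 128-step episode accumulates about -117 under the scaled scheme versus about -12.6 under the flat scheme. Once the cap is reached, the scaled penalty adds `MAX_SCALE` times the base penalty on every step, so longer windows inflate episode rewards far faster than the flat penalty does.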