Commit 0849e3c
fix(env): remove reentrant state reward scaling
Previously, entering a state more than once produced a negative reward that was scaled up by the number of re-entries (n) until it hit a maximum.
This was meant to penalize the agent for nonsensical behavior, such as commuting the same nodes back and forth repeatedly and then solving with 1 move left.
The reentrant state penalty works as expected, but with large time windows the scaling produces large negative rewards that skew the mean episode reward.
Removing the scaling allows large time windows (e.g. 128 steps) without episode rewards becoming very large.
1 parent 6b7a847 commit 0849e3c
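The following minimal Python sketch contrasts the old and new penalty rules. It assumes the old multiplier was the number of prior visits to a state, capped at a maximum. The names `scaled_reentry_penalty`, `flat_reentry_penalty`, and `episode_penalty` and the constants `BASE_PENALTY` and `MAX_MULTIPLIER` are hypothetical. The commit does not show the environment's actual code or values.

```python
# Hypothetical constants: the commit does not state the real penalty size or cap.
BASE_PENALTY = -0.1   # assumed penalty for a single re-entry
MAX_MULTIPLIER = 10   # assumed cap on the scaling factor


def scaled_reentry_penalty(visit_count: int) -> float:
    """Old behavior (sketch): penalty grows with prior visits, up to a cap."""
    if visit_count <= 1:
        return 0.0  # the first visit to a state is not penalized
    return BASE_PENALTY * min(visit_count - 1, MAX_MULTIPLIER)


def flat_reentry_penalty(visit_count: int) -> float:
    """New behavior (sketch): every re-entry costs the same fixed penalty."""
    return BASE_PENALTY if visit_count > 1 else 0.0


def episode_penalty(states, penalty_fn) -> float:
    """Sum the re-entry penalty over an episode, given the visited states in order."""
    counts = {}
    total = 0.0
    for state in states:
        counts[state] = counts.get(state, 0) + 1
        total += penalty_fn(counts[state])
    return total


if __name__ == "__main__":
    # An agent that oscillates between two states for a 128-step window.
    states = ["A", "B"] * 64
    print(round(episode_penalty(states, scaled_reentry_penalty), 2))  # → -117.0
    print(round(episode_penalty(states, flat_reentry_penalty), 2))    # → -12.6
```

Under these assumed constants, the scaled rule accumulates roughly nine times the flat penalty over a 128-step oscillation. This is the kind of growth that lets the penalty dominate episode rewards at long time windows.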
1 file changed
Lines changed: 1 addition & 6 deletions
| Original line | New line | Change |
|---|---|---|
| 205 | 205 | |
| 206 | 206 | |
| 207 | 207 | |
| 208 | | - |
| 209 | | - |
| 210 | | - |
| 211 | 208 | |
| 212 | | - |
| 213 | | - |
| 214 | | - |
| | 209 | + |
| 215 | 210 | |
| 216 | 211 | |
| 217 | 212 | |