
Commit 0849e3c

fix(env): remove reentrant state reward scaling
Previously, entering a state more than once produced a negative reward that was multiplied by the number of visits to that state, capped at 3 (`min(list_count, 3)`). This was meant to penalize the agent for nonsensical behavior, such as commuting the same nodes back and forth many times and then solving the problem with 1 move left. The reentrant state penalty works as expected, but with large time windows the scaling produces large negative rewards that skew the mean rewards. Removing the scaling allows large time windows (e.g. 128 steps) without episode rewards becoming very large in magnitude.
1 parent 6b7a847 commit 0849e3c

1 file changed

Lines changed: 1 addition & 6 deletions


libraries/mathy_python/mathy/env.py

@@ -205,13 +205,8 @@ def get_state_transition(
             if list_count <= 1 or key != expression.raw:
                 continue
 
-            # NOTE: the reward is scaled by how many times this state has been visited
-            # up to (n) times
-            multiplier = min(list_count, 3)
             return time_step.transition(
-                features,
-                reward=EnvRewards.PREVIOUS_LOCATION * multiplier,
-                discount=self.discount,
+                features, reward=EnvRewards.PREVIOUS_LOCATION, discount=self.discount,
             )
 
         if len(agent.history) > 0:
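To illustrate why the scaling skewed mean rewards over long time windows, here is a comparison of the cumulative revisit penalty with and without the removed `min(list_count, 3)` multiplier. This is a simplified, hypothetical model rather than mathy code. `reentrant_penalty` and `episode_penalty` are illustrative names. States are plain strings instead of mathy expressions. Visits are counted with `list.count` instead of the grouping used in `get_state_transition`. `PREVIOUS_LOCATION = -1.0` is a placeholder, not the real value of `EnvRewards.PREVIOUS_LOCATION`.

```python
from typing import List

# Placeholder penalty (hypothetical); mathy's EnvRewards.PREVIOUS_LOCATION may differ.
PREVIOUS_LOCATION = -1.0
# Cap applied by the removed scaling code: min(list_count, 3)
MAX_MULTIPLIER = 3


def reentrant_penalty(history: List[str], current: str, scaled: bool) -> float:
    """Return the revisit penalty for `current`, or 0.0 on a first visit.

    `history` includes `current` itself, so a count of 1 means "first visit",
    mirroring the `list_count <= 1` check in the diff.
    """
    list_count = history.count(current)
    if list_count <= 1:
        return 0.0
    # Old behavior: multiply by the visit count (capped); new behavior: flat penalty.
    multiplier = min(list_count, MAX_MULTIPLIER) if scaled else 1
    return PREVIOUS_LOCATION * multiplier


def episode_penalty(states: List[str], scaled: bool) -> float:
    """Sum the revisit penalties accrued while visiting `states` in order."""
    history: List[str] = []
    total = 0.0
    for state in states:
        history.append(state)
        total += reentrant_penalty(history, state, scaled)
    return total


if __name__ == "__main__":
    # An agent commuting the same two nodes back and forth for a 128-step window.
    loop = ["a+b", "b+a"] * 64
    print(episode_penalty(loop, scaled=True))   # -376.0
    print(episode_penalty(loop, scaled=False))  # -126.0
```

Each of the two states is revisited 63 times. With scaling, every revisit after the second hits the cap of 3, so the total penalty is roughly three times the flat version. The flat penalty still punishes every revisit, but it grows only linearly with the number of revisits. That keeps long windows from dominating the mean episode reward.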
