fix(env): remove reentrant state reward scaling
Previously, entering a state more than once gave a negative reward that scaled with the number of visits, up to a maximum of (n) times.

This was meant to penalize the agent for nonsensical behavior, such as commuting the same nodes back and forth many times and then solving with 1 move left.

The reentrant state penalty works as expected, but with large time windows the scaling leads to large negative rewards that skew the mean rewards.

Removing the scaling allows large time windows (e.g. 128 steps) without episode rewards becoming very large in magnitude.
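To make the effect on episode totals concrete, here is a minimal sketch that sums the reentrant-state penalty over a 128-step episode with and without scaling. The penalty value, the `episode_penalty` helper, and the assumption that the visit count includes the current visit are all hypothetical; only the cap of 3 comes from the removed `min(list_count, 3)` line.

```python
from collections import Counter

# Hypothetical value for illustration; the real constant is EnvRewards.PREVIOUS_LOCATION.
PREVIOUS_LOCATION = -0.1
# Cap taken from the removed code: multiplier = min(list_count, 3)
MAX_MULTIPLIER = 3


def episode_penalty(visited_states, scaled):
    """Sum the reentrant-state penalties collected over one episode."""
    counts = Counter()
    total = 0.0
    for state in visited_states:
        counts[state] += 1
        visits = counts[state]
        if visits <= 1:
            continue  # the first visit to a state is not penalized
        multiplier = min(visits, MAX_MULTIPLIER) if scaled else 1
        total += PREVIOUS_LOCATION * multiplier
    return total


# An agent that commutes the same two nodes back and forth for 128 steps.
states = ["a + b", "b + a"] * 64
old_total = episode_penalty(states, scaled=True)
new_total = episode_penalty(states, scaled=False)
print(round(old_total, 2), round(new_total, 2))
```

Under these assumed values the scaled total is about -37.6 and the flat total about -12.6, so the old behavior produces nearly three times the penalty over the same window.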
justindujardin committed Feb 27, 2020
1 parent 6b7a847 commit 0849e3c
Showing 1 changed file with 1 addition and 6 deletions.
7 changes: 1 addition & 6 deletions libraries/mathy_python/mathy/env.py
@@ -205,13 +205,8 @@ def get_state_transition(
             if list_count <= 1 or key != expression.raw:
                 continue
 
-            # NOTE: the reward is scaled by how many times this state has been visited
-            #       up to (n) times
-            multiplier = min(list_count, 3)
             return time_step.transition(
-                features,
-                reward=EnvRewards.PREVIOUS_LOCATION * multiplier,
-                discount=self.discount,
+                features, reward=EnvRewards.PREVIOUS_LOCATION, discount=self.discount,
             )
 
         if len(agent.history) > 0:
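The check in this hunk can be sketched in isolation as follows. This is not the code from `env.py`: the `reentrant_penalty` helper, the penalty value, and the assumption that the history is a list of raw expression strings that already includes the current one are hypothetical, chosen to mirror the `list_count <= 1 or key != expression.raw` test.

```python
from itertools import groupby

# Hypothetical value for illustration; the real constant is EnvRewards.PREVIOUS_LOCATION.
PREVIOUS_LOCATION = -0.1


def reentrant_penalty(history, current):
    """Return the flat penalty if `current` occurs more than once in `history`.

    `history` is assumed to be a list of raw expression strings that
    already includes the current expression. Returns None otherwise.
    """
    # groupby only groups adjacent equal items, so sort first.
    for key, group in groupby(sorted(history)):
        list_count = len(list(group))
        if list_count <= 1 or key != current:
            continue
        # After this commit the penalty no longer depends on list_count.
        return PREVIOUS_LOCATION
    return None


revisited = reentrant_penalty(["x + y", "y + x", "x + y"], "x + y")
fresh = reentrant_penalty(["x + y", "y + x"], "y + x")
```

Here `revisited` is the flat penalty and `fresh` is `None`, regardless of how many extra visits the history contains.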