fix(env): remove reentrant state reward scaling

Previously, entering a state more than once produced a negative reward that was scaled by the number of visits, up to a cap (3x in the removed code). This was meant to penalize the agent for nonsensical behavior, such as commuting the same nodes back and forth many times and then solving with 1 move left. The reentrant state penalty works as expected, but with large time windows the scaling produces large negative rewards that skew the mean reward. Removing the scaling allows large time windows (e.g. 128 steps) without episode rewards becoming very large and negative.
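To make the effect concrete, here is a rough sketch (not code from the repository) comparing the accumulated penalty with and without scaling for a worst-case agent that bounces between two states. The penalty value of -1.0, the helper names `reentry_penalty` and `episode_penalty`, and the reading of the visit count as including the current visit are all assumptions for illustration; the actual value of `EnvRewards.PREVIOUS_LOCATION` is not shown in this commit.

```python
# Hypothetical unit penalty; the real EnvRewards.PREVIOUS_LOCATION value
# is not shown in this commit.
PREVIOUS_LOCATION = -1.0


def reentry_penalty(visit_count: int, scaled: bool, cap: int = 3) -> float:
    """Penalty for landing on a state seen `visit_count` times (assumed to
    include the current visit). First visits are not penalized."""
    if visit_count <= 1:
        return 0.0
    # Old behavior: multiply by the visit count, capped at `cap`.
    # New behavior: always apply the flat penalty.
    multiplier = min(visit_count, cap) if scaled else 1
    return PREVIOUS_LOCATION * multiplier


def episode_penalty(steps: int, scaled: bool) -> float:
    """Total re-entry penalty for an agent that alternates between two
    states on every step (a worst case for this penalty)."""
    counts = {}
    total = 0.0
    for step in range(steps):
        state = step % 2  # bounce between state 0 and state 1
        counts[state] = counts.get(state, 0) + 1
        total += reentry_penalty(counts[state], scaled)
    return total


if __name__ == "__main__":
    for window in (16, 128):
        print(window, episode_penalty(window, True), episode_penalty(window, False))
```

Under these assumptions, a 128-step window accumulates roughly three times the penalty with scaling as it does without, which is the skew in mean rewards described above.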
justindujardin · Feb 27, 2020 · 0849e3c
1 parent 6b7a847
Showing 1 changed file with 1 addition and 6 deletions.
diff --git a/libraries/mathy_python/mathy/env.py b/libraries/mathy_python/mathy/env.py
@@ -205,13 +205,8 @@ def get_state_transition(
             if list_count <= 1 or key != expression.raw:
                 continue
 
-            # NOTE: the reward is scaled by how many times this state has been visited
-            #       up to (n) times
-            multiplier = min(list_count, 3)
             return time_step.transition(
-                features,
-                reward=EnvRewards.PREVIOUS_LOCATION * multiplier,
-                discount=self.discount,
+                features, reward=EnvRewards.PREVIOUS_LOCATION, discount=self.discount,
             )
 
         if len(agent.history) > 0: