Commit 3d2d78b

fix(env): clamp episode win signal to 2.0 max
1 parent 0849e3c commit 3d2d78b

1 file changed: +1 -2 lines changed
libraries/mathy_python/mathy/env.py

Lines changed: 1 addition & 2 deletions
@@ -161,8 +161,7 @@ def get_win_signal(self, env_state: MathyEnvState) -> float:
         # the number of allowed steps, double the bonus signal
         if total_moves > 10 and current_move < total_moves / 2:
             bonus *= 2
-        # Don't let a win go negative
-        return max(EnvRewards.WIN + bonus, 0.1)
+        return min(2.0, EnvRewards.WIN + bonus)
 
     def get_lose_signal(self, env_state: MathyEnvState) -> float:
         """Calculate the reward value for failing to complete the episode. This is done

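The change swaps a lower bound for an upper bound: the old return floored the win signal at 0.1 with no ceiling, while the new one caps it at 2.0 with no floor. The following is a minimal sketch of that difference, not the code from `env.py`. The constant `WIN = 1.0` and the sample `bonus` values are hypothetical stand-ins, since neither the value of `EnvRewards.WIN` nor the bonus computation appears in this diff.

```python
# Illustrative stand-ins only; these are NOT taken from mathy's EnvRewards.
WIN = 1.0  # hypothetical base win reward
MAX_WIN_SIGNAL = 2.0  # the cap introduced by this commit


def win_signal_before(bonus: float) -> float:
    """Pre-commit behavior: floor the signal at 0.1, no upper bound."""
    return max(WIN + bonus, 0.1)


def win_signal_after(bonus: float) -> float:
    """Post-commit behavior: cap the signal at 2.0, no lower bound."""
    return min(MAX_WIN_SIGNAL, WIN + bonus)


# Compare both versions on a negative, a moderate, and a large bonus.
for bonus in (-1.5, 0.5, 3.0):
    print(bonus, win_signal_before(bonus), win_signal_after(bonus))
```

Under these assumed values, a large bonus is now clamped to 2.0 where it previously passed through unchanged. If `EnvRewards.WIN + bonus` can ever be negative, the new return no longer floors it at 0.1. Whether the bonus can go negative depends on code outside this hunk.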