Align terminal reward with the last trainable token and add ALFWorld Evaluation #31

Merged
zhusq20 merged 1 commit into open-tinker:main from Xuyan923r:fix/reward-mask-alignment
Mar 1, 2026

@Xuyan923r
Contributor

  • Align terminal reward placement with the last trainable token (response_mask == 1)

  • Keep the original fallback behavior when no trainable token exists

  • Add ALFWorld evaluation support
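
For reviewers, a minimal sketch of the reward placement described above. This is not the code from this PR: `place_terminal_reward` and `fallback_index` are hypothetical names, plain lists stand in for tensors, and the fallback (writing the reward to the final position when no token has `response_mask == 1`) is an assumption about what the original behavior was.

```python
from typing import List, Sequence


def place_terminal_reward(
    response_mask: Sequence[int],
    reward: float,
    fallback_index: int = -1,  # assumed fallback: final position of the response
) -> List[float]:
    """Return per-token rewards with `reward` on the last trainable token."""
    n = len(response_mask)
    token_rewards = [0.0] * n
    if n == 0:
        return token_rewards

    # Positions the loss is actually computed on (response_mask == 1).
    trainable = [i for i, m in enumerate(response_mask) if m == 1]

    if trainable:
        # Put the reward where the gradient can see it, not on a masked
        # token such as padding or environment-observation text.
        target = trainable[-1]
    else:
        # No trainable token: fall back to a fixed position (assumption).
        target = fallback_index % n

    token_rewards[target] = reward
    return token_rewards


# Trailing masked tokens (indices 5 and 6) no longer receive the reward.
mixed = place_terminal_reward([1, 1, 0, 1, 1, 0, 0], reward=1.0)

# No trainable token at all: the assumed fallback position is used.
no_trainable = place_terminal_reward([0, 0, 0], reward=1.0)
```

The point of the alignment is that a reward written to a position with `response_mask == 0` would be multiplied out by the mask, so the trajectory would contribute no terminal signal.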

@zhusq20 zhusq20 merged commit f87fe25 into open-tinker:main Mar 1, 2026
2 of 13 checks passed