Align terminal reward with the last trainable token and add ALFWorld Evaluation by Xuyan923r · Pull Request #31 · open-tinker/OpenTinker

Xuyan923r · 2026-03-01T12:07:03Z

Align terminal reward placement with the last trainable token (response_mask == 1)
Keep the original fallback behavior when no trainable token exists
Add ALFWorld evaluation support
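
The reward-placement rule described above can be sketched as follows. This is a minimal, dependency-free illustration and not the code from this PR: the function names and the `fallback_index` parameter are hypothetical. The PR does not say what the original fallback position is, so the caller supplies it here instead of the sketch guessing.

```python
from typing import List, Sequence


def last_trainable_index(response_mask: Sequence[int]) -> int:
    """Return the index of the last position where response_mask == 1, or -1 if none."""
    for i in range(len(response_mask) - 1, -1, -1):
        if response_mask[i] == 1:
            return i
    return -1


def place_terminal_reward(
    response_mask: Sequence[int],
    terminal_reward: float,
    fallback_index: int,  # hypothetical: stands in for the original fallback position
) -> List[float]:
    """Build a per-token reward vector holding a single terminal reward."""
    rewards = [0.0] * len(response_mask)
    if not rewards:
        return rewards
    idx = last_trainable_index(response_mask)
    if idx < 0:
        # No trainable token exists: keep the pre-existing (fallback) placement.
        idx = fallback_index
    rewards[idx] = terminal_reward
    return rewards
```

For a multi-turn mask such as `[1, 1, 0, 1, 0, 0]`, where the zeros are non-trainable tokens (for example environment observations or padding), the reward lands at index 3 and not at the final position. With an all-zero mask it lands at `fallback_index`.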

Commit b631bf9: Align terminal reward with the last trainable token and add ALFWorld evaluation

zhusq20 merged commit f87fe25 into open-tinker:main Mar 1, 2026
2 of 13 checks passed
