Update torchtitan for proper bf16 & new quant APIs #281
@felipemello1 correctly diagnosed that memory usage was far too high for our default Qwen3 8B script. The fix he proposed in #278 was to drop `max_req_tokens` and `max_res_tokens`, which was valid, but the overall high memory still seemed suspect. After looking into it, it became apparent that although we specified "bfloat16" in the Trainer/Ref sections of our configs, it was not actually being applied - hence the massive memory. The culprit was an out-of-date Torchtitan package :/ In this PR, I update the Torchtitan package so that bf16 is applied correctly.
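For reference, a quick way to catch a silently-ignored dtype setting is to inspect the model's parameter dtypes after initialization. This is just a minimal sketch against a plain PyTorch module, not the torchforge/torchtitan trainer API:

```python
from collections import Counter

import torch


def summarize_param_dtypes(model: torch.nn.Module) -> Counter:
    """Count parameter elements by dtype so a dropped bf16 setting is obvious."""
    counts = Counter()
    for p in model.parameters():
        counts[p.dtype] += p.numel()
    return counts


# If the config's "bfloat16" setting is respected, torch.bfloat16 should dominate;
# an all-float32 breakdown means the setting never made it into the model.
model = torch.nn.Linear(4096, 4096, dtype=torch.bfloat16)
print(summarize_param_dtypes(model))  # Counter({torch.bfloat16: 16781312})
```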
Memory usage before (with 486 seq len):

Memory usage after (with 512 seq len):

sidenote: This has the fortunate side effect of halving the weight sync time #impact
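That improvement follows directly from payload size: bf16 parameters are 2 bytes instead of fp32's 4. A back-of-the-envelope check (the ~8B parameter count is an assumption for Qwen3 8B, not a measured figure):

```python
# Rough weight-sync payload for an ~8B-parameter model.
n_params = 8e9                  # assumed parameter count for Qwen3 8B
fp32_gb = n_params * 4 / 1e9    # ~32 GB if weights are synced in float32
bf16_gb = n_params * 2 / 1e9    # ~16 GB once bf16 is actually applied
print(f"fp32: {fp32_gb:.0f} GB, bf16: {bf16_gb:.0f} GB")
```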