Update torchtitan for proper bf16 & new quant APIs #281
@felipemello1 correctly diagnosed that memory usage was far too high for our default Qwen3 8B script. The fix he proposed in #278 was to drop `max_req_tokens` and `max_res_tokens`, which was valid, but the overall high memory still seemed suspect. After looking into it, it became apparent that although we specified "bfloat16" in the Trainer/Ref sections of our configs, it was not actually being applied - hence the massive memory. The culprit was an out-of-date Torchtitan package :/ In this PR, I update the Torchtitan package so that bf16 is applied correctly.
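For reference, a quick way to catch a silently-ignored dtype setting is to inspect the model's parameter dtypes after initialization. This is just a minimal sketch against a plain PyTorch module, not the torchforge/torchtitan trainer API:

```python
from collections import Counter

import torch


def summarize_param_dtypes(model: torch.nn.Module) -> Counter:
    """Count parameter elements by dtype so a dropped bf16 setting is obvious."""
    counts = Counter()
    for p in model.parameters():
        counts[p.dtype] += p.numel()
    return counts


# If the config's "bfloat16" setting is respected, torch.bfloat16 should dominate;
# an all-float32 breakdown means the setting never made it into the model.
model = torch.nn.Linear(4096, 4096, dtype=torch.bfloat16)
print(summarize_param_dtypes(model))  # Counter({torch.bfloat16: 16781312})
```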
Memory usage before (with 486 seq len):

Memory usage after (with 512 seq len):

sidenote: This has the fortunate side effect of halving the weight sync time #impact
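That improvement follows directly from payload size: bf16 parameters are 2 bytes instead of fp32's 4. A back-of-the-envelope check (the ~8B parameter count is an assumption for Qwen3 8B, not a measured figure):

```python
# Rough weight-sync payload for an ~8B-parameter model.
n_params = 8e9                  # assumed parameter count for Qwen3 8B
fp32_gb = n_params * 4 / 1e9    # ~32 GB if weights are synced in float32
bf16_gb = n_params * 2 / 1e9    # ~16 GB once bf16 is actually applied
print(f"fp32: {fp32_gb:.0f} GB, bf16: {bf16_gb:.0f} GB")
```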