### Bug description

https://github.com/pytorch/torchtitan/blob/a8899e4b2cab74eadbe4b9a2ca2776ceb8829db3/torchtitan/models/utils.py#L432-L437

However, `head_dim` is not necessarily equal to `dim / n_heads`. For example, Qwen3-4B has `dim=2560`, `n_heads=32`, and `head_dim=128`, whereas `dim / n_heads` gives 80.

### Versions

latest main
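
### Additional context

A minimal sketch of the mismatch, using the Qwen3-4B values quoted in the bug description. The function names `derived_head_dim` and `resolve_head_dim` are hypothetical and are not taken from torchtitan. The fallback shown is only one possible way to handle an explicit `head_dim`, not necessarily the right fix for the linked code.

```python
from typing import Optional


def derived_head_dim(dim: int, n_heads: int) -> int:
    """Head size obtained by assuming head_dim == dim // n_heads."""
    return dim // n_heads


def resolve_head_dim(dim: int, n_heads: int, head_dim: Optional[int] = None) -> int:
    """Hypothetical helper: prefer an explicit head_dim, else fall back to dim // n_heads."""
    return head_dim if head_dim is not None else dim // n_heads


# Qwen3-4B values from this issue.
dim, n_heads, head_dim = 2560, 32, 128

assumed = derived_head_dim(dim, n_heads)            # what the dim / n_heads assumption yields
actual = resolve_head_dim(dim, n_heads, head_dim)   # what the model config specifies

print(f"assumed head_dim = {assumed}, configured head_dim = {actual}")
print(f"n_heads * head_dim = {n_heads * actual}, dim = {dim}")
```

With these values the assumed head size is 80 while the configured one is 128, so `n_heads * head_dim` is 4096 rather than `dim` (2560). Any quantity computed from `dim / n_heads` would therefore differ from one computed from the configured `head_dim`.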