

@msaroufim (Member) commented Aug 5, 2025

Fixes #2683 and pytorch/pytorch#159819

Let's make sure CI is all green before merging.

(myenv) ➜  ao git:(msaroufim-patch-32) pytest test/dtypes/test_affine_quantized_tensor_parallel.py -v

============================================================= test session starts =============================================================
platform linux -- Python 3.10.16, pytest-7.4.0, pluggy-1.5.0 -- /home/marksaroufim/.conda/envs/nv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/marksaroufim/ao/.hypothesis/examples'))
rootdir: /home/marksaroufim/ao
plugins: hypothesis-6.130.3, anyio-4.9.0, typeguard-4.3.0
collected 6 items                                                                                                                             

test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_bfloat16 PASSED                  [ 16%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float16 PASSED                   [ 33%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float32 PASSED                   [ 50%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt4woAffineQuantizedTensorParallel::test_tp_bfloat16 SKIPPED (This doesn...) [ 66%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestGemliteLayoutTensorParallel::test_tp_gemlite_float16 SKIPPED (gemlite not...) [ 83%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8dqAffineQuantizedTensorParallel::test_tp_bfloat16 PASSED                  [100%]

================================================== 4 passed, 2 skipped in 107.14s (0:01:47) ===================================================
(myenv) ➜  ao git:(msaroufim-patch-32) # Run only the failed tests by name
pytest test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_bfloat16 \
       test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float16 \
       test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float32 \
       test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8dqAffineQuantizedTensorParallel::test_tp_bfloat16 \
       test/dtypes/test_nf4.py::TestQLoRA::test_qlora_fsdp2 \
       test/dtypes/test_nf4.py::TestComm::test_comm \
       test/prototype/test_quantized_training.py::TestFSDP2::test_fsdp2_correctness \
       test/prototype/test_quantized_training.py::TestFSDP2::test_precompute_bitnet_scale \
       test/test_low_bit_optim.py::TestFSDP2::test_fsdp2 \
       test/test_low_bit_optim.py::TestFSDP2::test_uneven_shard -v
============================================================= test session starts =============================================================
platform linux -- Python 3.10.16, pytest-7.4.0, pluggy-1.5.0 -- /home/marksaroufim/.conda/envs/nv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/marksaroufim/ao/.hypothesis/examples'))
rootdir: /home/marksaroufim/ao
plugins: hypothesis-6.130.3, anyio-4.9.0, typeguard-4.3.0
collected 10 items                                                                                                                            

test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_bfloat16 PASSED                  [ 10%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float16 PASSED                   [ 20%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8woAffineQuantizedTensorParallel::test_tp_float32 PASSED                   [ 30%]
test/dtypes/test_affine_quantized_tensor_parallel.py::TestInt8dqAffineQuantizedTensorParallel::test_tp_bfloat16 PASSED                  [ 40%]
test/dtypes/test_nf4.py::TestQLoRA::test_qlora_fsdp2 PASSED                                                                             [ 50%]
test/dtypes/test_nf4.py::TestComm::test_comm PASSED                                                                                     [ 60%]
test/prototype/test_quantized_training.py::TestFSDP2::test_fsdp2_correctness PASSED                                                     [ 70%]
test/prototype/test_quantized_training.py::TestFSDP2::test_precompute_bitnet_scale PASSED                                               [ 80%]
test/test_low_bit_optim.py::TestFSDP2::test_fsdp2 PASSED                                                                                [ 90%]
test/test_low_bit_optim.py::TestFSDP2::test_uneven_shard PASSED [100%]

================== 10 passed in 392.09s (0:06:32) ===================
(myenv) ➜  ao git:(msaroufim-patch-32) pip list | grep torch                                                                               
pytorch-triton                3.4.0+git11ec6354
torch                         2.9.0.dev20250804+cu126 /home/marksaroufim/.conda/envs/nv/lib/python3.10/site-packages
torchao                       0.13.0+git331c28906     /home/marksaroufim/ao
torchdata                     0.11.0
torchtitan                    0.1.0
torchvision                   0.22.0a0+5f03dc5        /home/marksaroufim/vision
(myenv) ➜  ao git:(msaroufim-patch-32) 


pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2684

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 1b2eb00 with merge base b757fb9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Aug 5, 2025
@msaroufim added the topic: bug fix label Aug 5, 2025
@msaroufim changed the title from "Try fixing the FSDP2 breakage in nightly" to "Fix FSDP2 breakage in nightly" Aug 5, 2025
@msaroufim merged commit 2e361d7 into main Aug 5, 2025
20 checks passed
@msaroufim deleted the msaroufim-patch-32 branch August 5, 2025 03:58
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
Labels: CLA Signed, topic: bug fix
Development

Successfully merging this pull request may close these issues.

CI test failures due to torch nightly fsdp2 bug
2 participants