Add PerBlock to safe globals #3370
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3370
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I think we should
Today these are added in observer.py; I think it just so happens that they're registered when that file is indirectly imported: https://github.com/pytorch/ao/blob/main/torchao/quantization/observer.py#L356 Should I move all of these to granularity.py to be more explicit?
yeah, I think just adding all the used granularities would be good, and removing these from observer.py
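For concreteness, a minimal sketch of what that registration in granularity.py could look like (the exact set of classes registered in the PR isn't shown in this thread, so the list below is illustrative):

```python
import torch
from torchao.quantization.granularity import PerBlock, PerRow, PerTensor

# Register granularity classes as safe globals at import time, so
# torch.load(..., weights_only=True) can deserialize configs that
# reference them without users allowlisting each class manually.
torch.serialization.add_safe_globals([PerTensor, PerRow, PerBlock])
```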
Force-pushed from 2a12d58 to fd31a98 (compare)
torchao/quantization/quant_api.py (Outdated)

```python
import torch
import torch.nn as nn
from torchao.quantization.granularity import PerTensor
```
nit: just `from torchao.quantization import PerTensor` is better I think
```diff
     Int8WeightOnlyConfig,
 )
-from torchao.quantization.observer import PerRow, PerTensor
+from torchao.quantization.granularity import PerRow, PerTensor
```
nit: remove `granularity` (import from `torchao.quantization` directly)
```python
code = f"""
import torch
import torchao
_ = torch.load('{fname}', weights_only=True)
"""
```
why do we need this instead of just running the code directly?
It's because by the time we run the test, we have already imported everything. This starts a fresh environment and shows that you only need to import torchao for loading to work. Copied from: https://github.com/pytorch/ao/blob/main/test/prototype/mx_formats/test_mx_serialization.py#L36
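For context, the fresh-process pattern being discussed looks roughly like this (a sketch; the temp-file handling and the `PerBlock((128, 128))` constructor arguments are illustrative, not the exact test in the PR):

```python
import subprocess
import sys
import tempfile

import torch
from torchao.quantization.granularity import PerBlock

# Save an object referencing PerBlock, then load it back in a fresh
# interpreter where only `import torchao` has run. This proves the
# safe-globals registration happens on import, rather than as a side
# effect of everything the test process itself already imported.
with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
    fname = f.name
torch.save({"granularity": PerBlock((128, 128))}, fname)

code = f"""
import torch
import torchao
_ = torch.load('{fname}', weights_only=True)
"""
subprocess.run([sys.executable, "-c", code], check=True)
```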
jerryzh168 left a comment
lg, see comments inline
Force-pushed from bfba7fc to 150ac89 (compare)
**Summary:** Following unslothai#3440, this PR extends torchao FP8 + RL support to also handle 128x128 PerBlock granularity (in addition to PerRow).

**Example usage:**
```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = True,
    max_lora_rank = 32,
    load_in_fp8 = "block",  # or "row" or True
)
```

**Initial results:** TBD

**Note:**
- Requires pytorch/ao#3370
Force-pushed from 150ac89 to a71f121 (compare)
**Summary:** Add PerBlock to safe globals so users don't have to do this themselves when they load config.json with PerBlock.

```
WeightsUnpickler error: Unsupported global: GLOBAL torchao.quantization.granularity.PerBlock was not an allowed global by default. Please use `torch.serialization.add_safe_globals([torchao.quantization.granularity.PerBlock])` or the `torch.serialization.safe_globals([torchao.quantization.granularity.PerBlock])` context manager to allowlist this global if you trust this class/function.
```

**Test Plan:**
```
python test/core/test_config.py -k test_granularity_serialization
```
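For reference, this is the manual workaround the error message suggests, which the PR makes unnecessary (a minimal sketch; the checkpoint filename is illustrative):

```python
import torch
from torchao.quantization.granularity import PerBlock

# Before this PR: explicitly allowlist PerBlock before loading a
# checkpoint/config that references it with weights_only=True.
torch.serialization.add_safe_globals([PerBlock])
state = torch.load("checkpoint.pt", weights_only=True)

# Or, scoped to a single load, via the context manager:
with torch.serialization.safe_globals([PerBlock]):
    state = torch.load("checkpoint.pt", weights_only=True)
```

With this PR, `import torchao` performs the registration, so a plain `torch.load(..., weights_only=True)` works.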
Force-pushed from a71f121 to c2daf56 (compare)
Test failures don't seem related. Thanks, merging this
* Enable FP8 + RL training for bf16 models (#3440)

  **Summary:** Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage:
  - We quantize the frozen LoRA weights into fp8 and keep the LoRA adapters in bf16
  - We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel
  - For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: vllm-project/vllm#26327)

  **Example usage:**
  ```python
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = "unsloth/Qwen3-8B-Base",
      max_seq_length = 2048,
      load_in_4bit = False,
      fast_inference = True,
      max_lora_rank = 32,
      load_in_fp8 = True,  # set this to True
  )
  # the rest is the same as before
  model = FastLanguageModel.get_peft_model(...)
  ```

  **Initial results:**
  ```
  # fp8
  {'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01}
  # bf16
  {'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01}
  ```

  <img width="1199" height="448" alt="Screenshot 2025-11-11 at 4 10 50 PM" src="https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23" />

  Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423

  **Requires:**
  - pytorch/ao#3158 (torchao nightly or 0.15.0+)
  - unslothai/unsloth-zoo#351

* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update utils.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* _get_inference_mode_context_manager
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update utils.py
* Update utils.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py
* Fix/save torchao model loading logic (#3621)
  * make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314
  * fix model loading and clean merged model directory
  * revert default quant
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * revert mapper.py

  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update loader_utils.py
* Update loader_utils.py
* Add 128x128 PerBlock FP8 + RL (#3629)

  **Summary:** Following #3440, this PR extends torchao FP8 + RL support to also handle 128x128 PerBlock granularity (in addition to PerRow).

  **Example usage:**
  ```python
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = "unsloth/Qwen3-8B-Base",
      max_seq_length = 2048,
      load_in_4bit = False,
      fast_inference = True,
      max_lora_rank = 32,
      load_in_fp8 = "block",  # or "row" or True
  )
  ```

  **Initial results:** TBD

  **Note:**
  - Requires pytorch/ao#3370

* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Version
* Update vision.py
* Update rl.py
* Add torch 2.9.1
* Fix auto installer
* Update fp8.py
* Float8
* Update fp8.py
* Update mapper.py
* Update mapper.py
* Update loader_utils.py
* Update loader.py
* Update fp8.py
* Versioning
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

Co-authored-by: andrewor14 <andrewor14@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Summary: Add PerBlock to safe globals so users don't have to do this themselves when they load config.json with PerBlock.
Test Plan: python test/core/test_config.py -k test_granularity_serialization