
MX4 quantization #2659

Closed
wants to merge 1 commit into from

Conversation

spcyppt
Contributor

@spcyppt spcyppt commented Jun 1, 2024

Summary:
Implement MX4 quantization-dequantization ops

Usage:
**Quantization:**
```
quantized_output = torch.ops.fbgemm.quantize_mx_cuda(
            A,
            split_sizes,
            scale_bits=8,
            ebits=2,
            mbits=3,
            max_norm=6.0,
            mx_group_size=32,
        )
```
where
`A` is a 1-D input tensor, and
`split_sizes` is a list of ints containing the number of elements in each rank, e.g., `split_sizes = [1024, 2048]` for 2 ranks. Note that each value needs to be a multiple of `mx_group_size`.
The output is guaranteed to be 16-byte aligned; any remainder smaller than 16 bytes is padded up to 16 bytes.
Given that `mx_group_size` is 32, each split size should be a multiple of 32 x 16 = 512 for best performance.
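
As a minimal sketch only (this helper is hypothetical and not part of FBGEMM), the multiple-of-group-size requirement can be met by rounding each per-rank count up; the input tensor itself would also need padding to the new total length:
```
# Hypothetical helper (not part of FBGEMM): round each per-rank element count
# up to the nearest multiple of the MX group size, as required by the
# split_sizes argument of quantize_mx_cuda. The caller must also pad the
# input tensor to sum(padded sizes).
def pad_split_sizes(split_sizes, group_size=32):
    return [((n + group_size - 1) // group_size) * group_size for n in split_sizes]

# e.g., pad_split_sizes([1000, 2040]) -> [1024, 2048]
```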

**Dequantization:**
```
dequantized_output = torch.ops.fbgemm.dequantize_mx_cuda(
            quantized_output,
            split_sizes,
            mx_group_size=32,
        )
```
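
For reference, a minimal end-to-end sketch, assuming `fbgemm_gpu` is installed with CUDA support; the shapes and the error check are illustrative only, not guarantees about the ops' exact output layout:
```
import torch
import fbgemm_gpu  # noqa: F401  # registers the torch.ops.fbgemm operators

# Two ranks; each split size is a multiple of mx_group_size (32).
split_sizes = [1024, 2048]
A = torch.randn(sum(split_sizes), device="cuda")

quantized = torch.ops.fbgemm.quantize_mx_cuda(
    A,
    split_sizes,
    scale_bits=8,
    ebits=2,
    mbits=3,
    max_norm=6.0,
    mx_group_size=32,
)
dequantized = torch.ops.fbgemm.dequantize_mx_cuda(
    quantized,
    split_sizes,
    mx_group_size=32,
)

# MX4 quantization is lossy, so expect a small non-zero round-trip error.
print(A.shape, dequantized.shape)
print((dequantized.float() - A).abs().max())
```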

Reviewed By: sryap

Differential Revision: D57145102
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D57145102


netlify bot commented Jun 1, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 345e654 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/665a69e13eeda70008ec9d61 |
| 😎 Deploy Preview | https://deploy-preview-2659--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot
Contributor

This pull request has been merged in eb7ccec.
