Add test case generator for groupwise low bit LUT based quantization #2359


Merged: 4 commits into pytorch:main on Jun 13, 2025

Conversation

szyszyzys (Contributor):

No description provided.

pytorch-bot bot commented Jun 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2359

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Pending

As of commit fdce227 with merge base d72a6d1:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Jun 11, 2025 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@szyszyzys added the "topic: not user facing" label on Jun 11, 2025 (use this tag if you don't want this PR to show up in release notes).
const int lut_size = 1 << weight_nbit;

// Generate random quantized indices (this remains the same)
auto weight_qvals = std::vector<uint8_t>(total_weights);

Contributor:
Can you use get_random_lowbit_vector for this?
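For reference, this is roughly what a helper like get_random_lowbit_vector presumably does; the sketch below is an assumption for illustration, not torchao's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hedged sketch: fill a vector with uniformly random values in
// [0, 2^nbit - 1], suitable as LUT indices.
inline std::vector<uint8_t> random_lowbit_vector(int n, int nbit,
                                                 unsigned seed = 0) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dis(0, (1 << nbit) - 1);
  std::vector<uint8_t> out(n);
  for (auto& v : out) v = static_cast<uint8_t>(dis(gen));
  return out;
}
```

Reusing one helper for all index generation keeps the test generators consistent with the rest of the test utilities.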

std::vector<float> base_codebook(lut_size);
float start_val = -(static_cast<float>(lut_size) / 2.0f) + 0.5f;
for (int i = 0; i < lut_size; ++i) {
base_codebook[i] = start_val + i;

Contributor:
Why can't base codebook just be output of get_random_vector?


Contributor (author):
Yes we can. I tried to make it simpler, since the scale is randomly generated.

}
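The base codebook loop above produces lut_size values centered on zero with unit spacing. A self-contained sketch of the same construction (the helper name make_base_codebook is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hedged sketch of the base codebook: lut_size values centered
// on zero with unit spacing, e.g. weight_nbit=2 -> {-1.5, -0.5, 0.5, 1.5}.
inline std::vector<float> make_base_codebook(int weight_nbit) {
  const int lut_size = 1 << weight_nbit;
  std::vector<float> codebook(lut_size);
  float start_val = -(static_cast<float>(lut_size) / 2.0f) + 0.5f;
  for (int i = 0; i < lut_size; ++i) codebook[i] = start_val + i;
  return codebook;
}
```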

// 2c. Create the final LUTs by scaling the base codebook for each group
std::vector<float> weight_luts(num_weight_groups * lut_size);

Contributor:
There are two group sizes here. There is a group size for the LUT (e.g., if we have 2 LUTs for 100 values, then the lut_group_size is 50; you could also represent this with n_luts). There is also a group size for the scale (e.g., if we have 4 scales for 100 values, then the scale_group_size is 25).
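The distinction between the two group sizes reduces to simple arithmetic; the helper names below are hypothetical, used only to illustrate the reviewer's example numbers.

```cpp
#include <cassert>

// Hedged sketch: lut_group_size is how many values share one LUT,
// scale_group_size is how many values share one scale.
inline int num_luts_for(int total_values, int lut_group_size) {
  return total_values / lut_group_size;   // e.g. 100 values, group 50 -> 2 LUTs
}
inline int num_scales_for(int total_values, int scale_group_size) {
  return total_values / scale_group_size; // e.g. 100 values, group 25 -> 4 scales
}
```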


Contributor (author):
I used a single LUT vector to store all the LUTs. Will update this part.

float activation_val = activations[m_idx * k + k_idx];
int weight_idx = n_idx * k + k_idx;
int group_idx = weight_idx / weight_group_size;
uint8_t lut_index = weight_qvals[weight_idx];

Contributor:
weight_qvals looks more like weight_qval_idxs?


Contributor (author):
Yes, going to do some renaming for consistency.

int weight_idx = n_idx * k + k_idx;
int group_idx = weight_idx / weight_group_size;
uint8_t lut_index = weight_qvals[weight_idx];
float weight_dequant_val = weight_luts[group_idx * lut_size + lut_index];

Contributor:
Where is the scale applied?


szyszyzys (author) commented Jun 11, 2025:
The scale is applied on line 672. I can move it here.
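Applying the scale at lookup time, as the reviewer suggests, might look like this minimal sketch (the function and parameter names are hypothetical, not the PR's actual code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch: look up the dequantized weight for one index,
// applying the per-group scale at lookup time rather than later.
inline float dequant_weight(
    const std::vector<float>& weight_luts,  // all LUTs, concatenated
    const std::vector<float>& scales,       // one scale per scale group
    int lut_group_idx, int scale_group_idx,
    int lut_size, uint8_t lut_index) {
  return scales[scale_group_idx] *
         weight_luts[lut_group_idx * lut_size + lut_index];
}
```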

*/
static groupwise_lowbit_weight_lut_test_case generate_with_grouping(
int m, int k, int n,
int weight_group_size, int scale_group_size, int lut_group_size, int weight_nbit,

Contributor:
What is weight_group_size?

for (int i = 0; i < num_weight_groups; ++i) group_to_lut_map[i] = lut_map_dis(gen);

// 2c. Generate random quantized indices for each weight.
auto weight_qval_indices = std::vector<uint8_t>(total_weights);

Contributor:
Why can't we use get_random_lowbit_vector?

// 2b. Generate random quantized indices for each weight.
auto weight_qval_indices = std::vector<uint8_t>(total_weights);
std::uniform_int_distribution<int> qval_dis(0, lut_size - 1);
for (int i = 0; i < total_weights; ++i) weight_qval_indices[i] = static_cast<uint8_t>(qval_dis(gen));

Contributor:
@szyszyzys why can't we use get_random_lowbit_vector here?

for (int k_idx = 0; k_idx < k; ++k_idx) {
float activation_val = activations[m_idx * k + k_idx];
int weight_idx = n_idx * k + k_idx;
uint8_t qval = weight_qval_indices[weight_idx];

Contributor:
nit: qval_idx

*/
static groupwise_lowbit_weight_lut_test_case generate_per_group(
int m, int k, int n,
int group_size, // The size of the block for both scales and LUTs

Contributor:
If they are the same, we'd just integrate the scales with the LUTs, no?

/**
* @brief OVERLOAD 2: Advanced generator with separate grouping for scales and LUTs.
*/
static groupwise_lowbit_weight_lut_test_case generate_with_decoupled_grouping(

Contributor:
Let's add a flag for has_scales. When set to false, make all the scales 1.0.
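The suggested has_scales flag could be sketched as follows (a hypothetical helper, not the PR's actual signature): when the flag is false, every scale becomes 1.0 so the LUT values pass through unscaled.

```cpp
#include <cassert>
#include <vector>

// Hedged sketch of the reviewer's suggestion: when has_scales is
// false, fill the scale vector with 1.0 so scaling becomes a no-op.
inline std::vector<float> make_scales(int num_groups, bool has_scales,
                                      float scale_value = 0.25f) {
  if (!has_scales) return std::vector<float>(num_groups, 1.0f);
  // scale_value stands in for randomly generated scales.
  return std::vector<float>(num_groups, scale_value);
}
```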


metascroy (Contributor) left a review comment:
Overall looks great! Approving PR, left a few comments.


metascroy (Contributor):
Looks like there are some CI errors as well.

@szyszyzys force-pushed the lut_test_generation branch 2 times, most recently from fca5d8c to 2c9b99b, on June 12, 2025 at 20:40.
@szyszyzys force-pushed the lut_test_generation branch from 2c9b99b to fdce227 on June 13, 2025 at 17:28.
@szyszyzys merged commit 6243040 into pytorch:main on Jun 13, 2025.
17 of 19 checks passed
3 participants