Delete rowwise_scaled_linear_cutlass kernels and APIs by jerryzh168 · Pull Request #3723 · pytorch/ao

jerryzh168 · 2026-01-26T20:15:28Z

Stack from ghstack (oldest at bottom):

-> Delete rowwise_scaled_linear_cutlass kernels and APIs #3723

Summary:
Deleted the Int4DynamicActivationInt4WeightConfig top-level API, the CutlassInt4PackedLayout option of Int8DynamicActivationInt4WeightConfig, and the related kernels, since they are not used.

BC breaking note:
We are removing Int4DynamicActivationInt4WeightConfig and the CutlassInt4PackedLayout option of Int8DynamicActivationInt4WeightConfig.

0.15.0

config = Int8DynamicActivationInt4WeightConfig(
  group_size=None,
  mapping_type=MappingType.SYMMETRIC,
  act_mapping_type=MappingType.SYMMETRIC,
  layout=CutlassInt4PackedLayout(),
)
quantize_(model, config)

config = Int4DynamicActivationInt4WeightConfig()
quantize_(model, config)

0.16.0
Both configs are dropped. To keep using them, stay on torchao <= 0.15.0.
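
For downstream code that has to run on both sides of this change, a version gate is one option. The sketch below is not part of this PR: `parse_release`, `has_cutlass_int4_configs`, and `torchao_has_cutlass_int4_configs` are hypothetical helper names, and the cutoff follows the note above literally (anything after 0.15.0 is treated as not having the configs, which may be overly conservative for nightlies built before this PR merged).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Last release that still ships the removed configs, per the BC-breaking note.
LAST_SUPPORTED = (0, 15, 0)


def parse_release(version_string: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.15.0+cu126' -> (0, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    # A missing patch component (e.g. '0.15') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_cutlass_int4_configs(version_string: str) -> bool:
    """True if this torchao version is at or below the last supported release."""
    return parse_release(version_string) <= LAST_SUPPORTED


def torchao_has_cutlass_int4_configs() -> bool:
    """Check the installed torchao, returning False if it is not installed."""
    try:
        return has_cutlass_int4_configs(version("torchao"))
    except PackageNotFoundError:
        return False
```

Callers could branch on `torchao_has_cutlass_int4_configs()` before importing either config, and fall back to another quantization path otherwise.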

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:
TODO:

Files Deleted:
1. CUDA source files:
   - torchao/csrc/cuda/rowwise_scaled_linear_cutlass/rowwise_scaled_linear_cutlass.cuh
   - torchao/csrc/cuda/rowwise_scaled_linear_cutlass/rowwise_scaled_linear_cutlass_s4s4.cu
   - torchao/csrc/cuda/rowwise_scaled_linear_cutlass/rowwise_scaled_linear_cutlass_s8s4.cu
   - torchao/csrc/cuda/rowwise_scaled_linear_cutlass/README.md
2. Python layout files:
   - torchao/prototype/dtypes/uintx/cutlass_int4_packed_layout.py
   - torchao/dtypes/uintx/cutlass_int4_packed_layout.py
3. Test and benchmark files:
   - test/test_ops_rowwise_scaled_linear_cutlass.py
   - benchmarks/benchmark_rowwise_scaled_linear_cutlass.py

Files Modified:
1. torchao/ops.py - Removed lib.define for rowwise_scaled_linear_cutlass_s8s4 and rowwise_scaled_linear_cutlass_s4s4, and their function implementations
2. torchao/csrc/README.md - Removed reference to cutlass naming convention
3. torchao/dtypes/affine_quantized_tensor_ops.py - Removed imports and dispatch registrations for cutlass int4
4. torchao/prototype/dtypes/__init__.py and torchao/prototype/dtypes/uintx/__init__.py - Removed CutlassInt4PackedLayout exports
5. torchao/dtypes/__init__.py - Removed CutlassInt4PackedLayout import and export
6. torchao/quantization/__init__.py and torchao/quantization/quant_api.py - Removed CutlassInt4PackedLayout and Int4DynamicActivationInt4WeightConfig exports
7. torchao/prototype/quantization/quant_api.py - Removed Int4DynamicActivationInt4WeightConfig class and updated Int8DynamicActivationInt4WeightConfig to remove CutlassInt4PackedLayout handling
8. Test files (test/dtypes/test_affine_quantized.py, test/dtypes/test_uintx.py, test/quantization/test_quant_api.py, test/core/test_config.py, test/integration/test_vllm.py) - Removed all references to CutlassInt4PackedLayout and Int4DynamicActivationInt4WeightConfig
9. benchmarks/microbenchmarks/utils.py - Removed int8adq-int4w-symm quantization option

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
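
To confirm from the outside that the exports listed above are gone (or still present on an older install), a small import probe can help. This is a rough sketch, not something included in this PR: `missing_symbols` and `REMOVED` are hypothetical names, and the module paths are taken from the `__init__.py` files named in items 5 and 6.

```python
import importlib

# Public names this PR removes, keyed by the package that used to export them.
REMOVED = {
    "torchao.quantization": [
        "Int4DynamicActivationInt4WeightConfig",
        "CutlassInt4PackedLayout",
    ],
    "torchao.dtypes": ["CutlassInt4PackedLayout"],
}


def missing_symbols(expected: dict) -> dict:
    """Map each module to the listed names it does not expose.

    A module that cannot be imported maps to None. Modules that expose
    every listed name are left out of the report.
    """
    report = {}
    for module_name, names in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = None
            continue
        absent = [name for name in names if not hasattr(module, name)]
        if absent:
            report[module_name] = absent
    return report
```

On an install that includes this change, `missing_symbols(REMOVED)` would be expected to list every name under both modules; on torchao <= 0.15.0 it would be expected to return an empty dict.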

pytorch-bot · 2026-01-26T20:15:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3723

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 832e40e with merge base 79372e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 44b139f
Pull Request resolved: #3723

andrewor14

Looks great. Also cc @jcaip

meta-cla bot added the CLA Signed label Jan 26, 2026

jerryzh168 added the module: deprecation label Jan 26, 2026

andrewor14 mentioned this pull request Jan 26, 2026

Making torchao ABI stable and moving closer to python only #3516

Closed

jerryzh168 requested review from andrewor14 and howardzhang-cv January 26, 2026 20:59

andrewor14 added the module: bc-breaking label Jan 26, 2026

andrewor14 approved these changes Jan 26, 2026


jerryzh168 changed the base branch from gh/jerryzh168/26/base to main January 26, 2026 23:10

jerryzh168 merged commit 9b9d558 into main Jan 27, 2026
37 of 39 checks passed

jerryzh168 merged 1 commit into main from gh/jerryzh168/26/head

2 participants
