Add support for flashinfer quantize kernel option for nvfp4#3912
jerryzh168 wants to merge 21 commits into gh/jerryzh168/43/base
Summary:
Added the flashinfer option for better performance on some of the workflows we are interested in, and added a numerical equivalence test between the different quantize_kernel_preference options.
Test Plan:
pytest test/prototype/mx_formats/test_nvfp4_tensor.py -k test_kernel_preference_numerical_equivalence
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3912
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job
As of commit 452f27a with merge base 15df843:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@torch.no_grad()
def test_triton_nvfp4_quantize_equivalence(M, N, use_per_tensor_scale, dtype):
    """Test that Triton and PyTorch NVFP4 quantization produce equivalent results."""
def test_kernel_choice_numerical_equivalence(M, N, use_per_tensor_scale, dtype):
clarify this is for quantization from bf16 to nvfp4
Will rename to test_quantize_to_nvfp4_kernel_numerical_equivalence.
The original test covers both fp32 and bf16; please let me know if you think we should remove the fp32 test case as well.
# For kernel choices that use the same quantization algorithm as TORCH
# (TRITON should be bitwise identical), verify internal data matches exactly
if kc == NVFP4QuantizeKernelChoice.TRITON:
    torch.testing.assert_close(
add check to ensure bitwise match
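A bitwise check differs from a tolerance-based one in that it allows no numerical slack at all. In the PyTorch test this would probably amount to calling torch.equal on the packed data, or torch.testing.assert_close with rtol=0 and atol=0. The sketch below illustrates the same idea with the standard library only; pack_fp4_codes, assert_bitwise_equal, and the low-nibble-first packing layout are assumptions made for illustration, not code from this PR.

```python
def pack_fp4_codes(codes):
    """Pack 4-bit codes two per byte, low nibble first (an assumed layout)."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), 2)
    )


def assert_bitwise_equal(actual: bytes, expected: bytes) -> None:
    """Raise AssertionError unless both buffers match byte for byte."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            raise AssertionError(f"buffers differ at byte {i}: {a:#04x} vs {e:#04x}")


# Hypothetical packed outputs from a reference kernel and two candidates.
reference = pack_fp4_codes([1, 2, 3, 4])
same = pack_fp4_codes([1, 2, 3, 4])
off_by_one = pack_fp4_codes([1, 2, 3, 5])

assert_bitwise_equal(same, reference)  # passes: identical bytes
try:
    assert_bitwise_equal(off_by_one, reference)
except AssertionError as err:
    print(err)  # a single differing 4-bit code is enough to fail
```

Reporting the first differing byte keeps failures easy to trace back to a specific element, which a plain equality assertion would not do.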
Can we include performance benchmarking in this PR?
It seems the mslk kernels can give us performance similar to the flashinfer kernel, so this is no longer needed.
Stack from ghstack (oldest at bottom):
use_triton_kernel to use nvfp4_quantize_kernel_choice #3911
Summary:
Added the flashinfer option for better performance on some of the workflows
we are interested in, and added a numerical equivalence test between the different
nvfp4_quantize_kernel_choice options
Test Plan:
pytest test/prototype/mx_formats/test_nvfp4_tensor.py -k test_kernel_preference_numerical_equivalence
perf test: #4031
We'll test speedup a bit later
Reviewers:
Subscribers:
Tasks:
Tags:
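For readers unfamiliar with the pattern described in the summary, here is a minimal sketch of a kernel-choice enum with a numerical equivalence check across its options. Everything in it is an assumption for illustration: ToyKernelChoice and the helper functions are hypothetical and are not the torchao API, no Triton or flashinfer kernel is called, and the quantizer is simplified to one 16-element block with a plain float scale (it ignores the FP8 block-scale encoding and the optional per-tensor scale). The TRITON entry stands in for a kernel expected to match the reference exactly, and the FLASHINFER entry for one that uses a different algorithm and is only compared to a tolerance.

```python
import math
from enum import Enum

# Magnitudes representable by FP4 E2M1, the element format used by NVFP4.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


class ToyKernelChoice(Enum):
    """Hypothetical stand-in for the kernel-choice enum discussed in the PR."""
    TORCH = "torch"
    TRITON = "triton"
    FLASHINFER = "flashinfer"


def _nearest(v):
    # Snap a non-negative value to the closest E2M1 magnitude.
    return min(E2M1_GRID, key=lambda g: abs(g - v))


def _block_scale(block):
    # Choose the scale so the largest magnitude in the block maps to 6.0.
    amax = max(abs(x) for x in block)
    return amax / 6.0 if amax > 0 else 1.0


def _quantize_divide(block):
    # Reference path: divide each element by the block scale.
    scale = _block_scale(block)
    return [math.copysign(_nearest(abs(x) / scale) * scale, x) for x in block]


def _quantize_reciprocal(block):
    # Alternative path: multiply by a reciprocal, which may round differently.
    scale = _block_scale(block)
    inv = 1.0 / scale
    return [math.copysign(_nearest(abs(x) * inv) * scale, x) for x in block]


_KERNELS = {
    ToyKernelChoice.TORCH: _quantize_divide,
    ToyKernelChoice.TRITON: _quantize_divide,          # same algorithm as TORCH
    ToyKernelChoice.FLASHINFER: _quantize_reciprocal,  # different algorithm
}


def quantize_dequantize(block, choice):
    """Quantize one block with the chosen kernel and return dequantized values."""
    return _KERNELS[choice](block)


def check_equivalence(block, atol):
    """Compare every kernel choice against the TORCH reference."""
    ref = quantize_dequantize(block, ToyKernelChoice.TORCH)
    for choice in ToyKernelChoice:
        out = quantize_dequantize(block, choice)
        if choice is ToyKernelChoice.TRITON:
            assert out == ref, f"{choice} is not an exact match"
        else:
            assert all(abs(a - b) <= atol for a, b in zip(out, ref)), choice
    return ref


demo_block = [6.0, -3.0, 0.6, 2.4, 0.0, -1.5, 4.0, -0.5,
              1.0, 2.0, -6.0, 0.4, 5.1, -4.9, 1.2, 3.4]
print(check_equivalence(demo_block, atol=1e-6))
```

Keeping the dispatch in a single table means a new kernel option only needs one new entry, and the equivalence check picks it up automatically because it iterates over the enum.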