[X86] Update smooth quant fusion pattern #3784
Conversation
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Dr. CI: As of commit a3d1fd9 with merge base 9472d7d, there are 8 new failures, 1 cancelled job (please retry), and 1 active SEV. See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3784. Note: links to docs will display an error until the doc builds have completed.
Xia-Weiwen
left a comment
Thanks for the PR. Please also add a UT.
Moved to the other PR.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Claude finished @Xia-Weiwen's task in 2m 51s. View job: Code Review: PR #3784 - Update smooth quant fusion pattern (Round 7)
Summary: Since round 6, two new commits have been added. The x86.py inductor pass code is unchanged from rounds 5/6.
Issues: 1. …
@claude review again
Claude finished @Xia-Weiwen's task in 2m 8s. View job: Code Review: PR #3784 - Update smooth quant fusion pattern (Round 8)
Summary: Since round 7, two new commits have been added. The x86.py inductor pass pattern code is unchanged from rounds 5-7.
Remaining Minor Items: 1. …
        Union[Granularity, Tuple[Granularity, Granularity], list[Granularity]]
    ] = PerRow()
    set_inductor_config: bool = True
    reduce_range: Optional[bool] = False
Thanks for reviewing. This is for compatibility purposes (per review comments from Claude). If it were not optional, it could break backward compatibility for users who saved a quantized model with an old torchao version and now want to load it with the new torchao.
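The backward-compatibility argument above can be sketched in isolation. This is a hypothetical, simplified config (`QuantConfig` and `load_config` are illustrative names, not torchao's actual API): a newly added field with a default means older serialized configs, which lack the key, still load cleanly.

```python
# Hypothetical sketch, not torchao code: a new dataclass field with a default
# keeps configs serialized by an older version loadable, because missing keys
# simply fall back to the default.
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class QuantConfig:
    set_inductor_config: bool = True
    reduce_range: Optional[bool] = False  # field added in the newer version


def load_config(saved: dict) -> QuantConfig:
    # Old checkpoints saved before reduce_range existed lack the key;
    # the dataclass default fills it in, so loading does not break.
    known = {f.name for f in fields(QuantConfig)}
    return QuantConfig(**{k: v for k, v in saved.items() if k in known})


old_saved = {"set_inductor_config": True}  # serialized by an "old" version
cfg = load_config(old_saved)
```

Making the field required instead (no default) would raise a `TypeError` on exactly these old payloads, which is the breakage the comment is guarding against.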
            "Please set it to MappingType.SYMMETRIC or "
            "MappingType.ASYMMETRIC."
        )
        assert self.reduce_range in (True, False), (
If this is only for x86, maybe add a check to make sure the user is on CPU when the flag is set to True?
If it's not on CPU, reduce_range still works, so I don't think the check is necessary.
It's probably better to add the check if we don't expect other backends to use it; this will help us decrease the user surface and make it less error-prone for users, I think.
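The suggested device check could look something like the sketch below (`check_reduce_range` is an illustrative helper, not torchao's actual code; a device-type string stands in for a real `torch.device`): fail fast when `reduce_range=True` is combined with a non-CPU device, shrinking the user-facing surface as the review suggests.

```python
# Illustrative validation sketch (hypothetical helper, not torchao code):
# reject reduce_range=True on non-CPU devices, since the reduced range only
# exists to work around an x86 CPU limitation.
def check_reduce_range(reduce_range: bool, device_type: str) -> None:
    # Only x86 CPUs need the reduced 7-bit range; fail fast elsewhere.
    if reduce_range and device_type != "cpu":
        raise ValueError(
            f"reduce_range=True is only intended for x86 CPUs, got {device_type!r}"
        )


check_reduce_range(True, "cpu")  # accepted: CPU may use the reduced range

rejected = False
try:
    check_reduce_range(True, "cuda")  # rejected: not a CPU device
except ValueError:
    rejected = True
```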
            "Please set it to MappingType.SYMMETRIC or "
            "MappingType.ASYMMETRIC."
        )
        assert self.reduce_range in (True, False), (
It seems reduce_range is not tested. I'd recommend splitting the reduce_range change and the fusion change into two separate PRs to make it clearer.
Hi @jerryzh168, thanks for reviewing. We added this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test …
Thanks for the comments; I split the reduce_range change out into PR #4266.
jerryzh168
left a comment
@Xia-Weiwen please feel free to be the codeowner for CPU-only changes. You can add yourself to https://github.com/pytorch/ao/blob/main/CODEOWNERS as well for the relevant files.
Can this be set automatically when we are on CPU and not in …? One question here is whether we always prepare and serve the quantized model on the same type of machine or not.
Thanks for the comments. I removed the manual reduce_range setting and set it automatically in the int8 tensor.
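The "set it automatically" approach could be sketched as follows. Both helpers are hypothetical (`has_avx512_vnni` and `auto_reduce_range` are illustrative names; torchao's real mechanism lives inside its int8 tensor subclass), and the capability probe shown is a best-effort Linux-only stand-in.

```python
# Hedged sketch of automatic reduce_range selection (hypothetical helpers,
# not torchao's actual implementation).
def has_avx512_vnni() -> bool:
    # Best-effort probe via /proc/cpuinfo on Linux; assume no VNNI when the
    # file is unavailable (e.g. macOS/Windows), which is the safe default.
    try:
        with open("/proc/cpuinfo") as f:
            return "avx512_vnni" in f.read()
    except OSError:
        return False


def auto_reduce_range(device_type: str) -> bool:
    # Only CPUs without VNNI need the reduced activation range, to avoid
    # int16 overflow in the pre-VNNI u8*s8 multiply-accumulate path.
    return device_type == "cpu" and not has_avx512_vnni()
```

This also answers part of the question above: if the model is prepared on one machine and served on another, the decision would need to be made (or re-made) on the serving machine rather than baked in at prepare time.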
@claude Review again
Claude finished @Xia-Weiwen's task in 3m 46s. View job: Code Review: PR #3784 - Update smooth quant fusion pattern (Round 9)
Summary: Since the last review (round 8), the branch has been rebased/merged with upstream main (commit …). I reviewed the full diff (4 files, ~430 additions, ~200 deletions) against the upstream merge parent.
Changes Overview: …
Issues: 1. …
The SmoothQuant implementation has changed in torchao recently, so we need to update the fusion patterns in the FX graph passes of Inductor accordingly. Besides, we added a `reduce_range` argument in `Int8Tensor` to accommodate CPUs without AVX512_VNNI and avoid overflow. Unit tests are updated too.
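The overflow the description refers to can be shown with worked arithmetic (a sketch of the well-known pre-VNNI x86 limitation, not torchao code): without VNNI, the u8*s8 int8 matmul path accumulates two adjacent products into a single int16 lane (the `VPMADDUBSW` instruction), which can overflow at the full 8-bit range; restricting activations to 7 bits makes the worst cases fit.

```python
# Why reduce_range avoids overflow on CPUs without AVX512_VNNI:
# the pre-VNNI path sums two u8*s8 products into one int16 lane.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1

# Full 8-bit activation range: worst case 255 * (-128), twice, overflows int16.
full = 2 * (255 * -128)  # -65280, below INT16_MIN

# With reduce_range, activations use 7 bits (0..127); both extremes now fit.
reduced_lo = 2 * (127 * -128)  # -32512, within int16
reduced_hi = 2 * (127 * 127)   #  32258, within int16

assert full < INT16_MIN
assert INT16_MIN <= reduced_lo and reduced_hi <= INT16_MAX
```

With VNNI, `VPDPBUSD` accumulates directly into int32, so the full 8-bit range is safe and no reduction is needed, which is why the flag only matters on older CPUs.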