Add quantized.linear_unpacked_dynamic_fp16 #122762
Conversation
Summary: We add a new op, quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with a different (unpacked) weight/bias format. The op packs the weight on the fly for each call, taking a standard at::Tensor weight & bias. Test Plan: Included in commit. test_quantized_op::test_unpacked_qlinear_dynamic_fp16 [ghstack-poisoned]
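For illustration, a minimal sketch of how the new op relates to the existing packed flow, assuming a build with FBGEMM and the (input, weight, bias) signature implied by the summary; this is not taken from the PR's code:

```python
import torch

x = torch.randn(2, 4)
weight = torch.randn(8, 4)
bias = torch.randn(8)

# Existing flow: pack the fp16 weight once, then call the packed dynamic op.
packed = torch.ops.quantized.linear_prepack_fp16(weight, bias)
y_packed = torch.ops.quantized.linear_dynamic_fp16(x, packed)

# New op (assumed signature): takes plain at::Tensor weight & bias and
# repacks them inside every call, trading some speed for a simpler interface.
y_unpacked = torch.ops.quantized.linear_unpacked_dynamic_fp16(x, weight, bias)

torch.testing.assert_close(y_packed, y_unpacked)
```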
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122762
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit c4c3361 with merge base f2c1060. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@muchulee8 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary: We add wrappers for fbgemm's packing so we can pass it through PT2 to the lowering phase of AOTInductor. Test Plan: Included in commit. test_quantized_ops::test_wrapped_fbgemm_linear_fp16 Differential Revision: [D55433204](https://our.internmc.facebook.com/intern/diff/D55433204) Pull Request resolved: #122763 Approved by: https://github.com/jerryzh168 ghstack dependencies: #122762
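As a rough illustration of that follow-up, a sketch of how such wrappers might be called; the op names and signatures below (wrapped_fbgemm_pack_gemm_matrix_fp16, wrapped_fbgemm_linear_fp16_weight) are assumptions inferred from the commit summary, not confirmed by this PR:

```python
import torch

x = torch.randn(2, 16)
weight = torch.randn(32, 16)
bias = torch.randn(32)

# Hypothetical wrapper ops (names/signatures assumed): pack the weight into
# fbgemm's fp16 GEMM layout as a plain tensor so the packed value can flow
# through a PT2 graph, then run the dynamic fp16 linear against it.
packed = torch.ops._quantized.wrapped_fbgemm_pack_gemm_matrix_fp16(weight)
y = torch.ops._quantized.wrapped_fbgemm_linear_fp16_weight(x, packed, bias, weight.shape[0])
```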
Summary: We add a new op, quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with a different (unpacked) weight/bias format. The op packs the weight on the fly for each call, taking a standard at::Tensor weight & bias. Test Plan: Included in commit. test_quantized_op::test_unpacked_qlinear_dynamic_fp16 Differential Revision: [D55433203](https://our.internmc.facebook.com/intern/diff/D55433203) Pull Request resolved: pytorch#122762 Approved by: https://github.com/jerryzh168
Stack from ghstack (oldest at bottom):
Summary:
We add a new op, quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with a different (unpacked) weight/bias format.
The op packs the weight on the fly for each call, taking a standard at::Tensor weight & bias.
Test Plan:
Included in commit.
test_quantized_op::test_unpacked_qlinear_dynamic_fp16
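Roughly, the kind of check that test performs (a sketch under the assumed op signature, not the actual test code): compare the new op against a float reference whose weight has been rounded through fp16.

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 16)
weight = torch.randn(32, 16)
bias = torch.randn(32)

# Op under test (assumed signature): packs weight/bias inside the call.
y = torch.ops.quantized.linear_unpacked_dynamic_fp16(x, weight, bias)

# Reference: emulate fp16 weight storage with a float16 round trip.
w_ref = weight.to(torch.float16).to(torch.float32)
y_ref = F.linear(x, w_ref, bias)

torch.testing.assert_close(y, y_ref, atol=1e-2, rtol=1e-2)
```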
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
Differential Revision: D55433203