
Conversation

muchulee8 (Contributor) commented Mar 27, 2024

Stack from ghstack (oldest at bottom):

Summary:

We add a new op, quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with a different (unpacked) weight/bias format.
This op packs on the fly for each call, taking a standard at::Tensor weight and bias.

Test Plan:
Included in commit.
test_quantized_op::test_unpacked_qlinear_dynamic_fp16
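As a rough illustration, here is how the new op is expected to sit next to the existing packed path. This is a minimal sketch, not taken from the diff: the (input, weight, bias) schema of quantized.linear_unpacked_dynamic_fp16 is assumed from the description above, and the quantized fp16 ops require a build with FBGEMM support.

```python
import torch

# Plain float tensors, as produced by an ordinary nn.Linear.
x = torch.randn(4, 16)
weight = torch.randn(8, 16)
bias = torch.randn(8)

# Existing packed path: pack once up front (the weight is stored as fp16
# inside the packed object), then run the dynamic fp16 linear.
packed = torch.ops.quantized.linear_prepack_fp16(weight, bias)
y_packed = torch.ops.quantized.linear_dynamic_fp16(x, packed)

# New op from this PR: takes the unpacked weight/bias directly and packs on
# the fly inside every call. Signature assumed as (input, weight, bias).
y_unpacked = torch.ops.quantized.linear_unpacked_dynamic_fp16(x, weight, bias)

# Both paths should agree up to fp16 rounding of the weight.
torch.testing.assert_close(y_packed, y_unpacked)
```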

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

Differential Revision: D55433203

pytorch-bot bot commented Mar 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122762

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c4c3361 with merge base f2c1060:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the module: cpu (CPU specific problem, e.g. perf, algorithm) and release notes: quantization (release notes category) labels Mar 27, 2024
muchulee8 (Contributor, Author)

@muchulee8 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Mar 27, 2024
muchulee8 added a commit that referenced this pull request Mar 28, 2024
ghstack-source-id: ce0eb1b
Pull Request resolved: #122762

facebook-github-bot (Contributor)

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort and instead consider -i/--ignore-current, which continues the merge while ignoring current failures and allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Mar 28, 2024
Summary:
We add wrappers for fbgemm's packing so we can pass it through PT2 to the lowering phase of AOTInductor.

Test Plan:
Included in commit.
test_quantized_ops::test_wrapped_fbgemm_linear_fp16

Differential Revision: [D55433204](https://our.internmc.facebook.com/intern/diff/D55433204)
Pull Request resolved: #122763
Approved by: https://github.com/jerryzh168
ghstack dependencies: #122762
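The commit message above (for #122763, the next PR in this ghstack) describes wrapping fbgemm's fp16 packing behind regular ops so PT2 can trace it and hand it to AOTInductor. The sketch below only illustrates that general wrapping pattern with torch.library; the namespace, op name, and fp16-emulating body are hypothetical placeholders, not the actual wrappers from that PR.

```python
import torch
from torch.library import Library

# Hypothetical namespace and op name, purely for illustration; the real
# wrappers added in #122763 have their own names and schemas.
lib = Library("demo_quantized", "DEF")
lib.define("wrapped_linear_fp16(Tensor x, Tensor weight, Tensor bias) -> Tensor")

def wrapped_linear_fp16_impl(x, weight, bias):
    # Stand-in body: emulate "pack weight as fp16, then run the dynamic fp16
    # linear" by rounding the weight through fp16 and calling F.linear.
    w_fp16 = weight.to(torch.float16).to(weight.dtype)
    return torch.nn.functional.linear(x, w_fp16, bias)

# Registering under CompositeExplicitAutograd keeps the op usable with fake
# tensors, which is what the export / AOTInductor tracing path needs.
lib.impl("wrapped_linear_fp16", wrapped_linear_fp16_impl, "CompositeExplicitAutograd")

class M(torch.nn.Module):
    def forward(self, x, w, b):
        return torch.ops.demo_quantized.wrapped_linear_fp16(x, w, b)

# Because the packing hides behind an op with a plain tensor schema, the
# whole module can go through torch.export on its way to AOTInductor.
ep = torch.export.export(M(), (torch.randn(2, 4), torch.randn(3, 4), torch.randn(3)))
print(ep.graph)
```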
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
Differential Revision: [D55433203](https://our.internmc.facebook.com/intern/diff/D55433203)
Pull Request resolved: pytorch#122762
Approved by: https://github.com/jerryzh168
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
Differential Revision: [D55433204](https://our.internmc.facebook.com/intern/diff/D55433204)
Pull Request resolved: pytorch#122763
Approved by: https://github.com/jerryzh168
ghstack dependencies: pytorch#122762
github-actions bot deleted the gh/muchulee8/28/head branch April 28, 2024 01:54