@mcr229 commented Sep 13, 2024

Summary:

Motivation

A current drawback of XNNPACK is that weights are duplicated across delegate instances if they do not solely belong to one partition. Ops like LSTM use the same few weights and biases in multiple linear nodes. This can blow up the size of an LSTM model, because the LSTM weight/bias has to be duplicated for every instance of linear.

XNNPACK has a dynamic linear in which weights are given at runtime rather than packed ahead of time (AoT). This allows us to force the partitioner to not partition weights, so the XNNPACK delegate does not own the weights and thus does not duplicate them. This is only supported for FP32 weights at the moment, but we can leverage it to trade slower performance for smaller file sizes.
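To make the size trade-off concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not ExecuTorch or XNNPACK code: the `LinearNode` class, the helper functions, and the layer shapes are all hypothetical, and it assumes the worst case where every linear node lands in its own delegate instance.

```python
from dataclasses import dataclass

FP32_BYTES = 4  # bytes per FP32 element


@dataclass(frozen=True)
class LinearNode:
    """Hypothetical stand-in for one linear node reading a named weight and bias."""
    weight_name: str
    out_features: int
    in_features: int

    def param_bytes(self) -> int:
        # Weight matrix plus bias vector, both stored as FP32.
        return (self.out_features * self.in_features + self.out_features) * FP32_BYTES


def size_when_delegate_owns_weights(nodes):
    """Assumed worst case: each node's delegate instance stores its own copy."""
    return sum(node.param_bytes() for node in nodes)


def size_when_weights_stay_outside(nodes):
    """Weights are supplied at runtime, so each distinct tensor is stored once."""
    unique = {node.weight_name: node.param_bytes() for node in nodes}
    return sum(unique.values())


# Illustrative LSTM unrolled over 16 steps: every step reuses the same two
# weight/bias pairs (input-to-hidden and hidden-to-hidden). Shapes are made up.
steps, hidden, inputs = 16, 256, 128
nodes = []
for _ in range(steps):
    nodes.append(LinearNode("lstm.w_ih", 4 * hidden, inputs))
    nodes.append(LinearNode("lstm.w_hh", 4 * hidden, hidden))

duplicated = size_when_delegate_owns_weights(nodes)
shared = size_when_weights_stay_outside(nodes)
print(f"duplicated: {duplicated} bytes, shared: {shared} bytes")
```

Under these assumptions the duplicated size grows linearly with the number of linear nodes, while the shared size stays fixed at one copy per distinct weight. The runtime cost of not pre-packing the weights is not modeled here.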

Differential Revision: D62621998

pytorch-bot bot commented Sep 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5338

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b566387 with merge base aa1bcc3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Sep 13, 2024
This pull request was exported from Phabricator. Differential Revision: D62621998

Summary:
Pull Request resolved: pytorch#5338


This pull request has been merged in 71602a0.

@mcr229 deleted the export-D62621998 branch July 25, 2025 22:43