Allow Partitioner to Force Dynamic Linear Computation #5338
Summary:
Motivation
A current drawback of XNNPACK is that weights are duplicated across delegate instances when they do not solely belong to one partition. Ops like LSTM reuse the same few weights and biases across multiple linear nodes, so the LSTM weight/bias tensors get duplicated for every instance of linear, which can blow up the serialized model size.
XNNPACK supports dynamic linear, in which weights are supplied at runtime rather than packed ahead of time (AoT). This lets us force the partitioner to leave the weights out of the partition, so the XNNPACK delegate does not own them and therefore does not duplicate them. At the moment this is only supported for FP32 weights, but it gives us a knob to trade some performance for smaller file sizes.
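A minimal sketch of how this could look from the export side, assuming the partitioner exposes an option such as `force_fp32_dynamic_linear`; the flag name and the surrounding export flow below are illustrative, not taken verbatim from this PR:

```python
# Hypothetical usage sketch. `force_fp32_dynamic_linear` is an assumed
# partitioner option name; the rest of the export flow follows the usual
# ExecuTorch to_edge / to_backend path.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge


class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 32)

    def forward(self, x):
        return self.linear(x)


model = TinyLinear().eval()
example_inputs = (torch.randn(1, 32),)

# Export, lower to edge dialect, and delegate to XNNPACK. With the assumed
# flag set, FP32 linear weights stay outside the delegate partition and are
# handed to XNNPACK at runtime (dynamic linear) instead of being packed AoT,
# so shared weights are not duplicated per delegate instance.
exported = torch.export.export(model, example_inputs)
edge = to_edge(exported)
edge = edge.to_backend(XnnpackPartitioner(force_fp32_dynamic_linear=True))
executorch_program = edge.to_executorch()
```

The trade-off is the one described above: dynamic linear skips AoT weight packing, so execution may be somewhat slower, in exchange for not storing a separate copy of the weights inside every delegate payload that references them.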
Differential Revision: D62621998