how to mark KJT offsets as dynamic? #2202
This pull request was exported from Phabricator. Differential Revision: D59201188
Summary: Pull Request resolved: meta-pytorch#2202

# context

1. a KJT contains three necessary tensors: `_values`, `_lengths`, `_offsets`
   **a.** the shape of `_values` is independent
   **b.** dim(`_lengths`) = dim(`batch_size`) * const(`len(kjt.keys())`)
   **c.** dim(`_offsets`) = dim(`_lengths`) + 1
2. `_lengths` and `_offsets` can each be computed from the other, so a KJT usually stores only one of them in memory and computes the other when needed (see the sketch below)
3. previously only `_lengths` was marked as a dynamic shape, because `batch_size` and `len(kjt.keys())` are constant across iterations
4. however, when we declare a KJT with both `_values` and `_offsets` as dynamic shapes, it fails to pass the export function
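For concreteness, here is a minimal sketch of the shape relationships above, using torchrec's `KeyedJaggedTensor`; the key names and tensor values are made up for illustration:

```
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# 2 keys, batch_size = 3  =>  lengths has 2 * 3 = 6 entries
kjt = KeyedJaggedTensor.from_lengths_sync(
    keys=["f1", "f2"],
    values=torch.tensor([0, 1, 2, 3, 4, 5, 6, 7]),
    lengths=torch.tensor([2, 0, 1, 1, 1, 3]),  # sums to 8 == values.numel()
)

# (b) dim(lengths) = batch_size * len(keys)
assert kjt.lengths().numel() == kjt.stride() * len(kjt.keys())  # 6 == 3 * 2
# (c) dim(offsets) = dim(lengths) + 1; offsets is computed from lengths on demand
assert kjt.offsets().numel() == kjt.lengths().numel() + 1       # 7 == 6 + 1
```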
# notes

1. the `feature2` in the test has **NO** impact on the failure, because the export errors out before `feature2` is used
2. the error is purely due to the change that marks `_offsets` as dynamic

# investigation

* `_offsets` is set to `3 * batch_size + 1` as shown below:
```
{'features': [(<class 'torchrec.ir.utils.vlen1'>,), None, None, (<class 'torch.export.dynamic_shapes.3*batch_size1 + 1'>,)]}
```
* dynamic shape `s1` is created for `_offsets`, and dynamic shape `s2` is created for `batch_size`
* why is there no guard `s1 == 3*batch_size + 1`?
```
I0702 09:50:39.181000 140316068409792 torch/fx/experimental/symbolic_shapes.py:3575] create_symbol s1 = 7 for L['args'][0][0]._offsets.size()[0] [2, 12884901886] (_export/non_strict_utils.py:93 in fakify), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s1"
V0702 09:50:39.183000 140316068409792 torch/fx/experimental/symbolic_shapes.py:5189] eval False == False [statically known]
I0702 09:50:39.190000 140316068409792 torch/fx/experimental/symbolic_shapes.py:3575] create_symbol s2 = 2 for batch_size1 [2, 4294967295] (export/dynamic_shapes.py:569 in _process_equalities), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s2"
V0702 09:50:39.267000 140316068409792 torch/fx/experimental/symbolic_shapes.py:5189] eval ((s1 - 1)//3) >= 0 == True [statically known]
I0702 09:50:39.273000 140316068409792 torch/fx/experimental/symbolic_shapes.py:5104] eval Ne(((s1 - 1)//3), 0) [guard added] (_subclasses/functional_tensor.py:134 in __new__), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Ne(((s1 - 1)//3), 0)"
V0702 09:50:39.322000 140316068409792 torch/fx/experimental/symbolic_shapes.py:4736] _update_var_to_range s1 = VR[7, 7] (update)
I0702 09:50:39.330000 140316068409792 torch/fx/experimental/symbolic_shapes.py:4855] set_replacement s1 = 7 (range_refined_to_singleton) VR[7, 7]
```
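For reference, a standalone sketch of declaring such a derived dim with `torch.export.Dim` — this is not torchrec's actual `mark_dynamic_kjt` code; the module, tensor values, and the assumption of 3 keys are made up for illustration:

```
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, lengths, offsets):
        return lengths.sum() + offsets[-1]

batch = Dim("batch")  # the single root symbol, analogous to batch_size

# assume 3 keys: lengths has 3 * batch elements, offsets has one more
example_lengths = torch.ones(3 * 4, dtype=torch.int64)
example_offsets = torch.arange(3 * 4 + 1, dtype=torch.int64)

ep = export(
    M(),
    (example_lengths, example_offsets),
    dynamic_shapes={"lengths": {0: 3 * batch}, "offsets": {0: 3 * batch + 1}},
)
```

Both input dims derive from the same root `batch`, which is exactly the correlation the log above fails to establish for the KJT path: `s1` gets specialized to 7 instead of being replaced by `3*s2 + 1`.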
# resolve the issue

* there is an internal flag, `_allow_complex_guards_as_runtime_asserts=True`, that can support this correlation: guards relating multiple symbols (such as `s1 == 3*s2 + 1`) are emitted as runtime asserts instead of forcing a specialization; the flag is only available on the private `torch.export._trace._export` entry point, hence the changed call site below
* before
```
ep = torch.export.export(
    model,
    (feature1,),
    {},
    dynamic_shapes=collection.dynamic_shapes(model, (feature1,)),
    strict=False,
    # Allows KJT to not be unflattened and run a forward on unflattened EP
    preserve_module_call_signature=tuple(sparse_fqns),
)
```
* after
```
ep = torch.export._trace._export(
    model,
    (feature1,),
    {},
    dynamic_shapes=collection.dynamic_shapes(model, (feature1,)),
    strict=False,
    # Allows KJT to not be unflattened and run a forward on unflattened EP
    preserve_module_call_signature=tuple(sparse_fqns),
    _allow_complex_guards_as_runtime_asserts=True,
)
```

Differential Revision: D59201188
TroyGarden pushed a commit to TroyGarden/torchrec that referenced this pull request on Jul 27, 2024:
Summary: Pull Request resolved: meta-pytorch#2202

# context

* Discussed `ctx.new_dynamic_size()` with Angela, and she told me it just registers a brand-new dynamic shape for the batch_size. So I traced back why registering batch_size as a dynamic shape causes so many problems.
* All the complexity really comes from this line in serializer.py: `batch_size = id_list_features.stride()`.
* We get the batch_size from this function, which is derived from `_maybe_compute_stride_kjt` or other sources.
* In particular, the line `stride = lengths.numel() // len(keys)` explains a lot of the error messages I encountered before (a simplified sketch of this derivation follows this commit message).
* This correlation between `lengths.numel()` and the batch_size really complicates the dynamic-shape guards; to cope with it I made up many artificial workarounds in utils.py.

# details

* we still declare/register the dynamic shape in the `mark_dynamic_kjt` function,
* but pass the `_dynamic_batch_size` to the KJT and retrieve it from the serializer's `meta_forward`.

Differential Revision: D59201188
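Below is a simplified, hypothetical sketch of the stride derivation the commit message describes; the real logic lives in torchrec's `_maybe_compute_stride_kjt` and handles more cases:

```
from typing import List, Optional
import torch

def maybe_compute_stride_kjt_sketch(
    keys: List[str],
    stride: Optional[int],
    lengths: Optional[torch.Tensor],
    offsets: Optional[torch.Tensor],
) -> int:
    """Derive the per-key batch size (stride) when it is not given explicitly."""
    if stride is not None:
        return stride
    if lengths is not None:
        # The floor division that ties batch_size to lengths.numel() and
        # thereby complicates the dynamic-shape guards.
        return lengths.numel() // len(keys)
    if offsets is not None and offsets.numel() > 0:
        # offsets always has one more element than lengths
        return (offsets.numel() - 1) // len(keys)
    return 0
```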