
[dynamo] Add guards for traceable_tensor_subclass ctx metadata #114469

Closed

Conversation

jon-chuang
Collaborator

@jon-chuang jon-chuang commented Nov 23, 2023

Fixes #114405

  • Add a new storage to the guard function globals, ___stored_objs_by_id, which stores arbitrary objects to perform == testing on.
  • Store a deep copy of the __tensor_flatten__ ctx in ___stored_objs_by_id, to perform equality testing against the input tensor's ctx (sketched below).
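
A minimal sketch of the intended mechanism (illustrative only: the real guard is generated code, and install_ctx_guard is a hypothetical name):

```python
import copy

# Installed into the guard function's globals as ___stored_objs_by_id.
stored_objs_by_id = {}

def install_ctx_guard(value):
    # Freeze the subclass's __tensor_flatten__ ctx at guard-creation time...
    attrs, ctx = value.__tensor_flatten__()
    if ctx is not None:
        stored_objs_by_id[id(ctx)] = copy.deepcopy(ctx)
    # ...so that the emitted guard can later check, on every call, roughly:
    #   ___stored_objs_by_id[<id>] == x.__tensor_flatten__()[1]
```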

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @rohan-varma @aakhundov @kiukchung @LucasLLC


pytorch-bot bot commented Nov 23, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114469

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2ec50ba with merge base 1a3dbf5 (image):

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -734,6 +737,15 @@ def TENSOR_MATCH(self, guard: Guard, value=None):
        self.tensor_check_examples.append(value)
        self.tensor_check_guards.append(guard)

        # Nested Tensor ctx - "ragged_size" is a symint placeholder; "requires_grad" is guarded upon
Collaborator Author

cc @ezyang on whether this treatment of the symint ragged_size (i.e., ignoring it for guarding purposes) is the right one.

Contributor

Mentioned this in the other PR, but after #114311 lands, there should no longer be a need for special treatment of NestedTensor.

Contributor

Yep, now that the linked PR landed, all of the metadata returned by NestedTensor.__tensor_flatten__() is constant. So it should be fine not to special-case here.


@jon-chuang
Collaborator Author

jon-chuang commented Nov 23, 2023

Also possibly worth adding to the testing path: SparseSemiStructuredTensor

(Follow-up PR; it is currently broken and mistakenly not on the testing path, see #114477.)

if ctx is not None:
    # Assume that the ctx obeys object equality
    obj_store = self.get("G['___stored_objs_by_id']")
    obj_store[id(ctx)] = copy.deepcopy(ctx)
Collaborator Author

@jon-chuang jon-chuang Nov 23, 2023

Potentially controversial deepcopy; it is assumed that, apart from NestedTensor's symint, ctx objects are relatively simple (as is true for AsyncTensor, DTensor, and SparseSemiStructuredTensor).

Collaborator Author

@jon-chuang jon-chuang Nov 24, 2023

The alternative is to use the str repr, which would also be serializable (e.g., in line with #114499). It would work well for standard Python containers/types and dataclasses.

@voznesenskym do you think we should use object equality as the default, as per this PR, or a str repr?

str repr as the default seems risky to me.

Collaborator Author

The reason we need a copy is that the original ctx object may be mutated, so we need to freeze it at the time the guard is created.
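
A toy illustration of that failure mode, with hypothetical ctx contents:

```python
import copy

ctx = {"placements": ["Shard(0)"]}   # hypothetical mutable ctx
frozen = copy.deepcopy(ctx)          # frozen at guard-creation time
live_ref = ctx                       # what a guard without the copy would hold

ctx["placements"] = ["Shard(1)"]     # user code mutates the original later

assert live_ref == ctx               # live reference: the check is vacuous
assert frozen != ctx                 # deep copy: the guard correctly fails
```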

Contributor

I suppose you're right that we would get very strange behavior if the ctx object was mutated at some point and we referenced it directly in our guards.

One upside of str is that we want to (eventually) have more things be cacheable on disk, to improve compile times during warm start.

Are there asserts you think we should add to everything in the ctx that would convince us that str() or something similar is reasonable?
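
One possible shape for such an assert, as a minimal sketch; the helper name and the restriction to plain container/scalar types are assumptions, not anything from this PR:

```python
def assert_ctx_reprable(obj):
    """Hypothetical check: ctx must be built from plain types whose repr is stable."""
    if isinstance(obj, (int, float, bool, str, type(None))):
        return
    if isinstance(obj, (list, tuple, set)):
        for item in obj:
            assert_ctx_reprable(item)
        return
    if isinstance(obj, dict):
        for k, v in obj.items():
            assert_ctx_reprable(k)
            assert_ctx_reprable(v)
        return
    raise AssertionError(
        f"ctx element of type {type(obj).__name__} may not have a stable str/repr"
    )
```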

Contributor

Wondering: instead of saving the object directly or using the object's str, shall we use the object's hash instead?

The problem with str is that it might not be serializable in every case; i.e., if the object defines a __str__ in addition to __repr__, then str would give a readable representation instead of a serializable one.
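
A quick illustration of that pitfall (Placement here is a made-up class, not the DTensor one): str() prefers __str__ when it is defined, so the stored representation need not round-trip.

```python
class Placement:
    def __repr__(self):
        return "Placement(dim=0)"    # reconstruction-friendly form
    def __str__(self):
        return "shard on dim 0"      # human-readable, not reconstructable

p = Placement()
print(str(p))    # shard on dim 0   <- what str() picks when __str__ exists
print(repr(p))   # Placement(dim=0)
```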


# mesh is a DeviceMesh and Shard(i) shards along dim i (setup elided)
x = DTensor.from_local(torch.rand(2, 2), mesh, [Shard(0)], run_check=False)
x2 = DTensor.from_local(torch.rand(2, 2), mesh, [Shard(0)], run_check=False)
x3 = DTensor.from_local(torch.rand(2, 2), mesh, [Shard(1)], run_check=False)  # different sharding
Collaborator Author

cc @wanchaol for whether this behaviour is desirable/expected.

Contributor

Nice! Yeah, I think this is expected behavior: if the input sharding changes, we should trigger recompilation. Thanks for adding the tests!

jbschlosser added a commit that referenced this pull request Nov 28, 2023

Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted as there's no current use case)
* Add an `outer_size` argument to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size (you are expected to return a tensor that compares equal to the outer symbols)
    * Needed for `NestedTensor` at least
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Introduce `SubclassCreateSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: https://github.com/pytorch/pytorch/blob/6a86cf00adb071fe7200229cc1996dfefbb79289/torch/_dynamo/variables/builder.py#L1750

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu tianyu-l

[ghstack-poisoned]
jbschlosser added three more commits that referenced this pull request Nov 28, 2023, each with the same commit message as above.
@mikaylagawarecki mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Nov 29, 2023
jbschlosser added a commit that referenced this pull request Nov 29, 2023, with the same commit message as above.
jbschlosser added a commit that referenced this pull request Dec 1, 2023
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: https://github.com/pytorch/pytorch/blob/6a86cf00adb071fe7200229cc1996dfefbb79289/torch/_dynamo/variables/builder.py#L1750
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu tianyu-l

[ghstack-poisoned]
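
For concreteness, a hypothetical minimal subclass implementing the signatures above (ScaledTensor and its attributes are illustrative and omit __torch_dispatch__ handling; they are not part of this PR):

```python
import torch

class ScaledTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem, scale):
        # wrapper subclass sharing the inner tensor's metadata
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device
        )

    def __init__(self, elem, scale):
        self._elem = elem
        self._scale = scale  # plain float: simple, mutation-free ctx material

    def __tensor_flatten__(self):
        # attrs: inner tensor attribute names; ctx: rebuild metadata (guarded on)
        return ["_elem"], {"scale": self._scale}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
        # outer_size/outer_stride may be symbolic; the result must compare equal
        return ScaledTensor(inner_tensors["_elem"], ctx["scale"])
```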
jbschlosser added four more commits that referenced this pull request on Dec 2, four on Dec 4, and two on Dec 5, 2023, each with the same commit message as above.
pytorchmergebot pushed a commit that referenced this pull request Dec 5, 2023
(Same commit message as above.)

Pull Request resolved: #114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
@albanD albanD added the oncall: distributed label (Add this issue/PR to distributed oncall triage queue) and removed the module: distributed label Dec 8, 2023
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
(Same commit message as above, with pytorch#-prefixed cross-repo references.)
Pull Request resolved: pytorch#114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
@ezyang
Contributor

ezyang commented Dec 27, 2023

What's the status on this, @wanchaol @bdhirsh?

ctx = value.__tensor_flatten__()[1]  # type: ignore[attr-defined]
if ctx is not None:
    # Assume that the ctx obeys object equality
    obj_store = self.get("G['___stored_objs_by_id']")
Contributor

At some point (somewhere) we should add an assert that ctx never holds any symbolic types ([list of] Tensor/SymInt/SymFloat/SymBool). Probably not here though, if this is hot-path code.
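
A sketch of what that assert could look like, assuming it lives off the hot path (the helper name is hypothetical; pytree flattening is used for illustration):

```python
import torch
from torch.utils._pytree import tree_flatten

def assert_no_symbolic_ctx(ctx):
    # Illustrative: ensure nothing symbolic or tensor-valued leaked into ctx.
    leaves, _ = tree_flatten(ctx)
    for leaf in leaves:
        assert not isinstance(
            leaf, (torch.Tensor, torch.SymInt, torch.SymFloat, torch.SymBool)
        ), f"ctx must not hold symbolic/tensor values, got {type(leaf)}"
```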

@bdhirsh
Contributor

bdhirsh commented Dec 28, 2023

@jon-chuang sorry for the delay in reviewing. Left some comments but other than the deepcopy vs. str() discussion + a rebase, I think it's good + worth landing soon. Let me know when you have a chance to rebase and I can stamp.

@wanchaol
Contributor

wanchaol commented Jan 2, 2024

What's the status on this, @wanchaol @bdhirsh?

I remembered wanting to review this PR but couldn't find it. Thanks for the pointer, @ezyang.

Contributor

@wanchaol wanchaol left a comment

This looks good to me! I only have some comments to discuss regarding str, and let's fix the NestedTensor special-casing. I'll let @bdhirsh stamp :)





github-actions bot commented Mar 2, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot commented May 6, 2024

(Same stale notice as above.)

@github-actions github-actions bot added the Stale label May 6, 2024
@github-actions github-actions bot closed this Jun 5, 2024
Labels
ciflow/inductor module: dynamo module: guards oncall: distributed Add this issue/PR to distributed oncall triage queue open source Stale topic: not user facing topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__torch_dispatch__ + compile: extra guards
9 participants