Add tensor to fake clone snapshot for immutable source of truth #100128

voznesenskym · 2023-04-26T22:12:17Z

Stack from ghstack (oldest at bottom):

-> Add tensor to fake clone snapshot for immutable source of truth #100128

There's a longstanding, well-known mutability bug in Dynamo, #93610 (there are more issues, but this is the one I had at hand).

Ops that mutate tensors in place also mutate their corresponding FakeTensors.

For example, calling t_ on a tensor reverses its strides, so the corresponding FakeTensor's strides are reversed too. Say you are trying to torch.compile:

import torch

class F(torch.nn.Module):
    def forward(self, x, y):
        x = x.t_()
        y = y.t_()
        return (x + y,)

However, we recently started reading the fake_tensor memo/cache to get the symbolic shape values for sizes and strides at guard installation time.

This means that a tensor captured with a given size and stride, say size (3, 3) and stride (3, 1) for x above, will have its memo entry updated to size (3, 3), stride (1, 3). From then on, whenever you read this value, it reflects the tensor's current state in the trace rather than the state it had when tracing started.

This causes us to produce guards that are never valid; for the example above, that x.stride()[0] == 3.

The solution is to not allow mutation to affect the fake tensors we use as the source of truth here. We can do this by forcing a clone of the fake tensor at builder time and storing that clone as the source of truth for our dynamic sizes and strides during guard installation.
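To make the failure mode concrete, here is a minimal sketch in plain Python. `MetaTensor`, `memo`, and `snapshots` are hypothetical stand-ins for a FakeTensor, the fake tensor memo, and the clone map; none of them is Dynamo's real API. The sketch only illustrates why a copy taken at builder time keeps reporting the strides the input had when tracing started.

```python
import copy


class MetaTensor:
    """Hypothetical stand-in for a FakeTensor: metadata only, no data."""

    def __init__(self, size, stride):
        self.size = tuple(size)
        self.stride = tuple(stride)

    def t_(self):
        # In-place 2-D transpose: swap both sizes and strides.
        self.size = self.size[::-1]
        self.stride = self.stride[::-1]
        return self


# "Builder time": the input is wrapped and recorded.
x = MetaTensor(size=(3, 3), stride=(3, 1))
memo = {"x": x}                  # shares the object that tracing mutates
snapshots = {"x": copy.copy(x)}  # independent copy taken before tracing

# "Tracing": the traced program mutates x in place.
x.t_()

# "Guard installation": the memo reports post-mutation strides,
# while the snapshot still reports the strides seen on entry.
live_stride = memo["x"].stride
entry_stride = snapshots["x"].stride
print(live_stride, entry_stride)
```

Reading `memo` at guard time sees the transposed layout; reading `snapshots` sees the layout the caller actually passed in.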

cc @soumith @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

Finish clone map Lint [ghstack-poisoned]

pytorch-bot · 2023-04-26T22:12:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100128

✅ No Failures

As of commit ba90acc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Finish clone map Lint ghstack-source-id: f0b61d0952862be33b7580bb08d656d362d69f64 Pull Request resolved: #100128

…truth" Finish clone map Lint cc soumith penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

Finish clone map Lint ghstack-source-id: 81f7442b002462edbcc6de12fd7abb775ab0887c Pull Request resolved: #100128 Rm trash

Finish clone map Lint ghstack-source-id: 5f02091baaddf615c7e1502069e1c009d7cd3cb0 Pull Request resolved: #100128 Rm trash Replace w/ update

ezyang · 2023-04-27T02:07:38Z

torch/_dynamo/output_graph.py

@@ -87,6 +87,7 @@ class OutputGraphState(NamedTuple):
    param_name_to_source: Optional[Dict[str, Source]]
    side_effects: SideEffects
    timestamp: int
+    tensor_id_to_fake_clone: Dict[int, torch._subclasses.FakeTensor]


Can we please use a weakref instead of an id? This will help prevent awful bugs where the input Tensor is not kept live and then its id gets reused.

torch/utils/weak.py has the stuff you want
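A small sketch of the hazard being described, using only the standard library. `FakeInput` is a hypothetical stand-in for a tensor, and `weakref.WeakKeyDictionary` stands in for the identity-keyed weak dictionaries in torch/utils/weak.py (an assumption made so the example runs without torch).

```python
import gc
import weakref


class FakeInput:
    """Hypothetical stand-in for an input tensor; not a torch type."""

    def __init__(self, name):
        self.name = name


id_keyed = {}                              # keyed by id(obj): plain ints
weak_keyed = weakref.WeakKeyDictionary()   # keyed by the object, held weakly

a = FakeInput("a")
id_keyed[id(a)] = "metadata for a"
weak_keyed[a] = "metadata for a"

# The input dies while the caches are still around.
del a
gc.collect()

# The id-keyed entry is now stale: it outlives the object, and a later
# object may be handed the same id and silently pick up this metadata.
stale_entries = len(id_keyed)
# The weak-keyed entry was dropped together with the object.
live_entries = len(weak_keyed)
print(stale_entries, live_entries)
```

Whether a given id is actually reused depends on the allocator, which is what makes this class of bug intermittent.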

ezyang · 2023-04-27T02:08:59Z

torch/_dynamo/variables/builder.py

@@ -1250,6 +1250,7 @@ def wrap_to_fake_tensor_and_record(
        )
        if is_tensor and not (static_shapes and source.is_nn_module()):
            tx.output.tracked_fakes.append(TrackedFake(fake_e, source, constraint_dims))
+        tx.output.tensor_id_to_fake_clone[id(e)] = fake_e.clone()


Given how you're using this, there is no reason to clone the entire fake tensor. Just extract and save the sizes and strides.

Sure, we can do that.
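A sketch of what that suggestion could look like, with hypothetical names throughout (`SizesStrides`, `wrap_and_record`, `RealInput`, `MetaTensor`); this is not the code that landed. Only the two tuples needed for guards are stored, keyed weakly by the real input, so there is nothing left for a later in-place op to mutate.

```python
import weakref
from typing import NamedTuple, Tuple


class SizesStrides(NamedTuple):
    """Immutable record of the metadata needed for guards."""
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


class RealInput:
    """Hypothetical stand-in for the user's real tensor."""


class MetaTensor:
    """Hypothetical stand-in for a FakeTensor: metadata only."""

    def __init__(self, size, stride):
        self.size = tuple(size)
        self.stride = tuple(stride)

    def t_(self):
        # In-place 2-D transpose: swap both sizes and strides.
        self.size = self.size[::-1]
        self.stride = self.stride[::-1]
        return self


cache = weakref.WeakKeyDictionary()


def wrap_and_record(e, fake):
    # Store plain tuples rather than a clone of the whole fake tensor.
    cache[e] = SizesStrides(fake.size, fake.stride)
    return fake


e = RealInput()
fake = wrap_and_record(e, MetaTensor((3, 3), (3, 1)))
fake.t_()              # tracing mutates the fake tensor in place
recorded = cache[e]    # the record is unaffected
print(recorded, fake.stride)
```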

ezyang

This seems like a fine unblock. Please check comments.

Finish clone map Lint ghstack-source-id: 8b1c36ca7c4780921b4b7522823cb77e17b823db Pull Request resolved: #100128 Rm trash Replace w/ update Breaks

Finish clone map Lint ghstack-source-id: 168818ad91dbfe9d6e9bb6d430408693e516f0fa Pull Request resolved: #100128 Rm trash Replace w/ update Breaks Feedback

Finish clone map Lint ghstack-source-id: 070f7080a6ab63fe25caf5ce612f97e3838b5be5 Pull Request resolved: #100128 Rm trash Replace w/ update Breaks Feedback Lint

voznesenskym · 2023-04-27T21:22:25Z

@pytorchbot rebase

…o_sizes_strides with a weak dict" Spotted while working on getting output_graph.py to typecheck. The type hint indicates that it was intended to be initialized with a WeakIdKeyDictionary, but the actual runtime value was a regular dict. Not sure if there's some kind of test we should add for this fix. Looks like the code was originally added in #100128. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng [ghstack-poisoned]

…113412) Spotted while working on getting output_graph.py to typecheck. The type hint indicates that it was intended to be initialized with a WeakIdKeyDictionary, but the actual runtime value was a regular dict. Not sure if there's some kind of test we should add for this fix. Looks like the code was originally added in #100128. Pull Request resolved: #113412 Approved by: https://github.com/Skylion007, https://github.com/voznesenskym ghstack dependencies: #113413, #113518, #113519
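The mismatch described above is easy to miss because Python does not enforce annotations at runtime. A minimal sketch, assuming hypothetical `Graph` and `Key` classes and using `weakref.WeakKeyDictionary` in place of `WeakIdKeyDictionary` so it runs without torch:

```python
import gc
import weakref
from typing import Any


class Key:
    """Hypothetical weak-referenceable key (stand-in for a tensor)."""


class Graph:
    # The annotation promises a weak dictionary...
    cache: "weakref.WeakKeyDictionary[Key, Any]"

    def __init__(self, weak: bool):
        # ...but nothing stops a regular dict from being assigned here.
        # A type checker would flag the `{}` branch; the interpreter won't.
        self.cache = weakref.WeakKeyDictionary() if weak else {}


def key_survives(graph: Graph) -> bool:
    k = Key()
    graph.cache[k] = "sizes and strides"
    probe = weakref.ref(k)
    del k
    gc.collect()
    # True if the cache itself is keeping the key alive.
    return probe() is not None


strong_keeps_key = key_survives(Graph(weak=False))
weak_keeps_key = key_survives(Graph(weak=True))
print(strong_keeps_key, weak_keeps_key)
```

With the regular dict the cache holds a strong reference to every key, so entries are never released; with the weak dict they disappear once the key is gone.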

Fixes #118795

This is a graph-breaking partial fix for #120914. We still need -actual- module parametrization tracing support, but at least it doesn't blow up hard now.

**Background**: Module parametrization injects a property as the module parameter attribute. The property calls a `nn.Module` whose forward takes in a module parameter and returns a reparametrized module parameter.

Example:
```
import torch.nn as nn
from torch.nn.utils.parametrize import register_parametrization

class MyParametrization(nn.Module):
    def forward(self, X):
        # This reparametrization just negates the original parameter value
        return -X

m = nn.Linear(...)
p = MyParametrization()
register_parametrization(m, "weight", p)

# Accessing the "weight" attribute will invoke p's forward() on m's original weight and return the output as the new weight.
# m.weight here is now an injected property that does the above instead of an actual Parameter.
# This property is defined in torch/nn/utils/parametrize.py.
m.weight

# NB: Parametrization changes the module type (e.g. torch.nn.utils.parametrize.ParametrizedLinear)
print(type(m))
```

**Problem 1**: Dynamo has special tracing rules for things in `torch.nn`. Parametrizing a module changes the type of the module and the parametrized attribute, so now these rules wrongly affect tracing here.

To fix this:
* Graph break for things in torch/nn/utils/parametrize.py. Before, Dynamo's trace rules would try to inline everything in torch.nn, so this is an exception to that behavior.
* Avoid introducing a `call_module()` node when tracing the forward func of parametrized modules, preferring inlining. This is an exception to the general rule that modules in torch.nn should have `call_module()` nodes in the Dynamo graph for the forward func call.

**Problem 2**: The issue seen in #118795 is that Dynamo will see a dynamically constructed tensor when `m.weight` is called and introduce that to its `tensor_weakref_to_sizes_strides` cache during fake-ification. This tensor is also made to be a graph input, since it's a module parameter. When guards are created for this module parameter input, the logic calls `m.weight` again and tries to look the result up in the cache, but this is a different tensor now, giving the `KeyError` symptom.

To fix this:
* Replace Dynamo's `tensor_weakref_to_sizes_strides` cache with an `input_source_to_sizes_strides` cache.
    * This cache was originally introduced in #100128.

[ghstack-poisoned]

Fixes #118795

This is a graph breaking partial fix for #120914. We still need actual module parametrization tracing support, but at least it no longer blows up hard.

**Background**: Module parametrization injects a property as the module parameter attribute. That property calls an `nn.Module` whose forward takes in a module parameter and returns a reparametrized module parameter.

Example:

```
import torch.nn as nn
from torch.nn.utils.parametrize import register_parametrization

class MyParametrization(nn.Module):
    def forward(self, X):
        # This reparametrization just negates the original parameter value
        return -X

m = nn.Linear(...)
p = MyParametrization()
register_parametrization(m, "weight", p)

# Accessing the "weight" attribute will invoke p's forward() on m's original weight
# and return the output as the new weight.
# m.weight here is now an injected property that does the above instead of an actual Parameter.
# This property is defined in torch/nn/utils/parametrize.py.
m.weight

# NB: Parametrization changes the module type (e.g. torch.nn.utils.parametrize.ParametrizedLinear)
print(type(m))
```

**Problem 1**: Dynamo has special tracing rules for things in `torch.nn`. Parametrizing a module changes the type of the module and of the parametrized attribute, so these rules now wrongly affect tracing. To fix this:

* For parametrized modules, call `convert_to_unspecialized()` to restart analysis where Dynamo starts inlining the module.

**Problem 2**: The issue seen in #118795 is that Dynamo sees a dynamically constructed tensor when `m.weight` is called and adds it to its `tensor_weakref_to_sizes_strides` cache during fake-ification. This tensor is also made a graph input, since it is a module parameter. When guards are created for this module parameter input, the logic calls `m.weight` again and tries to look the result up in the cache, but this is now a different tensor, which produces the `KeyError` symptom. To fix this:

* Replace Dynamo's `tensor_weakref_to_sizes_strides` cache with an `input_source_to_sizes_strides` cache.
  * This cache was originally introduced in #100128.

Pull Request resolved: #121041

Approved by: https://github.com/anijain2305
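The `KeyError` described in Problem 2 can be imitated in plain Python, without PyTorch. The sketch below is only an illustration of the failure mode, not Dynamo's code: `FakeTensorLike`, `ParametrizedModule`, the two dictionaries, and the string key `"m.weight"` are all hypothetical stand-ins. It assumes the essential behavior is that a property builds a new object on every access, so a cache keyed by the object misses on the second access while a cache keyed by where the value came from does not.

```python
import weakref


class FakeTensorLike:
    """Hypothetical stand-in for a tensor; it only carries sizes and strides."""

    def __init__(self, size, stride):
        self.size = size
        self.stride = stride


class ParametrizedModule:
    """Hypothetical stand-in for a module with a parametrized attribute."""

    def __init__(self):
        self._original = FakeTensorLike((3, 3), (3, 1))

    @property
    def weight(self):
        # Like an injected parametrization property, this returns a
        # freshly constructed object on every access.
        return FakeTensorLike(self._original.size, self._original.stride)


m = ParametrizedModule()

# Cache keyed by the object itself (held weakly), filled on first access.
by_object = weakref.WeakKeyDictionary()
first = m.weight
by_object[first] = (first.size, first.stride)

# A later access produces a different object, so the lookup misses.
second = m.weight
object_lookup_hit = second in by_object

# Cache keyed by the source of the value (a plain string here, as an
# assumption for illustration): the lookup no longer depends on identity.
by_source = {"m.weight": (first.size, first.stride)}
source_lookup = by_source["m.weight"]
```

Indexing `by_object[second]` directly would raise `KeyError`, which mirrors the symptom; keying by source sidesteps the problem because the key stays the same across accesses.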

Add tensor to fake clone snapshot for immutable source of truth

fda5a7c

Finish clone map Lint [ghstack-poisoned]

voznesenskym added a commit that referenced this pull request Apr 26, 2023

Add tensor to fake clone snapshot for immutable source of truth

672b1cd

Finish clone map Lint ghstack-source-id: f0b61d0952862be33b7580bb08d656d362d69f64 Pull Request resolved: #100128

github-actions bot requested review from albanD, antoniojkim, bdhirsh, ezyang, jbschlosser, miladm, SherlockNoMad and wconstab April 26, 2023 22:12

github-actions bot added ciflow/inductor module: dynamo labels Apr 26, 2023

Update on "Add tensor to fake clone snapshot for immutable source of truth"

623e47f

Finish clone map Lint cc soumith penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

voznesenskym added a commit that referenced this pull request Apr 26, 2023

Add tensor to fake clone snapshot for immutable source of truth

a64a853

Finish clone map Lint ghstack-source-id: 81f7442b002462edbcc6de12fd7abb775ab0887c Pull Request resolved: #100128 Rm trash

albanD removed their request for review April 26, 2023 22:19

voznesenskym added a commit that referenced this pull request Apr 26, 2023

Add tensor to fake clone snapshot for immutable source of truth

7bdcb72

Finish clone map Lint ghstack-source-id: 5f02091baaddf615c7e1502069e1c009d7cd3cb0 Pull Request resolved: #100128 Rm trash Replace w/ update

ezyang reviewed Apr 27, 2023

ezyang approved these changes Apr 27, 2023

voznesenskym added a commit that referenced this pull request Apr 27, 2023

Add tensor to fake clone snapshot for immutable source of truth

01c27db

Finish clone map Lint ghstack-source-id: 8b1c36ca7c4780921b4b7522823cb77e17b823db Pull Request resolved: #100128 Rm trash Replace w/ update Breaks

eellison reviewed Apr 27, 2023

torch/_dynamo/variables/builder.py (outdated)

voznesenskym added the ciflow/trunk label Apr 27, 2023

pytorchmergebot added Merged and removed merging labels Apr 27, 2023

pytorchmergebot closed this in a145a33 Apr 27, 2023

facebook-github-bot deleted the gh/voznesenskym/122/head branch June 8, 2023 19:00

int3 mentioned this pull request Nov 10, 2023

[dynamo] Initialize tensor_weakref_to_sizes_strides with a weak dict #113412

Closed

jbschlosser mentioned this pull request Mar 1, 2024

Graph break cleanly in Dynamo for module parametrization #121041

Closed

pytorch-bot bot commented Apr 26, 2023

✅ No Failures
