
Add parametrization version of weight_norm #103001

Closed
ezyang wants to merge 3 commits

Conversation

Contributor

@ezyang ezyang commented Jun 5, 2023

Stack from ghstack (oldest at bottom):

This is done in the ordinary way, but also:

  • Deprecation warning for the old API, and a migration guide
  • Backwards compatibility for state_dict loading the old weight_norm
  • Test for pickling and deepcopy, which was the motivating reason

weight_norm is still used by HuggingFace Wav2Vec2.
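A minimal sketch of the migration (assuming the new entry point lands under `torch.nn.utils.parametrizations.weight_norm`, alongside the deprecated `torch.nn.utils.weight_norm`):

```python
import copy
import torch.nn as nn
from torch.nn.utils import weight_norm                                      # old, deprecated API
from torch.nn.utils.parametrizations import weight_norm as weight_norm_new  # parametrized version

old = weight_norm(nn.Linear(20, 40), name="weight")       # registers weight_g / weight_v plus hooks
new = weight_norm_new(nn.Linear(20, 40), name="weight")   # registers a parametrization on "weight"

# deepcopy/pickle is the motivating case: the parametrized module survives it, while the
# old hook-based version is what triggers the "Only Tensors created explicitly by the
# user (graph leaves) support the deepcopy protocol" error quoted later in this thread.
new_copy = copy.deepcopy(new)
```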

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
@pytorch-bot

pytorch-bot bot commented Jun 5, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103001

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1cd8a97:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Jun 5, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: 62f063e486e5141d225f19de597892e6c3e12fd6
Pull Request resolved: #103001
@ezyang added the "release notes: nn" and "topic: new features" labels Jun 5, 2023
ezyang added a commit to ezyang/transformers that referenced this pull request Jun 5, 2023
See pytorch/pytorch#103001

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ezyang added a commit that referenced this pull request Jun 5, 2023
This is not actually enabled in the benchmark suite as you need #103001, and training is also broken per #101160, but might as well review this part first.

Contains #102979 but I will probably rebase past that once it lands.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: c19a37cb6b5fe5c6d7434a406a10e3633efac430
Pull Request resolved: #103009
@ezyang
Contributor Author

ezyang commented Jun 5, 2023

cc @vadimkantorov

which uses the modern parametrization API. The new ``weight_norm`` is compatible
with ``state_dict`` generated from old ``weight_norm``.

Migration guide:
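For concreteness, a hedged sketch of the compatibility claim above (names follow the old and new entry points; this assumes the ``state_dict`` compat hook discussed further down the thread):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm
from torch.nn.utils.parametrizations import weight_norm as weight_norm_new

old_m = weight_norm(nn.Linear(20, 40))        # state_dict has weight_g / weight_v keys
new_m = weight_norm_new(nn.Linear(20, 40))    # state_dict keys live under parametrizations.weight

# The compat hook remaps the old keys, so an old checkpoint loads into the new module.
new_m.load_state_dict(old_m.state_dict())
```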
Collaborator

10/10 migration guide.

torch/nn/utils/parametrizations.py
Comment on lines 302 to 304
# TODO: is the .data necessary?
weight_g = torch.norm_except_dim(weight, 2, self.dim).data
weight_v = weight.data
Collaborator

Pretty sure it's not necessary in the first one. Not sure about the second one as it's written.

That being said, weight_v should be weight divided by weight_g. When it's written that way, I'm pretty sure you don't need the .data.

Contributor Author

Hmm, but that is not what the original code does?

        module.register_parameter(name + '_g', Parameter(norm_except_dim(weight, 2, dim).data))
        module.register_parameter(name + '_v', Parameter(weight.data))
        setattr(module, name, fn.compute_weight(module))

Contributor Author

I reread the paper and not dividing is correct. What is confusing in the docs is that we refer to a direction vector, but we do not actually represent the direction vector directly (because it is constrained to have unit norm). So classic weight norm also reparametrizes the direction vector via v (the original weight vector), with the direction now represented as v / || v ||.
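A small sketch of that reparametrization, using the same primitives the diff touches (`torch.norm_except_dim` and `torch._weight_norm`): only `v` and `g` are stored, and the effective weight is recovered as `w = g * v / ||v||`.

```python
import torch

v = torch.randn(4, 3)                                # stored "v": the original weight
g = torch.norm_except_dim(v, 2, 0)                   # stored "g": norm over all dims except dim 0

w_manual = g * v / torch.norm_except_dim(v, 2, 0)    # g * direction, with direction = v / ||v||
w_fused = torch._weight_norm(v, g, 0)                # the fused op weight_norm dispatches to
assert torch.allclose(w_manual, w_fused)             # (at initialization g == ||v||, so w == v)
```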

Collaborator

Right, had a look at the paper and indeed it's as you mention.

@ezyang added the "ciflow/trunk" label (trigger trunk jobs on your pull request) Jun 6, 2023
Collaborator

@albanD albanD left a comment

Thanks!

and one specifying the direction.

By default, with ``dim=0``, the norm is computed independently per output
channel/plane. To compute a norm over the entire weight tensor, use
Collaborator

I know this was here before, but dim=None maps to dim=-1, which computes the norm over the last dimension, no?

Contributor Author

No, I believe weight_norm is special cased for this weird behavior lol.

>>> v = torch.randn(2, 2)
>>> g = torch.randn(())
>>> torch._weight_norm(v, g, 0)
tensor([[ 0.0577, -0.0345],
        [-0.0000,  0.0000]])
>>> torch._weight_norm(v, g, 1)
tensor([[0.0589, 0.0000],
        [0.0325, 0.0000]])
>>> torch._weight_norm(v, g, -1)
tensor([[ 0.0489, -0.0293],
        [ 0.0270, -0.0233]])

Contributor

This may also be a good moment to eliminate the older implicit behavior, and maybe simply require passing an explicit dim?

@ezyang
Contributor Author

ezyang commented Jun 6, 2023

@pytorchbot merge -f "irrelevant failures only"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

_weight_norm = _WeightNorm(dim)
parametrize.register_parametrization(module, name, _weight_norm, unsafe=True)

def _weight_norm_compat_hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs):
Contributor

@vadimkantorov vadimkantorov Jun 6, 2023

also, should installation of this hook be optional? (to avoid surprising interaction / influence with other hooks)

One way for this could be simply renaming _WeightNorm to WeightNorm and making it public

Contributor Author

Hmm, yes, I can add a kwarg. register_compat_hook? What do you think the default should be? Right now I default compat to help people migrate, but maybe an argument could be made the other way around?

Contributor

And maybe still remove the underscore from _WeightNorm?

Contributor Author

@lezcano can you explain to us why the preexisting parametrizations are underscore'd on their module?

Collaborator

The idea is that they are not exposed as public API (that API is not documented) hence the underscore.

Collaborator

So, it's already possible to compose parametrizations just by registering several parametrizations on the same parameter. I also think that there are already people doing exactly the two other things that @vadimkantorov described with the current API. I am not sold on what would be won by making the module public.

We can do some archaeology, but I'm pretty sure it was @albanD (lord of the public and private API) that suggested making these modules private in the first place. The idea was exactly the one Ed described: it's not clear what the benefits of making this public are, so better to keep the API surface small whenever possible.

Contributor

Yes, but I propose that people directly use parametrize.register_parametrization for that. Deprecating the special method weight_norm(...) and asking people to use parametrize.register_parametrization directly would highlight this compositionality, especially if this helper now does nothing except create a _WeightNorm and call parametrize.register_parametrization.
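For illustration only, a hedged sketch of what that direct spelling could look like if the parametrization module were public; `MyWeightNorm` below is a hypothetical stand-in, not the `_WeightNorm` class added in this PR:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class MyWeightNorm(nn.Module):
    """Hypothetical public weight-norm parametrization (illustrative sketch)."""
    def __init__(self, dim=0):
        super().__init__()
        self.dim = dim

    def forward(self, g, v):
        # recombine the two stored tensors into w = g * v / ||v||
        return torch._weight_norm(v, g, self.dim)

    def right_inverse(self, weight):
        # split an existing weight into (g, v) when the parametrization is registered
        return torch.norm_except_dim(weight, 2, self.dim), weight

m = nn.Linear(20, 40)
parametrize.register_parametrization(m, "weight", MyWeightNorm(dim=0), unsafe=True)
```

The helper in this PR is essentially those last two lines, which is the point about the compositionality already being there.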

Contributor

@vadimkantorov vadimkantorov Jun 8, 2023

I think in earlier days making them private was more justified because it was the move to the new parametrization interface (and the old way was calling a helper method), but now it's more stable and it makes sense to promote the direct use of parametrization modules like _WeightNorm...

Regarding the API surface, I think there are no such severe reservations for regular modules, and having a clean module in the API is better than an obscure helper method (traditionally in PyTorch such helper methods were a sign of some "magic" / hooks, at least for me, and public modules were a sign of mature, compositional designs).

I agree that it's already compositional; I just propose to highlight this a bit more by making those module classes public, with parametrize.register_parametrization as the recommended way.

Sorry for the somewhat subjective adjectives :) This is opinionated, of course.

Collaborator

I do prefer these helper methods. I did something similar in my library https://github.com/Lezcano/geotorch/, and I very much prefer the functional API to the one where you use register_parametrization directly. But yeah, this is very subjective indeed.

Contributor

Btw, register_parametrization seems to return the module, so it could be used in a quasi-functional way as well! :)
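E.g., reusing the illustrative `MyWeightNorm` from the sketch above (a usage note only, not a recommended API):

```python
layer = parametrize.register_parametrization(
    nn.Linear(20, 40), "weight", MyWeightNorm(dim=0), unsafe=True
)  # register_parametrization returns the module, so the call chains
```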

sgugger pushed a commit to huggingface/transformers that referenced this pull request Jun 6, 2023
* Use new parametrization based weight norm if available

See pytorch/pytorch#103001

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

* handle copies

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

* black

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

---------

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2133/head branch June 9, 2023 14:16
facebook-github-bot pushed a commit to pytorch/executorch that referenced this pull request Jun 13, 2024
…tensor (#3948)

Summary:
Pull Request resolved: #3948

As title, to fix the error when exporting/lowering lstm, the error message is:
```
Cell In[13], line 3
      1 from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
----> 3 edge_manager = edge_manager.to_backend(XnnpackPartitioner())
      5 print(edge_manager.exported_program())
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/program/_program.py:1166, in EdgeProgramManager.to_backend(self, partitioner)
   1164 else:  # apply partitioner to every method
   1165     for name, program in self._edge_programs.items():
-> 1166         new_edge_programs[name] = to_backend(program, partitioner)
   1168 config = EdgeCompileConfig(_check_ir_validity=False)
   1169 return EdgeProgramManager(
   1170     new_edge_programs, copy.deepcopy(self._config_methods), config
   1171 )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:384, in _(edge_program, partitioner_instance)
    381 for tag, _ in partitioner_result.partition_tags.items():
    382     _maybe_duplicate_constant_nodes(tagged_exported_program, tag, edge_program)
--> 384 tagged_graph_module = _partition_and_lower(
    385     tagged_exported_program.graph_module, partitioner_result, edge_program
    386 )
    388 # TODO(angelayi): Update this signature in a less manual way (maybe through
    389 # retracing)
    390 new_signature, new_state_dict, new_constants = _get_new_signature(
    391     edge_program,
    392     tagged_graph_module,
    393 )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:299, in _partition_and_lower(tagged_graph_module, partition_result, owning_program)
    290 def _partition_and_lower(
    291     tagged_graph_module: torch.fx.GraphModule,
    292     partition_result: PartitionResult,
    293     owning_program: ExportedProgram,
    294 ) -> torch.fx.GraphModule:
    295     """
    296     Partitions the graph module into submodules based on tags, and then lowered the nodes with the same tag as one lowered module, including the submodule from control flow
    297     """
--> 299     partitioned_module = _partition_and_lower_one_graph_module(
    300         tagged_graph_module, partition_result, owning_program
    301     )
    303     # Recursively partition and lower for submodules
    304     for name, submod, _node in get_control_flow_submodules(partitioned_module):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:230, in _partition_and_lower_one_graph_module(tagged_graph_module, partition_result, owning_program)
    224 logging.debug(f"Partitioned graph module: {tagged_graph_module}")
    226 submodule_program = create_exported_program_from_submodule(
    227     submodule, owning_program, tag
    228 )
--> 230 lowered_submodule = to_backend(
    231     delegation_spec.backend_id,
    232     submodule_program,
    233     delegation_spec.compile_specs,
    234 )
    236 # call delegate args should only use user_inputs
    237 call_delegate_args = []
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:113, in _(backend_id, edge_program, compile_specs)
    111 for cls in BackendDetails.__subclasses__():
    112     if backend_id == cls.__name__:
--> 113         copied_edge_program = copy.deepcopy(edge_program)
    114         preprocess_result: PreprocessResult = cls.preprocess(
    115             copied_edge_program,
    116             compile_specs,
    117         )
    118         lowered_module = LoweredBackendModule(
    119             edge_program=edge_program,
    120             backend_id=backend_id,
    121             processed_bytes=preprocess_result.processed_bytes,
    122             compile_specs=compile_specs,
    123         )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:172, in deepcopy(x, memo, _nil)
    170                 y = x
    171             else:
--> 172                 y = _reconstruct(x, memo, *rv)
    174 # If is its own copy, don't memoize.
    175 if y is not x:
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:271, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    269 if state is not None:
    270     if deep:
--> 271         state = deepcopy(state, memo)
    272     if hasattr(y, '__setstate__'):
    273         y.__setstate__(state)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/torch/_tensor.py:86, in Tensor.__deepcopy__(self, memo)
     84     return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
     85 if not self.is_leaf:
---> 86     raise RuntimeError(
     87         "Only Tensors created explicitly by the user "
     88         "(graph leaves) support the deepcopy protocol at the moment.  "
     89         "If you were attempting to deepcopy a module, this may be because "
     90         "of a torch.nn.utils.weight_norm usage, "
     91         "see pytorch/pytorch#103001"
     92     )
     93 if id(self) in memo:
     94     return memo[id(self)]
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.  If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see pytorch/pytorch#103001
```
The reason is that the constant tensor has a grad_fn when this is run without the context manager.

Reviewed By: angelayi

Differential Revision: D58436236

fbshipit-source-id: 565810b662f0eb086d02ffff9e16d7c86c0304c4