
Add parametrization version of weight_norm #103001

Closed
ezyang wants to merge 3 commits

Conversation

Contributor

@ezyang ezyang commented Jun 5, 2023

Stack from ghstack (oldest at bottom):

This is done in the ordinary way, but also:

  • Deprecation warning for the old API, and a migration guide
  • Backwards compatibility for state_dict loading the old weight_norm
  • Test for pickling and deepcopy, which was the motivating reason

weight_norm is still used by HuggingFace Wav2Vec2.
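A minimal sketch of the migration (assuming the new entry point lands under `torch.nn.utils.parametrizations.weight_norm`, alongside the deprecated `torch.nn.utils.weight_norm`):

```python
import copy
import torch.nn as nn
from torch.nn.utils import weight_norm                                      # old, deprecated API
from torch.nn.utils.parametrizations import weight_norm as weight_norm_new  # parametrized version

old = weight_norm(nn.Linear(20, 40), name="weight")       # registers weight_g / weight_v plus hooks
new = weight_norm_new(nn.Linear(20, 40), name="weight")   # registers a parametrization on "weight"

# deepcopy/pickle is the motivating case: the parametrized module survives it, while the
# old hook-based version is what triggers the "Only Tensors created explicitly by the
# user (graph leaves) support the deepcopy protocol" error quoted later in this thread.
new_copy = copy.deepcopy(new)
```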

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
@pytorch-bot

pytorch-bot bot commented Jun 5, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103001

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1cd8a97:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Jun 5, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: 62f063e486e5141d225f19de597892e6c3e12fd6
Pull Request resolved: #103001
@ezyang added the "release notes: nn" and "topic: new features" labels Jun 5, 2023
ezyang added a commit to ezyang/transformers that referenced this pull request Jun 5, 2023
See pytorch/pytorch#103001

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ezyang added a commit that referenced this pull request Jun 5, 2023
This is not actually enabled in the benchmark suite as you need #103001, and training is also broken per #101160, but might as well review this part first.

Contains #102979 but I will probably rebase past that once it lands.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

ghstack-source-id: c19a37cb6b5fe5c6d7434a406a10e3633efac430
Pull Request resolved: #103009
@ezyang
Contributor Author

ezyang commented Jun 5, 2023

cc @vadimkantorov

which uses the modern parametrization API. The new ``weight_norm`` is compatible
with ``state_dict`` generated from old ``weight_norm``.

Migration guide:
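For concreteness, a hedged sketch of the compatibility claim above (names follow the old and new entry points; this assumes the ``state_dict`` compat hook discussed further down the thread):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm
from torch.nn.utils.parametrizations import weight_norm as weight_norm_new

old_m = weight_norm(nn.Linear(20, 40))        # state_dict has weight_g / weight_v keys
new_m = weight_norm_new(nn.Linear(20, 40))    # state_dict keys live under parametrizations.weight

# The compat hook remaps the old keys, so an old checkpoint loads into the new module.
new_m.load_state_dict(old_m.state_dict())
```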
Collaborator

10/10 migration guide.

torch/nn/utils/parametrizations.py
Comment on lines 302 to 304
# TODO: is the .data necessary?
weight_g = torch.norm_except_dim(weight, 2, self.dim).data
weight_v = weight.data
Collaborator

Pretty sure it's not necessary in the first one. Not sure about the second one as it's written.

That being said, weight_v should be weight divided by weight_g. When it's written that way, I'm pretty sure you don't need the .data.

Contributor Author

Hmm, but that is not what the original code does?

        module.register_parameter(name + '_g', Parameter(norm_except_dim(weight, 2, dim).data))
        module.register_parameter(name + '_v', Parameter(weight.data))
        setattr(module, name, fn.compute_weight(module))

Contributor Author

I reread the paper and not dividing is correct. What is confusing in the docs is that we refer to a direction vector, but we do not actually represent the direction vector directly (because it is constrained to have unit norm). So classic weight norm also reparametrizes the direction vector via v (the original weight vector), with the direction now represented as v / || v ||.
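A small sketch of that reparametrization, using the same primitives the diff touches (`torch.norm_except_dim` and `torch._weight_norm`): only `v` and `g` are stored, and the effective weight is recovered as `w = g * v / ||v||`.

```python
import torch

v = torch.randn(4, 3)                                # stored "v": the original weight
g = torch.norm_except_dim(v, 2, 0)                   # stored "g": norm over all dims except dim 0

w_manual = g * v / torch.norm_except_dim(v, 2, 0)    # g * direction, with direction = v / ||v||
w_fused = torch._weight_norm(v, g, 0)                # the fused op weight_norm dispatches to
assert torch.allclose(w_manual, w_fused)             # (at initialization g == ||v||, so w == v)
```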

Collaborator

Right, had a look at the paper and indeed it's as you mention.

@ezyang added the "ciflow/trunk" label (trigger trunk jobs on your pull request) Jun 6, 2023
Collaborator

@albanD albanD left a comment

Thanks!

and one specifying the direction.

By default, with ``dim=0``, the norm is computed independently per output
channel/plane. To compute a norm over the entire weight tensor, use
Collaborator

I know this was here before, but dim=None maps to dim=-1, which computes the norm over the last dimension, no?

Contributor Author

No, I believe weight_norm is special cased for this weird behavior lol.

>>> v = torch.randn(2, 2)
>>> g = torch.randn(())
>>> torch._weight_norm(v, g, 0)
tensor([[ 0.0577, -0.0345],
        [-0.0000,  0.0000]])
>>> torch._weight_norm(v, g, 1)
tensor([[0.0589, 0.0000],
        [0.0325, 0.0000]])
>>> torch._weight_norm(v, g, -1)
tensor([[ 0.0489, -0.0293],
        [ 0.0270, -0.0233]])

Contributor

This may also be a good moment to eliminate the older implicit behavior, and maybe simply require passing an explicit dim?

@ezyang
Contributor Author

ezyang commented Jun 6, 2023

@pytorchbot merge -f "irrelevant failures only"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

_weight_norm = _WeightNorm(dim)
parametrize.register_parametrization(module, name, _weight_norm, unsafe=True)

def _weight_norm_compat_hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs):
Contributor

@vadimkantorov vadimkantorov Jun 6, 2023

also, should installation of this hook be optional? (to avoid surprising interaction / influence with other hooks)

One way for this could be simply renaming _WeightNorm to WeightNorm and making it public

Contributor Author

Hmm, yes, I can add a kwarg. register_compat_hook? What do you think the default should be? Right now I default compat to help people migrate, but maybe an argument could be made the other way around?

Contributor

And maybe still remove the underscore from _WeightNorm?

Contributor Author

@lezcano can you explain to us why the preexisting parametrizations are underscore'd on their module?

Collaborator

The idea is that they are not exposed as public API (that API is not documented) hence the underscore.

Collaborator

So, it's already possible to compose parametrizations just by registering several parametrizations on the same parameter. I also think that there are already people doing exactly the two other things that @vadimkantorov described with the current API. I am not sold on what would be won by making the module public.

We can do some archaeology, but I'm pretty sure it was @albanD (lord of the public and private API) that suggested making these modules private in the first place. The idea was exactly the one Ed described: it's not clear what the benefits of making this public are, so better to keep the API surface small whenever possible.

Contributor

Yes, but I propose that people directly use parametrize.register_parametrization for that. Deprecating the special method weight_norm(...) and asking people to use parametrize.register_parametrization directly would highlight this compositionality, especially if this helper now does nothing except create a _WeightNorm and call parametrize.register_parametrization.
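For illustration only, a hedged sketch of what that direct spelling could look like if the parametrization module were public; `MyWeightNorm` below is a hypothetical stand-in, not the `_WeightNorm` class added in this PR:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class MyWeightNorm(nn.Module):
    """Hypothetical public weight-norm parametrization (illustrative sketch)."""
    def __init__(self, dim=0):
        super().__init__()
        self.dim = dim

    def forward(self, g, v):
        # recombine the two stored tensors into w = g * v / ||v||
        return torch._weight_norm(v, g, self.dim)

    def right_inverse(self, weight):
        # split an existing weight into (g, v) when the parametrization is registered
        return torch.norm_except_dim(weight, 2, self.dim), weight

m = nn.Linear(20, 40)
parametrize.register_parametrization(m, "weight", MyWeightNorm(dim=0), unsafe=True)
```

The helper in this PR is essentially those last two lines, which is the point about the compositionality already being there.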

Contributor

@vadimkantorov vadimkantorov Jun 8, 2023

I think in earlier days making them private was more justified because it was the move to the new parametrization interface (and the old way was calling a helper method), but now it's more stable and it makes sense to promote the direct use of parametrization modules like _WeightNorm...

Regarding the API surface, I think there are no such severe reservations for regular modules, and having a clean module in the API is better than an obscure helper method (traditionally in PyTorch such helper methods were a sign of some "magic" / hooks, at least for me, and public modules were a sign of mature, compositional designs).

I agree that it's already compositional; I just propose to highlight this a bit more by making those module classes public, with parametrize.register_parametrization as the recommended way.

Sorry for the somewhat subjective adjectives :) This is opinionated, of course.

Collaborator

I do prefer these helper methods. I did something similar in my library https://github.com/Lezcano/geotorch/, and I very much prefer the functional API to the one where you use register_parametrization directly. But yeah, this is very subjective indeed.

Contributor

Btw, register_parametrization seems to return the module, so it could be used in a quasi-functional way as well! :)
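E.g., reusing the illustrative `MyWeightNorm` from the sketch above (a usage note only, not a recommended API):

```python
layer = parametrize.register_parametrization(
    nn.Linear(20, 40), "weight", MyWeightNorm(dim=0), unsafe=True
)  # register_parametrization returns the module, so the call chains
```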

sgugger pushed a commit to huggingface/transformers that referenced this pull request Jun 6, 2023
* Use new parametrization based weight norm if available

See pytorch/pytorch#103001

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

* handle copies

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

* black

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

---------

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2133/head branch June 9, 2023 14:16
facebook-github-bot pushed a commit to pytorch/executorch that referenced this pull request Jun 13, 2024
…tensor (#3948)

Summary:
Pull Request resolved: #3948

As title, to fix the error when exporting/lowering lstm, the error message is:
```
Cell In[13], line 3
      1 from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
----> 3 edge_manager = edge_manager.to_backend(XnnpackPartitioner())
      5 print(edge_manager.exported_program())
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/program/_program.py:1166, in EdgeProgramManager.to_backend(self, partitioner)
   1164 else:  # apply partitioner to every method
   1165     for name, program in self._edge_programs.items():
-> 1166         new_edge_programs[name] = to_backend(program, partitioner)
   1168 config = EdgeCompileConfig(_check_ir_validity=False)
   1169 return EdgeProgramManager(
   1170     new_edge_programs, copy.deepcopy(self._config_methods), config
   1171 )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:384, in _(edge_program, partitioner_instance)
    381 for tag, _ in partitioner_result.partition_tags.items():
    382     _maybe_duplicate_constant_nodes(tagged_exported_program, tag, edge_program)
--> 384 tagged_graph_module = _partition_and_lower(
    385     tagged_exported_program.graph_module, partitioner_result, edge_program
    386 )
    388 # TODO(angelayi): Update this signature in a less manual way (maybe through
    389 # retracing)
    390 new_signature, new_state_dict, new_constants = _get_new_signature(
    391     edge_program,
    392     tagged_graph_module,
    393 )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:299, in _partition_and_lower(tagged_graph_module, partition_result, owning_program)
    290 def _partition_and_lower(
    291     tagged_graph_module: torch.fx.GraphModule,
    292     partition_result: PartitionResult,
    293     owning_program: ExportedProgram,
    294 ) -> torch.fx.GraphModule:
    295     """
    296     Partitions the graph module into submodules based on tags, and then lowered the nodes with the same tag as one lowered module, including the submodule from control flow
    297     """
--> 299     partitioned_module = _partition_and_lower_one_graph_module(
    300         tagged_graph_module, partition_result, owning_program
    301     )
    303     # Recursively partition and lower for submodules
    304     for name, submod, _node in get_control_flow_submodules(partitioned_module):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:230, in _partition_and_lower_one_graph_module(tagged_graph_module, partition_result, owning_program)
    224 logging.debug(f"Partitioned graph module: {tagged_graph_module}")
    226 submodule_program = create_exported_program_from_submodule(
    227     submodule, owning_program, tag
    228 )
--> 230 lowered_submodule = to_backend(
    231     delegation_spec.backend_id,
    232     submodule_program,
    233     delegation_spec.compile_specs,
    234 )
    236 # call delegate args should only use user_inputs
    237 call_delegate_args = []
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/executorch/exir/backend/backend_api.py:113, in _(backend_id, edge_program, compile_specs)
    111 for cls in BackendDetails.__subclasses__():
    112     if backend_id == cls.__name__:
--> 113         copied_edge_program = copy.deepcopy(edge_program)
    114         preprocess_result: PreprocessResult = cls.preprocess(
    115             copied_edge_program,
    116             compile_specs,
    117         )
    118         lowered_module = LoweredBackendModule(
    119             edge_program=edge_program,
    120             backend_id=backend_id,
    121             processed_bytes=preprocess_result.processed_bytes,
    122             compile_specs=compile_specs,
    123         )
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:172, in deepcopy(x, memo, _nil)
    170                 y = x
    171             else:
--> 172                 y = _reconstruct(x, memo, *rv)
    174 # If is its own copy, don't memoize.
    175 if y is not x:
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:271, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    269 if state is not None:
    270     if deep:
--> 271         state = deepcopy(state, memo)
    272     if hasattr(y, '__setstate__'):
    273         y.__setstate__(state)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/runtime/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)
File /mnt/xarfuse/uid-27416/e8d86d8d-seed-nspid4026533405_cgpid10356714-ns-4026533402/torch/_tensor.py:86, in Tensor.__deepcopy__(self, memo)
     84     return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
     85 if not self.is_leaf:
---> 86     raise RuntimeError(
     87         "Only Tensors created explicitly by the user "
     88         "(graph leaves) support the deepcopy protocol at the moment.  "
     89         "If you were attempting to deepcopy a module, this may be because "
     90         "of a torch.nn.utils.weight_norm usage, "
     91         "see pytorch/pytorch#103001"
     92     )
     93 if id(self) in memo:
     94     return memo[id(self)]
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.  If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see pytorch/pytorch#103001
```
The reason is that the constant tensor has a grad_fn when this is run without the context manager.

Reviewed By: angelayi

Differential Revision: D58436236

fbshipit-source-id: 565810b662f0eb086d02ffff9e16d7c86c0304c4