Support factory kwargs in torch.nn modules #54508
Conversation
💊 CI failures summary and remediations
As of commit 044ff1d (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failure does not appear to be due to upstream breakages: pytorch_windows_vs2019_py36_cuda10.1_test1 (1/1), step "Test".
Looks like you got some merge conflicts.
torch/__init__.py
Outdated
@@ -472,6 +472,24 @@ def is_warn_always_enabled():
    """
    return _C._get_warnAlways()

def factory_kwargs(kwargs):
Need a docblock. Maybe something like:
"""
Given kwargs, returns a canonicalized dict of factory kwargs that can be directly passed
to factory functions like torch.empty, or errors if unrecognized kwargs are present.

This function makes it simple to write code like this::

    class MyModule(nn.Module):
        def __init__(self, **kwargs):
            factory_kwargs = torch.factory_kwargs(kwargs)
            self.weight = Parameter(torch.empty(10, **factory_kwargs))

Why should you use this function instead of just passing `kwargs` along directly?

1. This function does error validation, so if there are unexpected kwargs we will
   immediately report an error, instead of deferring it to the factory call.
2. This function supports a special `factory_kwargs` argument, which can be used to
   explicitly specify a kwarg to be used for factory functions, in the event one of
   the factory kwargs conflicts with an already existing argument in the signature
   (e.g., in the signature ``def f(dtype, **kwargs)``, you can specify ``dtype`` for
   factory functions, as distinct from the dtype argument, by saying
   ``f(dtype1, factory_kwargs={"dtype": dtype2})``).
"""
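A helper matching that docstring could be sketched like this (an illustrative implementation, not necessarily the exact code the PR landed):

```python
def factory_kwargs(kwargs):
    """Canonicalize factory kwargs per the docstring above (illustrative sketch)."""
    if kwargs is None:
        return {}
    simple_keys = {"device", "dtype", "memory_format"}
    expected_keys = simple_keys | {"factory_kwargs"}
    if not set(kwargs) <= expected_keys:
        raise TypeError(f"unexpected kwargs {set(kwargs) - expected_keys}")

    # Split off the plain factory kwargs, then merge in the explicit override dict,
    # rejecting conflicts between the two.
    r = {k: v for k, v in kwargs.items() if k in simple_keys}
    explicit = kwargs.get("factory_kwargs", {})
    if set(r) & set(explicit):
        raise TypeError(f"conflicting kwargs: {set(r) & set(explicit)}")
    r.update(explicit)
    return r
```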
Note that the block needs to be inside the function to be recognized as the doc for the function, IIRC.
Whoops, fixed!
torch/nn/modules/activation.py
Outdated
-    def _reset_parameters(self):
+    def reset_parameters(self):
Nit: idly wondering if we should define `_reset_parameters` calling to `reset_parameters` for BC, ha!
Haha, I thought the same, but decided against it originally because `_reset_parameters` is technically private? Happy to add it though!
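The backward-compatibility shim floated in that nit might look like this (a standalone illustrative class, not the real `nn.MultiheadAttention`):

```python
class MultiheadAttention:
    """Stand-in class illustrating the suggested BC alias."""

    def __init__(self):
        self.initialized = False
        # Qualified call, as discussed below: always run this class's version.
        MultiheadAttention.reset_parameters(self)

    def reset_parameters(self):
        # New public name for parameter (re)initialization.
        self.initialized = True

    def _reset_parameters(self):
        # Backward-compatible alias: the old private name delegates to the new one.
        self.reset_parameters()
```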
torch/nn/modules/adaptive.py
Outdated
        )

        self.tail.append(projection)

        if reset_parameters:
            self.reset_parameters()
This seems wrong; you don't have to explicitly reset parameters here because the propagated `reset_parameters` arguments should have handled it already.
Yep, good catch! Fixed.
torch/nn/modules/transformer.py
Outdated
        self.dropout1 = Dropout(dropout)
        self.dropout2 = Dropout(dropout)

        self.activation = _get_activation_fn(activation)

        if reset_parameters:
            self.reset_parameters()
Probably better not to have this one either
Fixed this one and TransformerDecoderLayer
I don't want to hold this too much on tests, but there may be some simple things we can do in the generic nn Module tests; e.g., instead of constructing the Module with just the known arguments, try also passing in device cpu.
torch/nn/modules/activation.py
Outdated
        else:
            self.bias_k = self.bias_v = None

        self.add_zero_attn = add_zero_attn

-        self._reset_parameters()
+        MultiheadAttention.reset_parameters(self)
This is ugly, and apparently necessary for any base class. With `self.reset_parameters()` instead, instantiating a subclass errors out:

1. `subclass.__init__()` is called
2. `super().__init__()` is called
3. In the base class constructor, `self.reset_parameters()` calls the overriding `subclass.reset_parameters()`
4. `subclass.reset_parameters()` tries to reset parameters or buffers that haven't been created yet - error

Maybe this syntax should be made universal so we don't have to decide which classes can be base classes. Or some other workaround?
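A minimal, PyTorch-free repro of that failure mode (all names here are illustrative):

```python
class Base:
    def __init__(self, qualified_call=True):
        self.weight = None
        if qualified_call:
            # Qualified call, as in the PR: always runs Base's version.
            Base.reset_parameters(self)
        else:
            # Unqualified call: dispatches to the subclass override too early.
            self.reset_parameters()

    def reset_parameters(self):
        self.weight = 1.0


class Sub(Base):
    def __init__(self, qualified_call=True):
        super().__init__(qualified_call)
        self.extra_size = 3          # subclass state created after super().__init__
        self.reset_parameters()      # safe here: all state now exists

    def reset_parameters(self):
        super().reset_parameters()
        # Reads subclass state that doesn't exist yet if reached from Base.__init__:
        self.extra = [0.0] * self.extra_size
```

`Sub()` succeeds, while `Sub(qualified_call=False)` raises `AttributeError` during the base constructor, which is exactly the sequence described in the comment.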
# Returns a database of args & kwargs that can be used to construct each module.
# Each entry is in class -> (args, kwargs) format.
# Example: torch.nn.Linear -> ([10, 5], {})
def build_constructor_arg_db():
Alternatively, there's a similar DB in `common_nn.py` or something. It didn't have constructor args for every module, and in some cases, I need to pick different args to ensure paths are taken that create params / buffers. Whenever `ModuleInfo` (analogous to the new `OpInfo`) comes about, this should be dealt with there.
+1 for ModuleInfo (just imagine if there was already a basic version!)
Why is this a function that returns a dict instead of just being a dict?
The function is left over from when I started with the `common_nn.py` `new_module_tests` and only filled in entries for the missing modules. Could make it just a dict, but I kinda liked the mildly functional style.
I'll rebase tomorrow :)
Thanks for working on this Joel!
The changes for factory kwargs look good to me.
Also the generic tests look quite good!
I am not as convinced by the reset_parameters changes though.
I feel like it is somewhere between being a new API on nn.Module and just some internal convention we use.
If we want it to be the second, then my thinking is:
- We should not modify the base nn.Module
- We shouldn't need to add empty methods everywhere (check if the child Module has that method or not?)
- Make that method internal so that it does not lead to issues if users implement another method with the same name
- You can use `.apply()` as well to run this method on all the Modules that have that method, without the need to implement custom functions on every single container.
If we actually decide to go with the first one, I think we should go all in and get all the benefits from such a significant change.
In particular, we want to make it clear to the user which type of Module they are implementing and using. And there should be a clear benefit for implementing a structured version. Being able to call the default initialization independently is one, but why can't we leverage that to automatically call it during initialization as well?
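The `.apply()`-based approach suggested in that review could be sketched as follows (a PyTorch-free illustration; in real code `Module` would be `torch.nn.Module`, whose `.apply()` already recurses over submodules):

```python
class Module:
    """Minimal stand-in exposing nn.Module's recursive .apply() interface."""

    def __init__(self):
        self._children = []

    def apply(self, fn):
        # Apply fn to every submodule, then to self (as nn.Module.apply does).
        for child in self._children:
            child.apply(fn)
        fn(self)
        return self


def reset_all_parameters(module):
    # Call reset_parameters on every submodule that defines it; modules
    # without parameters need no empty stub method.
    def fn(m):
        reset = getattr(m, "reset_parameters", None)
        if callable(reset):
            reset()
    module.apply(fn)
```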
torch/__init__.py
Outdated
    class MyModule(nn.Module):
        def __init__(self, **kwargs):
            factory_kwargs = torch.factory_kwargs(kwargs)
Why is this in `torch.` and not `torch.nn.`?
Good point - I'll move it to torch.nn
torch/nn/parameter.py
Outdated
@@ -145,8 +145,9 @@ class UninitializedParameter(UninitializedTensorMixin, Parameter):

    cls_to_become = Parameter

-    def __new__(cls, requires_grad=True):
-        data = torch.Tensor()
+    def __new__(cls, requires_grad=True, **kwargs) -> None:
Should we update some doc to mention that this takes more arguments than the regular `nn.Parameter()`?
I think so - the `UninitializedParameter` / `UninitializedBuffer` docs seem like the right place to do it.
@@ -55,18 +55,20 @@ class MultiheadAttention(nn.MultiheadAttention):
    def __init__(self, embed_dim: int, num_heads: int,
                 dropout: float = 0., bias: bool = True,
                 add_bias_kv: bool = False, add_zero_attn: bool = False,
-                kdim: int = None, vdim: int = None):
+                kdim: int = None, vdim: int = None, **kwargs) -> None:
+        factory_kwargs = torch.factory_kwargs(kwargs)
Why do you unpack them here? Why not just pass them through as is? You do this in a couple other places.
Auto-pilot updates :) I'll remove the unnecessary unpacking for subclasses
Adding @mruberry for the new testing infra.
This is not good because you cannot distinguish if a Module has parameters but forgot to implement `reset_parameters`.
While this seems reasonable, I am torn because I don't want a default implementation of this. Maybe a helper function for recursively calling reset parameters on all submodules is the doctor's order.
Can't - we historically didn't specify whether you call the super constructor first or last in the constructor, so I don't think we can assume that it is called after the parameters are set up.
Force-pushed from 7de5112 to 0912b88.
@@ -0,0 +1,330 @@
import inspect
To run the tests, add the file name here (line 31 in d4045e9): TESTS = [
And verify the tests appear in the PR CI output.
Thanks!
test/test_module_init.py
Outdated
# Example: torch.nn.Linear -> ([10, 5], {})
def build_constructor_arg_db():
    return {
        torch.nn.AdaptiveAvgPool1d: ([5], {}),
If someone adds a new module, will they know all the lists they need to add it to?
Any module in `__all__` for torch.nn modules (and the various quantization modules) will be tested here, and an error will be thrown indicating that an entry should be added here.
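An illustrative sketch (assumed, not the PR's exact code) of how a missing DB entry can surface as a clear error during lookup:

```python
def get_example_args(module_cls, constructor_arg_db):
    # Look up the constructor (args, kwargs) entry for a module class,
    # failing loudly when a newly added module has no entry yet.
    if module_cls not in constructor_arg_db:
        raise RuntimeError(
            f"No constructor args registered for {module_cls!r}; "
            "please add an entry to build_constructor_arg_db()")
    return constructor_arg_db[module_cls]
```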
test/test_module_init.py
Outdated
        torch.nn.AdaptiveAvgPool2d: ([5], {}),
        torch.nn.AdaptiveAvgPool3d: ([5], {}),
        torch.nn.AdaptiveLogSoftmaxWithLoss: ([100, 20, [5, 10, 15]], {}),
        torch.nn.AdaptiveMaxPool1d: ([5], {}),
Style nit (would not block PR on this or make this change if it's a nontrivial amount of work): prefer tuples to lists
Any particular reason? I don't mind doing the work to make the change, but it makes it harder to read imo
test/test_module_init.py
Outdated
# Instantiates the given class with the given args, kwargs, optionally on a given device.
def instantiate_class(cls, args, kwargs, device=None):
This is interesting. Why is device not part of kwargs?
`args` / `kwargs` are used to instantiate each type of module, and it simplified the calls to just add in device separately when it's needed. But you're right that I could alternatively add device to `kwargs`.
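A sketch of that helper under the design described (illustrative; the real test code may differ):

```python
def instantiate_class(cls, args, kwargs, device=None):
    # Instantiate cls with the DB-provided args/kwargs, mixing in a device
    # kwarg only when the test requests one.
    kwargs = dict(kwargs)          # don't mutate the shared DB entry
    if device is not None:
        kwargs["device"] = device
    return cls(*args, **kwargs)
```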
Whatever's easiest.
# Returns a function that calls the real implementation of a method
# in addition to passing args to a mock object.
def mock_wrapper(method):
I couldn't get `wraps` to work for this specific use case where I don't have a particular object. This wrapper is used below to hook into all parameter creations or buffer registrations. If you know how to use `wraps` for this, let me know and I'll change it.
Interesting. I suppose this can wait until it's an issue (and it may never be an issue).
# Returns a function that calls the real implementation of a method
# in addition to passing args to a mock object.
def mock_wrapper(method):
Why call the mock?
Calling the mock logs that the method was called
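One way such a wrapper might be written (an assumed sketch using `unittest.mock`, not necessarily the PR's exact code):

```python
from functools import wraps
from unittest import mock


def mock_wrapper(method):
    # Return a wrapper that records every call on an attached MagicMock while
    # still running the real implementation and returning its result.
    mock_obj = mock.MagicMock()

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        mock_obj(*args, **kwargs)            # log the call for later assertions
        return method(self, *args, **kwargs)

    wrapper.mock = mock_obj                  # expose the recorder to the test
    return wrapper
```

A test could then patch a method such as parameter registration with the wrapped version and assert on `wrapper.mock` afterwards.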
test/test_module_init.py
Outdated
    if module_creates_params_or_buffers and module_cls not in MODULES_WITHOUT_KWARGS_SUPPORT:
        args, kwargs = get_example_args(module_cls, constructor_arg_db, device=device)

        # if module_cls in LAZY_MODULES:
What's up with this comment?
This is testing logic for LazyTensors that should be put back in once meta tensor functionality is expanded. Not sure whether this PR should wait for that so the tests will be better?
I probably wouldn't wait (although you know better). I would just remove this comment (possibly replacing it with a TODO) or add a meta-comment explaining why this section is commented out.
Luckily enough, the support is in now so I uncommented it!
test/test_module_init.py
Outdated
    module_cls = getattr(mod_namespace, module_name)
    if module_cls in MODULES_TO_SKIP: continue

    # Create a function to run the test and setattr it onto the test class.
I'm not saying this should be the PR that creates ModuleInfo... but MAYBE this (or a follow-up) should be the PR that does it. In particular, generators like this that are lengthy and complicated was one of the reasons we wanted to switch to decorating test templates that look like more typical tests.
I for one am excited for `ModuleInfo`! As discussed offline, we'll meet sometime shortly after this PR to design what that could look like.
Also adding a TODO in the code to refactor the constructor arg DB into `ModuleInfo`.
test/test_module_init.py
Outdated
    test_name = f'test_{namespace_basename}_{module_name}'
    setattr(TestModuleInit, test_name, run_test)

instantiate_device_type_tests(test_cls, globals())
I would put this call on the current line 327 for readability and consistency with other test files
    return args, kwargs


def generate_tests(test_cls, constructor_arg_db):
Add a comment describing what this test tests
Test logic is actually below in `run_test()`, and its checks are commented.
I split the logic out for generating tests for a single module, so the separation between that and iterating over all the modules should be clearer now
Giving approval modulo testing; please look at mruberry's comments
Reverting (again) because it broke ASAN: https://app.circleci.com/pipelines/github/pytorch/pytorch/304608/workflows/d07158c1-b75a-49a1-a3c6-a41371d0bce3/jobs/12561730
This pull request has been reverted by 92d24e3.
Force-pushed from 60431cf to 5ee4848.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from 5ee4848 to 6dec5bf.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
ASAN is failing in the same way on the PR.
Yep, sorry for the mess. I thought the land would have stopped if the problem was unfixed, but it went right through :/ Unlanding now, and will make sure it's actually fixed before re-landing.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Continuation of pytorch#53144 Pull Request resolved: pytorch#54508 Reviewed By: mrshenli Differential Revision: D27600457 Pulled By: jbschlosser fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
Summary: Continuation of pytorch#53144 Pull Request resolved: pytorch#54508 Reviewed By: bdhirsh Differential Revision: D27855386 Pulled By: jbschlosser fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
Summary: Continuation of pytorch#53144 Pull Request resolved: pytorch#54508 Reviewed By: malfet Differential Revision: D27909732 Pulled By: jbschlosser fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
Summary: Continuation of pytorch#53144 Pull Request resolved: pytorch#54508 Reviewed By: albanD Differential Revision: D27939544 Pulled By: jbschlosser fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
Needed since PyTorch v1.9 (pytorch/pytorch#54508)
Continuation of #53144