
Make DPMultiheadAttention drop-in compatible with nn.MultiheadAttention #529

Status: Closed · wants to merge 1 commit


Conversation


@Wei-1 (Author) commented Oct 25, 2022

Summary: This PR aims to resolve #123 on GitHub by adding a renaming mechanism to match the state_dict structure of nn.MultiheadAttention.

Differential Revision: D40671870
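
For context, a minimal sketch of the kind of renaming involved (not the PR's actual implementation): DPMultiheadAttention keeps separate query/key/value projection layers, so producing nn.MultiheadAttention-style keys mostly means renaming and, for the fused case, concatenating the three projections into in_proj_weight. The qlinear/klinear/vlinear key names below are assumptions for illustration only.

    import torch

    def to_vanilla_keys(dp_state_dict):
        """Sketch only: map DP-style keys to nn.MultiheadAttention-style keys.

        Assumes separate projections stored under qlinear/klinear/vlinear
        (illustrative names, not necessarily the module's real attributes).
        """
        sd = dict(dp_state_dict)
        out = {}
        # Fuse the three separate projection weights into one in_proj_weight.
        out["in_proj_weight"] = torch.cat(
            [sd.pop("qlinear.weight"), sd.pop("klinear.weight"), sd.pop("vlinear.weight")]
        )
        if "qlinear.bias" in sd:
            out["in_proj_bias"] = torch.cat(
                [sd.pop("qlinear.bias"), sd.pop("klinear.bias"), sd.pop("vlinear.bias")]
            )
        # Remaining keys (e.g. out_proj.weight, out_proj.bias) already match.
        out.update(sd)
        return out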

@facebook-github-bot added the CLA Signed and fb-exported labels on Oct 25, 2022
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870


@Wei-1 (Author) commented Oct 25, 2022

You can test the edited part by running pytest within the opacus folder.
And you should see the following expected result:

============= 162 passed, 41 skipped, 4411 warnings in 243.18s (0:04:03) =============

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870


@ffuuugor (Contributor) left a comment

Thanks for the great work! Getting this right is intricate, and your efforts are hugely appreciated.

@alexandresablayrolles @karthikprasad What are your thoughts on having state_dict not represent the actual physical structure of the model? Loading is not a problem, since DPMultiheadAttention can load both DP and vanilla state dictionaries.

On the one hand, it makes it easy to interoperate with vanilla nn.MultiheadAttention - you can train a model with DP, save its state_dict, and then load this dict into a non-DP model.

On the other hand, I can potentially see issues with an unexpected state_dict: keys not matching parameter names. That said, I don't see any immediate problems, but I might be missing something.
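
To make the interoperability point concrete, here is a minimal sketch of the round trip described above, assuming the renamed state_dict exactly matches nn.MultiheadAttention's keys (the constructor arguments are illustrative):

    import torch.nn as nn
    from opacus.layers import DPMultiheadAttention

    embed_dim, num_heads = 16, 4
    dp_attn = DPMultiheadAttention(embed_dim, num_heads)
    # ... DP training would happen here ...
    state = dp_attn.state_dict()  # with this PR, keys follow nn.MultiheadAttention

    attn = nn.MultiheadAttention(embed_dim, num_heads)
    attn.load_state_dict(state)  # should succeed once the key names match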

Comment on lines +164 to +259
if "in_proj_bias" in dp_attn.state_dict():
dp_attn._register_state_dict_hook(remove_in_proj_bias_hook)
self.assertFalse("in_proj_bias" in dp_attn.state_dict())
Contributor:

I'm not sure I understand the point of this assertion

Author:

The state_dict function is copied from the original nn.Module, and one capability it has is to apply hooks that modify the state_dict. This test validates that hooks still work with the new state_dict function in our code.
The reason I didn't use the hook mechanism as the core method to modify the state_dict is that I think it might cause confusion when people try to add, modify, or remove other hooks in their network.
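
For reference, a minimal sketch of what a hook like remove_in_proj_bias_hook in the test above could look like, based on PyTorch's state_dict hook convention (module, state_dict, prefix, local_metadata); the exact hook used in the test may differ:

    def remove_in_proj_bias_hook(module, state_dict, prefix, local_metadata):
        # State_dict hooks run after the dict is built and may modify it in
        # place (or return a replacement). Here we simply drop one key, as in
        # the test above.
        state_dict.pop(prefix + "in_proj_bias", None)
        return state_dict

    # Registered via the same private API used in the test:
    # dp_attn._register_state_dict_hook(remove_in_proj_bias_hook)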

Contributor:

Oh, I see, it is to check if state dict hooks still work, gotcha. Can you maybe add a comment explaining this?

@@ -126,3 +130,37 @@ def test_attn(
need_weights=True,
attn_mask=None,
)

attn.load_state_dict(dp_attn.state_dict())
Contributor:

I would put this in a separate test with independently initialized modules. First, this would make it conceptually clearer and easier to understand potential failures. Second, attn and dp_attn have been initialized with the same weights (line 96), which defeats the purpose of this test.
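
A rough sketch of the kind of separate test being suggested, with independently initialized modules (names, shapes, and tolerances are illustrative, not the PR's actual test; it assumes the test module's existing imports of torch, torch.nn as nn, and DPMultiheadAttention):

    def test_load_dp_state_dict_into_vanilla(self):
        embed_dim, num_heads = 16, 4
        dp_attn = DPMultiheadAttention(embed_dim, num_heads)
        attn = nn.MultiheadAttention(embed_dim, num_heads)  # independently initialized

        attn.load_state_dict(dp_attn.state_dict())

        q = k = v = torch.randn(5, 1, embed_dim)  # (seq_len, batch, embed_dim)
        out_dp, _ = dp_attn(q, k, v)
        out_nn, _ = attn(q, k, v)
        self.assertTrue(torch.allclose(out_dp, out_nn, atol=1e-6))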

Author:

Makes sense! I will make the modification tomorrow!

Comment on lines 394 to 395
def named_parameters(self, prefix: str = '', recurse: bool = True):
return self.state_dict(prefix = prefix).items()
Contributor:

Tbh, I would not change named_parameters - having this method not match the actual structure of the model would be very confusing and would lead to unexpected behaviour (see the sketch after this list):

  • named_parameters and parameters should return the same set of parameters
  • sometimes you need to modify the output of these methods, and you want them to point to the actual objects used in the model
  • it would mess with calling model attributes directly
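
For illustration only, a small sketch of the mismatch the first bullet describes, assuming a dp_attn instance with the proposed named_parameters override in place (this is not code from the PR, and nn refers to torch.nn):

    # With the override, named_parameters() yields entries built by state_dict()
    # (renamed, and possibly concatenated), not the module's actual Parameter
    # objects, so the two views below can disagree.
    override_names = sorted(name for name, _ in dp_attn.named_parameters())
    actual_names = sorted(name for name, _ in nn.Module.named_parameters(dp_attn))
    print(override_names)  # keys shaped like nn.MultiheadAttention's state_dict
    print(actual_names)    # the module's real parameter names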

Author:

I think that is a valid concern.
Currently, in opacus/tests/dp_layers/common.py (line 220), we check whether the two named_parameters match.

        nn_params = dict(nn_module.named_parameters())
        dp_params = dict(dp_module.named_parameters())

How about we just change them to:

        nn_params = nn_module.state_dict()
        dp_params = dp_module.state_dict()

since getting named_parameters and then converting them back to a dict is a bit odd as well.

Contributor:

Yeah, that makes sense.

@Wei-1 (Author) commented Oct 25, 2022

On the other hand, I can potentially see issues with an unexpected state_dict: keys not matching parameter names. That said, I don't see any immediate problems, but I might be missing something.

As we try to cover the parameter naming logic of both nn.MultiheadAttention and DPMultiheadAttention, I think the major problem might be maintenance. Since the entire transformation logic is rule-based, things will most likely break when the naming scheme of nn.MultiheadAttention or DPMultiheadAttention changes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870

@Wei-1 (Author) commented Oct 26, 2022

@ffuuugor Changes have been made to address the concerns. Please let me know if this makes sense to you!

@Wei-1 requested a review from ffuuugor on October 27, 2022
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870


@ffuuugor (Contributor)

Thanks for addressing the comments!
I believe this PR is close to landing, but we need to sort out one thing first. Due to a bug in how CircleCI works with Phabricator, our main testing pipeline is not being triggered on this PR.

I can see that it won't pass the linter check. Please refer to our Contributor's guide and run the isort/black/flake8 commands to check that your code is formatted properly.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870

@Wei-1 (Author) commented Oct 28, 2022

Thanks, @ffuuugor! An update has been made to address lint consistency with black/isort/flake8.

@ffuuugor (Contributor) commented Nov 1, 2022

Hey,
thanks for taking care of this.
One last thing: sometimes isort gives different recommendations depending on the version. I have mine set up exactly as CircleCI does, and it gives the following:

--- a/opacus/layers/dp_multihead_attention.py
+++ b/opacus/layers/dp_multihead_attention.py
@@ -14,14 +14,13 @@
 # limitations under the License.

 import warnings
+from collections import OrderedDict

 import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from torch.nn.parameter import Parameter

-from collections import OrderedDict
-

 class SequenceBias(nn.Module):

Can you please make the change to keep the linter happy?

And I'm really sorry for the back and forth on this. Tests not triggering for fbcode-exported PRs is painful, and we're investigating.

Make DPMultiheadAttention drop-in compatible with nn.MultiheadAttention (#529)

Summary:
Pull Request resolved: #529

This PR aims to resolve #123 on GitHub by adding a renaming mechanism to match the `state_dict` structure of `nn.MultiheadAttention`.

GitHub Issue Link: #123

Differential Revision: D40671870

fbshipit-source-id: b1e2a4526bde8c53fc01e30c65a33e8615e6ecce
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40671870

@ffuuugor (Contributor) left a comment

Thank you!

@Wei-1 (Author) commented Nov 1, 2022

I just pushed a new version to address this issue! Please let me know if everything is good!

Labels
CLA Signed, fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DPMultiheadAttention is not drop-in compatible with nn.MultiheadAttention
3 participants