Fix TransformerEncoderLayer for bias=False #116760
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/116760
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (5 Unrelated Failures) As of commit c9db481 with merge base a8a9695:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
  # Samples below where we pass reference_fn are for validating the fast path,
  # since the fast path requires no_grad mode, we run the fast path in .eval()
  # and no_grad() in the reference_fn and verify that against the results in train mode.
  def fast_path_reference_fn(module, parameters, *args, **kwargs):
-     assert not module.training
-     module = module.train(True)
-     output = module(*args, **kwargs)
-     module = module.train(False)
+     assert module.training
+     module.train(False)
+     with torch.no_grad():
+         output = module(*args, **kwargs)
+     module.train(True)
      return output
- if not training:
-     for norm_first in (True, False):
+ if training:
+     for norm_first, bias in itertools.product((True, False), (True, False)):
          samples.append(
              ModuleInput(
-                 constructor_input=FunctionInput(4, 2, 8, dropout=0.0, batch_first=True, norm_first=norm_first),
+                 constructor_input=FunctionInput(
+                     4, 2, 8, dropout=0.0, batch_first=True, norm_first=norm_first, bias=bias
+                 ),
                  forward_input=FunctionInput(
                      make_input((2, 3, 4)),
                  ),
-                 reference_fn=fast_path_reference_fn,
-                 desc="fast_path_norm_first" if norm_first else "fast_path"
+                 # fastpath doesn't run when bias=False
+                 reference_fn=fast_path_reference_fn if bias else None,
+                 desc=f'fastpath_{bias}_norm_first_{norm_first}'
The existing fastpath test was not actually testing the fastpath against training mode, since the fastpath is only run in no_grad mode and `TestModule.test_forward` is not run under a no_grad context.

I modified this so that the outputs in train mode are compared against a reference that runs the fastpath in eval/no_grad mode.
> I modified this such that the outputs in train mode are compared against a reference that runs the fastpath in eval/no_grad mode

Possibly dumb Q: do we need to compare train mode outputs vs. the fastpath reference?
From offline discussion: this swaps the reference to use the fastpath so that it runs under no_grad mode; since dropout=0, the outputs should match.
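For illustration, a minimal standalone sketch (not from the PR itself) of the property this reference relies on: with `dropout=0.0`, the train-mode output and the eval/no_grad (fastpath-eligible) output of a `TransformerEncoderLayer` should agree.

```python
import torch
import torch.nn as nn

# Hypothetical check mirroring the test's idea: with dropout=0.0 the slow path
# (train mode) and the fastpath-eligible path (eval + no_grad) should agree.
layer = nn.TransformerEncoderLayer(d_model=4, nhead=2, dim_feedforward=8,
                                   dropout=0.0, batch_first=True)
x = torch.randn(2, 3, 4)  # (batch, seq, d_model)

layer.train()
out_train = layer(x)          # slow path: training mode, autograd enabled

layer.eval()
with torch.no_grad():
    out_fast = layer(x)       # eligible for the fused fastpath

print(torch.allclose(out_train, out_fast, atol=1e-5))  # expected: True
```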
Fixes #116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False` (which sets biases to `None`). This also prevents us from ever doing checks on properties of `tensor_args` in `TransformerEncoder`/`TransformerEncoderLayer` which contained the `None`s and was erroring on checks like `t.requires_grad for t in tensor_args`.

Alternative fix would be to
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases

Let me know if this approach is preferable

[ghstack-poisoned]
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/7405594863
Fixes #116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False` (which sets biases to `None`). This also prevents us from ever doing checks on properties of `tensor_args` in `TransformerEncoder`/`TransformerEncoderLayer` which contained the `None`s and was erroring on checks like `t.requires_grad for t in tensor_args`.

Alternative fix would be to
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if either of these approaches is preferable

[ghstack-poisoned]
Fixes #116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False`.

`bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with; it was my bad that this wasn't tested as I approved #101687. `bias=False` was causing the `tensor_args` in [`TransformerEncoder`](https://github.com/pytorch/pytorch/blob/a17de2d6455e262f9b514584443ac60cf381bc85/torch/nn/modules/transformer.py#L364-L378)/[`TransformerEncoderLayer`](https://github.com/pytorch/pytorch/blob/a17de2d6455e262f9b514584443ac60cf381bc85/torch/nn/modules/transformer.py#L663-L677) to contain `None`s and error on checks for the fastpath like `t.requires_grad for t in tensor_args`.

Alternative fix would be to
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if these approaches are preferable

[ghstack-poisoned]
-                 batch_first=True,
+                 batch_first=batch_first,
                  kwargs_to_batchify={'tgt_key_padding_mask': 0, 'memory_key_padding_mask': 0}),
              desc='no_batch_dim_batch_first'
It feels like `batch_first` should not affect `no_batch_dim`, so I'm not entirely sure why we have these tests, but I didn't delete them in case they are testing an edge case.
I do think it's a good black-box testing surface to cover. IIRC there was a problem at some point where some `batch_first` setting blew up on no-batch-dim inputs.
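As a rough illustration of that surface (my own sketch, assuming `TransformerDecoderLayer` accepts unbatched `(seq, feature)` inputs), `batch_first` should be a no-op for no-batch-dim inputs:

```python
import torch
import torch.nn as nn

# No-batch-dim inputs: tgt is (T, E) and memory is (S, E). The batch_first flag
# should not change anything for these shapes, which is what the test exercises.
for batch_first in (True, False):
    layer = nn.TransformerDecoderLayer(d_model=4, nhead=2, dim_feedforward=8,
                                       dropout=0.0, batch_first=batch_first)
    tgt = torch.randn(3, 4)      # (T, E), no batch dimension
    memory = torch.randn(5, 4)   # (S, E), no batch dimension
    out = layer(tgt, memory)
    assert out.shape == tgt.shape  # output keeps the unbatched target shape
```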
Thanks! The fix looks solid, just a couple of minor testing comments for the `ModuleInfo` tests. As long as things pass, I'm good with it :)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False`.

`bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with; it was my bad that this wasn't tested as I approved #101687. `bias=False` was causing the `tensor_args` in `TransformerEncoder`/`TransformerEncoderLayer` to contain `None`s and error on checks for the fastpath like `t.requires_grad for t in tensor_args`.

Alternative fix would be to
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if these approaches are preferable

Stack from ghstack (oldest at bottom):
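As a quick sanity check of the behavior described above (my own sketch, not part of the PR's test suite), constructing the layer with `bias=False` and running it under eval/no_grad should now fall back to the standard path instead of erroring:

```python
import torch
import torch.nn as nn

# With bias=False the attention and feedforward projections have no bias tensors,
# so the fused fastpath's preconditions are not met; the layer should simply take
# the standard (slow) path rather than raising (see #116385).
layer = nn.TransformerEncoderLayer(d_model=4, nhead=2, dim_feedforward=8,
                                   dropout=0.0, batch_first=True, bias=False)
layer.eval()

x = torch.randn(2, 3, 4)
with torch.no_grad():
    out = layer(x)  # previously errored; now runs via the standard path
print(out.shape)    # torch.Size([2, 3, 4])
```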