
BT enablement on fairseq - fairseq change #4480

Closed
wants to merge 8 commits

Conversation

frank-wei

Summary:
As titled; depends on D36057338.
This change forks the inference path inside the forward function: if a checkpoint has been loaded and we are running inference, the BetterTransformer (BT) fast path is used; otherwise the standard fairseq path takes over.

In summary:
Accuracy: there is a small accuracy loss due to fp16; the maximum diff is around 0.009. With fp32 there is no accuracy loss.
Perf: the current fairseq runs at a similar speed to the vanilla version. After the enablement, the speedup matches the standalone BT test.
With batch size = 64:
V100: 1.23x speedup
A100: 1.38x speedup

After enabling nested tensors:
V100: 2.46x speedup

Reviewed By: mikekgfb

Differential Revision: D37082681
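
For illustration, a minimal sketch of the kind of fork described above. The class, flag, and helper names (EncoderLayerSketch, cfg_checkpoint_loaded, _bt_fast_path, _fairseq_path) are hypothetical, and the fast-path branch is left as a placeholder rather than the real fused BT kernel:

import torch
import torch.nn as nn


class EncoderLayerSketch(nn.Module):
    """Illustrative only: forward() forks between a BT-style inference fast
    path and the regular fairseq-style path."""

    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Hypothetical flag: set once the module's weights come from a checkpoint.
        self.cfg_checkpoint_loaded = False

    def forward(self, x, key_padding_mask=None):
        can_use_fast_path = (
            not self.training                    # inference only
            and self.cfg_checkpoint_loaded       # weights loaded from a checkpoint
            and not torch.jit.is_scripting()     # fast path only in eager mode here
        )
        if can_use_fast_path:
            return self._bt_fast_path(x, key_padding_mask)
        return self._fairseq_path(x, key_padding_mask)

    def _bt_fast_path(self, x, key_padding_mask):
        # Placeholder: the real change dispatches to the fused BetterTransformer
        # kernel here; this sketch just reuses the regular computation.
        return self._fairseq_path(x, key_padding_mask)

    def _fairseq_path(self, x, key_padding_mask):
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        return self.norm(x + attn_out)


layer = EncoderLayerSketch().eval()
layer.cfg_checkpoint_loaded = True
out = layer(torch.randn(2, 5, 512))  # shape (2, 5, 512); takes the placeholder fast path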

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D37082681

Summary: Pull Request resolved: facebookresearch#4480 (identical to the pull request description above)
Reviewed By: mikekgfb
Differential Revision: D37082681
fbshipit-source-id: 507c22e7fb7ca0d86962adfb85d83fd8e0c6b71a

Summary: Pull Request resolved: facebookresearch#4480 (identical to the pull request description above)
Reviewed By: mikekgfb
Differential Revision: D37082681
fbshipit-source-id: e7adf3ed6f07ad1e72e0b86cf8f6cedfe2731305

attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4

if self.training:
    self.ever_training = True
Contributor

Why have a variable that is the same as self.training?

Author

That is for the scenario of loading a checkpoint and continuing training. We want to prevent that case from taking the fast path.
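
A small runnable sketch of the distinction (hypothetical class and attribute names): self.training only reflects the current mode, while ever_training is sticky, so a run that loads a checkpoint and continues training never drops into the fast path, even when temporarily switched to eval() for validation.

import torch.nn as nn


class GatedModule(nn.Module):
    """Illustrative only: a sticky ever_training flag gating a fast path."""

    def __init__(self):
        super().__init__()
        self.ever_training = False  # sticky, unlike self.training

    def forward(self, x):
        if self.training:
            self.ever_training = True
        # Fast path only for pure-inference runs that have never trained.
        use_fast_path = not self.training and not self.ever_training
        return x, use_fast_path


m = GatedModule().eval()
print(m(0))        # (0, True): inference-only run may use the fast path
m.train(); m(0)    # a fine-tuning step flips the sticky flag
m.eval()
print(m(0))        # (0, False): eval during fine-tuning stays on the regular path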

@@ -113,7 +140,9 @@ def _test_save_and_load(self, scripted_module):
JIT_MSG = "Targeting OSS scriptability for the 1.6 release"


@unittest.skipIf(torch.__version__ < "1.6.0", JIT_MSG)
@unittest.skipIf(
Contributor

You can add the necessary PyTorch version to the CI testing to make sure this test runs and passes.

Author

I can add a nightly version such as 1.13.0.dev20220613 as one more test.

Contributor

That sounds good to me
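
For reference, a hedged sketch of such a version gate (the threshold, message, and helper below are assumptions, not the code in this PR); naive string comparison of torch.__version__ can misorder versions, so packaging.version is used here:

import unittest

import torch
from packaging import version

BT_MSG = "BetterTransformer fast path requires a recent PyTorch nightly"


def torch_at_least(min_version: str) -> bool:
    # Drop any local build suffix such as "+cu113" before comparing.
    return version.parse(torch.__version__.split("+")[0]) >= version.parse(min_version)


@unittest.skipIf(not torch_at_least("1.13.0.dev20220613"), BT_MSG)
class TestBTFastPath(unittest.TestCase):
    def test_version_gate(self):
        # Only runs on PyTorch builds new enough for the BT fast path.
        self.assertTrue(torch_at_least("1.13.0.dev20220613"))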

@frank-wei
Author

frank-wei commented Jun 16, 2022

@dianaml0 The workflow creation could be separate from this PR, so I landed it first.
I am not allowed to create any workflow or YAML file; I hit this error:
new_flow -> new_flow (refusing to allow a Personal Access Token to create or update workflow .github/workflows/build_1_13.yml without workflow scope)

Could you help create a workflow with a new YAML file? It should be 99% the same as your current build.yml, but with line 31 changed to:

 run: pip3 install --pre torch==1.13.0.dev20220613 -f https://download.pytorch.org/whl/nightly/torch_nightly.html 

@facebook-github-bot
Contributor

This pull request has been reverted by 956fcf4.

lzzk pushed a commit to lzzk/fairseq that referenced this pull request Jul 24, 2022
Summary: Pull Request resolved: facebookresearch#4480 (identical to the pull request description above)
Reviewed By: mikekgfb
Differential Revision: D37082681
fbshipit-source-id: 984266f850fc30603e48be56e41ac2c67da080f5