[transformer] BT enablement on fairseq - pytorch change #79186
Conversation
Dr. CI: ✅ No failures (0 pending) as of commit 9c62434.
This pull request was exported from Phabricator. Differential Revision: D36057338
Summary:
X-link: pytorch/pytorch#79186
Pull Request resolved: facebookresearch#4468

As titled. Fork the inference path inside the forward function: if a checkpoint file has been loaded and we are running inference, we deploy BT; otherwise, fairseq takes over.

In summary:
Accuracy: there is an accuracy loss due to fp16; the maximum diff is around 0.009. If we set it to fp32, there is no accuracy loss.
Perf: the current fairseq has similar speed to the vanilla version. After the enablement, the speedup is similar to the standalone BT test. With batch size = 64:
- For V100, the speedup reaches 1.23x
- For A100, the speedup reaches 1.38x
After enabling nested tensors, the V100 speedup reaches 2.46x.

Reviewed By: mikekgfb
Differential Revision: D36057338
fbshipit-source-id: 229e72e6050bf70ddedcda8f47d158526910557f
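For illustration, a minimal sketch of what such a fork inside an encoder layer's forward could look like. The class, attribute, and flag names below are assumptions for this sketch, not the actual fairseq implementation:

```python
import torch
import torch.nn as nn
from typing import Optional


class EncoderLayerWithBTFastPath(nn.Module):
    """Illustrative sketch only: fork between a BetterTransformer (BT) fast
    path at inference time and the regular fairseq-style path otherwise."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        # Stand-in for the fused BT encoder layer (hypothetical wiring).
        self.bt_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Assumed flag: set to True after a checkpoint has been loaded.
        self.loaded_from_checkpoint = False

    def forward(self, x: torch.Tensor, padding_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Deploy BT only for inference on a loaded checkpoint:
        # eval mode and no autograd.
        if self.loaded_from_checkpoint and not self.training and not torch.is_grad_enabled():
            return self.bt_layer(x, src_key_padding_mask=padding_mask)
        # Otherwise fall back to the ordinary computation, so training
        # behaviour is unchanged.
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=padding_mask)
        return x + attn_out
```

Here `self.bt_layer` merely stands in for the fused fast-path module; the point is that the fork is decided inside forward from the loaded-checkpoint flag plus inference mode.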
LGTM!
Confused about this - I just see this PR adding a `_nested_tensor_from_mask_left_aligned` op. Some questions:
- How does this relate to the PR summary? AFAICT just adding this op won't enable its use.
- How does this differ from `_nested_tensor_from_mask` - does it just avoid checks by assuming the mask is left-aligned? If so, couldn't we avoid quite a bit of duplication?
The diff is now split into two parts. Part 1 only includes this op change and is associated with D36057338. Part 2 only includes the fairseq change (D37082681) but depends on Part 1. The internal CI tests show the op runs well.
It is the front part of the `_nested_tensor_from_mask` implementation, but its purpose is to check left alignment in advance.
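For illustration, a rough re-implementation of the left-alignment check in plain PyTorch. The real `_nested_tensor_from_mask_left_aligned` is a native op; the convention below that True marks a valid token is an assumption of this sketch:

```python
import torch


def mask_is_left_aligned(mask: torch.Tensor) -> bool:
    """Return True if every row of a boolean [batch, seq_len] mask has all of
    its True (valid) positions before any False (padding) positions,
    i.e. each row follows the pattern 1...10...0.

    Illustrative re-implementation; the actual check is performed by the
    native _nested_tensor_from_mask_left_aligned op.
    """
    # A row is left-aligned iff its values never increase from left to right:
    # a positive difference would mean a False is followed by a True.
    return bool((mask.int().diff(dim=1) <= 0).all())


# Example:
ok = torch.tensor([[True, True, False, False],
                   [True, False, False, False]])
bad = torch.tensor([[True, False, True, False]])
print(mask_is_left_aligned(ok))   # True
print(mask_is_left_aligned(bad))  # False
```

Running the check ahead of time lets a TorchScript-friendly caller decide whether the nested-tensor path is safe to take, without duplicating the full mask-to-nested-tensor conversion.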
Summary:
The fairseq diff is split into two parts.
The first diff (this one): creates a mask left-align function to check the mask condition for nested tensors. It is necessary for TorchScript deployment.
The second diff (D37082681): forks the inference path inside the forward function. If a checkpoint file has been loaded and we are running inference, we deploy BT; otherwise, fairseq takes over.

Perf in summary:
Accuracy: there is an accuracy loss due to fp16; the maximum diff is around 0.009. If we set it to fp32, there is no accuracy loss.
Perf: the current fairseq has similar speed to the vanilla version. After the enablement, the speedup is similar to the standalone BT test. With batch size = 64:
- For V100, the speedup reaches 1.23x
- For A100, the speedup reaches 1.38x
After enabling nested tensors, the V100 speedup reaches 2.46x.

Test Plan: In D37082681
Reviewed By: mikekgfb
Differential Revision: D36057338
fbshipit-source-id: 2b824481481b8972d168f2751afa77cc9b3cbe02
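For context on the quoted ~0.009 maximum difference, one generic way to measure an fp16-vs-fp32 discrepancy for a module could look like the following; the stand-in module and shapes are placeholders, not the fairseq model:

```python
import torch
import torch.nn as nn

# Placeholder model; in the PR the comparison is against the fairseq transformer.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
x = torch.randn(64, 32, 512)  # batch=64, seq_len=32, embed_dim=512

with torch.no_grad():
    ref = model(x)  # fp32 reference output
    if torch.cuda.is_available():
        # Run the same module in half precision on GPU and compare.
        out_fp16 = model.half().cuda()(x.half().cuda()).float().cpu()
    else:
        out_fp16 = ref  # fp16 inference is typically measured on GPU

max_diff = (ref - out_fp16).abs().max().item()
print(f"max abs diff fp32 vs fp16: {max_diff:.4f}")
```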
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
@pytorchbot successfully started a merge job. Check the current status here
Summary:
Pull Request resolved: #79186
The fairseq diff is split into two parts.
The first diff (this one): creates a mask left-align function to check the mask condition for nested tensors. It is necessary for TorchScript deployment.
The second diff (D37082681): forks the inference path inside the forward function. If a checkpoint file has been loaded and we are running inference, we deploy BT; otherwise, fairseq takes over.

Perf in summary:
Accuracy: there is an accuracy loss due to fp16; the maximum diff is around 0.009. If we set it to fp32, there is no accuracy loss.
Perf: the current fairseq has similar speed to the vanilla version. After the enablement, the speedup is similar to the standalone BT test. With batch size = 64:
- For V100, the speedup reaches 1.23x
- For A100, the speedup reaches 1.38x
After enabling nested tensors, the V100 speedup reaches 2.46x.

Test Plan: In D37082681
Reviewed By: mikekgfb
Differential Revision: D36057338
fbshipit-source-id: 0ba75c254ccc4b4a29702ab0e18a36b5d0e1d832
The fairseq diff is split into two parts.
The first diff (this one)
This diff creates a mask left-align function to check the mask condition for nested tensors. It is necessary for TorchScript deployment.
The second diff (D37082681)
Fork the inference path inside the forward function. If a checkpoint file has been loaded and we are running inference, we deploy BT; otherwise, fairseq takes over.
Reviewed By: mikekgfb
Differential Revision: D36057338
Pull Request resolved: #79186
Approved by: https://github.com/erichan1