Add batch_first support in MHA and update docs #839

Merged: 21 commits from revised_mha merged into pytorch:master on Jul 15, 2020

Conversation

zhangguanheng66 (Contributor)

No description provided.

@zhangguanheng66 zhangguanheng66 marked this pull request as draft June 23, 2020 16:00
codecov bot commented Jun 23, 2020

Codecov Report

Merging #839 into master will increase coverage by 0.00%.
The diff coverage is 87.50%.


@@           Coverage Diff           @@
##           master     #839   +/-   ##
=======================================
  Coverage   77.43%   77.44%           
=======================================
  Files          43       44    +1     
  Lines        3045     3055   +10     
=======================================
+ Hits         2358     2366    +8     
- Misses        687      689    +2     
Impacted Files                                Coverage   Δ
torchtext/nn/modules/__init__.py              100.00%    <ø> (ø)
torchtext/nn/modules/multiheadattention.py    92.40%     <85.71%> (ø)
torchtext/__init__.py                         88.00%     <100.00%> (ø)
torchtext/nn/__init__.py                      100.00%    <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 07abf6d...c74a914.

r""" A multi-head attention container

Args:
nhead: the number of heads in the multiheadattention model
in_proj_container: A container of multi-head in-projection linear layers (a.k.a nn.Linear).
attention_layer: The attention layer.
attention_layer: The custom attention layer. The input sent from MHA container to the attention layer
Contributor:
Does this also take care of broadcasting?

Contributor Author:
The custom attention layer needs to take care of broadcasting. Updated the doc to reflect this.

Contributor:
I'd then augment the documented shape to (..., seq, batch, feature), explain what that means, and note that handling the extra leading dims is optional, i.e. it's enough to handle only 3-dim inputs.
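
To make that shape contract concrete, here is a minimal sketch (not from this PR) of a custom attention layer that handles only 3-dim inputs of shape (seq, batch * nhead, head_dim) and leaves any leading broadcast dims unsupported; the forward signature below is an assumption and may not match the container's exact calling convention.

# Minimal sketch of a 3-dim-only custom attention layer; the forward signature
# (query, key, value, attn_mask) is assumed, not taken from this PR.
import torch
import torch.nn.functional as F

class SimpleDotProductAttention(torch.nn.Module):
    def forward(self, query, key, value, attn_mask=None):
        # Broadcasting over leading "..." dims is the layer's own responsibility;
        # this sketch deliberately handles the 3-dim case only.
        assert query.dim() == key.dim() == value.dim() == 3
        q = query.transpose(0, 1)                       # (batch*nhead, L, head_dim)
        k = key.transpose(0, 1)                         # (batch*nhead, S, head_dim)
        v = value.transpose(0, 1)                       # (batch*nhead, S, head_dim)
        scores = torch.bmm(q, k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
        if attn_mask is not None:
            scores = scores.masked_fill(attn_mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)             # (batch*nhead, L, S)
        output = torch.bmm(weights, v).transpose(0, 1)  # back to (L, batch*nhead, head_dim)
        return output, weights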


Examples::
-    >>> SDP = torchtext.models.ScaledDotProduct(0.1)
+    >>> SDP = torchtext.modules.ScaledDotProduct(dropout=0.1)
Contributor:
Should we mirror the pytorch path conventions here?

torchtext.nn and torchtext.nn.functional?

Contributor Author:
The other two domains use torchvision/audio.models.

Contributor:
Yes, but this isn't a model, right?

Contributor Author:
OK, will fix it.
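
For context, a minimal usage sketch of the torchtext.nn building blocks this PR touches; the constructor argument order and the batch_first flag are assumptions based on the PR title and discussion, so verify against the torchtext version you have installed.

# Usage sketch, assuming the torchtext.nn API roughly as landed around this PR.
import torch
from torchtext.nn import InProjContainer, MultiheadAttentionContainer, ScaledDotProduct

embed_dim, nhead, bsz, seq_len = 16, 4, 2, 5

in_proj = InProjContainer(
    torch.nn.Linear(embed_dim, embed_dim),
    torch.nn.Linear(embed_dim, embed_dim),
    torch.nn.Linear(embed_dim, embed_dim),
)
mha = MultiheadAttentionContainer(
    nhead,
    in_proj,
    ScaledDotProduct(dropout=0.1),
    torch.nn.Linear(embed_dim, embed_dim),
    batch_first=True,  # inputs/outputs as (batch, seq, embed) instead of (seq, batch, embed)
)

query = key = value = torch.rand(bsz, seq_len, embed_dim)
attn_output, attn_weights = mha(query, key, value)
print(attn_output.shape)  # expected: torch.Size([2, 5, 16])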

@zhangguanheng66 zhangguanheng66 marked this pull request as ready for review July 15, 2020 00:47
@zhangguanheng66 zhangguanheng66 merged commit b021bf9 into pytorch:master Jul 15, 2020
@zhangguanheng66 zhangguanheng66 deleted the revised_mha branch July 16, 2020 17:53