
use functional interface for softmax in attention #14198

Merged: 2 commits into huggingface:master on Nov 30, 2021

Conversation

@t-vi (Contributor) commented on Oct 28, 2021

There are several instances of (ab)using the PyTorch modular interface to compute softmax where it would be more natural to use the functional interface. This patch changes the occurrences I found.
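A minimal sketch of the pattern being replaced (a hedged illustration; the variable names and tensor shapes are made up, not taken from the actual diff):

```python
import torch

# Illustrative attention logits: (batch, heads, query_len, key_len).
attention_scores = torch.randn(2, 8, 16, 16)

# Before: a Softmax module is instantiated only to be called once.
attention_probs = torch.nn.Softmax(dim=-1)(attention_scores)

# After: the functional interface, with no throwaway module instance.
attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)
```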

@stas00 (Contributor) left a comment

Thank you, Tom.

Let's just remove the torch. prefix for consistency. nn should already be imported.
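For illustration, the call after that cleanup (a sketch assuming `from torch import nn` is already in scope, as is conventional in the transformers modeling files):

```python
import torch
from torch import nn

attention_scores = torch.randn(2, 8, 16, 16)  # illustrative logits

# Same functional call, without the redundant torch. prefix.
attention_probs = nn.functional.softmax(attention_scores, dim=-1)
```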

@sgugger (Collaborator) left a comment

Thanks a lot for your PR! I see a few instances left (in research_projects/bertabs and in the tests); is there a reason to leave those?

@t-vi (Contributor, Author) commented on Oct 28, 2021

Probably it's an attention thing. :)

@github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@t-vi (Contributor, Author) commented on Nov 28, 2021

Closing this as stale.

@t-vi closed this on Nov 28, 2021

@sgugger (Collaborator) commented on Nov 28, 2021

Out of curiosity, why didn't you want to fix the last two instances, and why did you close your PR instead?

@t-vi (Contributor, Author) commented on Nov 30, 2021

@sgugger To be honest, I'd have preferred not to comment on this further. But as you asked:

From my side, this is what happened:

  • I use transformers as a reference for some custom implementation (i.e. I don't plan on using transformers itself) and notice a dubious code pattern of instantiating Softmax and immediately using it.

  • Out of courtesy, I submit a PR. To make the review worthwhile, I look for other instances of the identical pattern and fix those, too.
    At that point, I had submitted a patch that looked "very low risk (because I never say obviously correct) and easy to review" to me.
    It probably doesn't fix all dubious patterns in transformers, but it takes out 28 instances spread across as many files.

  • The patch is promptly approved by you and stas. Thank you!

Then instead of merging:

  • You point to two other directories that you say contain more of it (I still don't know what you mean; there are uses of the modular Softmax there, but in a way that looks perfectly OK to me, and I didn't want to audit your entire codebase for dubious patterns).
  • Nothing happens for a month.
  • Now your bot says the PR is stale and places the burden of moving it along on me.

I agree with the bot that the PR isn't moving at the expected speed, but I don't want to spend more time on it. I thought that searching for the same pattern across your codebase and fixing the other 27 places would have been a reasonable trade-off between "don't submit 28 identical one-liners" and "don't make it more complicated than it needs to be"; obviously, you did not agree. That is OK, but I don't want to do the extra work that would be needed to make this patch acceptable to you.

You work at scale and process hundreds of patches on any given day, and I am not saying your process isn't adequate. It is just not for me.

@sgugger (Collaborator) commented on Nov 30, 2021

There seems to have been a misunderstanding here and I just wish you had either told us that the remaining instances you had found were fine in your book, or that you didn't want to work further on this PR. It was just a suggestion on my side and I never said the PR would not be merged if you didn't include those last two instances. I apologize if my comment upset you.

If you want to reopen your PR, we'll be happy to merge it.

@t-vi reopened this on Nov 30, 2021

@t-vi (Contributor, Author) commented on Nov 30, 2021

No worries, there is nothing wrong with it or with your comment; it's just that I don't want to change this patch anymore. (And maybe I wasn't in the mood for your bot's comment, but hey.)
If it's still useful, I'm happy to have it merged; if not, that's OK too.

@sgugger merged commit 6ed9882 into huggingface:master on Nov 30, 2021
@sgugger (Collaborator) commented on Nov 30, 2021

The bot is there to remind us when we forget PRs for a long time, like this one; sorry about that :-)
Thanks again for your contribution!

@t-vi (Contributor, Author) commented on Nov 30, 2021

Thank you! You're awesome!

Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 27, 2022:
* use functional interface instead of instantiating module and immediately calling it

* fix torch.nn.functional to nn.functional. Thank you Stas!