
Tagger: use unnormalized probabilities for inference #10197

Merged: 8 commits into explosion:master on Mar 15, 2022

Conversation

danieldk (Contributor) commented on Feb 3, 2022

Description

Using an unnormalized softmax avoids the relatively expensive exp function,
which can significantly speed up non-transformer models (e.g. I got a 27%
speedup on a German tagging + parsing pipeline).
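
For intuition (this is not code from the PR): softmax only rescales the raw scores monotonically, so the per-token argmax used at inference is identical with and without normalization; skipping normalization simply saves the exp and the division. A minimal NumPy sketch:

```python
import numpy as np

# Minimal sketch, not the PR's implementation: softmax preserves the ordering
# of the raw scores, so picking the best tag per token gives the same result
# whether or not the scores are normalized.
def softmax(scores):
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

raw = np.random.randn(4, 17)  # 4 tokens, 17 hypothetical tag classes
assert (raw.argmax(axis=-1) == softmax(raw).argmax(axis=-1)).all()
```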

Types of change

Performance improvement

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

Draft since this requires explosion/thinc#583 and a new Thinc version.

danieldk added the enhancement (Feature requests and improvements), feat / tagger (Feature: Part-of-speech tagger) and perf / speed (Performance: speed) labels and removed the enhancement label on Feb 3, 2022
adrianeboyd (Contributor) commented on Feb 3, 2022

There are users who use the scores directly and (I assume) would be expecting them to be normalized. I'm not saying we necessarily shouldn't do this, but this proposal doesn't give these users any way to control this behavior, does it?

danieldk (Contributor, Author) commented on Feb 3, 2022

> There are users who use the scores directly and (I assume) would be expecting them to be normalized. I'm not saying we necessarily shouldn't do this, but this proposal doesn't give these users any way to control this behavior, does it?

We could make this configurable. Maybe we should also only provide this functionality in a Tagger.v2, so that if anyone uses the probabilities it doesn't break for them (and they could opt in to normalized probabilities with Tagger.v2)?
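
A hedged sketch of what that opt-in could look like (the spacy.Tagger.v2 name comes from this discussion; the normalize flag name and the surrounding config are assumptions rather than code from this PR, and this would require a spaCy version that actually ships Tagger.v2):

```python
import spacy

# Hypothetical opt-in sketch: a Tagger.v2 architecture exposing a "normalize"
# switch (flag name assumed from the discussion). v2 would default to the
# faster, unnormalized scores; users who rely on probabilities opt back in.
nlp = spacy.blank("de")
nlp.add_pipe(
    "tagger",
    config={
        "model": {
            "@architectures": "spacy.Tagger.v2",
            "normalize": True,  # opt back in to normalized probabilities
            "tok2vec": {
                "@architectures": "spacy.HashEmbedCNN.v2",
                "pretrained_vectors": None,
                "width": 96,
                "depth": 4,
                "embed_size": 2000,
                "window_size": 1,
                "maxout_pieces": 3,
                "subword_features": True,
            },
        }
    },
)
```

Pipelines that stay on Tagger.v1 would keep today's normalized probabilities unchanged.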

adrianeboyd (Contributor) commented:

That sounds like a reasonable proposal.

svlandeg (Member) left a review comment:

Agreed that it's a good idea to make this configurable in a new version of the Tagger.

I find it slightly unintuitive to go from a v1 that always normalizes to a v2 with normalization off by default. It might still catch some users off guard that the behaviour changes between the two versions when they rely on defaults. But then again, it is documented in the docs, and I can see why this would be beneficial in most cases / for most users. So I'm leaning towards keeping it like the PR has it currently :-)

Review thread on website/docs/api/architectures.md (outdated, resolved)
svlandeg added the v3.3 (Related to v3.3) label on Feb 16, 2022
svlandeg changed the base branch from develop to master on Feb 16, 2022
danieldk marked this pull request as ready for review on Mar 14, 2022
adrianeboyd merged commit e5debc6 into explosion:master on Mar 15, 2022
Labels: feat / tagger (Feature: Part-of-speech tagger), perf / speed (Performance: speed), v3.3 (Related to v3.3)