Add batching in TokenClassificationPipeline #11251
Conversation
Force-pushed from f8bfd17 to 0f492f7
FYI there is also work done on this pipeline in #10568 if you want to give it a look! It doesn't concern batching, however.
Thanks, let me check that out.
Please review this @LysandreJik, @Narsil, @joshdevins.
I appreciate the intent of this PR, but I think it is most likely to let users shoot themselves in the foot without even realizing it. I'm disapproving this PR for that reason (and it actually raises questions about other pipeline implementations):
- Pipelines are aimed at inference, and for inference, batching is almost always detrimental. I've modified the proposed example into a slightly less favorable one to show the problem:
https://gist.github.com/Narsil/ee5c09875e74fa6f018dc6d014f6c06c.
| Device | Model batch size | No. examples | Time taken (s) |
|---|---|---|---|
| CPU (i7-4790) | 1 | 200 | 10.69 |
| CPU (batched) | 2 | 200 | 10.75 |
| CPU (batched 2nd) | 2 | 200 | 34.85 |
| GPU (GTX 970) | 1 | 200 | 0.71 |
| GPU (batched) | 2 | 200 | 0.71 |
| GPU (batched 2nd) | 2 | 200 | 8.98 |
The core of the issue is the extra padding tokens created while running inference. Those are very bad for overall efficiency, and in a live system uneven input lengths are almost guaranteed to occur (from experience, I can say the slowdown can be overwhelmingly bad). It's almost never worth attempting batching in live production unless you're very sure about the alignment problem.
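To make the padding point concrete, here is a minimal sketch (not taken from the gist above; the checkpoint and sentences are arbitrary placeholders) that counts how many extra pad tokens a naive batch introduces when input lengths are uneven:

```python
from transformers import AutoTokenizer

# Hypothetical setup: any checkpoint with a pad token works; the default NER
# model is used here only as an example.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

sentences = [
    "Short sentence.",
    "A much longer sentence that keeps going. " * 20,  # forces heavy padding for the short one
]

# Tokenized one by one: no padding anywhere.
unpadded = sum(len(tokenizer(s)["input_ids"]) for s in sentences)

# Tokenized as a single batch: everything is padded to the longest sequence.
batch = tokenizer(sentences, padding=True, return_tensors="pt")
padded = batch["input_ids"].numel()

print("tokens without batching:", unpadded)
print("tokens with batching:   ", padded)  # the difference is pure padding overhead
```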
- That being said, other pipelines do sometimes batch, and batching can be effective if used properly, even though it is very hard to do well at the pipeline level (we receive strings, not tokens, and tokens are what you would need for proper length alignment). One common mitigation is sketched below.
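For illustration only (this is not what the PR does), one way to reduce padding is to sort inputs by tokenized length before forming batches, so similarly sized sequences get padded together; a rough sketch:

```python
def make_length_sorted_batches(sentences, tokenizer, batch_size):
    """Group sentences of similar tokenized length so padding stays small.

    Yields (original_indices, batch) pairs; the caller must map results
    back to the original input order using the indices.
    """
    lengths = [len(tokenizer(s)["input_ids"]) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: lengths[i])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [sentences[i] for i in idx]
```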
@@ -30,6 +31,7 @@ class TokenClassificationArgumentHandler(ArgumentHandler):

    def __call__(self, inputs: Union[str, List[str]], **kwargs):

        model_batch_size = kwargs.get("model_batch_size", 1)
You are actually using this as a boolean, not an int.
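To spell out the comment: if `model_batch_size` is an integer, it should determine the chunk size rather than only being compared against 1. A hedged sketch of that usage (not code from the PR):

```python
def iter_chunks(sentences, model_batch_size):
    # Treat `model_batch_size` as an actual size rather than an on/off flag:
    # yield chunks of that length, plus a shorter final chunk if needed.
    for start in range(0, len(sentences), model_batch_size):
        yield sentences[start:start + model_batch_size]
```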
input_ids = tokens["input_ids"].cpu().numpy()[0]
if self.framework == "tf":
    if model_batch_size > 1:
        warnings.warn("The `model_batch_size` argument is not supported for Tensorflow models. Ignoring")
You're actually breaking TensorFlow here, right? Because you're no longer going through all the sentences.
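The concern is that the TensorFlow branch silently stops handling the remaining sentences. A hedged sketch of how a fallback could keep processing everything (the function and variable names are invented for illustration, not the pipeline's actual internals):

```python
import warnings

def plan_batches(sentences, model_batch_size, framework):
    """Group sentences into batches without ever dropping any of them."""
    if framework == "tf":
        if model_batch_size > 1:
            warnings.warn(
                "The `model_batch_size` argument is not supported for TensorFlow "
                "models; falling back to per-sentence inference."
            )
        # Every sentence is still processed, just one at a time.
        return [[s] for s in sentences]
    return [
        sentences[i:i + model_batch_size]
        for i in range(0, len(sentences), model_batch_size)
    ]
```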
Closing this PR based on @Narsil's review. Thanks
What does this PR do?
Currently, the NER pipeline in transformers iterates through the list of input sentences and processes them sequentially.
This PR adds batching support in the pipeline to decrease latency and use GPU more efficiently.
Relevant issue: #11244
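For context, the intended call pattern would presumably look like the following (the `model_batch_size` argument comes from this PR's diff and is not part of any released transformers API; the batch size value is arbitrary):

```python
from transformers import pipeline

ner = pipeline("ner")  # TokenClassificationPipeline under the hood

sentences = ["Hugging Face Inc. is a company based in New York City."] * 100

# Proposed in this PR: process the inputs in model-level batches
# instead of one sentence at a time.
entities = ner(sentences, model_batch_size=32)
```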
Benchmark Report
Without Batching

| Device | No. examples | Time taken (s) |
|---|---|---|
| CPU | 1000 | 283.28 |
| GPU | 1000 | 17.89 |

Please check the benchmark gist here

With Batching (batch size 512)

| Device | No. examples | Time taken (s) |
|---|---|---|
| CPU | 1000 | 121.82 |
| GPU | 1000 | 2.78 |
Please check the benchmark gist here
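The linked gist is not reproduced here, but a timing harness along these lines (the model choice, sentence set, and output format are assumptions) would produce comparable measurements:

```python
import time
from transformers import pipeline

ner = pipeline("ner", device=-1)  # device=0 to run on GPU instead

sentences = ["Hugging Face Inc. is a company based in New York City."] * 1000

start = time.time()
ner(sentences)  # current behaviour: one sentence at a time
print("Without batching:", time.time() - start)

start = time.time()
# `model_batch_size` only exists with this PR's changes applied.
ner(sentences, model_batch_size=512)
print("With batching (512):", time.time() - start)
```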
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. - Batching in NER pipeline #11244
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.