Add batching in TokenClassificationPipeline #11251
Conversation
Force-pushed from f8bfd17 to 0f492f7
FYI there is also work done on this pipeline in #10568 if you want to give it a look! It doesn't concern batching, however.
Thanks, let me check that out.
Please review this @LysandreJik, @Narsil, @joshdevins.
I appreciate the intent of this PR, but I think it is most likely to let users shoot themselves in the foot without even realizing it. I'm disapproving this PR for that reason (and it actually raises questions about other pipeline implementations):
- Pipelines are aimed at inference, and for inference, batching is almost always detrimental. I've modified the proposed example into a slightly less favorable one to show the problem:
https://gist.github.com/Narsil/ee5c09875e74fa6f018dc6d014f6c06c.
| Device | Model batch size | No. examples | Time taken (s) |
|---|---|---|---|
| CPU (i7-4790) | 1 | 200 | 10.69 |
| CPU (batched) | 2 | 200 | 10.75 |
| CPU (batched 2nd) | 2 | 200 | 34.85 |
| GPU (GTX 970) | 1 | 200 | 0.71 |
| GPU (batched) | 2 | 200 | 0.71 |
| GPU (batched 2nd) | 2 | 200 | 8.98 |
The core of the issue is the extra padding tokens created while running inference. Those are very bad for overall efficiency, and in a live system uneven input lengths are almost guaranteed to occur (from experience, I can say the slowdown can be overwhelmingly bad). It's almost never worth attempting batching in live production unless you're very sure about the alignment problem.
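To make the padding point concrete, here is a minimal sketch (not taken from the gist above; the checkpoint and sentences are arbitrary placeholders) that counts how many extra pad tokens a naive batch introduces when input lengths are uneven:

```python
from transformers import AutoTokenizer

# Hypothetical setup: any checkpoint with a pad token works; the default NER
# model is used here only as an example.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

sentences = [
    "Short sentence.",
    "A much longer sentence that keeps going. " * 20,  # forces heavy padding for the short one
]

# Tokenized one by one: no padding anywhere.
unpadded = sum(len(tokenizer(s)["input_ids"]) for s in sentences)

# Tokenized as a single batch: everything is padded to the longest sequence.
batch = tokenizer(sentences, padding=True, return_tensors="pt")
padded = batch["input_ids"].numel()

print("tokens without batching:", unpadded)
print("tokens with batching:   ", padded)  # the difference is pure padding overhead
```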
- That being said, other pipelines do sometimes batch, and batching can be effective if used properly, even though it is very hard to do well at the pipeline level (we receive strings, not tokens, and tokens are what you would need for proper length alignment). One common mitigation is sketched below.
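For illustration only (this is not what the PR does), one way to reduce padding is to sort inputs by tokenized length before forming batches, so similarly sized sequences get padded together; a rough sketch:

```python
def make_length_sorted_batches(sentences, tokenizer, batch_size):
    """Group sentences of similar tokenized length so padding stays small.

    Yields (original_indices, batch) pairs; the caller must map results
    back to the original input order using the indices.
    """
    lengths = [len(tokenizer(s)["input_ids"]) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: lengths[i])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [sentences[i] for i in idx]
```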
@@ -30,6 +31,7 @@ class TokenClassificationArgumentHandler(ArgumentHandler):

    def __call__(self, inputs: Union[str, List[str]], **kwargs):

        model_batch_size = kwargs.get("model_batch_size", 1)
You are actually using this as a boolean, not an int.
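To spell out the comment: if `model_batch_size` is an integer, it should determine the chunk size rather than only being compared against 1. A hedged sketch of that usage (not code from the PR):

```python
def iter_chunks(sentences, model_batch_size):
    # Treat `model_batch_size` as an actual size rather than an on/off flag:
    # yield chunks of that length, plus a shorter final chunk if needed.
    for start in range(0, len(sentences), model_batch_size):
        yield sentences[start:start + model_batch_size]
```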
input_ids = tokens["input_ids"].cpu().numpy()[0]
if self.framework == "tf":
    if model_batch_size > 1:
        warnings.warn("The `model_batch_size` argument is not supported for Tensorflow models. Ignoring")
You're actually breaking TensorFlow here, right? Because you're no longer going through all the sentences.
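The concern is that the TensorFlow branch silently stops handling the remaining sentences. A hedged sketch of how a fallback could keep processing everything (the function and variable names are invented for illustration, not the pipeline's actual internals):

```python
import warnings

def plan_batches(sentences, model_batch_size, framework):
    """Group sentences into batches without ever dropping any of them."""
    if framework == "tf":
        if model_batch_size > 1:
            warnings.warn(
                "The `model_batch_size` argument is not supported for TensorFlow "
                "models; falling back to per-sentence inference."
            )
        # Every sentence is still processed, just one at a time.
        return [[s] for s in sentences]
    return [
        sentences[i:i + model_batch_size]
        for i in range(0, len(sentences), model_batch_size)
    ]
```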
Closing this PR based on @Narsil's review. Thanks
What does this PR do?
Currently, the NER pipeline in transformers iterates through the list of input sentences and processes them sequentially.
This PR adds batching support in the pipeline to decrease latency and use GPU more efficiently.
Relevant issue: #11244
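For context, the intended call pattern would presumably look like the following (the `model_batch_size` argument comes from this PR's diff and is not part of any released transformers API; the batch size value is arbitrary):

```python
from transformers import pipeline

ner = pipeline("ner")  # TokenClassificationPipeline under the hood

sentences = ["Hugging Face Inc. is a company based in New York City."] * 100

# Proposed in this PR: process the inputs in model-level batches
# instead of one sentence at a time.
entities = ner(sentences, model_batch_size=32)
```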
Benchmark Report
Without Batching

| Device | No. examples | Time taken (s) |
|---|---|---|
| CPU | 1000 | 283.28 |
| GPU | 1000 | 17.89 |

Please check the benchmark gist here

With Batching (batch size 512)

| Device | No. examples | Time taken (s) |
|---|---|---|
| CPU | 1000 | 121.82 |
| GPU | 1000 | 2.78 |
Please check the benchmark gist here
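The linked gist is not reproduced here, but a timing harness along these lines (the model choice, sentence set, and output format are assumptions) would produce comparable measurements:

```python
import time
from transformers import pipeline

ner = pipeline("ner", device=-1)  # device=0 to run on GPU instead

sentences = ["Hugging Face Inc. is a company based in New York City."] * 1000

start = time.time()
ner(sentences)  # current behaviour: one sentence at a time
print("Without batching:", time.time() - start)

start = time.time()
# `model_batch_size` only exists with this PR's changes applied.
ner(sentences, model_batch_size=512)
print("With batching (512):", time.time() - start)
```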
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. - Batching in NER pipeline #11244
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.