Adding `batch_size` support for (almost) all pipelines #13724
Conversation
(force-pushed from 354bb4c to ecedb2e)
This is a fantastic PR and write-up! Thanks for doing all of the work.
The code looks okay to me, but there are a lot of small changes across pipelines - would it be possible to add comments where those changes are unintuitive, so that we may better understand the need being addressed? I added comments where I think those would be helpful.
The test changes are clean. Thanks for adding this layer which should make testing simpler for new pipelines.
Finally, the write-up is great, it would be ideal to add it to the documentation. Can you add it to the pipeline RST document?
src/transformers/pipelines/base.py
Outdated
k: element[self._unbatch_index].unsqueeze(0)
if isinstance(element[self._unbatch_index], torch.Tensor)
else np.expand_dims(element[self._unbatch_index], 0)
if isinstance(element[self._unbatch_index], np.ndarray)
else element[self._unbatch_index]
for k, element in self._unbatch_data.items()
if k != "past_key_values"
Oof, this is a tough one to understand; it would be nice to spread it over different lines.
Rewrote it, hopefully it's better now, can you confirm?
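For reference, one way such a nested conditional comprehension can be spread out is with a small helper, roughly like this (just a sketch of the idea, not necessarily the rewrite that landed; the function names here are made up for illustration):

```python
import numpy as np
import torch


def _keep_batch_dim(value, index):
    # Select element `index` but keep a leading batch dimension of 1,
    # so downstream code still sees batched tensors/arrays.
    if isinstance(value, torch.Tensor):
        return value[index].unsqueeze(0)
    if isinstance(value, np.ndarray):
        return np.expand_dims(value[index], 0)
    return value[index]


def unbatch_item(unbatch_data, unbatch_index):
    # Equivalent of the original one-liner, spread over readable steps.
    return {
        k: _keep_batch_dim(element, unbatch_index)
        for k, element in unbatch_data.items()
        if k != "past_key_values"
    }
```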
src/transformers/pipelines/base.py
Outdated
raise ValueError("Pipeline without tokenizer or feature_extractor cannot do batching") | ||
if tokenizer is not None: | ||
if tokenizer.pad_token_id is None: | ||
raise ValueError("Pipeline with tokenizer without pad_token cannot do batching") |
Nice error raising! It would be nice to show how to assign a padding token in that case.
Can we add a padding_token just like that? Wouldn't it be erasing an existing (likely used) token?
Not sure what you mean.
We generally show how to do this with the following:
model.config.pad_token_id = model.config.eos_token_id
I think this is particularly important for the pipeline as users don't necessarily understand what/how to change the underlying model's attributes, so printing an example of that in the console would be helpful
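For illustration, the raised error could embed that hint along these lines (a sketch only; the helper name is hypothetical and the exact message that landed in the PR may differ):

```python
def check_can_batch(tokenizer, feature_extractor):
    # Hypothetical helper, only to show how the hint could be surfaced to users.
    if tokenizer is None and feature_extractor is None:
        raise ValueError("Pipeline without tokenizer or feature_extractor cannot do batching")
    if tokenizer is not None and tokenizer.pad_token_id is None:
        raise ValueError(
            "Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with "
            "`pipe.tokenizer.pad_token_id = model.config.eos_token_id`."
        )
```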
Thanks a lot for all your work on this!
src/transformers/pipelines/base.py
Outdated
self._unbatch_index = None
self._unbatch_data = None
For those other variables, I would prefer `unpack_xxx` to `unbatch`, personally.
If I make the switch, I will make the switch for all variables, so `unpack_size` too, as I consider these completely linked, so using similar names is important.
I am fine with the name, even if I feel we lose the connection to the batch concept.
Is that what you are implying?
With respect to the other comment, I updated everything to `loader_batch_*`, which is better I think.
(force-pushed from 376923a to e2d6a6a)
output_ids = self.model.generate(**model_inputs, **generate_kwargs)
if self.model.config.is_encoder_decoder:
    start_position = 1
else:
    start_position = n
- return {"output_ids": output_ids[0, start_position:], "conversation": conversation}
+ return {"output_ids": output_ids[:, start_position:], "conversation": conversation}
We are changing the interface between `forward` and `postprocess` to keep the batch dimension in the tensors, so that batch/unbatch can happen.
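A tiny illustration of the difference (nothing pipeline-specific, just the indexing):

```python
import torch

output_ids = torch.arange(12).reshape(1, 12)  # shape (batch=1, seq_len=12)
start_position = 3

print(output_ids[0, start_position:].shape)  # torch.Size([9])    -> batch dim dropped
print(output_ids[:, start_position:].shape)  # torch.Size([1, 9]) -> batch dim kept, so postprocess can unbatch
```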
@@ -204,26 +204,29 @@ def _forward(self, model_inputs):
    offset_mapping = model_inputs.pop("offset_mapping", None)
    sentence = model_inputs.pop("sentence")
    if self.framework == "tf":
-       outputs = self.model(model_inputs.data)[0][0]
+       logits = self.model(model_inputs.data)[0]
We are changing the interface between `forward` and `postprocess` to keep the batch dimension in the tensors, so that batch/unbatch can happen.
sentence = model_outputs["sentence"]
input_ids = model_outputs["input_ids"][0]
offset_mapping = model_outputs["offset_mapping"][0] if model_outputs["offset_mapping"] is not None else None
special_tokens_mask = model_outputs["special_tokens_mask"][0].numpy()

- scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
+ maxes = np.max(logits, axis=-1, keepdims=True)
The logits trick: subtract the per-row max before exponentiating, for numerical stability.
Thanks for adding that
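For readers wondering what the trick is: subtracting the per-row max does not change the softmax output but avoids overflow on large logits. A minimal sketch:

```python
import numpy as np

def stable_softmax(logits):
    # softmax(x) == softmax(x - max(x)); the shift keeps np.exp from overflowing.
    maxes = np.max(logits, axis=-1, keepdims=True)
    shifted_exp = np.exp(logits - maxes)
    return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)

print(stable_softmax(np.array([[1000.0, 1001.0]])))  # no overflow, ~[[0.269, 0.731]]
```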
- filename = dataset[0]["file"]
- output = audio_classifier(filename)
+ audio = dataset[0]["audio"]["array"]
+ output = audio_classifier(audio)
We're not relying on a filename anymore since the tests no longer run ffmpeg.
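In other words, the tests now feed the already-decoded waveform straight to the pipeline; a rough sketch (the model name and dummy waveform are illustrative, not the actual test fixtures):

```python
import numpy as np
from transformers import pipeline

audio_classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")

# A raw waveform array (what `dataset[0]["audio"]["array"]` would give you),
# so no filename and no ffmpeg decoding is involved.
audio = np.zeros(16_000, dtype=np.float32)  # 1 second of silence at 16 kHz
output = audio_classifier(audio)
print(output)
```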
This looks good to me, thank you @Narsil. If it works for you, I would like for this PR to be merged after the v4.12.0 release (tomorrow Thursday) so that it gets a bit of testing on master
before being set in stone.
(force-pushed from 09a1db8 to 5cd831c)
Release done, merging.
…3724)

* Tentative enabling of `batch_size` for pipelines.
* Add systematic test for pipeline batching.
* Enabling batch_size on almost all pipelines
  - Not `zero-shot` (it's already passing stuff as batched so trickier)
  - Not `QA` (preprocess uses squad features, we need to switch to real tensors at this boundary).
* Adding `min_length_for_response` for conversational.
* Making CTC, speech mappings available regardless of framework.
* Attempt at fixing automatic tests (ffmpeg not enabled for fast tests)
* Removing ffmpeg dependency in tests.
* Small fixes.
* Slight cleanup.
* Adding docs and addressing comments.
* Quality.
* Update docs/source/main_classes/pipelines.rst
* Update src/transformers/pipelines/question_answering.py
* Update src/transformers/pipelines/zero_shot_classification.py
* Improving docs.
* Update docs/source/main_classes/pipelines.rst
* N -> observed_batch_size softmax trick.
* Follow `padding_side`.
* Supporting image pipeline batching (and padding).
* Rename `unbatch` -> `loader_batch`.
* unbatch_size forgot.
* Custom padding for offset mappings.
* Attempt to remove librosa.
* Adding require_audio.
* torchaudio.
* Back to using datasets librosa.
* Adding help to set a pad_token on the tokenizer.
* Update src/transformers/pipelines/base.py (three commits)
* Quality.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
What does this PR do?
When running a pipeline on a dataset with a small model (relative to the GPU), it can be good to be able to batch the `forward` pass for performance. This PR addresses this by adding a `batch_size` argument.
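From the user side, enabling it looks roughly like this (the inputs here are illustrative; the model is the one used in the benchmarks below):

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,      # run on GPU; drop this for CPU
    batch_size=8,  # batch the forward pass
)

sentences = ["This restaurant is awesome"] * 64  # identical lengths: the friendly case for batching
for result in pipe(sentences):
    print(result)
```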
This PR leaves out `question-answering` and `zero-shot-classification`. They are trickier because they already use batching with candidate labels and question features. The full solution would involve moving the iterator to real N [hypothesis, template] pairs and batching there, and having another iterator on top that recreates the current `zero-shot`/`question-answering` results. Should we add that capability, at least for these 2 pipelines we would have a much better idea of alignment.

The good example (https://gist.github.com/Narsil/4e1c36d7cf8477e5c1d580585860810e): this code was executed on a GTX 970 (and a Titan RTX, with similar conclusions); the model is `distilbert-base-uncased-finetuned-sst-2-english` (250 MB bin file). The old pipelines GPU method of iteration is excluded because it's an order of magnitude slower in all cases.
This seems promising! However, this benchmark uses inputs of identical length (perfect alignment), which is the best case for batching.
Let's look at another example, which might (or not) be a bit more realistic: using inputs of varying size (https://gist.github.com/Narsil/de88b2d7c242c29772a61af56a5c8270).

Here we can see that no speedup was achieved, and we actually crashed for large batch sizes. This is entirely due to non-alignment: inputs of different lengths get padded to the longest one in the batch.

The problem can be made even worse when you have large batch sizes and RARE, very long sentences (https://gist.github.com/Narsil/357519fd385d864bfec3caf5aa8df575). Here we are actually 5x SLOWER with batch_size=64 than with the non-batched version. That is because the rare long sentence is so long that it forces the whole batch to be padded to its sequence length, using much more memory and processing power (the padding tokens ARE processed by the GPU, they just don't influence the end result).
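Back-of-the-envelope illustration of that padding overhead (the token counts are made up):

```python
# One rare long sentence makes every row in the batch pay for its padding.
lengths = [12, 15, 11, 400]              # token counts in a single batch (made-up numbers)
processed = max(lengths) * len(lengths)  # tokens actually run through the model after padding
useful = sum(lengths)
print(f"{processed} tokens processed for {useful} useful ones ({processed / useful:.1f}x overhead)")
```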
For users, a rule of thumb is:

- Measure performance on your load, with your hardware; real numbers are the only way to go.
- If you are latency constrained (a live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you care about throughput (running the model over a bunch of static data) on GPU:
  - If you have no clue about the sequence_length of your ("natural") data, don't batch by default; measure, try adding it tentatively, and add OOM checks to recover when it fails.
  - If your sequence_length is very regular, batching is much more likely to be VERY interesting; measure and push it until you get OOMs.
  - The larger the GPU, the more likely batching is to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely (see the sketch below).
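A possible way to handle the OOM case gracefully (a sketch, assuming call-time `batch_size` is used and that a CUDA OOM surfaces as a PyTorch `RuntimeError`; the workload is hypothetical):

```python
from transformers import pipeline

texts = ["Some example sentence."] * 1_000  # hypothetical workload
pipe = pipeline("text-classification", device=0)

batch_size = 64
while batch_size >= 1:
    try:
        for _ in pipe(texts, batch_size=batch_size):
            pass
        break
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        batch_size //= 2  # back off and retry with a smaller batch
        print(f"OOM, retrying with batch_size={batch_size}")
```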
There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. Which is why, for now, batching simply raises an error for `tokenizer`/`feature_processor` combinations that don't have a padding mechanism (if they require one).

It would be ideal if `pipelines` could take that responsibility on their shoulders and start batching dynamically for users, but it's a hard problem right now: `pipelines` would have to count the input lengths ahead of time and implement some kind of batch-exclusion mechanism.

Some other links/issues/discussions:
#11251
https://discuss.huggingface.co/t/how-to-change-the-batch-size-in-a-pipeline/8738
https://discuss.huggingface.co/t/how-to-make-pipeline-automatically-scale/7432
#13141
#12195
https://gist.github.com/Narsil/ee5c09875e74fa6f018dc6d014f6c06c
Fixes # (issue)
Who can review?
@LysandreJik @sgugger
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.