
[Large PR] Entire rework of pipelines. #13308

Merged
26 commits merged on Sep 10, 2021

Conversation

Narsil
Contributor

@Narsil Narsil commented Aug 27, 2021

What does this PR do?

tl;dr: Make pipeline code much more consistent and enable large speedups with GPU inference.

GPU pipeline

Currently, the way pipelines are set up, it's hard to keep the GPU 100% busy because we don't use a DataLoader (on PyTorch), which is necessary to keep the CPU tokenizing the next items while the GPU is processing the current one.

We cannot realistically use the current API to maximize utilization:

for item in dataset:
    # item == "This is some test" for instance
    output = pipe(item)
    # output == {"label": "POSITIVE", "score": 0.99}

So we need to change the API to something closer to what DataLoader does: accept an iterable, which lets worker CPU threads preprocess the next items while the GPU is busy on the current one, meaning we can now use 100% of the GPU.

for output in pipe(dataset):
    # output == {"label": "POSITIVE", "score": 0.99}
    pass

In order to make that change possible, we need to better separate what happens on the CPU vs. the GPU.
The proposed way is to split the pipeline call into 3 distinct function calls:

  • preprocess: in charge of taking the original pipeline input and producing a dict of everything necessary to run model(**model_inputs) (or a generate call; in short, everything that will actually involve the GPU).
  • forward: in most cases a simple call to the model's forward method, but it can be more complex depending on the pipeline. It needs to be separate from the other two because this is where the GPU might be used, so we can encapsulate more logic around it in the base class (no_grad, sending tensors to/from the GPU, etc.).
  • postprocess: usually turns the logits into something more user-friendly for the task at hand; again usually fast, and it should happen on the CPU (it should be fast enough that a separate thread for it doesn't really matter).

In order to increase consistency across pipelines, ALL pipelines will have to implement the 3 methods and should have a __call__ method (with exceptions discussed in the consistency section).
They should also be readable on their own, meaning the output of preprocess is exactly what is sent to forward, and what is returned by forward is exactly the input of postprocess. So:

model_inputs = pipe.preprocess(item)
model_outputs = pipe.forward(model_inputs)
outputs = pipe.postprocess(model_outputs)

will always be perfectly valid, even if not the most efficient.
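
As a rough illustration (a sketch only — the class and everything other than the three methods are hypothetical, not the actual implementation in this PR), a text-classification pipeline following this contract could look like:

import torch


class MySketchPipeline:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def preprocess(self, inputs):
        # CPU-only work: turn the raw input into model-ready tensors.
        return self.tokenizer(inputs, return_tensors="pt")

    def forward(self, model_inputs):
        # The only step that touches the GPU, so it is easy to wrap with
        # no_grad / device placement in the base class.
        with torch.no_grad():
            return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        # CPU-only work: turn logits into a user-friendly answer.
        scores = model_outputs.logits.softmax(-1)[0]
        label_id = int(scores.argmax())
        return {"label": self.model.config.id2label[label_id], "score": float(scores[label_id])}

    def __call__(self, inputs):
        model_inputs = self.preprocess(inputs)
        model_outputs = self.forward(model_inputs)
        return self.postprocess(model_outputs)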

Consistency of pipelines

Right now, pipelines are quite inconsistent in their returned outputs.

Batching on GPU looks like what speeds things up, but at inference time it really isn't: batching in ML is mostly used because of gradients, where it's necessary to make gradient descent smooth. The speed gains on GPU are really linked to overall GPU utilization, and using DataLoader is the key part here. Nonetheless, depending on the actual hardware, pipeline, and input data, batching can sometimes be used efficiently, so the new design should enable it. However, it shouldn't be done the way it's currently set up, where some pipelines batch, some don't, with no overall consistency; it should be handled in a different layer than the data-processing part of the pipeline.

Because of the inconsistencies mentioned above, this refactor keeps some __call__ methods that change the return type to match what was previously there (preprocess, forward and postprocess stay mostly pure, while __call__ handles backward compatibility).

Parameter handling

Another cause of concern for pipelines was parameter handling. Most parameters were sent to the __call__ method, but some were sent to __init__, and some to both.

That meant that you had to look through the docs to guess whether you needed to do

pipe = pipeline(....., num_beams=2)
outputs = pipe(item)
# or
pipe = pipeline(....)
outputs = pipe(item, num_beams=2)

The goal of this PR is to make that explicit, so BOTH will be supported and have the exact same behavior.
In order to do that, we introduce a new mandatory method set_parameters which is called in both __call__ and __init__ in the same way, so that it always works.

  1. Because this new set_parameters is a standard method, we can use it to properly reject unexpected keyword arguments with a real error instead of silently ignoring them.
  2. Because __init__ and __call__ now live (roughly) in the base class only, we can capture parameters much better, meaning we no longer have an extra layer of parameter guessing (is it a tokenization param, a model param, a pipeline param?). Each method captures everything it needs and passes on the rest; the last method in the chain is set_parameters, which may take specific parameters or accept everything (like **generate_kwargs, so ultimately generate has the final word).
  3. Because set_parameters will be called at least twice and we don't know which call carries the real values, it has to be written in a somewhat odd way. Most pipelines simply default every argument to None: if the argument is None, the caller didn't supply it, so we don't override the current value (the default is defined in __init__ if dynamic, or directly on the class if static); see the sketch after this list. This doesn't work when None is a valid value for a parameter, which is currently only the case in the zero-shot-classification tests, where we explicitly test that passing None raises an error (so it can probably be changed, but that would be backward incompatible with respect to tests). For those cases, more complex logic is required.
  4. Because we now use self as the holder for parameters, running the same pipeline from several threads might lead to oddities (but people most likely aren't using one pipeline across threads, and probably shouldn't be). Other options are possible, but they would mean passing the parameters through all 3 functions preprocess, forward and postprocess, reducing readability IMHO for debatable gains.
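
A minimal sketch of that None-default pattern (illustrative only, not the final code):

class MySketchPipeline:
    def __init__(self, top_k=5, **kwargs):
        self.top_k = top_k  # dynamic default defined in __init__
        self.set_parameters(**kwargs)

    def set_parameters(self, top_k=None, **kwargs):
        # None means "the caller did not pass this argument": keep the current value.
        if top_k is not None:
            self.top_k = top_k
        if kwargs:
            raise ValueError(f"Unexpected keyword arguments: {list(kwargs)}")

    def __call__(self, inputs, **kwargs):
        self.set_parameters(**kwargs)
        # ... preprocess / forward / postprocess using self.top_k ...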

Results

Performance-wise, we're currently sitting here:

bench code

from transformers import pipeline
from transformers.pipelines.base import KeyDataset
import datasets
import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

print("New style of pipeline")
for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
    pass

print("Old style of pipeline")
for item in tqdm.tqdm(dataset):
    out = pipe(item["file"])

Speed (measured on an old, suffering GTX 970): see the benchmark screenshot attached to the PR.

Backward compatibility

We're currently sitting at 100% backward compatibility with respect to the tests. We're not, however, 100% backward compatible overall.
By fixing the inconsistencies of pipelines, we will break any code that was using parameters incorrectly (as those parameters will suddenly start working, or crash because they're invalid).

Tensorflow

I mentioned DataLoader, which will be used to great effect on PyTorch with list or Dataset inputs (on single-item inference with GPU + PyTorch, you will get a warning prompting you to use more efficient methods).

On TensorFlow, however, more work is needed to make it faster there too. At the very least we shouldn't degrade performance too much; this has to be checked (both GPU and CPU). Ideally we would have a mechanism similar to DataLoader to maximize GPU efficiency on TensorFlow.
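
Purely as a sketch of that direction (not part of this PR — the tf.data plumbing and the tensor specs below are assumptions), overlapping CPU tokenization with GPU execution on TensorFlow could look like:

import tensorflow as tf

def iterate_tf(pipe, texts):
    # Tokenize on CPU in a generator and let tf.data prefetch the next items
    # while the model is busy with the current one.
    def gen():
        for text in texts:
            yield dict(pipe.preprocess(text))  # assumes preprocess returns "tf" tensors

    dataset = tf.data.Dataset.from_generator(
        gen,
        output_signature={
            "input_ids": tf.TensorSpec(shape=(1, None), dtype=tf.int32),
            "attention_mask": tf.TensorSpec(shape=(1, None), dtype=tf.int32),
        },
    ).prefetch(tf.data.AUTOTUNE)

    for model_inputs in dataset:
        yield pipe.postprocess(pipe.forward(model_inputs))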

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

self.assertEqual(
    inputs["input_ids"].tolist(), [[31373, 50256, 17250, 612, 0, 50256, 4919, 389, 345, 5633, 50256]]
)

inputs = conversation_agent._parse_and_tokenize([conversation_1, conversation_2])
Contributor Author:

Not doable anymore, simple function can only handle single items at once

Contributor:

fine by me!

Collaborator @sgugger left a comment:

This is a great refactor and I love how clean the code becomes! I've left a few comments and suggestions. Some of them are linked to older code that was moved around, but let's take this opportunity to make it cleaner!

@@ -50,6 +55,11 @@
logger = logging.get_logger(__name__)


def collate_fn(items):
    assert len(items) == 1, "This collate_fn is meant to be used with batch_size=1"
Collaborator:

Nit: we are moving away from asserts in the code base, so please use an if xxx: raise ValueError(...) instead.

Contributor Author:

Got it! Any reason why? (Just curious whether there's background on that decision.)

Collaborator:

See internal slack. The main gist is: asserts are more meant for debugging and will be ignored if you use python in optimized mode. This is not a debugging statement, so it should not use an assert.

Comment on lines +843 to +896
if self.call_count > 10 and self.framework == "pt" and self.device.type == "cuda":
    warnings.warn(
        "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset",
        UserWarning,
    )
Collaborator:

Nice check!

Member:

Could eventually link to a doc showing how to use with a Dataset

Contributor Author:

Yes that would be ideal.

@Narsil Narsil force-pushed the iterable_pipelines branch 2 times, most recently from cd6e659 to 11e832a Compare September 1, 2021 12:29
Member @LysandreJik left a comment:

This looks great! I'm not approving yet because I want to take a bit of time to play with the new pipeline.

return super().__call__(inputs, **kwargs)

def set_parameters(self, top_k=None, **kwargs):
# No parameters on this pipeline right now
Member:

There are some parameters on this pipeline

Member:

From what I understand this method is going to be called in __init__ and __call__; therefore I see no point in having the user call it themselves. I would make that method private to clarify its role!

Contributor Author:

I understand why you would want it to be private. I have something in the back of my head telling me it should stay public, but no solid argument right now.

Contributor Author:

Okay, reflecting on this, maybe forward should also become _forward.
The reason is that the mother class defines infer_forward, which would be a better target for users to call themselves.
It takes care of torch.no_grad, training=False, and sending tensors to and from the GPU.

It wouldn't break the logic that preprocess, _forward, postprocess should work out of the box for pipeline writers/maintainers, but it would at least hint to users not to call these functions directly. Documenting that might be a good thing too.

Comment on lines +58 to +63
def collate_fn(items):
    if len(items) != 1:
        raise ValueError("This collate_fn is meant to be used with batch_size=1")
    return items[0]
Member:

Would it be possible to have its name be a bit more explicit? I understand where it's used thanks to its name, but I don't understand what it does

Contributor Author:

So no_collate_fn ? :d

return model_outputs

def get_iterator(self, inputs, num_workers: int):
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Member:

(nitpick) This has potentially undesirable side effects in the user's setup as it sets this environment variable for the duration of the runtime

Contributor Author:

True. Which way would you recommend: only modify it if not set, and otherwise don't touch it?

Or never touch it?

Contributor Author:

Or unset it sometime later

Member:

I think unsetting it sometime later is fine - if it's not obvious how to do it in a clean way I would ignore my comment and just let it be :)
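
For reference, one shape the "only set it if unset, and restore it afterwards" option could take (just a sketch of the idea discussed above):

import os
from contextlib import contextmanager

@contextmanager
def tokenizers_parallelism_disabled():
    # Only touch the variable if the user has not set it themselves,
    # and restore the previous state once iteration is done.
    user_set = "TOKENIZERS_PARALLELISM" in os.environ
    if not user_set:
        os.environ["TOKENIZERS_PARALLELISM"] = "false"
    try:
        yield
    finally:
        if not user_set:
            os.environ.pop("TOKENIZERS_PARALLELISM", None)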


Member @LysandreJik left a comment:

This is great, tested extensively! I think it would be great to do two things before merging:

  • I see no tests added for the introduced changes. I think it's great that the past behavior still works, but it would be nice to test the added functionality as well; for example the preprocess, forward, set_parameters and postprocess methods, as well as the batching ability, the possibility to consume a Dataset, etc.
  • A feature without docs is a feature no one knows about :) It would be nice to add a doc explaining how it works. The PR description can definitely be reused!

Comment on lines 121 to 124
if top_k is not None:
    self.top_k = top_k
if self.top_k > self.model.config.num_labels:
    self.top_k = self.model.config.num_labels
Member:

The only issue I'm seeing with this method is that it has side effects on the pipeline state, which is surprising in some cases (e.g., on __call__).

For example here:

>>> pp = pipeline("text-generation", max_length=2)

You would expect all generations coming from this pipeline to have a max_length of 2. And this works well.

When using the same pipeline for inference, with a max_length override, this too works as one would expect:

>>> pp(["ok", "nice"], max_length=8)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[[{'generated_text': 'ok\n\nAce is my friend'}], [{'generated_text': "nice will always tell you, 'I"}]]

What is unexpected is that the default max_length has now changed to 8, when this should IMO be a one-off override and not a new default:

>>> pp(["ok", "nice"])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[[{'generated_text': 'ok and Aarhus in Copenhagen.'}], [{'generated_text': 'nice for a while at first. I'}]]

Contributor Author:

You are entirely right! I think we need to change that, and hopefully this will also help improve the cleanliness there.

Thanks, I think pointing that out gave me an idea of how to dramatically improve this.
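
The rough direction (a sketch with illustrative names; the actual fix in this PR ended up going through _sanitize_parameters) is to merge call-time kwargs per call instead of storing them on the pipeline:

class MySketchPipeline:
    def __init__(self, **init_kwargs):
        # Defaults captured once; never mutated afterwards.
        self._default_params = init_kwargs  # e.g. {"max_length": 2}

    def __call__(self, inputs, **call_kwargs):
        # Call-time kwargs override only for this call.
        params = {**self._default_params, **call_kwargs}
        return self._run(inputs, params)

    def _run(self, inputs, params):
        # placeholder for preprocess / forward / postprocess
        return inputs, params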


dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=1, collate_fn=collate_fn)
model_iterator = PipelineIterator(dataloader, self.infer_forward)
final_iterator = PipelineIterator(model_iterator, self.postprocess)
return final_iterator
Member:

This could also return list(final_iterator) to return the computed values

Contributor Author:

That would defeat the principle of having an iterator (calling list forces all outputs to be generated first, leading to RAM issues on very large datasets at the very least).
It would also be great to support infinite generators (like waiting on a queue, which would be great in an API :))

Contributor Author:

The idea is to return something you can iterate on, on an item basis.

for item in iterator
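
In spirit (a simplified sketch, not the actual PipelineIterator code), the chain is just lazily applied functions, so nothing ever forces the whole dataset into memory:

def apply_lazily(iterable, fn):
    for item in iterable:
        yield fn(item)

def get_iterator_sketch(pipe, dataloader):
    model_iterator = apply_lazily(dataloader, pipe.forward)
    return apply_lazily(model_iterator, pipe.postprocess)

# Works the same way for finite datasets and for infinite generators
# (e.g. items pulled off a queue in an API).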

Member:

Ah, my bad :)

Comment on lines 875 to 903
final_iterator = self.get_iterator(inputs, num_workers)
outputs = [output for output in final_iterator]
return outputs
Member:

With the above comment this would be simplified to return self.get_iterator(inputs, num_workers) (with a potential name change for the get_iterator method)

inputs: dict holding all the keyword arguments required by the model forward method.
return_tensors: Whether to return native framework (pt/tf) tensors rather than numpy arrays
@abstractmethod
def preprocess(self, input_: Any, **preprocess_parameters: Dict) -> Dict[str, GenericTensor]:
Contributor:

I would make this method private as well -> why should we keep it public if forward shouldn't be called?

Contributor Author:

forward should be called. It's just that forward is defined in the base class rather than in the subclasses, because of the ensure_on_device + no_grad wrappers, which are important.

_forward, on the other hand, does not contain those guards.

preprocess > forward > postprocess: valid and recommended
preprocess > _forward > postprocess: valid but slower (no grad guards) + no GPU handling out of the box. (Important to keep readability within pipelines to a maximum.)
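
Roughly, forward is just _forward wrapped with the grad and device guards; a simplified sketch of the PyTorch branch (the real method also handles the TensorFlow case):

import torch

def forward(self, model_inputs, **forward_params):
    with torch.no_grad():
        model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
        model_outputs = self._forward(model_inputs, **forward_params)
        model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
    return model_outputs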

return self._ensure_tensor_on_device(inputs)

def _ensure_tensor_on_device(self, inputs):
    if isinstance(inputs, dict):
Contributor:

why not:

Suggested change
if isinstance(inputs, dict):
if isinstance(inputs, dict) or isinstance(inputs, list):
return self.ensure_tensor_on_cpu(inputs)

?

Contributor:

Why do dicts and lists always have to be on CPU? Why is that?
Also, do we really need that function? It's not really intuitive to me that ensure_tensor_on_device puts all lists and dicts on CPU and only tensors on self.device.

Couldn't we just have:

        if isinstance(inputs, (dict, list)):
            return self.ensure_tensor_on_cpu(inputs)
        elif isinstance(inputs, torch.Tensor):
            return inputs.to(self.device)
        else:
            return inputs

instead of having this function at all?

Contributor Author:

Bad copy&paste, it should just be recursive.
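
Something along these lines (a sketch of the recursive shape; the exact signature may differ):

import torch

def _ensure_tensor_on_device(self, inputs, device):
    if isinstance(inputs, dict):
        return {name: self._ensure_tensor_on_device(value, device) for name, value in inputs.items()}
    elif isinstance(inputs, (list, tuple)):
        return type(inputs)(self._ensure_tensor_on_device(item, device) for item in inputs)
    elif isinstance(inputs, torch.Tensor):
        return inputs.to(device)
    else:
        return inputs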

return model_outputs

def get_iterator(self, inputs, num_workers: int, preprocess_params, forward_params, postprocess_params):
    if "TOKENIZERS_PARALLELISM" not in os.environ:
Contributor:

maybe add an else branch, so that if it's set to true we warn that there could be problems with DataLoader multithreading?

Contributor Author:

If it's already set and problems do occur, tokenizers will already yell. We definitely could yell too...

preprocess_params, forward_params, postprocess_params = self._sanitize_parameters(**kwargs)

# Fuse __init__ params and __call__ params without modifying the __init__ ones.
preprocess_params = {**self._preprocess_params, **preprocess_params}
Contributor:

This means that the call params would overwrite the init params, no? I think this is good! Does it maybe make sense to add a warning when that happens, though? E.g. something like:

if self._preprocess_params.keys() & preprocess_params.keys():
    logger.warning(f"The parameters {self._preprocess_params.keys() & preprocess_params.keys()} have been overwritten by the passed parameters")

Contributor Author:

Yes, it will (but it will only locally override, contrary to the previous implementation).

pipe = pipeline("text-generation", max_length=20)

pipe("Something") # Using max_length=20
pipe("else", max_length=100) # using max_length=100
pipe("again") # using max_length=20

Contributor Author:

I am not a big fan of high verbosity personally, but it definitely could be added.
IMO it's expected behavior, so no reason to yell.


if "max_length" in generate_kwargs:
forward_params["max_length"] = generate_kwargs["max_length"]
# self.max_length = generate_kwargs.get("max_length", self.model.config.max_length)
Contributor:

delete the comment?

@@ -186,7 +186,7 @@ def run_pipeline_test(self, model, tokenizer, feature_extractor):
],
)

outputs = fill_masker([f"This is a {tokenizer.mask_token}", f"Another {tokenizer.mask_token}"])
outputs = fill_masker([f"This is a {tokenizer.mask_token}", f"Another {tokenizer.mask_token} great test."])
Contributor:

why is this added here?

Contributor:

But the test only checks for type so should be fine

Contributor Author:

Because the length was too small I guess.

I don't like having this change in there though.

Contributor @patrickvonplaten left a comment:

Amazing work @Narsil ! I really like that all pipelines have to obey the "preprocess", "forward", "postprocess" design now!

I think we should also update the docs now to showcase how to use an iterator pipeline in practice, no?

Member @LysandreJik left a comment:

This looks great! LGTM.

Feel free to merge when ready.

return predictions
model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
model_outputs = self._forward(model_inputs, **forward_params)
model_inputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
Member:

Unnecessary variable init

Suggested change
model_inputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))

Contributor Author:

I changed it to model_outputs. I would also like to keep the code looking as mutation-free as possible: even though the variables are actually mutated in place (so reassigning is redundant), I prefer reassigning the variable explicitly:

https://stackoverflow.com/questions/59560043/what-is-the-difference-between-model-todevice-and-model-model-todevice


max_length = self.max_length
if self.framework == "pt":
model_inputs = self.ensure_tensor_on_device(**model_inputs)
n = model_inputs["input_ids"].shape[1]
Member:

Agree

Collaborator @sgugger left a comment:

Nice doc additions!

Member @LysandreJik left a comment:

Thank you for adding very welcome documentation!

Comment on lines 55 to 109
:obj:`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
time `pipeline(...., maybe_arg=4)` or at call time `pipe = pipeline(...); output = pipe(...., maybe_arg=4)`.

The returns of `_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to :obj:`preprocess`,
:obj:`_forward` and :obj:`postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That
allows to keep the default arguments in the function definition which is always more "natural".

Try to keep the inputs/outputs very simple and ideally JSON-serializable, as it makes pipeline usage very easy
without requiring users to understand new kinds of objects. It's also relatively common to support many different types
of arguments for ease of use (audio files can be filenames, URLs or raw bytes).
Member:

I don't exactly understand how this is supposed to be done for _sanitize_parameters - is there a pipeline that could be linked to show an example?

Contributor Author:

I added an example of how to add a default parameter + edited _sanitize_parameters.

Enabling dataset iteration on pipelines.

Unifying parameters under `set_parameters` function.

Small fix.

Last fixes after rebase

Remove print.

Fixing text2text `generate_kwargs`

No more `self.max_length`.

Fixing tf only conversational.

Consistency in start/stop index over TF/PT.

Speeding up drastically on TF (nasty bug where max_length would increase
a ton.)

Adding test for support for non fast tokenizers.

Fixing GPU usage on zero-shot.

Fix working on Tf.

Update src/transformers/pipelines/base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/pipelines/base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Small cleanup.

Remove all asserts + simple format.
Instead of implicitly overriding internal state, we moved to real named arguments on every `preprocess`, `_forward`, and `postprocess` function.

Instead, `_sanitize_parameters` is used to split all the kwargs of both __init__ and __call__ into the 3 kinds of named parameters.
Comment on lines +73 to +105
A classic example would be a :obj:`top_k` argument in the post processing in classification tasks.

.. code-block::

>>> pipe = pipeline("my-new-task")
>>> pipe("This is a test")
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05}
{"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]

>>> pipe("This is a test", top_k=2)
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]

In order to achieve that, we'll update our :obj:`postprocess` method with a default parameter set to :obj:`5`, and edit
:obj:`_sanitize_parameters` to allow this new parameter.


.. code-block::


def postprocess(self, model_outputs, top_k=5):
    best_class = model_outputs["logits"].softmax(-1)
    # Add logic to handle top_k
    return best_class

def _sanitize_parameters(self, **kwargs):
    preprocess_kwargs = {}
    if "maybe_arg" in kwargs:
        preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]

    postprocess_kwargs = {}
    if "top_k" in kwargs:
        postprocess_kwargs["top_k"] = kwargs["top_k"]
    return preprocess_kwargs, {}, postprocess_kwargs
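
For completeness, the top_k handling hinted at by the comment above could look something like this (an illustrative sketch, not the exact doc example):

def postprocess(self, model_outputs, top_k=5):
    probabilities = model_outputs["logits"].softmax(-1)[0]
    scores, ids = probabilities.topk(top_k)
    return [
        {"label": self.model.config.id2label[int(i)], "score": float(s)}
        for s, i in zip(scores, ids)
    ]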

@xegulon commented Sep 22, 2021

What about integrating this idea into this rework?

@Narsil (Contributor Author) commented Sep 22, 2021

@xegulon It would be a great addition; we already have similar functionality within HF.

The code is not open source just because it's messy and wouldn't fit transformers' requirements (backward compatibility and maintaining it are out of scope in our opinion), but it mostly reuses the tools that we provide (like export_onnx), so it's mostly plumbing.

If we can find something clean enough, it would probably be a welcome addition.

A few caveats to mention:

  • Using ONNX in fully optimized mode makes it hardware dependent (you HAVE to run on hardware similar to where the optimized file was created).
  • Using quantization might lead to an accuracy drop (but also a huge speedup).
  • Using ONNX with fancy methods like generate is much harder to do while keeping performance (you have to take care of past_key_values).
  • Using ONNX with generate on GPU is actually counterproductive, because we can't run the beam search directly on GPU tensors (that's an ORT limitation). So there's a lot of back&forth between GPU and CPU, which is bad for performance. (We also tried the beam_search proposed by ORT but didn't find it was worth it, as the implementation differs significantly from transformers.)

With those caveats in mind, feel free to open a PR; it would be a welcome addition if we manage to make it readable and orthogonal (the new refactor should help for sure).
Try to keep the PR small, more like a PoC, so everyone can weigh in on the design (most notably the transformers core maintainers).

@xloem (Contributor) commented Sep 28, 2021

Hey, it's really great to see work on general code organisation to any degree. Thanks for your work.

It looks like this PR introduced a bug around completing empty prompts:

transformers.pipeline('text-generation')('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 150, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 915, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 922, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 871, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 162, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, **generate_kwargs)  # BS x SL
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1016, in generate
    return self.sample(
  File "/home/user/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1529, in sample
    outputs = self(
  File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 949, in forward
    transformer_outputs = self.transformer(
  File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 673, in forward
    input_ids = input_ids.view(-1, input_shape[-1])
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous


@ucas010 commented Jan 28, 2023

How do I use my own dataset? It is a txt file where each line is an input for the NER model.
Could you please help me?

@xloem (Contributor) commented Jan 28, 2023

> How do I use my own dataset? It is a txt file where each line is an input for the NER model.
> Could you please help me?

The example scripts want it in JSON Lines or CSV: https://huggingface.co/docs/transformers/run_scripts#use-a-custom-dataset. You can use a tool to convert to JSON Lines. It takes some patience to figure out a way to do each step, and then it works.
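
If the goal is just to run the pipeline over the file directly (rather than through the example training scripts), a sketch could be (the model name and file path are placeholders, and this assumes a transformers version where pipelines accept iterables, as introduced by this PR):

from transformers import pipeline

ner = pipeline("token-classification", model="dslim/bert-base-NER", device=0)  # drop device=0 without a GPU

def lines(path):
    # Yield one non-empty line at a time so the pipeline can stream the file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

for entities in ner(lines("my_dataset.txt")):
    print(entities)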

@Narsil Narsil deleted the iterable_pipelines branch January 28, 2023 09:49