
Text Classification in Hosted Inference API with Multiple Inputs #310

Closed
merveenoyan opened this issue Jan 2, 2022 · 10 comments

@merveenoyan
Contributor

merveenoyan commented Jan 2, 2022

Some text classification tasks take two inputs, just like similarity models, but since the problem is text classification, the widget takes only one text, which confuses users about how to pass their text to the model in the hosted inference API. See this question in the forum.
A couple of other example models (even though these models are based on similarity, they usually return entailment/not entailment):

  • Cross Encoder QNLI
  • Cross Encoder MSMARCO Passage Ranking

The solution is letting the user input as many texts as they want (like the similarity widget) while keeping the class labels, without needing an additional pipeline. A similar widget is zero-shot classification, but it takes possible class names rather than multiple text inputs.

Maybe this is relevant for you, cc: @osanseviero

@Narsil

Narsil commented Jan 3, 2022

Pipeline-wise, it seems that this task is more like pairwise classification, NOT pure classification.

Zero-shot uses entailment under the hood to do the job, but it does not seem like that pipeline will be reusable, since (one sentence, comma-separated labels -> classification outputs) is not really what is desired here (a couple of sentences -> entailment/not entailment/neutral classification).

Couple of notes/questions:

  • How many models require this to be showcased properly? (And how popular are they, just to gauge the variability in what would be a nice showcase for those models?)
  • If we send more than two sentences, what is the expected output?
  • Maybe we need to keep in mind backward compatibility with zero-shot, which I think is the default pipeline for these models (MNLI variants at least).
  • There is a way to make zero-shot work on exactly the same input with the hypothesis template parameter and by enabling multi_label (definitely not ideal, just food for thought); see the sketch below.
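
For reference, a minimal sketch of that zero-shot workaround, assuming an MNLI-style checkpoint (the model name and sentences below are illustrative, not from this thread):

```python
from transformers import pipeline

# Example checkpoint; any NLI/MNLI-style model usable with zero-shot works the same way
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

result = classifier(
    premise,
    candidate_labels=[hypothesis],
    hypothesis_template="{}",  # pass the candidate label through verbatim as the hypothesis
    multi_label=True,          # score the pair on its own instead of normalizing across labels
)
print(result)  # {'sequence': ..., 'labels': [...], 'scores': [...]}
```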

@merveenoyan
Contributor Author

merveenoyan commented Jan 3, 2022

Hello @Narsil 🤗
Yes, it's just that there's this family of tasks in text classification called GLUE and it's way too general (which is why we needed zero-shot classification, I guess). For the QNLI ones we can simply change the names of the input texts in the widget imo; it only takes a question and a context. There's another task called QQP that assesses whether one question is a paraphrase of another and takes two separate inputs. Another one is MRPC, which again takes two texts and assesses whether one is a paraphrase of the other. I don't know what the use case of the user in the forum was, but it was probably one of these.
TL;DR: some of the GLUE tasks take one input (which is covered), some take two inputs (not covered), and others are covered under zero-shot. The similarity-based ones (like the MS MARCO model I put above) are not technically text classification, so it's okay if we don't cover them for now imo; I think that's because of the way these similarity models work (they're not actually classification models).
The ones that take two inputs can be handled with the same pipeline, but we should somehow change the names of the inputs according to the task itself.
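
For context, a minimal sketch of how one of these two-input checkpoints can be called directly with transformers, assuming a QNLI-style cross-encoder (the checkpoint name and texts are just examples):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cross-encoder/qnli-electra-base"  # example checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

question = "How many people live in Berlin?"
context = "Berlin had a population of 3.5 million registered inhabitants."

# The two inputs are encoded as a single sentence pair, which is how these
# cross-encoder checkpoints are trained.
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Some QNLI cross-encoders emit a single relevance logit; others emit one logit per class.
if logits.shape[-1] == 1:
    print("entailment score:", torch.sigmoid(logits)[0].item())
else:
    probs = logits.softmax(dim=-1)[0]
    for idx, p in enumerate(probs.tolist()):
        print(model.config.id2label[idx], p)
```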

@Narsil

Narsil commented Jan 4, 2022

I see!

Right now I don't see any way around having a new pipeline for processing texts two at a time in a classification manner (text-classification cannot realistically be extended, since pipe(["text1", "text2"]) is already defined and just means "classify two texts"). I am also not in favor of mixing argument types all the time.
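
To illustrate the existing semantics being referred to (the standard sentiment checkpoint is used here purely as an example):

```python
from transformers import pipeline

clf = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

# A list input already means "classify each text independently",
# so it cannot be reinterpreted as a single (text, text_pair) example.
print(clf(["I love this movie.", "This was a waste of time."]))
# -> two separate results, e.g. [{'label': 'POSITIVE', ...}, {'label': 'NEGATIVE', ...}]
```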

sentence-similarity fits the bill perfectly and already exists as a widget. @osanseviero wdyt, should we add it to transformers? IMHO it seems like the best course of action.

@osanseviero
Member

> sentence-similarity fits the bill perfectly and already exists as a widget. @osanseviero wdyt, should we add it to transformers? IMHO it seems like the best course of action.

Just as a note, I see many models are using CrossEncoder, which is a sentence-transformers class, so maybe we should also consider moving some models to sentence-transformers. E.g.

Anyway, I think the sentence-similarity name is a bit strange for this case, no? I think the pipeline should have a different name, although we can reuse the widget. My main question is whether the inference of all these currently unsupported models is consistent. That is, whether all models expect the two inputs in the same way.

@merveenoyan
Contributor Author

@osanseviero QQP and MRPC models answer whether one sentence is a paraphrase of another, so they're unrelated to sentence-similarity yet still take two inputs (they're in GLUE).

@merveenoyan
Contributor Author

This was raised by another user in this forum question. Two sentences are fine, but more sentences might be problematic.

@adrinjalali

@osanseviero wanna move this to https://github.com/huggingface/hub-docs?

@osanseviero
Member

Yes please, I don't have settings access in that repo, unfortunately.

@adrinjalali adrinjalali transferred this issue from huggingface/huggingface_hub Mar 16, 2022
@adrinjalali

@osanseviero that should be fixed now.

@osanseviero osanseviero transferred this issue from huggingface/hub-docs Nov 21, 2023
@osanseviero
Member

Hi all! I'll close this issue as we have not received more requests for this and there are no new models, as far as I know, for this use case. The user in the forum worked around it by creating their own Pipeline class.
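
For anyone landing here later, a minimal sketch of what such a custom Pipeline workaround could look like (the class name, checkpoint, and inputs below are assumptions for illustration, not the forum user's actual code):

```python
from transformers import Pipeline, AutoTokenizer, AutoModelForSequenceClassification

class PairClassificationPipeline(Pipeline):
    """Hypothetical pipeline that classifies a (text, text_pair) input."""

    def _sanitize_parameters(self, **kwargs):
        # No extra preprocess/forward/postprocess parameters needed for this sketch
        return {}, {}, {}

    def preprocess(self, inputs):
        # Expect {"text": ..., "text_pair": ...} and encode them as one sentence pair
        return self.tokenizer(
            inputs["text"], inputs["text_pair"],
            truncation=True, return_tensors=self.framework,
        )

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        probs = model_outputs.logits.softmax(-1)[0]
        best = int(probs.argmax())
        return {"label": self.model.config.id2label[best], "score": float(probs[best])}

model_name = "roberta-large-mnli"  # example checkpoint
pipe = PairClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name),
)
print(pipe({"text": "A man is eating food.", "text_pair": "A man is eating."}))
# -> e.g. {'label': 'ENTAILMENT', 'score': ...}
```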
