Text Classification in Hosted Inference API with Multiple Inputs #310
Comments
Pipeline-wise, this task looks more like pair-wise classification than pure classification. Zero-shot uses entailment under the hood to do the job, but it does not seem like that pipeline will be reusable, since its interface (one sentence plus comma-separated labels -> classification outputs) is not what is desired here (a pair of sentences -> entailment / not entailment / neutral classification). Couple of notes/questions:
|
Hello @Narsil 🤗 |
I see! Right now I don't see anything around having a new pipeline for processing texts two at a time in a classification manner (
|
Just as a note, I see many models are using Crossencoder which is a
Anyways, I think |
@osanseviero QQP and MRPC models answer if one sentence is a paraphrase of another so they're irrelevant with |
It was raised by another user in this forum question. Two sentences is just fine but with more sentences it might be problematic. |
@osanseviero wanna move this to https://github.com/huggingface/hub-docs? |
Yes please, I don't have settings access in that repo unfortunately |
@osanseviero that should be fixed now. |
Hi all! I'll close this issue as we have not received more requests for this and there are no new models, as far as I know, for this use case. The user in the forum worked around it by creating their own |
Some text classification tasks take two inputs, just like similarity models. But since the task is text classification, the widget takes only one text, which confuses users about how to pass their texts to the model in the hosted Inference API. See this question in the forum.
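For local use, one way to pass both texts is the dict form accepted by the `transformers` text-classification pipeline. This is a minimal sketch, not how the widget works: the helper names `build_pair_input` and `classify_pair` are made up for illustration, and the default model id is an assumed example checkpoint. The model is only loaded when `classify_pair` is called.

```python
def build_pair_input(text, text_pair):
    """Return the dict form the text-classification pipeline accepts for a sentence pair."""
    return {"text": text, "text_pair": text_pair}


def classify_pair(text, text_pair, model_id="cross-encoder/qnli-electra-base"):
    # model_id is an assumed example; swap in any pair-classification checkpoint.
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # Both texts are tokenized together as a single (text, text_pair) input.
    return classifier(build_pair_input(text, text_pair))
```

Calling `classify_pair("How many people live in Berlin?", "Berlin has about 3.5 million inhabitants.")` would return the model's label and score for that pair; the exact label names depend on the checkpoint's config.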
A couple of other example models:
Cross Encoder QNLI (even though these models are based on similarity, they usually return entailment / not entailment)
Cross Encoder MSMARCO Passage Ranking
A solution is to let users input as many texts as they want (like the similarity widget) while keeping the class labels, without adding another pipeline. A similar widget is zero-shot classification, but it takes candidate class names rather than multiple text inputs.
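To make the proposal concrete, here is a rough sketch of how a similarity-style input (one source text plus several candidate texts) could be expanded into pairs while keeping a class label per pair. Everything here is hypothetical: `expand_to_pairs`, `top_label_per_pair`, and the assumed score shape (a list of `{"label", "score"}` dicts per pair) are illustrations, not an existing widget or API contract.

```python
def expand_to_pairs(source_sentence, sentences):
    """Pair one source text with each candidate text, preserving order."""
    return [{"text": source_sentence, "text_pair": s} for s in sentences]


def top_label_per_pair(pairs, scores_per_pair):
    """Attach the highest-scoring class label to each pair.

    scores_per_pair holds one list per pair of {"label": str, "score": float}
    dicts; this shape is an assumption modeled on text-classification outputs.
    """
    results = []
    for pair, scores in zip(pairs, scores_per_pair):
        best = max(scores, key=lambda s: s["score"])
        results.append({**pair, "label": best["label"], "score": best["score"]})
    return results
```

With two sentences this collapses to a single pair; with more, each candidate gets its own label, which is where the output can get harder to present in a widget.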
maybe this is relevant for cc: @osanseviero