
Update validation for NLP tasks #59

Merged: 3 commits from typings into main on May 28, 2021
Conversation

@osanseviero (Member) commented May 28, 2021

Summary of changes:

  • Aligned max_length with max_new_tokens in the docs.
  • Changed max_length to max_new_tokens for text-generation, which I think is the right param.
  • Added return_full_text and num_return_sequences for text-generation (sketched below).
  • Added validation for summarization, table-question-answering, and text2text-generation.
  • Fixed conversational: it was using params instead of inputs, and the expected inputs are different.

Let me know if you disagree with any of these and I can revert :)
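
As a rough illustration of the validation shape these changes imply, here is a minimal pydantic sketch. Only the parameter names come from this PR; the class name, the types, and the positivity check are assumptions, not the repo's actual code.

```python
from typing import Optional

from pydantic import BaseModel, validator


class TextGenerationParamsCheck(BaseModel):
    # Parameter names taken from the PR summary; types follow the
    # corresponding transformers generate() kwargs.
    max_new_tokens: Optional[int] = None
    return_full_text: Optional[bool] = None
    num_return_sequences: Optional[int] = None

    @validator("max_new_tokens")
    def max_new_tokens_positive(cls, v):
        # Assumed sanity check: reject non-positive token budgets.
        if v is not None and v < 1:
            raise ValueError("max_new_tokens must be a positive integer")
        return v
```

With a model like this, `TextGenerationParamsCheck(**params)` raises a `ValidationError` on bad input before anything reaches the pipeline.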

@osanseviero requested a review from @Narsil, May 28, 2021 14:32
@Narsil (Contributor) left a comment

I'm fine with enabling max_new_tokens by default instead of max_length; however, I think it should get its own validation class.

For conversational, we were missing the input validation for sure, but the parameters do still exist (they are NOT the same as text-generation anymore, though: num_return_sequences is impossible, for instance, at least in transformers).

I know it's cumbersome to write many repetitive tests, but overall I think it works better than parametrizing: parametrization leaves more room for silently failing tests, and failing parametrized tests are sometimes hard to untangle and update (especially when the test you want to modify needs to differ significantly from the others it is tied to).

Of course, all this is a matter of taste, so consider the remarks about testing as nits.
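
To make the trade-off concrete, here is a hedged sketch of the two testing styles. `validate_params` is a hypothetical stand-in, not this repo's actual entry point.

```python
import pytest


def validate_params(task, params):
    """Hypothetical stand-in for the repo's validation entry point."""
    # Illustrative only; a real implementation would dispatch to the
    # task-specific parameter check.
    assert isinstance(params, dict)


# Parametrized style: compact, but a failure reports only a parameter tuple,
# and one task's expectations cannot diverge without unpicking the shared body.
@pytest.mark.parametrize("task", ["summarization", "text2text-generation"])
def test_params_valid_parametrized(task):
    validate_params(task, {"max_length": 20})


# Explicit style: repetitive, but each failure carries a self-describing test
# name and each test can be updated independently when one task drifts.
def test_summarization_params_valid():
    validate_params("summarization", {"max_length": 20})


def test_text2text_generation_params_valid():
    validate_params("text2text-generation", {"max_length": 20})
```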

(Inline review comment on api-inference-community/tests/test_nlp.py: outdated, resolved)
@osanseviero (Member, Author) commented May 28, 2021

For conversational, I don't see any params in the docs, and I'm trying to keep the two in sync. As far as I understand, all of these seq2seq models call a generate function in transformers, so I'm not sure which params would be eligible for conversational.

@Narsil (Contributor) commented May 28, 2021

top_p, top_k, repetition_penalty, max_time, etc. are valid.

You are correct, it's using generate(); the model is either decoder-only (DialoGPT-large) or encoder-decoder (Blenderbot).

The only reason it doesn't accept exactly the text-generation parameters is the return type: it's a dictionary, so num_return_sequences would require returning a list instead.
We can definitely add them to the docs.
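
Following this reasoning, a conversational parameter check might look like the sketch below. The class name is an assumption; the allowed fields are exactly the ones listed above.

```python
from typing import Optional

from pydantic import BaseModel


class ConversationalParamsCheck(BaseModel):
    # Shared generate() knobs that remain valid for conversational.
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None
    max_time: Optional[float] = None
    # num_return_sequences is deliberately absent: the conversational
    # pipeline returns a single dictionary, so multiple generated
    # sequences could not be represented in the response.
```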

@osanseviero requested a review from @Narsil, May 28, 2021 16:37
@osanseviero (Member, Author) commented

Updates:

  • Changed to a shared params class for the generation tasks, with task-specific params in subclasses.
  • Reverted the max_new_tokens change to stay consistent across the three generation types.
  • Added param validation for conversational.
  • De-parametrized some of the tests.
  • Added validation for max_length > min_length (see the sketch below).

I'll send another PR to update the docs.
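
Here is a minimal sketch of what the shared-class refactor plus the max_length > min_length check could look like. The class names and the strictness of the comparison are assumptions based on the bullet points above, not the merged code.

```python
from typing import Optional

from pydantic import BaseModel, root_validator


class SharedGenerationParams(BaseModel):
    # Knobs common to the generate()-backed tasks.
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None

    @root_validator
    def check_max_length_above_min_length(cls, values):
        # Cross-field check described in the update notes.
        min_len = values.get("min_length")
        max_len = values.get("max_length")
        if min_len is not None and max_len is not None and max_len <= min_len:
            raise ValueError("max_length must be greater than min_length")
        return values


class TextGenerationParamsCheck(SharedGenerationParams):
    # Task-specific additions from the PR summary.
    return_full_text: Optional[bool] = None
    num_return_sequences: Optional[int] = None
```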

@Narsil (Contributor) left a comment

LGTM

@osanseviero merged commit 165ebbe into main, May 28, 2021
@osanseviero deleted the typings branch, May 28, 2021 18:10