Default truncation to second for text similarity
#713
Merged
NLP models have 3 truncation settings: `FIRST`, `SECOND` and `NONE`.

`FIRST` means truncate the first input. In most cases there is only 1 input (e.g. for text embeddings), so this is a sensible default.

`SECOND` means truncate the second input. One task type with 2 inputs is extractive question answering, where the question is one input and the context is the other. Text similarity also has 2 inputs.

`NONE` means don't truncate and window the input.

For text similarity the first input is usually the shorter one; for example, it might be the query text in a rerank operation. In this situation it is better to truncate the second input. This change makes that the default.
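The effect of the new default can be illustrated with a minimal Python sketch. It is not the code from this PR: the names `Truncate`, `default_truncation` and `truncate_pair`, the task type strings, and the simplified token budget (no special tokens, no windowing for `NONE`) are all assumptions made for illustration.

```python
from enum import Enum
from typing import List, Tuple


class Truncate(Enum):
    """Hypothetical stand-in for the three truncation settings."""
    FIRST = "first"
    SECOND = "second"
    NONE = "none"


def default_truncation(task_type: str) -> Truncate:
    """Pick a default setting for a task type (task type names are assumed)."""
    if task_type == "text_similarity":
        # The first input is usually the short query, so cut the second one.
        return Truncate.SECOND
    # Single-input tasks such as text embeddings only have a first input.
    return Truncate.FIRST


def truncate_pair(
    first: List[str], second: List[str], max_tokens: int, truncate: Truncate
) -> Tuple[List[str], List[str]]:
    """Trim one input of a pair so the combined length fits max_tokens.

    Simplification: special tokens are ignored, and NONE returns the inputs
    unchanged rather than windowing them.
    """
    overflow = len(first) + len(second) - max_tokens
    if overflow <= 0 or truncate is Truncate.NONE:
        return first, second
    if truncate is Truncate.FIRST:
        return first[: max(len(first) - overflow, 0)], second
    return first, second[: max(len(second) - overflow, 0)]
```

With a 3-token query, a 10-token passage and a budget of 8 tokens, `Truncate.SECOND` keeps the whole query and the first 5 passage tokens, whereas `Truncate.FIRST` removes the query entirely and still leaves the passage over budget. That is the situation the new default avoids for rerank-style inputs.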