Default truncation to `second` for text similarity #713

davidkyle · 2024-07-25T11:02:49Z

NLP models have 3 truncation settings: FIRST, SECOND and NONE.

FIRST means truncate the first input. In most cases there is only 1 input (e.g. for text embeddings), so this is a sensible default.
SECOND means truncate the second input. One task type with 2 inputs is extractive question answering, where the question is one input and the context is the other. Text similarity also takes 2 inputs.
NONE means don't truncate; window the input instead.
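
To illustrate the three settings, here is a minimal sketch that operates on already-tokenized inputs. It is not eland's or Elasticsearch's implementation: the `truncate_pair` function, its parameters, and the lowercase mode strings are hypothetical, special tokens are ignored, and the windowing that would apply under NONE is not shown.

```python
from typing import List, Tuple


def truncate_pair(
    first: List[str],
    second: List[str],
    max_len: int,
    truncate: str,
) -> Tuple[List[str], List[str]]:
    """Trim one side of a two-input pair so the combined length fits max_len.

    Hypothetical helper for illustration only; real tokenizers also
    reserve room for special tokens, which this sketch ignores.
    """
    if truncate not in ("first", "second", "none"):
        raise ValueError(f"unknown truncate setting: {truncate!r}")

    overflow = len(first) + len(second) - max_len
    if overflow <= 0 or truncate == "none":
        # Nothing to trim, or trimming is disabled (windowing not shown here).
        return first, second

    if truncate == "first":
        keep = max(len(first) - overflow, 0)
        return first[:keep], second

    # truncate == "second"
    keep = max(len(second) - overflow, 0)
    return first, second[:keep]


query = ["what", "is", "eland"]
document = [f"tok{i}" for i in range(10)]

kept_query, kept_doc = truncate_pair(query, document, max_len=8, truncate="second")
print(len(kept_query), len(kept_doc))
```

With a short query and a long document, truncating the second input keeps the whole query and trims only the document, whereas truncating the first input can remove the query entirely and still leave the pair over the limit.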

For text similarity, the first input is usually the shorter one; for example, it might be the query text in a rerank operation. In this situation it is better to truncate the second input. This change makes that the default.
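
The default selection described here could look roughly like the sketch below. This is an assumption-laden illustration, not the code changed in this PR: the `default_truncation` function, the `user_choice` parameter, and the task-type strings are hypothetical names chosen for the example.

```python
from typing import Optional


def default_truncation(task_type: str, user_choice: Optional[str] = None) -> str:
    """Pick a truncation setting, honoring an explicit user choice first.

    Hypothetical helper: the task-type strings are assumed for illustration.
    """
    if user_choice is not None:
        return user_choice
    if task_type == "text_similarity":
        # The first input is usually the short query, so trim the second.
        return "second"
    # Single-input tasks: truncating the first input is the sensible default.
    return "first"


print(default_truncation("text_similarity"))
print(default_truncation("text_embedding"))
```

An explicit setting still wins over the default in this sketch, so a caller who knows the first input is the long one can opt back into `first`.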

pquentin

Thanks! LGTM. Just need to remove the extra print.

eland/ml/pytorch/transformers.py

Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>

Default truncation to second for text similarity

012988f

davidkyle added the bug and topic:NLP labels Jul 25, 2024

davidkyle requested a review from pquentin July 26, 2024 10:13

pquentin approved these changes Jul 30, 2024

eland/ml/pytorch/transformers.py (outdated, resolved)

Update eland/ml/pytorch/transformers.py

adb2601

Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>

davidkyle merged commit fd8886d into elastic:main Aug 5, 2024
4 checks passed
