Description
Is your feature request related to a problem?
Yes! I just had a discussion with @ylwu-amzn about how documents are embedded. I, and many others I have talked to, were under the impression that when you send a document larger than the model's token input limit, something like pooling happens under the hood. However, this seems not to be the case: documents larger than the token limit are simply truncated.
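To make the failure mode concrete, here is a minimal sketch of what silent truncation implies (the token limit and tokenization are stand-ins, not ML Commons behavior or APIs):

```python
# Hypothetical illustration of silent truncation: everything past the
# model's input limit contributes nothing to the resulting embedding.

MAX_TOKENS = 8  # stand-in for a real model limit such as 512

def truncate(tokens, limit=MAX_TOKENS):
    """What happens today: tokens beyond the limit are silently dropped."""
    return tokens[:limit]

doc = "the answer you are searching for is hidden at the very end".split()
kept = truncate(doc)
# The tokens after position 8 (including "end") never reach the model,
# so a query about that content can never match this document.
```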
What solution would you like?
There should be a flag to enable/disable document truncation. Silently truncating data as it is embedded has catastrophic consequences: a document over the limit may simply never be returned by a query, depending on where it was truncated.
This should probably be configurable via the ML Commons settings. We may also want to offer pooling as an alternative, e.g.:
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "embedding_auto_truncation": "true",
        "embedding_pooling": "false"
      }
    }
  }
}
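As a sketch of what an `embedding_pooling` option could do instead of truncation, the document could be split into chunks that each fit the model, embedded chunk by chunk, and the chunk embeddings mean-pooled into one vector. All names below are illustrative; ML Commons does not expose this today:

```python
# Hypothetical chunk-and-pool embedding, illustrating one way an
# "embedding_pooling" setting could handle over-limit documents.

def chunk_tokens(tokens, max_len):
    """Split a token sequence into windows of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def mean_pool(vectors):
    """Average a list of equal-length embedding vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def embed_document(tokens, max_len, embed_chunk):
    """Embed an arbitrarily long document: embed each chunk, then pool.

    embed_chunk is a placeholder for the real model call, which only
    accepts inputs of up to max_len tokens.
    """
    chunks = chunk_tokens(tokens, max_len)
    return mean_pool([embed_chunk(chunk) for chunk in chunks])
```

Mean pooling is only one choice (max pooling or weighted pooling are others), but any of them keeps every token's contribution in the final vector rather than discarding the tail of the document.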
What alternatives have you considered?
I am not sure how else we can protect people who do not realize that embedding models have a maximum input token length and that content beyond it is silently dropped.
Do you have any additional context?