This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
having multiple vectors per document which can be searched in the same knn operation #221
Labels: question (further information is requested)
Transformer models are limited to 512 tokens, but they can provide higher-quality embeddings for semantic search than classical word embeddings.
For long documents (over 512 tokens), it is common to split them into blocks of fewer than 512 tokens and work at the level of a single block.
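The splitting step above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization; a real pipeline would use the transformer's own tokenizer, and `max_len` is set slightly below 512 on the assumption that the model reserves slots for special tokens:

```python
def split_into_blocks(tokens, max_len=510):
    """Split a token sequence into fixed-size blocks under the model limit.

    max_len is kept below 512 to leave room for special tokens
    (e.g. [CLS]/[SEP]); this is an assumption about the model used.
    """
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Illustration only: whitespace tokens stand in for real tokenizer output.
doc = "word " * 1200
blocks = split_into_blocks(doc.split())
print(len(blocks), len(blocks[0]))  # number of blocks, tokens in first block
```

Each block would then be embedded separately, giving several vectors per document.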
My use case is to perform a semantic search across those long documents and find the most semantically related one.
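Outside the plugin, one way to express this use case is to score each document by its best-matching block and rank documents by that score. The sketch below assumes maximum block similarity as the aggregation (a common but not universal choice) and is not what the current k-NN implementation does:

```python
import numpy as np

def search_documents(query_vec, doc_block_vecs, top_k=3):
    """Rank documents by their best-matching block.

    doc_block_vecs maps a document id to a (num_blocks, dim) array of
    block embeddings. Each document's score is the maximum cosine
    similarity between the query and any of its blocks (an assumed
    aggregation strategy for this sketch).
    """
    q = query_vec / np.linalg.norm(query_vec)
    scores = {}
    for doc_id, blocks in doc_block_vecs.items():
        b = blocks / np.linalg.norm(blocks, axis=1, keepdims=True)
        scores[doc_id] = float((b @ q).max())
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Doing this client-side requires fetching many block vectors per query, which is exactly the overhead a single KNN operation over multiple vectors per document would avoid.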
In the current implementation of KNN in Open Distro, we can provide several vectors per document, but:
I have thought of two workarounds:
Is there another way to manage long documents?