Not accurate with long sentences #13

pradeepdev-1995 · 2020-08-24T05:47:37Z

The txtai library is less accurate when the input texts being matched are long.

davidmezzetti · 2020-08-24T12:42:08Z

Thank you for using txtai and trying it out on your data. It is common with similarity search to run into accuracy issues when the length of the content varies widely. Without knowing the exact data you're working with, here are some general ideas to try:

- Try different Sentence Transformers models: https://huggingface.co/models?search=sentence-transformers
  - For example, try bert-base-nli-stsb-mean-tokens
- Train a custom Sentence Transformers model against your data: https://github.com/UKPLab/sentence-transformers#model-training-from-scratch
- Try word embeddings instead of transformer models. Word embedding models support different ways of weighting the word vectors when averaging them, such as BM25. BM25 factors in the length of the content as part of its scoring algorithm.
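
To illustrate the last point about BM25 and content length, here is a minimal standalone sketch of Okapi BM25 scoring in plain Python. It is not txtai's implementation. The function name `bm25_scores`, the `k1` and `b` defaults, and the sample sentences are assumptions made for illustration. It shows how the length-normalization term lowers the score of a long text that matches the same query terms as a short one.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score tokenized documents against a tokenized query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are commonly used defaults.
    """
    if not documents:
        return []

    n = len(documents)
    # Average document length; fall back to 1.0 if every document is empty
    avgdl = (sum(len(doc) for doc in documents) / n) or 1.0
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in documents for term in set(doc))

    scores = []
    for doc in documents:
        tf = Counter(doc)
        # Length normalization: greater than 1 for longer-than-average documents
        norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Made-up sample data: the short and long texts contain the same query terms
short = "txtai builds an embeddings index".split()
long = ("txtai builds an embeddings index and this sentence keeps going "
        "with many extra words that are unrelated to the query").split()
other = "a completely different sentence".split()

scores = bm25_scores(["embeddings", "index"], [short, long, other])
print(scores)
```

In this sketch the short text scores higher than the long one even though both contain each query term once, because the long text's larger length ratio increases the denominator. Setting `b=0` turns length normalization off, which makes the two scores equal.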

davidmezzetti closed this as completed Aug 27, 2020
