search(q, index="dense") returns values with high scores even if there should not be a match

I am using product titles as my corpus. I embed them using:

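```python
import txtai

# Two sub-indexes: a sparse keyword (BM25) index and a dense vector index
embeddings = txtai.Embeddings(
    defaults=False,
    normalize=True,
    indexes={
        # Keyword (BM25) index
        "keyword": {
            "keyword": True
        },
        # Dense index backed by a Dutch sentence-transformers model
        "dense": {
            "path": "NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers"
        }
    }
)
```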

Then I index the titles as normal.

When I search for words that are completely unrelated to the corpus, the `dense` index almost always returns products with scores between `0.10` and `0.35`, and sometimes `0.50`. A fully correct, matching product only gets about `0.60`.
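To make the problem concrete, here is a minimal sketch of what a plain score cutoff would look like. The `(id, score)` pairs are illustrative values that mirror the ranges described above, not real output from the index, and `filter_by_score` and `MIN_SCORE` are hypothetical names, not part of txtai.

```python
# Illustrative (id, score) pairs only; they mirror the score ranges
# described above and are NOT real output from the dense index.
unrelated_hits = [("p1", 0.10), ("p2", 0.35), ("p3", 0.50)]
matching_hits = [("p4", 0.60)]


def filter_by_score(hits, min_score):
    """Keep only the hits whose score is at least min_score."""
    return [(uid, score) for uid, score in hits if score >= min_score]


# Hypothetical cutoff placed between the worst "bad" score and the "good" score
MIN_SCORE = 0.55

kept = filter_by_score(unrelated_hits + matching_hits, MIN_SCORE)
print(kept)
```

With a gap of only about `0.10` between an unrelated hit at `0.50` and a correct match at `0.60`, a fixed cutoff like this seems fragile, which is why I am looking for something more robust.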

Are there more fine-tuning methods, or other ways to dig deeper, to determine whether a match actually makes sense?

I also use the BM25 matcher and weight the two sets of results, but that is not ideal, because the dense index still gives a very high score to bad results.
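The weighting I mean is roughly the following. This is a simplified sketch of combining the two result sets by hand, not how txtai combines scores internally. It assumes both score sets are already scaled to `[0, 1]`; `combine` and `dense_weight` are hypothetical names, and the scores are made up for illustration.

```python
def combine(dense, keyword, dense_weight=0.5):
    """Weighted sum of dense and keyword scores per id.

    Assumes both inputs map id -> score in [0, 1]. An id missing
    from one result set contributes 0.0 for that set.
    """
    combined = {}
    for uid in set(dense) | set(keyword):
        d = dense.get(uid, 0.0)
        k = keyword.get(uid, 0.0)
        combined[uid] = dense_weight * d + (1.0 - dense_weight) * k
    # Highest combined score first
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: "a" is unrelated (dense only), "b" is a real match
dense_scores = {"a": 0.35, "b": 0.60}
keyword_scores = {"b": 0.80}

ranked = combine(dense_scores, keyword_scores)
print(ranked)
```

Even with no keyword hit at all, the unrelated item `a` keeps half of its dense score, so it still shows up in the combined results.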

search(q, index="dense") returns values with high scores even if there should not be a match #908