This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

About the sparse search result compare with lucene searcher ? #34

Open
svjack opened this issue Apr 1, 2021 · 0 comments

Comments


svjack commented Apr 1, 2021

Thanks for providing this convenient toolkit.
I retrieve BM25 and TF-IDF sparse vectors from a Lucene index (provided by Pyserini)
and use this project to build a sparse index for search.
I find that these indexes cannot beat the original Lucene search results.
(The problem does not seem to matter much on tiny datasets or on semantically dispersed datasets,
but as the dataset grows larger the shortcomings can no longer be ignored, and large datasets are exactly where this project is meant to be used.)
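
One way to put a number on "cannot beat Lucene" as the dataset grows is to treat the Lucene top-k for each query as the reference and measure how much of it the sparse index also returns. A minimal sketch, assuming you already have ranked document IDs from both systems; `overlap_at_k` is a hypothetical helper, not part of this project or of Pyserini:

```python
def overlap_at_k(reference_ids, candidate_ids, k=10):
    """Fraction of the reference top-k (e.g. Lucene/Pyserini hits) that also
    appears in the candidate top-k (e.g. results from the sparse index)."""
    reference = list(reference_ids)[:k]
    if not reference:
        return 0.0
    candidates = set(list(candidate_ids)[:k])
    return sum(1 for doc_id in reference if doc_id in candidates) / len(reference)


# Hypothetical document IDs returned by the two systems for one query
lucene_hits = ["d3", "d7", "d1", "d9"]
sparse_hits = ["d7", "d2", "d3", "d5"]
print(overlap_at_k(lucene_hits, sparse_hits, k=4))  # 0.5
```

Averaging this over a fixed query set, and repeating at several collection sizes, would show whether the gap really widens as the data grows.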

This is not a problem with your clustering search algorithm, but with the sparse features themselves.
And if I use SVD to reduce the dimensionality of the sparse data, it only preserves topic-level features.
So I don't understand the real use of sparse features beyond computing some search scores (like BM25),
because they seem weaker than a true lexicon-based score (BM25) and than dense semantic similarity based on BERT
sentence embeddings (like Sentence-Transformers).
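
On "calculate some search scores (like BM25)": a BM25 ranking can be written as an inner product between a BM25-weighted document vector and a binary query vector, so whether a sparse index reproduces BM25 depends on the similarity it ranks by (cosine similarity or vector normalization would change the ranking). A minimal sketch, assuming whitespace tokenization, distinct query terms, and illustrative parameters k1=1.2, b=0.75 (not necessarily Pyserini's settings); the helper names are hypothetical. The IDF follows the form used by Lucene's BM25Similarity, but Lucene differs in other details (e.g. recent versions drop the constant (k1 + 1) factor and store document lengths lossily), so scores will not match Lucene exactly:

```python
import math
from collections import Counter


def bm25_doc_vectors(docs, k1=1.2, b=0.75):
    """One sparse vector (dict term -> weight) per document; each weight is
    that term's BM25 contribution to the document's score."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        vec = {}
        for term, freq in tf.items():
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = freq + k1 * (1 - b + b * dl / avgdl)
            vec[term] = idf * freq * (k1 + 1) / norm
        vectors.append(vec)
    return vectors


def query_vector(query):
    # Binary indicator over the query's distinct terms (duplicates ignored)
    return {term: 1.0 for term in set(query.lower().split())}


def dot(q, d):
    # Sparse inner product: only terms present in the query contribute
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def search(query, doc_vectors, k=3):
    q = query_vector(query)
    scored = sorted(((dot(q, d), i) for i, d in enumerate(doc_vectors)), reverse=True)
    return [(i, s) for s, i in scored[:k] if s > 0]


docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
vectors = bm25_doc_vectors(docs)
print(search("cat mat", vectors))  # only document 0 contains "cat" or "mat"
```

With unnormalized vectors and an inner product, the ranking is the BM25 ranking for this formula; the same document vectors ranked by cosine similarity would generally give a different order.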

Could you point to some really good reference material on constructing sparse text features,
so that this project can be used in a suitable way?
