bm25 and cross-language searching #260
I'll try to answer this, given that I opened the original BM25/BM25+ pull request for MiniSearch.

MiniSearch searches in roughly two stages: matching and ranking. (This is a bit of a simplification; for this explanation I will ignore features like filtering and boosting.)

The first stage is matching. MiniSearch implements a fuzzy search algorithm that looks for words that are textually similar to the words in the query. All documents that match the query in some way are collected.

The second stage is ranking. The goal is to show the matching documents in order of relevance: which documents match best? BM25 and BM25+ are ranking algorithms. They do not generate search results; they only (re-)order them.

Cross-language searching (finding "bike" when you search for "vélo") needs to happen during the matching stage. Unless you provide your own translations, this is not something MiniSearch can do. MiniSearch provides fuzzy text-based matching, but the strings "bike" and "vélo" are not similar and will not match. You could:
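To make the matching/ranking split concrete, here is a small self-contained TypeScript sketch. It is not MiniSearch's implementation or API: every name in it (`search`, `levenshtein`, `maxEdits`, `translations`) is hypothetical, fuzzy matching is plain Levenshtein distance with an assumed limit of one edit, and ranking uses the textbook BM25 formula with a Lucene-style IDF and the common defaults k1 = 1.2, b = 0.75 (assumed here; MiniSearch's own parameters may differ). The `translations` table stands in for "your own translations" and is applied before matching.

```typescript
// Minimal two-stage search sketch (NOT MiniSearch's implementation or API).
// All names here (search, levenshtein, maxEdits, translations) are hypothetical.

interface Doc {
  id: number;
  text: string;
}

interface SearchOptions {
  maxEdits?: number; // max Levenshtein distance for a fuzzy term match (assumed default: 1)
  translations?: Record<string, string[]>; // hypothetical user-supplied translation table
}

// Lowercase and split on anything that is not a Unicode letter or digit.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);

// Classic dynamic-programming edit distance, using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

function search(query: string, docs: Doc[], opts: SearchOptions = {}): { id: number; score: number }[] {
  const maxEdits = opts.maxEdits ?? 1;
  const k1 = 1.2; // common BM25 defaults, assumed for this sketch
  const b = 0.75;
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgdl = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(docs.length, 1);

  // Translations are applied while building the query, i.e. before matching.
  const queryTerms = tokenize(query).flatMap((q) => [q, ...(opts.translations?.[q] ?? [])]);
  const hits = (q: string, terms: string[]) => terms.filter((t) => levenshtein(q, t) <= maxEdits);

  // Stage 1 (matching): keep documents with at least one term close to some query term.
  const candidates = tokenized
    .map((terms, i) => ({ terms, i }))
    .filter(({ terms }) => queryTerms.some((q) => hits(q, terms).length > 0));

  // Stage 2 (ranking): BM25 scores and sorts the candidates; it never adds new ones.
  return candidates
    .map(({ terms, i }) => {
      let score = 0;
      for (const q of queryTerms) {
        const tf = hits(q, terms).length; // fuzzy hits count as occurrences (simplification)
        if (tf === 0) continue;
        const n = tokenized.filter((ts) => hits(q, ts).length > 0).length;
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * terms.length) / avgdl));
      }
      return { id: docs[i].id, score };
    })
    .sort((x, y) => y.score - x.score);
}

const docs: Doc[] = [
  { id: 1, text: "A red bike for sale" },
  { id: 2, text: "Bike parts, bike repair and bikes" },
  { id: 3, text: "Walking shoes" },
];

console.log(levenshtein("bike", "vélo")); // 4: the words share no characters
console.log(search("vélo", docs)); // []: nothing survives the matching stage
console.log(search("bike", docs).map((r) => r.id)); // [ 2, 1 ]
console.log(search("vélo", docs, { translations: { "vélo": ["bike", "bicycle"] } }).map((r) => r.id)); // [ 2, 1 ]
```

Without a translation table, the BM25 code never even runs for "vélo": the candidate list is already empty after the matching stage, so swapping in a different ranking function cannot produce cross-language results. Only changing what gets matched (here, expanding the query with translations) does.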
I do not have much else to add to @rolftimmermans's great answer. @imdoge, can I close the issue, or do you have further questions?
No further questions, thank you for the answer.
I noticed that MiniSearch has implemented BM25 in JavaScript, and I'm wondering why MiniSearch does not support cross-language searching. Recently, I have been using Python to debug RAG-related applications built with libraries such as LlamaIndex and LangChain. The BM25 searches in these libraries can perform cross-language searching.
However, I am looking for a JavaScript implementation of BM25 search. I found MiniSearch, which is an excellent library, but it doesn't support cross-language searching. Could you explain why this is the case?
P.S. For example: the data contains "bike", and I search for "vélo".
thanks~