Issue to track v5.0.0 beta feedback #142
Here is one case that might need tweaking before releasing. Here is a spec reproducing the issue:
it('ranks prefix matches higher than fuzzy matches with the same distance, all else being equal', () => {
  const ms = new MiniSearch({ fields: ['text'] })
  const documents = [
    { id: 1, text: 'unicorns' },
    { id: 2, text: 'unikorn' }
  ]
  ms.addAll(documents)
  expect(ms.documentCount).toEqual(documents.length)
  const results = ms.search('unicorn', { fuzzy: 0.2, prefix: true })
  expect(results.map(({ id }) => id)).toEqual([1, 2])
})
Also, this is mentioned by at least one user in this issue report (see the comment about order of results in the first message). /cc @rolftimmermans |
The issue above is addressed in #146 by slightly tuning the default weights (in particular, slightly lowering the weight for fuzzy matches to pass all existing test cases plus the new one), but would benefit from more testing with different collections of documents. |
The decision on the issue above was to keep the weights unchanged; the relevant discussion is in #146 |
5.x might be the right time to add an export map. Something like this?
https://nodejs.org/api/packages.html#package-entry-points If you want more info, or are interested but need help, let me know 🤗 I could prepare a PR 👍 |
@daKmoR that does sound very interesting. I think it should be relatively easy to do that for MiniSearch, as the public API is well defined. At the moment, the two things that are publicly exported are PRs would be very welcome, especially when the functionality doesn’t break existing use cases, which seems to be the case here (apart from those two entry points there is not much else that can be imported by users). |
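For readers unfamiliar with export maps, here is a minimal sketch of what the `exports` field in `package.json` could look like. It follows the Node.js package entry points documentation linked above; the file paths are hypothetical and are not taken from the MiniSearch repository, and only the main entry point is shown.

```json
{
  "name": "minisearch",
  "exports": {
    ".": {
      "import": "./dist/es/index.js",
      "require": "./dist/cjs/index.js"
    }
  }
}
```

With a map like this, only the paths listed under `exports` can be imported by consumers, which is why a well-defined public API makes the change easier.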
Is this project using semver? |
Hi @joyously , From version 3 to version 4 the breaking changes are only to the serialization format: if you don’t cache/load the index with |
Ah, of course, I didn't even see the change log. Perhaps you could reference it in the readme if that's the only place these important topics are mentioned. |
That makes sense, I will do so |
Done ✅ |
Version If all is well, I plan to release |
I haven't looked (not sure how), but I was wondering about memory usage. |
@joyously MiniSearch keeps the index in memory, so a huge index could in theory exhaust memory. I am not aware of a way to recover from an out-of-memory error from within a library or app. This is not specific to MiniSearch, but applies to any library. That said, a couple of considerations:
In sum, MiniSearch keeps data in memory, but doesn’t duplicate it, and tries to keep its own data structures compact. That said, if the original collection of documents doesn’t fit in memory, the index will likely not fit either. I routinely use MiniSearch in the browser with collections of tens of thousands of small documents, or thousands of bigger ones. Use cases that cannot fit in memory will necessarily need a search server. |
Wouldn't it work to bracket the addition to the index with a
So my code needs to have a separate function to load the documents and return the index, so that the documents go out of scope.
Yes, but it usually needs to be in scope for the life of a web page, so that part is not something to affect. |
Perhaps the index object should have a property of the count of the documents in the index. |
@joyously the number of documents in the index is available via the As for detecting if the app is about to run out of memory, there is a browser API for such a feature, but unfortunately it is non-standard, and browser support is still sketchy.
Unfortunately,
Not necessarily. Garbage collection should be able to do its job. The only problem would be if you assign the documents or the index to global variables in a single-page application. But otherwise, memory can be reclaimed as soon as your code does not reference it anymore, with or without a function. A function is often a good way to make the documents go out of scope after the function returns, so if I understand correctly you are right. |
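The point about scope can be sketched as follows. This is a minimal illustration, not code from MiniSearch: `buildIndex`, `loadDocuments` and `createIndex` are hypothetical names, and the only index method assumed is `addAll`, which appears in the spec earlier in this thread.

```javascript
// Hypothetical helper: builds an index inside a function so that the
// `documents` array is referenced only by a local variable.
// `loadDocuments` and `createIndex` are assumed callbacks supplied by the app.
function buildIndex (loadDocuments, createIndex) {
  const documents = loadDocuments() // local reference only
  const index = createIndex()
  index.addAll(documents)
  // Only the index is returned. Once this function returns, nothing in here
  // still references `documents`, so the garbage collector may reclaim the
  // array unless the index itself or other code keeps a reference to it.
  return index
}
```

With MiniSearch this might be called as `buildIndex(() => JSON.parse(rawJson), () => new MiniSearch({ fields: ['text'] }))`, where `rawJson` is a hypothetical string holding the serialized documents. The same effect is achieved without a function as long as no long-lived variable keeps pointing at the documents.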
Version |
You didn't make a git tag for this or a GitHub release. The 5.0.0 is referenced in the readme only, in one line for the |
Thanks @joyously , the release was on NPM but I forgot to push the git tag |
This issue is for collecting feedback and possible issues during the beta testing of MiniSearch v5.0.0, which includes an improved scoring algorithm.