How large can an index be? #76
Comments
Hi @yet-another-dev,

Search performance shouldn't be an issue even with extremely large indexes, as the algorithms used scale independently of the index size (the exception being fuzzy search, but even this shouldn't be a problem unless one is using large fuzziness factors). In short, you shouldn't be limited by search performance, as long as the index fits in memory.

Indexing performance depends on the number of documents and their size, so it can get slower with huge collections. Still, I find that re-indexing client side on page load is in most cases the right approach, and serializing/caching the index is reserved for corner cases. Even in challenging use cases where indexing takes 2 or 3 seconds (this would be the case only on huge collections of documents), I often solve this at the UI level: I perform indexing asynchronously with [...]

I routinely use MiniSearch in production applications that index tens of thousands of documents on the fly on page reload (e.g. products in a product search). Performance has never been an issue, and there is no noticeable lag compared to smaller use cases. These apps are often used on mobile browsers, including rather old smartphones, and we have never received complaints there either.

In sum, I'd say memory is the main limit, but even that limit is much further away than one would expect, thanks to the compact index data structure. I hope this provides the info you need.
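The asynchronous, UI-level approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not MiniSearch's implementation: `ToyIndex`, `indexInChunks`, `startIndexing`, and `searchState` are hypothetical names, and the chunk size of 200 is arbitrary. With MiniSearch itself, the `addAllAsync` method mentioned later in this thread would take the place of `indexInChunks`.

```javascript
// Hypothetical stand-in for a search index: anything with an add(doc) method.
class ToyIndex {
  constructor () { this.docs = new Map() }
  add (doc) { this.docs.set(doc.id, doc) }
}

// Give the event loop a chance to handle user input and rendering.
const yieldToEventLoop = () => new Promise(resolve => setTimeout(resolve, 0))

// Index documents in chunks, yielding between chunks so the page stays responsive.
async function indexInChunks (index, documents, chunkSize = 200) {
  for (let start = 0; start < documents.length; start += chunkSize) {
    for (const doc of documents.slice(start, start + chunkSize)) {
      index.add(doc)
    }
    await yieldToEventLoop()
  }
  return index
}

// UI-level state: the search box stays disabled until indexing has finished.
const searchState = { ready: false }

function startIndexing (index, documents) {
  searchState.ready = false
  return indexInChunks(index, documents).then(() => {
    searchState.ready = true
    return index
  })
}
```

A page would typically render everything first, call `startIndexing`, and enable the search input once `searchState.ready` becomes `true`.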
I will close this issue for now, but feel free to comment on it if you have more questions or doubts.
Hi, I am running a test with 5000 400-word documents and indexing with addAllAsync is taking over 10 seconds. Is that the type of performance you would expect indexing a collection of that size or does it suggest that I may be doing something wrong? Thanks.
Hi @dustfoxer, [...]
// this is slow, because it creates a new hash
// on each iteration:
const docById = documents.reduce((byId, document) => (
  { ...byId, [document.id]: document }
), {})
On that app it looked like indexing was very slow, but it was actually this loop that took most of the time. That said, consider that [...] Finally, in some applications, the specific way one designs the UI can make a difference too, even without making the indexing faster: sometimes in one-page apps one can render everything and just temporarily disable the search until the indexing is done, and often that is good enough for most users, who might not need to initiate a search immediately.
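For comparison, here is a minimal sketch of two ways to build the same lookup without copying the accumulator on every iteration. The spread version copies every existing key on each step, so its total cost grows quadratically with the number of documents, while both versions below do a constant amount of work per document. The sample `documents` array is hypothetical and only there to make the snippet runnable.

```javascript
// Hypothetical sample data, for illustration only.
const documents = [
  { id: 1, title: 'Moby Dick' },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance' }
]

// Option 1: mutate the accumulator in place instead of spreading it
// into a new object on every iteration.
const docById = documents.reduce((byId, document) => {
  byId[document.id] = document
  return byId
}, {})

// Option 2: build a Map from [id, document] pairs in a single pass.
const docMap = new Map(documents.map(document => [document.id, document]))
```

Either form keeps the lookup cheap to build, so any remaining slowness can be attributed to the indexing itself.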
Thank you for the detailed response. I've been using a hand-rolled search function until the index is built. But the actual data set I'm working with is even larger than the tests I've been running, so indexing is taking as much as 30 seconds. I guess my use case might just not be a good fit for MiniSearch. But I'll keep experimenting. Thanks again.
Any sense of how large an index's source-collection file size may be before performance becomes an issue?