-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search hangs when searching a repeated substring with wildcard #270
Comments
It seems that if the search includes a * at the beginning of the query, the search hangs.
|
On my machine that search eventually completes after 12 seconds, something is clearly not right though. The test data looks fairly pathological. Out of interest, how did you discover it? My guess is that the leading wildcard is causing many iterations through the graph that represents the token, I'll need to take a closer look with a debugger to see where its getting stuck though. |
I'm using wildcards to allow substring searches. If there are a lot of repeated items in the result, it will take longer and longer to perform the search. For now, I'm using my own pipeline function to expand all tokens into their substrings so the user never needs to use a wildcard character (I'm using lunr in the context of a loose auto complete search scenario). |
Searches with leading wildcards are considerably more expensive, that said, I wouldn't expect it to take this long. I'll have a look through the relevant code but its a bit involved and I wrote it over a year ago now. Perhaps there is a better way of implementing what you are trying to achieve though, can you give an example of the kind of documents you are searching within, and the kind of query you are trying? Is string in the document multiple tokens merged together into a string? If I understood what you're trying to achieve better I might be able to suggest a more performant way of implement it. |
I'm doing a search auto complete feature where it finds all documents which have a substring of the search term. e.g. the words
and the search term should get the results
because To begin with, I was able to achieve this using the wildcard method (i.e. search for I'm open to ideas as to what's a better way of doing it, but for now I have a working solution which is a pipeline function to expand all terms to their subsets. |
If you have a solution that is working then thats great. I'll still spend a bit of time trying to understand the particular case you originally posted, perhaps there is some optimisation or bug causing the long run time. |
Yes it seems that is quite a nasty bug that is preventing me from using wildcards. Thanks for your help! |
@meltuhamy how exactly did you do the workaround with the pipeline function? I'm afraid that expanding every keyword to subsets would make my index burst. I currently had to disable leading wildcards because of this issue (>20s with leading wildcard compared to <400ms without). |
I've pushed 2.3.3 which should resolve the long search times with some wildcard searches, let me know if it solves the issue, thanks. |
Confirmed this fixes it. Thanks @olivernn ! |
See the jsFiddle here: https://jsfiddle.net/bwe0f44g/
The text was updated successfully, but these errors were encountered: