Search hangs when searching a repeated substring with wildcard #270

meltuhamy · 2017-05-30T10:04:34Z

See the jsFiddle here: https://jsfiddle.net/bwe0f44g/

var idx = lunr(function () {
  this.ref('name')
  this.field('text')
  this.add({
    "name": "long",
    "text": "Zlah_testZlah_Zlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah_testZlah"
  });
})

window.search = function search() {
  var t0 = performance.now();
  var result = idx.search('*Zlah*');
  var t1 = performance.now();
  alert('Found ' + result.length + ' results in ' + (t1 - t0) + 'ms');
}

meltuhamy · 2017-05-30T10:24:41Z

It seems that if the search includes a * at the beginning of the query, the search hangs.

idx.search('*Zlah*') // hangs
idx.search('Zlah*') // ok

olivernn · 2017-05-30T16:53:38Z

On my machine that search eventually completes after 12 seconds, so something is clearly not right.

The test data looks fairly pathological. Out of interest, how did you discover it?

My guess is that the leading wildcard is causing many iterations through the graph that represents the token. I'll need to take a closer look with a debugger to see where it's getting stuck, though.

meltuhamy · 2017-05-30T16:59:44Z

I'm using wildcards to allow substring searches. If there are a lot of repeated items in the result, it will take longer and longer to perform the search. For now, I'm using my own pipeline function to expand all tokens into their substrings so the user never needs to use a wildcard character (I'm using lunr in the context of a loose auto complete search scenario).
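The pipeline code itself was not posted in this thread. As a rough sketch of the idea only, the helper below (a hypothetical name, `expandToSubstrings`) produces every distinct contiguous substring of a term. The commented-out wiring shows one way it might be attached to a lunr 2.x index pipeline; that wiring is an assumption, not the code used here.

```javascript
// Hypothetical helper: every contiguous substring of `term`, deduplicated.
function expandToSubstrings(term) {
  const seen = new Set();
  for (let start = 0; start < term.length; start++) {
    for (let end = start + 1; end <= term.length; end++) {
      seen.add(term.slice(start, end));
    }
  }
  return Array.from(seen);
}

// Possible lunr 2.x wiring (an assumption, not taken from this thread):
//
//   var substringExpander = function (token) {
//     // Emit one token per substring, keeping the original token's metadata.
//     return expandToSubstrings(token.toString()).map(function (s) {
//       return token.clone(function () { return s })
//     })
//   }
//   lunr.Pipeline.registerFunction(substringExpander, 'substringExpander')
//
//   var idx = lunr(function () {
//     this.ref('name')
//     this.field('text')
//     this.pipeline.add(substringExpander) // index-time pipeline only
//     this.add({ name: 'a', text: 'dice' })
//   })
```

With an index built this way, a plain query such as `idx.search('ic')` would be expected to match the document without any wildcard, at the cost of a larger index.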

olivernn · 2017-05-30T17:31:05Z

Searches with leading wildcards are considerably more expensive. That said, I wouldn't expect it to take this long. I'll have a look through the relevant code, but it's a bit involved and I wrote it over a year ago now.

Perhaps there is a better way of implementing what you are trying to achieve, though. Can you give an example of the kind of documents you are searching within, and the kind of query you are trying? Is the string in the document multiple tokens merged together into a single string? If I understood what you're trying to achieve better, I might be able to suggest a more performant way of implementing it.

meltuhamy · 2017-05-31T08:38:31Z

I'm doing a search auto complete feature where it finds all documents that contain the search term as a substring, e.g.

the words

hello world
blah
word
nice
dice

and the search term d

should get the results

hello world
word
dice

because d is a substring of each of the three results.
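The expected behaviour in this example can be written down without lunr at all. The sketch below is only a reference for what the results should be, using a plain `String.prototype.includes` filter; `substringMatches` is a hypothetical name and this is not how lunr evaluates a query.

```javascript
// The five example documents from the comment above.
const words = ["hello world", "blah", "word", "nice", "dice"];

// Hypothetical reference implementation: keep every document that
// contains `term` anywhere inside it.
function substringMatches(docs, term) {
  return docs.filter((doc) => doc.includes(term));
}

const matches = substringMatches(words, "d");
console.log(matches); // [ 'hello world', 'word', 'dice' ]
```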

To begin with, I was able to achieve this using the wildcard method (i.e. search for *d*) but found the issue above and performance problems.

I'm open to ideas as to what's a better way of doing it, but for now I have a working solution, which is a pipeline function to expand all terms into their substrings.

olivernn · 2017-06-03T12:35:14Z

If you have a solution that is working then that's great. I'll still spend a bit of time trying to understand the particular case you originally posted; perhaps there is some optimisation or bug causing the long run time.

meltuhamy · 2017-06-05T08:24:10Z

Yes, it seems that is quite a nasty bug that is preventing me from using wildcards. Thanks for your help!

nerumo · 2017-09-25T12:15:43Z

@meltuhamy how exactly did you do the workaround with the pipeline function? I'm afraid that expanding every keyword to subsets would make my index burst.

I currently had to disable leading wildcards because of this issue (>20s with leading wildcard compared to <400ms without).
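To put the index-size concern in rough numbers: a term of length n has at most n(n+1)/2 contiguous substrings, and fewer distinct ones when characters repeat. The sketch below counts them for a single term; the function names are hypothetical, and it says nothing about how lunr stores terms internally.

```javascript
// Upper bound on the number of contiguous substrings of a term of length n.
function maxSubstrings(n) {
  return (n * (n + 1)) / 2;
}

// Number of distinct contiguous substrings actually present in `term`.
function countDistinctSubstrings(term) {
  const seen = new Set();
  for (let start = 0; start < term.length; start++) {
    for (let end = start + 1; end <= term.length; end++) {
      seen.add(term.slice(start, end));
    }
  }
  return seen.size;
}

console.log(maxSubstrings("dice".length));    // 10
console.log(countDistinctSubstrings("dice")); // 10
console.log(countDistinctSubstrings("aaaa")); // 4
```

Growth is quadratic in term length, so long tokens dominate the cost; capping the term length or the substring length before expansion is one possible way to bound it.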

olivernn · 2018-09-03T19:52:37Z

I've pushed 2.3.3 which should resolve the long search times with some wildcard searches, let me know if it solves the issue, thanks.

meltuhamy · 2018-09-24T10:57:41Z

Confirmed this fixes it. Thanks @olivernn !

meltuhamy changed the title ~~Search hangs when searching a repeated string with wildcard~~ Search hangs when searching a repeated substring with wildcard May 30, 2017

olivernn mentioned this issue Aug 28, 2018

Leading wildcards on certain search terms cause 100% CPU, freezing browser #368

Closed

olivernn closed this as completed Sep 3, 2018

olivernn mentioned this issue Jan 21, 2020

Search is extremely slow #438

Closed
