Large search indexes can cause Lunr to freeze the page while it creates the index. #859

sashadt · 2016-03-03T21:03:24Z

When using readthedocs theme, search shows "no results" for a few seconds while search is being performed, and then results are loaded. However, few seconds are enough to convince the users nothing is found, and they move on.

For example see: http://docs.datatorrent.com Try searching for "test" and you will see the following for a while:

Perhaps an indicator of search in progress can be added?

waylan · 2016-03-03T21:09:49Z

Perhaps an indicator of search in progress can be added?

That seems reasonable to me. Presumably the problem occurs due to the large number of pages. On a smaller site, this would not be a problem as the wait time would be much less.

sashadt · 2016-03-03T23:56:01Z

We have 50 files with 18,936 lines of Markdown. I don't know if that's considered large, but even for smaller projects running on slower servers it would provide a better user experience. Thanks for looking into it!

waylan · 2016-03-04T00:47:05Z

It appears that the RTD theme includes the default content "Sorry, page not found." on the search results page (see search.html#L18). However, the search tool will also replace any content with <p>No results found</p> (see search.js#L63) which makes the initial filler redundant. It appears that RTD displays nothing. Should we do the same or display a message to wait for results?

d0ugal · 2016-03-04T08:45:35Z

This is quite interesting. I assumed the delay was the length of time it takes to download the search index (about 0.5mb). However, there is a long delay after it is downloaded. So is the delay while lunr.js is processing the index? Meaning the slowness is probably more dependant users machines than the server. Looking a the Lunr change log there are a few performance improvements in 0.5.8 and later (we ship 0.5.7). So I think there are two things we should do:

Add a loading indicator (this could just be static text saying "Searching...") and remove the incorrect text in ReadTheDocs's search.html
Update Lunr.js to 0.6.0

d0ugal · 2016-03-04T09:36:52Z

I just cloned and tested with the DataTorrent docs locally, which confirms the slowness is with the JavaScript as downloading is almost instantaneous.

Even worse, my browser (Chrome) is locked up by the JavaScript for a good few seconds, so I can't actually click anything. I done some very rough measurements to try and find what was making it so slow. My timings below are all rounded because I don't have confidence in my measuring.

The loading of the JSON into Lunr is the culprit. This for loop takes around 8(!!) seconds. More specifically it is the call to index.add(doc), so I timed that for each document (the project has 1022, as we add individual markdown documents and page headers individually to improve the search results). Most of them are very fast (less than 50ms) but two are really slow. The Application Developer Guide takes around 1 second consistently and the Release Notes take around 4.5 seconds consistently.

I don't have time to keep looking now, but we need try and determine what makes these so slow. Is it just because these are long pages? Is there something that causes an issue with Lunr? I tested with both Lunr 0.5.7 and 0.6.0 and got similar results. 0.6.0 might actually be a little slower, but I didn't do enough tests to be confident about that.

waylan · 2016-03-04T13:54:49Z

Even worse, my browser (Chrome) is locked up by the JavaScript for a good few seconds, so I can't actually click anything.

I noticed the same behavior when looking into this as well. Although I didn't big any deeper. The point is that I can confirm this behavior is not isolated.

Add a loading indicator (this could just be static text saying "Searching...") and remove the incorrect text in ReadTheDocs's search.html

I don't see any reason why we can't do this now. I'll push a PR when I get a few minutes unless someone beats me to it.

The search tool will replace the text with "No results found" if that is the case. No reason to display is here. Also, the default text displays when the search is running (especially if the search takes a long time), so it might as well be accurate and indicate that the search is running. Addresses #859.

The loops in Index.add can easily be called hundreds of thousands of times for large documents. Larger documents can perform particularly badly[1] under Chrome. Removing the usage of reduce and filter in favour of inline native for loops can lead to a 20-30% performance increase in Chrome. Firefox also sees a similar but much smaller impovement (~5%). [1]: mkdocs/mkdocs#859 (comment)

d0ugal · 2016-03-06T20:09:47Z

I investigated further. FireFox is much faster than Chrome. I managed to improve the performance under Chrome a bit. See: olivernn/lunr.js#208

The loops in Index.add can easily be called hundreds of thousands of times for large documents. Larger documents can perform particularly badly[1] under Chrome. Removing the usage of reduce and filter in favour of inline native for loops can lead to a 20-30% performance increase in Chrome. Firefox also sees a similar but much smaller impovement (~5%). [1]: mkdocs/mkdocs#859 (comment)

olivernn · 2016-03-08T18:50:03Z

I've pushed @d0ugal's changes in 0.7.0 of lunr, so there should hopefully be some improvements from that change.

Without knowing anything of how mkdocs makes use of lunr, I have a few suggestions that could improve the performance and user experience further:

Pre-build the index and then load this pre-built index
Move searching/building over to a webworker

As an example, the example lunr app makes use loading a pre-built index. I think a member of the Angular.js team wrote up how they improved performance using web workers too.

See mkdocs#859

d0ugal · 2016-03-09T08:32:15Z

Our usage of Lunr is very simple. I think all the relevant code is less than 100 lines. It is a testament to Lunr that it has gotten us this far before we hit an issue like this.

Thanks for the pointers, it sound like those would be both valuable additions. I'll look into building the index, I think that would be the biggest win.

d0ugal · 2016-03-09T09:03:31Z

The main issue for us, is that we are Python based, pre-building the index would be tricky. We would need to shell out to node and call a file. I would be okay with that if it gracefully fell back to the current approach when node isn't available. However, this sort of functionality might be better suited for a plugin (so another candidate for #206).

That maybe means we should look into web workers for now and revisit this later.

pcdinh · 2016-03-16T04:57:56Z

I got the same issue. I have several files only. Every time I visit my mkdocs-based web document, the page is frozen. Firefox warning dialog appeared several times, that allowed me to stop lunr.js

I found that search_index.json response was about 1.2MB. It is fast to download but quite slow to parse in Javascript

waylan · 2016-03-16T15:08:52Z

The main issue for us, is that we are Python based, pre-building the index would be tricky. We would need to shell out to node and call a file.

I had wondered if there was a way to pre-build the index in Python. This is what I found:

The index is just JSON, so technically it should be possible to pre-build the index and serve that rather than the JSON file we serve today. However, there is no published spec for the index; although it is versioned. The index stores the version of lunr.js which was used to create it and an error is raised if a different version of lunr.js attempts to load the index. This suggests that backward incompatible changes between versions are possible/expected. However, as the lunr.js lib is shipped with MkDocs we have full control over which version is being used. The tricky part is the lack of a spec. Someone would have to go through the JS code and re-implement the index builder in Python and transitioning from one version to the next has the potential to require a similar amount of work. I suppose if a third party created such a lib, it might make sense to use it (or via a Plugin if/when MkDocs adds support), but until then, calling to Node is the more sensible approach. But even that feels more like something for a Plugin.

In the end, using webworkers to avoid a browser freeze is probably the most sensible short-term solution.

rickpeters · 2016-03-25T20:05:17Z

Maybe I don't understand it, but search_index.json is already generated when you do mkdocs build. Problem is not in the generating or downloading, the problem is parsing this download by lunr.js. I am not a real expert on this, just looked at the webworker example, but I have no idea about all the javascript dependencies that are necessary, Just seems that python is not the issue here?
keep up the good work guys, I love the product!

regards,
Rick

waylan · 2016-03-25T20:42:10Z

@rickpeters the search_index.json file generated by MkDocs is not a lunr.js search index. It is simply a JSON file which contains all of the content of every page in a project. Every time you do a search, that file is downloaded by the browser, and the list of JSON objects are stepped through (one page at a time) with each one being passed to lunr.js to build the index which lunr.js then uses to run the search. And with each page load, the entire index has to be rebuilt. At least that is the way it works now.

The ideal solution would be to have search_index.json be an actual prebuilt lunr.js index. However, as a workaround, a webworker allows the browser to parse and build the index in a separate process from the one which is displaying the page. On page reloads, that existing separate process would continue to be used and the index would not need to be rebuilt. Even if we had a proper prebuilt index, the webworker would eliminate the need to reload the index on each page load, so using a webworker is a good idea regardless of how we generate an index. It also avoids browser freezes on index loading/building. Hope that clears things up.

rickpeters · 2016-03-25T20:55:43Z

Thanks very much for this clarification!
Do you have pointers on where I could find how to prebuild this index? I am currently performing the full continuous integration / continuous delivery of my mkdocs site (> 250 pages) using Docker, including extra steps like inclusion of a dot-language plugin. It generates the site and then packages it into a nginx container for presentation. Works great and is very transferable.
I would be willing to look into integrating the necessary parts for pre-building the index which then could be used by lunr.js I guess?
Even then, the webworker would be a good idea I think. Would the webworker need server side javascript also? If needed I could package that in the container also.
And of course all of this could be added to mkdocs as opensource code for showing how to host your results using docker containers or eventually transerring the static site to a great other hosting solution.

waylan · 2016-03-25T21:55:04Z

The lunr.js library includes an example script which could be adapted and run under Node.js to build the index. You would need to call out to Node in your build processes (or call JavaScript from Python some way). That would give you a prebuilt index. However, the current lunr.js calling code in MkDocs would need to be updated to accept a prebuilt index (at aprox search.js#L32) similar to this.

That code in search.js would need to be modified for a webworker as well. The code would need to be placed in a webworker with a fallback for when a webworker is not available (in some older browsers).

waylan · 2016-03-28T23:46:24Z

FYI, I have succeeded in getting the search index to build here: 3009a3a. It was relatively easy using the PyExecJS lib. Still not usable though as search.js has not yet been modified.

You can follow my progress here: master...waylan:search

waylan · 2016-03-29T15:07:42Z

So I thought I had a working solution for pre-building the index (here: master...waylan:search), but I'm getting weird errors I can't make sense of on Windows. It works fine on Linux and I don't have access to a Mac for testing right now. If anyone wants to help, please test (my branch) on your local machine and report back.

facelessuser · 2016-03-29T15:52:29Z

The require path must use forward slashes:

        lunr_path = os.path.join(
            os.path.dirname(os.path.abspath(__file__)),
            'assets/search/mkdocs/js/lunr.min.js'
        ).replace('\\', '/')

waylan · 2016-03-29T17:28:39Z

Thanks @facelessuser. That solved one problem.

I have Node installed on all my machines and it is working now. However, as I understand it, PyExecJS does not need any dependencies installed in Windows or OS X as it can use JScript or Apple JavaScriptCore, each of which are installed by default on their respective systems (Linux will always need an additional dependency (one of Node.js, PyV8, Spidermoney, PhantomJS, ...), but that seems like an acceptable compromise).

When I force the use of JScript, I'm getting an error I can't make sense of (TypeError: Object expected with no indication of where in the JS the error is being raised) and I don't have access to a Mac, so any additional feedback would be appreciated.

YoungElPaso · 2016-09-30T21:03:55Z

Maybe I don't know enough about the details but could the index, once created be saved into localstorage and then accessed from any subsequent page load without requiring it to be built again and again?

waylan · 2016-10-01T00:11:19Z

@YoungElPaso, that feature would need to be added to the library we use (olivernn/lunr.js), and if I recall correctly, such a feature request was made in the past and denied for good reason. I don't recall the details now, but perhaps a search of that project would answer your question better.

olivernn · 2016-10-03T17:29:43Z

@YoungElPaso @waylan You can achieve this without any modifications to the lunr. There is good support for serialising an index to JSON and then loading an index from JSON, this will be significantly quicker than regenerating the index.

Perhaps something like this:

if (existsInLocalStorage()) {
  var idx = lunr.Index.load(getJSONFromLocalStorage())
} else {
  var idx = buildIndexFromScratch()
  storeIndexInLocalStorage(JSON.stringify(idx))
}

waylan · 2016-10-03T17:52:55Z

Before we jump down the "store the index in LocalStorage" rabbit hole, we need to address the creation of the index in the first place. Even using LocalStorage, the first visit to a site would be painfully slow and unresponsive for the user (on a large site). You have lost your user on their first visit and they probably won't be coming back. Until we have that problem fixed, there is no point of exploring how to store the index across multiple visits. And if we do fix that problem, then maybe we don't need to worry about storing the index across visits. My point is, even if LocalStorage is possible, it doesn't solve the problem. it just addresses one of the symptoms.

As an aside, my current roadmap for this issue is as follows:

Pre-build the index (with a client-side fallback for those who don't have node installed)
Finish the Plugin API (see Plugin API. #206)
Refactor search to be a plugin
Wrap client built index in a webworker
Explore other options for storing index client-side longterm

Note that I only have item 1 completed. I started item 4 and realized that the above would be a better approach, so I stopped and moved my attention to the Plugin API. The great thing about using a Plugin is that others can create their own search which starts with a different set of assumptions (or different backend tools) to better meet there needs.

olivernn · 2016-10-03T18:17:29Z

@waylan completely agree that pre-building is the best option here. Was just saying that there is nothing in lunr preventing a serialised index being stored client side. The tradeoff made with pre-building the index and serving it to the client is that, even gzipped, it can be quite large. That said, if it was me, I'd still opt for serving a pre-built index.

If there is anything I can help with from lunr then please do let me know, I'm more than happy to help.

waylan · 2016-10-03T18:53:44Z

If there is anything I can help with from lunr then please do let me know, I'm more than happy to help.

What would be ideal is a Python library that builds the index. Requiring our users to have both Python and Node installed is a little much (our Python script calls out to a Node script to build the index). As it happens, many users have both installed anyway, but I'm not comfortable making it a hard requirement and having the fallback results in two very different user experiences. Of course, I don't expect you to go build a Python clone of your JavaScript lib, but a publicly published spec of your index file format could go a long way toward someone else creating a Python lib (or Ruby, or ... whatever other language static site generators are written in).

olivernn · 2016-10-03T20:25:44Z

This is similar to an approach I am hoping to move towards with a new version of lunr, documenting (JSON Schema?) the serialisation format would open up the possibility of using the JavaScript version as a client, with the actual indexing etc being done in any other language that can generate the serialised index. Python would probably be an excellent choice for this given the number of NLP and IR libraries available.

My Python is non-existent, so I'm unlikely to be able to produce anything production ready myself, but by documenting the format others could certainly produce an implementation that would be easier to integrate into MkDocs (or other tools).

This builds an actual lunr.js search index via Node.js in addition to a JSON array of page objects previously used to build the index client side. Note that both are needed as the array of page objects is used to display search results. If Node fails (not installed or errors out), then fallback to old behavior by creating an empty index. If the client (browser) receives an empty index, then it builds the index from the array of pages as previously, which is fine for most (small) sites. Large sites will want to make sure Node is installed to avoid the browser from hanging during index creation. Partially addresses mkdocs#859.

squidfunk · 2017-10-22T19:28:48Z

Just a note: This may be a perfect case for requestIdleCallback - processing in the background without freezing the UI. I will think about including this into Material. Reasonable polyfills available.

Fixes mkdocs#1218, fixes mkdocs#1127, and partially addresses mkdocs#859.

waylan · 2018-02-28T16:56:59Z

FYI to everyone following this: A refactor of search is available for review in #1418 which addresses all of the concerns raised here and more.

* Use a web worker in the browser with a fallback (fixes #859 & closes #1396). * Optionally pre-build search index (fixes #859 & closes #1061). * Upgrade to lunr.js 2.x (fixes #1319). * Support search in languages other than English (fixes #826). * Allow the user to define the word separators (fixes #867). * Only run searches for queries of length > 2 (fixes #1127). * Remove dependency on require.js, mustache, etc. (fixes #1218). * Compress the search index (fixes #1128).

sashadt changed the title ~~Search shows "No results" for a few seconds before results are loaded~~ Search shows "Sorry, page not found" for a few seconds before results are loaded Mar 3, 2016

d0ugal added the Bug label Mar 4, 2016

waylan mentioned this issue Mar 4, 2016

Update default text when search is running. #860

Merged

d0ugal mentioned this issue Mar 4, 2016

Improve the performance of Index.add. olivernn/lunr.js#208

Closed

d0ugal added a commit to d0ugal/mkdocs that referenced this issue Mar 9, 2016

Upgrade to Lunr.js 0.7

f5e7e14

See mkdocs#859

d0ugal mentioned this issue Mar 9, 2016

Upgrade to Lunr.js 0.7 #861

Merged

d0ugal changed the title ~~Search shows "Sorry, page not found" for a few seconds before results are loaded~~ Large search indexes can cause Lunr to freeze the page while it creates the index. Mar 9, 2016

waylan mentioned this issue Oct 3, 2016

Optimize search by building index locally. #1061

Closed

waylan self-assigned this Dec 2, 2016

waylan mentioned this issue Dec 13, 2016

Remove dependency lunr.js #1109

Closed

This was referenced Jan 20, 2017

Run search only for query having 3 and more characters #1127

Closed

Compress search_index.json #1128

Closed

fskreuz mentioned this issue Apr 3, 2017

Copy over examples from examples.ractivejs.org ractivejs/ractivejs.github.io#6

Closed

fskreuz mentioned this issue Jun 5, 2017

Add back the mkdocs search ractivejs/ractivejs.github.io#67

Closed

waylan added the Plugins label Nov 1, 2017

ihnorton mentioned this issue Dec 27, 2017

Better search results JuliaDocs/Documenter.jl#560

Merged

waylan mentioned this issue Jan 16, 2018

Defer scripts #1380

Merged

yeraydiazdiaz mentioned this issue Jan 27, 2018

Use web worker to build Lunr index #1396

Closed

waylan added a commit to waylan/mkdocs that referenced this issue Jan 31, 2018

Implement fallback for no worker support

2c544d6

Fixes mkdocs#1218, fixes mkdocs#1127, and partially addresses mkdocs#859.

waylan added a commit to waylan/mkdocs that referenced this issue Feb 27, 2018

Implement fallback for no worker support

f47a1b7

Fixes mkdocs#1218, fixes mkdocs#1127, and partially addresses mkdocs#859.

waylan mentioned this issue Feb 27, 2018

Refactor search plugin #1418

Merged

waylan closed this as completed in #1418 Mar 6, 2018

ptbrowne mentioned this issue Jun 5, 2018

Parsing of the search index causes page to freeze at load cozy/cozy.github.io#80

Closed

yeraydiazdiaz mentioned this issue Aug 25, 2018

Use lunr.py for prebuilding indices #1607

Merged

jcuenod mentioned this issue Apr 9, 2020

Lunr search performance wowthemesnet/mediumish-theme-jekyll#127

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large search indexes can cause Lunr to freeze the page while it creates the index. #859

Large search indexes can cause Lunr to freeze the page while it creates the index. #859

sashadt commented Mar 3, 2016

waylan commented Mar 3, 2016

sashadt commented Mar 3, 2016

waylan commented Mar 4, 2016

d0ugal commented Mar 4, 2016

d0ugal commented Mar 4, 2016

waylan commented Mar 4, 2016

d0ugal commented Mar 6, 2016

olivernn commented Mar 8, 2016

d0ugal commented Mar 9, 2016

d0ugal commented Mar 9, 2016

pcdinh commented Mar 16, 2016

waylan commented Mar 16, 2016

rickpeters commented Mar 25, 2016

waylan commented Mar 25, 2016

rickpeters commented Mar 25, 2016

waylan commented Mar 25, 2016

waylan commented Mar 28, 2016

waylan commented Mar 29, 2016

facelessuser commented Mar 29, 2016

waylan commented Mar 29, 2016

YoungElPaso commented Sep 30, 2016

waylan commented Oct 1, 2016

olivernn commented Oct 3, 2016

waylan commented Oct 3, 2016 •

edited

olivernn commented Oct 3, 2016

waylan commented Oct 3, 2016

olivernn commented Oct 3, 2016

squidfunk commented Oct 22, 2017

waylan commented Feb 28, 2018

Large search indexes can cause Lunr to freeze the page while it creates the index. #859

Large search indexes can cause Lunr to freeze the page while it creates the index. #859

Comments

sashadt commented Mar 3, 2016

waylan commented Mar 3, 2016

sashadt commented Mar 3, 2016

waylan commented Mar 4, 2016

d0ugal commented Mar 4, 2016

d0ugal commented Mar 4, 2016

waylan commented Mar 4, 2016

d0ugal commented Mar 6, 2016

olivernn commented Mar 8, 2016

d0ugal commented Mar 9, 2016

d0ugal commented Mar 9, 2016

pcdinh commented Mar 16, 2016

waylan commented Mar 16, 2016

rickpeters commented Mar 25, 2016

waylan commented Mar 25, 2016

rickpeters commented Mar 25, 2016

waylan commented Mar 25, 2016

waylan commented Mar 28, 2016

waylan commented Mar 29, 2016

facelessuser commented Mar 29, 2016

waylan commented Mar 29, 2016

YoungElPaso commented Sep 30, 2016

waylan commented Oct 1, 2016

olivernn commented Oct 3, 2016

waylan commented Oct 3, 2016 • edited

olivernn commented Oct 3, 2016

waylan commented Oct 3, 2016

olivernn commented Oct 3, 2016

squidfunk commented Oct 22, 2017

waylan commented Feb 28, 2018

waylan commented Oct 3, 2016 •

edited