New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to lunr 2.x #1319

Closed
waylan opened this Issue Oct 23, 2017 · 13 comments

Comments

Projects
4 participants
@waylan
Member

waylan commented Oct 23, 2017

At some point recently, lunr 2.0 was released. A summary of changes are here: https://lunrjs.com/guides/upgrading.html

More importantly, there is a page on Pre-Building Indexes which includes an example script. Unfortunately, I still don't see a spec (of the index format) so that a Python lib could reliably build the index.

In any event, when search is refactored, we should use lunr 2.x.

@waylan waylan added the Enhancement label Oct 23, 2017

@waylan waylan added this to the 1.0.0 milestone Oct 23, 2017

@waylan waylan changed the title from Upgrade to lung 2.x to Upgrade to lunr 2.x Oct 23, 2017

@olivernn

This comment has been minimized.

olivernn commented Oct 31, 2017

I still don't see a spec (of the index format) so that a Python lib could reliably build the index.

There is a JSON schema describing the format of a serialised index. The repository is marked as WIP since I really want to get some feedback from implementors before committing to it. That said it is pretty stable, the only open questions I have are on how to version the schema.

I've also started working on a Rust library for generating an index, it might prove useful in getting started with a Python implementation. Of course, if anyone is interested on taking on that work (lunr.py) then I'm more than happy to answer questions and offer guidance, I'm not a Python programmer though so its unlikely I'll be able to work on an implementation.

@squidfunk

This comment has been minimized.

squidfunk commented Oct 31, 2017

When you implement the pre-building - please, please, please make it optional. Material builds the index in the browser in a very specific way that is incompatible with a pre-built index.

@waylan

This comment has been minimized.

Member

waylan commented Oct 31, 2017

@squidfunk when I played with this some time ago, my proof of concept pre-built the index but also provided the current JSON file. Your theme could ignore the pre-build index although given the huge improvement it makes, I can't imagine why you would want to. Seems like it would be worth it to make you theme work with the pre-built index. Of course, any such change will come with a complete refactor of the JavaScript provided by the plugin, so perhaps you will no longer even feel the need to provide your own at that point (that is one of the goals I have with the refactor).

@squidfunk

This comment has been minimized.

squidfunk commented Oct 31, 2017

Well, I do some post-processing on the search_index.json, cleaning some stuff up. Maybe this could be done afterward with a pre-built index, but one would have to check.

@waylan waylan added the Plugin label Nov 1, 2017

@waylan waylan added this to To Do in Refactor search. Jan 31, 2018

@waylan waylan moved this from To Do to In Progress in Refactor search. Feb 1, 2018

@waylan

This comment has been minimized.

Member

waylan commented Feb 1, 2018

I'm not so sure we want to upgrade to Lunr 2.x. Being able to "boost" a search term is nice, but we need to always have the title field boosted, regardless of what search terms are used. As described in olivernn/lunr.js#312, because we cannot "boost" the title field in 2.x, the search results are quite different (less relevant) than with 1.x.

waylan added a commit to waylan/mkdocs that referenced this issue Feb 1, 2018

@waylan waylan moved this from In Progress to To Do in Refactor search. Feb 1, 2018

@squidfunk

This comment has been minimized.

squidfunk commented Feb 1, 2018

Material already uses lunr 2.x with very good results. You can take a look at the setup here: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/components/Material/Search/Result.jsx#L172

@waylan

This comment has been minimized.

Member

waylan commented Feb 1, 2018

@squidfunk my point is that version 2 ignores the { boost: 10} on L189. In any event, if you aren't noticing a problem with the results returned, that's good to hear. Thanks.

Btw, how are you passing user settings from Python to Lunr (such as lang, ect)? Are you rendering then as JS variables in the template or some other way? I was thinking of including plugin specific settings in the JSON file. By way of explanation, the JSON currently is structured like this:

{ 'docs': [...] }

So I was thinking of adding a second key, config, like this:

{ 'docs': [...], 'config': { 'key1': 'value1', 'key2': 'value2'} }

Of course, you could ignore config and continue to use your current method. Any thoughts.

@squidfunk

This comment has been minimized.

squidfunk commented Feb 1, 2018

I'm passing them as a configuration in the base.html, see this code section

EDIT: ... and for everything locale/translation related, it's mostly meta tags, see, which are then read like this

EDIT2: the reason why I'm doing this is that the translations for Material are done within the templates, so it's the only way to go. Some values (like extra.search.tokenizer) can be set within mkdocs.yml, but the translations provide default values which can be overridden, see language.html

@yeraydiazdiaz

This comment has been minimized.

Contributor

yeraydiazdiaz commented Feb 1, 2018

@waylan lunr.js 2.x moved the boost to the search according to upgrading docs

so theoretically in the search call we could do something like idx.search('title:' + query + '^10') to match the original search

@waylan

This comment has been minimized.

Member

waylan commented Feb 1, 2018

so theoretically in the search call we could do something like idx.search('title:' + query + '^10') to match the original search

AFAICT, that makes things worse. If query = 'foo', then only results with "foo" in the title are returned. Any items with "foo" only in a second field (body in this case) are not included in the results at all. But if query = 'foo bar', then the same behavior happens for "bar" (results with "bar" only in the body are excluded) but "foo" seems to get normal treatment in that I get results containing "foo" in the title and/or the body with results containing "bar" in the title being boosted (listed first). At least that's my impression based on my observation of a few unscientific tests.

waylan added a commit to waylan/mkdocs that referenced this issue Feb 1, 2018

@waylan waylan moved this from To Do to In Progress in Refactor search. Feb 1, 2018

@yeraydiazdiaz

This comment has been minimized.

Contributor

yeraydiazdiaz commented Feb 3, 2018

I've been doing some research on how to best fit our use case for boosting only the title, and looks like the Lunr 2.x API makes it slightly harder. I made an attempt using Lunr 2.x lunr.query rather than search and compared the results between the two versions yielding similar but not quite the same results:

Lunr 1.0.0 - searching for: "plugins"
[0.4260642498048509]: /user-guide/plugins/
Title: Plugins
[0.4257030685803262]: /user-guide/configuration/#plugins
Title: plugins
[0.3801081750025572]: /user-guide/plugins/#using-plugins
Title: Using Plugins

Lunr 2.x - searching for: "plugins"
[1.4141055325761154]: /user-guide/plugins/#mkdocs-plugins
Title: MkDocs Plugins
[1.2579877975842753]: /user-guide/configuration/#plugins
Title: plugins
[1.1677531694473782]: /user-guide/plugins/
Title: Plugins

I'm not sure if hacking the user's query string is the best approach though. Maybe I'm missing something in the Lunr API that will allows to simply boost the title in 2.x as we did previously.

@olivernn @squidfunk @waylan any thoughts?

@squidfunk

This comment has been minimized.

squidfunk commented Feb 3, 2018

@yeraydiazdiaz I can tell a little about my experience with lunr 2.x:

Material for MkDocs has been using lunr 2.x for 10 months now and the feedback from the user base is really good. In the source I still configure the title to be boosted and the search results are really good for most of the cases, don't really know if it has any effect. Integration with lunr-languages is also quite good and search now works in all 20 languages (with some minor limitations).

@waylan

This comment has been minimized.

Member

waylan commented Feb 3, 2018

I still configure the title to be boosted and the search results are really good for most of the cases, don't really know if it has any effect.

Based on my own testing and according to the docs, it has no effect. However, it certainly did in version 1.x.

In #931 elasticlunr is referenced, which is a fork of lunr. According to their docs, "boosting" is done at query time, but by field, not by term, which makes more sense that lunr2.x to me. elasticlunr also appears to support lunr-languages.

Given the above, and the added customizability, I'm wondering if we should switch to elasticlunr. To be clear, I have no experience with it myself. It just seems like a better fit from a read of the docs.

waylan added a commit to waylan/mkdocs that referenced this issue Feb 27, 2018

@waylan waylan closed this in #1418 Mar 6, 2018

Refactor search. automation moved this from In Progress to Done Mar 6, 2018

waylan added a commit that referenced this issue Mar 6, 2018

Refactor search plugin (#1418)
* Use a web worker in the browser with a fallback (fixes #859 & closes #1396).
* Optionally pre-build search index (fixes #859 & closes #1061).
* Upgrade to lunr.js 2.x (fixes #1319).
* Support search in languages other than English (fixes #826).
* Allow the user to define the word separators (fixes #867).
* Only run searches for queries of length > 2 (fixes #1127).
* Remove dependency on require.js, mustache, etc. (fixes #1218).
* Compress the search index (fixes #1128).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment