Rewrite Lunr search module #192

Closed
daveaglick opened this issue Jun 23, 2021 · 3 comments
Labels
General Enhancement New feature or request

Comments

@daveaglick
Member

daveaglick commented Jun 23, 2021

There's a lot of potential to improve the Lunr module for client-side searching. Specifically, the lunr-core project makes building search indexes at generation time practical and can save both payload size and index build time on the client.

Here are some random thoughts:

  • We need to support two types of fields: those for searching (like content), and those needed to display a result that don't need to be searched (like the link).
  • When Lunr returns results, it only returns a reference, which you use to get whatever other data you need. So we'll need something on the client that returns the field data for a result, looked up by its Lunr ref.
    • The index ref should be something stable and unique for a document. So maybe a sorted index (it must be sorted for determinism if we use an index!) or the cache code.
    • For always-needed fields we can create a lookup table in the client JS. The smaller the better, though (so no content data?).
    • For fields we only need after getting results, we could consider storing them in a separate JSON file named after the index ref and deserializing them lazily as results come back. That would keep the core client index file much smaller. When reading them on the client, we could fetch them in parallel and cache them (in local storage?!) in case another search returns the same result. We should also be able to turn off lazy reference field files if we don't want that feature or if no fields are marked as lazy.
  • We need the ability to define fields that appear only in the lookup and not in the index (like link), and to specify whether they're immediate (in the core client file) or lazy (in a separate client file).
  • The index could either be a separate (compressed?) file or live directly in a client JS file. Either way, we should also generate a client JS file that bootstraps everything, though if the index isn't in the bootstrap file, we could consider making the bootstrapper a theme concern (with an example in the docs).
  • The module should be able to go from input documents to one (or more!) index items per input document.
  • At a module level, we should be able to define the fields (including the ref) and distinguish fields for search, immediate reference fields, and lazy reference fields.
    • Consider an IDictionary<string, bool?> for field definitions, where the value indicates whether a field is lazy (true) or immediate (false), or only searchable (neither immediate nor lazy) if null. An enum might be clearer than bool?.
  • Default module settings should produce an index item for each document with a unique ref, title (based on GetTitle()), link, and content. Title and link should be immediate; content should be lazy (or maybe just not part of the reference item?).
  • Otherwise, a Config should be able to specify a mapping from the document to one or more (or none) index items. Each item should be a dictionary or maybe IMetadata.
    • When converting the index item to a Lunr Document for indexing, the keys should be converted to camelCase as per JS convention.
    • A complication is that Lunr accepts either a string or an array as a field value, so we'll need to check the value type and handle it correctly so that array values get properly indexed.
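A minimal sketch of the mapping described in the list above, written in TypeScript for illustration (the module itself is .NET). It assumes a hypothetical `FieldDefinition` shape with a `searchable` flag plus a `storage` of `"none"`, `"immediate"`, or `"lazy"` (one way to realize the enum idea instead of `bool?`), and assigns numeric refs after an ordinal sort on a stable key. It shows camelCasing keys, splitting fields between the Lunr documents and the immediate and lazy lookups, and normalizing values to a string or string array. None of these names come from Statiq or Lunr.

```typescript
// Hypothetical types and names for illustration; not Statiq's or Lunr's API.
type Storage = "none" | "immediate" | "lazy";

interface FieldDefinition {
  searchable: boolean; // passed to Lunr for indexing
  storage: Storage;    // "immediate" = core client lookup, "lazy" = per-ref file
}

type FieldValue = string | string[];
type IndexItem = Record<string, unknown>;

interface IndexData {
  documents: Record<string, FieldValue>[];               // Lunr documents (ref + searchable fields)
  immediate: Record<string, Record<string, FieldValue>>; // ref -> immediate fields
  lazy: Record<string, Record<string, FieldValue>>;      // ref -> lazy fields
}

// Assumes PascalCase metadata keys ("Title", "ContentType"); only the first letter changes.
function toCamelCase(key: string): string {
  return key.charAt(0).toLowerCase() + key.slice(1);
}

// Lunr accepts a string or an array; coerce anything else to a string and drop missing values.
function normalizeValue(value: unknown): FieldValue | undefined {
  if (value === null || value === undefined) return undefined;
  if (Array.isArray(value)) {
    return value.filter((v) => v !== null && v !== undefined).map((v) => String(v));
  }
  return String(value);
}

function buildIndexData(
  items: IndexItem[],
  fields: Record<string, FieldDefinition>,
  sortKey: string
): IndexData {
  // Ordinal (non-locale) comparison so the numeric refs are identical on every build.
  const keyOf = (item: IndexItem) => String(item[sortKey] ?? "");
  const sorted = [...items].sort((a, b) => (keyOf(a) < keyOf(b) ? -1 : keyOf(a) > keyOf(b) ? 1 : 0));

  const data: IndexData = { documents: [], immediate: {}, lazy: {} };
  sorted.forEach((item, i) => {
    const ref = String(i);
    const doc: Record<string, FieldValue> = { ref };
    for (const [key, def] of Object.entries(fields)) {
      const value = normalizeValue(item[key]);
      if (value === undefined) continue;
      const name = toCamelCase(key);
      if (def.searchable) doc[name] = value;
      if (def.storage !== "none") {
        const bucket = def.storage === "immediate" ? data.immediate : data.lazy;
        const entry = bucket[ref] ?? (bucket[ref] = {});
        entry[name] = value;
      }
    }
    data.documents.push(doc);
  });
  return data;
}

// Example: Link is lookup-only, Content is search-only, Tags is a searchable lazy array.
const data = buildIndexData(
  [
    { Link: "/b", Title: "Beta", Content: "second", Tags: ["x", "y"] },
    { Link: "/a", Title: "Alpha", Content: "first" },
  ],
  {
    Title: { searchable: true, storage: "immediate" },
    Link: { searchable: false, storage: "immediate" },
    Content: { searchable: true, storage: "none" },
    Tags: { searchable: true, storage: "lazy" },
  },
  "Link"
);
// data.documents[0] → { ref: "0", title: "Alpha", content: "first" }
// data.immediate["0"] → { title: "Alpha", link: "/a" }
```

Sorting on a unique key such as the link is what makes an index-based ref deterministic; if the key can repeat, ties fall back to input order (`Array.prototype.sort` is stable), which is only deterministic if the input order is.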
@daveaglick
Member Author

daveaglick commented Jun 24, 2021

Check out https://github.com/alanta/memoirs-theme/blob/main/Modules/LunrIndexer.cs from @alanta for some ideas. I especially like the HTML stripping regex for content.

Edit: lol, just realized that's in the existing module too; I might even "promote" it to a common utility extension since it's useful elsewhere too. Regardless, the linked LunrIndexer module has many other good ideas to borrow.
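The regex from the linked module isn't reproduced in this thread. As a rough sketch of the same idea (a hypothetical `stripHtml` helper, not the regex either module actually uses), content could be reduced to plain text before indexing like this:

```typescript
// Hypothetical helper (not the regex from Statiq or LunrIndexer.cs).
// This is a regex heuristic, not an HTML parser: fine for typical generated pages,
// but unusual markup (e.g. ">" inside an attribute value) can defeat it.
function stripHtml(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies entirely
    .replace(/<[^>]+>/g, " ")                                 // replace remaining tags with spaces
    .replace(/&nbsp;/gi, " ")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/gi, "&")                                  // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/\s+/g, " ")                                     // collapse whitespace
    .trim();
}

const text = stripHtml("<h1>Search</h1><p>Tom &amp; Jerry<br>go <em>fast</em></p>");
// text → "Search Tom & Jerry go fast"
```

Tags are replaced with spaces rather than removed so that adjacent words such as `Jerry<br>go` don't run together in the index.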

@daveaglick
Member Author

Some thoughts on index vs. bootstrapper file:

  • Compress the search index JSON with gzip and store it in a separate .gz file.
  • Output a bootstrapper JS file that has the bits to conduct a search and that lazily loads, decompresses, and initializes the compressed index, but only when a search is performed (i.e., if no search is performed, there's no performance hit from initializing the search index).
  • The bootstrapper should create a global object that includes functions needed to do everything transparently. The name of the object should be definable in case multiple search indexes are used.
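Pulling these bootstrapper ideas together with the lazy reference-field files from the opening post, a minimal TypeScript sketch might look like the following. Every name here (`bootstrapSearch`, `BootstrapOptions`, `loadIndex`, `loadLazy`) is hypothetical, and the loaders are injected rather than implemented so the lazy-initialization and caching logic stands on its own. It caches lazy fields in memory rather than in local storage.

```typescript
// Minimal shape needed from a loaded index; lunr.Index#search returns results with ref and score.
interface SearchIndex {
  search(query: string): { ref: string; score: number }[];
}

type Fields = Record<string, unknown>;

// Hypothetical options; not an existing Statiq or Lunr API.
interface BootstrapOptions {
  globalName: string;                         // global object name, so several indexes can coexist
  loadIndex: () => Promise<SearchIndex>;      // e.g. fetch the .gz file, decompress, lunr.Index.load(...)
  immediate: Record<string, Fields>;          // ref -> immediate fields embedded in the bootstrapper
  loadLazy: (ref: string) => Promise<Fields>; // e.g. fetch a per-ref JSON file
}

function bootstrapSearch(options: BootstrapOptions) {
  let indexPromise: Promise<SearchIndex> | undefined;   // nothing is loaded until the first search
  const lazyCache = new Map<string, Promise<Fields>>(); // in-memory cache of per-ref lazy fields

  function getIndex(): Promise<SearchIndex> {
    if (!indexPromise) {
      // Forget a failed load so a later search can retry.
      indexPromise = options.loadIndex().catch((err) => {
        indexPromise = undefined;
        throw err;
      });
    }
    return indexPromise;
  }

  function getLazy(ref: string): Promise<Fields> {
    let fields = lazyCache.get(ref);
    if (!fields) {
      fields = options.loadLazy(ref).catch((err) => {
        lazyCache.delete(ref);
        throw err;
      });
      lazyCache.set(ref, fields);
    }
    return fields;
  }

  const api = {
    async search(query: string): Promise<Fields[]> {
      const index = await getIndex();
      // Load lazy fields for every result in parallel; repeat results come from the cache.
      return Promise.all(
        index.search(query).map(async (result) => ({
          ref: result.ref,
          score: result.score,
          ...options.immediate[result.ref],
          ...(await getLazy(result.ref)),
        }))
      );
    },
  };

  (globalThis as unknown as Record<string, unknown>)[options.globalName] = api;
  return api;
}
```

In a browser, `loadIndex` might `fetch` the .gz file, decompress it with `DecompressionStream("gzip")`, parse the JSON, and pass it to `lunr.Index.load`, while `loadLazy` would fetch the per-ref JSON file. Caching the promise (not just the result) means concurrent searches share one in-flight request.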

@daveaglick
Member Author

It's done! Really happy with how this turned out. It'll go out with a new release soon and I'll add comprehensive documentation to the docs site (and probably a search box 😄). I also need to put the finishing touches on how this flows through to Statiq Web.
