Feature request: support for case-sensitive and case-insensitive search #331

giuliac89 · 2018-02-16T15:11:22Z

Hi Oliver,
do you plan to add this feature?

hoelzro · 2018-03-06T21:08:42Z

@giuliac89 FWIW, you could add this feature in current lunr.js by tweaking the pipeline; I believe the forced lowercasing currently happens in the tokenizer.

olivernn · 2018-03-07T19:50:22Z

@hoelzro is right: the current downcasing happens inside lunr.tokenizer. Unfortunately, this means you would need to re-implement the tokenizer just to change that one part.

Do you have a specific use case in mind? How does the current behaviour fall short?
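To show what re-implementing the tokenizer involves, here is a rough sketch of a case-preserving variant. This is not lunr's code. The name caseSensitiveTokenizer is hypothetical, the separator /[\s\-]+/ is assumed to match lunr's default (lunr.tokenizer.separator), and plain { str, metadata } objects stand in for lunr.Token instances. In a real index you would wrap each entry in new lunr.Token(...) and, assuming your lunr 2.x version exposes it, assign the function to the builder's tokenizer.

```javascript
// Hypothetical case-preserving tokenizer, modelled loosely on lunr.tokenizer.
// Assumption: split on whitespace and hyphens like lunr's default separator,
// but do NOT downcase. Plain objects stand in for lunr.Token instances.
const SEPARATOR = /[\s\-]+/;

function caseSensitiveTokenizer(text, metadata = {}) {
  const str = String(text);
  const tokens = [];
  let sliceStart = 0;

  // Run one step past the end so the last token is flushed as well.
  for (let i = 0; i <= str.length; i++) {
    if (i === str.length || SEPARATOR.test(str.charAt(i))) {
      const sliceLength = i - sliceStart;
      if (sliceLength > 0) {
        tokens.push({
          str: str.slice(sliceStart, i), // original case is kept
          metadata: Object.assign({}, metadata, {
            position: [sliceStart, sliceLength], // [start, length] in the source
            index: tokens.length,
          }),
        });
      }
      sliceStart = i + 1;
    }
  }
  return tokens;
}
```

With this, "In the Beginning" yields the tokens "In", "the" and "Beginning", so "In" and "in" would be indexed as different terms.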

giuliac89 · 2018-03-08T09:08:30Z

I'm implementing a search engine for a research project related to philological editions. http://evt.labcd.unipi.it/

It's important to add this functionality to allow more detailed analysis in the philological studies that will be carried out on these editions.

olivernn · 2018-03-09T09:31:30Z

So, in your case, a term, say "FOO", has a different meaning than the downcased term "foo"?

As well as lunr.tokenizer, the query parser also downcases terms. This only affects lunr.Index#search, not lunr.Index#query:

```javascript
// won't work: search() runs the query parser, which downcases "FOO" to "foo"
idx.search("FOO")

// will work: terms passed to q.term() are not downcased
idx.query(function (q) {
  q.term("FOO")
})
```
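A small sketch of the query() route for multi-word input, assuming the index itself was built without downcasing (otherwise "FOO" has nothing to match). addCaseSensitiveTerms is a hypothetical helper. usePipeline: false is a lunr.Query clause option that also skips the search pipeline (the stemmer by default); that only makes sense if the indexed terms were not stemmed either, so whether to set it depends on how the index was built.

```javascript
// Hypothetical helper: add each whitespace-separated word as a query term
// exactly as typed. `q` is assumed to be the lunr.Query passed to idx.query().
function addCaseSensitiveTerms(q, text) {
  // Split on whitespace only; unlike idx.search(), nothing is downcased.
  const terms = text.split(/\s+/).filter(Boolean);
  for (const term of terms) {
    // usePipeline: false also bypasses the search pipeline for this term.
    q.term(term, { usePipeline: false });
  }
  return q;
}

// Usage with a real index:
// idx.query(function (q) { addCaseSensitiveTerms(q, "FOO Bar") })
```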

giuliac89 · 2018-03-09T14:13:45Z

Yes, the difference between a term "FOO" and a term "foo" could be fundamental for some research studies, and this is why I would like to include this feature in my search engine. So the only option I have is to re-implement the tokenizer.

Do you think that this feature could be interesting for lunr.js?

indolering · 2019-07-20T06:15:04Z

> Do you think that this feature could be interesting for lunr.js?

To be honest, it seems pretty niche. It wouldn't be hard to implement as an all-or-nothing feature of the index (just add it as a config option), but how would you support query-time case sensitivity without blowing up the index size? I think it's important to remember that Lunr is primarily for static websites, where size is a big deal.
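A toy illustration of the size concern: if case variants are indexed as separate terms, the term dictionary grows with every distinct spelling. vocabularySize is hypothetical and uses plain whitespace splitting; a real lunr index also stores per-term postings and metadata, so this only hints at the effect.

```javascript
// Count distinct terms with and without downcasing (toy model, not lunr).
function vocabularySize(text, caseSensitive) {
  const terms = text
    .split(/\s+/)
    .filter(Boolean)
    .map((t) => (caseSensitive ? t : t.toLowerCase()));
  return new Set(terms).size;
}

const sample = "The Rose and the rose THE end";
console.log(vocabularySize(sample, false)); // 4: the, rose, and, end
console.log(vocabularySize(sample, true)); // 7: every spelling is its own term
```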

giuliac89 · 2019-07-20T09:28:27Z

Well, I tried to develop the feature in my web app, and the index size is not a big problem in this case!
In a document of about 1460 words, the index size (including two types of metadata) is about 121 KB without the case-sensitive feature and about 158 KB with it.
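For scale, the two reported sizes correspond to an increase of roughly 31%:

$$
\frac{158 - 121}{121} = \frac{37}{121} \approx 0.31
$$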
Indexing is done only in "lowercase mode". To handle case sensitivity, I simply wrote a custom tokenizer in which I create a lunr token like this:

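```javascript
// token: the downcased text that gets indexed
// originalToken: the same token as it appears in the source text
new lunr.Token(token, {
  position: [startIndex, tokenLength],
  index: tokens.length,
  originalToken: originalToken
});
```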

This way, the original token is stored as metadata:

```
0: lunr.Token {
  str: "in",
  metadata: {
    index: 0,
    originalToken: "In",
    position: [0, 2]
  }
}
```
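A self-contained sketch of the same idea, for trying it outside lunr: each token is downcased for indexing and its original spelling goes into metadata.originalToken. tokenizeWithOriginal is a hypothetical name, the split pattern assumes lunr's default separator, and plain objects stand in for lunr.Token. For lunr to keep originalToken in the index, the key also has to be listed in the builder's metadataWhitelist (for example by pushing "originalToken" onto this.metadataWhitelist in the lunr config function).

```javascript
// Hypothetical tokenizer: downcased `str` for indexing, original spelling kept
// in metadata. Assumption: tokens are runs of non-whitespace, non-hyphen chars.
function tokenizeWithOriginal(text) {
  const tokens = [];
  for (const match of String(text).matchAll(/[^\s\-]+/g)) {
    const originalToken = match[0];
    tokens.push({
      str: originalToken.toLowerCase(), // what gets indexed
      metadata: {
        position: [match.index, originalToken.length],
        index: tokens.length,
        originalToken, // what the text actually said
      },
    });
  }
  return tokens;
}
```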

This makes it simple to check case sensitivity without increasing the index size considerably.
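One way the check could look at query time, as a sketch: run the usual case-insensitive search, then keep only results whose stored originalToken values contain the exact spelling. filterByExactCase is hypothetical; it assumes originalToken was whitelisted, so lunr exposes it in each result's matchData.metadata as term → field → { originalToken: [...] }. Scores still come from the case-insensitive match; only the result set is narrowed.

```javascript
// Hypothetical post-filter over lunr search results.
function filterByExactCase(results, queryTerm) {
  return results.filter((result) => {
    const byTerm = result.matchData.metadata; // { term: { field: { key: [...] } } }
    return Object.values(byTerm).some((byField) =>
      Object.values(byField).some((meta) =>
        // Keep the document if any matched token had exactly this spelling.
        (meta.originalToken || []).includes(queryTerm)
      )
    );
  });
}

// Usage with a real index: filterByExactCase(idx.search("FOO"), "FOO")
```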

indolering · 2019-07-20T14:44:49Z

Submit a patch!

olivernn · 2019-07-24T19:40:48Z

@giuliac89 @indolering This seems like a good candidate for a plugin; if so, we could add it to the new list of plugins on the wiki and the website. If someone does the work to package this up, I'm more than happy to feature it.
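A rough sketch of how this might be packaged as a plugin, assuming lunr 2.x, where a builder exposes tokenizer and metadataWhitelist and lunr.tokenizer records a [start, length] position for each token of a string field. Rather than re-implementing the tokenizer, the plugin wraps it and recovers each token's original spelling from that position. makeOriginalTokenPlugin is a hypothetical name. One caveat: positions are computed on the downcased string, so characters whose lowercase form has a different length (rare, but they exist in Unicode) would shift the slices.

```javascript
// Hypothetical plugin factory; `baseTokenizer` would normally be lunr.tokenizer.
function makeOriginalTokenPlugin(baseTokenizer) {
  return function originalTokenPlugin(builder) {
    // Only whitelisted metadata keys are stored in the index.
    builder.metadataWhitelist.push("originalToken");

    builder.tokenizer = function (obj, metadata) {
      const tokens = baseTokenizer(obj, metadata);
      // Arrays (and null) carry no position info, so leave them untouched.
      if (obj == null || Array.isArray(obj)) return tokens;

      const source = String(obj);
      for (const token of tokens) {
        const position = token.metadata && token.metadata.position;
        if (!position) continue;
        const [start, length] = position;
        // Recover the original spelling from the untouched source string.
        token.metadata.originalToken = source.slice(start, start + length);
      }
      return tokens;
    };
  };
}

// Usage (lunr 2.x):
// const idx = lunr(function () {
//   this.use(makeOriginalTokenPlugin(lunr.tokenizer));
//   this.ref("id");
//   this.field("body");
//   docs.forEach((doc) => this.add(doc));
// });
```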

This was referenced Jan 28, 2021:

Search is not intuitive matcornic/hugo-theme-learn#353 (open)

invalid search results matcornic/hugo-theme-learn#340 (open)
