highlight search keywords #97

arussel · 2014-07-08T04:19:51Z

I have a requirement that search keywords be highlighted in the documents that are found. Is there a way to do this with lunrjs at the moment?

olivernn · 2014-07-14T17:25:23Z

There are a couple of issues requesting this feature. It involves quite a few changes to the way lunr works. I made a start on an implementation here but I haven't got round to completing it. I'll try and spend some more time with it this week and see if I can get something out for people to try.

Qvatra · 2014-08-21T10:58:09Z

Has this issue been solved already? I couldn't find a way to do that with lunrjs...

olivernn · 2014-08-25T19:20:48Z

Sorry, still don't have a decent answer to this. It involves quite a lot of change to lunr and I'm not entirely convinced with the current implementation I put together (linked in the comment). I need to spend some more time thinking about how best to implement highlighting without sacrificing either index size or performance, and that takes some time!

jonathanhudak · 2015-07-09T18:49:59Z

If it helps anyone I achieved this functionality by using BlastJS http://julian.com/research/blast/

shobhitg · 2015-10-01T17:03:58Z

@olivernn I understand you were trying to get the right balance between index size and performance. Would it be possible for you to describe the approach you were trying or planning in branch next?

shobhitg · 2015-10-01T17:08:17Z

@olivernn Is it possible to get to know which stem word actually matched?

If yes, then I can easily use that information with the BlastJS library mentioned above by @hudakdidit

olivernn · 2015-10-05T19:44:47Z

What I was trying to do was to wrap the token in a lunr.Token which would keep track of any extra metadata about the token that was picked up in the pipeline. One such piece of metadata could have been the position of the token in the original text.

It involved vast changes to the existing way lunr works, and in the end I think I decided that rather than try and retrofit this kind of feature into the existing architecture a bigger rethink was required. The problem with that is getting the time to really work through what a different architecture would look like, time I just haven't had :(

As for getting the stem that matched, you would basically have to re-implement parts of what the search function is currently doing:

1. Run the pipeline on the search terms; this gives you the stemmed tokens.
2. Find the documents that contain each stemmed token using idx.tokenStore.get(stem).

There is no easier or more efficient way of doing this with lunr's current setup.
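
The two steps above could be sketched roughly as follows. This assumes a lunr version before 2.0, where the index exposes `pipeline.run` and `tokenStore.get` as described in this comment. `docsByStem` is a made-up helper name, and the index and tokenizer are passed in as parameters so the snippet does not depend on how lunr is loaded.

```javascript
// Hypothetical helper: maps each stemmed query token to the refs of the
// documents stored under it. Assumes idx.tokenStore.get(stem) returns an
// object keyed by document ref, as in lunr versions before 2.0.
function docsByStem(idx, tokenize, query) {
  // 1. Run the pipeline on the search terms to get the stemmed tokens.
  var stems = idx.pipeline.run(tokenize(query));

  // 2. Look up the documents that contain each stem.
  var matches = {};
  stems.forEach(function (stem) {
    matches[stem] = Object.keys(idx.tokenStore.get(stem) || {});
  });
  return matches;
}
```

With a real index this would be called as `docsByStem(idx, lunr.tokenizer, 'searched')`. It only tells you which stems hit which documents, not where in the text they occur.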

julkue · 2016-02-03T11:11:14Z

I would like to implement this with a highlighting component. However, first of all we need to make sure that the highlighted words and the matches found by lunr are exactly the same, for the sake of usability. Therefore I created #200.

drallgood · 2016-02-11T11:18:24Z

@julmot
I did it this way:

// Tokenize the raw query (no stemming) and highlight each token in the page.
var queryTokens = lunr.tokenizer(request.term);
$.each(queryTokens, function(index, token) {
  pageContentElement.jmHighlight(token, {"className": "lunr-match-highlight"});
});

Seemed "good enough" to me. I could have also used the full Lunr pipeline to get the stemmed words, but then the highlights would look weird (e.g. you search for 'Persistence' and it highlights 'persist')

julkue · 2016-02-11T13:08:52Z

@drallgood Thanks for letting me know.

I don't think it would be weird, rather it would be consistent. Imagine a situation where a user searches for "Searched". Lunr will find files containing "searching". But a highlighting component will highlight nothing (different than expected), as there is not the exact term "searched" inside.

Do you know a way to get all found words, also "searching" in this example?

d0ugal · 2016-03-10T09:50:01Z

This looks like a duplicate of #25

julkue · 2016-03-10T09:56:40Z

@d0ugal There are a couple of issues that are similar here

nknapp · 2016-07-18T07:14:48Z

@olivernn I'm not sure I understood your comment correctly, so the following may be the same thing you said. I think the Lucene way of highlighting terms is:

1. Pass the query through the pipeline to get the stemmed search terms.
2. Pass the found document through the pipeline (again) and match its words against the stemmed search terms. While doing this, keep track of the offsets of the matched tokens and use those to highlight terms and to extract the snippet of text that matches best.

I have no deep experience with lunr so far, but it seems to me that this approach would not require large refactorings of the code. Or I may be completely mistaken.
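
A minimal sketch of that re-analysis idea, independent of lunr's internals: `findMatchOffsets` is a hypothetical helper, and `stem` stands in for whatever per-token processing the index applies (with lunr this would presumably be its pipeline). Splitting on `\w+` is a simplification that only suits ASCII text.

```javascript
// Hypothetical helper: re-analyses the document text and returns the
// [start, length] offsets of every word whose stem is in `queryStems`.
function findMatchOffsets(text, queryStems, stem) {
  var wanted = {};
  queryStems.forEach(function (s) { wanted[s] = true; });

  var offsets = [];
  var wordPattern = /\w+/g; // exec() reports each word's index in the text
  var match;
  while ((match = wordPattern.exec(text)) !== null) {
    // Stem the document word the same way the query was stemmed.
    if (wanted[stem(match[0].toLowerCase())] === true) {
      offsets.push([match.index, match[0].length]);
    }
  }
  return offsets;
}
```

The returned offsets refer to the original, unstemmed text, so a search for "Searched" can still highlight "searching". The cost is that every displayed document is analysed again at query time.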

Bahar1978 · 2016-12-08T14:51:12Z

Hello @drallgood ,
Could you please let me know how you could highlight the search terms?

drallgood · 2016-12-15T16:49:55Z

@hajarghaem
Sure.

The basic idea is as follows:

1. Get all documents that match
2. Reduce the set to the number of results you'd like to show
3. Get the content for those matching documents
4. Use mark.js (formerly known as jmHighlight) to highlight the keywords in those documents
5. Clean up the documents so that you'll only show a small portion of highlighted text
6. Append the resulting content to your search results
7. Repeat 4-6 for all documents in your result

Some code (this is actually embedded in a jQuery autocomplete definition):

      // Stem the query with the index's own pipeline.
      var queryTokens = lunrIndex.pipeline.run(lunr.tokenizer(request.term));
      // Take the top 10 results and look up their stored documents by ref.
      var resultSet = _.chain(lunrIndex.search(request.term)).take(10).pluck('ref').map(function(ref) {
        return lunrData.docs[ref];
      }).value();

      // Fetch and process the pages one after another by chaining promises.
      resultSet.reduce(function(sequence, item) {
        return sequence.then(function() {
          return $.get(item.url);
        }).then(function(data) {
            item.excerpt = '';
            var pageContent = $.parseHTML(data);
            var pageContentElement = $(pageContent).filter(".doc-body");

            $.each(queryTokens, function(index, token) {
              pageContentElement.jmHighlight(token, {"className": "lunr-match-highlight"});
            });

            // Build the excerpt from up to four highlights, with ten words of context on each side.
            pageContentElement.find(".lunr-match-highlight").slice(0, 4).each(function(index, element) {
              // The siblings may be missing or may not be text nodes.
              var previousNode = (element.previousSibling && element.previousSibling.nodeValue) || '';
              var nextNode = (element.nextSibling && element.nextSibling.nodeValue) || '';
              var wordsBefore = _.escape(previousNode.split(' ').slice(-10).join(' '));
              var wordsAfter = _.escape(nextNode.split(' ').slice(0, 10).join(' '));

              if (nextNode.endsWith(" ")) {
                wordsBefore += " ";
              }

              item.excerpt += '<p class="lunr-match-highlight_result">' + wordsBefore + element.outerHTML + wordsAfter + "</p>";
            });
        });
      }, Promise.resolve()); // the snippet as posted stopped before this line; the initial value is an assumption

Probably not the nicest code, but it works ;)

olivernn · 2017-04-10T20:23:54Z

The latest version of Lunr does provide support for highlighting matches in documents. There is a demo showing this in action.

To be clear, Lunr does not provide the actual highlighting, but it is now able to return the positions of keywords that did match. This should enable the use of other libraries to perform the highlighting of terms in a page.

Please try it out and let me know any feedback.
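
As a rough illustration of what the returned positions make possible: the sketch below wraps each matched range of one field in `<mark>` tags. It assumes the lunr 2.x result shape, where positions sit under `result.matchData.metadata[term][field].position` as `[start, length]` pairs. `highlightField` is a hypothetical helper, and it neither escapes HTML nor handles overlapping ranges.

```javascript
// Hypothetical helper: wraps every matched range of one field in <mark> tags.
// `result` is one entry of the array returned by idx.search().
function highlightField(text, result, fieldName) {
  var metadata = result.matchData.metadata;
  var ranges = [];

  // Collect the positions recorded for this field across all matched terms.
  Object.keys(metadata).forEach(function (term) {
    var field = metadata[term][fieldName];
    if (field && field.position) {
      ranges = ranges.concat(field.position); // each entry is [start, length]
    }
  });

  // Work from the end of the text so that earlier offsets stay valid.
  ranges.sort(function (a, b) { return b[0] - a[0]; });

  return ranges.reduce(function (out, range) {
    var start = range[0];
    var end = range[0] + range[1];
    return out.slice(0, start) + '<mark>' + out.slice(start, end) + '</mark>' + out.slice(end);
  }, text);
}
```

`text` has to be the exact string that was indexed for that field, otherwise the offsets will not line up.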

JinxMan25 · 2017-04-15T20:01:22Z

I can't seem to get the position of the terms returned from the result in the metaData attribute. How can I get the position?

olivernn · 2017-04-18T17:36:30Z

@clanofnoobs please open a new issue showing what you've tried and I'll take a look.

I'm closing this issue now as there is support for highlighting terms with lunr. If there are problems with getting highlighting to work, they should be considered bugs and a new issue should be opened.

edave · 2018-11-26T00:42:23Z

For anyone who comes across this, in reference to @clanofnoobs's question, the position must be whitelisted in the metadata when the index is constructed (within the passed-in function), like so:

this.metadataWhitelist = ['position']

From the bottom of https://lunrjs.com/guides/core_concepts.html
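
For context, a sketch of where that line goes when building an index with lunr 2.x. The field names `id` and `body` and the helper name `buildIndexWithPositions` are made up for this example, and `lunr` is passed in as a parameter so the snippet does not assume how the library is loaded.

```javascript
// Hypothetical helper: builds an index that records match positions.
function buildIndexWithPositions(lunr, docs) {
  return lunr(function () {
    this.ref('id');
    this.field('body');

    // Without this, search results carry no position metadata.
    this.metadataWhitelist = ['position'];

    docs.forEach(function (doc) {
      this.add(doc);
    }, this);
  });
}
```

After a search on such an index, the positions should be available under `result.matchData.metadata[term][field].position` for each result.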

manuadappt · 2021-06-22T05:27:04Z

@drallgood Regarding the code in your comment of 2016-12-15 above: what is "request.term", which is passed to the tokenizer?

olivernn mentioned this issue Jun 16, 2015

Get surrounding text for search. #158

Closed

Windyo mentioned this issue Jul 17, 2016

Include hit snippet in result jamwise/ghostHunter#15

Closed

olivernn closed this as completed Apr 18, 2017

This was referenced Nov 30, 2018

how to get the positions of the term matches weixsong/elasticlunr.js#96

Open

Indicating which field was matched on during search weixsong/elasticlunr.js#91

Open
