Search problems #46

egmontkob · 2018-11-30T11:28:09Z

The Search box seems to be pretty unreliable to me. Sometimes it shows false positives, sometimes actual results are missing.

Example: On your site, type "searchsearch_enabled" to the search box.

As long as you type "searchsearch_en", there are matches. As you keep typing "abl", no more matches. When you type the trailing "ed", the matches come back.

Part of the story is that plenty of space characters are missing from search-data.json, e.g. whenever there's an explicit newline in markdown (to keep a sensible width of the source file), the two words become concatenated. That's how lines 14-15 of docs/configuration.md become "searchsearch_enabled", that's why searching for "abcdef" (present in ui-components/typography.md) doesn't give any match, etc.

Another part of the story is that the search engine must be buggy as well, for example not returning the match for the substring "searchsearch_ena". Another example: utilities/color.md contains the word "systemsized", this correctly appears in search_data.json, yet not found via the search box. It's quite easy to find other examples too.

pmarsceill · 2018-11-30T16:23:07Z

Yes, you're right there is some search weirdness. Honestly I was hoping to have someone who is more proficient in JavaScript / lunr.js help out here. I think the search engine itself works fine (I've tested it on other sites), but the way that the search index is getting generated and initialized is probably causing the majority of the problems.

kendallmiller · 2018-12-17T21:12:34Z

One quick fix I was able to do when it was erroring out on my content was to convert crlf, cr, lf, and tab to a single space:

    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | normalize_whitespace | replace: "\r\n", " " | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | replace: '\', ' ' }}",

The odd thing is that reading the documentation this seems to be what normalize_whitespace should do.

That improved the quality of matches, but it's still got issues - I'll take a peek at lunr next.

kendallmiller · 2018-12-17T22:02:02Z

I ended up somewhere interesting:

{  
  {% assign docId = 1 %}{% for page in site.html_pages %}{% unless page.search == "exclude" %}"{{ docId }}": {
    "id": "{{ docId }}",
    "title": "{{ page.title | xml_escape }}",
    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | normalize_whitespace |  replace: '\', ' ' }}",
    "url": "{{ page.url | absolute_url | xml_escape }}",
    "relUrl": "{{ page.url | xml_escape }}"
  }{% if forloop.last %}{% else %},
  {% endif %}{% assign docId = docId | plus: 1 %}{% endunless %}{% endfor %}
  {% for collection in site.collections %}{% unless collection.output == false %}{% for page in collection.docs %}{% unless page.search == "exclude" %}{% if forloop.first %},{% else %}{% endif %}"{{ docId }}": {
    "id": "{{ docId }}",
    "title": "{{ page.title | xml_escape }}",
    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | normalize_whitespace |  replace: '\', ' ' }}",
    "url": "{{ page.url | absolute_url | xml_escape }}",
    "relUrl": "{{ page.url | xml_escape }}"
  }{% if forloop.last %}{% else %},
  {% endif %}{% assign docId = docId | plus: 1 %}{% endunless %}{% endfor %}{% endunless %}{% endfor %}
}

In this case I made changes to:

Include content from collections that are output. I use them a lot and they were otherwise not being included.
I reordered the filtering so I swap in a space for CR, LF, and Tab then let normalize whitespace get rid of any consecutive whitespace.
I changed the document Id methodology so it could handle going through the additional collections
I couldn't see in the documentation what standard to use for excluding from documentation so I went with what I was already doing in my template (setting search = exclude on pages to skip). Season to taste.

pmarsceill · 2019-01-06T20:55:25Z

@kendallmiller thanks for your digging in here... I will take a look at this soon.

washere · 2019-01-12T16:57:12Z

Thanks for this theme. While we are waiting for some help on this issue, can we have an option for google search opening a new tab as in this theme:

https://phlow.github.io/feeling-responsive/search/

Would make this fantastic theme be deployed by many as there would be no more serious issues.

Thanks again.

pmarsceill · 2019-01-15T16:03:59Z

I think i have solved the search bugs in #52 ... I will open an issue to use collections as a separate work stream to not hold up v0.2.2 from being released.

pmarsceill added the help wanted label Nov 30, 2018

pmarsceill mentioned this issue Dec 7, 2018

[Help]How to improve search result ? #49

Closed

pmarsceill pinned this issue Dec 16, 2018

pmarsceill mentioned this issue Jan 7, 2019

v0.2.2 release branch #52

Merged

pmarsceill removed the help wanted label Jan 15, 2019

pmarsceill closed this as completed Jan 15, 2019

pmarsceill unpinned this issue Jan 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search problems #46

Search problems #46

egmontkob commented Nov 30, 2018 •

edited

pmarsceill commented Nov 30, 2018

kendallmiller commented Dec 17, 2018 •

edited

kendallmiller commented Dec 17, 2018

pmarsceill commented Jan 6, 2019

washere commented Jan 12, 2019

pmarsceill commented Jan 15, 2019 •

edited

Search problems #46

Search problems #46

Comments

egmontkob commented Nov 30, 2018 • edited

pmarsceill commented Nov 30, 2018

kendallmiller commented Dec 17, 2018 • edited

kendallmiller commented Dec 17, 2018

pmarsceill commented Jan 6, 2019

washere commented Jan 12, 2019

pmarsceill commented Jan 15, 2019 • edited

egmontkob commented Nov 30, 2018 •

edited

kendallmiller commented Dec 17, 2018 •

edited

pmarsceill commented Jan 15, 2019 •

edited