Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search problems #46

Closed
egmontkob opened this issue Nov 30, 2018 · 6 comments
Closed

Search problems #46

egmontkob opened this issue Nov 30, 2018 · 6 comments

Comments

@egmontkob
Copy link

egmontkob commented Nov 30, 2018

The Search box seems to be pretty unreliable to me. Sometimes it shows false positives, sometimes actual results are missing.

Example: On your site, type "searchsearch_enabled" to the search box.

As long as you type "searchsearch_en", there are matches. As you keep typing "abl", no more matches. When you type the trailing "ed", the matches come back.

Part of the story is that plenty of space characters are missing from search-data.json, e.g. whenever there's an explicit newline in markdown (to keep a sensible width of the source file), the two words become concatenated. That's how lines 14-15 of docs/configuration.md become "searchsearch_enabled", that's why searching for "abcdef" (present in ui-components/typography.md) doesn't give any match, etc.

Another part of the story is that the search engine must be buggy as well, for example not returning the match for the substring "searchsearch_ena". Another example: utilities/color.md contains the word "systemsized", this correctly appears in search_data.json, yet not found via the search box. It's quite easy to find other examples too.

@pmarsceill
Copy link
Collaborator

Yes, you're right there is some search weirdness. Honestly I was hoping to have someone who is more proficient in JavaScript / lunr.js help out here. I think the search engine itself works fine (I've tested it on other sites), but the way that the search index is getting generated and initialized is probably causing the majority of the problems.

@kendallmiller
Copy link

kendallmiller commented Dec 17, 2018

One quick fix I was able to do when it was erroring out on my content was to convert crlf, cr, lf, and tab to a single space:

    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | normalize_whitespace | replace: "\r\n", " " | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | replace: '\', ' ' }}",

The odd thing is that reading the documentation this seems to be what normalize_whitespace should do.

That improved the quality of matches, but it's still got issues - I'll take a peek at lunr next.

@kendallmiller
Copy link

I ended up somewhere interesting:

{  
  {% assign docId = 1 %}{% for page in site.html_pages %}{% unless page.search == "exclude" %}"{{ docId }}": {
    "id": "{{ docId }}",
    "title": "{{ page.title | xml_escape }}",
    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | normalize_whitespace |  replace: '\', ' ' }}",
    "url": "{{ page.url | absolute_url | xml_escape }}",
    "relUrl": "{{ page.url | xml_escape }}"
  }{% if forloop.last %}{% else %},
  {% endif %}{% assign docId = docId | plus: 1 %}{% endunless %}{% endfor %}
  {% for collection in site.collections %}{% unless collection.output == false %}{% for page in collection.docs %}{% unless page.search == "exclude" %}{% if forloop.first %},{% else %}{% endif %}"{{ docId }}": {
    "id": "{{ docId }}",
    "title": "{{ page.title | xml_escape }}",
    "content": "{{ page.content | markdownify | strip_html | xml_escape | remove: 'Table of contents' | replace: "\r", " " | replace: "\n", " "  | replace: "\t", " " | normalize_whitespace |  replace: '\', ' ' }}",
    "url": "{{ page.url | absolute_url | xml_escape }}",
    "relUrl": "{{ page.url | xml_escape }}"
  }{% if forloop.last %}{% else %},
  {% endif %}{% assign docId = docId | plus: 1 %}{% endunless %}{% endfor %}{% endunless %}{% endfor %}
}

In this case I made changes to:

  1. Include content from collections that are output. I use them a lot and they were otherwise not being included.
  2. I reordered the filtering so I swap in a space for CR, LF, and Tab then let normalize whitespace get rid of any consecutive whitespace.
  3. I changed the document Id methodology so it could handle going through the additional collections
  4. I couldn't see in the documentation what standard to use for excluding from documentation so I went with what I was already doing in my template (setting search = exclude on pages to skip). Season to taste.

@pmarsceill
Copy link
Collaborator

@kendallmiller thanks for your digging in here... I will take a look at this soon.

@washere
Copy link

washere commented Jan 12, 2019

Thanks for this theme. While we are waiting for some help on this issue, can we have an option for google search opening a new tab as in this theme:

https://phlow.github.io/feeling-responsive/search/

Would make this fantastic theme be deployed by many as there would be no more serious issues.

Thanks again.

@pmarsceill
Copy link
Collaborator

pmarsceill commented Jan 15, 2019

I think i have solved the search bugs in #52 ... I will open an issue to use collections as a separate work stream to not hold up v0.2.2 from being released.

@pmarsceill pmarsceill unpinned this issue Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants