New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lunr-gr.js and fix lunr-en.js #1445

Merged
merged 3 commits into from Jan 3, 2018

Conversation

Projects
None yet
2 participants
@nickgarlis
Contributor

nickgarlis commented Jan 3, 2018

screenshot from 2018-01-03 18-12-40

@mmistakes

This comment has been minimized.

Show comment
Hide comment
@mmistakes

mmistakes Jan 3, 2018

Owner

This looks good to me. Question on stemmers. Is it your intention to add ones from that repo you shared before? Or does it make sense to provide some documentation and how to build one, and then others provide PR's or their own as needed?

Owner

mmistakes commented Jan 3, 2018

This looks good to me. Question on stemmers. Is it your intention to add ones from that repo you shared before? Or does it make sense to provide some documentation and how to build one, and then others provide PR's or their own as needed?

@nickgarlis

This comment has been minimized.

Show comment
Hide comment
@nickgarlis

nickgarlis Jan 3, 2018

Contributor

@mmistakes It would make sense but I personally find stemmers kind of complicated to make. I forgot to mention that I got the Greek one from this repository. I am planning on adding the ones from lunr-languages however, I haven't tried using them yet.

Contributor

nickgarlis commented Jan 3, 2018

@mmistakes It would make sense but I personally find stemmers kind of complicated to make. I forgot to mention that I got the Greek one from this repository. I am planning on adding the ones from lunr-languages however, I haven't tried using them yet.

@mmistakes mmistakes merged commit 6fe3b9c into mmistakes:master Jan 3, 2018

@mmistakes

This comment has been minimized.

Show comment
Hide comment
@mmistakes

mmistakes Jan 3, 2018

Owner

I'm going to partially roll this one back. If the user sets their locale to one we don't support (which currently is a lot except en and gr) this logic is going to try and load a JavaScript file that doesn't exist. Breaking search.

<script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>

Example:

locale: es-ES

Will produce and fail.

<script src="{{ '/assets/js/lunr-es.js' }}"></script>

Not sure if there is a clean way of only appending lang for stemmed versions of Lunr that exist in the theme... and falling back to -en if the user has a locale set for one that doesn't.

Might have to go back to drawing board on this one. How bad is Lunr's search without using any of the language/stemm stuff? Can we get away with just going vanilla?

The theme does support overriding the scripts, so in theory we provide a basic English version. And if you're competent enough you build your own language version as you have and load that instead.

I'm most interested in making the theme as simple as possible. It should be something where you install the theme and that's it. Anything involving complicated configs and other nonsense is out. I've already failed at that in a few places... cough cough archives.

Owner

mmistakes commented Jan 3, 2018

I'm going to partially roll this one back. If the user sets their locale to one we don't support (which currently is a lot except en and gr) this logic is going to try and load a JavaScript file that doesn't exist. Breaking search.

<script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>

Example:

locale: es-ES

Will produce and fail.

<script src="{{ '/assets/js/lunr-es.js' }}"></script>

Not sure if there is a clean way of only appending lang for stemmed versions of Lunr that exist in the theme... and falling back to -en if the user has a locale set for one that doesn't.

Might have to go back to drawing board on this one. How bad is Lunr's search without using any of the language/stemm stuff? Can we get away with just going vanilla?

The theme does support overriding the scripts, so in theory we provide a basic English version. And if you're competent enough you build your own language version as you have and load that instead.

I'm most interested in making the theme as simple as possible. It should be something where you install the theme and that's it. Anything involving complicated configs and other nonsense is out. I've already failed at that in a few places... cough cough archives.

@mmistakes

This comment has been minimized.

Show comment
Hide comment
@mmistakes

mmistakes Jan 4, 2018

Owner

Came up with something by using Liquid's case to control the value of lang. Will just need to add more cases as languages are added. When one doesn't exist it defaults back to en.

{% if site.search == true or page.layout == "search" %}
  {% assign lang = site.locale | slice: 0,2 | default: "en" %}
  {% case lang %}
  {% when "gr" %}
    {% assign lang = "gr" %}
  {% else %}
    {% assign lang = "en" %}
  {% endcase %}
  <script src="{{ '/assets/js/lunr.min.js' | absolute_url }}"></script>
  <script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>
{% endif %}
Owner

mmistakes commented Jan 4, 2018

Came up with something by using Liquid's case to control the value of lang. Will just need to add more cases as languages are added. When one doesn't exist it defaults back to en.

{% if site.search == true or page.layout == "search" %}
  {% assign lang = site.locale | slice: 0,2 | default: "en" %}
  {% case lang %}
  {% when "gr" %}
    {% assign lang = "gr" %}
  {% else %}
    {% assign lang = "en" %}
  {% endcase %}
  <script src="{{ '/assets/js/lunr.min.js' | absolute_url }}"></script>
  <script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>
{% endif %}

mmistakes added a commit that referenced this pull request Jan 4, 2018

@nickgarlis

This comment has been minimized.

Show comment
Hide comment
@nickgarlis

nickgarlis Jan 4, 2018

Contributor

@mmistakes Oops I forgot to set the default value.... Your solution looks great to me I totally missed that scenario.

As far as languages are concerned, Lunr doesn't work at all with Greek or any other non-latin alphabet by default. Languages that use latin characters might work but not well enough since they are most likely to use some extra characters too.

Contributor

nickgarlis commented Jan 4, 2018

@mmistakes Oops I forgot to set the default value.... Your solution looks great to me I totally missed that scenario.

As far as languages are concerned, Lunr doesn't work at all with Greek or any other non-latin alphabet by default. Languages that use latin characters might work but not well enough since they are most likely to use some extra characters too.

@nickgarlis

This comment has been minimized.

Show comment
Hide comment
@nickgarlis

nickgarlis Jan 6, 2018

Contributor

@mmistakes Ok so while I was experimenting with Russian, I came to the conclusion that Stemmers, are useful but not as necessary as I presented them to be. So my sincere apologies for that. I will try to explain what is going on.

First of all it was Lunr trimmer that blocked non-latin characters.

While we were using idx.search(query) Stemmers were kind of necessary, but now that we use idx.query things are different. I replaced idx.search(query) because after the upgrade to the newest version of Lunr wildcards were not automatically added. That means that simply typing 'c' would give no results anymore. We would either have to type in a whole word for something to appear e.g. "Cookies", or the root of that word which would be "Cook". Root searching only works thanks to stemming.

Things are a bit different now because we use

idx.query(function (q) {
  query.split(lunr.tokenizer.separator).forEach(function (term) {
    q.term(term, { boost: 100 })
       if(query.lastIndexOf(" ") != query.length-1){
        q.term(term, {  usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })
       }
       if (term != ""){
         q.term(term, {  usePipeline: false, editDistance: 1, boost: 1 })
       }
  })
});

Which means that:

  • we give great priority to exact word or root matching q.term(term, { boost: 100 }) .
  • we give some priority to a query that starts with the same characters with a string in our dataset by using a trailing wildcard q.term(term, { usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })( Can often be inaccurate that's why it's inside of an if statement)
  • we give a little priority to a query that needs one edit to match with a word in our dataset q.term(term, { usePipeline: false, editDistance: 1, boost: 1 }) e.g "ede" with "edge".

Simply put, stemming is only useful in exact matching q.term(term, { boost: 100 }) because that way stemmed words get greater priority but other than that wildcards work okay for other languages now that Lunr trimmer is removed. I am still willing to work on other languages but I thought this is something you wanted to know.

Contributor

nickgarlis commented Jan 6, 2018

@mmistakes Ok so while I was experimenting with Russian, I came to the conclusion that Stemmers, are useful but not as necessary as I presented them to be. So my sincere apologies for that. I will try to explain what is going on.

First of all it was Lunr trimmer that blocked non-latin characters.

While we were using idx.search(query) Stemmers were kind of necessary, but now that we use idx.query things are different. I replaced idx.search(query) because after the upgrade to the newest version of Lunr wildcards were not automatically added. That means that simply typing 'c' would give no results anymore. We would either have to type in a whole word for something to appear e.g. "Cookies", or the root of that word which would be "Cook". Root searching only works thanks to stemming.

Things are a bit different now because we use

idx.query(function (q) {
  query.split(lunr.tokenizer.separator).forEach(function (term) {
    q.term(term, { boost: 100 })
       if(query.lastIndexOf(" ") != query.length-1){
        q.term(term, {  usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })
       }
       if (term != ""){
         q.term(term, {  usePipeline: false, editDistance: 1, boost: 1 })
       }
  })
});

Which means that:

  • we give great priority to exact word or root matching q.term(term, { boost: 100 }) .
  • we give some priority to a query that starts with the same characters with a string in our dataset by using a trailing wildcard q.term(term, { usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })( Can often be inaccurate that's why it's inside of an if statement)
  • we give a little priority to a query that needs one edit to match with a word in our dataset q.term(term, { usePipeline: false, editDistance: 1, boost: 1 }) e.g "ede" with "edge".

Simply put, stemming is only useful in exact matching q.term(term, { boost: 100 }) because that way stemmed words get greater priority but other than that wildcards work okay for other languages now that Lunr trimmer is removed. I am still willing to work on other languages but I thought this is something you wanted to know.

@mmistakes

This comment has been minimized.

Show comment
Hide comment
@mmistakes

mmistakes Jan 6, 2018

Owner

That makes sense. I'm cool simplifying things and leaving out the stemmers.

Search is a tricky thing which is why I held off for a long time adding it, knowing there would be a lot of nuance to it and/or users having different needs and expectations with it.

Whatever we can do to give a good vanilla experience that covers the basics and worry less about edge cases I'm all for that. If someone wants to enhance search and tweak the priorities they can do that by overriding the scripts or using one of the other search providers we'll eventually support.

Owner

mmistakes commented Jan 6, 2018

That makes sense. I'm cool simplifying things and leaving out the stemmers.

Search is a tricky thing which is why I held off for a long time adding it, knowing there would be a lot of nuance to it and/or users having different needs and expectations with it.

Whatever we can do to give a good vanilla experience that covers the basics and worry less about edge cases I'm all for that. If someone wants to enhance search and tweak the priorities they can do that by overriding the scripts or using one of the other search providers we'll eventually support.

@nickgarlis

This comment has been minimized.

Show comment
Hide comment
@nickgarlis

nickgarlis Jan 6, 2018

Contributor

@mmistakes I guess we can start messing with Algolia now

Contributor

nickgarlis commented Jan 6, 2018

@mmistakes I guess we can start messing with Algolia now

@mmistakes

This comment has been minimized.

Show comment
Hide comment
@mmistakes

mmistakes Jan 6, 2018

Owner

@nickgarlis I'll need a good rainy day for that 😄. I tried quickly to test it out with my own personal site, but it's so big it hit their free tier limits so I gave up.

Owner

mmistakes commented Jan 6, 2018

@nickgarlis I'll need a good rainy day for that 😄. I tried quickly to test it out with my own personal site, but it's so big it hit their free tier limits so I gave up.

@nickgarlis

This comment has been minimized.

Show comment
Hide comment
@nickgarlis

nickgarlis Jan 6, 2018

Contributor

@mmistakes hahahah thankfully I have got plenty of those where I live 😜

Contributor

nickgarlis commented Jan 6, 2018

@mmistakes hahahah thankfully I have got plenty of those where I live 😜

ihexon pushed a commit to ihexon/ihexon.github.io that referenced this pull request Jul 16, 2018

ihexon pushed a commit to ihexon/ihexon.github.io that referenced this pull request Jul 16, 2018

ihexon pushed a commit to ihexon/ihexon.github.io that referenced this pull request Jul 19, 2018

ihexon pushed a commit to ihexon/ihexon.github.io that referenced this pull request Jul 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment