Custom word boundary #2896

Mikk3lRo · 2017-10-02T19:28:37Z

Summary

The default / current word boundary regex ^|\\b|\\s is remarkably bad at actually finding word boundaries in languages that use non-unicode characters. A word boundary is basically detected after any non-ascii character (fx. ü, å, ø and æ to mention just a few - but there are MANY).

I've looked into possibilities, and unfortunately there doesn't seem to be any way to get decent word-boundary detection for anything except ascii in javascripts RegExp implementation... without either using a third-party library or including some 4k+ characters in the string.

Therefore, I don't see any way to reliably detect word boundaries with any pre-set, hardcoded regex.

Turning it into an option means that people can at least set something appropriate for their individual language and / or use case if they care about word boundaries being detected in "weird" places.

Please double-check that:

All changes were made in CoffeeScript files, not JavaScript files.
You used Grunt to build the JavaScript files and tested them locally.
You've updated both the jQuery and Prototype versions.
You haven't manually updated the version number in package.json.
If necessary, you've updated the documentation.

References

First partial PR from: #2894

Mikk3lRo · 2017-10-02T19:33:15Z

I'm fairly certain this will solve at least this open issue / bug:
#2862

# Conflicts: # public/options.html

tjschuck · 2017-10-02T21:00:51Z

@Mikk3lRo Can you squash this down to one commit? Instructions here.

Add Test for search_word_boundary Add search_word_boundary to options page

koenpunt · 2017-10-02T22:01:44Z

coffee/lib/abstract-chosen.coffee

@@ -27,6 +27,7 @@ class AbstractChosen
    @enable_split_word_search = if @options.enable_split_word_search? then @options.enable_split_word_search else true
    @group_search = if @options.group_search? then @options.group_search else true
    @search_contains = @options.search_contains || false
+    @search_word_boundary = if @options.search_word_boundary? then @options.search_word_boundary else '^|\\s|\\b'


Can’t this be just;

@search_word_boundary = options.search_word_boundary || '^|\\s|\\b'

?

Also, the option should not necessarily include “search” in its name

Is it not logical to prefix search related options with "search"?

And I'm pretty sure the question mark is needed, otherwise the empty string will evaluate to false and the default value will be used. The empty string is a perfectly valid value for this option... if set to an empty string Chosen works like search_contains=true, but allowing enable_split_word_search to decide whether to only match the beginning of the option - see linked issue for a relevant use case.

koenpunt · 2017-10-02T22:03:21Z

public/options.html

+          <td>^|\\b|\\s</td>
+          <td>By default, Chosen uses JS RegExp's built-in word boundary to detect word beginnings as well as whitespace or the beginning of the entire label. That works great for ascii-only languages, but <strong>will</strong> erroneously detect word boundaries after letters with umlauts among many, many others.<br>You can pass a string (that will be interpreted as part of a <code class="language-javascript">RegExp</code>) refined for your language and use case to correctly detect word boundaries. A (simplified) example could be <code class="language-javascript">'^|[^A-zæøåÆØÅ]'</code> for Danish.</td>
+        </tr>
+        <tr>


This is now in here twice

koenpunt · 2017-10-02T22:03:51Z

spec/jquery/searching.spec.coffee

@@ -247,7 +247,7 @@ describe "Searching", ->

    expect(div.find(".active-result").length).toBe(1)
    expect(div.find(".active-result")[0].innerHTML).toBe("oh <em>h</em>ello")
-
+    


Nit; unrelated change

Yeah... not eactly a git wiz here, and the squahing turned out to not be as easy as the instuctions suggested due to a conflict on the options page. I'm just about to give up and start a new branch instead.

Ok the whole thing is a mess now. I have no idea how to recover. Can I start a new branch and pull request?

# This is the 1st commit message: Add Option search_word_boundary Add Test for search_word_boundary Add search_word_boundary to options page Add new option: search_word_boundary Add Test for search_word_boundary Add search_word_boundary to options page # This is the commit message harvesthq#2: Cleaned unintentional, unrelated change. # This is the commit message harvesthq#3: Add Option search_word_boundary # This is the commit message harvesthq#4: Add Test for search_word_boundary

…osen into custom-word-boundary

Mikk3lRo · 2017-10-02T23:11:47Z

Sorry, but this is too messed up now. Some stupid conflict going around itself in circles.

Mikk3lRo added 3 commits October 2, 2017 21:06

Add Option search_word_boundary

559813f

Add Test for search_word_boundary

2cf0c61

Add search_word_boundary to options page

869d96b

Merge branch 'master' of https://github.com/harvesthq/chosen into HEAD

47962d9

# Conflicts: # public/options.html

Add new option: search_word_boundary

1bae7c5

Add Test for search_word_boundary Add search_word_boundary to options page

koenpunt reviewed Oct 2, 2017

View reviewed changes

Mikk3lRo added 3 commits October 3, 2017 00:32

Merge branch 'custom-word-boundary' of https://github.com/Mikk3lRo/ch…

867610e

…osen into custom-word-boundary

Trying again.

ddb6221

Mikk3lRo closed this Oct 2, 2017

Mikk3lRo mentioned this pull request Oct 2, 2017

Add new option: search_word_boundary #2898

Open

5 tasks

Mikk3lRo deleted the custom-word-boundary branch October 7, 2017 08:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom word boundary #2896

Custom word boundary #2896

Mikk3lRo commented Oct 2, 2017

Mikk3lRo commented Oct 2, 2017

tjschuck commented Oct 2, 2017

koenpunt Oct 2, 2017

koenpunt Oct 2, 2017

Mikk3lRo Oct 2, 2017

koenpunt Oct 2, 2017

koenpunt Oct 2, 2017

Mikk3lRo Oct 2, 2017 •

edited

Mikk3lRo Oct 2, 2017

Mikk3lRo commented Oct 2, 2017

		@@ -247,7 +247,7 @@ describe "Searching", ->

		expect(div.find(".active-result").length).toBe(1)
		expect(div.find(".active-result")[0].innerHTML).toBe("oh <em>h</em>ello")

Custom word boundary #2896

Custom word boundary #2896

Conversation

Mikk3lRo commented Oct 2, 2017

Summary

References

Mikk3lRo commented Oct 2, 2017

tjschuck commented Oct 2, 2017

koenpunt Oct 2, 2017

Choose a reason for hiding this comment

koenpunt Oct 2, 2017

Choose a reason for hiding this comment

Mikk3lRo Oct 2, 2017

Choose a reason for hiding this comment

koenpunt Oct 2, 2017

Choose a reason for hiding this comment

koenpunt Oct 2, 2017

Choose a reason for hiding this comment

Mikk3lRo Oct 2, 2017 • edited

Choose a reason for hiding this comment

Mikk3lRo Oct 2, 2017

Choose a reason for hiding this comment

Mikk3lRo commented Oct 2, 2017

Mikk3lRo Oct 2, 2017 •

edited