-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom word boundary #2896
Custom word boundary #2896
Conversation
I'm fairly certain this will solve at least this open issue / bug: |
# Conflicts: # public/options.html
@Mikk3lRo Can you squash this down to one commit? Instructions here. |
Add Test for search_word_boundary Add search_word_boundary to options page
@@ -27,6 +27,7 @@ class AbstractChosen | |||
@enable_split_word_search = if @options.enable_split_word_search? then @options.enable_split_word_search else true | |||
@group_search = if @options.group_search? then @options.group_search else true | |||
@search_contains = @options.search_contains || false | |||
@search_word_boundary = if @options.search_word_boundary? then @options.search_word_boundary else '^|\\s|\\b' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can’t this be just;
@search_word_boundary = options.search_word_boundary || '^|\\s|\\b'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the option should not necessarily include “search” in its name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not logical to prefix search related options with "search"?
And I'm pretty sure the question mark is needed, otherwise the empty string will evaluate to false and the default value will be used. The empty string is a perfectly valid value for this option... if set to an empty string Chosen works like search_contains=true
, but allowing enable_split_word_search
to decide whether to only match the beginning of the option - see linked issue for a relevant use case.
<td>^|\\b|\\s</td> | ||
<td>By default, Chosen uses JS RegExp's built-in word boundary to detect word beginnings as well as whitespace or the beginning of the entire label. That works great for ascii-only languages, but <strong>will</strong> erroneously detect word boundaries after letters with umlauts among many, many others.<br>You can pass a string (that will be interpreted as part of a <code class="language-javascript">RegExp</code>) refined for your language and use case to correctly detect word boundaries. A (simplified) example could be <code class="language-javascript">'^|[^A-zæøåÆØÅ]'</code> for Danish.</td> | ||
</tr> | ||
<tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now in here twice
@@ -247,7 +247,7 @@ describe "Searching", -> | |||
|
|||
expect(div.find(".active-result").length).toBe(1) | |||
expect(div.find(".active-result")[0].innerHTML).toBe("oh <em>h</em>ello") | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; unrelated change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... not eactly a git wiz here, and the squahing turned out to not be as easy as the instuctions suggested due to a conflict on the options page. I'm just about to give up and start a new branch instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok the whole thing is a mess now. I have no idea how to recover. Can I start a new branch and pull request?
# This is the 1st commit message: Add Option search_word_boundary Add Test for search_word_boundary Add search_word_boundary to options page Add new option: search_word_boundary Add Test for search_word_boundary Add search_word_boundary to options page # This is the commit message harvesthq#2: Cleaned unintentional, unrelated change. # This is the commit message harvesthq#3: Add Option search_word_boundary # This is the commit message harvesthq#4: Add Test for search_word_boundary
…osen into custom-word-boundary
Sorry, but this is too messed up now. Some stupid conflict going around itself in circles. |
Summary
The default / current word boundary regex
^|\\b|\\s
is remarkably bad at actually finding word boundaries in languages that use non-unicode characters. A word boundary is basically detected after any non-ascii character (fx. ü, å, ø and æ to mention just a few - but there are MANY).I've looked into possibilities, and unfortunately there doesn't seem to be any way to get decent word-boundary detection for anything except ascii in javascripts RegExp implementation... without either using a third-party library or including some 4k+ characters in the string.
Therefore, I don't see any way to reliably detect word boundaries with any pre-set, hardcoded regex.
Turning it into an option means that people can at least set something appropriate for their individual language and / or use case if they care about word boundaries being detected in "weird" places.
Please double-check that:
package.json
.References
First partial PR from: #2894