Decide what to do for \b and \B #1

Closed
mathiasbynens opened this issue Aug 26, 2014 · 2 comments
Comments

@mathiasbynens
Owner

mathiasbynens commented Aug 26, 2014

http://unicode.org/reports/tr18/#Simple_Word_Boundaries or http://unicode.org/reports/tr18/#Default_Word_Boundaries? Or something else entirely?


See the UTS#18 <word_character> production:

The class of <word_character> includes all the Alphabetic values from the Unicode character database, from UnicodeData.txt [UData], plus the decimals (General_Category=Decimal_Number, or equivalently Numeric_Type=Decimal), and the U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER (Join_Control=True).

However, http://unicode.org/reports/tr18/#b says:

If there is a requirement that \b align with \w, then it would use the approximation above instead.
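As a rough illustration, the <word_character> class quoted above can be approximated in JavaScript with Unicode property escapes (ES2018, `u` flag required). This is only a sketch of the quoted definition; the name `wordCharacter` is made up for the example and is not part of any library.

```javascript
// Sketch of the quoted <word_character> class:
//   Alphabetic + Decimal_Number + ZWNJ (U+200C) + ZWJ (U+200D).
// `wordCharacter` is a hypothetical name used only for this example.
const wordCharacter =
  /[\p{Alphabetic}\p{General_Category=Decimal_Number}\u200C\u200D]/u;

// Compare against the built-in \w, which matches [A-Za-z0-9_] here.
// Columns: character, matches the sketch, matches built-in \w.
for (const ch of ['a', '\u00E9', '\u0663', '_', '-']) {
  console.log(JSON.stringify(ch), wordCharacter.test(ch), /\w/u.test(ch));
}
```

Note that `_` is matched by the built-in `\w` but not by the class as quoted, while `é` (U+00E9) and the Arabic-Indic digit U+0663 are matched by the class but not by `\w`.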

@patch

patch commented Sep 12, 2014

For compatibility with other Unicode regex engines, I think \b and \B should probably operate on simple word boundaries. What's most important, though, is that \b operates on the same level as \w: if \w matches Unicode word characters, then \b should match at the boundary of a word made of Unicode word characters, and if \w only matches ASCII word characters, then \b should match at the boundary of a word made of ASCII word characters. Default word boundaries could then be implemented as \b{w} and \B{w}, extended grapheme cluster boundaries as \b{g}, sentence boundaries as \b{s}, etc.
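A minimal sketch of what "\b on the same level as \w" could look like, assuming the Unicode word class from the UTS #18 quote earlier in the thread and an engine with lookbehind support (ES2018). The names `WORD`, `unicodeBoundary`, `unicodeNonBoundary` and `boundaryIndices` are hypothetical, and this is not how any particular transpiler or engine implements it.

```javascript
// Assumed word class: Alphabetic + Decimal_Number + ZWNJ + ZWJ.
const WORD = '[\\p{Alphabetic}\\p{Nd}\\u200C\\u200D]';

// Simple word boundary: exactly one side of the position is a word character.
const unicodeBoundary = new RegExp(
  `(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD})`, 'gu');

// \B analogue: both sides are word characters, or neither is.
const unicodeNonBoundary = new RegExp(
  `(?<=${WORD})(?=${WORD})|(?<!${WORD})(?!${WORD})`, 'gu');

// Collect the string indices at which a zero-width pattern matches.
function boundaryIndices(re, text) {
  return Array.from(text.matchAll(re), (m) => m.index);
}

const text = 'caf\u00E9 au'; // "café au" with precomposed é
console.log(boundaryIndices(/\b/g, text));           // indices 0, 3, 5, 7
console.log(boundaryIndices(unicodeBoundary, text)); // indices 0, 4, 5, 7
```

The built-in \b reports a boundary inside "café" (between `f` and `é`) because `é` is not an ASCII word character, whereas the Unicode-aware sketch places it after `é`.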

Details on the \b{…} syntax:
http://unicode.org/reports/tr18/#Default_Grapheme_Clusters
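JavaScript regular expressions have no \b{w} syntax, but the difference between simple and default word boundaries can be illustrated with `Intl.Segmenter`, a separate internationalization API whose word segmentation is generally based on the Unicode default rules. Treating it as a stand-in for \b{w} is an assumption of this sketch, and `wordLikeSegments` is a hypothetical helper name.

```javascript
// Return the word-like segments of `text` as reported by Intl.Segmenter.
// The locale default ('en') is an arbitrary choice for this example.
function wordLikeSegments(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}

const sample = "can't stop";
console.log(wordLikeSegments(sample)); // segmentation by default word boundaries
console.log(sample.split(/\b/));       // simple \b also splits at the apostrophe
```

The same API accepts `granularity: 'grapheme'` and `granularity: 'sentence'`, which loosely parallel the proposed \b{g} and \b{s}.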

@mathiasbynens
Owner Author

This is not gonna happen as per https://esdiscuss.org/topic/questions-regarding-es6-unicode-regular-expressions#content-5. Closing the issue.
