[[:punct:]] and \p{Punct} #42

Closed
k-takata opened this Issue Aug 9, 2014 · 4 comments

@k-takata
Owner

k-takata commented Aug 9, 2014

Perl's documentation (perlrecharclass) says:

\p{PosixPunct} and [[:punct:]] in the ASCII range match all non-controls, non-alphanumeric, non-space characters: [-!"#$%&'()*+,./:;<=>?@[\\\]^_`{|}~]

The similarly named property, \p{Punct}, matches a somewhat different set in the ASCII range, namely [-!"#%&'()*,./:;?@[\\\]_{}]. That is, it is missing the nine characters [$+<=>^`|~].
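
As a rough illustration of where those nine characters come from (a sketch in Python, not Onigmo or Perl code): assuming \p{Punct} corresponds to the Unicode General_Category P*, the nine are exactly the ASCII punctuation characters that Unicode classifies as symbols (S*) instead. The variable names here are only for illustration.

```python
import string
import unicodedata

# The 32 ASCII characters covered by POSIX [[:punct:]].
posix_punct = set(string.punctuation)

# ASCII characters whose Unicode General_Category starts with "P"
# (assumed here to be what \p{Punct} means on a Unicode encoding).
unicode_punct = {c for c in posix_punct
                 if unicodedata.category(c).startswith("P")}

# Characters in the POSIX class that Unicode treats as symbols (S*).
missing = sorted(posix_punct - unicode_punct)

print("".join(missing))  # $+<=>^`|~
print({c: unicodedata.category(c) for c in missing})
```

All nine have a category of Sc, Sk, or Sm, which is why a purely category-based \p{Punct} leaves them out.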

In current Onigmo, [[:punct:]] and \p{Punct} are the same in the ASCII range, and their behavior depends on the encoding.
If the encoding is a Unicode encoding, neither [[:punct:]] nor \p{Punct} matches the nine characters.
If the encoding is not a Unicode encoding, both [[:punct:]] and \p{Punct} match the nine characters.

Is it OK?

@k-takata k-takata added the spec label Aug 9, 2014

Contributor

tom-lord commented Feb 24, 2015

I think this is wrong; both Unicode and non-Unicode should match the nine characters.

http://search.cpan.org/~shay/perl-5.20.2/pod/perlreref.pod

I believe the difference should actually be that under a Unicode encoding, [[:punct:]] should additionally match non-ASCII punctuation. The "symbols" ($+<=>^|~`) should always be matched.

k-takata added a commit that referenced this issue Oct 15, 2016

Support XPosixPunct (Issue #42)
Now /(?u)[[:punct:]]/ and /\p{XPosixPunct}/ have the same meaning when
Unicode encodings are used. On the other hand, /\p{Punct}/ is not
changed.

    /(?u)[[:punct:]]/ == /\p{XPosixPunct}/ == /[\p{Punct}$+<=>^`|~]/

\p{XPosixPunct} can be used only with Unicode encodings. For other
encodings, /[[:punct:]]/ is the same as /\p{Punct}/. Both
include the nine characters: "$+<=>^`|~".

Owner

k-takata commented Oct 19, 2016

I have decided to change the behavior of [[:punct:]] on Unicode encodings, and have already committed the change to the devel-6.0 branch.
Now [[:punct:]] matches the nine characters $+<=>^`|~ on all encodings.
The new property \p{XPosixPunct} can be used on Unicode encodings. It is the same as (?u)[[:punct:]].
However, \p{Punct} still behaves differently on Unicode and non-Unicode encodings: it matches the nine characters on non-Unicode encodings, and does not match them on Unicode encodings.
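
For readers who want to check a character against the new rules, here is a minimal Python model of the Unicode-encoding behavior described above. It is only a sketch under the assumption that \p{Punct} equals General_Category P*; the function names are hypothetical and this is not how Onigmo implements the classes.

```python
import unicodedata

# The nine ASCII symbols discussed in this issue.
NINE_ASCII_SYMBOLS = set("$+<=>^`|~")

def is_unicode_punct(ch):
    # Model of \p{Punct} on a Unicode encoding: General_Category P*.
    return unicodedata.category(ch).startswith("P")

def is_xposix_punct(ch):
    # Model of \p{XPosixPunct} and (?u)[[:punct:]]:
    # \p{Punct} plus the nine ASCII symbols.
    return is_unicode_punct(ch) or ch in NINE_ASCII_SYMBOLS

# "!" is punctuation in both; "$" and "`" only in XPosixPunct;
# U+3002 (ideographic full stop) is non-ASCII punctuation;
# U+20AC (euro sign) is a non-ASCII symbol and is in neither.
for ch in ["!", "$", "`", "\u3002", "\u20ac", "a"]:
    print(repr(ch), is_unicode_punct(ch), is_xposix_punct(ch))
```

On non-Unicode encodings the two predicates would coincide in the ASCII range, since both classes include the nine characters there.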

Owner

k-takata commented Oct 19, 2016

Closing.

@k-takata k-takata closed this Oct 19, 2016

Owner

k-takata commented Dec 1, 2016
