Support grapheme detection via \X #4568

Closed
janlelis opened this Issue Apr 20, 2017 · 4 comments

janlelis commented Apr 20, 2017

JRuby should support matching "grapheme clusters" (glyphs), which can be constructed from multiple Unicode codepoints.

Expected Behavior (MRI)

glyphs = "\u{61 308 62}".scan(/\X/) # => ["ä", "b"]
glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } # => [["61", "308"], ["62"]]

Actual Behavior (JRuby)

glyphs = "\u{61 308 62}".scan(/\X/) # => ["a", "b"]
glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } # => [["61"], ["62"]]
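Both results above contain two matches, so counting matches alone cannot tell the two behaviors apart; the difference is whether the first match keeps U+0308 attached to the "a". A minimal sketch of a check for this, assuming a hypothetical helper named `grapheme_x_supported?` (not part of JRuby or MRI):

```ruby
# Hypothetical helper for illustration only: reports whether /\X/ keeps a base
# letter and its combining mark together on the running interpreter.
def grapheme_x_supported?
  sample = "\u{61 308 62}" # "a", U+0308 COMBINING DIAERESIS, "b"
  first_cluster = sample.scan(/\X/).first
  # A grapheme-aware \X returns "a" + U+0308 as a single match;
  # a codepoint-wise \X returns only "a".
  first_cluster.codepoints == [0x61, 0x308]
end

puts "#{RUBY_ENGINE} #{RUBY_VERSION}: \\X grapheme support = #{grapheme_x_supported?}"
```

Running the same script under both interpreters should make the discrepancy visible without comparing printed glyphs by eye.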

Related Links

Member

headius commented Nov 28, 2017

Given that this does not appear to be fully baked in MRI 2.3, I think this is safe to defer to our 2.3-compatible release, 9.2.0.0.

Member

lopex commented Dec 30, 2017

This should work out of the box with the new joni.

Member

headius commented Jan 25, 2018

@lopex I forget our status here. We can move 9.1 to the new joni, yes?

@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.16.0 Feb 13, 2018

@headius headius closed this Feb 13, 2018

Member

headius commented Feb 13, 2018

Works in 9.1.16.0.

irb(main):001:0> glyphs = "\u{61 308 62}".scan(/\X/) # => ["ä", "b"]`
(irb):1: warning: character class has duplicated range
=> ["ä", "b"]
irb(main):002:0> glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } #=> [["61", "308"], ["62"]]
=> [["61", "308"], ["62"]]