ECMA 262: \d should only match ASCII digits #64

fdutton · 2023-04-30T21:26:26Z

Given this pattern ^\d$

This should match: 0

And this should not: ߀
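For reference, the distinction can be reproduced without joni. The following is a minimal sketch that uses java.util.regex as a stand-in (an assumption made for illustration; it is not the joni API): its default \d is ASCII-only, which is the behavior ECMA 262 specifies, while the UNICODE_CHARACTER_CLASS flag gives the Unicode-aware behavior reported here. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for joni here.
final class DigitClassDemo {
    // U+07C0 NKO DIGIT ZERO, the non-ASCII digit from the report.
    static final String NKO_ZERO = "\u07C0";

    // ASCII-only \d: what ECMA 262 specifies (and the java.util.regex default).
    static boolean matchesAsciiDigit(String input) {
        return Pattern.compile("^\\d$").matcher(input).matches();
    }

    // Unicode-aware \d: also accepts decimal digits from other scripts.
    static boolean matchesUnicodeDigit(String input) {
        return Pattern.compile("^\\d$", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAsciiDigit("0"));        // true
        System.out.println(matchesAsciiDigit(NKO_ZERO));   // false
        System.out.println(matchesUnicodeDigit(NKO_ZERO)); // true
    }
}
```

The second result is what this issue asks for under Syntax.ECMAScript; the third is the behavior being reported.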

enebo · 2023-05-01T16:45:54Z

@fdutton on JRuby we behave as you describe: with our encodings, \d does not match ߀ but does match 0. I am guessing you are using joni as a Java library, so perhaps there is something config/call-wise that accounts for the difference?

With any extra info we can try to figure out why it works for us and, if it really is working, how we get that result.

enebo · 2023-05-01T16:48:06Z

It looks like Ruby (JRuby) explicitly restricts numerics to ASCII: https://github.com/jruby/joni/blob/master/src/org/joni/Syntax.java#L459

fdutton · 2023-05-01T18:17:01Z

I'll write some unit tests, but this is what I am doing to work around the issue for now.

// Joni is too liberal on some constructs
String s = regex
    .replace("\\d", "[0-9]")
    .replace("\\D", "[^0-9]")
    .replace("\\w", "[a-zA-Z0-9_]")
    .replace("\\W", "[^a-zA-Z0-9_]")
    .replace("\\s", "[ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]")
    .replace("\\S", "[^ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]");

byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
this.pattern = new Regex(bytes, 0, bytes.length, Option.NONE, UTF8Encoding.INSTANCE, Syntax.ECMAScript);
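One caveat with plain String.replace is that it also rewrites an escaped backslash followed by d (the four-character Java literal "\\\\d") and produces nested brackets for shorthands that already sit inside a character class. A minimal sketch of a scanner that avoids both is below. The class and method names are hypothetical, it is not part of joni, and it covers only \d, \D, \w and \W; \s and \S could be handled the same way.

```java
// Hypothetical helper, not part of joni: rewrites shorthand classes to ASCII.
final class AsciiShorthands {
    static String rewrite(String regex) {
        StringBuilder out = new StringBuilder(regex.length() + 16);
        boolean inClass = false; // true while scanning inside [...]
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i); // consume the escaped character
                String plain = next == 'd' ? "0-9" : next == 'w' ? "a-zA-Z0-9_" : null;
                String negated = next == 'D' ? "0-9" : next == 'W' ? "a-zA-Z0-9_" : null;
                if (plain != null) {
                    // Inside a class, emit the bare range to avoid nested brackets.
                    out.append(inClass ? plain : "[" + plain + "]");
                } else if (negated != null && !inClass) {
                    out.append("[^").append(negated).append(']');
                } else {
                    // Any other escape (including an escaped backslash) is copied
                    // verbatim. A negated shorthand inside a class is also left
                    // untouched, so it keeps the engine's own semantics.
                    out.append('\\').append(next);
                }
            } else {
                if (c == '[') inClass = true;
                else if (c == ']') inClass = false;
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("^\\d$"));   // ^[0-9]$
        System.out.println(rewrite("[\\d.]"));  // [0-9.]
        System.out.println(rewrite("\\\\d"));   // \\d (unchanged)
    }
}
```

The result can then be passed to the Regex constructor exactly as in the snippet above, in place of the chained replace calls.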

enebo · 2023-05-01T18:43:26Z

@fdutton I don't know where the oniguruma repo is, but you could check whether the syntax for ECMAScript was updated upstream. We tend to look at the Onigmo fork used by C Ruby, but we are pretty far downstream. Perhaps there is a more up-to-date syntax?

lopex · 2023-05-01T19:33:23Z

@enebo I think we are still on par wrt regexp functionality. We've been tracking https://github.com/k-takata/Onigmo/graphs/contributors and there's not a lot of activity there. There have been more changes in the MRI codebase lately, though.

lopex · 2023-05-01T19:37:19Z

There also doesn't seem to be an ECMAScript syntax in either the Onigmo or the MRI repository.

fdutton changed the title ~~[Question] Is a more recent ECMA 262 syntax supported?~~ ECMA 262: \d should only match ASCII digits May 1, 2023
