Support u flag for JavaScript #7

Open
danny0838 opened this issue Nov 19, 2022 · 11 comments

Comments

@danny0838

The code:

console.warn(JSON.stringify(Regex.Analyzer(/\u{20000}/u).tree(), null, 2))

throws an error because the \u{XXXXX} syntax is not supported when the u flag is used.

@foo123
Owner

foo123 commented Nov 19, 2022

The update to 1.2.0 (JS only) addresses part of this issue, but I am not sure whether anything else is needed.
Take a look. I am leaving this open.

@danny0838
Author

/\u{2}/u seems to throw an error.

@danny0838
Author

Something like /\p{Punctuation}/u needs to be implemented.

@danny0838
Author

danny0838 commented Nov 19, 2022

The value of char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).

Browsers that support the unicode flag seem to support String.fromCodePoint. A polyfill may be required if this library is intended to work on a JavaScript engine that doesn't support it.
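
A minimal sketch of such a fallback, assuming the library only needs to turn a single code point into a string. The name codePointToString is hypothetical and not part of the library; it uses String.fromCodePoint when available and otherwise builds the surrogate pair by hand.

```javascript
// Hypothetical helper (not part of the library): converts one code point
// to a JavaScript string, with a fallback for engines that lack
// String.fromCodePoint.
function codePointToString(cp) {
  if (typeof String.fromCodePoint === 'function') {
    return String.fromCodePoint(cp);
  }
  if (cp <= 0xFFFF) {
    // Code points in the Basic Multilingual Plane fit in one UTF-16 unit.
    return String.fromCharCode(cp);
  }
  // Astral code points are split into a high and a low surrogate.
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

console.log(codePointToString(0x20000) === '\uD840\uDC00'); // true
console.log(codePointToString(0x61)); // a
```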

@danny0838
Author

Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.
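
To illustrate why the braces matter, here is a small check using only native regex literals (no library calls). Without braces, \u2000 is read as a four-digit escape followed by a literal 0, so the two patterns match different strings.

```javascript
// U+20000 as a string (a surrogate pair in UTF-16).
const astral = String.fromCodePoint(0x20000);

const braced = /\u{20000}/u;  // one code point: U+20000
const unbraced = /\u20000/u;  // U+2000 followed by the character "0"

console.log(braced.test(astral));       // true
console.log(unbraced.test(astral));     // false
console.log(unbraced.test('\u20000'));  // true
```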

@danny0838
Author

danny0838 commented Nov 19, 2022

When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.

See doc for more syntax details.
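
The difference can be checked with native regex literals alone (no library calls):

```javascript
const withoutFlag = /\u{2}/;  // literal "u" repeated exactly twice
const withFlag = /\u{2}/u;    // the single code point U+0002

console.log(withoutFlag.test('uu'));      // true
console.log(withoutFlag.test('\u0002'));  // false
console.log(withFlag.test('\u0002'));     // true
console.log(withFlag.test('uu'));         // false
```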

@foo123
Owner

foo123 commented Nov 19, 2022

New upload of v1.2.0:

/\u{61}/u
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}

> When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.

Fixed

> Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.

Fixed

> The value of char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).

Fixed

> Something like /\p{Punctuation}/u needs to be implemented.

Only in a major update, not anytime soon.

@danny0838
Author

/\u{2}/u does not seem to be treated correctly as a unicode char.

@danny0838
Author

danny0838 commented Nov 19, 2022

The unicode flag also changes behavior for incomplete escape sequences: patterns like /\x/u, /\x3/u, /\u/u, or /\u30/u throw a SyntaxError.

Also, a character class like /[\W-3]/u becomes invalid. (See doc for more syntax details.)

Not sure if you are going to implement it.
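
A quick way to confirm this against the engine itself, using only the native RegExp constructor. The helper name isValidPattern is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: reports whether the engine accepts a pattern
// source with the given flags.
function isValidPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// Pattern sources from the comment above (backslashes escaped for the string).
const samples = ['\\x', '\\x3', '\\u', '\\u30', '[\\W-3]'];

for (const source of samples) {
  // Each sample is accepted without the u flag and rejected with it.
  console.log(source, isValidPattern(source, ''), isValidPattern(source, 'u'));
}
```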

@foo123
Owner

foo123 commented Nov 19, 2022

> /\u{2}/u does not seem to be treated correctly as a unicode char.

Fixed

/\u{2}/u
"\\u{2}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{2}",
      "flags": {
        "Char": "\u0002",
        "Code": "2",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/u
"\\u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
"u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}

@danny0838
Author

> > Something like /\p{Punctuation}/u needs to be implemented.
>
> Only in a major update, not anytime soon.

Maybe we can implement quick support that simply creates a corresponding node with the provided value (that is, without checking whether it is really valid)? The syntax can be found in the doc. That way developers can use the library to analyze a regex with such syntax without errors.
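
A rough sketch of what that lenient handling could look like. Everything here is an assumption made for illustration: the function name parseUnicodeProperty, the node name UnicodeProperty, and the flag names are hypothetical and not the library's actual API. It recognizes \p{Name}, \P{Name}, and \p{Name=Value} at a given position and does not check whether the property exists.

```javascript
// Hypothetical sketch, not the library's API: build a node for a
// \p{...} or \P{...} escape without validating the property name.
function parseUnicodeProperty(pattern, index) {
  // The sticky flag makes the match start exactly at `index`.
  const re = /\\([pP])\{([A-Za-z0-9_]+)(?:=([A-Za-z0-9_]+))?\}/y;
  re.lastIndex = index;
  const m = re.exec(pattern);
  if (!m) return null; // not a property escape at this position
  return {
    typeName: 'UnicodeProperty',   // hypothetical node name
    val: m[0].slice(1),            // e.g. "p{Punctuation}"
    flags: {
      Negated: m[1] === 'P',       // \P{...} is the negated form
      Name: m[2],
      Value: m[3] === undefined ? null : m[3]
    },
    length: m[0].length            // characters consumed from the pattern
  };
}

console.log(JSON.stringify(parseUnicodeProperty('\\p{Punctuation}', 0), null, 2));
```

A caller would advance its read position by length and treat a null result as "not a property escape", falling back to its existing handling.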
