Support non-ascii case folding within i modifier #90

Open
wants to merge 3 commits into
base: main

JLHwung
Collaborator

@JLHwung JLHwung commented Sep 30, 2023

Fixes #79
Closes #80

Most test cases are inherited from #80; I have also commented on some of the differences.

@stulov Thank you for your work, which is very helpful.

'pattern': '(?i:[є-ґ])',
'options': { modifiers: 'transform' },
'matches': ['\u0462', '\u0463', '\u1C87'],
'expected': '(?:[\\u0404-\\u040F\\u0454-\\u0491\\u1C87])',
Collaborator Author

Here U+1C87 ᲇ should be matched because the uppercase of U+1C87 is U+0462 Ѣ, which is well within the range [є-ґ], while the lowercase of U+0462 Ѣ is U+0463 ѣ, not U+1C87. The String#toLowerCase approach in #80 therefore misses cases like this, which is why we have introduced iu-mappings.
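
A minimal sketch of the gap described in this comment, assuming an engine whose Unicode data already includes U+1C87. The helper `lowerCaseVariants` is hypothetical: it only approximates a `String#toLowerCase`-based expansion and is not the code from #80.

```javascript
// Hypothetical helper: collect each code unit in [from, to] plus its
// String#toLowerCase result. This only approximates a toLowerCase-based
// approach; it is not the implementation from #80.
function lowerCaseVariants(from, to) {
  const variants = new Set();
  for (let codeUnit = from; codeUnit <= to; codeUnit++) {
    const ch = String.fromCharCode(codeUnit);
    variants.add(ch);
    variants.add(ch.toLowerCase());
  }
  return variants;
}

// [є-ґ] spans U+0454..U+0491.
const variants = lowerCaseVariants(0x0454, 0x0491);

// U+0463 is reachable because it is the lowercase of U+0462.
const foundSmallYat = variants.has('\u0463');
// U+1C87 is only linked to the range through its uppercase mapping.
const foundTallYat = variants.has('\u1C87');
const tallYatUppercasesIntoRange = '\u1C87'.toUpperCase() === '\u0462';

console.log({ foundSmallYat, foundTallYat, tallYatUppercasesIntoRange });
```

On such an engine, `foundTallYat` stays false even though U+1C87 uppercases into the range, which is the case the test above is meant to cover.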

'expected': '(?:[Kk\\u212A])',
},
{
'pattern': '(?i:\\u2C2F)',
Collaborator Author

This is also an example of why a legacy ES5 engine might not support all case foldings in the Basic Multilingual Plane: U+2C2F was introduced in Unicode 14.
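
As a rough illustration of that platform dependence, the sketch below probes whether the running engine knows a given case pair. `engineKnowsCasePair` is a hypothetical helper, not part of this PR, and it assumes U+2C5F as the lowercase counterpart of U+2C2F per the Unicode 14 data.

```javascript
// Hypothetical probe: does the engine's own case mapping know this pair?
function engineKnowsCasePair(upper, lower) {
  return upper.toLowerCase() === lower && lower.toUpperCase() === upper;
}

// The result depends on the Unicode version shipped with the engine,
// so no particular value is assumed here.
const knowsCaudateChu = engineKnowsCasePair('\u2C2F', '\u2C5F');

console.log('engine maps U+2C2F <-> U+2C5F:', knowsCaudateChu);
```

A static mapping table sidesteps this probe entirely, since the transformed output no longer depends on what the host engine knows.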

Collaborator

@nicolo-ribaudo nicolo-ribaudo left a comment

This looks good in principle, but I don't have enough domain knowledge to do a full review.

// https://mths.be/es6#sec-runtime-semantics-canonicalize-abstract-operation
(
if(
Collaborator

Nit: there is some weird formatting going on here

Collaborator Author

@JLHwung JLHwung left a comment

(✔️ I have completely forgotten the context so I can give a review)

The idea of this PR is to extend the current iu-mappings to the BMP characters (U+0080 - U+FFFF). We need this to handle the i modifier, because we generate a regex without the i flag and simulate the i behaviour inside modified groups. While the approach in #80 works for most common characters, it introduces platform-dependent behaviour, because older platforms do not support newer Unicode characters.
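
A minimal sketch of the table-driven idea, under stated assumptions: `CASE_VARIANTS` is a hand-written, hypothetical excerpt covering only the characters from the test cases above, and `expandCaseVariants` is an illustrative helper. The real iu-mappings data is far larger and may be shaped differently.

```javascript
// Hypothetical excerpt of a static case-variant table keyed by code point.
// Entries mirror the test cases discussed in this PR; this is not the
// actual iu-mappings data.
const CASE_VARIANTS = new Map([
  [0x004b, [0x006b, 0x212a]], // K -> k, KELVIN SIGN
  [0x006b, [0x004b, 0x212a]],
  [0x212a, [0x004b, 0x006b]],
  [0x0462, [0x0463, 0x1c87]], // Ѣ -> ѣ, ᲇ
  [0x0463, [0x0462, 0x1c87]],
  [0x1c87, [0x0462, 0x0463]],
]);

// Expand a list of code points with every variant found in the table,
// without consulting the engine's own case mapping.
function expandCaseVariants(codePoints) {
  const expanded = new Set(codePoints);
  for (const codePoint of codePoints) {
    for (const variant of CASE_VARIANTS.get(codePoint) || []) {
      expanded.add(variant);
    }
  }
  return [...expanded].sort((a, b) => a - b);
}

const yatVariants = expandCaseVariants([0x0462]);
console.log(
  yatVariants.map((cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'))
);
```

Because the lookup never calls `toLowerCase` or `toUpperCase`, the expansion is the same on every platform, whatever Unicode version the engine ships with.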

@JLHwung JLHwung force-pushed the support-non-ascii-case-folding-i-modifier branch from 17d07d1 to 22a5353 Compare September 18, 2024 18:07
Development

Successfully merging this pull request may close these issues.

Inline IgnoreCase modifier does not work with non-ASCII
2 participants