Feature Request: Unicode Normalization #124

Closed
zedseven opened this issue Feb 5, 2021 · 8 comments

zedseven commented Feb 5, 2021

Working with a lot of Unicode characters from potentially many sources, I've lately found myself performing Unicode normalization on text a lot.

It would be useful to be able to apply the various Unicode normalization forms to text directly in the editor, without having to copy it elsewhere and then paste the normalized version back. In my particular case this is Form D (NFD), but I can see uses for all four forms.

This isn't a pressing matter by any means but I thought I'd suggest it since there seems to be a built-in method group for this exact purpose in Java already.
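For reference, a minimal sketch of what this could look like, assuming the built-in facility meant here is `java.text.Normalizer` (the thread does not name it). The class name `NormalizationForms` and the sample string are made up for illustration; this is not the plugin's code.

```java
import java.text.Normalizer;

class NormalizationForms {
    // Applies one of the four standard forms (NFC, NFD, NFKC, NFKD) to the input.
    static String normalize(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form);
    }

    public static void main(String[] args) {
        // Sample input: a precomposed "a with acute" followed by the "fi" ligature.
        String sample = "\u00E1\uFB01";
        for (Normalizer.Form form : Normalizer.Form.values()) {
            String result = normalize(sample, form);
            // The length differs per form because composition and
            // compatibility mappings change the number of UTF-16 units.
            System.out.println(form + ": " + result.length() + " UTF-16 units");
        }
    }
}
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) also expand it into separate letters, which is one reason all four forms can be useful.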

Owner

krasa commented Feb 5, 2021

Doesn't the existing "Convert Diacritics (Accents) to ASCII" action do what you need?

Author

zedseven commented Feb 6, 2021

Ah, no. Unicode normalization does things like decomposing characters into their component parts:
á (U+00E1) -> U+0061 U+0301 (a followed by a combining acute accent)
and converting characters into their canonical forms where necessary:
ʹ (U+0374) -> ʹ (U+02B9)
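Both mappings can be checked with a short sketch like the one below. It assumes `java.text.Normalizer` with Form D; the class `CodePointDump` and its helpers `dump` and `nfd` are hypothetical names used only for this illustration.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

class CodePointDump {
    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String dump(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Canonical decomposition (Form D).
    static String nfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // Precomposed letter: decomposes into base letter plus combining mark.
        System.out.println(dump("\u00E1") + " -> " + dump(nfd("\u00E1")));
        // Greek numeral sign: maps to its canonical equivalent.
        System.out.println(dump("\u0374") + " -> " + dump(nfd("\u0374")));
    }
}
```

Printing code points rather than the characters themselves helps here, because the normalized and original text often render identically.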

Owner

krasa commented Feb 11, 2021

I see. Adding a new action would be fine. Although I am thinking that it could be a good idea to use icu4j (unfortunately a 13 MB jar), because JRE 11 Unicode support is limited: https://youtrack.jetbrains.com/issue/JBR-2875

Owner

krasa commented Mar 1, 2021

I do not work with Unicode that often, so it seems useful to me to see exactly what happens. What do you think of this?
image

Owner

krasa commented Mar 1, 2021

For your example:
image

Author

zedseven commented Mar 1, 2021

That looks fantastic. From what I can see, everything looks good there.

krasa added a commit that referenced this issue Mar 1, 2021
krasa added a commit that referenced this issue Mar 1, 2021
Owner

krasa commented Mar 1, 2021

You can try it:
StringManipulation.zip

Btw is your IDE version 2020.3+?

Author

zedseven commented Mar 7, 2021

This is fantastic!
I just tried it with Rider EAP 4 2021.1, and everything seems to be working perfectly.
Thank you so much.

@krasa krasa closed this as completed Mar 8, 2021