Invalid unicode characters removed from datagrid #578

nicojapas · 2023-04-26T15:28:13Z

Fixes #456

When eliding text that contains astral symbols (code points above U+FFFF), datagrid can generate invalid Unicode characters, because substring() operates on UTF-16 code units and can split a surrogate pair.

With the regular expression /[\u{D800}-\u{DFFF}]/gu we match any code point in the surrogate range: both high surrogates (0xD800 to 0xDBFF) and low surrogates (0xDC00 to 0xDFFF). Because of the u flag, an intact surrogate pair is treated as a single code point and is not matched, so only the lone surrogates left behind by splitting a pair are removed with replace().
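
A minimal sketch of the behaviour described above, for illustration only. The helper name `stripLoneSurrogates` and the sample string are assumptions and do not come from the datagrid source; the pattern is the one quoted in the description, built with the RegExp constructor instead of a literal.

```typescript
// Same pattern as /[\u{D800}-\u{DFFF}]/gu from the description.
// With the "u" flag, an intact surrogate pair counts as one code point
// and is not matched; only lone surrogates are.
const LONE_SURROGATES = new RegExp('[\\u{D800}-\\u{DFFF}]', 'gu');

// Hypothetical helper (not part of lumino): drop lone surrogates.
function stripLoneSurrogates(text: string): string {
  return text.replace(LONE_SURROGATES, '');
}

// U+1F600 is stored as two UTF-16 code units: \uD83D \uDE00.
const label = 'ab\u{1F600}cd';

// substring() counts code units, so cutting at index 3 keeps only
// the high surrogate \uD83D and produces an invalid string.
const cut = label.substring(0, 3);

// The replace() removes the dangling half of the pair.
const cleaned = stripLoneSurrogates(cut);

console.log(cut.length, JSON.stringify(cleaned));
```

Note that this repairs the string by dropping the broken character, so the elided text ends up one visible character shorter than requested.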

welcome · 2023-04-26T15:28:17Z

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

krassowski · 2023-04-26T15:41:24Z

Thank you for opening the PR! On a conceptual level, what if someone has a table with all Unicode code points? Or if those are mapped to something else in a font? Would it be better to rewrite eliding to use a Unicode-aware slice by first converting the string to an array, as in https://stackoverflow.com/questions/62341685/javascript-unicode-aware-string-slice/62341816#62341816 ?

nicojapas · 2023-04-26T16:55:37Z

Hi! Yes, I think that is a better approach. I just committed a new solution.
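
For readers following along, the array-based idea suggested above could look roughly like the sketch below. This is not the code committed in this PR; the function name `elideByCodePoint`, its parameters, and the default ellipsis are assumptions made for illustration.

```typescript
// Hypothetical helper (not the lumino implementation): elide a string
// to at most `maxChars` code points, then append an ellipsis.
function elideByCodePoint(
  text: string,
  maxChars: number,
  ellipsis: string = '\u2026'
): string {
  // Array.from() iterates a string by code point, so a surrogate pair
  // becomes a single array element and can never be split.
  const chars = Array.from(text);
  if (chars.length <= maxChars) {
    return text;
  }
  return chars.slice(0, maxChars).join('') + ellipsis;
}

// 'ab' + U+1F600 + 'cd' is 6 UTF-16 code units but 5 code points.
const sample = 'ab\u{1F600}cd';
const elided = elideByCodePoint(sample, 3);

console.log(elided);
```

Unlike the regex approach, nothing is deleted from the text: the astral symbol either fits whole or is left out whole. The sketch works per code point, so sequences made of several code points (for example combining marks) can still be cut apart.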

fcollonval

Thanks @nicojapas

Leaving this open so that @krassowski can have a look at the latest version.

krassowski

Thank you! I will open a follow-up PR with unit tests.

welcome · 2023-05-11T22:01:38Z

Congrats on your first merged pull request in this project! 🎉

Thank you for contributing, we are very proud of you! ❤️

nicojapas added 2 commits April 26, 2023 16:50

Invalid unicode characters removed (jupyterlab#456)

0b952c8

Merge branch 'jupyterlab:main' into main

97000c4

github-actions bot assigned nicojapas Apr 26, 2023

krassowski added the bug Something isn't working label Apr 26, 2023

nicojapas added 3 commits April 26, 2023 18:38

Using arrays

ac6410b

Simplified structure

f9f0881

Merge branch 'main' of https://github.com/nicojapas/lumino into main

7569670

Prettified

aa03a19

fcollonval approved these changes May 2, 2023

krassowski approved these changes May 11, 2023

krassowski merged commit e887f33 into jupyterlab:main May 11, 2023
18 of 19 checks passed

krassowski changed the title ~~Invalid unicode characters removed from datagrid (#456)~~ Invalid unicode characters removed from datagrid May 11, 2023

krassowski mentioned this pull request May 11, 2023

Seed tests for datagrid, test TextRenderer/drawText #585

Merged
