Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of astral character reference #158

Merged
merged 2 commits into from
Oct 27, 2023

Conversation

Porges
Copy link
Contributor

@Porges Porges commented Oct 26, 2023

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

decodeNumericCharacterReference was using String.fromCharCode which only handles UTF-16 code units, not true codepoints. Any characters not representable by a single UTF-16 code unit ("astral characters") would come out incorrectly.

Note that there are other uses of String.fromCharCode in micromark. These should also be examined. The current behaviour is a regression that was introduced 4 months ago by bbb726d, although the correct version only existed for 3 days prior to that 🙂 — the very first version of the function had the same problem.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Oct 26, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2023

Codecov Report

Merging #158 (21a77fc) into main (e0dce3d) will not change coverage.
The diff coverage is 100.00%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff            @@
##              main      #158   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           57        57           
  Lines        11936     11936           
=========================================
  Hits         11936     11936           
Files Coverage Δ
...il-decode-numeric-character-reference/dev/index.js 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@wooorm wooorm changed the title Fix character reference handling for astral characters Fix handling of astral character reference Oct 27, 2023
@wooorm wooorm merged commit 6db3b42 into micromark:main Oct 27, 2023
14 checks passed
@wooorm wooorm added the 💪 phase/solved Post is done label Oct 27, 2023
@github-actions

This comment has been minimized.

@wooorm
Copy link
Member

wooorm commented Oct 27, 2023

Thanks!

@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Oct 27, 2023
@wooorm
Copy link
Member

wooorm commented Oct 27, 2023

Our linter xo thought it was useful. But we mostly index into strings, so changing to code points actually breaks things. In this case it doesn’t though!

@wooorm
Copy link
Member

wooorm commented Oct 27, 2023

Thank you, released! https://github.com/micromark/micromark/releases/tag/micromark-util-decode-numeric-character-reference%402.0.1

@Porges Porges deleted the fix-character-references branch October 27, 2023 09:06
@Porges
Copy link
Contributor Author

Porges commented Oct 27, 2023

Thank you, released! https://github.com/micromark/micromark/releases/tag/micromark-util-decode-numeric-character-reference%402.0.1

Thanks for the quick turnaround 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 phase/solved Post is done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants