Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

micromark-util-character: fix crash on some Node.js versions #157

Closed
wants to merge 1 commit into from

Conversation

aduh95
Copy link

@aduh95 aduh95 commented Oct 26, 2023

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

\p regex notation is not available on Node.js build without intl support, causing the process to crash even the regex is never used. Ideally we would have a regex that works everywhere, but I figure lazy-loading the same regex should be enough for most use-cases.

Refs: nodejs/node#49988 (comment)

Signed-off-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
@github-actions github-actions bot added the 👋 phase/new Post is being triaged automatically label Oct 26, 2023
@github-actions

This comment has been minimized.

@github-actions github-actions bot added 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Oct 26, 2023
@wooorm
Copy link
Member

wooorm commented Oct 26, 2023

Lazy loading isn’t going to solve it. This regex is required to parse markdown.

Hmmm. I literally started using this because this is now available in modern browsers and Node 16+ (https://github.com/micromark/micromark/releases/tag/4.0.0).

I feel like we should be able to depend on recent Unicode?

@aduh95
Copy link
Author

aduh95 commented Oct 26, 2023

Lazy loading isn’t going to solve it. This regex is required to parse markdown.

Bummer, also I remember now that a SyntaxError would still be emitted no matter if the code is executed. Maybe we could try using the RegExp constructor with a string literal (i.e. new RegExp('\\p{P}', 'u')); node is still compiling on my machine btw, I haven't tested any of it yet, I'm just throwing ideas.

I feel like we should be able to depend on recent Unicode?

Node.js supports being build without any unicode support, I've suggested deprecating it that in nodejs/node#35942, you can read through the responses to see what are the use cases for this.
In the nodejs/node repo, we need to be able to run the test suite when node is compiled in this mode, which blocks us from updating our doc dependencies.

@wooorm
Copy link
Member

wooorm commented Oct 26, 2023

Like, markdown uses unicode. We could inject all those unicode characters in the regex again (like before), but that feels like going against someone specifically compiling without unicode support?

Node.js supports being build without any unicode support

This might be out of Nodes control but couldn’t it then use ascii punctuation in this case?

@wooorm
Copy link
Member

wooorm commented Oct 26, 2023

So your problem is that the Node project doesn’t compile. If it were to compile, it doesn’t actually parse markdown, so it wouldn’t get an error?

@aduh95
Copy link
Author

aduh95 commented Oct 26, 2023

The problem is that we need to build the docs before running the tests (because some tests ensure some specific parts of the documentation matches the actual implementation), and the way the CI is configured today, it's using the binary freshly built, which doesn't have unicode support.

new RegExp('\\p{P}', 'u'))

I've tried that, and as you predicted, it didn't help because the code is still run.
const unicodePunctuationInternal = () => false seems to work, but obviously that's not a great fix 😅

This might be out of Nodes control but couldn’t it then use ascii punctuation in this case?

I'm not sure what you mean by that, I think Node.js documentation is ASCII-only.

@wooorm
Copy link
Member

wooorm commented Oct 26, 2023

That for \P, when you build as ascii, to use ascii punctuation, instead of unicode punctuation

@wooorm
Copy link
Member

wooorm commented Feb 12, 2024

Closing as abandoned. I still kinda feel like, if v8 is built to not include unicode knowledge, then \p{P} in a regex should perhaps not crash, but instead limit itself to the ASCII range?
The alternative, of tools that has knowledge of unicode to prebuilt unicode indexes, I dunno that feels weird.

If there was some condition that Node would pick up on when built without unicode capabilities, we could perhaps do something about providing two versions for files here.

@wooorm wooorm closed this Feb 12, 2024
@wooorm wooorm added the 🙅 no/wontfix This is not (enough of) an issue for this project label Feb 12, 2024

This comment has been minimized.

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Feb 12, 2024
@aduh95
Copy link
Author

aduh95 commented Feb 12, 2024

If there was some condition that Node would pick up on when built without unicode capabilities, we could perhaps do something about providing two versions for files here.

I would have to doublecheck, but I think process.versions.unicode would be undefined when built without Unicode support. EDIT: yes this is correct, process.versions.unicode and process.versions.icu are defined only node was compiled with support for i18n.

@wooorm
Copy link
Member

wooorm commented Feb 12, 2024

Hmm, maybe interesting, but I was hoping for conditions instead of Node-specific process

@aduh95
Copy link
Author

aduh95 commented Feb 12, 2024

conditions could work, if the package was exposing a "without-intl" exports, we could spawn our process with node -C without-intl. Should I make a PR like that?

@wooorm
Copy link
Member

wooorm commented Feb 12, 2024

And what would it do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🙅 no/wontfix This is not (enough of) an issue for this project 👎 phase/no Post cannot or will not be acted on
Development

Successfully merging this pull request may close these issues.

None yet

2 participants