Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): error on source larger than 4 GiB #1860

Merged
merged 3 commits into from
Jan 2, 2024

Conversation

overlookmotel
Copy link
Collaborator

@overlookmotel overlookmotel commented Dec 30, 2023

Token and Span both represent start and end as u32.

This limits size of source which can be parsed to u32::MAX.

/// NOTE: `u32` is sufficient for "all" reasonable programs. Larger than u32 is a 4GB JS file.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[cfg_attr(feature = "serde", derive(Serialize))]
pub struct Span {
pub start: u32,
pub end: u32,
}

However, this constraint is currently not enforced.

In a release build, code will not panic on arithmetic overflow, so start/end could wrap around back to zero if source is 4 GiB or more.

That'd produce nonsense spans. But worse, the lexer relies in some places on self.current.token.start being correct, so if the value wrapped around, possibly it'd keep rewinding to the start of the source and lexing it again, causing an infinite loop.

In worst case, if for some reason an application's public API used OXC's parser with user-supplied source code (parser-as-a-service!), this could be exploited for denial of service.

This PR adds an assertion to catch this at the start of parsing instead.

This does add an extra instruction, but I imagine the effect will be negligible compared to the work required to parse the code.

@github-actions github-actions bot added the A-parser Area - Parser label Dec 30, 2023
@overlookmotel
Copy link
Collaborator Author

Actually maybe panicking isn't ideal. Should parser instead return an empty AST and an error (i.e. treat too-long code as a syntax error)?

@overlookmotel
Copy link
Collaborator Author

overlookmotel commented Dec 30, 2023

Last commit changes behavior to return an error instead of panic if source size exceeds limit. I assume that's better.

2 problems remain with the tests:

  1. I think it's ideal to have a debug_assert!() for source length in Lexer::new so that the size constraint is encapsulated in the lexer. But that makes the overlong_source test fail in a debug build (passes in release mode). That's the reason CI is failing. Any idea how to tackle that?
  2. legal_length_source test takes about 5 secs to run in release mode, and 1 minute in debug mode. Is 5s too long for a test?

Could those tests be set to only run in release mode? Does CI runs tests against a release build? Or can we just skip these tests?

@overlookmotel overlookmotel changed the title fix(parser): assert max size of source fix(parser): refuse to parse source larger than 4 GiB Dec 30, 2023
@overlookmotel overlookmotel force-pushed the parser-max-len branch 3 times, most recently from b997e17 to 66cacf3 Compare December 31, 2023 14:26
@Boshen
Copy link
Member

Boshen commented Dec 31, 2023

I would just let it crash and change the public API

/// Create a new parser
pub fn new(allocator: &'a Allocator, source_text: &'a str, source_type: SourceType) -> Self {

to something like

/// # Panics
/// * When source length is 4GB (u32 max)
pub fn new(...) {
  assert!(source_text.len() < 4GB, "source text is too long ...");
}

Or play nice and return an empty AST and a nice error.

As for the tests, I'll just omit them because it's not that interesting to unit test a 4GB program.

@overlookmotel overlookmotel changed the title fix(parser): refuse to parse source larger than 4 GiB fix(parser): error on source larger than 4 GiB Dec 31, 2023
Copy link

codspeed-hq bot commented Dec 31, 2023

CodSpeed Performance Report

Merging #1860 will not alter performance

Comparing overlookmotel:parser-max-len (f4c4839) with main (2ac2630)

Summary

✅ 14 untouched benchmarks

@overlookmotel
Copy link
Collaborator Author

I found solution to the above problems. Now squashed to a single commit and tests pass.

I went with returning empty AST and error approach. Currently I don't think parser has any code paths which lead to panic, and I worried introducing one might hurt performance.

Feel free to remove the 2nd test. That's the one that takes 5 secs to run, but it won't run on CI anyway.

However, I think it'd be ideal to keep the 1st test. Atom is going to rely on the invariant that a string which references source text cannot exceed 4 GiB, and if that invariant was violated, it could produce UB.

Note: To allow the debug_assert! for size in Lexer::new, there are 2 places in parser where the length is checked. However, the 2nd check only occurs on code path where parsing has failed, so won't affect performance of successful parsing, which I imagine is what matters.

Sorry for all the noise on this PR!

@overlookmotel
Copy link
Collaborator Author

Shall I squash commits and rebase on main? Or will that happen automatically when merging?

@Boshen Boshen merged commit 62bc8c5 into oxc-project:main Jan 2, 2024
17 checks passed
@Boshen
Copy link
Member

Boshen commented Jan 2, 2024

Thank you for making Oxc safe!

@overlookmotel overlookmotel deleted the parser-max-len branch January 2, 2024 12:07
Brooooooklyn added a commit to toeverything/AFFiNE that referenced this pull request Jan 7, 2024
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [oxlint](https://oxc-project.github.io) ([source](https://togithub.com/oxc-project/oxc/tree/HEAD/npm/oxlint)) | [`0.0.22` -> `0.1.2`](https://renovatebot.com/diffs/npm/oxlint/0.0.22/0.1.2) | [![age](https://developer.mend.io/api/mc/badges/age/npm/oxlint/0.1.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/oxlint/0.1.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/oxlint/0.0.22/0.1.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/oxlint/0.0.22/0.1.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>oxc-project/oxc (oxlint)</summary>

### [`v0.1.2`](https://togithub.com/oxc-project/oxc/releases/tag/oxlint_v0.1.2): oxlint v0.1.2

[Compare Source](https://togithub.com/oxc-project/oxc/compare/821cc8e9c7cfb326ff546483bb2a32d12e018e4c...4a9e0c4bf4179bf5839b690be3690163ff00e2ef)

#### Try it out!

-   Run `npx --yes oxlint@latest` from your terminal
-   Install the Vscode extension https://marketplace.visualstudio.com/items?itemName=oxc.oxc-vscode
-   Read the [usage guide](https://oxc-project.github.io/docs/guide/usage/linter.html)

#### Svelte support

`<script>` tag is linted by default.

#### Features

-   feat(linter): <script> part of svelte file by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#1918
-   feat(linter): disable no-unused-labels for svelte by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#1919

### Fixes

-   fix(linter): change no-var to restriction [`bb6128b`](https://togithub.com/oxc-project/oxc/commit/bb6128b)
-   chore: add some useful informantion log by [@&#8203;IWANABETHATGUY](https://togithub.com/IWANABETHATGUY) in [oxc-project/oxc#1912
-   fix(linter) fix eslint config for filename case by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1913

**Full Changelog**: oxc-project/oxc@oxlint_v0.1.1...oxlint_v0.1.2

### [`v0.1.1`](https://togithub.com/oxc-project/oxc/releases/tag/oxlint_v0.1.1): oxlint v0.1.1

[Compare Source](https://togithub.com/oxc-project/oxc/compare/v0.1.0...821cc8e9c7cfb326ff546483bb2a32d12e018e4c)

#### Try it out!

-   Run `npx --yes oxlint@latest` from your terminal
-   Install the Vscode extension https://marketplace.visualstudio.com/items?itemName=oxc.oxc-vscode
-   Read the [usage guide](https://oxc-project.github.io/docs/guide/usage/linter.html)

#### Vue support

`<script>` and `<script setup>` are linted by default.

#### Astro support

Frontmatter component script `---` and all `<script>` tags are linted by default.

#### Configuration files (experimental)

`-c ./eslintrc.json` will use the `rules` field for rule configuration, as documented in [ESLint's documentation](https://eslint.org/docs/latest/use/configure/rules#using-configuration-files).

Unfortunately, only the `json` format is supported right now.

The `extends` field will not take effect; normal `-D` and `-A` flags still apply.

#### New Rules

##### Correctness

-   deepscan: bad object literal comparison by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1844
-   oxc: erasing op by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1834
-   oxc: only used in recursion by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1833
-   eslint: no irregular whitespace by [@&#8203;DeividAlmeida](https://togithub.com/DeividAlmeida) in [oxc-project/oxc#1877
-   eslint: no-unused-private-class-members rule by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#1820
-   eslint: no-var by [@&#8203;zhangrunzhao](https://togithub.com/zhangrunzhao) in [oxc-project/oxc#1890
-   eslint-plugin-react: jsx-no-undef for by [@&#8203;XantreGodlike](https://togithub.com/XantreGodlike) in [oxc-project/oxc#1862
-   eslint-plugin-jsx-a11y: aria-role by [@&#8203;msdlisper](https://togithub.com/msdlisper) in [oxc-project/oxc#1849
-   eslint-plugin-jsx-a11y: lang by [@&#8203;msdlisper](https://togithub.com/msdlisper) in [oxc-project/oxc#1812
-   eslint-plugin-jsx-a11y: media-has-caption by [@&#8203;poteboy](https://togithub.com/poteboy) in [oxc-project/oxc#1822
-   eslint-plugin-jsx-a11y: mouse-events-have-key-events(correctness) by [@&#8203;Ken-HH24](https://togithub.com/Ken-HH24) in [oxc-project/oxc#1867
-   eslint-plugin-jsx-a11y: prefer-tag-over-role rule by [@&#8203;yossydev](https://togithub.com/yossydev) in [oxc-project/oxc#1831
-   eslint-plugin-jsx-a11y: aria-unsupported-elements by [@&#8203;re-taro](https://togithub.com/re-taro) in [oxc-project/oxc#1855
-   eslint-plugin-jsx-a11y: html_has_lang by [@&#8203;msdlisper](https://togithub.com/msdlisper) in [oxc-project/oxc#1843

##### Suspicious

-   oxc: approx constant by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1818
-   oxc: misrefactored assign op by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1832

##### Restriction

-   react: button has type by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1785
-   unicorn: prefer modern math apis by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1620

#### Fixes

-   fix(linter): ignore anonymous functional components in arrays for eslint-plugin-react(jsx-key) by [@&#8203;maurice](https://togithub.com/maurice) in [oxc-project/oxc#1858
-   Prioritize ignored paths when linting by [@&#8203;clarkf](https://togithub.com/clarkf) in [oxc-project/oxc#1878
-   feat(linter): handle more cases for `const-comparisons` by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1817
-   feat(semantic): allow reserved keyword defined in ts module block by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#1907
-   fix(parser): error on source larger than 4 GiB by [@&#8203;overlookmotel](https://togithub.com/overlookmotel) in [oxc-project/oxc#1860
-   fix(parser): fix incorrectly identified directives by [@&#8203;overlookmotel](https://togithub.com/overlookmotel) in [oxc-project/oxc#1885
-   fix(parser): terminate parsing if an EmptyParenthesizedExpression error occurs by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#1874
-   fix(semantic): remove duplicate errors in ModuleDeclaration::ImportDeclaration by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#1846
-   perf(linter): reduce the number of diagnostics for no_sparse_arrays by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#1895

#### New Contributors

-   [@&#8203;maurice](https://togithub.com/maurice) made their first contribution in [oxc-project/oxc#1858
-   [@&#8203;re-taro](https://togithub.com/re-taro) made their first contribution in [oxc-project/oxc#1855
-   [@&#8203;DeividAlmeida](https://togithub.com/DeividAlmeida) made their first contribution in [oxc-project/oxc#1835
-   [@&#8203;XantreGodlike](https://togithub.com/XantreGodlike) made their first contribution in [oxc-project/oxc#1862
-   [@&#8203;Qix-](https://togithub.com/Qix-) made their first contribution in [oxc-project/oxc#1861
-   [@&#8203;yossydev](https://togithub.com/yossydev) made their first contribution in [oxc-project/oxc#1831
-   [@&#8203;clarkf](https://togithub.com/clarkf) made their first contribution in [oxc-project/oxc#1878
-   [@&#8203;zhangrunzhao](https://togithub.com/zhangrunzhao) made their first contribution in [oxc-project/oxc#1890

**Full Changelog**: oxc-project/oxc@oxlint_v0.0.22...oxlint_v0.1.1

### [`v0.1.0`](https://togithub.com/oxc-project/oxc/releases/tag/v0.1.0): CLI v0.1.0 Ezno Type Checker

[Compare Source](https://togithub.com/oxc-project/oxc/compare/a1accdca7f83694a6ea520d5cbfd090ea5dd271a...v0.1.0)

`npx oxidation-compiler@latest check ./test.ts`

![image](https://togithub.com/Boshen/oxc/assets/1430279/c7308395-1856-43fa-b4b8-b239886ec259)

#### New Contributors

-   [@&#8203;anonrig](https://togithub.com/anonrig) made their first contribution in [oxc-project/oxc#388
-   [@&#8203;kaleidawave](https://togithub.com/kaleidawave) made their first contribution in [oxc-project/oxc#413

**Full Changelog**: https://github.com/Boshen/oxc/compare/v0.0.7...

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/toeverything/AFFiNE).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xMjEuMCIsInVwZGF0ZWRJblZlciI6IjM3LjEyMS4wIiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5In0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants