Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix big endian serialization #269

Merged
merged 3 commits into from Mar 2, 2024

Conversation

chqrlie
Copy link
Collaborator

@chqrlie chqrlie commented Feb 18, 2024

Big endian serialization was broken because:

  • it partially relied on WORDS_ENDIAN (unconditionally undef'd in cutils.h)
  • endianness was not handled at all in the bc reader.
  • bc_tag_str was missing the "RegExp" string
  • lre_byte_swap() was broken for REOP_range and REOP_range32

Modifications:

  • remove WORDS_ENDIAN
  • use bc_put_u32() / bc_put_u64() in JS_WriteBigInt()
  • use bc_get_u32() / bc_get_u64() in JS_ReadBigInt()
  • handle host endianness in bc_get_u16(), bc_get_u32(), bc_get_u64() and JS_ReadFunctionBytecode()
  • handle optional littleEndian argument as specified in js_dataview_getValue() and js_dataview_setValue()
  • fix bc_tag_str and lre_byte_swap()

Big endian serialization was broken because:
- it partially relied on `WORDS_ENDIAN` (unconditionally undef'd in cutils.h)
- endianness was not handled at all in the bc reader.

Modifications:
- remove `WORDS_ENDIAN`
- use `bc_put_u32()` / `bc_put_u64()` in `JS_WriteBigInt()`
- use `bc_get_u32()` / `bc_get_u64()` in `JS_ReadBigInt()`
- handle host endianness in `bc_get_u16()`, `bc_get_u32()`, `bc_get_u64()` and
  `JS_ReadFunctionBytecode()`

- handle optional littleEndian argument as specified in
  `js_dataview_getValue()` and `js_dataview_setValue()`
Big endian serialization was broken because:
- it partially relied on `WORDS_ENDIAN` (unconditionally undef'd in cutils.h)
- endianness was not handled at all in the bc reader.
- `bc_tag_str` was missing the `"RegExp"` string
- `lre_byte_swap()` was broken for `REOP_range` and `REOP_range32`

Modifications:
- remove `WORDS_ENDIAN`
- use `bc_put_u32()` / `bc_put_u64()` in `JS_WriteBigInt()`
- use `bc_get_u32()` / `bc_get_u64()` in `JS_ReadBigInt()`
- handle host endianness in `bc_get_u16()`, `bc_get_u32()`, `bc_get_u64()` and
  `JS_ReadFunctionBytecode()`
- handle optional littleEndian argument as specified in
  `js_dataview_getValue()` and `js_dataview_setValue()`
- fix `bc_tag_str` and `lre_byte_swap()`
@saghul
Copy link
Contributor

saghul commented Feb 18, 2024

Thanks @chqrlie ! Is any test currently testing this behavior? Any chance one could be written?

@chqrlie
Copy link
Collaborator Author

chqrlie commented Feb 18, 2024

Thanks @chqrlie ! Is any test currently testing this behavior? Any chance one could be written?

There are no big endian architectures in the current CI tests. This actually explains why the partial implementation did not trigger any errors as all the big endian support essentially expands to nothing.

Test cases can be added to test_bjson.js for all cases where endianness matters (byte code, wide strings, complex regexp, typed arrays...) and running make test on a big endian target will be an effective test for the serialization code.

Maybe we can add mips-unknown-linux-gnu, or build-linux-s390x as in https://github.com/google/flatbuffers/pull/7707/files

@saghul
Copy link
Contributor

saghul commented Feb 18, 2024

Yep, that would be very useful.

Do you want to give it a try?

Once you add anything to the yaml file and push the new CI will run.

@chqrlie
Copy link
Collaborator Author

chqrlie commented Feb 18, 2024

Do you mind doing this yourself? I am not familiar enough with the yaml stuff yet, I don't want to break anything.

@saghul
Copy link
Contributor

saghul commented Feb 19, 2024

No worries! I'll take a look in the upcoming days, I'm traveling this week.

Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@saghul can I go and merge this or should I wait for your CI work? Either is fine by me.

@chqrlie
Copy link
Collaborator Author

chqrlie commented Feb 20, 2024

@saghul can I go and merge this or should I wait for your CI work? Either is fine by me.

Actually, I would like to refine this patch and clarify the methodology. I shall post another commit later today (after I get some sleep :)

@saghul
Copy link
Contributor

saghul commented Feb 20, 2024

LGTM

@saghul can I go and merge this or should I wait for your CI work? Either is fine by me.

Land it at will, once @chqrlie is ready, I'll add the CI afterwards.

quickjs.c Outdated Show resolved Hide resolved
Big endian serialization was broken because:
- it partially relied on `WORDS_ENDIAN` (unconditionally undef'd in cutils.h)
- endianness was not handled at all in the bc reader.
- `bc_tag_str` was missing the `"RegExp"` string
- `lre_byte_swap()` was broken for `REOP_range` and `REOP_range32`

Modifications:
- remove `WORDS_ENDIAN`
- use `bc_put_u32()` / `bc_put_u64()` in `JS_WriteBigInt()`
- use `bc_get_u32()` / `bc_get_u64()` in `JS_ReadBigInt()`
- handle host endianness in `bc_get_u16()`, `bc_get_u32()`, `bc_get_u64()` and
  `JS_ReadFunctionBytecode()`
- handle optional littleEndian argument as specified in
  `js_dataview_getValue()` and `js_dataview_setValue()`
- fix `bc_tag_str` and `lre_byte_swap()`
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Charlie, do you want me to give you commit access so you can merge it yourself?

@@ -50512,7 +50513,8 @@ static JSValue js_dataview_setValue(JSContext *ctx,
{
JSTypedArray *ta;
JSArrayBuffer *abuf;
int is_swap, size;
BOOL littleEndian, is_swap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's nitpicking but for consistency I would suggest calling it little_endian.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used littleEndian for consistency with the ECMA document, otherwise I would definitely use little_endian.

@chqrlie
Copy link
Collaborator Author

chqrlie commented Mar 1, 2024

LGTM. Charlie, do you want me to give you commit access so you can merge it yourself?

Thank you for proposing. I appreciate. I guess I would also get reviewing rights?
What is your policy for reviewing before committing?

@saghul
Copy link
Contributor

saghul commented Mar 1, 2024

Anyone can review, committee or not.

In general we open PRs for every change and it should be reviewed by at least 1 person before merging.

@saghul
Copy link
Contributor

saghul commented Mar 1, 2024

Just sent you an invite @chqrlie !

@chqrlie chqrlie merged commit 708dbcb into quickjs-ng:master Mar 2, 2024
35 checks passed
@chqrlie chqrlie deleted the fix-be-serialization branch March 2, 2024 17:39
@chqrlie
Copy link
Collaborator Author

chqrlie commented Mar 2, 2024

No worries! I'll take a look in the upcoming days, I'm traveling this week.

I merged the new code, but we still do not have a big-endian CI target...

@saghul
Copy link
Contributor

saghul commented Mar 2, 2024

Life happened 😅 I think I'll have time this week 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants