CBOR vs. JSON performance -- why is CBOR.encode so much slower than JSON.stringify? #2542

Open
emily785 opened this issue Sep 6, 2023 · 5 comments

Comments

@emily785

emily785 commented Sep 6, 2023

In my tests CBOR.encode is about 3x slower than JSON.stringify, while CBOR.decode is about 3.7x faster than JSON.parse.

It would be great if CBOR.encode could reach performance similar to JSON.stringify.

Has anyone looked into this more closely? I've tried various JSON objects and the numbers are about the same every time.

json str size=130217
1000 iterations

JSON.parse: 1610 ms
JSON.stringify: 422 ms

CBOR.decode: 421 ms
CBOR.encode: 1375 ms
@svaarala
Owner

svaarala commented Sep 6, 2023

With the JSON fast path enabled, JSON encoding is relatively well optimized. CBOR has not been refined to the same extent, which explains some of the difference.

In principle it should be trivial to CBOR encode strings, but since CBOR text strings need to be pure UTF-8 there's an "is this valid UTF-8" check for each string during encoding. Currently this is a full string scan that must finish before the encode can proceed. It should be possible to optimize this to match JSON.stringify performance.
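
To make the cost of that check concrete, here is a minimal sketch of a strict UTF-8 validity scan. It is not Duktape's code; `is_valid_utf8` is a hypothetical name, and the point is only that every byte of every string has to be visited once before the string can be written out as CBOR text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Duktape.
 * Returns 1 if buf[0..len) is well-formed UTF-8 (no overlong forms,
 * no surrogate code points, nothing above U+10FFFF), else 0. */
int is_valid_utf8(const uint8_t *buf, size_t len) {
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        size_t n;       /* number of continuation bytes expected */
        uint32_t cp;    /* decoded code point */
        uint32_t min;   /* smallest code point allowed for this length */
        size_t k;

        if (b < 0x80) {             /* ASCII: one byte, always valid */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;               /* stray continuation or invalid lead byte */
        }

        if (len - i <= n) {
            return 0;               /* sequence is truncated */
        }
        for (k = 1; k <= n; k++) {
            uint8_t c = buf[i + k];
            if ((c & 0xC0) != 0x80) {
                return 0;           /* not a continuation byte */
            }
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        if (cp < min) return 0;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* out of Unicode range */

        i += n + 1;
    }
    return 1;
}
```

For a payload that is mostly strings, a scan like this runs over almost the entire input in addition to the copy into the output buffer, which is consistent with string-heavy objects showing the largest gap.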

However, there will soon be an even simpler fix in master: with WTF-8 support, duk_hstring will have a flag which indicates whether the string is pure UTF-8 or needs WTF-8 extensions (= unpaired surrogates). The string scan can then be removed, which should make encoding very fast.
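
The idea of caching that property can be sketched as follows. This is an illustration only: `demo_string`, `demo_string_make` and `demo_can_emit_as_text` are hypothetical names and do not reflect the actual duk_hstring layout. The sketch also assumes the input is already well-formed WTF-8, in which case an unpaired surrogate always shows up as the lead byte 0xED followed by a byte in the range 0xA0..0xBF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string header; NOT the real duk_hstring. */
typedef struct {
    const uint8_t *data;
    size_t len;
    int is_pure_utf8;   /* computed once when the string is created */
} demo_string;

/* Assumes data is well-formed WTF-8. Under that assumption 0xED can only
 * be a lead byte, and a second byte >= 0xA0 means U+D800..U+DFFF. */
int demo_has_unpaired_surrogate(const uint8_t *data, size_t len) {
    size_t i;
    for (i = 0; i + 1 < len; i++) {
        if (data[i] == 0xED && data[i + 1] >= 0xA0) {
            return 1;
        }
    }
    return 0;
}

/* Pay for the scan once, at creation time. */
demo_string demo_string_make(const uint8_t *data, size_t len) {
    demo_string s;
    s.data = data;
    s.len = len;
    s.is_pure_utf8 = !demo_has_unpaired_surrogate(data, len);
    return s;
}

/* At encode time the check is a constant-time flag read, no scan. */
int demo_can_emit_as_text(const demo_string *s) {
    return s->is_pure_utf8;
}
```

The design trade is that the scan cost moves from every encode call to a single point when the string is created, so a string that is encoded many times is only checked once.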

@emily785
Author

emily785 commented Sep 6, 2023

My test object actually contained a fair amount of strings.

Thanks for the information. I looked into duk__cbor_encode_string_top and noticed you already have some test options, DUK_CBOR_TEXT_STRINGS / DUK_CBOR_BYTE_STRINGS. It looks like I can skip the UTF-8 check for my use case with one of these #defines.

I have to say, I'm really looking forward to future duktape updates. I hope you are doing well. Is there any way to donate to show support?

@svaarala
Owner

svaarala commented Sep 10, 2023

If you happen to test CBOR performance with DUK_CBOR_{TEXT,BYTE}_STRINGS enabled, it'd be nice to know how much impact that has. The same level of overhead would be eliminated after the "string is UTF8" flag is added to duk_hstring.

> I have to say, I'm really looking forward to future duktape updates. I hope you are doing well. Is there any way to donate to show support?

Thanks! It's been a bit difficult finding time for Duktape in the past few years, but things are slowly getting better. There's no active donation method right now, but a good way to give support is to provide concrete, reproducible and actionable issues and pulls :-)

@emily785
Author

My test json:
test.txt

100k iterations, run a few times.
The numbers vary a bit each time, computer stuff I guess. The results below show the general trend.
The fastest encode was with DUK_CBOR_BYTE_STRINGS, but then decode becomes very slow for some reason.

I haven't looked any closer. I will try to research it properly one day.

Luckily I mostly use decode in my project, and that beats JSON, so I'm happy. Hopefully CBOR encode can be as fast as JSON.stringify some day, or close to it.

#define DUK_CBOR_DECODE_FASTPATH
JSON.stringify: 486 ms.
JSON.parse: 1804 ms.
CBOR.encode: 1726 ms.
CBOR.decode: 529 ms.

#define DUK_CBOR_TEXT_STRINGS
JSON.stringify: 488 ms.
JSON.parse: 1798 ms.
CBOR.encode: 1688 ms.
CBOR.decode: 532 ms.

#define DUK_CBOR_BYTE_STRINGS
JSON.stringify: 499 ms.
JSON.parse: 1801 ms.
CBOR.encode: 1586 ms.
CBOR.decode: 2651 ms. (???)

@svaarala
Owner

svaarala commented Sep 20, 2023

Thanks for the measurements 👍

> Fastest encode was with DUK_CBOR_BYTE_STRINGS but then decode becomes very slow for some reason.

This is probably because when decoding back, CBOR byte strings will decode into Uint8Array objects which are much heavier than strings.
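
For context, the only difference on the wire between the two string kinds is the major type in the CBOR initial byte: 2 for byte strings and 3 for text strings (RFC 8949). The sketch below shows that header encoding; `cbor_write_string_header` and the enum names are hypothetical and are not Duktape's internal functions.

```c
#include <stddef.h>
#include <stdint.h>

/* CBOR major types for strings, per RFC 8949. */
enum { CBOR_MAJOR_BYTES = 2, CBOR_MAJOR_TEXT = 3 };

/* Hypothetical helper, not part of Duktape.
 * Writes the string header for a payload of `len` bytes into `out`
 * (caller provides at least 9 bytes) and returns the header size. */
size_t cbor_write_string_header(uint8_t *out, unsigned major, uint64_t len) {
    uint8_t mt = (uint8_t)(major << 5);  /* major type lives in the top 3 bits */
    uint8_t info;                        /* additional info: 24..27 */
    size_t nbytes;                       /* size of the length field */
    size_t i;

    if (len < 24) {                      /* short lengths fit in the initial byte */
        out[0] = (uint8_t)(mt | len);
        return 1;
    }
    if (len <= 0xFFu) {
        info = 24; nbytes = 1;
    } else if (len <= 0xFFFFu) {
        info = 25; nbytes = 2;
    } else if (len <= 0xFFFFFFFFu) {
        info = 26; nbytes = 4;
    } else {
        info = 27; nbytes = 8;
    }

    out[0] = (uint8_t)(mt | info);
    for (i = 0; i < nbytes; i++) {       /* length is big-endian */
        out[1 + i] = (uint8_t)(len >> (8 * (nbytes - 1 - i)));
    }
    return 1 + nbytes;
}
```

A one-byte text string gets the initial byte 0x61 and a one-byte byte string gets 0x41. A decoder seeing major type 2 has no indication that the payload was originally a string, which is why the values come back as binary buffers rather than strings.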


2 participants