Conversation
quickjs.c
Outdated
| static char B64_PAIR_LUT[4096][2]; | ||
| static int B64_PAIR_INIT = 0; | ||
|
|
||
| static inline void b64_pair_init_once(void) { |
There was a problem hiding this comment.
Not thread-safe. Why don't you precompute the LUT? Faster than doing it once per process, plus it can be stored in read-only memory.
Aside: ALL_CAPS suggests B64_PAIR_LUT etc. are macros when they're not.
quickjs.c
Outdated
| unsigned ch2 = (unsigned char)src[k]; | ||
| uint8_t f2 = B64_FLAGS[ch2]; | ||
| if (f2 & K_WS) continue; // ws | ||
| if (ch2!='=') { if (err) *err = 1; return 0; } |
There was a problem hiding this comment.
This line is kind of hideous :)
|
8KB of extra data seems wasteful for little gain for a rare use case. Please keep quickjs small. |
|
There are also base64 operations in uint8array, this should be useful for that too. |
|
@bnoordhuis I think I addressed all the feedback :-) I'll admit I just ported the original PR to master without looking much into it 😅 |
|
The static array b64_pair is just useless bloat. Did anyone actually benchmark the performance gain and what usecase make it compelling to go down this road? |
The original author is @aabbdev and there was some discussion and benchmarks on the original PR. which got stalled and I'm trying to revive here. #1143 |
|
This PR still doesn't match the spec.
On On the LUT: compared to chqrlie's naive solution, on my (old-ish) x86_64 CPU (+ clang 21) the LUT does speed up encoding by 8%. But considering that
I agree with chqrlie in that including the LUT is a poor tradeoff. On the decoder: chqrlie's decoder wins with -O3 -march=native. (With just -O3, aabbdev's decoder wins.) |
|
Thanks for the feedback @bptato ! I'll try some alternatives. I already noticed some changes would ne becessary when making the uint8array base64 functions PR... |
- Introduce global btoa() and atob() functions - Standard base64 alphabet (RFC 4648) - Decoder implements forgiving-base64-decode (WHATWG Infra spec) - Tolerant to whitespace, validates padding per spec - JS_AddIntrinsicAToB() ensures DOMException is registered Co-authored-by: Saúl Ibarra Corretgé <s@saghul.net>
|
Updated, thanks for the patience everyone! I think @chqrlie and @bptato are right, sorry it took me a bit to get it folks. We can make it work first, then see if / how to optimize it. I made the intrinsics function check and add DOMException if necessary. I'm not sure about giving it a more generic name yet, we can do that when we get to structuredClone. Please take a look! |
| uint32_t v = ((uint32_t)src[i] << 16) | ||
| | ((uint32_t)src[i + 1] << 8) | ||
| | (uint32_t)src[i + 2]; |
There was a problem hiding this comment.
This is really just a suggestion but per the integer promotion rules you can write it more succinctly as demonstrated below and it compiles down to the same code:
| uint32_t v = ((uint32_t)src[i] << 16) | |
| | ((uint32_t)src[i + 1] << 8) | |
| | (uint32_t)src[i + 2]; | |
| uint32_t v = 65536*src[i+0] + 256*src[i+1] + src[i+2]; |
If you were to read four fields you'd have to write the multiplier as 16777216u or 0x1000000u to force unsigned arithmetic.
| for (i = 0; i < len; i++) { | ||
| if (!(b64_flags[(unsigned char)src[i]] & K_WS)) | ||
| nws++; | ||
| } |
There was a problem hiding this comment.
I'm 98% sure any decent compiler will compile it to branchless code anyway but if you want to be explicit (and help tinycc a hand):
| for (i = 0; i < len; i++) { | |
| if (!(b64_flags[(unsigned char)src[i]] & K_WS)) | |
| nws++; | |
| } | |
| for (i = 0; i < len; i++) | |
| nws += !(b64_flags[(unsigned char)src[i]] & K_WS); |
| nws++; | ||
| if (bits >= 8) { | ||
| bits -= 8; | ||
| dst[j++] = (uint8_t)((acc >> bits) & 0xFF); |
There was a problem hiding this comment.
The AND is superfluous if you're casting to uint8_t right after (and the cast itself isn't necessary either but it also doesn't hurt.)
| JSString *s, *ostr; | ||
| size_t len, out_len, written; | ||
|
|
||
| val = JS_ToString(ctx, argv[0]); |
There was a problem hiding this comment.
Maybe set ret = JS_EXCEPTION here, then you don't have to ret = ThingThatThrows(ctx) everywhere.
What you're doing now isn't wrong but
| written = b64_encode(in8, len, (char *)str8(ostr)); | ||
| str8(ostr)[written] = '\0'; |
There was a problem hiding this comment.
It should be slightly faster to cache the result of str8(ostr) in a local variable because str8 is not so trivial that you can rely on compilers doing that for you.
Also applies to line 61,071 and below.
| }; | ||
|
|
||
| static const char b64_flags[256] = { | ||
| [' ']=K_WS, ['\t']=K_WS, ['\r']=K_WS, ['\n']=K_WS, |
There was a problem hiding this comment.
"ASCII whitespace" also includes form feed (\f).
See: https://infra.spec.whatwg.org/#ascii-whitespace
|
I agree with getting it right before optimizing, but can we at least not make it O(N*2)? static size_t
b64_decode(const char *src, size_t len, uint8_t *dst, int *err)
{
size_t i, j;
uint32_t acc;
int seen, pad;
unsigned ch;
acc = 0;
seen = 0;
for (i = 0, j = 0; i < len; i++) {
ch = (unsigned char)src[i];
if ((b64_flags[ch] & K_WS))
continue;
if (!(b64_flags[ch] & K_VAL))
break;
acc = (acc << 6) | b64_val[ch];
seen++;
if (seen == 4) {
dst[j++] = (acc >> 16) & 0xFF;
dst[j++] = (acc >> 8) & 0xFF;
dst[j++] = (acc >> 0) & 0xFF;
seen = 0;
acc = 0;
}
}
if (seen != 0) {
if (seen == 3) { /* discard last 2 bits */
dst[j++] = (acc >> 10) & 0xFF;
dst[j++] = (acc >> 2) & 0xFF;
} else if (seen == 2) { /* discard last 4 bits */
dst[j++] = (acc >> 4) & 0xFF;
} else { /* remainder is 1, return failure */
*err = 1;
return 0;
}
for (pad = 0; i < len; i++) {
ch = (unsigned char)src[i];
if (pad < 2 && ch == '=')
pad++;
else if (!(b64_flags[ch] & K_WS))
break;
}
if (pad != 0 && seen + pad != 4) {
/* got padding, but code point length is not a multiple of 4 */
*err = 1;
return 0;
}
}
*err = i < len;
return j;
}If remainder ( |
See the updated commit description.