add btoa()/atob() builtins by saghul · Pull Request #1444 · quickjs-ng/quickjs

saghul · 2026-04-09T18:42:17Z

See the updated commit description.

bnoordhuis · 2026-04-09T21:26:58Z

quickjs.c

+static char B64_PAIR_LUT[4096][2];
+static int B64_PAIR_INIT = 0;
+
+static inline void b64_pair_init_once(void) {


Not thread-safe. Why don't you precompute the LUT? Faster than doing it once per process, plus it can be stored in read-only memory.

Aside: ALL_CAPS suggests B64_PAIR_LUT etc. are macros when they're not.
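bnoordhuis suggests precomputing the table instead of filling it on first use. Here is a minimal sketch of what that could look like for the decode direction. It assumes a 256-entry byte table; the names b64_dec_lut, B64D and b64_sextet are hypothetical and do not come from the PR. Designated initializers (the same C99 feature the PR already uses for b64_flags) let the compiler emit the whole table as `static const` data. That leaves no "initialized" flag to race on, and the table can live in read-only memory.

```c
#include <stdint.h>

/* Each listed byte maps to (sextet value + 1). Every other byte is
 * implicitly zero, which therefore means "not in the standard alphabet". */
#define B64D(ch, n) [ch] = (n) + 1
static const uint8_t b64_dec_lut[256] = {
    B64D('A', 0),  B64D('B', 1),  B64D('C', 2),  B64D('D', 3),
    B64D('E', 4),  B64D('F', 5),  B64D('G', 6),  B64D('H', 7),
    B64D('I', 8),  B64D('J', 9),  B64D('K', 10), B64D('L', 11),
    B64D('M', 12), B64D('N', 13), B64D('O', 14), B64D('P', 15),
    B64D('Q', 16), B64D('R', 17), B64D('S', 18), B64D('T', 19),
    B64D('U', 20), B64D('V', 21), B64D('W', 22), B64D('X', 23),
    B64D('Y', 24), B64D('Z', 25), B64D('a', 26), B64D('b', 27),
    B64D('c', 28), B64D('d', 29), B64D('e', 30), B64D('f', 31),
    B64D('g', 32), B64D('h', 33), B64D('i', 34), B64D('j', 35),
    B64D('k', 36), B64D('l', 37), B64D('m', 38), B64D('n', 39),
    B64D('o', 40), B64D('p', 41), B64D('q', 42), B64D('r', 43),
    B64D('s', 44), B64D('t', 45), B64D('u', 46), B64D('v', 47),
    B64D('w', 48), B64D('x', 49), B64D('y', 50), B64D('z', 51),
    B64D('0', 52), B64D('1', 53), B64D('2', 54), B64D('3', 55),
    B64D('4', 56), B64D('5', 57), B64D('6', 58), B64D('7', 59),
    B64D('8', 60), B64D('9', 61), B64D('+', 62), B64D('/', 63),
};
#undef B64D

/* Returns the 6-bit value of ch, or -1 if ch is not a base64 character. */
static int b64_sextet(unsigned char ch)
{
    return (int)b64_dec_lut[ch] - 1;
}
```

Storing value + 1 avoids needing a separate "invalid" initializer for the 192 unused slots.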

bnoordhuis · 2026-04-09T21:32:05Z

quickjs.c

+                unsigned ch2 = (unsigned char)src[k];
+                uint8_t f2 = B64_FLAGS[ch2];
+                if (f2 & K_WS) continue; // ws
+                if (ch2!='=') { if (err) *err = 1; return 0; }


This line is kind of hideous :)

quickjs.c

chqrlie · 2026-04-10T06:23:50Z

8KB of extra data seems wasteful for little gain for a rare use case. Please keep quickjs small.
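The 8KB figure follows from the declaration `B64_PAIR_LUT[4096][2]` quoted earlier. The pair table presumably maps each 12-bit half of a 24-bit input group straight to two output characters, so it costs 4096 * 2 = 8192 bytes, versus 64 bytes for the plain alphabet. The sketch below assumes that indexing scheme, since the PR's actual encoding loop isn't shown here, and all names in it are hypothetical. It fills the table with an explicit call rather than lazily, because lazy initialization is what raised the thread-safety concern.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char b64_alphabet[65] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* One entry per 12-bit value, two output characters each: 8192 bytes. */
static char pair_lut[4096][2];

/* Fill the pair table from the 64-char alphabet (call once, before use). */
static void pair_lut_fill(void)
{
    for (int v = 0; v < 4096; v++) {
        pair_lut[v][0] = b64_alphabet[v >> 6];
        pair_lut[v][1] = b64_alphabet[v & 63];
    }
}

/* Encode one full 3-byte group with two pair lookups. */
static void encode_group_pair(const uint8_t in[3], char out[4])
{
    uint32_t v = 65536u * in[0] + 256u * in[1] + in[2];
    memcpy(out, pair_lut[v >> 12], 2);
    memcpy(out + 2, pair_lut[v & 0xFFF], 2);
}

/* Same group with the 64-entry alphabet: four single-char lookups. */
static void encode_group_naive(const uint8_t in[3], char out[4])
{
    uint32_t v = 65536u * in[0] + 256u * in[1] + in[2];
    out[0] = b64_alphabet[(v >> 18) & 63];
    out[1] = b64_alphabet[(v >> 12) & 63];
    out[2] = b64_alphabet[(v >> 6) & 63];
    out[3] = b64_alphabet[v & 63];
}
```

Both functions produce identical output. The pair version trades four lookups into a 64-byte table for two lookups into an 8KB one, which is the size-versus-speed tradeoff debated in this thread.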

saghul · 2026-04-10T06:56:58Z

There are also base64 operations in uint8array, this should be useful for that too.

saghul · 2026-04-10T07:39:03Z

@bnoordhuis I think I addressed all the feedback :-) I'll admit I just ported the original PR to master without looking much into it 😅

chqrlie · 2026-04-10T08:17:03Z

The static array b64_pair is just useless bloat. Did anyone actually benchmark the performance gain, and what use case makes it compelling to go down this road?

saghul · 2026-04-10T08:50:30Z


The original author is @aabbdev, and there was some discussion, including benchmarks, on the original PR (#1143), which stalled and which I'm trying to revive here.

bptato · 2026-04-10T13:20:39Z

This PR still doesn't match the spec.

- It encodes/decodes using the "base64url" alphabet, but standard btoa doesn't: '-' and '_' should be '+' and '/'. Ref. forgiving-base64 encode, RFC 4648 Table 1: The Base64 Alphabet. Test cases: btoa("\xFF") (expected "/w=="), btoa("\xFA") (expected "+g==").
  (Strangely enough, the original PR even acknowledges this in a comment.)
- atob("Aaa==") should throw (too much padding), but it doesn't. It even accepts atob("Aaa==============") :)
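For reference, here is a minimal sketch of the encode step with the standard alphabet, using bptato's test cases. It assumes the input has already been reduced to bytes; btoa itself must first throw on any code unit above 0xFF, which is not shown. The helper name b64_encode_std is hypothetical and is not the PR's b64_encode.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard alphabet (RFC 4648, Table 1). base64url would use '-' and '_'
 * in place of '+' and '/', which is exactly what btoa must not do. */
static const char b64_std[65] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes len bytes from src into dst, which must hold 4 * ((len + 2) / 3)
 * chars plus a NUL. Returns the number of chars written, excluding the NUL. */
static size_t b64_encode_std(const uint8_t *src, size_t len, char *dst)
{
    size_t i = 0, j = 0;

    for (; i + 3 <= len; i += 3) {           /* full 3-byte groups */
        uint32_t v = 65536u * src[i] + 256u * src[i + 1] + src[i + 2];
        dst[j++] = b64_std[v >> 18];
        dst[j++] = b64_std[(v >> 12) & 63];
        dst[j++] = b64_std[(v >> 6) & 63];
        dst[j++] = b64_std[v & 63];
    }
    if (len - i == 1) {                      /* 8 leftover bits: 2 chars + "==" */
        uint32_t v = 65536u * src[i];
        dst[j++] = b64_std[v >> 18];
        dst[j++] = b64_std[(v >> 12) & 63];
        dst[j++] = '=';
        dst[j++] = '=';
    } else if (len - i == 2) {               /* 16 leftover bits: 3 chars + "=" */
        uint32_t v = 65536u * src[i] + 256u * src[i + 1];
        dst[j++] = b64_std[v >> 18];
        dst[j++] = b64_std[(v >> 12) & 63];
        dst[j++] = b64_std[(v >> 6) & 63];
        dst[j++] = '=';
    }
    dst[j] = '\0';
    return j;
}
```

With the base64url alphabet, the same two inputs would come out as "_w==" and "-g==", which is the mismatch described above.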

On JS_AddIntrinsicBase64: I still think it should be together with JS_AddIntrinsicDOMException, otherwise the API is very easy to misuse. I'd suggest JS_AddIntrinsicSerialization which also anticipates structuredClone.

On the LUT: compared to chqrlie's naive solution, on my (old-ish) x86_64 CPU (+ clang 21) the LUT does speed up encoding by 8%. But considering that

- even aabbdev says that call overhead, not encoding, is the main bottleneck,
- according to chqrlie, the pair solution is actually slower on aarch64, and
- a small size is one of QuickJS's main selling points,

I agree with chqrlie that including the LUT is a poor tradeoff.

On the decoder: chqrlie's decoder wins with -O3 -march=native. (With just -O3, aabbdev's decoder wins.)

saghul · 2026-04-10T13:46:40Z

Thanks for the feedback @bptato! I'll try some alternatives. I had already noticed that some changes would be necessary while making the Uint8Array base64 functions PR...

- Introduce global btoa() and atob() functions
- Standard base64 alphabet (RFC 4648)
- Decoder implements forgiving-base64-decode (WHATWG Infra spec)
- Tolerant to whitespace, validates padding per spec
- JS_AddIntrinsicAToB() ensures DOMException is registered

Co-authored-by: Saúl Ibarra Corretgé <s@saghul.net>

saghul · 2026-04-11T07:55:22Z

Updated, thanks for the patience everyone!

I think @chqrlie and @bptato are right; sorry it took me a while to get it, folks.

We can make it work first, then see if / how to optimize it.

I made the intrinsics function check and add DOMException if necessary. I'm not sure about giving it a more generic name yet, we can do that when we get to structuredClone.

Please take a look!

bnoordhuis · 2026-04-11T10:48:58Z

quickjs.c

+        uint32_t v = ((uint32_t)src[i] << 16)
+                   | ((uint32_t)src[i + 1] << 8)
+                   | (uint32_t)src[i + 2];


This is really just a suggestion, but per the integer promotion rules you can write it more succinctly, as shown below, and it compiles down to the same code:

Suggested change

-        uint32_t v = ((uint32_t)src[i] << 16)
-                   | ((uint32_t)src[i + 1] << 8)
-                   | (uint32_t)src[i + 2];
+        uint32_t v = 65536*src[i+0] + 256*src[i+1] + src[i+2];

If you were to read four fields you'd have to write the multiplier as 16777216u or 0x1000000u to force unsigned arithmetic.
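Here is a small check of the promotion argument, assuming a 32-bit int. The uint8_t operands promote to int, so the three-byte sum peaks at 16777215 and cannot overflow. With a fourth byte, 16777216 * 255 would exceed INT_MAX, so the first multiplier gets the u suffix and the whole expression becomes unsigned. The function names are hypothetical.

```c
#include <stdint.h>

/* Three bytes: int arithmetic is safe, max 65536*255 + 256*255 + 255. */
static uint32_t load24(const uint8_t *p)
{
    return 65536*p[0] + 256*p[1] + p[2];
}

/* Shift-and-cast form from the PR, for comparison. */
static uint32_t load24_shift(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}

/* Four bytes: without the u suffix, 16777216*p[0] overflows int (undefined
 * behaviour) once p[0] >= 128. The unsigned term makes the sum unsigned. */
static uint32_t load32(const uint8_t *p)
{
    return 16777216u*p[0] + 65536*p[1] + 256*p[2] + p[3];
}
```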

bnoordhuis · 2026-04-11T10:53:11Z

quickjs.c

+    for (i = 0; i < len; i++) {
+        if (!(b64_flags[(unsigned char)src[i]] & K_WS))
+            nws++;
+    }


I'm 98% sure any decent compiler will compile it to branchless code anyway, but if you want to be explicit (and give tinycc a hand):

Suggested change

-    for (i = 0; i < len; i++) {
-        if (!(b64_flags[(unsigned char)src[i]] & K_WS))
-            nws++;
-    }
+    for (i = 0; i < len; i++)
+        nws += !(b64_flags[(unsigned char)src[i]] & K_WS);
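The sketch below puts both loop forms from the suggestion side by side in self-contained code. The flag table is a cut-down stand-in for the PR's b64_flags: it carries only the whitespace bit and includes form feed, per the Infra definition of ASCII whitespace. The function names are hypothetical. The branchless form relies on `!x` always evaluating to exactly 0 or 1 in C.

```c
#include <stddef.h>

enum { K_WS = 1 };  /* whitespace flag bit, mirroring the PR's K_WS */

/* ASCII whitespace: TAB, LF, FF, CR, SPACE; every other byte is 0. */
static const char b64_flags[256] = {
    [' '] = K_WS, ['\t'] = K_WS, ['\n'] = K_WS, ['\f'] = K_WS, ['\r'] = K_WS,
};

/* Branchy form, as in the PR. */
static size_t count_non_ws_branchy(const char *src, size_t len)
{
    size_t i, nws = 0;
    for (i = 0; i < len; i++) {
        if (!(b64_flags[(unsigned char)src[i]] & K_WS))
            nws++;
    }
    return nws;
}

/* Branchless form: the negated test is 0 or 1 and is added directly. */
static size_t count_non_ws(const char *src, size_t len)
{
    size_t i, nws = 0;
    for (i = 0; i < len; i++)
        nws += !(b64_flags[(unsigned char)src[i]] & K_WS);
    return nws;
}
```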

bnoordhuis · 2026-04-11T10:55:59Z

quickjs.c

+        nws++;
+        if (bits >= 8) {
+            bits -= 8;
+            dst[j++] = (uint8_t)((acc >> bits) & 0xFF);


The AND is superfluous if you're casting to uint8_t right after (and the cast itself isn't necessary either, but it also doesn't hurt).
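The mask is redundant because converting any value to an unsigned type such as uint8_t is defined to reduce it modulo 2^8. That makes `(uint8_t)x` and `(uint8_t)(x & 0xFF)` always equal, and assigning to a uint8_t object performs the same conversion implicitly. Here is a quick check with hypothetical helper names:

```c
#include <stdint.h>

static uint8_t byte_masked(uint32_t acc, int bits)
{
    return (uint8_t)((acc >> bits) & 0xFF);  /* form used in the PR */
}

static uint8_t byte_cast(uint32_t acc, int bits)
{
    return (uint8_t)(acc >> bits);           /* mask dropped */
}

static uint8_t byte_implicit(uint32_t acc, int bits)
{
    uint8_t out = acc >> bits;               /* conversion happens on assignment */
    return out;
}
```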

bnoordhuis · 2026-04-11T10:57:39Z

quickjs.c

+    JSString *s, *ostr;
+    size_t len, out_len, written;
+
+    val = JS_ToString(ctx, argv[0]);


Maybe set ret = JS_EXCEPTION here; then you don't have to write ret = ThingThatThrows(ctx) everywhere.

What you're doing now isn't wrong, but ⤴️ is what we do in most places.
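The idiom bnoordhuis describes is sketched below in plain C rather than against the QuickJS API. The return variable is initialized to the failure value once, every error path is a bare `goto`, and only the success path assigns it. This is an analogy, not QuickJS code: -1 stands in for JS_EXCEPTION, the heap copy stands in for an intermediate value (like the result of JS_ToString) that must be released on every path, and parse_count is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static long parse_count(const char *s)
{
    long ret = -1;          /* failure value set once, like ret = JS_EXCEPTION */
    char *tmp = NULL;       /* intermediate that must be released on every path */
    char *end;
    long v;
    size_t n;

    if (!s)
        goto done;
    n = strlen(s);
    tmp = malloc(n + 1);
    if (!tmp)
        goto done;
    memcpy(tmp, s, n + 1);
    v = strtol(tmp, &end, 10);
    if (end == tmp || *end != '\0' || v < 0)
        goto done;          /* no need to write ret = -1 here */
    ret = v;                /* only the success path assigns ret */
done:
    free(tmp);
    return ret;
}
```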

bnoordhuis · 2026-04-11T11:01:44Z

quickjs.c

+    written = b64_encode(in8, len, (char *)str8(ostr));
+    str8(ostr)[written] = '\0';


It should be slightly faster to cache the result of str8(ostr) in a local variable because str8 is not so trivial that you can rely on compilers doing that for you.

Also applies to line 61,071 and below.

bptato · 2026-04-11T11:35:38Z

quickjs.c

+};
+
+static const char b64_flags[256] = {
+    [' ']=K_WS, ['\t']=K_WS, ['\r']=K_WS, ['\n']=K_WS,


"ASCII whitespace" also includes form feed (\f).
See: https://infra.spec.whatwg.org/#ascii-whitespace

bptato · 2026-04-11T12:07:33Z

I agree with getting it right before optimizing, but can we at least avoid two passes over the input (O(N*2))?
Here's a cleaner option:

static size_t
b64_decode(const char *src, size_t len, uint8_t *dst, int *err)
{
    size_t i, j;
    uint32_t acc;
    int seen, pad;
    unsigned ch;

    acc = 0;
    seen = 0;
    for (i = 0, j = 0; i < len; i++) {
        ch = (unsigned char)src[i];
        if ((b64_flags[ch] & K_WS))
            continue;
        if (!(b64_flags[ch] & K_VAL))
            break;
        acc = (acc << 6) | b64_val[ch];
        seen++;
        if (seen == 4) {
            dst[j++] = (acc >> 16) & 0xFF;
            dst[j++] = (acc >> 8) & 0xFF;
            dst[j++] = (acc >> 0) & 0xFF;
            seen = 0;
            acc = 0;
        }
    }

    if (seen != 0) {
        if (seen == 3) { /* discard last 2 bits */
            dst[j++] = (acc >> 10) & 0xFF;
            dst[j++] = (acc >> 2) & 0xFF;
        } else if (seen == 2) { /* discard last 4 bits */
            dst[j++] = (acc >> 4) & 0xFF;
        } else { /* remainder is 1, return failure */
            *err = 1;
            return 0;
        }
        for (pad = 0; i < len; i++) {
            ch = (unsigned char)src[i];
            if (pad < 2 && ch == '=')
                pad++;
            else if (!(b64_flags[ch] & K_WS))
                break;
        }
        if (pad != 0 && seen + pad != 4) {
            /* got padding, but code point length is not a multiple of 4 */
            *err = 1;
            return 0;
        }
    }

    *err = i < len;
    return j;
}

If the remainder (seen) is 0, there can be no padding (2.). If it is 1, we return failure (3.). And if it's 2 or 3, we read up to two '=' characters, and if any were read, seen + pad must be 4 (2.1).
(K_PAD can be removed.)
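Below is a self-contained version of bptato's single-pass decoder for running the cases from this thread. The only structural change is that a small classifier function (b64_class, hypothetical) replaces the b64_flags/b64_val lookups, so the snippet needs no tables. The control flow, including the padding check, follows the code above.

```c
#include <stddef.h>
#include <stdint.h>

enum { B64_WS = -1, B64_BAD = -2 };

/* Stand-in for the b64_flags/b64_val tables: returns the sextet value,
 * B64_WS for ASCII whitespace (including \f), or B64_BAD otherwise. */
static int b64_class(unsigned ch)
{
    if (ch >= 'A' && ch <= 'Z') return (int)(ch - 'A');
    if (ch >= 'a' && ch <= 'z') return (int)(ch - 'a') + 26;
    if (ch >= '0' && ch <= '9') return (int)(ch - '0') + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r')
        return B64_WS;
    return B64_BAD;
}

/* Single-pass forgiving-base64 decode. dst must hold 3 * (len / 4) + 2
 * bytes. Returns bytes written and sets *err to 1 on invalid input. */
static size_t b64_decode(const char *src, size_t len, uint8_t *dst, int *err)
{
    size_t i, j;
    uint32_t acc = 0;
    int seen = 0, pad, c;
    unsigned ch;

    for (i = 0, j = 0; i < len; i++) {
        c = b64_class((unsigned char)src[i]);
        if (c == B64_WS)
            continue;
        if (c == B64_BAD)
            break;              /* '=' or garbage: handled below */
        acc = (acc << 6) | (uint32_t)c;
        if (++seen == 4) {
            dst[j++] = (uint8_t)(acc >> 16);
            dst[j++] = (uint8_t)(acc >> 8);
            dst[j++] = (uint8_t)acc;
            seen = 0;
            acc = 0;
        }
    }
    if (seen != 0) {
        if (seen == 3) {        /* 18 bits: keep 16, discard last 2 */
            dst[j++] = (uint8_t)(acc >> 10);
            dst[j++] = (uint8_t)(acc >> 2);
        } else if (seen == 2) { /* 12 bits: keep 8, discard last 4 */
            dst[j++] = (uint8_t)(acc >> 4);
        } else {                /* one leftover sextet is always invalid */
            *err = 1;
            return 0;
        }
        for (pad = 0; i < len; i++) {
            ch = (unsigned char)src[i];
            if (pad < 2 && ch == '=')
                pad++;
            else if (b64_class(ch) != B64_WS)
                break;
        }
        if (pad != 0 && seen + pad != 4) {
            *err = 1;           /* padding present but length not a multiple of 4 */
            return 0;
        }
    }
    *err = i < len;             /* anything left unconsumed is an error */
    return j;
}
```

This accepts unpadded input such as "QUI" (remainder 3, no '='), as forgiving-base64 decode does. It rejects "Aaa==" because seen + pad is 5, and it rejects "AAAA=" because padding after a complete group leaves input unconsumed.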

This was referenced Apr 9, 2026

add btoa()/atob() builtins #1143

Closed

Issue #16 references not having atob and btoa (enc/dec base64) #1032

Closed

saghul requested a review from bnoordhuis April 9, 2026 18:44

bnoordhuis reviewed Apr 9, 2026


saghul force-pushed the feat/base64 branch from d11c178 to dc2d64b April 10, 2026 07:33

saghul mentioned this pull request Apr 10, 2026

Add Uint8Array base64/hex methods #1445

Draft

saghul force-pushed the feat/base64 branch from dc2d64b to e133c1e April 11, 2026 07:52

bnoordhuis approved these changes Apr 11, 2026


bptato reviewed Apr 11, 2026


4 participants
