Skip to content

Conversation

nigrosimone
Copy link
Contributor

@nigrosimone nigrosimone commented Sep 27, 2025

Checklist

see #792 (comment)

fix #794
fix #793

@nigrosimone nigrosimone marked this pull request as draft September 27, 2025 11:50
@Uzlopak
Copy link
Contributor

Uzlopak commented Sep 27, 2025

This is imho not acceptable for fairness reasons. as the test is basically from @cesco69. You should try to give him atleast the credentials for the test.

IMHO checkout his PR by doing gh pr checkout 788, then create a new branch from this, as you have his commit in the git commit historie, add your solution as another commit on top of his commit.

@nigrosimone

This comment was marked as off-topic.

@Uzlopak
Copy link
Contributor

Uzlopak commented Sep 27, 2025

Ah, ok. :)

Yeah, i thought so too. I guess we need to revert it completely

@nigrosimone

This comment was marked as off-topic.

@nigrosimone nigrosimone deleted the is-well-formed branch September 27, 2025 12:11
@nigrosimone nigrosimone restored the is-well-formed branch September 27, 2025 12:53
@nigrosimone nigrosimone reopened this Sep 27, 2025
@nigrosimone

This comment was marked as outdated.

Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Uzlopak, @cesco69 is my account at work :D we are the same person

Lol, good one cesco

@nigrosimone nigrosimone marked this pull request as ready for review September 27, 2025 15:35
@nigrosimone

This comment was marked as outdated.

@nigrosimone nigrosimone marked this pull request as draft September 28, 2025 06:33
@nigrosimone
Copy link
Contributor Author

nigrosimone commented Sep 28, 2025

I have made a benchmark for test only asString over different type of string (short, long, with and without surrogate ect)

TLTR: isWellFormed with a short regex (only ASCII escape) seem faster.

I have compared:

On my PC (cpu: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz)

runtime: node 20.14.0 (x64-win32)
asStringNew +1.47…+26.67x faster than asStringOld

asStringNew offers substantial performance gains over the old implementation, especially for longer or escape-heavy strings.

BENCHMARK

I have published the benchmark online

index.mjs

import { run, bench, summary, do_not_optimize } from "mitata";

/**
 * NEW IMPLEMENTATION OF asString
 * 
 * Uses String.prototype.isWellFormed() to skip regex check for surrogate pairs
 */
const ASCII_ESCAPE = /[\u0000-\u001f\u0022\u005c]/;
const asStringNew = (str) => {
  const len = str.length;

  if (len === 0) {
    return '""';
  } else if (len < 42) {
    // magically escape strings for json
    // relying on their charCodeAt
    // everything below 32 needs JSON.stringify()
    // every string that contain surrogate needs JSON.stringify()
    // 34 and 92 happens all the time, so we
    // have a fast case for them
    let result = "";
    let last = -1;
    let point = 255;
    for (let i = 0; i < len; i++) {
      point = str.charCodeAt(i);
      if (
        point === 0x22 || // '"'
        point === 0x5c // '\'
      ) {
        last === -1 && (last = 0);
        result += str.slice(last, i) + "\\";
        last = i;
      } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
        // The current character is non-printable characters or a surrogate.
        return JSON.stringify(str);
      }
    }
    return (
      (last === -1 && '"' + str + '"') || '"' + result + str.slice(last) + '"'
    );
  } else if (len < 5000 && str.isWellFormed() && ASCII_ESCAPE.test(str) === false) {
    // Only use the regular expression for shorter input. The overhead is otherwise too much.
    return '"' + str + '"';
  } else {
    return JSON.stringify(str);
  }
};

/**
 * OLD IMPLEMENTATION OF asString
 * 
 * Uses a regex to check for surrogate pairs
 */
const STR_ESCAPE = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const asStringOld = (str) => {
  const len = str.length;
  if (len === 0) {
    return '""';
  } else if (len < 42) {
    // magically escape strings for json
    // relying on their charCodeAt
    // everything below 32 needs JSON.stringify()
    // every string that contain surrogate needs JSON.stringify()
    // 34 and 92 happens all the time, so we
    // have a fast case for them
    let result = "";
    let last = -1;
    let point = 255;
    for (let i = 0; i < len; i++) {
      point = str.charCodeAt(i);
      if (
        point === 0x22 || // '"'
        point === 0x5c // '\'
      ) {
        last === -1 && (last = 0);
        result += str.slice(last, i) + "\\";
        last = i;
      } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
        // The current character is non-printable characters or a surrogate.
        return JSON.stringify(str);
      }
    }
    return (
      (last === -1 && '"' + str + '"') || '"' + result + str.slice(last) + '"'
    );
  } else if (len < 5000 && STR_ESCAPE.test(str) === false) {
    // Only use the regular expression for shorter input. The overhead is otherwise too much.
    return '"' + str + '"';
  } else {
    return JSON.stringify(str);
  }
};

/**
 * BENCHMARKS CASES
 */
const shortString = 'a'.repeat(10)
const longString = 'a'.repeat(1000)
const shortDirtyString = shortString + '\n'
const longDirtyString = longString + '\n'
const surrogate = '\uD800';
const escapeChar = '"';
const shortSurrogateString = 'a'.repeat(10) + surrogate;
const shortQuoteString = 'a'.repeat(10) + escapeChar;
const longSurrogateString = 'a'.repeat(1000) + surrogate;
const longQuoteString = 'a'.repeat(1000) + escapeChar;

const args = { 
  str: [
    '',
    shortString, longString, 
    shortDirtyString, longDirtyString,
    shortSurrogateString, shortQuoteString,
    longSurrogateString, longQuoteString
  ] 
};

/**
 * BENCHMARKS IMPLEMENTATIONS
 */
summary(() => {
  bench("asStringNew", function* xNew(state) {
    const str = state.get('str');
    yield () => do_not_optimize(asStringNew(str));
  }).gc("inner").args(args);

  bench("asStringOld", function* xOld(state) {
    const str = state.get('str');
    yield () => do_not_optimize(asStringOld(str));
  }).gc("inner").args(args);
});

await run({
  format: "mitata",
  colors: true,
  throw: true,
});

run with

node --expose-gc --allow-natives-syntax index.mjs --iterations=50000

RAW RESULT

node --expose-gc --allow-natives-syntax index.mjs --iterations=50000

clk: ~2.96 GHz
cpu: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
runtime: node 20.14.0 (x64-win32)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
asStringNew                    6.67 ns/iter   7.69 ns ▃ █▂ ▆▅
                       (2.29 ns … 38.26 ns)  20.02 ns ███████▄▃▂▃▂▃▄▃▁▂▁▁▁▁
                  gc(  1.75 ms …   6.64 ms)   0.50  b (  0.17  b… 42.74  b)

asStringNew                  120.74 ns/iter 143.55 ns     █▃▂   ▃▂▆ ▃
                     (51.32 ns … 231.05 ns) 202.03 ns ▂▂▂▅███▄▆██████▆▂▂▁▁▂
                  gc(  1.91 ms …   6.16 ms)  60.07  b (  6.28  b…144.35  b)

asStringNew                    1.04 µs/iter   1.04 µs ▃██▅▂
                      (916.89 ns … 1.57 µs)   1.48 µs █████▂▄▄▁▂▃▄▃▂▂▂▂▂▂▁▂
                  gc(  1.79 ms …   5.84 ms)  78.31  b ( 64.19  b…340.62  b)

asStringNew                  269.22 ns/iter 321.44 ns ██▅
                    (125.76 ns … 875.37 ns) 842.99 ns ███▆▆▆▆▃▄▃▁▁▂▁▂▁▂▂▂▁▂
                  gc(  1.72 ms …   6.89 ms)  62.08  b (  5.14  b…238.20  b)

asStringNew                    6.78 µs/iter   8.08 µs   ▅▂    █▂    ▂▂
                       (4.26 µs … 10.95 µs)   9.87 µs ▄▄██▄▄▄▄██▇▇▄▁██▇▁▁▁▄
                  gc(  2.18 ms …   7.96 ms)   2.23 kb (  2.19 kb…  2.23 kb)

asStringNew                  702.15 ns/iter 826.42 ns     ▇▂▃▃ █
                      (378.81 ns … 1.38 µs)   1.37 µs ███▅████▆█▇▃▆▂▂▁▂▁▃▃▂
                  gc(  1.93 ms …   9.78 ms) 205.19  b (142.14  b…473.12  b)

asStringNew                  201.91 ns/iter 248.00 ns   ▆█▇▄ ▃▅▇  ▃▆▃
                    (100.61 ns … 357.50 ns) 340.36 ns ▂▄█████████████▆▂▁▃▅▃
                  gc(  1.78 ms …   6.80 ms) 168.70  b ( 43.10  b…361.37  b)

asStringNew                    5.95 µs/iter   6.21 µs             █▂
                        (3.06 µs … 8.44 µs)   7.83 µs ▃▁▁▁▃▃▁▃▆▄▃▇██▁▆▄▁▃▃▄
                  gc(  2.09 ms …   7.20 ms) 200.28  b (199.85  b…212.28  b)

asStringNew                    7.87 µs/iter   8.52 µs           █
                       (5.33 µs … 10.24 µs)   9.85 µs ▃▁▁▁▃▅▅▃▃▇█▃▃▅▃▃▁▅▅▁▅
                  gc(  2.33 ms …   7.66 ms)   2.23 kb (  2.23 kb…  2.24 kb)

asStringOld                   11.60 µs/iter  12.30 µs      ▅█ ▆█
                       (8.10 µs … 33.60 µs)  16.00 µs ▃▄▄▄▃██████▇▇▃▅▃▄▂▂▂▄
                  gc(  2.78 ms …  11.86 ms) 519.60  b (504.00  b…  2.64 kb)

asStringOld                  177.89 ns/iter 208.23 ns            ▂▅█▃▆▄
                     (86.43 ns … 252.03 ns) 244.36 ns ▂▇▃▅█▇▆▅▇▄▆████████▆▃
                  gc(  1.86 ms …   8.59 ms)  54.05  b ( 22.36  b…191.69  b)

asStringOld                    1.88 µs/iter   2.22 µs    ▃    ▆▆█▅ ▅▆
                      (981.91 ns … 3.37 µs)   2.94 µs ▆▄▄██▄██████████▃▁▃▁▃
                  gc(  1.83 ms …   9.18 ms)  65.97  b ( 32.11  b…288.79  b)

asStringOld                  275.36 ns/iter 361.18 ns ▄█
                    (168.02 ns … 732.08 ns) 514.65 ns ██▆▅▄▃▅▄▃▃▃▅▃▆▅▄▃▁▁▁▁
                  gc(  1.76 ms …   6.52 ms)  50.09  b ( 48.11  b…163.04  b)

asStringOld                    5.48 µs/iter   6.73 µs  █             ▃
                        (3.36 µs … 7.92 µs)   7.77 µs ██▆▄▆█▁█▁█▁▆▆▄▄█▄▄█▆█
                  gc(  2.24 ms …   7.49 ms)   2.23 kb (  2.20 kb…  2.25 kb)

asStringOld                  732.46 ns/iter 868.70 ns  ▇ ▃    █
                      (368.90 ns … 1.61 µs)   1.56 µs ▇█████▄▇██▅▂▃▂▃▂▃▁▃▂▂
                  gc(  1.81 ms …   5.73 ms) 199.43  b ( 92.16  b…360.21  b)

asStringOld                  191.50 ns/iter 250.49 ns ██
                    (119.07 ns … 360.08 ns) 336.96 ns ██▇▆▅▄▅▂▆▃▅▃▄▄▄▆▅▆▂▁▁
                  gc(  1.76 ms …  10.20 ms) 165.59  b ( 40.81  b…452.06  b)

asStringOld                    4.80 µs/iter   5.75 µs █
                        (3.12 µs … 7.11 µs)   7.04 µs █▃▄▃▆▄▆▄▇▁▆▄▃▆▄▁▃▆▃▆▄
                  gc(  1.97 ms …   4.95 ms) 199.61  b (173.26  b…222.61  b)

asStringOld                    3.43 µs/iter   3.31 µs █
                        (2.92 µs … 6.92 µs)   6.88 µs ██▅▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂
                  gc(  1.76 ms …   7.42 ms)   2.23 kb (  2.21 kb…  2.27 kb)

summary
  asStringNew
   +1.47…+26.67x faster than asStringOld

Side note for @Uzlopak in my previous comment I say it is slower, but it is slow only on browser side

The built in bench

PR

long string without double quotes........................ x 12,424 ops/sec ±1.41% (186 runs sampled)
long string.............................................. x 12,072 ops/sec ±1.29% (184 runs sampled)

MASTER from #707

long string without double quotes........................ x 15,398 ops/sec ±0.24% (192 runs sampled)
long string.............................................. x 15,779 ops/sec ±0.22% (193 runs sampled)

PS sorry for the bad english

@nigrosimone nigrosimone marked this pull request as ready for review September 28, 2025 06:56
@mcollina
Copy link
Member

@nigrosimone can you please include a revert of the revert of @climba03003 original commits? I've added tests for all listed regressions

Signed-off-by: Nigro Simone <nigro.simone@gmail.com>
@nigrosimone
Copy link
Contributor Author

Done

Copy link
Contributor

@Uzlopak Uzlopak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RSLGTM

To be honest, I cant follow the benchmarks. mitata is not the best benchmark out there. Actually tinybench is afaik the best benchmark out there, which is the reason it is used for benchmarks in nodejs.

If you post that master has 15.000 ops and this PR has 12.000 ops then this PR is slower. But as I said I cant read the mitata benchmarks. Or lets say, I dont want to dig deep into them.

Disclaimer: I am also maintainer of tinybench.

@nigrosimone
Copy link
Contributor Author

Ok, I can try to make a benchmark with Tiny bench

@nigrosimone nigrosimone marked this pull request as draft September 29, 2025 11:35
@cesco69
Copy link
Contributor

cesco69 commented Sep 29, 2025

@Uzlopak for tinybench asStringNew seem slower

Result

node .\index.mjs
asString
┌─────────┬───────────────┬──────────────────┬───────────────────┬────────────────────────┬────────────────────────┬─────────┐
│ (index) │ Task name     │ Latency avg (ns) │ Latency med (ns)  │ Throughput avg (ops/s) │ Throughput med (ops/s) │ Samples │
├─────────┼───────────────┼──────────────────┼───────────────────┼────────────────────────┼────────────────────────┼─────────┤
│ 0       │ 'asStringNew''7098.5 ± 0.12%''7100.0 ± 100.00''142612 ± 0.04%''140845 ± 2012'        │ 140875  │
│ 1       │ 'asStringOld''6991.4 ± 0.69%''6900.0 ± 400.00''145945 ± 0.05%''144928 ± 7941'        │ 143033  

Benchmark

import { Bench } from 'tinybench'

/**
 * NEW IMPLEMENTATION OF asString
 * 
 * Uses String.prototype.isWellFormed() to skip regex check for surrogate pairs
 */
const ASCII_ESCAPE = /[\u0000-\u001f\u0022\u005c]/;
const asStringNew = (str) => {
    const len = str.length;

    if (len === 0) {
        return '""';
    } else if (len < 42) {
        // magically escape strings for json
        // relying on their charCodeAt
        // everything below 32 needs JSON.stringify()
        // every string that contain surrogate needs JSON.stringify()
        // 34 and 92 happens all the time, so we
        // have a fast case for them
        let result = "";
        let last = -1;
        let point = 255;
        for (let i = 0; i < len; i++) {
            point = str.charCodeAt(i);
            if (
                point === 0x22 || // '"'
                point === 0x5c // '\'
            ) {
                last === -1 && (last = 0);
                result += str.slice(last, i) + "\\";
                last = i;
            } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
                // The current character is non-printable characters or a surrogate.
                return JSON.stringify(str);
            }
        }
        return (
            (last === -1 && '"' + str + '"') || '"' + result + str.slice(last) + '"'
        );
    } else if (len < 5000 && str.isWellFormed() && ASCII_ESCAPE.test(str) === false) {
        // Only use the regular expression for shorter input. The overhead is otherwise too much.
        return '"' + str + '"';
    } else {
        return JSON.stringify(str);
    }
};

/**
 * OLD IMPLEMENTATION OF asString
 * 
 * Uses a regex to check for surrogate pairs
 */
const STR_ESCAPE = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;
const asStringOld = (str) => {
    const len = str.length;
    if (len === 0) {
        return '""';
    } else if (len < 42) {
        // magically escape strings for json
        // relying on their charCodeAt
        // everything below 32 needs JSON.stringify()
        // every string that contain surrogate needs JSON.stringify()
        // 34 and 92 happens all the time, so we
        // have a fast case for them
        let result = "";
        let last = -1;
        let point = 255;
        for (let i = 0; i < len; i++) {
            point = str.charCodeAt(i);
            if (
                point === 0x22 || // '"'
                point === 0x5c // '\'
            ) {
                last === -1 && (last = 0);
                result += str.slice(last, i) + "\\";
                last = i;
            } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
                // The current character is non-printable characters or a surrogate.
                return JSON.stringify(str);
            }
        }
        return (
            (last === -1 && '"' + str + '"') || '"' + result + str.slice(last) + '"'
        );
    } else if (len < 5000 && STR_ESCAPE.test(str) === false) {
        // Only use the regular expression for shorter input. The overhead is otherwise too much.
        return '"' + str + '"';
    } else {
        return JSON.stringify(str);
    }
};

/**
 * BENCHMARKS CASES
 */
const shortString = 'a'.repeat(10)
const longString = 'a'.repeat(1000)
const shortDirtyString = shortString + '\n'
const longDirtyString = longString + '\n'
const surrogate = '\uD800';
const escapeChar = '"';
const shortSurrogateString = 'a'.repeat(10) + surrogate;
const shortQuoteString = 'a'.repeat(10) + escapeChar;
const longSurrogateString = 'a'.repeat(1000) + surrogate;
const longQuoteString = 'a'.repeat(1000) + escapeChar;

const strs = [
    '',
    shortString, longString,
    shortDirtyString, longDirtyString,
    shortSurrogateString, shortQuoteString,
    longSurrogateString, longQuoteString
];

/**
 * BENCHMARKS IMPLEMENTATIONS
 */
const bench = new Bench({ name: 'asString', time: 1000 })

bench
    .add('asStringNew', () => {
        for (const str of strs) {
            asStringNew(str)
        }
    })
    .add('asStringOld', () => {
        for (const str of strs) {
            asStringOld(str)
        }
    })

await bench.run()

console.log(bench.name)
console.table(bench.table())

Copy link
Member

@climba03003 climba03003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing the mistake.

@mcollina mcollina marked this pull request as ready for review September 30, 2025 08:57
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina
Copy link
Member

There are conflicts now.

@cesco69
Copy link
Contributor

cesco69 commented Sep 30, 2025

Guys, this implementation seem slower...

@gurgunday
Copy link
Member

Oh I thought it was fixing something else actually, then yeah maybe let's keep it as is?

@Uzlopak
Copy link
Contributor

Uzlopak commented Sep 30, 2025

I am closing this, as this part of the code made already too much confusion and friction.

We should first make some good benchmarks for strings, with various cases. E.g. I think one of the tests is repeated NUL-character (\u0000), which is basically pointless, because it should have the same performance like a string with length of 1 with one NUL character. So we should make some smart test cases and then balance the performance.

Also using tinybench, as it is more correct than any other benchmark engine right now on the market. E.g. one of the tables in benchmark.js is mathematically wrong. mitata is programmatically not correct.

@Uzlopak Uzlopak closed this Sep 30, 2025
@Uzlopak Uzlopak mentioned this pull request Sep 30, 2025
@cesco69
Copy link
Contributor

cesco69 commented Sep 30, 2025

Side note,
I’d like to propose the implementation of JSON.isWellFormed() to TC39: https://es.discourse.group/t/json-iswellformed-str-string-boolean

It will probably be very difficult for the committee to approve such a niche idea.
@mcollina since you’re a TSC member, you have the power!

A dedicated JSON.isWellFormed method would provide a fast, standardized way to check this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6.1.0: Breaks Fastify error serialization for messages which contain quotes Newlines in serialized bodies
6 participants