
Flaw in the benchmark? #2

Closed
josdejong opened this issue Sep 5, 2023 · 42 comments

@josdejong

I was a bit surprised to see Papaparse being twice as fast as csv42 in many of the benchmarks. I did similar benchmarks before, and there csv42 was always at least a bit faster than Papaparse. Now, it can be that I'm overlooking something here and comparing apples with pears, but I would love to understand what's going on; these benchmarks should give similar kinds of results.

I created a minimal benchmark to try to figure out what's going on: https://github.com/josdejong/csv-benchmark

The results show csv42 being twice as fast as Papaparse with, for example, the HPI_master.csv file, and also with another file:

udsv x 9.42 ops/sec ±3.74% (28 runs sampled)
csv42 x 4.38 ops/sec ±3.70% (16 runs sampled)
papaparse x 2.33 ops/sec ±2.47% (10 runs sampled)

Can it be that papaparse isn't configured correctly? When omitting the header: true setting, it will create arrays instead of objects, which is about 3 times faster, giving results similar to the results shown here. Not sure, but here papaparse isn't configured with header: true: https://github.com/leeoniya/uDSV/blob/main/bench/non-streaming/untyped/PapaParse.cjs.

Did you verify that the output is indeed what you expect for all the libraries in all the benchmarks? (i.e. an object with parsed numbers and booleans).
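The difference between the two output shapes can be sketched in plain JavaScript (a toy illustration, not using Papa Parse itself): header-based parsing has to materialize one object per row, keyed by the column names, which is the extra work the header: true path pays for.

```javascript
// Toy illustration of the two output shapes discussed above.
// With header handling off, a parser can return rows as plain arrays;
// with it on, every row becomes an object keyed by the header fields.
const rows = [
  ['id', 'name', 'price'],   // header row
  ['1', 'widget', '9.99'],
  ['2', 'gadget', '4.50'],
];

// "header: false" style output: just the raw tuples.
const asArrays = rows.slice(1);

// "header: true" style output: one object allocation per row.
const [header, ...body] = rows;
const asObjects = body.map(tuple =>
  Object.fromEntries(header.map((key, i) => [key, tuple[i]]))
);

console.log(asArrays[0]);  // ['1', 'widget', '9.99']
console.log(asObjects[0]); // { id: '1', name: 'widget', price: '9.99' }
```

The per-row object allocation (and the GC pressure it creates) is one plausible source of the gap between the two modes.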

@leeoniya
Owner

leeoniya commented Sep 5, 2023

i started by using Benchmark.js as well but its results were always significantly slower than what i measured in a raw loop. so i wrote something that seemed to have lower overhead. i tried https://github.com/tinylibs/tinybench just to make sure i wasn't crazy, and it gave me the same results as what i expected, and faster than Benchmark.js.

maybe it's the runner, maybe we can dig a bit further to see if it's just that or something else.

@josdejong
Author

A benchmark runner with more overhead would just make all (absolute) results equally lower, right? But I don't think it would impact how the results compare to each other (relatively).

@leeoniya
Owner

leeoniya commented Sep 5, 2023

i will try to dig in a bit more and would appreciate help here. this benchmark loop is super straightforward, so 🤷.

i think a lot depends on how much breathing room the bench runner gives to the GC between each cycle. in combo with how much each lib stresses the GC.

can you try tinybench and see what you get?

@leeoniya
Owner

leeoniya commented Sep 6, 2023

i think your Papa Parse numbers are worse because you're using header: true, while i'm using header: false to get max perf out of it.

the difference is huge: https://github.com/leeoniya/uDSV/tree/main/bench#output-formats

it's a legit setting for the nested parsing comparison, but as i wrote in the csv42 repo, Papa Parse + flat does not produce valid output, so my benchmarks do not compare Papa for the structured case.

leeoniya closed this as completed Sep 6, 2023
@josdejong
Author

josdejong commented Sep 6, 2023

i think your Papa Parse numbers are worse because you're using header: true, while i'm using header: false to get max perf out of it.

the difference is huge: https://github.com/leeoniya/uDSV/tree/main/bench#output-formats

it's a legit setting for the nested parsing comparison

The {header: true} option is indeed the difference here, and it has a huge impact (parsing into arrays vs into objects), just like there is a serious difference between leaving all parsed values as strings vs parsing numeric values into numbers and dates into dates. That requires extra processing. But in the real world, a developer can't just remove the {header: true} option "to get max perf" like you did in the benchmark: the application would break, since the parser suddenly produces a totally different kind of output.

If you mix these things in a single benchmark, you're comparing apples with pears.

A useful benchmark should compare different ways to get to the same output. And it should create different categories of benchmarks for options that result in other types of output (unless it explicitly wants to show the impact of different output types, like here, of course). Otherwise the benchmark is meaningless and misleading for users who want to compare library X and Y for their use case Z.

but as i wrote in the csv42 repo, Papa Parse + flat does not produce valid output, so my benchmarks do not compare Papa for the structured case.

Can you share what you tried? I just tested papa+flat again to be sure, and it does actually work as intended: parsing nested objects and producing the same output as the other CSV libraries that I compare it with.

@leeoniya
Owner

leeoniya commented Sep 6, 2023

the problem with comparing the same outputs is that a bunch of libraries don't support one or the other output, which means you have to do additional translation in userland, which is extremely expensive.

in my view, the core of parsing is simply finding the correct column and row offsets in the full csv string. this is what allows you to access the values. imo if you have objects or tuples, there is little practical difference which format you structure your app around, and swapping formats is trivial. i did not want to penalize 20% of the libs in either direction. i also have no desire to show/maintain/rerun twice the amount of benchmarks than i already do :D
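That offsets idea can be sketched roughly like this (a toy scanner that ignores quoting and escaping entirely, so purely illustrative):

```javascript
// Toy illustration of offset-based CSV scanning (ignores quoting/escaping).
// Instead of eagerly building values, record where each cell starts and ends;
// tuples vs objects is then just a question of how you slice those offsets.
function findCellOffsets(csvStr) {
  const rows = [];
  let row = [];
  let cellStart = 0;
  for (let i = 0; i <= csvStr.length; i++) {
    const atEnd = i === csvStr.length;
    if (atEnd && row.length === 0 && cellStart === i) break; // trailing newline
    const ch = csvStr[i];
    if (ch === ',' || ch === '\n' || atEnd) {
      row.push([cellStart, i]); // [start, end) offsets of one cell
      cellStart = i + 1;
      if (ch === '\n' || atEnd) {
        rows.push(row);
        row = [];
      }
    }
  }
  return rows;
}

const csv = 'a,b\n1,2';
const offsets = findCellOffsets(csv);
const firstRow = offsets[0].map(([s, e]) => csv.slice(s, e));
console.log(firstRow); // ['a', 'b']
```

Once you have the offsets, emitting arrays, objects, or typed values are all post-processing passes over the same scan.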

typed parsing is indeed very different, since you cannot avoid type conversion if you actually need it. so i do have that broken out.

But in the real world, a developer can't just remove that option {header: true} "to get max perf" like you did in the benchmark: then the application will break since the parser suddenly produces a totally different kind of output.

i'm not trying to show what libraries you can swap into existing apps with zero effort. with a bunch of them you cannot, since they have different apis, like async, streaming, etc. my goal here is to show the max possible parsing perf of each lib. you can always fiddle with options and datasets to then shoot yourself in the foot, perf-wise.

Can you share what you tried? I just tested papa+flat again (josdejong/csv42#3 (comment)) to be sure, and it does actually work as intended: parsing nested objects and producing the same output as the other CSV libraries that I compare it with.

i'll dig it up today...

@leeoniya
Owner

leeoniya commented Sep 6, 2023

i put a breakpoint in here. size and geo are wrong, what am i missing?:

module.exports = {
  name: 'PapaParse deep {}',
  repo: 'https://github.com/mholt/PapaParse',
  load: async () => {
    const { default: Papa } = await import('papaparse');
    const { default: flat } = await import('flat');
    return (csvStr, path) => new Promise(res => {
      let deep = flat.unflatten(Papa.parse(csvStr, { header: true, transform }).data.map(flat.unflatten));
      res(deep);
    });
  },
};

[screenshot of debugger output showing the wrong size and geo values]

@josdejong
Author

Looking at the code:

flat.unflatten(Papa.parse(csvStr, { header: true, transform }).data.map(flat.unflatten))

you run unflatten twice, shouldn't it just be:

Papa.parse(csvStr, { header: true, transform }).data.map(flat.unflatten)

@leeoniya
Owner

leeoniya commented Sep 6, 2023

did not help :(

@josdejong
Author

I think the reason is that flat splits property names by dots (.), and in your example the nested array values use square-bracket notation like geo[1], not dot notation like geo.1.
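A toy dot-notation unflatten (an illustrative reimplementation, not the actual flat package) makes the mismatch visible: only dots split a key into path segments, so geo[1] survives as one literal property name.

```javascript
// Toy dot-notation unflatten (illustrative only, not the real `flat` package).
function unflattenDots(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    const parts = key.split('.'); // ONLY dots split into path segments
    let node = out;
    parts.forEach((part, i) => {
      if (i === parts.length - 1) {
        node[part] = value;
      } else {
        // numeric next segment -> create an array, otherwise an object
        node[part] ??= /^\d+$/.test(parts[i + 1]) ? [] : {};
        node = node[part];
      }
    });
  }
  return out;
}

// Dot notation nests (and numeric segments become array indices)...
console.log(unflattenDots({ 'geo.0': 4.2, 'geo.1': 5.1 }));
// { geo: [4.2, 5.1] }

// ...but bracket notation is just a flat key with unusual characters in it.
console.log(unflattenDots({ 'geo[0]': 4.2, 'geo[1]': 5.1 }));
// { 'geo[0]': 4.2, 'geo[1]': 5.1 }
```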

@leeoniya
Owner

leeoniya commented Sep 6, 2023

i can add support for this as well, but i took the format directly from your article, where you also post the Papa benchmarks:

[screenshot of the Papa benchmark table from the csv42 article]

@josdejong
Author

Ahh, I will have a look at that and correct it.

@josdejong
Author

josdejong commented Sep 6, 2023

the problem with comparing the same outputs is that a bunch of libraries don't support one or the other output, which means you have to do additional translation in userland, which is extremely expensive.

True. There are only three main outputs though: array, object, or nested object. It's not that complex, and these different formats have a large impact on how usable the data is. It's very relevant for the user, who probably has to do this "expensive extra translation" themselves if the output format doesn't align with what they need. It is very helpful if the benchmark is transparent about this.

my goal here is to show the max possible parsing perf of each lib

I understand your point, thanks for elaborating. I don't think this is a good approach though. In my opinion, a good benchmark should change only a single property at a time and compare that. It should not change two different properties at the same time; that's just mixing apples and pears. So, one benchmark can keep the output the same and compare different CSV libraries, and a second benchmark can compare different output formats (array, object, nested object) so you can get a feel for the impact of that.

i also have no desire to show/maintain/rerun twice the amount of benchmarks than i already do :D

Ehhhh... I hope that's not a real argument to decide which benchmarks are useful to give the best insight 😉

typed parsing is indeed very different, since you cannot avoid type conversion if you actually need it.
[...]
swapping formats is trivial

No and no. The same argument holds for objects: maybe the table component that you need requires objects. Both type parsing and object parsing are optional steps, and depending on the use case you cannot avoid either of them. And these optional processing steps are not trivial; they can come with a large performance penalty, so in a performance benchmark it is definitely relevant to reckon with them.
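The typing step can be sketched generically (a hypothetical coerce helper, not any particular library's API), to show the extra per-cell pass that typed output requires:

```javascript
// Generic sketch of the optional typing step: coercing cell strings into
// numbers/booleans/null. This extra pass is what typed parsing pays for.
function coerce(value) {
  if (value === '') return null;
  if (value === 'true') return true;
  if (value === 'false') return false;
  const num = Number(value);
  if (!Number.isNaN(num) && value.trim() !== '') return num;
  return value; // keep as string
}

const untypedRow = ['2', 'gadget', '4.50', 'true', ''];
const typedRow = untypedRow.map(coerce);
console.log(typedRow); // [2, 'gadget', 4.5, true, null]
```

Multiply that per-cell work by millions of cells and it's clear why typed benchmarks deserve their own category.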

@leeoniya
Owner

leeoniya commented Sep 6, 2023

Ehhhh... I hope that's not a real argument to decide which benchmarks are useful to give the best insight 😉

running these benchmarks is currently a pretty manual process, since not all libs can run all datasets, etc. in addition i have to manually update the tables in the writeup. the whole thing takes 90 minutes to complete... it is really tedious. it would be dishonest if i said there is zero marketing aspect to these benchmarks. i don't have an infinite amount of time on my hands to demonstrate every permutation of libs, options, datasets, manual userland conversion, etc, beyond what is necessary to demonstrate uDSV's performance. i think that's fair, given how much effort is involved here.

i have shown quite well that Papa degrades massively once you switch it to object mode and enable typing (this is a big reason i added the Sample column). i expect users to apply these demonstrated handicaps/caveats mentally. if you have an interest in showing that csv42 is much faster than Papa+flat at the typed/nested case, then you have already done so in your own writeup, and i have no concerns that your benchmarks are inaccurate; you clearly also did your homework.

i will add Papa + flat to my table once i get its output to be the same, which may require additional litmus gen and i'll need to update uDSV to support the dot-index notation.

@josdejong
Author

Working out benchmarks is definitely very time-intensive 😅. Ideally we should create one fully automated benchmark repo for all of this. Not sure where to find the time for that though.

Maybe what frustrates me is that your benchmark may be fair to uDSV and give the right perspective there (and uDSV is actually very fast, credits there!), but at the same time the benchmark gives the impression that csv42 is much slower than the popular papaparse, whereas csv42 actually does very well when properly comparing apples to apples.

@leeoniya
Owner

leeoniya commented Sep 6, 2023

i get it. i certainly have no nefarious plan to make other parsers look worse than their best self, quite the opposite, in fact!

not everyone parses to or needs to parse to objects, nested objects are niche, and typing may in fact be unneeded if all you're doing is sticking the data into JSX to render a table.

it makes no sense to force libs into a slow path without knowing the final use case, and i dont intend to dictate one here.

if you add a faster, untyped tuple mode to csv42, then i will gladly use it!

@josdejong
Author

👍

@leeoniya
Owner

leeoniya commented Sep 6, 2023

i added support for dot notation in 91cea7b, which makes flat work with Papa. the plot twist is that csv42 is now not outputting arrays?

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ csv42_nested_10k_dot.csv (1.6 MB, 9 cols x 10K rows)                                                                                                                                                           │
├─────────────────────┬────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────┬────────────────────────────┬────────────────────────────────────────────────────┤
│ Name                │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 47 MiB baseline (MiB) │ Types                      │ Sample                                             │
├─────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────┼────────────────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed deep {}  │ 1.1M   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 173 │ ░░░░░░░░░░░░░░░ 64.9            │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ csv42 typed deep {} │ 289K   │ ░░░░░░░░░░░░░░░ 45.4                                        │ ░░░░░░░░░░░░░░░░ 71             │ number,object,string       │ [{"_type":"item","name":"Item 2","description":"It │
│ PapaParse deep {}   │ 111K   │ ░░░░░░ 17.5                                                 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 123 │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
└─────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────┴────────────────────────────┴────────────────────────────────────────────────────┘

@josdejong
Author

Nice! Yeah the funny thing is that flat only supports dot notation like location.geo.1, and csv42 only supports array notation location.geo[1]. Benchmarks are so funny 😁 . I was planning to add support for dot notation like location.geo.1 to csv42 so we can compare both papaparse and csv42 with the same csv file, but I haven't found time for that today. Will let you know.

@josdejong
Author

I've extended csv42 to support dot notation too so it is compatible with flat. However, one of the libraries that is part of the benchmark (json2csv) does not support nested arrays at all. Therefore I changed the test data of the benchmark to only contain nested objects, not nested arrays. The outcome of the benchmark doesn't really differ from before though.

@leeoniya
Owner

leeoniya commented Sep 7, 2023

the parseValue() improvements are 👍 . it was super weird that it received strings containing the quote escapes before.

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ csv42_nested_10k_dot.csv (1.6 MB, 9 cols x 10K rows)                                                                                                                                                           │
├─────────────────────┬────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────┬────────────────────────────┬────────────────────────────────────────────────────┤
│ Name                │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 45 MiB baseline (MiB) │ Types                      │ Sample                                             │
├─────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────┼────────────────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed deep {}  │ 1.12M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 175 │ ░░░░░░░░░░░░░░ 64.9             │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ csv42 typed deep {} │ 261K   │ ░░░░░░░░░░░░░ 41                                            │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 131 │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ PapaParse deep {}   │ 111K   │ ░░░░░░ 17.4                                                 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 123  │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
└─────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────┴────────────────────────────┴────────────────────────────────────────────────────┘

@josdejong
Author

josdejong commented Sep 7, 2023

I've done a small test adding uDSV to the benchmarks I did in csv42 to see if they give comparable results. In the results you post above, uDSV is 4.5x faster than csv42; however, when I run the tests I "only" see a difference of 2-3 times, for both flat and nested objects. I had expected similar results, not such a large difference. Not sure where that can originate from (hardware? the benchmark lib used? slightly differing data?). I get the same kind of results (factor 2 on the nested data set) when trying a simple console.time() -> parse -> console.timeEnd() with the two libraries.

How can I reproduce your 4.5x difference with the nested json data?

SECTION 3: FLAT CSV to JSON

benchmark: { name: 'flat csv to json', size: '15 KB', rows: 100 }
1:csv42              x 3,241 ops/sec ±0.54% (78 runs sampled)
3:csv                x 361 ops/sec ±1.20% (83 runs sampled)
4:papaparse          x 2,159 ops/sec ±2.22% (76 runs sampled)
5:fast-csv           x 615 ops/sec ±0.69% (83 runs sampled)
6:json-2-csv         x 1,346 ops/sec ±1.18% (83 runs sampled)
7:udsv               x 9,612 ops/sec ±0.27% (82 runs sampled)

benchmark: { name: 'flat csv to json', size: '147 KB', rows: 1000 }
1:csv42              x 343 ops/sec ±0.41% (84 runs sampled)
3:csv                x 49.69 ops/sec ±0.46% (68 runs sampled)
4:papaparse          x 234 ops/sec ±1.95% (78 runs sampled)
5:fast-csv           x 61.67 ops/sec ±0.23% (68 runs sampled)
6:json-2-csv         x 139 ops/sec ±0.39% (70 runs sampled)
7:udsv               x 1,117 ops/sec ±0.38% (83 runs sampled)

benchmark: { name: 'flat csv to json', size: '1.5 MB', rows: 10000 }
1:csv42              x 33.44 ops/sec ±0.70% (68 runs sampled)
3:csv                x 4.48 ops/sec ±2.31% (25 runs sampled)
4:papaparse          x 23.15 ops/sec ±2.49% (57 runs sampled)
5:fast-csv           x 6.18 ops/sec ±1.33% (33 runs sampled)
6:json-2-csv         x 13.63 ops/sec ±0.71% (59 runs sampled)
7:udsv               x 86.80 ops/sec ±4.11% (76 runs sampled)

benchmark: { name: 'flat csv to json', size: '15 MB', rows: 100000 }
1:csv42              x 3.00 ops/sec ±2.40% (19 runs sampled)
3:csv                x 0.42 ops/sec ±3.22% (7 runs sampled)
4:papaparse          x 2.24 ops/sec ±3.40% (15 runs sampled)
5:fast-csv           x 0.61 ops/sec ±1.47% (8 runs sampled)
6:json-2-csv         x 1.31 ops/sec ±0.73% (11 runs sampled)
7:udsv               x 7.96 ops/sec ±4.44% (41 runs sampled)

SECTION 4: NESTED CSV to JSON

benchmark: { name: 'nested csv to json', size: '16 KB', rows: 100 }
1:csv42              x 2,454 ops/sec ±0.32% (80 runs sampled)
3:csv (+flat)        x 306 ops/sec ±0.37% (84 runs sampled)
4:papaparse (+flat)  x 796 ops/sec ±2.38% (82 runs sampled)
5:fast-csv (+flat)   x 392 ops/sec ±0.31% (83 runs sampled)
6:json-2-csv         x 1,240 ops/sec ±0.44% (82 runs sampled)
7:udsv               x 5,627 ops/sec ±0.34% (83 runs sampled)

benchmark: { name: 'nested csv to json', size: '159 KB', rows: 1000 }
1:csv42              x 251 ops/sec ±0.23% (84 runs sampled)
3:csv (+flat)        x 32.71 ops/sec ±0.37% (68 runs sampled)
4:papaparse (+flat)  x 83.85 ops/sec ±1.91% (68 runs sampled)
5:fast-csv (+flat)   x 38.75 ops/sec ±0.32% (84 runs sampled)
6:json-2-csv         x 93.84 ops/sec ±26.50% (60 runs sampled)
7:udsv               x 527 ops/sec ±4.32% (65 runs sampled)

benchmark: { name: 'nested csv to json', size: '1.6 MB', rows: 10000 }
1:csv42              x 17.78 ops/sec ±6.37% (44 runs sampled)
3:csv (+flat)        x 2.34 ops/sec ±7.07% (16 runs sampled)
4:papaparse (+flat)  x 6.13 ops/sec ±6.13% (33 runs sampled)
5:fast-csv (+flat)   x 3.07 ops/sec ±6.68% (19 runs sampled)
6:json-2-csv         x 10.53 ops/sec ±4.33% (50 runs sampled)
7:udsv               x 51.29 ops/sec ±5.97% (60 runs sampled)

benchmark: { name: 'nested csv to json', size: '16 MB', rows: 100000 }
1:csv42              x 1.85 ops/sec ±3.40% (13 runs sampled)
3:csv (+flat)        x 0.22 ops/sec ±7.88% (6 runs sampled)
4:papaparse (+flat)  x 0.57 ops/sec ±6.24% (7 runs sampled)
5:fast-csv (+flat)   x 0.29 ops/sec ±3.60% (6 runs sampled)
6:json-2-csv         x 1.08 ops/sec ±4.52% (10 runs sampled)
7:udsv               x 5.28 ops/sec ±5.46% (29 runs sampled)

@leeoniya
Owner

leeoniya commented Sep 7, 2023

hmm, yeah, when i run in a simple loop for 200 cycles and grab the geometric mean i'm also seeing about 3x:

uDSV: 73 ops/s
csv42: 24 ops/s
Papa: 11 ops/s

will investigate!

i cannot run your benchmarks due to some ts-node/env issue:

[screenshot of the ts-node error]

@leeoniya
Owner

leeoniya commented Sep 7, 2023

okay, i figured out what it is. my bench loop has an await/setImmediate() statement between each cycle. it is there intentionally to allow the GC to collect garbage between cycles. this simulates a realistic load by not allowing junk to accumulate while you're running a hot loop without any opportunity for the GC to run. it looks like this:

function geoMean(arr) {
  let logSum = arr.reduce((acc, val) => acc + Math.log(val), 0);
  return Math.exp(logSum / arr.length);
}

const sleep = () => new Promise(resolve => setImmediate(resolve));

let cycles = 200;
let durs = [];
let rss = [];

(async () => {
  let i = 0;

  while (i++ < cycles) {
    let st = performance.now();
    let schema = inferSchema(csvStr);
    let parser = initParser(schema);
    await Promise.resolve(parser.typedDeep(csvStr));
    await sleep();                                   // let GC collect garbage between each cycle
    durs.push(performance.now() - st);
    rss.push(process.memoryUsage.rss());
  }

  console.log((1e3 / geoMean(durs)).toFixed(1) + ' ops/s', (Math.max(...rss) / 1024 / 1024).toFixed(1) + ' peak RSS (MiB)');
})();

if you comment out that await sleep() line, uDSV drops from 108 ops/s to 72 ops/s. this "GC break" also helps the other parsers, but to a much lesser extent. i don't know exactly why.

with sleep():

uDSV: 108.4 ops/s 101.7 peak RSS (MiB)
csv42: 26.5 ops/s 141.8 peak RSS (MiB)
Papa:   9.8 ops/s 176.0 peak RSS (MiB)

without sleep():

uDSV:  71.7 ops/s 172.7 peak RSS (MiB)
csv42: 24.7 ops/s 175.9 peak RSS (MiB)
Papa:  10.5 ops/s 197.1 peak RSS (MiB)

you can see that if we just allow garbage to accumulate, the high RSS has an outsized impact on performance in cases where the GC can theoretically be efficient if given the opportunity.

i think this is the correct way to run benchmarks. not because it makes uDSV win more, but because it represents real load patterns more accurately, except for the case (perhaps) of batch-processing a directory of CSV files in a loop. typically if you do this in an HTTP request handler, or invoke a script to do this one file at a time, the GC will have plenty of time to run between each workload.

here is the full code if you'd like to test it:

const fs = require('fs');
const csvStr = fs.readFileSync('./bench/data/csv42_nested_10k_dot.csv', 'utf8');

const Papa = require('papaparse');
const flat = require('flat');
const { inferSchema, initParser } = require('../dist/uDSV.cjs.js');
const { csv2json } = require('csv42');

function geoMean(arr) {
  let logSum = arr.reduce((acc, val) => acc + Math.log(val), 0);
  return Math.exp(logSum / arr.length);
}

const sleep = () => new Promise(resolve => setImmediate(resolve));

let cycles = 200;
let durs = [];
let rss = [];

(async () => {
  let i = 0;

  while (i++ < cycles) {
    let st = performance.now();
    let schema = inferSchema(csvStr);
    let parser = initParser(schema);
    await Promise.resolve(parser.typedDeep(csvStr));
    await sleep();
    durs.push(performance.now() - st);
    rss.push(process.memoryUsage.rss());
  }

  console.log((1e3 / geoMean(durs)).toFixed(1) + ' ops/s', (Math.max(...rss) / 1024 / 1024).toFixed(1) + ' peak RSS (MiB)');
})();

// (async () => {
//   let i = 0;

//   while (i++ < cycles) {
//     let st = performance.now();
//     await Promise.resolve(csv2json(csvStr, { nested: true, header: true }));
//     await sleep();
//     durs.push(performance.now() - st);
//     rss.push(process.memoryUsage.rss());
//   }

//   console.log((1e3 / geoMean(durs)).toFixed(1) + ' ops/s', (Math.max(...rss) / 1024 / 1024).toFixed(1) + ' peak RSS (MiB)');
// })();


// (async () => {
//   function transform(value) {
//     const number = Number(value)
//     if (!isNaN(number) && !isNaN(parseFloat(value))) {
//       return number;
//     }

//     if (value === 'true') {
//       return true;
//     }

//     if (value === 'false') {
//       return false;
//     }

//     if (value === 'null' || value === '') {
//       return null;
//     }

//     if (value[0] === '{' || value[0] === '[') {
//       return JSON.parse(value);
//     }

//     return value;
//   }

//   let i = 0;

//   while (i++ < cycles) {
//     let st = performance.now();
//     await Promise.resolve(Papa.parse(csvStr, { header: true, transform }).data.map(flat.unflatten));
//     await sleep();
//     durs.push(performance.now() - st);
//     rss.push(process.memoryUsage.rss());
//   }

//   console.log((1e3 / geoMean(durs)).toFixed(1) + ' ops/s', (Math.max(...rss) / 1024 / 1024).toFixed(1) + ' peak RSS (MiB)');
// })();

@josdejong
Author

josdejong commented Sep 7, 2023

i cannot run your benchmarks due to some ts-node/env issue:

yes, I encountered the same. It is broken when using Node.js 20; it works on 18. I have to fix that.

there intentionally to allow the GC to collect garbage between cycles. this simulates a realistic load by not allowing junk to accumulate while you're running a hot loop without any opportunity for the GC to run

Oh wow, that is really interesting. I totally agree that a good benchmark should test a real-world situation where you run the operation once. I was just testing with tinybench and benchmark.js (see here), which both give results similar to your output without sleep in between the tests. I'm quite surprised that popular benchmark libraries like benchmark.js and tinybench apparently don't do this right?!

test with benchmark.js
udsv x 69.02 ops/sec ±2.22% (72 runs sampled)
csv42 x 25.13 ops/sec ±0.61% (45 runs sampled)
papaparse+flat x 7.45 ops/sec ±3.86% (23 runs sampled)

test with tinybench
┌─────────┬──────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │    Task Name     │ ops/sec │ Average Time (ns)  │  Margin  │ Samples │
├─────────┼──────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│    0    │      'udsv'      │  '66'   │ 15095203.010659471 │ '±3.81%' │   133   │
│    1    │     'csv42'      │  '25'   │ 39896554.90950043  │ '±0.90%' │   51    │
│    2    │ 'papaparse+flat' │   '7'   │ 132883587.47959137 │ '±3.46%' │   16    │
└─────────┴──────────────────┴─────────┴────────────────────┴──────────┴─────────┘

Do you understand why running the operation just a single time, measuring with a console.time, gives quite different (but consistent) results? That's a true cold-start, but I'm not sure how close that is to "reality".

// run once, measure with console.time
udsv: 57.861ms
csv42: 47.727ms
papaparse+flat: 170.413ms

@josdejong
Author

josdejong commented Sep 7, 2023

The tinybench library has a beforeEach method that runs before each cycle. I added that with an async setImmediate and setTimeout, but that makes almost no difference in the results 🤔.

I'll give your code a try.

@leeoniya
Owner

leeoniya commented Sep 7, 2023

Do you understand why running the operation just a single time, measuring with a console.time, gives quite different (but consistent) results? That's a true cold-start, but I'm not sure how close that is to "reality".

you need enough cycles to warm up the JIT. first run will always be slower by 2x-10x.

@josdejong
Author

josdejong commented Sep 7, 2023

That makes sense indeed.

Running your benchmark code on my machine gives more or less the same results as with other benchmark libraries:

udsv 67.2 ops/s 123.6 peak RSS (MiB)
csv42 24.3 ops/s 95.9 peak RSS (MiB)
papaparse+flat 6.9 ops/s 145.8 peak RSS (MiB)

I added the code here: https://github.com/josdejong/csv-benchmark/blob/main/benchmark-custom.js

Maybe this has to do with OS/Node.js version etc? I'm running Windows 11, and tested with Node.js 18 and 20.

Edit: running on Ubuntu (Windows WSL) gives comparable results too:

udsv 75.2 ops/s 133.0 peak RSS (MiB)
csv42 27.4 ops/s 110.9 peak RSS (MiB)
papaparse+flat 7.8 ops/s 171.1 peak RSS (MiB)

@leeoniya
Owner

leeoniya commented Sep 7, 2023

totally probable. could be CPU differences, branch prediction, etc. it's the wild west!

@josdejong
Author

Running on GitHub Actions gives the same kind of result (not a dedicated machine I guess, so we need to take it with a grain of salt, but the values have a similar ratio and are reproducible).

So what OS/machine are you using?

It's quite odd that this 1.5x speed increase for udsv when applying sleep only happens on your machine, and is not yet reproducible elsewhere.

@leeoniya
Owner

leeoniya commented Sep 7, 2023

i agree it's weird.

my env is here: https://github.com/leeoniya/uDSV/tree/main/bench#environment

@josdejong
Author

Can you reproduce the 1.5x speed increase on another machine?

A good benchmark should be reproducible (if it's not reproducible, it's not very trustworthy).

@leeoniya
Owner

leeoniya commented Sep 7, 2023

can you first run all benchmarks in this repo on your existing machine, instead of just the csv42 dataset benchmarks?

after you do this and we see significant differences across multiple benchmarks i'll do more work.

i have an Intel i7 / Windows 10 machine i can try without WSL2. i can also try this in a Debian Unstable Linode.

50% fluctuation between hardware and architectures and environments is not unexpected. it is likely impossible to reproduce the same results across everything, because everything is different.

i would not consider these benchmarks invalid simply because one library goes from No 10 to No 6 on different hardware. if uDSV ends up being No 5 instead of No 1, then that would be problematic; i don't want to tell people it's the fastest when it isn't. by how much is a more subtle and variable answer.

@josdejong (Author) commented Sep 7, 2023

Yes, I was indeed trying to run uDSV/bench, but that requires getting all the data files in place. I'll give it a try tomorrow; I hope I'll get it running.

The performance will indeed differ on different machines, though I would expect a more or less similar ratio.

To me, how much faster a library is is definitely relevant. If some library is only faster by "a bit", I don't have a big reason to switch (switching costs time and effort, etc.). But if it is a factor of 5 or 10, it is definitely worth considering.

@leeoniya (Owner) commented Sep 7, 2023

> If some library is only faster by "a bit", I don't have a big reason to switch (switching costs time and effort, etc.). But if it is a factor of 5 or 10, it is definitely worth considering.

except that csv42 is nowhere near "a bit" slower. i don't think you're going to find any hardware on earth that will materially change this. we can revisit this discussion when the deficit is 33%, not 66% or 75%.

looking forward to your runs. if i have time today i'll try the other devices. it may very well be "only" 3x.

@leeoniya (Owner) commented Sep 7, 2023

btw, you don't need to waste time benchmarking the obviously-slow libs; sometimes there's just no hope.

for typed parsing, i think we can make some educated guesses from just these:

uDSV
csv42
d3-dsv
csv-simple-parser
achilles-csv-parser
PapaParse
csv-rex
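For context on what "typed" means in these tables: a typed parser coerces each string cell into a real JS value (number, boolean, null) instead of leaving everything as strings. A minimal sketch, with a hypothetical `coerce` helper that is not any of these libraries' actual implementation:

```javascript
// Coerce a single CSV cell from string to a typed JS value.
// Hypothetical helper for illustration only.
function coerce(cell) {
  if (cell === '') return null;
  if (cell === 'true') return true;
  if (cell === 'false') return false;
  const num = Number(cell);
  return Number.isNaN(num) ? cell : num;
}

// an untyped row of strings...
const raw = ['00602', '18.36075', 'true', '', 'Aguada'];
// ...becomes a typed row (note how the zip code loses its leading zeros)
const typed = raw.map(coerce); // [602, 18.36075, true, null, 'Aguada']
```

This per-cell work is why the typed tables show lower rows/s than the untyped ones for the same files.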

@leeoniya (Owner) commented Sep 7, 2023

here is my Win 10 / i7 Desktop (showing 3.5x)


┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ csv42_nested_10k_dot.csv (1.6 MB, 9 cols x 10K rows)                                                                                                                                                           │
├─────────────────────┬────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────┬────────────────────────────┬────────────────────────────────────────────────────┤
│ Name                │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 28 MiB baseline (MiB) │ Types                      │ Sample                                             │
├─────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────┼────────────────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed deep {}  │ 812K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 127 │ ░░░░░░░░░░░░░░ 57.4             │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ csv42 typed deep {} │ 230K   │ ░░░░░░░░░░░░░░░░ 36.2                                       │ ░░░░░░░░░░░░░░░░ 64.2           │ number,object,string       │ [{"_type":"item","name":"Item 2","description":"It │
│ PapaParse deep {}   │ 88.6K  │ ░░░░░░░ 13.9                                                │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 113 │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
└─────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────┴────────────────────────────┴────────────────────────────────────────────────────┘

@josdejong (Author) commented Sep 8, 2023

OK, I let my computer do some crunching on most of the benchmarks (I skipped the streaming ones). Most of them are roughly comparable to your outcomes, which gives me confidence. As for the nested objects: your Linux machine gets a factor-4.5 difference, your Windows machine 3.5, my Windows machine 4, and my benchmark with Benchmark.js a factor of 2.5. I'm still surprised that the relative differences can vary that much.

Sorry if I overreacted yesterday; I was triggered by the benchmarks giving such wildly different results from what I had seen before with my own benchmarks (putting the library that I built with love in a bad spotlight). A large part of that is because of mixing object/array formats in a single benchmark, and part of it is apparently a combination of the hardware and the benchmarking tool used.

I still don't understand why you find it important to differentiate between typed/untyped but not between object/array format (both are optional processing steps, and the latter has much more performance impact). I hope you'll split them someday (I like the typed {} and typed [] suffixes that are used here and there, that helps already). Anyway, I'll leave it at this. Benchmarks cost way too much time 😉.

Computer: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (max 4.20GHz), 16 GB RAM, 64-bit, Windows 11 Home, Node.js 18.
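The object/array distinction mentioned above can be sketched with a toy parser (a naive `split`, ignoring quoting; purely illustrative, not how csv42 or PapaParse work internally):

```javascript
// Toy line parser: good enough to show the output-shape difference,
// not a real CSV parser (no quoting/escaping support).
function parseLine(line) {
  return line.split(',');
}

const header = parseLine('zip,lat,lng');
const cells = parseLine('00602,18.36075,-67.17541');

// array format: the split result is already the row
const asArray = cells;

// object format: one extra object allocation plus a keyed write
// per cell — this is where the "{}" benchmarks pay their cost
const asObject = {};
header.forEach((key, i) => { asObject[key] = cells[i]; });
```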

### Sensors Time Series (untyped parsers)

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ data-large2.csv (36 MB, 37 cols x 130K rows)                                                                                                                                                    │
├────────────────────────┬────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────────┬────────┬────────────────────────────────────────────────────┤
│ Name                   │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 268 MiB baseline (MiB)  │ Types  │ Sample                                             │
├────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼───────────────────────────────────┼────────┼────────────────────────────────────────────────────┤
│ uDSV                   │ 449K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 124 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.85K  │ string │ [["1370045100","4869044.81","4630605.41","382.8270 │
│ csv-simple-parser      │ 399K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 111       │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.87K  │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ d3-dsv                 │ 306K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 84.8                 │ ░░░░░░░░░░░░░░░░░░░░ 1.4K         │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ PapaParse              │ 284K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 78.6                    │ ░░░░░░░░░░░░░░░░░░░ 1.34K         │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ but-csv                │ 271K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 75.2                     │ ░░░░░░░░░░░░░░░░░░░ 1.36K         │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ ACsv                   │ 215K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59.6                            │ ░░░░░░░░░░░░░░░ 1.05K             │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ csv-rex                │ 210K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 58.3                             │ ░░░░░░░░░░░░░░░ 1.05K             │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ comma-separated-values │ 156K   │ ░░░░░░░░░░░░░░░░░░░░ 43.3                                   │ ░░░░░░░░░░░░░░░░░ 1.24K           │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ csv42                  │ 129K   │ ░░░░░░░░░░░░░░░░ 35.8                                       │ ░░░░░░░░░░░░░░░░░░░ 1.33K         │ string │ [{"A":"1370045100","B":"4869044.81","C":"4630605.4 │
│ CSVtoJSON              │ 123K   │ ░░░░░░░░░░░░░░░░ 34.2                                       │ ░░░░░░░░░░░░░░░░░░ 1.31K          │ string │ [{"A":"1370045100","B":"4869044.81","C":"4630605.4 │
│ SheetJS                │ 120K   │ ░░░░░░░░░░░░░░░ 33.3                                        │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.98K │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ node-csvtojson         │ 108K   │ ░░░░░░░░░░░░░░ 30                                           │ ░░░░░░░░░░░░ 845                  │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ achilles-csv-parser    │ 105K   │ ░░░░░░░░░░░░░ 29.1                                          │ ░░░░░░░░░░░░░░░░░░░ 1.33K         │ string │ [{"A":"1370045100","B":"4869044.81","C":"4630605.4 │
│ dekkai                 │ 103K   │ ░░░░░░░░░░░░░ 28.5                                          │ ░░░░░░░ 491                       │ string │ [["1370045100","4869044.81","4630605.41","382.8270 │
│ @vanillaes/csv         │ 101K   │ ░░░░░░░░░░░░░ 28.1                                          │ ░░░░░░░░░░ 706                    │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ @gregoranders/csv      │ 72.3K  │ ░░░░░░░░░ 20.1                                              │ ░░░░░░░░ 534                      │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ csv-js                 │ 71.2K  │ ░░░░░░░░░ 19.7                                              │ ░░░░░░░░ 534                      │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ csv-parser (neat-csv)  │ 70.6K  │ ░░░░░░░░░ 19.6                                              │ ░░░░░░░░░░░ 759                   │ string │ [{"A":"1370045100","B":"4869044.81","C":"4630605.4 │
│ csv-parse/sync         │ 47K    │ ░░░░░░ 13                                                   │ ░░░░░░░░░ 627                     │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ jquery-csv             │ 33K    │ ░░░░░ 9.15                                                  │ ░░░░░░░░░░░░░░░░░░░░░░░ 1.64K     │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
│ @fast-csv/parse        │ 25.5K  │ ░░░░ 7.06                                                   │ ░░░░░░░░░░░░ 845                  │ string │ [{"A0":"1370045100","B1":"4869044.81","C2":"463060 │
│ utils-dsv-base-parse   │ 22.7K  │ ░░░ 6.29                                                    │ ░░░░░ 347                         │ string │ [["1370044800","4819440.062","4645092.555","382.84 │
└────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴───────────────────────────────────┴────────┴────────────────────────────────────────────────────┘

### USA ZIP Codes (untyped parsers)

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ uszips.csv (6 MB, 18 cols x 34K rows)                                                                                                                                                           │
├────────────────────────┬────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────────┬────────┬────────────────────────────────────────────────────┤
│ Name                   │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 98 MiB baseline (MiB)   │ Types  │ Sample                                             │
├────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼───────────────────────────────────┼────────┼────────────────────────────────────────────────────┤
│ uDSV                   │ 598K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 107 │ ░░░░░░░ 339                       │ string │ [["00602","18.36075","-67.17541","Aguada","PR","Pu │
│ csv-simple-parser      │ 495K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 88.4         │ ░░░░░░ 281                        │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ d3-dsv                 │ 317K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 56.6                         │ ░░░░░░░░░ 419                     │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ achilles-csv-parser    │ 303K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 54.1                           │ ░░░░░░ 260                        │ string │ [{"zip":"00602","lat":"18.36075","lng":"-67.17541" │
│ csv-rex                │ 222K   │ ░░░░░░░░░░░░░░░░░░░░░ 39.6                                  │ ░░░░░░ 290                        │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ csv42                  │ 208K   │ ░░░░░░░░░░░░░░░░░░░░ 37.2                                   │ ░░░░░░ 260                        │ string │ [{"zip":"00602","lat":"18.36075","lng":"-67.17541" │
│ dekkai                 │ 195K   │ ░░░░░░░░░░░░░░░░░░ 34.8                                     │ ░░░░░░░░░ 425                     │ string │ [["00602","18.36075","-67.17541","Aguada","PR","Pu │
│ PapaParse              │ 188K   │ ░░░░░░░░░░░░░░░░░░ 33.6                                     │ ░░░░░░░░ 388                      │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ comma-separated-values │ 179K   │ ░░░░░░░░░░░░░░░░░ 32                                        │ ░░░░░ 238                         │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ SheetJS                │ 166K   │ ░░░░░░░░░░░░░░░░ 29.7                                       │ ░░░░░░░░░░ 452                    │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ csv-parser (neat-csv)  │ 146K   │ ░░░░░░░░░░░░░░ 26.2                                         │ ░░░░░ 224                         │ string │ [{"zip":"00602","lat":"18.36075","lng":"-67.17541" │
│ csv-js                 │ 145K   │ ░░░░░░░░░░░░░░ 25.9                                         │ ░░░░░░░░░░ 453                    │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ ACsv                   │ 136K   │ ░░░░░░░░░░░░░ 24.4                                          │ ░░░░░ 233                         │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ @vanillaes/csv         │ 136K   │ ░░░░░░░░░░░░░ 24.3                                          │ ░░░░░░░ 337                       │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ CSVtoJSON              │ 118K   │ ░░░░░░░░░░░ 21.1                                            │ ░░░░░░░ 320                       │ string │ [{"\"zip\"":"00602","\"lat\"":"18.36075","\"lng\"" │
│ node-csvtojson         │ 117K   │ ░░░░░░░░░░░ 20.8                                            │ ░░░░░░░░░ 429                     │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ csv-parse/sync         │ 75.9K  │ ░░░░░░░ 13.6                                                │ ░░░░ 178                          │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ @fast-csv/parse        │ 45.2K  │ ░░░░░ 8.08                                                  │ ░░░░ 161                          │ string │ [{"zip0":"00602","lat1":"18.36075","lng2":"-67.175 │
│ jquery-csv             │ 37.1K  │ ░░░░ 6.62                                                   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.32K │ string │ [["00601","18.18027","-66.75266","Adjuntas","PR"," │
│ but-csv                │ ---    │ Wrong row count! Expected: 33790, Actual: 1                 │ ---                               │ ---    │ ---                                                │
│ @gregoranders/csv      │ ---    │ Invalid CSV at 1:109                                        │ ---                               │ ---    │ ---                                                │
│ utils-dsv-base-parse   │ ---    │ unexpected error. Encountered an invalid record. Field 17 o │ ---                               │ ---    │ ---                                                │
└────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴───────────────────────────────────┴────────┴────────────────────────────────────────────────────┘

### House Price Index (untyped parsers)

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ HPI_master.csv (10 MB, 10 cols x 120K rows)                                                                                                                                                     │
├────────────────────────┬────────┬─────────────────────────────────────────────────────────────┬───────────────────────────────────┬────────┬────────────────────────────────────────────────────┤
│ Name                   │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 126 MiB baseline (MiB)  │ Types  │ Sample                                             │
├────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼───────────────────────────────────┼────────┼────────────────────────────────────────────────────┤
│ uDSV                   │ 1.38M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 116 │ ░░░░░░░░░░ 543                    │ string │ [["traditional","purchase-only","monthly","USA or  │
│ csv-simple-parser      │ 1.29M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 108    │ ░░░░░░░░░░░ 584                   │ string │ [["traditional","purchase-only","monthly","USA or  │
│ d3-dsv                 │ 1.08M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 90.9           │ ░░░░░░░░░░░░░ 680                 │ string │ [["traditional","purchase-only","monthly","USA or  │
│ but-csv                │ 974K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 81.9                │ ░░░░░░░░░░░ 557                   │ string │ [["traditional","purchase-only","monthly","USA or  │
│ PapaParse              │ 943K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 79.2                 │ ░░░░░░░░░░░░ 633                  │ string │ [["traditional","purchase-only","monthly","USA or  │
│ csv-rex                │ 760K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 63.9                        │ ░░░░░░░░░ 461                     │ string │ [["traditional","purchase-only","monthly","USA or  │
│ ACsv                   │ 751K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 63.1                         │ ░░░░░░░░░░░░░ 683                 │ string │ [["traditional","purchase-only","monthly","USA or  │
│ csv42                  │ 715K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 60.1                          │ ░░░░░░░░░░ 495                    │ string │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ achilles-csv-parser    │ 589K   │ ░░░░░░░░░░░░░░░░░░░░░░░░ 49.5                               │ ░░░░░░░░░░ 528                    │ string │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ comma-separated-values │ 477K   │ ░░░░░░░░░░░░░░░░░░░░ 40.1                                   │ ░░░░░░░░░░ 549                    │ string │ [["traditional","purchase-only","monthly","USA or  │
│ SheetJS                │ 422K   │ ░░░░░░░░░░░░░░░░░ 35.4                                      │ ░░░░░░░░░░░░░░░░░ 901             │ string │ [["traditional","purchase-only","monthly","USA or  │
│ @vanillaes/csv         │ 365K   │ ░░░░░░░░░░░░░░░ 30.7                                        │ ░░░░░░░░ 411                      │ string │ [["traditional","purchase-only","monthly","USA or  │
│ dekkai                 │ 333K   │ ░░░░░░░░░░░░░░ 28                                           │ ░░░░░░░░░░ 521                    │ string │ [["traditional","purchase-only","monthly","USA or  │
│ node-csvtojson         │ 311K   │ ░░░░░░░░░░░░░ 26.1                                          │ ░░░░░░░░░░░ 594                   │ string │ [["traditional","purchase-only","monthly","USA or  │
│ csv-parser (neat-csv)  │ 286K   │ ░░░░░░░░░░░░ 24                                             │ ░░░░░░░ 350                       │ string │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ CSVtoJSON              │ 271K   │ ░░░░░░░░░░░ 22.8                                            │ ░░░░░░░░░░░ 602                   │ string │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ csv-js                 │ 265K   │ ░░░░░░░░░░░ 22.2                                            │ ░░░░░░░░░░░░░ 692                 │ string │ [["traditional","purchase-only","monthly","USA or  │
│ @gregoranders/csv      │ 214K   │ ░░░░░░░░░ 18                                                │ ░░░░░░░░░░░ 589                   │ string │ [["traditional","purchase-only","monthly","USA or  │
│ csv-parse/sync         │ 154K   │ ░░░░░░░ 12.9                                                │ ░░░░ 205                          │ string │ [["traditional","purchase-only","monthly","USA or  │
│ jquery-csv             │ 136K   │ ░░░░░░ 11.4                                                 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.49K │ string │ [["traditional","purchase-only","monthly","USA or  │
│ @fast-csv/parse        │ 87.5K  │ ░░░░ 7.35                                                   │ ░░░░░░░░ 437                      │ string │ [{"hpi_type0":"traditional","hpi_flavor1":"purchas │
│ utils-dsv-base-parse   │ 83.1K  │ ░░░░ 6.99                                                   │ ░░░░ 195                          │ string │ [["traditional","purchase-only","monthly","USA or  │
└────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴───────────────────────────────────┴────────┴────────────────────────────────────────────────────┘

### Earthquakes (untyped parsers)

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ large-dataset.csv (1.1 MB, 15 cols x 7.3K rows)                                                                                                                                               │
├────────────────────────┬────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────┬────────┬────────────────────────────────────────────────────┤
│ Name                   │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 43 MiB baseline (MiB) │ Types  │ Sample                                             │
├────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────┼────────┼────────────────────────────────────────────────────┤
│ uDSV                   │ 1.64M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 245 │ ░░░░░ 40.1                      │ string │ [["2015-12-22T18:38:34.000Z","62.9616","-148.7532" │
│ csv-simple-parser      │ 1.48M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 221      │ ░░░░░░ 43.8                     │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ but-csv                │ 1.03M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 153                     │ ░░░░░░ 41.7                     │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ d3-dsv                 │ 870K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 130                          │ ░░░░░ 34.7                      │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ ACsv                   │ 675K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 101                                 │ ░░░░░ 36.9                      │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ PapaParse              │ 584K   │ ░░░░░░░░░░░░░░░░░░░░ 87.3                                   │ ░░░░░░░░░░░░░ 105               │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ csv-rex                │ 532K   │ ░░░░░░░░░░░░░░░░░░ 79.6                                     │ ░░░░░░░░░░░░░░░░ 124            │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ csv42                  │ 523K   │ ░░░░░░░░░░░░░░░░░░ 78.2                                     │ ░░░░░░ 42.7                     │ string │ [{"time":"2015-12-22T18:38:34.000Z","latitude":"62 │
│ comma-separated-values │ 413K   │ ░░░░░░░░░░░░░░ 61.8                                         │ ░░░░░░ 43.3                     │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ achilles-csv-parser    │ 405K   │ ░░░░░░░░░░░░░░ 60.6                                         │ ░░░░░ 40                        │ string │ [{"time":"2015-12-22T18:38:34.000Z","latitude":"62 │
│ node-csvtojson         │ 364K   │ ░░░░░░░░░░░░░ 54.4                                          │ ░░░░░░░░░ 71.6                  │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ SheetJS                │ 281K   │ ░░░░░░░░░░ 41.9                                             │ ░░░░░░░░░░░░░░░░░░░ 147         │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ @vanillaes/csv         │ 242K   │ ░░░░░░░░░ 36.1                                              │ ░░░░░░░░░░░░░░ 107              │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ csv-parser (neat-csv)  │ 176K   │ ░░░░░░ 26.3                                                 │ ░░░░░░░░░░░░░░░░ 123            │ string │ [{"time":"2015-12-22T18:38:34.000Z","latitude":"62 │
│ dekkai                 │ 171K   │ ░░░░░░ 25.6                                                 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 220 │ string │ [["2015-12-22T18:38:34.000Z","62.9616","-148.7532" │
│ csv-js                 │ 139K   │ ░░░░░ 20.7                                                  │ ░░░░░░░░░░░░░░░░░░░░░░░ 181     │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ CSVtoJSON              │ 128K   │ ░░░░░ 19.2                                                  │ ░░░░░░░░░░░░░░░░░░░░░░░ 182     │ string │ [{"time":"2015-12-22T18:38:34.000Z","latitude":"62 │
│ @gregoranders/csv      │ 113K   │ ░░░░ 16.8                                                   │ ░░░░░░░░░░░░░░░░░░░░░ 166       │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ csv-parse/sync         │ 100K   │ ░░░░ 15                                                     │ ░░░░░░ 46.4                     │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ jquery-csv             │ 93.7K  │ ░░░░ 14                                                     │ ░░░░░░░░░░░░░░░░░░░░ 162        │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
│ @fast-csv/parse        │ 60.5K  │ ░░░ 9.04                                                    │ ░░░░░░░░░░░░ 95.8               │ string │ [{"time0":"2015-12-22T18:38:34.000Z","latitude1":" │
│ utils-dsv-base-parse   │ 50.1K  │ ░░ 7.49                                                     │ ░░░░░ 40                        │ string │ [["2015-12-22T18:45:11.000Z","59.9988","-152.7191" │
└────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────┴────────┴────────────────────────────────────────────────────┘

### Sensors Time Series (typed parsers)

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ data-large2.csv (36 MB, 37 cols x 130K rows)                                                                                                                                                              │
├─────────────────────────────────┬────────┬──────────────────────────────────────────────────────────────┬───────────────────────────────────┬────────┬────────────────────────────────────────────────────┤
│ Name                            │ Rows/s │ Throughput (MiB/s)                                           │ RSS above 253 MiB baseline (MiB)  │ Types  │ Sample                                             │
├─────────────────────────────────┼────────┼──────────────────────────────────────────────────────────────┼───────────────────────────────────┼────────┼────────────────────────────────────────────────────┤
│ uDSV typed []                   │ 277K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 76.9 │ ░░░░░ 304                         │ number │ [[1370045100,4869044.81,4630605.41,382.8270592,382 │
│ csv-simple-parser typed []      │ 216K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59.8             │ ░░░░░░ 352                        │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ csv-rex typed []                │ 160K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 44.3                        │ ░░░░░ 280                         │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ comma-separated-values typed {} │ 144K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 39.8                           │ ░░░░░░░░░░░░░ 865                 │ number │ [{"A":1370045100,"B":4869044.81,"C":4630605.41,"D" │
│ d3-dsv typed []                 │ 129K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 35.9                              │ ░░░░░ 289                         │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ csv42 typed {}                  │ 100K   │ ░░░░░░░░░░░░░░░░░░░░ 27.8                                    │ ░░░░░░░░░░░░░░░ 1.01K             │ number │ [{"A":1370045100,"B":4869044.81,"C":4630605.41,"D" │
│ dekkai typed []                 │ 98.5K  │ ░░░░░░░░░░░░░░░░░░░░ 27.3                                    │ ░░░░ 249                          │ number │ [[1370045100,4869044.81,4630605.41,382.8270592,382 │
│ achilles-csv-parser typed {}    │ 83.6K  │ ░░░░░░░░░░░░░░░░░ 23.2                                       │ ░░░░░░░░░░░░ 762                  │ number │ [{"A":1370045100,"B":4869044.81,"C":4630605.41,"D" │
│ @vanillaes/csv typed []         │ 83.6K  │ ░░░░░░░░░░░░░░░░░ 23.2                                       │ ░░░ 177                           │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ SheetJS typed {}                │ 79.9K  │ ░░░░░░░░░░░░░░░░ 22.2                                        │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1.84K │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ CSVtoJSON typed {}              │ 79.4K  │ ░░░░░░░░░░░░░░░░ 22                                          │ ░░░░░░░░░░░░ 770                  │ number │ [{"A":1370045100,"B":4869044.81,"C":4630605.41,"D" │
│ PapaParse typed []              │ 76K    │ ░░░░░░░░░░░░░░░░ 21.1                                        │ ░░░░░░░░░░ 619                    │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ csv-parser (neat-csv) typed {}  │ 58.7K  │ ░░░░░░░░░░░░ 16.3                                            │ ░░░░░░░░░░░░░ 847                 │ number │ [{"A":1370045100,"B":4869044.81,"C":4630605.41,"D" │
│ csv-js typed []                 │ 55.1K  │ ░░░░░░░░░░░ 15.3                                             │ ░░░░ 260                          │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
│ csv-parse/sync typed []         │ 35.4K  │ ░░░░░░░░ 9.82                                                │ ░░░ 150                           │ number │ [[1370044800,4819440.062,4645092.555,382.8436706,3 │
└─────────────────────────────────┴────────┴──────────────────────────────────────────────────────────────┴───────────────────────────────────┴────────┴────────────────────────────────────────────────────┘

### USA ZIP Codes (typed parsers)

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ uszips.csv (6 MB, 18 cols x 34K rows)                                                                                                                                                                                              │
├─────────────────────────────────┬────────┬──────────────────────────────────────────────────────────────┬─────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────────────────────┤
│ Name                            │ Rows/s │ Throughput (MiB/s)                                           │ RSS above 98 MiB baseline (MiB) │ Types                             │ Sample                                             │
├─────────────────────────────────┼────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────┼───────────────────────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed []                   │ 404K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 72.2 │ ░░░░░░░░░░░░░░░░░░ 259          │ boolean,null,number,object,string │ [[602,18.36075,-67.17541,"Aguada","PR","Puerto Ric │
│ csv-simple-parser typed []      │ 285K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50.8                 │ ░░░░░░░░░░░░░░░░░ 249           │ boolean,null,number,object,string │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ achilles-csv-parser typed {}    │ 226K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 40.4                         │ ░░░░░░░░░░░░░░░░░ 247           │ boolean,null,number,object,string │ [{"zip":602,"lat":18.36075,"lng":-67.17541,"city": │
│ comma-separated-values typed {} │ 185K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 33                                │ ░░░░░░░░░░░░░░░░ 230            │ number,string                     │ [{"zip":602,"lat":18.36075,"lng":-67.17541,"city": │
│ d3-dsv typed []                 │ 172K   │ ░░░░░░░░░░░░░░░░░░░░░░░░ 30.6                                │ ░░░░░░░░░░░░░░░░░ 249           │ null,number,string                │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ dekkai typed []                 │ 168K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 30.1                                 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 396 │ NaN,number,string                 │ [[602,18.36075,-67.17541,"Aguada","PR","Puerto Ric │
│ csv-rex typed []                │ 163K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 29.1                                 │ ░░░░░░░░░░░░░░░░░ 247           │ boolean,null,number,object,string │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ csv42 typed {}                  │ 142K   │ ░░░░░░░░░░░░░░░░░░░░ 25.4                                    │ ░░░░░░░░░░░░░░░░ 233            │ number,object,string              │ [{"zip":602,"lat":18.36075,"lng":-67.17541,"city": │
│ csv-js typed []                 │ 117K   │ ░░░░░░░░░░░░░░░░ 20.9                                        │ ░░░░░░░░░░░░░░░ 208             │ boolean,number,string             │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ csv-parser (neat-csv) typed {}  │ 116K   │ ░░░░░░░░░░░░░░░░ 20.7                                        │ ░░░░░░░░░░░░░░ 194              │ boolean,null,number,object,string │ [{"zip":602,"lat":18.36075,"lng":-67.17541,"city": │
│ PapaParse typed []              │ 111K   │ ░░░░░░░░░░░░░░░░ 19.8                                        │ ░░░░░░░░░░░░░░░░░ 243           │ boolean,null,number,string        │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ @vanillaes/csv typed []         │ 108K   │ ░░░░░░░░░░░░░░░ 19.4                                         │ ░░░░░░░░░░░░░ 180               │ NaN,number,string                 │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ CSVtoJSON typed {}              │ 102K   │ ░░░░░░░░░░░░░░ 18.2                                          │ ░░░░░░░░░░░░░░ 202              │ number,string                     │ [{"\"zip\"":602,"\"lat\"":18.36075,"\"lng\"":-67.1 │
│ SheetJS typed {}                │ 72.4K  │ ░░░░░░░░░░ 12.9                                              │ ░░░░░░░░░░░░░░░░░░░░░░░░░ 358   │ boolean,number,string             │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
│ csv-parse/sync typed []         │ 12.5K  │ ░░ 2.23                                                      │ ░░░░░░░░░░░░░░░░░░░░░ 293       │ number,string                     │ [[601,18.18027,-66.75266,"Adjuntas","PR","Puerto R │
└─────────────────────────────────┴────────┴──────────────────────────────────────────────────────────────┴─────────────────────────────────┴───────────────────────────────────┴────────────────────────────────────────────────────┘

### House Price Index (typed parsers)

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ HPI_master.csv (10 MB, 10 cols x 120K rows)                                                                                                                                                                         │
├─────────────────────────────────┬────────┬──────────────────────────────────────────────────────────────┬──────────────────────────────────┬───────────────────┬────────────────────────────────────────────────────┤
│ Name                            │ Rows/s │ Throughput (MiB/s)                                           │ RSS above 127 MiB baseline (MiB) │ Types             │ Sample                                             │
├─────────────────────────────────┼────────┼──────────────────────────────────────────────────────────────┼──────────────────────────────────┼───────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed []                   │ 1.11M  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 93.6 │ ░░░░░░░░░░░░░░░░░░░░░░░░░ 598    │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ csv-simple-parser typed []      │ 688K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 57.8                      │ ░░░░░░░░░░░░░░░░░░░ 442          │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ csv-rex typed []                │ 549K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 46.1                            │ ░░░░░░░░░░░░░░░░░░░░░░ 512       │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ csv42 typed {}                  │ 510K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 42.9                              │ ░░░░░░░░░░░░░░░ 348              │ number,string     │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ achilles-csv-parser typed {}    │ 459K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 38.5                                 │ ░░░░░░░░░░░░░ 311                │ number,string     │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ comma-separated-values typed {} │ 449K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 37.7                                 │ ░░░░░░░░░░░░░░░░░░░░░░ 517       │ number,string     │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ d3-dsv typed []                 │ 441K   │ ░░░░░░░░░░░░░░░░░░░░░░ 37.1                                  │ ░░░░░░░░░░░░░░░░ 374             │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ PapaParse typed []              │ 348K   │ ░░░░░░░░░░░░░░░░░░ 29.3                                      │ ░░░░░░░░░░░░░░░░ 368             │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ dekkai typed []                 │ 308K   │ ░░░░░░░░░░░░░░░░ 25.9                                        │ ░░░░░░░░░░░░░░░░░░░░░ 498        │ NaN,number,string │ [["traditional","purchase-only","monthly","USA or  │
│ @vanillaes/csv typed []         │ 278K   │ ░░░░░░░░░░░░░░ 23.4                                          │ ░░░░░░░░░░░ 245                  │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ csv-parser (neat-csv) typed {}  │ 238K   │ ░░░░░░░░░░░░ 20                                              │ ░░░░░░░░░ 204                    │ number,string     │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ CSVtoJSON typed {}              │ 236K   │ ░░░░░░░░░░░░ 19.8                                            │ ░░░░░░░░░░ 227                   │ number,string     │ [{"hpi_type":"traditional","hpi_flavor":"purchase- │
│ csv-js typed []                 │ 214K   │ ░░░░░░░░░░░ 18                                               │ ░░░░░░░░░ 200                    │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ SheetJS typed {}                │ 150K   │ ░░░░░░░░ 12.6                                                │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 654  │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
│ csv-parse/sync typed []         │ 24.4K  │ ░░ 2.05                                                      │ ░░░░░░░░░░░░░░░░░░░░ 471         │ number,string     │ [["traditional","purchase-only","monthly","USA or  │
└─────────────────────────────────┴────────┴──────────────────────────────────────────────────────────────┴──────────────────────────────────┴───────────────────┴────────────────────────────────────────────────────┘

### Earthquakes (typed parsers)

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ large-dataset.csv (1.1 MB, 15 cols x 7.3K rows)                                                                                                                                                                          │
├─────────────────────────────────┬────────┬──────────────────────────────────────────────────────────────┬─────────────────────────────────┬─────────────────────────┬────────────────────────────────────────────────────┤
│ Name                            │ Rows/s │ Throughput (MiB/s)                                           │ RSS above 44 MiB baseline (MiB) │ Types                   │ Sample                                             │
├─────────────────────────────────┼────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────┼─────────────────────────┼────────────────────────────────────────────────────┤
│ csv-simple-parser typed []      │ 497K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 74.3 │ ░░░░░░ 37.6                     │ null,number,string      │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ uDSV typed []                   │ 488K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 72.9  │ ░░░░░░░░ 46.4                   │ date,null,number,string │ [["2015-12-22T18:38:34.000Z",62.9616,-148.7532,65. │
│ comma-separated-values typed {} │ 340K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50.9                  │ ░░░░░░░ 43.9                    │ number,string           │ [{"time":"2015-12-22T18:38:34.000Z","latitude":62. │
│ csv-rex typed []                │ 331K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 49.4                   │ ░░░░░░░░░░░░░░░░ 97             │ null,number,string      │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ csv42 typed {}                  │ 298K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 44.6                      │ ░░░░░░░ 38.9                    │ null,number,string      │ [{"time":"2015-12-22T18:38:34.000Z","latitude":62. │
│ achilles-csv-parser typed {}    │ 263K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 39.3                          │ ░░░░░░░ 41                      │ null,number,string      │ [{"time":"2015-12-22T18:38:34.000Z","latitude":62. │
│ d3-dsv typed []                 │ 201K   │ ░░░░░░░░░░░░░░░░░░░░░░░ 30.1                                 │ ░░░░░░░░ 46.7                   │ date,null,number,string │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ PapaParse typed []              │ 165K   │ ░░░░░░░░░░░░░░░░░░░ 24.6                                     │ ░░░░░░░░░░░░░░░░░░░░ 125        │ date,null,number,string │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ @vanillaes/csv typed []         │ 160K   │ ░░░░░░░░░░░░░░░░░░ 23.9                                      │ ░░░░░░░░░░░░░░░ 91.5            │ NaN,number,string       │ [[2015,59.9988,-152.7191,100,3,"ml",null,null,null │
│ dekkai typed []                 │ 151K   │ ░░░░░░░░░░░░░░░░░ 22.5                                       │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 174 │ NaN,number,string       │ [["2015-12-22T18:38:34.000Z",62.9616,-148.7532,65. │
│ csv-parser (neat-csv) typed {}  │ 140K   │ ░░░░░░░░░░░░░░░░ 20.9                                        │ ░░░░░░░░░░░░░░░░░░ 114          │ null,number,string      │ [{"time":"2015-12-22T18:38:34.000Z","latitude":62. │
│ CSVtoJSON typed {}              │ 113K   │ ░░░░░░░░░░░░░ 16.9                                           │ ░░░░░░░░░░░░░░░░░ 107           │ number,string           │ [{"time":"2015-12-22T18:38:34.000Z","latitude":62. │
│ csv-js typed []                 │ 109K   │ ░░░░░░░░░░░░░ 16.3                                           │ ░░░░░░░░░░░░░░░░░ 105           │ number,string           │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ SheetJS typed {}                │ 63.5K  │ ░░░░░░░░ 9.49                                                │ ░░░░░░░░░░░░░░░░░ 107           │ number,string           │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
│ csv-parse/sync typed []         │ 17.4K  │ ░░ 2.6                                                       │ ░░░░░░░░░░░░░░░░░ 109           │ date,number,string      │ [["2015-12-22T18:45:11.000Z",59.9988,-152.7191,100 │
└─────────────────────────────────┴────────┴──────────────────────────────────────────────────────────────┴─────────────────────────────────┴─────────────────────────┴────────────────────────────────────────────────────┘

### Nested Objects

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ csv42_nested_10k_dot.csv (1.6 MB, 9 cols x 10K rows)                                                                                                                                                           │
├─────────────────────┬────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────┬────────────────────────────┬────────────────────────────────────────────────────┤
│ Name                │ Rows/s │ Throughput (MiB/s)                                          │ RSS above 36 MiB baseline (MiB) │ Types                      │ Sample                                             │
├─────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────┼────────────────────────────┼────────────────────────────────────────────────────┤
│ uDSV typed deep {}  │ 691K   │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 109 │ ░░░░░░░░░░░░ 55                 │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ csv42 typed deep {} │ 173K   │ ░░░░░░░░░░░░░░ 27.2                                         │ ░░░░░░░░░░░░░░░░░░░░░░░░░░ 119  │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
│ PapaParse deep {}   │ 76K    │ ░░░░░░░ 11.9                                                │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░ 126 │ array,number,object,string │ [{"_type":"item","name":"Item 2","description":"It │
└─────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────┴────────────────────────────┴────────────────────────────────────────────────────┘

@leeoniya (Owner) commented Sep 8, 2023

no worries, thanks for running them :)

> I still don't understand why you find it important to differentiate between typed/untyped but not between object/array format (both are optional processing steps, and the latter has much more performance impact).

because typed output is necessarily expensive. all libs must take a hit here (uDSV eats 30%, and that's the best case). additionally, typed output is not something you can make optional -- if your app needs types, there's simply no way around it. object vs array output, on the other hand, only penalizes libs that do it poorly. if someone insists on using a particular parser, it's possible to change their app to accept tuples rather than objects... yes, it's uglier in some cases, but doable if they're willing to trade ergonomics for perf. hope that clarifies why they're in different categories.
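The tuple-vs-object distinction above can be sketched in a few lines. This is a hypothetical illustration, not uDSV's or csv42's actual API: typed output requires per-cell coercion either way, while object rows add one allocation plus a header-key walk per row — exactly the work tuple consumers skip.

```javascript
// Hypothetical sketch (not any library's API): cost of object rows vs tuples.
const header = ['zip', 'lat', 'lng'];
const tuples = [
  ['601', '18.18027', '-66.75266'],
  ['602', '18.36075', '-67.17541'],
];

// Typed tuples: per-cell coercion is unavoidable if the app needs numbers.
const typedTuples = tuples.map(row => row.map(Number));

// Object rows: the same coercion *plus* one object allocation and a
// key lookup per column, for every row.
const objects = typedTuples.map(row =>
  Object.fromEntries(header.map((key, i) => [key, row[i]]))
);

console.log(objects[0]); // { zip: 601, lat: 18.18027, lng: -66.75266 }
```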

i'm gonna start working on another unplanned thing now...because Github finally completely fucked up the home page feed 🤮

https://github.com/orgs/community/discussions/categories/feed
https://github.com/orgs/community/discussions/13130#discussioncomment-6931320

@josdejong (Author) commented Sep 8, 2023

> if your app needs types, there's simply no way around it

ha ha, we're starting to run in circles here. The same holds for objects: if your app needs objects, there is simply no way around converting to objects. Taking X months to write your own array-based Table Component Pro® because the GUI Table Component used company-wide only supports objects may not be practical. I guess we have a different definition of a "necessary" operation here.

Anyways, I won't bother you anymore. Good luck with the Github feed issues 😓

@leeoniya (Owner) commented Sep 8, 2023

you're right. if you have to integrate with software that's out of your control, and you need objects+types+nested, then csv42 is 3x faster than Papa+typing+flat. nothing i've measured invalidates your article or benchmarks...and uDSV is another 3x-4x faster than that :)
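The "objects+types+nested" output mentioned above (and measured in the Nested Objects table, where the input uses dot-path column names like `csv42_nested_10k_dot.csv`) can be sketched as follows. This is an illustrative unflattening step, not csv42's or uDSV's internal implementation: flat keys such as `"location.lat"` are split and rebuilt into nested objects.

```javascript
// Hypothetical sketch: rebuild nested objects from dot-path column names,
// as a "typed deep {}" parser output would.
function unflatten(row) {
  const out = {};
  for (const [path, value] of Object.entries(row)) {
    const keys = path.split('.');
    let node = out;
    // Walk/create intermediate objects for all but the last path segment.
    for (let i = 0; i < keys.length - 1; i++) {
      node = node[keys[i]] ??= {};
    }
    node[keys[keys.length - 1]] = value;
  }
  return out;
}

const nested = unflatten({ name: 'Item 2', 'location.lat': 18.36, 'location.lng': -67.17 });
console.log(nested); // { name: 'Item 2', location: { lat: 18.36, lng: -67.17 } }
```

This extra per-row string splitting and object creation is one reason deep `{}` output benchmarks slower than flat tuples.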

@josdejong (Author) commented

> and uDSV is another 3x-4x faster than that

yes true, it's really impressive!
