Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[PERF] evaluation: use custom data structures instead of Map and Set
This commit drastically improves the evaluation execution time (~43% faster with the large formula demo dataset) `Map` and `Set` data structures are slow compared to objects and arrays (especially with `BigInt`, more on that later). Why use `Map` and `Set` then ? We needed a set data structure to hold which positions should be evaluated. We obviously don't want duplicates (hence the set). However, we can't blindly add a position object `{ sheetId, col, row }` in a `Set` though because the same position could be added twice by adding the same position as two different objects that are not referentially equal. That's why we came up with a way to encode the position object to a single value that can be added safely to the set without worrying about object references being equal. This encoded value is a `BigInt` (see `PositionBitsEncoder`). That's why we used `Set<BigInt>`. Since we had the position encoded as a single value, we naturally used that value as a key in a `Map` (regular `object` can only have `string` or `number` keys, not `BigInt`). Using this `BigInt` key also avoid having to decode the encoded value, then have a nested `[sheetId][col][row]` objects. In the evaluation process, there's a lot going on. To the point were lookup in the map is a bottleneck. This commit replaces the `Set` and `Map` in the evaluation by custom data structure `PositionSet` and `PositionMap<T>` with faster addition and lookup. In addition to being faster, this makes the evaluation code simpler as we no longer need encoding/decoding everywhere. Those data structures are also easier to debug. With the `BigInt` encoding, it was hard to find back which position is the number `28416511n`. Some measures ---------------------- *(all measures are averages over 8 runs)* ```js console.time("eval"); o_spreadsheet.__DEBUG__.model.dispatch("EVALUATE_CELLS") console.timeEnd("eval"); ``` On the large formula demo data set (260k cells): -43% | | Time | Allocated memory | |-------- |------- |------------------ | | Before | 512ms | 190MB | | After | 291ms | 122MB | I also tried implementing `PositionSet` with a `PositionMap<boolean>` instead of `BinaryGrid`. It's ~10% slower and allocates ~20MB more (but it's 120 LoC less). On RNG's Timesheet spreadsheet: -23% | | Time | Allocated memory | |-------- |------- |------------------ | | Before | 1894ms | 580MB | | After | 1469ms | 460MB | Basic tests on the raw data structures with 1M positions (10k rows and 100 columns) | | | add | has | |--- |---------------- |------- |------- | | 1 | `Set<BigInt>` | 314ms | 262ms | | 2 | `Set` + encoding | 440ms | | | 3 | `PositionSet` | 43ms | 46ms | | | | set | get | |--- |--------------------- |------- |------- | | 4 | `Map<BigInt, number>` | 365ms | 304ms | | 5 | `PositionMap<number>` | 47ms | 51ms | Setup code shared by all benchmarks: ```js const sheetId = "1"; const postions = []; for (let i = 0; i < 10000; i++) { for (let j = 0; j < 100; j++) { postions.push({ sheetId, row: i, col: j }); } } ``` 1. `Set<BigInt>` 314ms 262ms ```js const set = new Set(); const length = BigInt(postions.length); performance.mark("add"); for (let i = 0n; i < length; i++) { set.add(i); } performance.measure("add", "add").duration); performance.mark("has"); for (let i = 0n; i < length; i++) { set.has(i); } performance.measure("has", "has").duration; ``` 2. `Set<BigInt> + encoding` 440ms ```js const encoder = new PositionBitsEncoder(); const set = new Set(); performance.mark("add"); for (const position of postions) { set.add(encoder.encode(position)); } performance.measure("add", "add").duration; ``` 3. `PositionSet` 43ms 46ms ```js const set = new PositionSet({ [sheetId]: { rows: 10000, cols: 100 }, }); performance.mark("add"); for (const position of postions) { set.add(position); } performance.measure("add", "add").duration; performance.mark("has"); let r; for (const position of postions) { r = set.has(position); } performance.measure("has", "has").duration; ``` 4. `Map<BigInt, number>` 365ms 304ms ```js const map = new Map(); const length = BigInt(postions.length); performance.mark("set"); for (let i = 0n; i < length; i++) { map.set(i, 4); } performance.measure("set", "set").duration; performance.mark("get"); let r; for (let i = 0n; i < length; i++) { r = map.get(i); } performance.measure("get", "get").duration; ``` 5. `PositionMap` 47ms 51ms ```js performance.mark("add"); for (const position of postions) { map.set(position, 4); } performance.measure("add", "add").duration; performance.mark("get"); let r; for (const position of postions) { r = map.get(position); } performance.measure("get", "get").duration; ``` closes #3823 Task: 3802246 Signed-off-by: Vincent Schippefilt (vsc) <vsc@odoo.com>
- Loading branch information