This commit drastically improves the evaluation execution time (~43% faster
with the large formula demo dataset)
`Map` and `Set` data structures are slow compared to objects and arrays
(especially with `BigInt`, more on that later). Why use `Map` and `Set` then ?
We needed a set data structure to hold which positions should be evaluated. We
obviously don't want duplicates (hence the set). However, we can't blindly add
a position object `{ sheetId, col, row }` in a `Set` though because the same
position could be added twice by adding the same position as two different
objects that are not referentially equal.
That's why we came up with a way to encode the position object to a single
value that can be added safely to the set without worrying about object
references being equal. This encoded value is a `BigInt` (see
`PositionBitsEncoder`). That's why we used `Set<BigInt>`. Since we had the
position encoded as a single value, we naturally used that value as a key in a
`Map` (regular `object` can only have `string` or `number` keys, not `BigInt`).
Using this `BigInt` key also avoid having to decode the encoded value, then
have a nested `[sheetId][col][row]` objects.
In the evaluation process, there's a lot going on. To the point were lookup in
the map is a bottleneck.
This commit replaces the `Set` and `Map` in the evaluation by custom data
structure `PositionSet` and `PositionMap<T>` with faster addition and lookup.
In addition to being faster, this makes the evaluation code simpler as we no
longer need encoding/decoding everywhere. Those data structures are also easier
to debug. With the `BigInt` encoding, it was hard to find back which position
is the number `28416511n`.
Some measures
----------------------
*(all measures are averages over 8 runs)*
```js
console.time("eval");
o_spreadsheet.__DEBUG__.model.dispatch("EVALUATE_CELLS")
console.timeEnd("eval");
```
On the large formula demo data set (260k cells): -43%
| | Time | Allocated memory |
|-------- |------- |------------------ |
| Before | 512ms | 190MB |
| After | 291ms | 122MB |
I also tried implementing `PositionSet` with a `PositionMap<boolean>` instead
of `BinaryGrid`. It's ~10% slower and allocates ~20MB more (but it's 120 LoC
less).
On RNG's Timesheet spreadsheet: -23%
| | Time | Allocated memory |
|-------- |------- |------------------ |
| Before | 1894ms | 580MB |
| After | 1469ms | 460MB |
Basic tests on the raw data structures with 1M positions (10k rows and 100
columns)
| | | add | has |
|--- |---------------- |------- |------- |
| 1 | `Set<BigInt>` | 314ms | 262ms |
| 2 | `Set` + encoding | 440ms | |
| 3 | `PositionSet` | 43ms | 46ms |
| | | set | get |
|--- |--------------------- |------- |------- |
| 4 | `Map<BigInt, number>` | 365ms | 304ms |
| 5 | `PositionMap<number>` | 47ms | 51ms |
Setup code shared by all benchmarks:
```js
const sheetId = "1";
const postions = [];
for (let i = 0; i < 10000; i++) {
for (let j = 0; j < 100; j++) {
postions.push({ sheetId, row: i, col: j });
}
}
```
1. `Set<BigInt>` 314ms 262ms
```js
const set = new Set();
const length = BigInt(postions.length);
performance.mark("add");
for (let i = 0n; i < length; i++) {
set.add(i);
}
performance.measure("add", "add").duration);
performance.mark("has");
for (let i = 0n; i < length; i++) {
set.has(i);
}
performance.measure("has", "has").duration;
```
2. `Set<BigInt> + encoding` 440ms
```js
const encoder = new PositionBitsEncoder();
const set = new Set();
performance.mark("add");
for (const position of postions) {
set.add(encoder.encode(position));
}
performance.measure("add", "add").duration;
```
3. `PositionSet` 43ms 46ms
```js
const set = new PositionSet({
[sheetId]: { rows: 10000, cols: 100 },
});
performance.mark("add");
for (const position of postions) {
set.add(position);
}
performance.measure("add", "add").duration;
performance.mark("has");
let r;
for (const position of postions) {
r = set.has(position);
}
performance.measure("has", "has").duration;
```
4. `Map<BigInt, number>` 365ms 304ms
```js
const map = new Map();
const length = BigInt(postions.length);
performance.mark("set");
for (let i = 0n; i < length; i++) {
map.set(i, 4);
}
performance.measure("set", "set").duration;
performance.mark("get");
let r;
for (let i = 0n; i < length; i++) {
r = map.get(i);
}
performance.measure("get", "get").duration;
```
5. `PositionMap` 47ms 51ms
```js
performance.mark("add");
for (const position of postions) {
map.set(position, 4);
}
performance.measure("add", "add").duration;
performance.mark("get");
let r;
for (const position of postions) {
r = map.get(position);
}
performance.measure("get", "get").duration;
```
Task: 3802246