[PERF] evaluation: use custom data structures instead of Map and Set
This commit drastically reduces the evaluation execution time (about 43% less
on the large formula demo dataset).

`Map` and `Set` data structures are slow compared to objects and arrays
(especially with `BigInt`, more on that later). Why use `Map` and `Set` then?

We needed a set data structure to hold the positions that should be evaluated,
because we obviously don't want duplicates. However, we can't blindly add
position objects `{ sheetId, col, row }` to a `Set`: the same position could be
added twice as two different objects that are not referentially equal.
That's why we came up with a way to encode the position object as a single
value that can safely be added to the set without worrying about object
references being equal. This encoded value is a `BigInt` (see
`PositionBitsEncoder`), hence `Set<BigInt>`. Since the position was already
encoded as a single value, we naturally used that value as a key in a `Map`
(the keys of a regular `object` are always strings or symbols, so a `BigInt`
key would be silently converted to a string).
Using this `BigInt` key also avoids having to decode the encoded value and then
maintain nested `[sheetId][col][row]` objects.
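
The referential-equality problem, and how a single encoded value sidesteps it, can be illustrated with a minimal sketch. The `encodePosition` helper and its bit layout are assumptions made for this example only; they are not the actual `PositionBitsEncoder`.

```js
// Two structurally identical positions are still two distinct objects,
// so a Set of objects keeps both of them.
const naive = new Set();
naive.add({ sheetId: "s1", col: 2, row: 5 });
naive.add({ sheetId: "s1", col: 2, row: 5 });
console.log(naive.size); // 2

// Hypothetical encoder (NOT the real PositionBitsEncoder): each sheet id gets
// a numeric index, then sheet index, col and row are packed into one BigInt.
// Assumed layout: 32 bits for row, 32 bits for col, the rest for the sheet.
const sheetIndexes = new Map();
function encodePosition({ sheetId, col, row }) {
  let sheetIndex = sheetIndexes.get(sheetId);
  if (sheetIndex === undefined) {
    sheetIndex = BigInt(sheetIndexes.size);
    sheetIndexes.set(sheetId, sheetIndex);
  }
  return (sheetIndex << 64n) | (BigInt(col) << 32n) | BigInt(row);
}

// BigInt values are compared by value, so the duplicate is detected.
const encoded = new Set();
encoded.add(encodePosition({ sheetId: "s1", col: 2, row: 5 }));
encoded.add(encodePosition({ sheetId: "s1", col: 2, row: 5 }));
console.log(encoded.size); // 1
```

The price of this approach is the encoding step itself plus the cost of hashing `BigInt` values, which is what the measurements in this commit message quantify.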

In the evaluation process, there's a lot going on, to the point where lookups
in the map are a bottleneck.

This commit replaces the `Set` and `Map` used in the evaluation with custom
data structures, `PositionSet` and `PositionMap<T>`, which have faster addition
and lookup.

In addition to being faster, this makes the evaluation code simpler as we no
longer need encoding/decoding everywhere. Those data structures are also easier
to debug. With the `BigInt` encoding, it was hard to tell which position
the number `28416511n` stands for.
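
To give an idea of what a position-keyed map can look like without any encoding, here is a minimal sketch built on nested plain objects (`[sheetId][col][row]`). The class name `NestedPositionMap` and its internals are hypothetical and may differ from the `PositionMap<T>` introduced by this commit.

```js
// Hypothetical sketch, not the PositionMap from this commit.
// Values are stored in nested plain objects: sheets[sheetId][col][row].
class NestedPositionMap {
  constructor() {
    this.sheets = {};
  }

  set({ sheetId, col, row }, value) {
    // Create the intermediate levels on first use.
    const sheet = (this.sheets[sheetId] ??= {});
    const column = (sheet[col] ??= {});
    column[row] = value;
  }

  get({ sheetId, col, row }) {
    return this.sheets[sheetId]?.[col]?.[row];
  }

  has({ sheetId, col, row }) {
    const column = this.sheets[sheetId]?.[col];
    return column !== undefined && row in column;
  }
}

const values = new NestedPositionMap();
values.set({ sheetId: "s1", col: 2, row: 5 }, 4);
// A different object describing the same position finds the same entry.
console.log(values.get({ sheetId: "s1", col: 2, row: 5 })); // 4
```

With such a layout, inspecting the structure in a debugger shows the sheet id, column and row directly, instead of an opaque number.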

Some measurements
----------------------
*(all measurements are averages over 8 runs)*

```js
console.time("eval");
o_spreadsheet.__DEBUG__.model.dispatch("EVALUATE_CELLS")
console.timeEnd("eval");
```
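
Averaging over several runs could be scripted along these lines. `averageDuration` is a hypothetical helper, not part of o-spreadsheet, and the measurements reported here may have been collected differently.

```js
// Hypothetical helper: runs `run` several times and returns the average
// duration in milliseconds. `now` can be replaced by another clock.
function averageDuration(run, runs = 8, now = () => performance.now()) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = now();
    run();
    total += now() - start;
  }
  return total / runs;
}

// Example, in the browser console of a loaded spreadsheet:
// averageDuration(() => o_spreadsheet.__DEBUG__.model.dispatch("EVALUATE_CELLS"));
```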
On the large formula demo data set (260k cells): -43%

|        	| Time  	| Allocated memory 	|
|--------	|-------	|------------------	|
| Before 	| 512ms 	| 190MB            	|
| After  	| 291ms 	| 122MB            	|

I also tried implementing `PositionSet` with a `PositionMap<boolean>` instead
of `BinaryGrid`. It's ~10% slower and allocates ~20MB more (but it needs 120
fewer LoC).
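
For reference, the bit-grid idea behind `BinaryGrid` can be sketched as a single-sheet set that stores one bit per cell in a `Uint32Array`. `BitGridSet` is a hypothetical, simplified class written for illustration; it is not the `PositionSet` from this commit, which takes per-sheet dimensions.

```js
// Hypothetical single-sheet sketch of a bit-backed position set.
class BitGridSet {
  constructor(rows, cols) {
    // Pad the number of columns to the next power of two so that the bit
    // index of a cell is simply (row << shift) + col.
    this.shift = cols <= 1 ? 0 : 32 - Math.clz32(cols - 1);
    // One 32-bit word holds 32 cells.
    this.bits = new Uint32Array(((rows << this.shift) >>> 5) + 1);
  }

  add({ col, row }) {
    const index = (row << this.shift) + col;
    // index >>> 5 selects the word, index & 31 the bit inside that word.
    this.bits[index >>> 5] |= 1 << (index & 31);
  }

  has({ col, row }) {
    const index = (row << this.shift) + col;
    return ((this.bits[index >>> 5] >>> (index & 31)) & 1) === 1;
  }
}

const toEvaluate = new BitGridSet(10000, 100);
toEvaluate.add({ col: 3, row: 7 });
console.log(toEvaluate.has({ col: 3, row: 7 })); // true
console.log(toEvaluate.has({ col: 4, row: 7 })); // false
```

Because membership is a single bit, adding the same position twice is naturally idempotent, and the memory footprint is fixed by the sheet dimensions rather than by the number of stored positions.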

On RNG's Timesheet spreadsheet: -23%

|        	| Time  	| Allocated memory 	|
|--------	|-------	|------------------	|
| Before 	| 1894ms 	| 580MB            	|
| After  	| 1469ms 	| 460MB            	|

Basic tests on the raw data structures with 1M positions (10k rows and 100
columns):

| # | Data structure   | add   | has   |
|---|------------------|-------|-------|
| 1 | `Set<BigInt>`    | 314ms | 262ms |
| 2 | `Set` + encoding | 440ms |       |
| 3 | `PositionSet`    | 43ms  | 46ms  |

| # | Data structure        | set   | get   |
|---|-----------------------|-------|-------|
| 4 | `Map<BigInt, number>` | 365ms | 304ms |
| 5 | `PositionMap<number>` | 47ms  | 51ms  |

Setup code shared by all benchmarks:
```js
const sheetId = "1";
const postions = [];
for (let i = 0; i < 10000; i++) {
  for (let j = 0; j < 100; j++) {
    postions.push({ sheetId, row: i, col: j });
  }
}
```
1. `Set<BigInt>` (add: 314ms, has: 262ms)
```js
const set = new Set();
const length = BigInt(postions.length);
performance.mark("add");
for (let i = 0n; i < length; i++) {
  set.add(i);
}
performance.measure("add", "add").duration;

performance.mark("has");
for (let i = 0n; i < length; i++) {
  set.has(i);
}
performance.measure("has", "has").duration;
```

2. `Set<BigInt>` + encoding (add: 440ms)
```js
const encoder = new PositionBitsEncoder();
const set = new Set();
performance.mark("add");
for (const position of postions) {
  set.add(encoder.encode(position));
}
performance.measure("add", "add").duration;
```

3. `PositionSet` (add: 43ms, has: 46ms)
```js
const set = new PositionSet({
    [sheetId]: { rows: 10000, cols: 100 },
});
performance.mark("add");
for (const position of postions) {
  set.add(position);
}
performance.measure("add", "add").duration;

performance.mark("has");
let r;
for (const position of postions) {
  r = set.has(position);
}
performance.measure("has", "has").duration;
```

4. `Map<BigInt, number>` (set: 365ms, get: 304ms)

```js
const map = new Map();
const length = BigInt(postions.length);
performance.mark("set");
for (let i = 0n; i < length; i++) {
  map.set(i, 4);
}
performance.measure("set", "set").duration;

performance.mark("get");
let r;
for (let i = 0n; i < length; i++) {
  r = map.get(i);
}
performance.measure("get", "get").duration;
```

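5. `PositionMap<number>` (set: 47ms, get: 51ms)

```js
// `map` was not declared in the original snippet. Assumption: it is an empty
// `PositionMap<number>` created with a no-argument constructor.
const map = new PositionMap();
performance.mark("set");
for (const position of postions) {
  map.set(position, 4);
}
performance.measure("set", "set").duration;

performance.mark("get");
let r;
for (const position of postions) {
  r = map.get(position);
}
performance.measure("get", "get").duration;
```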

Task: 3802246
LucasLefevre committed Mar 18, 2024
1 parent fdff1cf commit f021d4f
Showing 10 changed files with 592 additions and 282 deletions.
16 changes: 0 additions & 16 deletions src/helpers/misc.ts
@@ -461,22 +461,6 @@ export function isConsecutive(iterable: Iterable<number>): boolean {
return true;
}

export class JetSet<T> extends Set<T> {
  addMany(iterable: Iterable<T>): this {
    for (const element of iterable) {
      super.add(element);
    }
    return this;
  }
  deleteMany(iterable: Iterable<T>): boolean {
    let wasDeleted = false;
    for (const element of iterable) {
      wasDeleted ||= super.delete(element);
    }
    return wasDeleted;
  }
}

/**
* Creates a version of the function that's memoized on the value of its first
* argument, if any.
98 changes: 98 additions & 0 deletions src/plugins/ui_core_views/cell_evaluation/binary_grid.ts
@@ -0,0 +1,98 @@
import { CellPosition } from "../../../types";

type Bit = 0 | 1;

/**
 * Implements a fixed-sized grid or 2D matrix of bits.
 * based on https://github.com/zandaqo/structurae
 *
 * The grid is implemented as a 1D array of 32-bit integers, where each bit represents a cell in the grid.
 * It follows row-major order, with each row stored consecutively in 32-bit blocks.
 * Pads the number of columns to the next power of 2 to allow quick lookups with bitwise operations.
 *
 * Key terminology:
 * - bucket: Index of an item in the Uint32Array, a 32-bit integer.
 * - bitPosition: The position of a bit within the bucket 32-bit integer.
 */
export class BinaryGrid extends Uint32Array {
  private columnOffset = 0;
  cols = 0;
  rows = 0;

  /**
   * Creates a binary grid of specified dimensions.
   */
  static create(rows: number, columns: number): BinaryGrid {
    const columnOffset = log2Ceil(columns);
    const length = (rows << columnOffset) >> 5;
    const grid = new this(length + 1);
    grid.columnOffset = columnOffset;
    grid.cols = columns;
    grid.rows = rows;
    return grid;
  }

  /**
   * Returns the bit at given coordinates.
   */
  getValue(position: CellPosition): Bit {
    const [bucket, bitPosition] = this.getCoordinates(position);
    return ((this[bucket] >> bitPosition) & 1) as Bit;
  }

  /**
   * Sets the bit at given coordinates.
   */
  setValue(position: CellPosition, value: Bit) {
    const [bucket, bitPosition] = this.getCoordinates(position);
    const currentValue = (this[bucket] >> bitPosition) & 1;
    const hasBeenInserted = currentValue === 0 && value === 1;
    this[bucket] = (this[bucket] & ~(1 << bitPosition)) | (value << bitPosition);
    return hasBeenInserted;
    // Let's breakdown of the above line:
    // with an example with a 4-bit integer (instead of 32-bit).
    //
    // Let's say we want to set the bit at position 2 to 1 and the existing
    // bit sequence this[bucket] is 1001. The final bit sequence should be 1101.
    //
    // First, we clear the bit at position 2 by AND-ing this[bucket] with a
    // mask having all 1s except a 0 at the bit position (~ (1 << bitPosition)).
    // 1 << bitPosition is 0100 (shifting 0001 to the left by 2)
    // Inverting the bits with ~ gives the final mask ~(1 << bitPosition): 1011
    //
    // Then, we shift the value by the bit position (value << bitPosition: 0100)
    // and OR the result with the previous step's result:
    // (1001 & 1011) | 0100 = 1101
  }

  isEmpty() {
    return !this.some((bucket) => bucket !== 0);
  }

  fillAllPositions() {
    const thirtyTwoOnes = -1 >>> 0; // same as 2 ** 32 - 1, a 32-bit number with all bits set to 1
    this.fill(thirtyTwoOnes);
  }

  clear() {
    this.fill(0);
  }

  private getCoordinates(position: CellPosition): [bucket: number, position: number] {
    const { row, col } = position;
    const index = (row << this.columnOffset) + col;
    const bucket = index >> 5;
    return [bucket, index - (bucket << 5)];
  }
}

function log2Ceil(value: number) {
  // A faster version of Math.ceil(Math.log2(value)).
  if (value === 0) {
    return -Infinity;
  } else if (value < 0) {
    return NaN;
  }
  // --value handles the case where value is a power of 2
  return 32 - Math.clz32(--value);
}
@@ -1,5 +1,5 @@
import { compileTokens } from "../../../formulas/compiler";
-import { Token, isExportableToExcel } from "../../../formulas/index";
+import { isExportableToExcel, Token } from "../../../formulas/index";
import { getItemId, positions, toXC } from "../../../helpers/index";
import { CellErrorType } from "../../../types/errors";
import {
@@ -13,11 +13,11 @@ import {
Format,
FormattedValue,
FormulaCell,
+invalidateDependenciesCommands,
Matrix,
Range,
UID,
Zone,
-invalidateDependenciesCommands,
} from "../../../types/index";
import { FormulaCellWithDependencies } from "../../core";
import { UIPlugin, UIPluginConfig } from "../../ui_plugin";
@@ -189,6 +189,10 @@ export class EvaluationPlugin extends UIPlugin {
this.evaluator.updateDependencies(cmd);
}
break;
case "DUPLICATE_SHEET":
case "CREATE_SHEET":
this.shouldRebuildDependenciesGraph = true;
break;
case "EVALUATE_CELLS":
this.evaluator.evaluateAllCells();
break;