Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] evaluation: use custom data structures instead of Map and Set #3823

Closed
wants to merge 1 commit into from

Commits on Mar 18, 2024

  1. [PERF] evaluation: use custom data structures instead of Map and Set

    This commit drastically improves the evaluation execution time (~43% faster
    with the large formula demo dataset)
    
    `Map` and `Set` data structures are slow compared to objects and arrays
    (especially with `BigInt`, more on that later). Why use `Map` and `Set` then ?
    
    We needed a set data structure to hold which positions should be evaluated. We
    obviously don't want duplicates (hence the set). However, we can't blindly add
    a position object `{ sheetId, col, row }` in a `Set` though because the same
    position could be added twice by adding the same position as two different
    objects that are not referentially equal.
    That's why we came up with a way to encode the position object to a single
    value that can be added safely to the set without worrying about object
    references being equal. This encoded value is a `BigInt` (see
    `PositionBitsEncoder`). That's why we used `Set<BigInt>`. Since we had the
    position encoded as a single value, we naturally used that value as a key in a
    `Map` (regular `object` can only have `string` or `number` keys, not `BigInt`).
    Using this `BigInt` key also avoid having to decode the encoded value, then
    have a nested `[sheetId][col][row]` objects.
    
    In the evaluation process, there's a lot going on. To the point were lookup in
    the map is a bottleneck.
    
    This commit replaces the `Set` and `Map` in the evaluation by custom data
    structure `PositionSet` and `PositionMap<T>` with faster addition and lookup.
    
    In addition to being faster, this makes the evaluation code simpler as we no
    longer need encoding/decoding everywhere. Those data structures are also easier
    to debug. With the `BigInt` encoding, it was hard to find back which position
    is the number `28416511n`.
    
    Some measures
    ----------------------
    *(all measures are averages over 8 runs)*
    
    ```js
    console.time("eval");
    o_spreadsheet.__DEBUG__.model.dispatch("EVALUATE_CELLS")
    console.timeEnd("eval");
    ```
    On the large formula demo data set (260k cells): -43%
    
    |        	| Time  	| Allocated memory 	|
    |--------	|-------	|------------------	|
    | Before 	| 512ms 	| 190MB            	|
    | After  	| 291ms 	| 122MB            	|
    
    I also tried implementing `PositionSet` with a `PositionMap<boolean>` instead
    of `BinaryGrid`. It's ~10% slower and allocates ~20MB more (but it's 120 LoC
    less).
    
    On RNG's Timesheet spreadsheet: -23%
    
    |        	| Time  	| Allocated memory 	|
    |--------	|-------	|------------------	|
    | Before 	| 1894ms 	| 580MB            	|
    | After  	| 1469ms 	| 460MB            	|
    
    Basic tests on the raw data structures with 1M positions (10k rows and 100
    columns)
    
    |   	|                	| add   	| has   	|
    |---	|----------------	|-------	|-------	|
    | 1 	| `Set<BigInt>`    	| 314ms 	| 262ms 	|
    | 2 	| `Set` + encoding 	| 440ms 	|       	|
    | 3 	| `PositionSet`    	| 43ms  	| 46ms  	|
    
    |   	|                     	| set   	| get   	|
    |---	|---------------------	|-------	|-------	|
    | 4 	| `Map<BigInt, number>` 	| 365ms 	| 304ms 	|
    | 5 	| `PositionMap<number>` 	| 47ms  	| 51ms  	|
    
    Setup code shared by all benchmarks:
    ```js
    const sheetId = "1";
    const postions = [];
    for (let i = 0; i < 10000; i++) {
      for (let j = 0; j < 100; j++) {
        postions.push({ sheetId, row: i, col: j });
      }
    }
    ```
    1. `Set<BigInt>` 314ms	262ms
    ```js
    const set = new Set();
    const length = BigInt(postions.length);
    performance.mark("add");
    for (let i = 0n; i < length; i++) {
      set.add(i);
    }
    performance.measure("add", "add").duration);
    
    performance.mark("has");
    for (let i = 0n; i < length; i++) {
      set.has(i);
    }
    performance.measure("has", "has").duration;
    ```
    
    2. `Set<BigInt> + encoding` 440ms
    ```js
    const encoder = new PositionBitsEncoder();
    const set = new Set();
    performance.mark("add");
    for (const position of postions) {
      set.add(encoder.encode(position));
    }
    performance.measure("add", "add").duration;
    ```
    
    3. `PositionSet` 43ms	46ms
    ```js
    const set = new PositionSet({
        [sheetId]: { rows: 10000, cols: 100 },
    });
    performance.mark("add");
    for (const position of postions) {
      set.add(position);
    }
    performance.measure("add", "add").duration;
    
    performance.mark("has");
    let r;
    for (const position of postions) {
      r = set.has(position);
    }
    performance.measure("has", "has").duration;
    ```
    
    4. `Map<BigInt, number>` 365ms	304ms
    
    ```js
    const map = new Map();
    const length = BigInt(postions.length);
    performance.mark("set");
    for (let i = 0n; i < length; i++) {
      map.set(i, 4);
    }
    performance.measure("set", "set").duration;
    
    performance.mark("get");
    let r;
    for (let i = 0n; i < length; i++) {
      r = map.get(i);
    }
    performance.measure("get", "get").duration;
    ```
    
    5. `PositionMap` 47ms	51ms
    
    ```js
    performance.mark("add");
    for (const position of postions) {
      map.set(position, 4);
    }
    performance.measure("add", "add").duration;
    
    performance.mark("get");
    let r;
    for (const position of postions) {
      r = map.get(position);
    }
    performance.measure("get", "get").duration;
    ```
    
    Task: 3802246
    LucasLefevre committed Mar 18, 2024
    Configuration menu
    Copy the full SHA
    f021d4f View commit details
    Browse the repository at this point in the history