Load evaluted values in parallel #609

doczoidberg · 2021-02-27T10:08:16Z

We have the problem that the loading of our data takes a long time because all cell formulas are evaluated when the data is set.

Excel saves the evaluated cell value in parallel and cells are only evaluated on changes. Because of that the first data loading is much faster (<1 second instead of 1 minute).

I don't know how complicated is that to implement. If you don't have the time maybe you could give a hint what I could do?

bardek8 · 2021-02-27T14:42:04Z

We understand your pain point, but this would be difficult to change, as we assume that after setting some values we (re)compute all the affected cells. Actually, in our benchmarks we saw that the most time-consuming part is the construction of the graph of dependencies between cells, in particular when you have big ranges / whole columns, not the evaluation.
I provided some ideas how to speed up HF on your sheet by slightly modifying its structure or the formulas here.

Do you remember that you can use batch to add all your data at once? This will ensure that you evaluate all cells exactly once.

In general, HF does not parallelize well, but if you need to speed up the first loading phase and later you can wait for the result of evaluation, you can first add the (raw) data and later add only the cells with formulas. This will simply run all the computations later.

To sum up, I wouldn't recommend changing the behavior of HF. I hope that some of the above hints will already help you speed HF up.

wojciechczerniak · 2021-02-27T16:29:13Z

Excel stores values as cache and IMO the cache management is one layer higher than the calculation engine. It depends on the use case so it's up to the developers to implement it according to their needs / tech stack. I wouldn't change that in HyperFormula, as this is already possible to do with an abstraction layer over the calculation engine.

Note: Pseudo-code, just to show the idea. Incomplete.

Server payload should have both, cached value and the formula assigned to the cell:

{
  "status": 200,
  "error": null,
  "data": [ 
    [ { "value": 3, "formula": "=SUM(B1:B2)" }, { "value": 1, "formula": 1 } ],
    [ { "value": null, "formula": null }, { "value": 2, "formula": 2 } ]
  ],
  "namedExpressions": [  ]
}

App initialization will load the data and create an instance of our cached formula engine:

const response = await fetchDataFromServer();

if (reposne.error) {
  throw new Error(response.error);
}

const data = response.data;
const namedExpressions = response.namedExpressions;
const formulas = new MyFormulaEngine({ data, namedExpressions });

const value = formulas.getValue('A1'); 
// returns value "3" instantly, from server cache

Then we need an engine to manage HyperFormula, and if it's not ready, get the value from server cache:

class MyFormulaEngine {
  private hyperformulaReady = false;
  private dataCache;

  constructor({ data }) {
    this.dataCache = data.map(row => row.map(cell => cell.value));
    
    const hfData = data.map(row => row.map(cell => cell.formula));

    // JS is single threaded so we will need a Worker or Node.js process,
    // this highly depends on the tech stack used by your company
    this.hyperformula = new MyHyperFormulaWorker();
    this.hyperformula.addEventListener('hyperformulaReady', () => { 
      this.hyperformulaReady = true;
      this.dataCache = null; // release the memory
    });
    this.hyperformula.loadData(hfData);
  }

  /**
   * Method returns the value for cell from HyperFormula OR fallbacks to the cache
   * until the HyperFormula is ready.
   */
  getValue(cellAddress) {
    const simpleCellAddress = this.hyperformula.toSimpleCellAddress(cellAddress);

    if (this.hyperformulaReady) {
       return this.hyperformula.getCellValue(simpleCellAddress);
    }

   return this.dataCache[simpleCellAddress.row][simpleCellAddress.column];
  }

  /**
   * Disable editing until the calculation engine is ready. Show loader or whatever you like.
   */
  get readOnly() {
    return !this.hyperformulaReady;
  }
}

doczoidberg · 2021-03-02T11:36:52Z

Thanks for your answer.

But I am not really sure if the generation of the cell dependency graph could be faster. When I handover our quite complex Excel file to another machine it is also loaded in one second without any caching.

As far as I understand to get all dependencies of a Cell one have to evaluate the precedents of all cells. Is this what takes so much time? I think there should be a way to build the dependecy graph faster?

wojciechczerniak · 2021-03-03T17:31:43Z

Excel file IS the cache.

We're parsing the XLSX files for https://handsontable.com/spreadsheet-viewer/ so we got to know their structure a little bit. Inside the file they keep a lot of meta data for each cell:

Key	Description
v	raw value (see Data Types section for more info)
w	formatted text (if applicable)
t	type: b Boolean, e Error, n Number, d Date, s Text, z Stub
f	cell formula encoded as an A1-style string (if applicable)
F	range of enclosing array if formula is array formula (if applicable)
r	rich text encoding (if applicable)
h	HTML rendering of the rich text (if applicable)
c	comments associated with the cell
z	number format string associated with the cell (if requested)
l	cell hyperlink object (.Target holds link, .Tooltip is tooltip)
s	the style/theme of the cell (if applicable)

Notice the v, w and f. The value is already there as a v or w and is ready to be displayed instantly. That's why the file is opened in a second. Formulas from f field are loaded later and used when there are changes to the content. From the user perspective this looks like an instant evaluation, but is a clever trick for "perceived performance". They don't even have to apply the formatting! Formats will be re-calculated on change.

Most viewers work like that. Excel by default opens the file for viewing purposes only and has "Edit this file" button. We even do this! The Spreadsheet Viewer (https://handsontable.com/spreadsheet-viewer/) don't have a graph parser.

My idea from #609 (comment) is based on the Excel file. You store the calculated value on the database (file) and send it to the app together with the formula. Perceived performance is an app loaded instantly while the graph is build in the background. All the animations, fade ins, finding the cell you want to edit, opening the editor, validation and saving the data takes time. At this point the engine should be initialized and the user won't even notice that the calculation engine was not ready from the beginning.

Edit: Moreover, you can use the SSR technique to send the view with cached values as RAW HTML with the first page load and "hydrate" the data with interactive calculation engine afterwards. Should work with any front-end framework that has SSR support. This way you could deliver the perfect perceived performance to the user.

doczoidberg · 2021-03-03T17:43:49Z

Thanks for your explanation @wojciechczerniak ! I know the spreadsheet viewer and also looked into xlsx format.

I didn't know that excel does the graph building in background after I opened the file. Than I was "tricked" by that because of the already shown values. I thought at least the graph is build when I open the file (not in background). I already knew that only the values are shown but to edit of course the graph has to be built before.

doczoidberg added the Feature Something we can add later on without introducing a breaking change label Feb 27, 2021

doczoidberg closed this as completed Mar 3, 2021

handsontable locked and limited conversation to collaborators Mar 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Load evaluted values in parallel #609

Load evaluted values in parallel #609

doczoidberg commented Feb 27, 2021

bardek8 commented Feb 27, 2021

wojciechczerniak commented Feb 27, 2021

doczoidberg commented Mar 2, 2021

wojciechczerniak commented Mar 3, 2021 •

edited

Loading

doczoidberg commented Mar 3, 2021

This issue was moved to a discussion.

This issue was moved to a discussion.

Load evaluted values in parallel #609

Load evaluted values in parallel #609

Comments

doczoidberg commented Feb 27, 2021

bardek8 commented Feb 27, 2021

wojciechczerniak commented Feb 27, 2021

doczoidberg commented Mar 2, 2021

wojciechczerniak commented Mar 3, 2021 • edited Loading

doczoidberg commented Mar 3, 2021

This issue was moved to a discussion.

wojciechczerniak commented Mar 3, 2021 •

edited

Loading