Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load evaluted values in parallel #609

Closed
doczoidberg opened this issue Feb 27, 2021 · 5 comments
Closed

Load evaluted values in parallel #609

doczoidberg opened this issue Feb 27, 2021 · 5 comments
Labels
Feature Something we can add later on without introducing a breaking change

Comments

@doczoidberg
Copy link

We have the problem that the loading of our data takes a long time because all cell formulas are evaluated when the data is set.

Excel saves the evaluated cell value in parallel and cells are only evaluated on changes. Because of that the first data loading is much faster (<1 second instead of 1 minute).

I don't know how complicated is that to implement. If you don't have the time maybe you could give a hint what I could do?

@doczoidberg doczoidberg added the Feature Something we can add later on without introducing a breaking change label Feb 27, 2021
@bardek8
Copy link
Collaborator

bardek8 commented Feb 27, 2021

We understand your pain point, but this would be difficult to change, as we assume that after setting some values we (re)compute all the affected cells. Actually, in our benchmarks we saw that the most time-consuming part is the construction of the graph of dependencies between cells, in particular when you have big ranges / whole columns, not the evaluation.
I provided some ideas how to speed up HF on your sheet by slightly modifying its structure or the formulas here.

Do you remember that you can use batch to add all your data at once? This will ensure that you evaluate all cells exactly once.

In general, HF does not parallelize well, but if you need to speed up the first loading phase and later you can wait for the result of evaluation, you can first add the (raw) data and later add only the cells with formulas. This will simply run all the computations later.

To sum up, I wouldn't recommend changing the behavior of HF. I hope that some of the above hints will already help you speed HF up.

@wojciechczerniak
Copy link
Contributor

Excel stores values as cache and IMO the cache management is one layer higher than the calculation engine. It depends on the use case so it's up to the developers to implement it according to their needs / tech stack. I wouldn't change that in HyperFormula, as this is already possible to do with an abstraction layer over the calculation engine.

Note: Pseudo-code, just to show the idea. Incomplete.

Server payload should have both, cached value and the formula assigned to the cell:

{
  "status": 200,
  "error": null,
  "data": [ 
    [ { "value": 3, "formula": "=SUM(B1:B2)" }, { "value": 1, "formula": 1 } ],
    [ { "value": null, "formula": null }, { "value": 2, "formula": 2 } ]
  ],
  "namedExpressions": [  ]
}

App initialization will load the data and create an instance of our cached formula engine:

const response = await fetchDataFromServer();

if (reposne.error) {
  throw new Error(response.error);
}

const data = response.data;
const namedExpressions = response.namedExpressions;
const formulas = new MyFormulaEngine({ data, namedExpressions });

const value = formulas.getValue('A1'); 
// returns value "3" instantly, from server cache

Then we need an engine to manage HyperFormula, and if it's not ready, get the value from server cache:

class MyFormulaEngine {
  private hyperformulaReady = false;
  private dataCache;

  constructor({ data }) {
    this.dataCache = data.map(row => row.map(cell => cell.value));
    
    const hfData = data.map(row => row.map(cell => cell.formula));

    // JS is single threaded so we will need a Worker or Node.js process,
    // this highly depends on the tech stack used by your company
    this.hyperformula = new MyHyperFormulaWorker();
    this.hyperformula.addEventListener('hyperformulaReady', () => { 
      this.hyperformulaReady = true;
      this.dataCache = null; // release the memory
    });
    this.hyperformula.loadData(hfData);
  }

  /**
   * Method returns the value for cell from HyperFormula OR fallbacks to the cache
   * until the HyperFormula is ready.
   */
  getValue(cellAddress) {
    const simpleCellAddress = this.hyperformula.toSimpleCellAddress(cellAddress);

    if (this.hyperformulaReady) {
       return this.hyperformula.getCellValue(simpleCellAddress);
    }

   return this.dataCache[simpleCellAddress.row][simpleCellAddress.column];
  }

  /**
   * Disable editing until the calculation engine is ready. Show loader or whatever you like.
   */
  get readOnly() {
    return !this.hyperformulaReady;
  }
}

@doczoidberg
Copy link
Author

Thanks for your answer.

But I am not really sure if the generation of the cell dependency graph could be faster. When I handover our quite complex Excel file to another machine it is also loaded in one second without any caching.

As far as I understand to get all dependencies of a Cell one have to evaluate the precedents of all cells. Is this what takes so much time? I think there should be a way to build the dependecy graph faster?

@wojciechczerniak
Copy link
Contributor

wojciechczerniak commented Mar 3, 2021

Excel file IS the cache.

We're parsing the XLSX files for https://handsontable.com/spreadsheet-viewer/ so we got to know their structure a little bit. Inside the file they keep a lot of meta data for each cell:

Key Description
v raw value (see Data Types section for more info)
w formatted text (if applicable)
t type: b Boolean, e Error, n Number, d Date, s Text, z Stub
f cell formula encoded as an A1-style string (if applicable)
F range of enclosing array if formula is array formula (if applicable)
r rich text encoding (if applicable)
h HTML rendering of the rich text (if applicable)
c comments associated with the cell
z number format string associated with the cell (if requested)
l cell hyperlink object (.Target holds link, .Tooltip is tooltip)
s the style/theme of the cell (if applicable)

Notice the v, w and f. The value is already there as a v or w and is ready to be displayed instantly. That's why the file is opened in a second. Formulas from f field are loaded later and used when there are changes to the content. From the user perspective this looks like an instant evaluation, but is a clever trick for "perceived performance". They don't even have to apply the formatting! Formats will be re-calculated on change.

Most viewers work like that. Excel by default opens the file for viewing purposes only and has "Edit this file" button. We even do this! The Spreadsheet Viewer (https://handsontable.com/spreadsheet-viewer/) don't have a graph parser.

My idea from #609 (comment) is based on the Excel file. You store the calculated value on the database (file) and send it to the app together with the formula. Perceived performance is an app loaded instantly while the graph is build in the background. All the animations, fade ins, finding the cell you want to edit, opening the editor, validation and saving the data takes time. At this point the engine should be initialized and the user won't even notice that the calculation engine was not ready from the beginning.


Edit: Moreover, you can use the SSR technique to send the view with cached values as RAW HTML with the first page load and "hydrate" the data with interactive calculation engine afterwards. Should work with any front-end framework that has SSR support. This way you could deliver the perfect perceived performance to the user.

@doczoidberg
Copy link
Author

Thanks for your explanation @wojciechczerniak ! I know the spreadsheet viewer and also looked into xlsx format.

I didn't know that excel does the graph building in background after I opened the file. Than I was "tricked" by that because of the already shown values. I thought at least the graph is build when I open the file (not in background). I already knew that only the values are shown but to edit of course the graph has to be built before.

@handsontable handsontable locked and limited conversation to collaborators Mar 4, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Feature Something we can add later on without introducing a breaking change
Projects
None yet
Development

No branches or pull requests

3 participants