Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async functions support #9

Open
wojciechczerniak opened this issue Nov 14, 2019 · 14 comments
Open

Async functions support #9

wojciechczerniak opened this issue Nov 14, 2019 · 14 comments
Labels
Feature Something we can add later on without introducing a breaking change Function Feature or bug in formula function Verified Verified by Handsoncode

Comments

@wojciechczerniak
Copy link
Contributor

Description

Common request for our previous formula engine. Let's keep it as a feature request.

Sample

parser.setFunction('SUM_ASYNC', (params) => {
  return new Promise((resolve) => {
    return setTimeout(() => resolve(params[0] + params[1]), 1000));
  }
});

parser.parse('SUM_ASYNC(2, 3)').then((result, error) => { });
@wojciechczerniak wojciechczerniak added the Feature Something we can add later on without introducing a breaking change label Dec 6, 2019
@wojciechczerniak wojciechczerniak added this to the Future milestone Dec 10, 2019
@krzysztofspilka krzysztofspilka added the Verified Verified by Handsoncode label Jun 23, 2020
@wojciechczerniak wojciechczerniak changed the title Async formulas support Async functions support Jul 7, 2020
@wojciechczerniak wojciechczerniak added the Function Feature or bug in formula function label Jul 7, 2020
@wojciechczerniak wojciechczerniak removed this from the Next milestone Feb 1, 2021
@MartinDawson
Copy link
Contributor

MartinDawson commented Nov 6, 2021

This would be an extremely useful feature, especially for custom functions that need to load data from the backend like our FINANCIAL API.

Currently we preload data on page load but later on will need async functions I think as we can't preload all financial data for everything.

Is this on the roadmap soon and is their an idea of how this will need to be done? Because I could try and do it in a few months when we will need it if there's no plans to implement it yet

@MartinDawson
Copy link
Contributor

MartinDawson commented Nov 22, 2021

@wojciechczerniak I'm going to try and implement this feature because we need it for our site on custom functions.

Could you give some rough guidance on the steps that need to be taken to do this please?

Unless this feature is a massive re-write or something.

@wojciechczerniak
Copy link
Contributor Author

wojciechczerniak commented Nov 22, 2021

Key Concepts will be a good start to understand how HyperFormula works with AST. I would start with Evaluator or Interpreter, who calls function and when. Then go from there through each caller to make it async as well. This would stop at public methods, I think. Maybe @bardek8 or @izulin have more detailed ideas. Sorry, I couldn't be more helpful here, not my expertise.

Some of my thoughts and design considerations:

Async and non async

Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.

Parallel evaluation

The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.

Repetitive ASTs

As mentioned in Key Concepts guide, same functions and ranges are grouped to avoid calculating the same value twice. But does it apply to asynchronous functions? I bet they will not always return same value. ie =REMOTE_GUID() to generate some unique ID, or =RANDOM_PRODUCT() that may return random output for each use case.

Error handling

What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject well and gracefully.

Retries

Should we retry if Promise is rejected? Or what if a function author loop promises inside his code? Or there are more than one promise and HTTP request within a single function.

Circuit breaker

If we have retries, how many is too many? We should break the loop at some point and display an error (different one, than before?). Even if retries are not our code, we should have a timeout for each asynchronous function in case they take to long to finish

Circular dependencies

I totally have no idea how to handle those. An async function may want to make request based on results of some other functions that are also a result of the formula. Edge case, but may bite us 😵

Named Ranges

They're more or less cells with different addressing. But async functions should be tested with scopes.

Volatile

Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.

Undo-redo

What will happen on undo? Should we restore previous value or make a new async request?

API has to be async

Each method has to be considered separately. But most of them return changes array. Therefore they need to all calculations to end before code can be evaluated further. That means most of API methods has to return Promise<changes> instead just changes.

Before:

const changes = hfInstance.setCellContents({ col: 3, row: 0, sheet: 0 }, [['=B1']]);

After:

const changes = await hfInstance.setCellContents({ col: 3, row: 0, sheet: 0 }, [['=B1']]);

Helper functions

ie. calculateFormula when called will always make a new request when async function was used or referenced ranges contain one. It also should become asynchronous: async calculateFormula(formula: string): Promise<result>

Async functions + Array functions

This combination of features blows my mind 🤯 Returned arraySize should be async as well then. But can we wait until function is fully evaluated and all promises resolved? or should we predict a value first?

Batch

I don't see why it wouldn't work if all the above is done correctly. Just make sure we don't make any requests or resolve promises while in suspendEvaluation state.

Mocks

If you're going to make a PR, please consult library for creating network mocks. I believe we don't have one yet in this project as it wasn't needed. Any new tool should be convenient for @handsontable/devs, as they will maintain the functionality later on.

@MartinDawson
Copy link
Contributor

MartinDawson commented Nov 22, 2021

Thanks for the information. That definitely helps. Will start on it tomorrow and see how it goes.

@MartinDawson
Copy link
Contributor

MartinDawson commented Nov 24, 2021

@wojciechczerniak I've got initial async functions working in custom-functions.spec.ts. Ignore the test file changes. Evaluator.ts & Interpreter.ts are the main changes.

https://github.com/TrackTak/hyperformula/tree/async-functions

Async and non async
Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.

I'm having some trouble with understanding how this would be possible & specifically what you mean by this?

Currently in the branch I've changed any public function that internally calls recomputeIfDependencyGraphNeedsIt to be async and return a promise.

If we have async functions such as ASYNC_SUM which affect other cells then even functions like addRows() will need to return promises I think. Because it will need to recalculate these cells in an async way.

If the async functions are just functions that hit backend API's and don't depend on other cells then I can see how addRows could stay synchronous because it would never affect another cell and I could add seperate intenral methods that are sync & async versions.

Is this correct or am I wrong here?

I just wanted clarification before I end up changing all the test files if I am wrong.

@wojciechczerniak
Copy link
Contributor Author

Async and non async
Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.

I'm having some trouble with understanding how this would be possible & specifically what you mean by this?

I see two options, runFunctionSync + runFunctionAsync or recognize functions that are returning Promise with instanceof Promise.

Currently in the branch I've changed any public function that internally calls recomputeIfDependencyGraphNeedsIt to be async and return a promise.

If we have async functions such as ASYNC_SUM which affect other cells then even functions like addRows() will need to return promises I think. Because it will need to recalculate these cells in an async way.

We're not mixing functions with api methods? All API methods will be async always.

If the async functions are just functions that hit backend API's and don't depend on other cells then I can see how addRows could stay synchronous because it would never affect another cell and I could add seperate intenral methods that are sync & async versions.

Unless we decide to drop requirement to return changes. But IMO it's a very useful functionality.

@MartinDawson
Copy link
Contributor

MartinDawson commented Nov 25, 2021

  • Async and non async - Not possible (see below). And what's benefit of this anyway even if possible? runFunction should always be async.
  • Parallel evaluation - Complex Enhancement
  • Repetitive ASTs - volatile: true should handle this automatically. Not async specific.
  • Error handling - Added TIMEOUT error & handling normal errors.
  • Retries - Need clarification if @handsontable/devs actually want this or not
  • Circuit breaker - Need clarification if @handsontable/devs actually want this or not
  • Circular dependencies - Shouldn't matter. It's already handled by existing code unless it's part of parallel evaluation which is mentioned blelow.
  • Named Ranges - Handled by existing code tests.
  • Volatile - Users should implement their own cache on backend for requests. If a function is volatile then use volatile: true
  • Undo-redo - Should re-do the function again like normal. Google Sheets does this.
  • API has to be async
  • Helper functions
  • Batch - Handled by existing code.
  • Mocks - Should not be needed. There's no specific source code around networks needed if users implement their own caching on backend.

Error handling
What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject well and gracefully.

I've added a timeoutTime property on the config & a new #TIMEOUT! error. TrackTak/hyperformula@af124a0

If this is fine I can add the translations.

Parallel evaluation
The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.

Not too sure how to do this. I think it needs to a change to the algorithm for the sorting of formulaVertexes, because what happens if the async formulas have cell references in it?

For example, the following code:

  const engine = await HyperFormula.buildFromArray([
        [1, '=ASYNC_FOO(A1)', '=ASYNC_FOO(A1+A4)', 6],
        [2, '=ASYNC_FOO(B1)']
      ])

I think requires algorithm change to group the separate tree's so this will be possible:

Promise.all([A2,A3]) & Promise.all([B2])

Currently this.dependencyGraph.topSortWithScc() returns a sorted flat array.

I see two options, runFunctionSync + runFunctionAsync or recognize functions that are returning Promise with instanceof Promise.

This is not possible.

Cannot split out runFunction & runFunctionAsync. All of them must be runFunctionAsync because non-async functions such as SUM() could contain async AST's like =SUM(=PRICE("AMZN"), 2).

So the evaluateAst in runFunction must be done in an async manner always like so:

Volatile
Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.

I do not beleive that all async functions should be automatically marked as volatile.

Consider this function: =PRICE("AMZN", "price", DATE(2020,12,31), 1). It needs to be async but it will always return the same value.

Users of the FunctonPlugin should just mark which functions are volatile with the volatile: true flag.

Also for the ServiceWorker cache, I read the docs and I'm not sure how this can work. Different function params could require different cache update times: i.e

=PRICE("AMZN", "price").
=PRICE("AMZN", "revenue")

The second functions network request should be cached for a long time whereas the first should probably be cached for a short period (perhaps every 15 minutes for example) because price updates a lot faster than revenue.

The ServiceWorker cannot know whether to refetch the data or not.

I took a look at the caching strategies here: https://developers.google.com/web/tools/workbox/modules/workbox-strategies

None of these seem to be what I think is needed. These seem to be suited more towards Hyperformula working in offline mode (which seems like another ticket).

Perhaps we should not implement a cache and instead just let users of Hyperformula implement their own caching for these functions on their backend?

This way the network request always fires but it returns a cached value or the updated one always so it's always up to date.

What will happen on undo? Should we restore previous value or make a new async request?

It should make a new network request as mentioned above with the backend cache this should be fine anyway. This means there's no stale data either.

@MartinDawson MartinDawson mentioned this issue Nov 26, 2021
5 tasks
@wojciechczerniak
Copy link
Contributor Author

Error handling
What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject well and gracefully.

I've added a timeoutTime property on the config & a new #TIMEOUT! error. TrackTak@af124a0

If this is fine I can add the translations.

LGTM 👍 I wouldn't bother with translation until PR is accepted

Parallel evaluation
The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.

Not too sure how to do this. I think it needs to a change to the algorithm for the sorting of formulaVertexes, because what happens if the async formulas have cell references in it?

For example, the following code:

  const engine = await HyperFormula.buildFromArray([
        [1, '=ASYNC_FOO(A1)', '=ASYNC_FOO(A1+A4)', 6],
        [2, '=ASYNC_FOO(B1)']
      ])

I think requires algorithm change to group the separate tree's so this will be possible:

Promise.all([A2,A3]) & Promise.all([B2])

Currently this.dependencyGraph.topSortWithScc() returns a sorted flat array.

I'm not sure either. Not my expertise, sorry. CC @bardek8 or @izulin

I see two options, runFunctionSync + runFunctionAsync or recognize functions that are returning Promise with instanceof Promise.

This is not possible.

Cannot split out runFunction & runFunctionAsync. All of them must be runFunctionAsync because non-async functions such as SUM() could contain async AST's like =SUM(=PRICE("AMZN"), 2).

So the evaluateAst in runFunction must be done in an async manner always like so:

      const promises = args.map((ast) => this.evaluateAst(ast, state))
      const values = await Promise.all(promises)

      argValues = values.map((value) => [value, false])

I see. Well, that's unfortunate and another breaking change. Just async public methods are enough to make it a feature for HF 2.0, but still another pain to migrate. Unless @bardek8 @izulin @voodoo11 see some other way to do it.

Volatile
Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.

I do not beleive that all async functions should be automatically marked as volatile.

Consider this function: =PRICE("AMZN", "price", DATE(2020,12,31), 1). It needs to be async but it will always return the same value.

Users of the FunctonPlugin should just mark which functions are volatile with the volatile: true flag.

I agree. That's a good design decision 👍 Thank you.

Also for the ServiceWorker cache, (...)

Perhaps we should not implement a cache and instead just let users of Hyperformula implement their own caching for these functions on their backend?

Yeah, that's what I had in mind. ServiceWorker is application developer responsibility. If we have volatile and undo-redo sorted out, caching wont be necessary internally.

What will happen on undo? Should we restore previous value or make a new async request?

It should make a new network request as mentioned above with the backend cache this should be fine anyway. This means there's no stale data either.

Agree 👍

@wojciechczerniak
Copy link
Contributor Author

Retries - Need clarification if @handsontable/devs actually want this or not
Circuit breaker - Need clarification if @handsontable/devs actually want this or not

IMO timeout is enough of a circuit breaker. Function author may loop and retry inside function as long as it's within the time limit.

@MartinDawson
Copy link
Contributor

New draft PR: #863

@warpech
Copy link
Member

warpech commented Feb 4, 2022

The PR #863 was abandoned after a sincere attempt by @MartinDawson. Maybe Martin would like to explain here what's his current approach to async functions, if there is any?

FWIW, right now, the easiest way at achieve the asynchronicity is in the user code outside of HyperFormula. I mean a two-step strategy, in which your custom function e.g. SHARE_PRICE("AMZN") synchronously resolves to a value from cache (or gives an error if the cache is empty) while asynchronously fetching the current value. When fetching is done, update the cache and reevaluate getCellValue/getSheetValues. We will soon add such example to the docs #892.

Or, if you need it to run in one pass, simply resolve your async code before feeding the data to Hyperformula.

Implementing async functions as discussed here is not something that we are working on right now. The focus for this quarter is to improve what's available under the existing API (bug fixes, docs).

We might put async functions on the roadmap in the upcoming quarter, because it is obviously useful for some scenarios.

@MartinDawson
Copy link
Contributor

@warpech I'm using this (buggy) PR still: #879

I couldn't really understand how to fully get your suggestion to work from trying it so I am not doing it that way.

Issues with my draft PR is that I should not have changed evaluator.ts to be async and instead calculated async values in a separate file like the ArraySize is done basically.

There's also issues with async dependencies, array formulas & parallelisation that don't properly work in it in more complex scenario's.

I'm working on it again though bit by bit (as I need it) as I know how to do it much better now I understand HF more so when async functions is picked up in this or a future quarter I can make another PR and should have majority of it done by then.

@JohannCooper
Copy link

FWIW, right now, the easiest way at achieve the asynchronicity is in the user code outside of HyperFormula. I mean a two-step strategy, in which your custom function e.g. SHARE_PRICE("AMZN") synchronously resolves to a value from cache (or gives an error if the cache is empty) while asynchronously fetching the current value. When fetching is done, update the cache and reevaluate getCellValue/getSheetValues.

@warpech, I know this thread hasn't been active in a while but is there any update? Is the strategy quoted above still the recommended approach? Thanks!

@MartinDawson
Copy link
Contributor

MartinDawson commented Apr 30, 2024

@JohannCooper Sorry for (very late) reply, I got it working a while back with async cells at each level (i.e promise.all() at each level down the ayclic directed graph tree) but moved onto something else so never got back to this PR

You can see the repo here: https://github.com/TrackTak/hyperformula

Specifically: TrackTak/hyperformula@ca9bbad
TrackTak/hyperformula@d873985
TrackTak/hyperformula@ee54bab

And (unfinished) chunking of async values (which is more performant): https://github.com/TrackTak/hyperformula/commits/chunk-async-formulas

IIRC @warpech suggestion didn't fully work for me for some reason or was not efficient enough, I can't 100% remember

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Something we can add later on without introducing a breaking change Function Feature or bug in formula function Verified Verified by Handsoncode
Projects
None yet
Development

No branches or pull requests

5 participants