-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IMP] evaluation: introduce array formulas #2090
[IMP] evaluation: introduce array formulas #2090
Conversation
0a48814
to
402b964
Compare
42165bd
to
80fe630
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to implement some array functions, some problems I found so far.
src/types/misc.ts
Outdated
format?: Format; | ||
format?: FunctionReturnFormat; | ||
} | ||
export type FunctionReturnValue = ScalarValue | MatrixValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure for array formula that MatrixValue
should be able to contain undefined values.
For example the function transpose can definetevly return undefined values if some cells of the original matrix are empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to stay consistent here.
When you refer to an empty cell, e.g. A1:"=B1" with empty B1
A1 --> 0
We should have the same behavior for the matrix result.
That's what Excel does anyway.
I know that the current transpose function don't take this behavior into account.
This will be done in the associated PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer GSheet results for array functions, but I guess it make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create a page of doc with how the evaluation works, why it is lke that, and what other options were explored.
I'm too lazy to read the evaluation details myself, I'll let smarter peoples handle this 😅
ef4eb9e
to
80fe630
Compare
Missing functionality : cannot make an array function that returns some spreaded cells in error, eg. EXPAND function |
Missing functionality: array function handling in formulas.
To keep in mind, but probably for a follow-up PR/task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also purge the lazyEvaluation
from Model :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting my comments before leaving, still a lot of files to read
58a631d
to
d7be0ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some little comments
d7be0ce
to
f65e8da
Compare
Will be repaired in the same task that aims to accept errors on the lookup formula |
f65e8da
to
c50c8f7
Compare
7c03c0b
to
9f54fac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work guys! 😳 @laa-odoo @anhe-odoo
I'm still trying to get my brain around all this to get the full picture
Many questions in my comments, some cleaning suggestions. I'll have more later
} | ||
}; | ||
|
||
const updateSpreadCandidates = (i: number, j: number) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @laa-odoo about all those nested funtion declarations which are not particularly easy to follow and read.
Maybe see if we can solve the problem (accessing state from the parent within the functions) with a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more questions/reflexions
No explanations will be provided to understand this commit, be a little imaginative !!! Just kidding, here it is: This commit introduces the base ground for array functions: functions that can return an array or matrix of results and spreading the result over several cells A2:'=SPLIT("KAYAK", "A")' Resulting in: ----------------- | | A | B | C | |---+---+---+---| | 1 | | | | | 2 | K | Y | K | | 3 | | | | ----------------- There's currently no implemented array formula. They are coming with task 3184778. The evaluation plugin is in charge of computing the values of the cells. This is a fairly complex task for several reasons: Reason n°1: Cells can contain formulas that must be interpreted to know the final value of the cell. And these formulas can depend on other cells. ex A1:"=SUM(B1:B2)" we have to evaluate B1:B2 first to be able to evaluate A1. We say here that we have a 'formula dependency' between A1 and B1:B2. Reason n°2: A cell can assign value to other cells that haven't content. This concerns cells containing a formula that returns an array of values. ex A1:"=SPLIT('Odoo','d')" Evaluating A1 must assign the value "O" to A1 and "oo" to B1. We say here that we have a 'spread relation' between A1 and B1. B1 have a spread value from A1. Note that a cell can contain a formula which depends on other cells which themselves can: - contain formulas which depends on other cells (and so on). - contain a spread value from other formulas which depends on other cells (and so on). I - How to build the evaluation ? If we had only formulas dependencies to treat, the evaluation would be simple: the formulas dependencies are directly deduced from the content of the formulas. With the dependencies we are able to calculate which formula must be evaluated before another. Cycles ------ We can also easily detect if the cells are included in reference cycles and return an error in this case. ex: A1:"=B1" B1:"=C1" C1:"=A1" The "#CYCLE" error must be returned for all three cells. But there's more! There are formulas referring to other cells but never use them. This is the case for example with the "IF" formula. ex: A1:"=IF(D1,A2,B1)" A2:"=A1" In this case it is obvious that we have a cyclic dependency. But in practice this will only exist if D1 is true. For this reason, we believe that the evaluation should be **partly recursive**: The function computing a formula cell starts by marking them as 'being evaluated' and then call itself on the dependencies of the concerned cell. This allows to evaluate the dependencies before the cell itself and to detect if the cell that is being evaluated isn't part of a cycle. II - The spread relation anticipation problem The biggest difficulty to solve with the evaluation lies in the fact that we cannot anticipate the spread relations: cells impacted by the result array of a formula are only determined after the array formula has been evaluated. In the case where the impacted cells are used in other formulas, this will require to re-evaluation other formulas (and so on...). ex: A1:"=B2" A2:"=SPLIT('Odoo','d')" in the example above, A2 spreads on B2, but we will know it only after the evaluation of A2. To be able to evaluate A1 correctly, we must therefore reevaluate A1 after the evaluation of A2. We could evaluate which formula spreads first. Except that the array formulas can themselves depend on the spreads of other formulas. ex: A1:"=SPLIT(B3,'d')" A2:="odoo odoo" A3:"=SPLIT(A2,' ')" In the example above, A3 must be evaluated before A1 because A1 needs B3 which can be modified by A3. Therefore, there would be a spatial evaluation order to be respected between the array formulas. We could imagine that, when an array formula depends on a cell, then we evaluate the first formula that spreads located in the upper left corner of this cell. Although this possibility has been explored, it remains complicated to spatially predict which formula should be evaluated before another, especially when the array formulas are located in different sheets or when the array formulas depends on the spreads of each other. ex: A1:"=ARRAY_FORMULA_ALPHA(B2)" A2:"=ARRAY_FORMULA_BETA(B1)" In the example above, ARRAY_FORMULA_ALPHA and ARRAY_FORMULA_BETA are some formulas that could spread depending on the value of B2 and B1. This could be a cyclic dependency that we cannot anticipate. And as with the "IF" formula, array formulas may not use their dependency. It then becomes very difficult to manage... Also, if we have a cycle, that doesn't mean it's bad. The cycle can converge to a stable state at the scale of the sheets. Functionally, we don't want to forbid convergent cycles. It is an interesting feature but which requires to re-evaluate the cycle as many times as convergence is not reached. Thus, in order to respect the relations between the cells (formula dependencies and spread relations), the evaluation of the cells must: - respect a precise order (cells values used by another must be evaluated first) : As we cannot anticipate which dependencies are really used by the formulas, we must evaluate the cells in a recursive way; - be done as many times as necessary to ensure that all the cells have been correctly evaluated in the correct order (in case of, for example, spreading relation cycles). The chosen solution is to reevaluate the formulas impacted by spreads as many times as necessary in several iterations, where evaluated cells can trigger the evaluation of other cells depending on it, at the next iteration. Unfortunately, as all values should be computed inside its round, we had to remove the lazy evaluation process. Performance impacts ------------------- Loading a sheet and several other operations (adding or deleting columns or rows) are slower than before because the dependencies must be (re)built. However, other operations are much faster because they don't require to evaluate the entire sheet. Some numbers with the large formula dataset (260k cells) (averages over 5 runs): Loading: before: 2937ms after: 4052ms => +37% The next commit reduces this overhead to +17% Updating a cell before: 193ms after: 1.9ms => -99% ! formulas support matrices format -------------------------------- We unify the input and output type of the formula compute functions. As the output are already an scalar object with value(s) and format(s) as a scalar/matrix, we decided to have the same type of object for the input, instead of the current matrix of object(s) coming from the range function. So instead of an array of objects, it's now an object of arrays. Co-authored-by: Alexis Lacroix <laa@odoo.com> Co-authored-by: Anthony Hendrickx (anhe) <anhe@odoo.com> Co-authored-by: Lucas Lefèvre <lul@odoo.com>
With this commit, the dependency graph is build lazily. The cost of building the graph is only paid: - at the first update - if there is an array formulas, which shouldn't be most basic spreadsheets With this, loading the large formula data set (260k cells) is ~20% faster Before: 4241ms After: 3457ms (average of 5 runs)
d545657
to
fcde878
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
robodoo r+
This commit prepares the array formula evaluation (next commit) by moving the evaluation plugin file to its own folder Part-of: #2090
No explanations will be provided to understand this commit, be a little imaginative !!! Just kidding, here it is: This commit introduces the base ground for array functions: functions that can return an array or matrix of results and spreading the result over several cells A2:'=SPLIT("KAYAK", "A")' Resulting in: ----------------- | | A | B | C | |---+---+---+---| | 1 | | | | | 2 | K | Y | K | | 3 | | | | ----------------- There's currently no implemented array formula. They are coming with task 3184778. The evaluation plugin is in charge of computing the values of the cells. This is a fairly complex task for several reasons: Reason n°1: Cells can contain formulas that must be interpreted to know the final value of the cell. And these formulas can depend on other cells. ex A1:"=SUM(B1:B2)" we have to evaluate B1:B2 first to be able to evaluate A1. We say here that we have a 'formula dependency' between A1 and B1:B2. Reason n°2: A cell can assign value to other cells that haven't content. This concerns cells containing a formula that returns an array of values. ex A1:"=SPLIT('Odoo','d')" Evaluating A1 must assign the value "O" to A1 and "oo" to B1. We say here that we have a 'spread relation' between A1 and B1. B1 have a spread value from A1. Note that a cell can contain a formula which depends on other cells which themselves can: - contain formulas which depends on other cells (and so on). - contain a spread value from other formulas which depends on other cells (and so on). I - How to build the evaluation ? If we had only formulas dependencies to treat, the evaluation would be simple: the formulas dependencies are directly deduced from the content of the formulas. With the dependencies we are able to calculate which formula must be evaluated before another. Cycles ------ We can also easily detect if the cells are included in reference cycles and return an error in this case. ex: A1:"=B1" B1:"=C1" C1:"=A1" The "#CYCLE" error must be returned for all three cells. But there's more! There are formulas referring to other cells but never use them. This is the case for example with the "IF" formula. ex: A1:"=IF(D1,A2,B1)" A2:"=A1" In this case it is obvious that we have a cyclic dependency. But in practice this will only exist if D1 is true. For this reason, we believe that the evaluation should be **partly recursive**: The function computing a formula cell starts by marking them as 'being evaluated' and then call itself on the dependencies of the concerned cell. This allows to evaluate the dependencies before the cell itself and to detect if the cell that is being evaluated isn't part of a cycle. II - The spread relation anticipation problem The biggest difficulty to solve with the evaluation lies in the fact that we cannot anticipate the spread relations: cells impacted by the result array of a formula are only determined after the array formula has been evaluated. In the case where the impacted cells are used in other formulas, this will require to re-evaluation other formulas (and so on...). ex: A1:"=B2" A2:"=SPLIT('Odoo','d')" in the example above, A2 spreads on B2, but we will know it only after the evaluation of A2. To be able to evaluate A1 correctly, we must therefore reevaluate A1 after the evaluation of A2. We could evaluate which formula spreads first. Except that the array formulas can themselves depend on the spreads of other formulas. ex: A1:"=SPLIT(B3,'d')" A2:="odoo odoo" A3:"=SPLIT(A2,' ')" In the example above, A3 must be evaluated before A1 because A1 needs B3 which can be modified by A3. Therefore, there would be a spatial evaluation order to be respected between the array formulas. We could imagine that, when an array formula depends on a cell, then we evaluate the first formula that spreads located in the upper left corner of this cell. Although this possibility has been explored, it remains complicated to spatially predict which formula should be evaluated before another, especially when the array formulas are located in different sheets or when the array formulas depends on the spreads of each other. ex: A1:"=ARRAY_FORMULA_ALPHA(B2)" A2:"=ARRAY_FORMULA_BETA(B1)" In the example above, ARRAY_FORMULA_ALPHA and ARRAY_FORMULA_BETA are some formulas that could spread depending on the value of B2 and B1. This could be a cyclic dependency that we cannot anticipate. And as with the "IF" formula, array formulas may not use their dependency. It then becomes very difficult to manage... Also, if we have a cycle, that doesn't mean it's bad. The cycle can converge to a stable state at the scale of the sheets. Functionally, we don't want to forbid convergent cycles. It is an interesting feature but which requires to re-evaluate the cycle as many times as convergence is not reached. Thus, in order to respect the relations between the cells (formula dependencies and spread relations), the evaluation of the cells must: - respect a precise order (cells values used by another must be evaluated first) : As we cannot anticipate which dependencies are really used by the formulas, we must evaluate the cells in a recursive way; - be done as many times as necessary to ensure that all the cells have been correctly evaluated in the correct order (in case of, for example, spreading relation cycles). The chosen solution is to reevaluate the formulas impacted by spreads as many times as necessary in several iterations, where evaluated cells can trigger the evaluation of other cells depending on it, at the next iteration. Unfortunately, as all values should be computed inside its round, we had to remove the lazy evaluation process. Performance impacts ------------------- Loading a sheet and several other operations (adding or deleting columns or rows) are slower than before because the dependencies must be (re)built. However, other operations are much faster because they don't require to evaluate the entire sheet. Some numbers with the large formula dataset (260k cells) (averages over 5 runs): Loading: before: 2937ms after: 4052ms => +37% The next commit reduces this overhead to +17% Updating a cell before: 193ms after: 1.9ms => -99% ! formulas support matrices format -------------------------------- We unify the input and output type of the formula compute functions. As the output are already an scalar object with value(s) and format(s) as a scalar/matrix, we decided to have the same type of object for the input, instead of the current matrix of object(s) coming from the range function. So instead of an array of objects, it's now an object of arrays. Part-of: #2090 Co-authored-by: Alexis Lacroix <laa@odoo.com> Co-authored-by: Anthony Hendrickx (anhe) <anhe@odoo.com> Co-authored-by: Lucas Lefèvre <lul@odoo.com>
With this commit, the dependency graph is build lazily. The cost of building the graph is only paid: - at the first update - if there is an array formulas, which shouldn't be most basic spreadsheets With this, loading the large formula data set (260k cells) is ~20% faster Before: 4241ms After: 3457ms (average of 5 runs) closes #2090 Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
Congrats guys 🥳 |
The recent fix in #3622 combined with a slight refactoring of the export for Excel in #2090 let to a situation where cells with non-exportable formulas containing references did not have their content replaced by the evaluated result. closes #4146 Task: 3895465 Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
The recent fix in #3622 combined with a slight refactoring of the export for Excel in #2090 let to a situation where cells with non-exportable formulas containing references did not have their content replaced by the evaluated result. closes #4147 Task: 3895465 X-original-commit: c8df5d0 Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
The recent fix in #3622 combined with a slight refactoring of the export for Excel in #2090 let to a situation where cells with non-exportable formulas containing references did not have their content replaced by the evaluated result. closes #4164 Task: 3895465 X-original-commit: e8d1444 Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
The recent fix in #3622 combined with a slight refactoring of the export for Excel in #2090 let to a situation where cells with non-exportable formulas containing references did not have their content replaced by the evaluated result. closes #4167 Task: 3895465 X-original-commit: 0ef57de Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
The recent fix in #3622 combined with a slight refactoring of the export for Excel in #2090 let to a situation where cells with non-exportable formulas containing references did not have their content replaced by the evaluated result. closes #4166 Task: 3895465 X-original-commit: 0ef57de Signed-off-by: Lucas Lefèvre (lul) <lul@odoo.com>
Description:
[IMP] evaluation: keep in memory evaluated cells
When updating a cell (via the UPDATE_CELL command) we now re-evaluate the
impacted cells only, keeping in memory the up-to-date previously evaluated
cells.
[IMP] evaluation: introduce evaluation by round
With the incoming spreading matrix formulas, we may have formulas spreading a matrix
able to depend on a matrix spread from another formula.
As the current evaluation process was not enough to handle cross-spread dependencies, we
needed to introduce a new evaluation scheme. We decided to make evaluation per round, meaning
that every updated cell during the evaluation process will trigger the re-evaluation of all
its references in the next evaluation round. We then go from one round to another as long as
we still have cell to recompute.
This new scheme has been needed to be able to handle these cross-dependencies without knowing
the size of the future spread before the evaluation, which is something that won't be available
without evaluating the formula.
Unfortunately, as all values should be computed inside its round, we had to remove the lazy evaluation
process.
[IMP] evaluation: introduce graph
Introduce the graph class to facilitate access to formula dependencies and spread dependencies.
[IMP] evaluation: introduce matrix result formula
No explanations will be provided to understand this commit,
be a little imaginative !!!
[IMP] evaluation: formulas support matrix format
Odoo task ID : 3018032
review checklist