Improve efficiency of the dependency graph (56% speed-up) #1293
Performance comparison of head (69095fc) vs base (0e39178)
…aps and Sets of nodes - 15% speedup
…s not look like a date/time value
Looks great! 🥇
I tested the changes using the demo provided in the PR's description, and here are my results:
- On Chrome (browser), there is a 60% improvement (from ~25s to ~10s);
- On Node (v20), there is a 58% improvement (from ~29s to ~12s).
@budnix Could you also run the following script?
import { AlwaysDense, HyperFormula } from 'hyperformula';

// Converts a zero-based column index to a spreadsheet label (0 -> A, 26 -> AA).
function columnIndexToLabel(column) {
  let result = ''
  while (column >= 0) {
    result = String.fromCharCode((column % 26) + 97) + result
    column = Math.floor(column / 26) - 1
  }
  return result.toUpperCase()
}

// Formats a cell address as an A1-style string.
function simpleCellAddressToString(address) {
  const column = columnIndexToLabel(address.col)
  return `${column}${address.row + 1}`
}

const cols = 50;
const data = [];

// First row: a chain of formulas, each adding 1 to the cell on its left.
const firstRow = [1];
for (let i = 1; i < cols; ++i) {
  const adr = simpleCellAddressToString({sheet: 0, row: 0, col: i - 1});
  firstRow.push(`=${adr} + 1`);
}
data.push(firstRow);

// Remaining rows: SUM formulas over progressively wider column ranges.
for (let i = 1; i < cols; ++i) {
  const rowToPush = Array(i).fill(null);
  const startColumn = columnIndexToLabel(i - 1);
  for (let j = i - 1; j < cols - 1; ++j) {
    const endColumn = columnIndexToLabel(j);
    rowToPush.push(`=SUM(${startColumn}:${endColumn})`);
  }
  data.push(rowToPush);
}

const sheetId = 0;

// Build a fresh engine and load the sheet 199 times, measuring the total time.
const ty1 = (new Date()).getTime();
for (let i = 1; i < 200; ++i) {
  const hf = HyperFormula.buildFromArray([], {
    licenseKey: 'gpl-v3',
    maxRows: 500100,
    useStats: true,
    chooseAddressMappingPolicy: new AlwaysDense(),
  });
  hf.setSheetContent(sheetId, data);
}
const ty2 = (new Date()).getTime();
console.log(ty2 - ty1);
Using your code, I spotted performance degradation. However, considering it's a rare use (hundreds of sheets created by
Released in v2.6.0
awesome!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1293      +/-   ##
===========================================
+ Coverage    97.23%   97.25%   +0.01%
===========================================
  Files          167      169       +2
  Lines        14304    14408     +104
  Branches      3065     3092      +27
===========================================
+ Hits         13909    14012     +103
- Misses         395      396       +1
Test case
A spreadsheet with 500k rows and 10 columns filled with string data. Total of 5M data cells, no formulas.
Script:
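The sketch below is only an assumption about how such a data set could be generated; it is not necessarily the script used for the measurements. The helper name `makeStringSheet` and the cell contents are hypothetical.

```javascript
// Hypothetical helper: builds a rows x cols grid of plain string cells (no formulas).
function makeStringSheet(rows, cols) {
  const data = new Array(rows)
  for (let r = 0; r < rows; ++r) {
    const row = new Array(cols)
    for (let c = 0; c < cols; ++c) {
      // Arbitrary text that looks neither like a number nor like a formula.
      row[c] = `text-${r}-${c}`
    }
    data[r] = row
  }
  return data
}

// Small sample; the test case described above would use makeStringSheet(500000, 10).
const sample = makeStringSheet(3, 2)
console.log(sample)
```

The full-size array could then be passed to `HyperFormula.buildFromArray` with `licenseKey: 'gpl-v3'` and a `maxRows` limit above 500k, as in the benchmark snippet posted in the conversation.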
Ideas for improvement
- getTopSortedWithSccSubgraphFrom, which is the iterative implementation of the Tarjan algorithm that performs the topological sorting of the dependency graph and finds cycles (SCCs) in the graph. In this test case, the dependency graph is trivial; it contains only isolated nodes without any edges. It seems that it can be done more efficiently.
- parseDateTimeFromConfigFormats, which tries to parse all string data as date/time values. This test case contains no date/time values, so there might be some way of saving time by determining it quickly and avoiding running the heavy parsing operations.
This PR focuses on optimizing topological sorting of the dependency graph
I made the Tarjan algorithm more efficient by changing the data structures it uses. Initially, the information about the graph nodes was stored in maps and sets with nodes as keys.
My approach was to use simple arrays indexed by integer ids and keep a single array of nodes as a mapping from id to node data.
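To illustrate the idea, here is a minimal sketch of an iterative Tarjan SCC search whose per-node state lives in typed arrays indexed by integer id, with a single `nodes` array mapping ids back to node data. It is not HyperFormula's actual implementation; the names `tarjanScc`, `adjacency`, and `nodes` are assumptions made for this example.

```javascript
// adjacency[id] lists the ids that node `id` has edges to.
// Returns the strongly connected components as arrays of ids, sinks first.
function tarjanScc(adjacency) {
  const n = adjacency.length
  const UNVISITED = -1
  const index = new Int32Array(n).fill(UNVISITED) // discovery order per id
  const low = new Int32Array(n)                   // lowest index reachable per id
  const onStack = new Uint8Array(n)               // 1 while the id is on sccStack
  const nextEdge = new Int32Array(n)              // cursor into adjacency[id]
  const sccStack = []
  const callStack = []                            // replaces recursion
  const components = []
  let counter = 0

  for (let root = 0; root < n; ++root) {
    if (index[root] !== UNVISITED) continue
    callStack.push(root)
    while (callStack.length > 0) {
      const v = callStack[callStack.length - 1]
      if (index[v] === UNVISITED) {
        index[v] = low[v] = counter++
        sccStack.push(v)
        onStack[v] = 1
      }
      const edges = adjacency[v]
      let descended = false
      while (nextEdge[v] < edges.length) {
        const w = edges[nextEdge[v]++]
        if (index[w] === UNVISITED) {
          callStack.push(w) // visit w first, then resume v
          descended = true
          break
        } else if (onStack[w]) {
          low[v] = Math.min(low[v], index[w])
        }
      }
      if (descended) continue
      if (low[v] === index[v]) {
        // v is the root of a component: pop it off the SCC stack.
        const component = []
        let w
        do {
          w = sccStack.pop()
          onStack[w] = 0
          component.push(w)
        } while (w !== v)
        components.push(component)
      }
      callStack.pop()
      if (callStack.length > 0) {
        const parent = callStack[callStack.length - 1]
        low[parent] = Math.min(low[parent], low[v])
      }
    }
  }
  return components
}

// A single array maps ids back to node data.
const nodes = ['A1', 'B1', 'C1', 'D1']
const adjacency = [[1], [2], [0], [1]] // A1 -> B1 -> C1 -> A1 is a cycle; D1 -> B1
const sccs = tarjanScc(adjacency).map(ids => ids.map(id => nodes[id]))
console.log(sccs)
```

Array lookups by integer id avoid the hashing that `Map` and `Set` perform on every access, which matters most when the graph has millions of nodes and few or no edges, as in the test case above.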
Results
Total time:
Before: 25923ms
After: 11160ms
Function getTopSortedWithSccSubgraphFrom:
Before: 51.6%
After: 23.1%
Profiler: Chrome Dev Tools
Overall, I achieved a 56.02% speed-up of HyperFormula for this use case.
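As a quick check of the arithmetic, the relative reduction in total time can be recomputed from the before and after totals listed above. The function name `relativeReduction` is hypothetical; with these totals the result is roughly 57%, close to the figure quoted.

```javascript
// Fraction of the original running time that was saved.
function relativeReduction(beforeMs, afterMs) {
  return (beforeMs - afterMs) / beforeMs
}

const reduction = relativeReduction(25923, 11160)
console.log(`${(reduction * 100).toFixed(2)}% less total time`)
```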
How did you test your changes?
Comparison on our existing performance benchmarks on my PC:
Column ranges benchmark
The 40% slow-down is observed only in the Node environment. Running this benchmark in V8 yields similar results before and after applying the change.
Related issues:
Fixes #876