Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1842: New xlsx option for ignoring certain nodes for improved performance #2132

Merged
merged 8 commits into from
Sep 21, 2023

Conversation

hofnarwillie
Copy link
Contributor

@hofnarwillie hofnarwillie commented Sep 9, 2022

Summary

This PR is to mitigate the issue described here: #1842
When the xlsx document contains a single dataValidation applied to the entire worksheet (or a very large range) using a cell address like A1:AAA1048000 then the DataValidationsXform.parseClose() function loops through each cell in the range and unless you have a super computer it runs out of memory. In some cases these validations are not relevant to the context in which the file is being parsed, so a simple performance improvement is to skip over specific XLSX nodes (in this case dataValidations).

Some unrelated linting issues were fixed automatically by the husky pre-commit. I've left them in this PR.

Test plan

Included a couple of integration tests and ensured all the other tests still pass. Also updated the README to describe the new option.

Related to source code (for typings update)

Added typescript typings for options passed into workbook.xlsx.readFile() and workbook.xlsx.load(). I have not made any changes to the stream.xslx.* interfaces. Would appreciate some input here from the maintainers of the package as I don't know if it is necessary (i.e. if the stream based implementations will use the same options and parsing techniques).

@aonamrata
Copy link

can someone from the maintainers group review this?
@guyonroche , @alubbe or @Siemienik

@Siemienik
Copy link
Member

I would like, however currently I'm overloaded

@Siemienik Siemienik self-assigned this Apr 6, 2023
Copy link
Member

@Siemienik Siemienik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea on the topic of performance optimization. Thank you for your contribution! I added some proposals to update Readme and reduce unnecessary if. I hope that's not a problem for you 😄 @hofnarwillie, I appreciate your effort, the tests added, and your code is cool 👍

@skypesky or @zurmokeeper, would you like to update README_zh?

Will it merge on the next MergeFest

This PR was reviewed during the MergeFest session.

lib/xlsx/xform/sheet/worksheet-xform.js Outdated Show resolved Hide resolved
spec/integration/data/.gitignore Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@zurmokeeper
Copy link
Contributor

Great idea on the topic of performance optimization. Thank you for your contribution! I added some proposals to update Readme and reduce unnecessary if. I hope that's not a problem for you 😄 @hofnarwillie, I appreciate your effort, the tests added, and your code is cool 👍

@skypesky or @zurmokeeper, would you like to update README_zh?

Will it merge on the next MergeFest

This PR was reviewed during the MergeFest session.

Sure, I'm happy to do that, and after this pr is merged into master, I'll raise a new PR to update README_zh.

@Siemienik Siemienik merged commit 3178efd into exceljs:master Sep 21, 2023
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants