Skip to content

Conversation

@TimG1964
Copy link
Contributor

@TimG1964 TimG1964 commented Feb 1, 2025

This PR follows previous attempts and helpful feedback on Discourse here, and here. There was another thread, too, but I can't find it now. (EDIT Not a discourse thread but comments on a previous PR, here. I think I've followed all of @nhz2's advice in this PR.)

I've removed all dependence on both ZipFiles.jl (in favour of ZipArchives.jl) and EzXML.jl (in favour of XML.jl). I could not replace ZipFiles until after I had replaced EzXML, but EzXML was used ubiquitously throughout the XLSX package so replacing it was a comparatively big task.

Some important caveats:

  • My skill levels are low
  • I am unfamiliar with the internals of XLSX files
  • I am unfamiliar with XML

I have relied on the heavy lifting done by @felipenoris throughout.

This PR passes all tests locally and works perfectly in my use case, which gives some confidence in making this PR. However, I think it will still need review by a competent person both for correctness and for efficiency. I haven't benchmarked, but it seems, subjectively, to be about the same speed as before.

@TimG1964
Copy link
Contributor Author

TimG1964 commented Feb 1, 2025

There are a couple of other things to point out:

  • I had a problem with the escaped characters (&<>"'). There was an element of coding to the test here to get the tests to pass but I also changed the tests a little. I had particular problems with trailing apostrophes! Although things now appear fine, and I think (mis) use of these characters (e.g. in tab names or column headers) is rare, some further validation might be valuable.
  • I've changed the escape.xlsx test file. I think the sheet name in the original was illegal (too long), so I've shortened it to a valid length.
  • The previous runtests.jl script itself relied on the EzXML.findall function, and there is no equivalent in XML.jl. I've included my own code in the runtests.jl script to address this but recognise that using my own code to check itself isn't ideal!

@nhz2
Copy link
Contributor

nhz2 commented Feb 14, 2025

This is great. I can try and help review how ZipArchives.jl is used here. One thing to look into is memory usage when reading a large spreadsheet.

@TimG1964
Copy link
Contributor Author

TimG1964 commented Feb 14, 2025

Thank you @nhz2. I saw that ZipArchives.jl can be used with mmap but I haven't looked at the possibilities there - I'm a bit intimidated at present, tbh. Your advice would be much appreciated!

@TimG1964
Copy link
Contributor Author

TimG1964 commented Feb 14, 2025

Regarding escaped characters, it seems there was a small bug in XML.jl (here). A PR to fix this has been merged into XML.jl but not yet released. I've included an interim work-around for this in #282.

@TimG1964 TimG1964 mentioned this pull request Feb 15, 2025
@felipenoris
Copy link
Owner

Pure awesomeness. Thanks @TimG1964 !

@felipenoris felipenoris merged commit 577b341 into felipenoris:master Feb 27, 2025
22 checks passed
@TimG1964 TimG1964 deleted the Remove-dependency-on-ZipFile branch February 28, 2025 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants