Performance improvement when writing large file #157

Merged: 5 commits, Jan 29, 2021

pascalr0410
Contributor

@pascalr0410 pascalr0410 commented Jan 6, 2021

Hello,

This PR dramatically improves performance when writing large files with lots of text and many lines: about 10x faster with 10k lines, 100x faster with 100k lines, and so on, I guess.

I tried to be very surgical with my modifications to reduce the footprint and the risk of regressions, but the same idea could be integrated more cleanly and deeply.

Regards,
Pascal

@pascalr0410
Contributor Author

pascalr0410 commented Jan 7, 2021

I bypassed the test that uses the Excel sheet table6 in the test file general.xlsx because it couldn't run properly; it is also broken in the current production package.

The test with table7 is buggy too, but the test case itself is also buggy, so the result seems good despite everything.

In fact, there is a by-design problem in the code, which can't properly take into account columns with no data inside.

Also, all the functions used by these test cases are unrelated to my modifications.

@pascalr0410
Contributor Author

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.2 (2020-09-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using Pkg

julia> Pkg.rm("XLSX")
Updating C:\Users\BC5234\.julia\environments\v1.5\Project.toml
[fdbf4ff8] - XLSX v0.7.3 https://github.com/felipenoris/XLSX.jl.git#master
Updating C:\Users\BC5234\.julia\environments\v1.5\Manifest.toml
[fdbf4ff8] - XLSX v0.7.3 https://github.com/felipenoris/XLSX.jl.git#master

julia> Pkg.add("XLSX")
Updating registry at C:\Users\BC5234\.julia\registries\General
Updating git-repo https://github.com/JuliaRegistries/General.git
Resolving package versions...
Updating C:\Users\BC5234\.julia\environments\v1.5\Project.toml
[fdbf4ff8] + XLSX v0.7.3
Updating C:\Users\BC5234\.julia\environments\v1.5\Manifest.toml
[fdbf4ff8] + XLSX v0.7.3

julia> Pkg.test("XLSX")
Testing XLSX
Status C:\Users\BC5234\AppData\Local\Temp\jl_3Rdt8b\Project.toml
[a93c6f00] DataFrames v0.22.1
[8f5d6c58] EzXML v1.1.0
[bd369af6] Tables v1.2.2
[fdbf4ff8] XLSX v0.7.3
[a5390f91] ZipFile v0.9.3
[ade2ca70] Dates
[de0858da] Printf
[8dfed614] Test
Status C:\Users\BC5234\AppData\Local\Temp\jl_3Rdt8b\Manifest.toml
[56f22d72] Artifacts v1.3.0
[324d7699] CategoricalArrays v0.9.0
[34da2185] Compat v3.23.0
[a8cc5b0e] Crayons v4.0.4
[9a962f9c] DataAPI v1.4.0
[a93c6f00] DataFrames v0.22.1
[864edb3b] DataStructures v0.18.8
[e2d170a0] DataValueInterfaces v1.0.0
[8f5d6c58] EzXML v1.1.0
[59287772] Formatting v0.4.2
[41ab1584] InvertedIndices v1.0.0
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.1.3
[682c06a0] JSON v0.21.1
[94ce4f54] Libiconv_jll v1.16.0+7
[e1d29d7a] Missings v0.4.4
[bac558e1] OrderedCollections v1.3.2
[69de0a69] Parsers v1.0.14
[2dfb63ee] PooledArrays v0.5.3
[08abe8d2] PrettyTables v0.10.1
[189a3867] Reexport v0.2.0
[a2af1166] SortingAlgorithms v0.3.1
[856f2bd8] StructTypes v1.1.0
[3783bdb8] TableTraits v1.0.0
[bd369af6] Tables v1.2.2
[fdbf4ff8] XLSX v0.7.3
[02c8fc9c] XML2_jll v2.9.10+3
[a5390f91] ZipFile v0.9.3
[83775a58] Zlib_jll v1.2.11+18
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
Test Summary:   | Pass  Total
read test files |   22     22
Test Summary: | Pass  Total
Cell names    |  326    326
Test Summary: | Pass  Total
getindex      |   12     12
Test Summary:     | Pass  Total
Time and DateTime |    6      6
Test Summary:  | Pass  Total
number formats |   16     16
Test Summary: | Pass  Total
Defined Names |   24     24
Test Summary: | Pass  Total
Book1.xlsx    |   22     22
Test Summary:       | Pass  Total
book_1904_ptbr.xlsx |    9      9
Test Summary: | Pass  Total
numbers.xlsx  |   75     75
Test Summary: | Pass  Total
Column Range  |    9      9
Test Summary:      | Pass  Total
CellRange iterator |    1      1
Test Summary: | Pass  Total
Table         |  471    471
Test Summary:    | Pass  Total
Helper functions |   26     26
Test Summary: | Pass  Total
Write         |   57     57
Test Summary: | Pass  Total
Edit Template |    3      3
Test Summary: | Pass  Total
addsheet!     |   10     10
Test Summary: | Pass  Total
Edit          |   13     13
Test Summary: | Pass  Total
writetable    |  106    106
Test Summary: | Pass  Total
Styles        |   74     74
Test Summary: | Pass  Total
filemodes     |  200    200
Test Summary: | Pass  Total
escape        |   84     84
Test Summary: | Pass  Total
row_index     |    1      1
Test Summary: |
show xlsx     | No tests
Test Summary:  | Pass  Total
relative paths |    5      5
Test Summary:         | Pass  Total
windows compatibility |    3      3
Test Summary:    | Pass  Total
whitespace nodes |    8      8
Test Summary: | Pass  Total
inlineStr     |   20     20
Tables.jl integration: Test Failed at C:\Users\BC5234\.julia\packages\XLSX\A7wWu\test\runtests.jl:1803
Expression: length(ct) == 0
Evaluated: 3 == 0
Stacktrace:
[1] top-level scope at C:\Users\BC5234\.julia\packages\XLSX\A7wWu\test\runtests.jl:1803
[2] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
[3] top-level scope at C:\Users\BC5234\.julia\packages\XLSX\A7wWu\test\runtests.jl:1771
Test Summary:               | Pass  Fail  Total
Tables.jl integration       |   20     1     21
  Tables.jl with DataFrames |    4            4
ERROR: LoadError: Some tests did not pass: 20 passed, 1 failed, 0 errored, 0 broken.
in expression starting at C:\Users\BC5234\.julia\packages\XLSX\A7wWu\test\runtests.jl:1770
ERROR: Package XLSX errored during testing
Stacktrace:
[1] pkgerror(::String) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Types.jl:52
[2] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Operations.jl:1578
[3] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:327
[4] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:314
[5] #test#61 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:67 [inlined]
[6] test at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:67 [inlined]
[7] #test#60 at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:66 [inlined]
[8] test at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:66 [inlined]
[9] test(::String; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:65
[10] test(::String) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\API.jl:65
[11] top-level scope at none:1

julia>

@codecov

codecov bot commented Jan 7, 2021

Codecov Report

Merging #157 (06d8af7) into master (041a541) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
+ Coverage   94.50%   94.64%   +0.13%     
==========================================
  Files          14       14              
  Lines        1601     1605       +4     
==========================================
+ Hits         1513     1519       +6     
+ Misses         88       86       -2     
Impacted Files Coverage Δ
src/types.jl 100.00% <ø> (ø)
src/sst.jl 95.77% <100.00%> (+0.32%) ⬆️
src/stream.jl 94.87% <100.00%> (+0.18%) ⬆️
src/write.jl 94.33% <100.00%> (-0.02%) ⬇️
src/tables_interface.jl 83.33% <0.00%> (+13.33%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 041a541...927446f.

@felipenoris felipenoris merged commit ec100a8 into felipenoris:master Jan 29, 2021
@felipenoris
Owner

@pascalr0410 Thanks! This is great!
