Shared string table should be faster
use consistent name for benchmark xlsx files
Merge branch 'master' of git://github.com/randym/axlsx
shared string should be faster than non-shared string serialization
What is happening here?
Isn't it better/faster to use real hashes (String#hash or equivalents)? Just appending a few strings seems very inefficient/hackish ;)
I think it's better to include new benchmarks which do not duplicate the rows
@ochko grab me tomorrow please. I need to understand what you are trying to do here.
put only plain string cells in shared string table
Dealing with custom styles in the shared string table adds overhead,
and program-generated sheets are unlikely to contain many custom styles.
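To make the rule in this commit concrete, here is a minimal sketch of the filtering step. It does not reproduce the commit's code: `CellSketch`, its `custom_style` flag, and `plain_string?` are hypothetical names chosen for illustration, and axlsx's real cell class is shaped differently.

```ruby
# Hypothetical cell shape for illustration; not axlsx's actual Cell class.
CellSketch = Struct.new(:value, :custom_style)

# A cell qualifies for the shared string table only when it holds a plain
# string: a String value with no custom styling attached.
def plain_string?(cell)
  cell.value.is_a?(String) && !cell.custom_style
end

cells = [
  CellSketch.new('total', false),
  CellSketch.new('total', true),  # styled, so it stays out of the table
  CellSketch.new(42, false),      # not a string at all
  CellSketch.new('total', false)
]

# Plain string cells go to the shared table; everything else is kept aside
# to be serialized in place.
shared, inline = cells.partition { |c| plain_string?(c) }
```

Only the two unstyled 'total' cells end up in `shared`, so the table never has to compare or store styling information.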
One of my sheets had over 50,000 string cells but only 103 unique strings.
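For readers unfamiliar with the idea, a rough sketch of why a table pays off at that ratio: each unique string is stored once and every cell refers to it by index. `SharedStringTableSketch` is a hypothetical class written for this example, not the one in axlsx.

```ruby
# Hypothetical helper, not axlsx's actual class: maps each unique string
# to the index of its first occurrence.
class SharedStringTableSketch
  attr_reader :unique_strings, :count

  def initialize
    @index = {}          # string => position in unique_strings
    @unique_strings = [] # strings in first-seen order
    @count = 0           # total number of string cells seen
  end

  # Returns the shared index for value, registering it on first sight.
  def add(value)
    @count += 1
    @index[value] ||= begin
      @unique_strings << value
      @unique_strings.size - 1
    end
  end
end

table = SharedStringTableSketch.new
indexes = %w[foo bar foo baz bar foo].map { |s| table.add(s) }
```

Here six cells collapse to three stored strings, and `indexes` holds the per-cell references into `table.unique_strings`.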
Why not use cell.value.hash?
That will be done implicitly when the object is put in a hash.
That's true, but benchmarking shows inserting hashes is faster: https://gist.github.com/2318872
Which is weird :)
Worth knowing. Using the object's hash directly as the key seems to help, maybe because it cuts out one step in the internals of Hash.
Note: I wrote this without checking the actual source code of Hash.
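For anyone who wants to check this locally, a small sketch of such a comparison using Ruby's standard `Benchmark` module. It is not the linked gist; the sample data (103 unique strings repeated across 50,000 cells) is made up to mirror the sheet mentioned earlier, and timings will vary by Ruby version and machine.

```ruby
require 'benchmark'

# Made-up sample data: 103 unique strings repeated across 50,000 cells.
strings = Array.new(50_000) { |i| "value #{i % 103}" }

by_string = {} # keyed by the string itself
by_hash = {}   # keyed by the Integer returned from String#hash

timings = Benchmark.bm(12) do |bm|
  bm.report('string key') { strings.each { |s| by_string[s] ||= by_string.size } }
  bm.report('s.hash key') { strings.each { |s| by_hash[s.hash] ||= by_hash.size } }
end
```

One caveat with the second variant: two different strings can in principle produce the same `hash` value, in which case they would share one table entry, so keying by the string itself is the safer choice when correctness matters more than speed.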
It would be nice if someone could test this partial shared string table on other spreadsheet software and versions.
I tested it on the latest versions of MS Office and OpenOffice on Mac OS X.
Merging this - with the arrogance that we can fix it if anything goes wrong!