
Change compression from LZMA to ZStandard #17

Closed
juarezr opened this issue Oct 29, 2018 · 8 comments

Comments

@juarezr
Contributor

juarezr commented Oct 29, 2018

ZStandard has compression ratios similar to LZMA's and better decompression time:

https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/

https://commons.apache.org/proper/commons-compress/

@RomanIakovlev
Owner

Hi, I've had a quick look at this issue, and it seems the change is a bit more involved than it looks on the surface. ZStandard is a compression algorithm, not an archival format, that is, it can't organize multiple files into a single archive. The approach used now, 7Zip, supports both archival and compression (to be precise, it supports a collection of different compression formats, but not ZStandard).

Now, to use the ZStandard compression method, we need an alternative archival method, probably tar or something similar, to first put all the individual GeoJSON features into a single file and then compress that file.

It's all definitely doable, and it might benefit the Timeshape project if the artifact size doesn't increase substantially and decompression time goes down, so let's do it! I obviously can't promise any timelines here, but I'll gladly accept a pull request if you'd like to give it a try. If you need guidance on the project setup or any other details beyond what's already in the README, please ask here, and I'll do my best to clarify.
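The "archive first, then compress" approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Timeshape's actual code: the JDK ships no ZStandard codec, so `GZIPOutputStream` stands in for the compressor, and a hypothetical length-prefixed record format (`BundleThenCompress`, `pack`, `unpack` are all illustrative names) stands in for tar. With Apache Commons Compress (linked in the issue), a real version would presumably use `TarArchiveOutputStream` for the archive layer and `ZstdCompressorOutputStream` (backed by zstd-jni) for the compression layer instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BundleThenCompress {

    // Archival step + compression step: every (name, content) entry is written
    // into ONE stream, and that single stream is compressed as a whole.
    // Hypothetical record format (stand-in for tar): [UTF name][int length][bytes].
    // GZIP is a stand-in for ZStandard, because the JDK has no zstd codec.
    static byte[] pack(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
        } // closing the stream finishes the GZIP trailer
        return buffer.toByteArray();
    }

    // Reverse order: decompress the single stream, then split it back into entries.
    static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(packed)))) {
            while (true) {
                String name;
                try {
                    name = in.readUTF();
                } catch (EOFException end) {
                    break; // no more entries in the archive
                }
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                entries.put(name, content);
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical GeoJSON features keyed by file name.
        Map<String, byte[]> features = new LinkedHashMap<>();
        features.put("Europe/Berlin.geojson",
                "{\"type\":\"Feature\"}".getBytes(StandardCharsets.UTF_8));
        features.put("America/Sao_Paulo.geojson",
                "{\"type\":\"Feature\"}".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> restored = unpack(pack(features));
        System.out.println(restored.keySet());
    }
}
```

Compressing one concatenated stream rather than each feature separately also lets the compressor exploit redundancy shared across features, which is one plausible reason the single-archive layout compresses well.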

@juarezr
Contributor Author

juarezr commented Oct 31, 2018

I hadn't read the Timeshape source code and didn't know about that.
I had assumed that the GeoJSON was compressed with LZMA first and then stored without compression in a folder inside the library JAR/ZIP file.
But now I found that my assumption was false. :)

@RomanIakovlev
Owner

Ok, I've tried using ZStandard and I must say I'm positively surprised by its performance! I was expecting it to have a better decompression time, but in practice it also offers better compression time (mostly irrelevant for Timeshape users) and compression rate (5% better). Decompression time, at least on my laptop, has dropped significantly, and now TimeZoneEngine#initialize method takes only 1300 milliseconds for Tar+ZStandard, versus 4300 milliseconds for 7z+LZMA4. This new approach is definitely worth using, so I'll make a release with it.

RomanIakovlev added a commit that referenced this issue Nov 9, 2018
Changes archival and compression methods, setup Travis build and artifact publishing.
@RomanIakovlev
Owner

I've just released 2018d.6 that uses ZStandard for an increased compression rate and faster speed. Could you please give it a try and close this issue if all is fine on your side?

@RomanIakovlev
Owner

Ok, I'll go ahead and close this one now.

@juarezr
Contributor Author

juarezr commented Dec 11, 2018

I will test the new version to check the improvements.
Thanks for the hard work.

@juarezr
Contributor Author

juarezr commented Dec 14, 2018

It worked fine. 👍
Wondering whether replacing Protobuf with Cap'n Proto could improve query performance, and whether the performance/memory tradeoffs would be worth it.

@RomanIakovlev
Owner

Interesting regarding Cap'n Proto, I've created #29 to track this.
