
Tiles are not compressed by libzim #47

@benoit74

Description

From openzim/zim-requests#1309 (comment)

Also, I do not know how much of this comes from a different compression algorithm or from deduplication, but I compressed the extracted folder with dwarfs (that is how I usually store it; it is basically squashfs with deduplication), and it went from a 3.3 GiB ZIM -> 3.3 GiB (extracted directory) -> 1.8 GiB dwarfs file (zstd compression level 11). To see whether this was due to deduplication, I also tried tar.zst (zstd's default compression, level 3) and got 2 GiB (1.8 vs 2 GiB might be because zstd was running at a higher compression level in dwarfs). So was the ZIM in this case just not compressed at all? Usually (for example with Wikipedia ZIMs), my compression reduces size by only 10-15% compared to the ZIM, and that is reasonable: ZIM is meant to be used on compute-constrained devices, and a slightly larger file due to a lower compression level is easier to decompress and stream data from.

libzim should indeed try to compress these files. This is weird, or at least unexpected, and probably a bug; to be confirmed.

Regarding the compression level and algorithm used, libzim indeed makes a compromise here. While we could probably debate this again, it is not something we can change at the scraper level. Feel free to open an issue in https://github.com/openzim/libzim if you feel you have sufficiently sound arguments to justify a change; it is not something I'm confident arguing for myself, given my technical limitations (and the implications such a change would have on the whole ecosystem) ^^
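The sizes in the quoted comment can be turned into rough savings ratios. This is only a back-of-the-envelope sketch using the reported figures (rounded GiB values, so the percentages are approximate); the variable names are illustrative, not taken from any tool.

```python
# Sizes reported in the quoted comment, in GiB (rounded values).
ZIM_SIZE = 3.3        # original ZIM file
EXTRACTED_SIZE = 3.3  # extracted directory
DWARFS_SIZE = 1.8     # dwarfs image, zstd level 11
TAR_ZST_SIZE = 2.0    # tar.zst, zstd default level 3


def reduction(original: float, compressed: float) -> float:
    """Return the fractional size reduction of `compressed` relative to `original`."""
    return 1 - compressed / original


for label, size in [("dwarfs (zstd -11)", DWARFS_SIZE), ("tar.zst (zstd -3)", TAR_ZST_SIZE)]:
    print(f"{label}: {reduction(ZIM_SIZE, size):.1%} smaller than the ZIM")
# prints:
# dwarfs (zstd -11): 45.5% smaller than the ZIM
# tar.zst (zstd -3): 39.4% smaller than the ZIM
```

Both figures are well above the 10-15% gain the reporter usually sees on Wikipedia ZIMs, and the extracted directory being about the same size as the ZIM is consistent with the tiles having been stored without compression.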

Metadata

Labels

bug: Something isn't working
