-r option has 5% worse compression than going through .tar #3412

Closed
mirh opened this issue Jan 6, 2023 · 8 comments

mirh commented Jan 6, 2023

Describe the bug
Title says it all.
I was trying to compress 7 GB of textures with zstd --ultra -22 --long=31 -r folder, in case that matters.

Expected behavior
I don't expect zstd to become an archiver, but once it gets into the whole "concatenation" business, it shouldn't lose to plain tar.

Desktop (please complete the following information):

  • OS: Windows 10
  • Version 1.5.2 and latest git compiled today
  • Compiler MSVC 2022

Additional context
I guess I could provide my source data too, though it would be a somewhat convoluted process.

Contributor

Cyan4973 commented Jan 6, 2023

-r is very different from tar: it produces one independent compressed file for each source file.

In contrast, tar produces a single stream, which lets the compressor exploit correlations between files. The trade-off is that a single file cannot be decompressed on its own: the entire tar stream must be decoded to reach it.

These are very different modes of operation, each with its own use case, though arguably tar is the more common one on POSIX systems.
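
To make the difference concrete, here is a minimal sketch. It is not zstd itself: it uses Python's standard gzip module as a stand-in compressor, and the "files" are made-up byte strings that deliberately share a block of content. Under those assumptions, compressing each file on its own cannot reuse what the files have in common, while compressing the joined data can.

```python
import gzip
import random

# Hypothetical stand-ins for the files in a folder: each one carries the same
# 4 KiB block (incompressible on its own) plus a short unique tail.
rng = random.Random(0)
shared = bytes(rng.getrandbits(8) for _ in range(4096))
files = [shared + b"unique tail %d\n" % i for i in range(8)]

# "-r"-like mode: every file is compressed independently.
independent_size = sum(len(gzip.compress(f)) for f in files)

# "tar"-like mode: the files are joined first, then compressed as one stream,
# so content repeated across files can be found by the compressor.
solid_size = len(gzip.compress(b"".join(files)))

print("independent:  ", independent_size, "bytes")
print("single stream:", solid_size, "bytes")
```

How large the gap is on real data depends on how much the files actually have in common and on the compressor's window size, so the figures printed here say nothing about the 5% reported above.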

Author

mirh commented Jan 6, 2023

Oh, I see.
It's a bit like the difference between non-solid and solid mode in 7z.

Well, if it's by design I wished that was clearer though?

Contributor

terrelln commented Jan 6, 2023

Well, if it's by design I wished that was clearer though?

A few versions back we added a warning message if you try to use -r -o output.zst:

zstd: WARNING: all input files will be processed and concatenated into a single output file: output.zst
The concatenated output CANNOT regenerate the original directory tree.

Does that help clarify?

Author

mirh commented Jan 6, 2023

Not at all? It just says that the directory structure is going to be lost.
In fact, to the best of my English proficiency, I would argue that "concatenate" very remotely suggests something akin to tape archival.

Contributor

terrelln commented Jan 6, 2023

How about:

zstd: WARNING: all input files will be compressed independently then concatenated into a single output file: output.zst
The concatenated output CANNOT regenerate the original directory tree.

If that still isn't clear to you, I'd love to hear another suggestion!
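
As a rough illustration of what the warning describes, the sketch below compresses two files independently and appends the results to one output. It uses Python's standard gzip module as a stand-in for zstd, and the paths and contents are hypothetical. Decompressing the combined output gives back only the joined contents, with no paths and no file boundaries.

```python
import gzip

# Hypothetical directory contents: relative path -> file bytes.
tree = {
    "textures/a.dds": b"AAAA",
    "textures/sub/b.dds": b"BBBB",
}

# Compress each file independently, then append the results to one output.
output = b"".join(gzip.compress(data) for data in tree.values())

# The combined output decompresses to the joined file contents only:
# nothing in it records the paths or where one file ends and the next begins.
restored = gzip.decompress(output)
print(restored)  # b'AAAABBBB'
```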

Author

mirh commented Jan 9, 2023

Mhh, that's already better, and it goes straight to the point of the implementation detail.
Though I would guess users would also like the other effect to be spelled out directly.

zstd: all input files will be compressed independently then concatenated into a single output file: output.zst
WARNING: this will NOT retain the original directory tree, and it will result in a higher final size than compressing later the already concatenated files [or compressing the files concatenated beforehand]

terrelln self-assigned this Jan 17, 2023

sedlund commented Feb 16, 2023

zstd: WARNING: each file will be separately compressed and then combined into a single archive, without utilizing patterns between files to reduce final size or preserving original directory structure.

Author

mirh commented May 27, 2023

I couldn't see this noted/changed anywhere tbh.
