decompressing individual .gz files in a directory (compression without tar envelope) #5
Comments
Clarification: The ultimate feature would be something like the venerable https://www.freshports.org/sysutils/fusefs-gunzip/ (which disappeared from the face of the Earth).
The gzip file format can optionally contain the original file name (look for "FNAME" in the https://datatracker.ietf.org/doc/html/rfc1952 gzip specification). If a .gz file has it, fuse-archive will use it. Otherwise libarchive will fall back to "data" as a default: https://github.com/libarchive/libarchive/blob/6d56dfd6ef13625561da83c605d2a12cb146088c/libarchive/archive_read_support_format_raw.c#L119 Maybe try
It is intentional to decompress the whole thing. Not to see if another gzip stream might follow, but to determine the decompressed size (which fuse-archive would need to show if we did a
I'm not super-excited about adding that option. For your original use case, I'd probably use symlinks.
Sorry, but the answers are mostly "no".
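To illustrate why the listing needs a full decompression pass: the gzip header does not record the decompressed size, and the ISIZE trailer field defined in RFC 1952 only holds the size of the last member modulo 2^32, so it cannot be trusted for multi-member files or for files above 4 GiB. The following is a minimal sketch in Python using the standard `gzip` module, not fuse-archive's or libarchive's actual code; the function names `decompressed_size` and `trailer_isize` are made up for this example.

```python
import gzip
import io
import struct


def decompressed_size(stream, chunk_size=1 << 20):
    """Count decompressed bytes by streaming through every gzip member."""
    total = 0
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        while chunk := gz.read(chunk_size):
            total += len(chunk)
    return total


def trailer_isize(raw):
    """Return ISIZE: the last 4 bytes of the file, little-endian.

    This is the size of the LAST member only, modulo 2**32.
    """
    (isize,) = struct.unpack("<I", raw[-4:])
    return isize


if __name__ == "__main__":
    # Two concatenated members form one valid gzip file.
    raw = gzip.compress(b"a" * 1000) + gzip.compress(b"b" * 500)
    print("streamed size:", decompressed_size(io.BytesIO(raw)))
    print("trailer ISIZE:", trailer_isize(raw))
```

The two numbers differ for the concatenated input, which is why a cheap trailer lookup is not a safe substitute for reading the whole stream.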
Thank you for your time, it all makes sense. Have a great day!
Hi @pspacek. Maybe you can use I hope that I could help you.
Hello,
This is more a question than an issue. First, let me describe my use case:
I have a directory full of .gz files with vastly differing sizes - from 159 bytes to 4.4 GiB per file:
The use case would be to "mount" the source directory and then transparently decompress the .gz files in it, so I can run a utility on them which requires seeking in the files (and thus cannot simply use gzip output piped in via stdin).
This is not supported (I did not expect it to work, but was curious about error handling :-)):
To my surprise it somewhat works with a single .gz file:
File sizes & also md5 sums are all okay:
Okay cool, so at first glance it seems I should use your software as it is: just loop through the list of files and mount each file separately. A bunch of symlinks would then solve naming etc.
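The loop described above could look roughly like the following Python sketch. It rests on assumptions: that `fuse-archive` is on `PATH` and is invoked as `fuse-archive ARCHIVE MOUNTPOINT`, and that each mounted .gz file exposes exactly one entry. The helper names `plan_mounts` and `mount_all` and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_mounts(source_dir, mount_root):
    """Return (archive, mount_point, link_name) for each .gz file in source_dir."""
    plan = []
    for archive in sorted(Path(source_dir).glob("*.gz")):
        link_name = archive.name[: -len(".gz")]  # net.txt.gz -> net.txt
        plan.append((archive, Path(mount_root) / archive.name, link_name))
    return plan


def mount_all(source_dir, mount_root, link_dir):
    """Mount every .gz file separately and symlink its single entry."""
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    for archive, mount_point, link_name in plan_mounts(source_dir, mount_root):
        mount_point.mkdir(parents=True, exist_ok=True)
        # Assumption: the CLI takes the archive path, then the mount point.
        subprocess.run(["fuse-archive", str(archive), str(mount_point)], check=True)
        # Assumption: the mount holds one entry (the stored name or "data").
        # The unpacking raises ValueError if that does not hold.
        (entry,) = mount_point.iterdir()
        (link_dir / link_name).symlink_to(entry.resolve())
```

Keeping the planning step separate from the mounting step makes it possible to inspect the resulting names before anything is mounted. Unmounting is left out of the sketch.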
Nevertheless, I have a couple of questions for you:
gzip just does not output it?): Another randomly selected .gz file contains a file named "data", which is unrelated to the original file name net.txt.gz.
The ls operation on the mount takes ages, a time comparable to decompressing the whole archive, presumably because it looks for the end of the first gzip stream to see if another gzip stream might follow. Is this intentional? Would you be willing to add an option to treat .gz files as one-item archives and thus make the initial listing fast?
Would you be interested in an option which uses the original file name without the .gz suffix for names in the mount?
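Whether a particular .gz file stores its original name can be checked by parsing the header directly. Below is a minimal sketch, assuming the RFC 1952 layout (a 10-byte fixed header, then the optional FEXTRA field, then a zero-terminated ISO 8859-1 FNAME when flag bit 3 is set). The function name `read_gzip_fname` is made up for this example, and the code is independent of how fuse-archive or libarchive read the field.

```python
import gzip
import io
import struct

FEXTRA = 0x04  # FLG bit 2: an extra field follows the fixed header
FNAME = 0x08   # FLG bit 3: a zero-terminated original file name follows


def read_gzip_fname(stream):
    """Return the FNAME field of the first gzip member, or None if absent."""
    header = stream.read(10)
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = header[3]
    if flags & FEXTRA:
        # Skip the extra field: 2-byte little-endian length, then the data.
        (xlen,) = struct.unpack("<H", stream.read(2))
        stream.read(xlen)
    if not flags & FNAME:
        return None
    name = bytearray()
    while (byte := stream.read(1)) not in (b"", b"\x00"):
        name += byte
    return name.decode("latin-1")


if __name__ == "__main__":
    named = io.BytesIO()
    with gzip.GzipFile(filename="net.txt", mode="wb", fileobj=named) as f:
        f.write(b"hello")
    print(read_gzip_fname(io.BytesIO(named.getvalue())))
    # gzip.compress() writes a header without a file name.
    print(read_gzip_fname(io.BytesIO(gzip.compress(b"hello"))))
```

A file for which this returns `None` has no stored name to recover.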
And finally, assuming the answers above were mostly "yes", are you interested in a more complete feature request description for mounting directories and transparently decompressing the files in them? I could write it down if it is not a waste of time.
Thank you very much for your work on this project!
BTW, you did impressive work on decompression speed (or selection of decompressor): this mount hack decompresses the file about 4x faster than stock gzip and pigz!