[question] Why doesn't zstd preserve the original file name and modification date? #1402

Closed
stokito opened this issue Nov 3, 2018 · 11 comments

@stokito
Contributor

stokito commented Nov 3, 2018

Is there any reason why this wasn't added? Or are there plans to implement it?

@gyscos
Contributor

gyscos commented Nov 4, 2018

zstd itself only compresses a stream of bytes - the file content. To preserve metadata (among other things), something like tar is a great solution (even for a single file).

@stokito
Contributor Author

stokito commented Nov 4, 2018

Yes, we can use tar, but then to get the original file name and the file permissions we need to decompress archive.tar.zst to archive.tar.
I added zstd support to the Engrampa archive manager, but we can't show the real archive content to the user because the file name information is simply absent.
The gzip and lzop tools can optionally store this metadata, so why not support it in zstd?
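
For comparison, the gzip behaviour referred to here can be sketched as follows. This assumes GNU gzip, whose `-N` (`--name`) option saves or restores the original name and timestamp; the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: gzip records the original file name in its header (GNU gzip assumed).
set -e
workdir=$(mktemp -d)          # scratch directory; file names below are hypothetical
cd "$workdir"

printf 'hello\n' > report.txt
gzip -N report.txt            # writes report.txt.gz and stores the name "report.txt"
mv report.txt.gz renamed.gz   # the archive's own name no longer hints at the original

gzip -lN renamed.gz           # lists the stored name without writing any output file
gunzip -N renamed.gz          # restores the file under its stored name
ls
```

After the last step the directory should contain `report.txt` rather than `renamed`, because the name came from the gzip header and not from the archive's file name.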

@gyscos
Contributor

gyscos commented Nov 4, 2018

Note that you can directly extract a .tar.zst by piping zstd and tar, without the need for an intermediate file:

$ tar cf - my_file | zstd > my_file.tar.zst
$ zstdcat my_file.tar.zst | tar xf -

@stokito
Contributor Author

stokito commented Nov 5, 2018

Yes, but the problem is that to get the real file name, its permissions, and its modification date, we have to decompress the whole file.

@gyscos
Contributor

gyscos commented Nov 5, 2018

Tar can read metadata and stop the reader, which will stop zstd after decoding the very beginning of the file:

$ zstdcat my_file.tar.zst | tar tvf -

@terrelln
Contributor

terrelln commented Nov 5, 2018

Zstd does attempt to preserve permissions and modification date, starting with zstd version 1.1.2, released in December 2016. However, it isn't enabled on all platforms.

If the latest zstd version isn't preserving file permissions for you, please tell us about your platform, and how you're invoking zstd, so we can help debug the issue.

@stokito
Contributor Author

stokito commented Nov 6, 2018

Hi @terrelln
Is there any way to see those permissions/date without extracting the archive?
On my Ubuntu 18.10 I have zstd v1.3.5, but zstd -lv doesn't show this information:

$ zstd -lv file.txt.zst 
*** zstd command line interface 64-bits v1.3.7, by Yann Collet ***
file.txt.zst 
# Zstandard Frames: 1
Window Size: 1024.00 KB (1048576 B)
Compressed Size: 687.80 KB (704307 B)
Decompressed Size: 3373.43 KB (3454395 B)
Ratio: 4.9047

@terrelln
Contributor

terrelln commented Nov 6, 2018

@stokito zstd doesn't store the permissions in the format; we simply copy the permissions and modification times from file.txt to file.txt.zst, and from file.txt.zst to file.txt. This is the same way that gzip works.

If you want to store the permissions in the format, you'll have to use a tool like tar.
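
The distinction can be illustrated with a small sketch: the metadata travels from one file to the other on the filesystem, not inside the compressed stream. This is not zstd's actual code. It imitates the effect with `touch -r` and GNU `chmod --reference`, uses `cp` as a stand-in for the compression step, and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: mimic "copy metadata from the source file to the output file".
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'data\n' > file.txt
chmod 640 file.txt
touch -t 201811031200 file.txt            # give the source an old modification time

cp file.txt file.txt.out                  # stand-in for compression; mtime is now "now"
chmod 600 file.txt.out                    # make the modes differ before copying them

touch -r file.txt file.txt.out            # copy the timestamps from the source
chmod --reference=file.txt file.txt.out   # copy the permission bits (GNU chmod)

src_meta=$(stat -c '%a %Y' file.txt)      # GNU stat: octal mode and mtime in seconds
dst_meta=$(stat -c '%a %Y' file.txt.out)
echo "source: $src_meta"
echo "output: $dst_meta"
```

Nothing about the mode or timestamp is written into the output's contents, which is why a listing of the compressed file alone cannot show them.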

@mgorny

mgorny commented Nov 30, 2018

Storing the original filename is not a feature but a design hack that shouldn't be copied. The only reason gzip does that is because it dates back to DOS times with 8.3 names. There you couldn't just append .gz to the existing filename, so you had to do a bigger rename and store the original name somewhere. On modern systems, you can safely append .zst and strip it when decompressing, so there's no need for such ugly hacks anymore.
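
The naming convention described here can be sketched with POSIX parameter expansion. The two helper functions are hypothetical names introduced for illustration, not part of zstd.

```shell
#!/bin/sh
# Sketch: derive file names by appending or stripping the ".zst" suffix.

# Hypothetical helper: name of the compressed file for a given input.
compressed_name() {
    printf '%s.zst\n' "$1"
}

# Hypothetical helper: recover the original name from the compressed one.
original_name() {
    # ${1%.zst} removes a trailing ".zst" if present
    printf '%s\n' "${1%.zst}"
}

compressed_name "notes.txt"       # prints notes.txt.zst
original_name "notes.txt.zst"     # prints notes.txt
```

Since the original name is recoverable from the compressed file's own name, nothing needs to be stored inside the file for this purpose, provided the file is not renamed.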

@cschanzlenist

I notice this has been closed for about 3 years; let me know if I should open a new issue.
Modification times are not being reset to the original file timestamp on Linux (RHEL/CentOS, Fedora).

$ cd /tmp
$ rm foo*
$ cp -a ~/.bashrc foo
$ zstd foo
foo                  : 48.92%   (  8720 =>   4266 bytes, foo.zst)              
$ mv foo{,.orig}
$ unzstd foo.zst 
foo.zst             : 8720 bytes                                               
$ ls -l foo*
-rw-r--r-- 1 testuser div 8720 Oct 19 13:07 foo
-rw-r--r-- 1 testuser div 8720 Sep 22 21:00 foo.orig
-rw-r--r-- 1 testuser div 4266 Oct 19 13:06 foo.zst

[Captain Obvious points out timestamp of foo is the current time (when unzstd ran) and differs from foo.orig.]

Noting that 'touch' uses the utimensat system call, I also see no such calls from unzstd when traced with:
$ strace -e trace=utimensat unzstd foo.zst
Glancing at the source, I see utimensat referenced in UTIL_setFileStat(), but I don't see any function calling it.
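
A check like the one done by eye in the `ls -l` output above could be scripted roughly as follows. This is a sketch assuming GNU `stat`; `same_mtime` is a hypothetical helper, and the two files are created artificially rather than by running zstd.

```shell
#!/bin/sh
# Sketch: compare the modification times of two files (GNU stat assumed).

# Hypothetical helper: succeeds when both files have the same mtime.
same_mtime() {
    # %Y prints the modification time as seconds since the epoch
    [ "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")" ]
}

workdir=$(mktemp -d)
printf 'x\n' > "$workdir/foo.orig"
touch -t 201809222100 "$workdir/foo.orig"   # old timestamp, like the original file
printf 'x\n' > "$workdir/foo"               # fresh file, so its mtime is "now"

if same_mtime "$workdir/foo" "$workdir/foo.orig"; then
    result="mtime preserved"
else
    result="mtime differs"
fi
echo "$result"
```

In a real reproduction, `foo` would be the file written by `unzstd` and `foo.orig` the renamed original.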

@felixhandte
Contributor

@cschanzlenist, yes, this is a bug that was introduced in v1.5.0: #2739. We've landed a fix that will go out in the next release: #2742.

Thanks for the report and sorry for the churn!
