hardlinks pointing to itself should be skipped on extraction #1381

Closed
mathstuf opened this issue May 22, 2020 · 5 comments

@mathstuf

Tarball: https://projects.horms.net/projects/kexec/kexec-tools/kexec-tools-2.0.17.tar.xz
MD5: f72c11e3bd80de23cae144ce8683d96b

The tarball seems to have some file entries out of order or something (tar tf doesn't show any directories?):

openat(AT_FDCWD, "kexec-tools-2.0.17/purgatory/arch/i386/entry32-16.S", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0640) = -1 ENOENT (No such file or directory)
stat("kexec-tools-2.0.17/purgatory/arch/i386", 0x7ffdf8be2110) = -1 ENOENT (No such file or directory)
stat("kexec-tools-2.0.17/purgatory/arch", {st_mode=S_IFDIR|0750, st_size=20, ...}) = 0
mkdir("kexec-tools-2.0.17/purgatory/arch/i386", 0750) = 0

Originally reported to CMake here: https://discourse.cmake.org/t/externalproject-add-fails-to-extract-kexec-tools-2-0-17-tar-xz/1251

@mmatuska
Member

The tar archive contains hardlinks pointing to themselves, which is nonsense. GNU tar seems to simply ignore such entries. I am going to contact the author to find out how this archive was created.
@jsonn @kientzle any opinion on this?

@kientzle
Contributor

First, I'll point out that the system call trace above is perfectly normal. Libarchive optimistically assumes the directory already exists (it usually does). If it can't create the file then it will do the time-consuming check to see if the directory already exists and create it if necessary. The series of calls above is exactly what I would expect to see in that case. (Tar format does not require that directories be separately recorded at all; the dearchiver is expected to create any directories as needed.)
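
For illustration, here is a minimal sketch of the optimistic-create pattern described above, written against plain POSIX calls; it is an assumption about the shape of the logic, not libarchive's actual code:

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Illustration only (not libarchive's code): try the open first,
 * and only on ENOENT walk the path creating missing parents. */
static int
create_file_optimistic(const char *path, mode_t mode)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd >= 0 || errno != ENOENT)
        return (fd);

    char buf[4096];
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *p = strchr(buf + 1, '/'); p != NULL; p = strchr(p + 1, '/')) {
        *p = '\0';      /* temporarily terminate at this component */
        if (mkdir(buf, 0755) != 0 && errno != EEXIST)
            return (-1);
        *p = '/';
    }
    return (open(path, O_WRONLY | O_CREAT | O_EXCL, mode));
}

The real fallback checks which ancestors already exist (the stat calls in the trace above) rather than blindly calling mkdir on each component, but the optimistic-then-fallback structure is the same.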

Hard links pointing to themselves are indeed nonsense. No sane filesystem can represent such an object. I think it would probably make the most sense to skip such entries and warn about them.

I'd also be curious how this archive was created. I can't really think of how this would happen unless the filesystem was being modified during the archiving process in some way that managed to confuse the hardlink tracking logic.

mmatuska changed the title from "out-of-order tarballs cannot be extracted" to "hardlinks pointing to itself should be skipped on extraction" on May 25, 2020
@mmatuska
Member

mmatuska commented May 25, 2020

@kientzle I have had no response from the author and have prepared a patch that detects this early in _archive_write_disk_header().
Should we skip this silently like GNU tar does, or return ARCHIVE_WARN with an error message like "Skipping hardlink pointing to itself: %s"?
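
A minimal sketch of what such a check could look like, using libarchive's public entry accessors (archive_entry_pathname, archive_entry_hardlink); this is an illustration, not the actual patch:

#include <string.h>
#include <archive_entry.h>

/* Sketch only (not the actual patch): an entry is a self-referential
 * hard link when its link target is literally the same string as its
 * own pathname. */
static int
is_self_referential_hardlink(struct archive_entry *entry)
{
    const char *pathname = archive_entry_pathname(entry);
    const char *linkname = archive_entry_hardlink(entry);
    return (pathname != NULL && linkname != NULL &&
        strcmp(pathname, linkname) == 0);
}

On a hit, the extractor could skip the entry and, if we decide against silent skipping, set the message above and return ARCHIVE_WARN.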

@mmatuska
Member

@kientzle @jsonn I can now reproduce the behavior with GNU tar:

$ echo 1234 > file
$ tar -cf test.tar file file
$ tar -tvf test.tar
-rw-rw-r-- root/root             5 2020-05-25 17:03 file
hrw-rw-r-- root/root             0 2020-05-25 17:03 file link to file

The second entry is stored as a hard link to the first, pointing to the same path.

@kientzle
Contributor

Oh. That. Hmmm.... I vaguely recall some discussion about this years ago on the GNU tar mailing list. I think this occurs mostly when someone specifies a directory to be backed up and a list of files to back up as well:

$ mkdir d
$ touch d/a d/b d/c
$ tar -cf test.tar d d/a d/b

The program creating the archive can make this less likely by only generating hard link entries when the on-disk link count is greater than one. Libarchive used to do this. However, I don't think there's any strategy on the archiving side that can completely avoid this. (Consider the above where one or more files are actually hard linked to some totally separate directory.)
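
As a sketch, that archiver-side heuristic amounts to checking the on-disk link count before emitting a hard link entry; illustration only, assuming POSIX lstat(2):

#include <sys/stat.h>

/* Illustration of the heuristic: only treat a file as a hard link
 * candidate when more than one directory entry points at its inode. */
static int
may_be_hardlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return (0);
    return (st.st_nlink > 1);
}

As noted, this only reduces the odds: naming a file twice on the command line when its link count genuinely is greater than one still produces the bad entry.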

Putting in a warning for the case where the link target is literally the exact same string would be good. I suspect that would suffice to address the most common cases of this.

There are probably many ways to generate link entries that cannot be restored because they technically are self-referential. Maybe some of these would do it?

$ tar -cf test.tar file ./file subdir/../file /full/path/to/file ../thisdir/file

It might be quite hard to solve this in any very robust way. I suspect you could achieve it by being very clever about how you handle failure when trying to restore hard links, but I'm not sure it's worth it.
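
As an illustration of why, a purely lexical normalization sketch (it collapses "./" components and doubled slashes) catches some of the variants above but is defeated by "..", absolute paths, and symlinks:

#include <stddef.h>

/* Illustration only: collapse "./" components and "//" runs.
 * Deliberately does not resolve "..", absolute paths, or symlinks. */
static void
normalize_lexically(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    const char *p = in;

    while (*p != '\0' && o + 1 < outlen) {
        if (p[0] == '/' && p[1] == '/')
            p++;                        /* collapse "//" */
        else if (p[0] == '.' && p[1] == '/' &&
            (p == in || p[-1] == '/'))
            p += 2;                     /* drop "./" */
        else
            out[o++] = *p++;
    }
    out[o] = '\0';
}

Under this sketch, "./file" and "file" compare equal, while "subdir/../file" and "/full/path/to/file" do not; resolving those would require consulting the filesystem, which is exactly the kind of cleverness that may not be worth it.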
