Add support for storing symlinks in tar and zip archives #92

Merged
merged 6 commits into from
Dec 4, 2018

Conversation

jandubois
Contributor

Also implement extraction of symlinks from zip archives.

This PR also adds a relative symlink in the testdata directory. It passes all tests on OS X, but has not been tested on Windows.

This PR is a superset of changes from the following issues and PRs:

Fixes #21
Fixes #31
Fixes #60
Fixes #74

@mholt
Owner

mholt commented Oct 2, 2018

Nice! Looking forward to going through this, unless another collaborator beats me to it.

@johnarok
Contributor

Adding my notes from testing using the archiver cli

Windows to Windows, Zip format -> Symlinks are not preserved
Windows to Linux, Zip format -> Symlinks are not preserved
Windows to Windows, Tar format -> Symlinks are preserved (breaks symlink with absolute path)
Windows to Linux, Tar format -> Symlinks are preserved (breaks symlink with absolute path)

lrwxrwxrwx 1 user01 user01 29 Oct 23 20:42 hello-link -> 'testdata\hello-dir\hello-link'

Linux to Linux, Zip format -> Symlinks are preserved (retains relative path)
Linux to Windows, Zip format -> Symlinks are not preserved
Linux to Linux, Tar format -> Symlinks are preserved (retains relative path)
Linux to Windows, Tar format -> Symlinks are preserved (retains relative path)

@jandubois
Contributor Author

Thanks for your comprehensive tests, @johnarok!

I'm travelling right now and only have access to a Macbook for the next 2 weeks, so can't do any Windows testing myself. I assume some of the failures are due to not normalizing the symlink paths to always use forward slashes for the path separator. I added a commit that calls filepath.ToSlash on all symlinks before storing them into an archive.
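
As a rough illustration of that normalization, the sketch below reads a link target with os.Readlink and passes it through filepath.ToSlash before it would be stored. The helper names readNormalizedLink and demo are hypothetical and not taken from the PR. Note that filepath.ToSlash only rewrites the separator of the platform it runs on, so it changes nothing on Linux or OS X.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// readNormalizedLink is a hypothetical helper (not from the PR): it reads the
// target of the symlink at path and converts the platform's path separators
// to forward slashes, the form one would want inside an archive.
func readNormalizedLink(path string) (string, error) {
	target, err := os.Readlink(path)
	if err != nil {
		return "", err
	}
	return filepath.ToSlash(target), nil
}

// demo creates a relative symlink in a temporary directory and returns its
// normalized target. Creating the link may fail on Windows without the
// required privilege.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "proverb3.txt")
	target := filepath.Join("proverbs", "extra", "proverb3.txt")
	if err := os.Symlink(target, link); err != nil {
		return "", err
	}
	return readNormalizedLink(link)
}

func main() {
	target, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target)
}
```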

If you have the time, could you redo your tests with the updated PR?

I'm somewhat surprised that the appveyor tests didn't see the windows-to-windows failures you were seeing; did the tests pass for you on Windows? That would mean that the tests are not comprehensive enough. The only explanation I could see is if git didn't re-create the symlinks in the testdata directory.

@johnarok
Contributor

Hi @jandubois! The Windows tests do indeed pass, because a git clone on Windows (git version 2.16.2.windows.1 on Windows 10) creates the symlinks as normal files. I am hoping that by the end of this PR we can add a test case to work around this.

I retested by manually creating the symbolic link for the proverb3.txt file.

mklink proverb3.txt .\proverbs\extra\proverb3.txt

Below is what I observed this time

Windows to Windows, Zip format -> Symlinks are not preserved (they appear as normal files)
Windows to Linux, Zip format -> Symlinks are not preserved, Archiver CLI fails.

./archiver open testdata-windows.zip 
/home/user01/Projects/go/gowork/src/github.com/mholt/archiver/test/testdata/proverb3.txt: making symbolic link for: symlink  /home/user01/Projects/go/gowork/src/github.com/mholt/archiver/test/testdata/proverb3.txt: no such file or directory

Windows to Windows, Tar format -> Symlinks are preserved (breaks symlink with incorrect path)

10/24/2018  10:46 PM    <SYMLINK>      proverb3.txt [testdata\proverb3.txt]

Windows to Linux, Tar format -> Symlinks are preserved (breaks symlink with incorrect path)

lrwxrwxrwx 1 user01 user01   21 Oct 24 22:53 proverb3.txt -> 'testdata\proverb3.txt'

Linux to Linux, Zip format -> Symlinks are preserved (retains relative path)
Linux to Windows, Zip format -> Symlinks are not preserved (they appear as normal files)
Linux to Linux, Tar format -> Symlinks are preserved (retains relative path)
Linux to Windows, Tar format -> Symlinks are preserved (retains relative path)

@jandubois
Contributor Author

Hi @johnarok! I think I will need access to my Windows VMs to investigate this further, so there won't be any further activity from me for the next 7-10 days. I'll get back to you once I actually have run this on Windows myself.

Just for clarification, what are the expectations for older Windows versions that don't have proper symlink support? Or for versions that require admin permissions to create symlinks when the tests are being run as a normal user?

@johnarok
Contributor

johnarok commented Nov 4, 2018

Hi @jandubois, I did some cursory research on the state of symlinks for windows in other projects, links here.

What are the expectations for older Windows versions that don't have proper symlink support?
IMO, we should fail safe: error out if we cannot create the symlink, and potentially let the user explicitly opt in, via a flag or environment variable, to a fallback that restores the links as normal files. /cc @mholt for his opinion on this. This would differ from what git does, which is to fall back by default.

Or for versions that require admin permissions to create symlinks when the tests are being run as a normal user?
IMO, the tests are run as administrator in appveyor, so symlinks should work, but we need to add a step to appveyor to set git config core.symlinks=true so that the test data is cloned correctly, i.e., with the links created. I was able to get this behavior on Windows 10.

Also, retested it with 1.11.2 go version and the results did not change much. I have uploaded the artifacts here for reference.

@mholt
Owner

mholt commented Nov 7, 2018

:-/ Well, this is complicated, isn't it? Thanks for the work, both of you!

I'd like to see this get to a state where it can be merged, but my PR #99 will have to come first, and it does not address the symlink issues, which means that this PR will have to be rebased -- fortunately, I think this PR should be easier to rebase.

FWIW the new code in #99 will allow the user to decide whether they want to continue on errors, so it's OK if there are errors, it doesn't have to stop the operation.

As for older versions of Windows -- I say drop support. I'm not thrilled with supporting old OSes. Especially when it makes things simpler.

So, I might be a little hands-off on this PR -- but don't let me slow you down. Go ahead and both of you do what you think is best and let's merge it in after my rewrite is finalized. In the meantime, you might want to rebase this fork off my rewrite branch...

@mholt
Owner

mholt commented Nov 9, 2018

Okay, the rewrite is done! If you want to update this PR, go for it.

Also implement extraction of symlinks from zip archives.
@jandubois
Contributor Author

@mholt I've updated this PR on top of latest master, but haven't gotten around to looking into the Windows issues yet.

@mholt
Owner

mholt commented Nov 20, 2018

Sounds good. Keep us posted! :)

@jandubois
Contributor Author

@johnarok I cannot reproduce your test failures. I'm also somewhat confused that your re-testing with the updated PR still showed symlinks with backslashes in them; that should have been fixed by that time.

Anyways, with the current state of the PR I used a Windows 10 x64 VM with the latest Windows updates, and I've enabled "Developer Mode" to be able to create symlinks without getting UAC prompts.

I installed these 2 packages to get git and go:

https://github.com/git-for-windows/git/releases/download/v2.19.1.windows.1/Git-2.19.1-64-bit.exe
https://dl.google.com/go/go1.11.2.windows-amd64.msi

I enabled git config core.symlinks true and switched to my PR branch. The directory has the expected symlinks:

 Directory of C:\Users\Jan\go\src\github.com\jandubois\archiver\testdata

11/20/2018  12:47 PM    <DIR>          .
11/20/2018  12:47 PM    <DIR>          ..
11/20/2018  12:41 PM             8,944 already-compressed.jpg
11/20/2018  12:47 PM    <SYMLINK>      exist [C:\target\does\not\exist]
11/20/2018  12:46 PM    <SYMLINK>      proverb3.txt [proverbs\extra\proverb3.txt]
11/20/2018  12:41 PM    <DIR>          proverbs
11/20/2018  12:41 PM                59 quote1.txt

All tests pass (I've temporarily patched cmd/arc/main.go to use jandubois/archiver for testing):

C:\Users\Jan\go\src\github.com\jandubois\archiver>go test
PASS
ok      github.com/jandubois/archiver   0.840s

I created 2 test archives:

arc archive foo.tar.gz testdata
arc archive foo.zip testdata

Both archives could be unpacked with arc again, both on Windows and on OS X, with the symlinks intact.

I could also unpack them on OS X with tar xfvz and unzip, again with the symlinks intact.

What did not work was unpacking them on Windows using the corresponding MinGW tools. I assume they don't support symlinks, or at least not by default. I believe their limitation is irrelevant to this PR.

I've also created the same tarball and zip files using arc on OS X and successfully unpacked them on Windows with the symlinks intact.

To me this means that the code is supporting symlinks on Windows as intended.

@johnarok Please provide more detailed instructions to reproduce your failures, if you believe they still need to be addressed!

The only open question in my mind is if the drive specified should be stripped from a full path symlink, i.e. store /target/does/not/exist instead of C:/target/does/not/exist. I feel that stripping the volume id is the right thing to do, but can't really provide a strong rationale for it.
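
To make the open question concrete, here is a minimal sketch of what stripping the drive from an absolute link target could look like. stripDrive is a hypothetical helper and nothing in this thread confirms that the PR does this. On Windows, filepath.VolumeName would identify the prefix, but it returns an empty string on other platforms, so the sketch checks for a drive letter by hand to behave the same everywhere.

```go
package main

import "fmt"

// stripDrive is a hypothetical helper (not from the PR): it removes a leading
// Windows drive specifier such as "C:" from a slash-normalized link target
// and leaves every other target unchanged.
func stripDrive(target string) string {
	if len(target) >= 2 && target[1] == ':' {
		c := target[0]
		if (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			return target[2:]
		}
	}
	return target
}

func main() {
	fmt.Println(stripDrive("C:/target/does/not/exist"))
	fmt.Println(stripDrive("proverbs/extra/proverb3.txt"))
}
```

UNC paths and other volume forms are deliberately ignored here; a real implementation would have to decide what to do with them.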

Unrelated observation:

When using unzip foo.zip on OS X, using the zip file created on Windows, the proverbs/ subdirectory has the wrong permissions (world writable):

drwxrwxrwx  5 jan  staff   160 20 Nov 12:41 proverbs

This does not happen with the zip file created on OS X, and doesn't happen with either zip file when using arc unarchive:

drwxr-xr-x  5 jan  staff   160 30 Sep 11:24 proverbs

I haven't looked further into it, as it is completely unrelated to the symlink code. It should probably get its own Github issue, but maybe you can double-check this when verifying this latest PR version?

@jandubois
Contributor Author

Out of curiosity I just disabled "Developer Mode" and ran the tests again:

C:\Users\Jan\go\src\github.com\jandubois\archiver>go test
--- FAIL: TestArchiveUnarchive (0.02s)
    archiver_test.go:280: [zip] extracting archive [C:\Users\Jan\AppData\Local\Temp\archiver_test364499367\archiver_test.zip -> C:\Users\Jan\AppData\Local\Temp\archiver_test364499367\extraction_test_zip]: didn't expect an error, but got: reading file in zip archive: C:\Users\Jan\AppData\Local\Temp\archiver_test364499367\extraction_test_zip\testdata\exist: making symbolic link for: symlink C:\target\does\not\exist C:\Users\Jan\AppData\Local\Temp\archiver_test364499367\extraction_test_zip\testdata\exist: A required privilege is not held by the client.
FAIL
exit status 1
FAIL    github.com/jandubois/archiver   0.572s

That seems fine to me as well...

@johnarok
Contributor

@jandubois, thank you for validating, I did not use the developer mode. I will pull the latest and test again to confirm - hopefully next week after the holidays.

@mholt
Owner

mholt commented Nov 25, 2018

Something to consider: should how symlinks are handled be configurable? It's possible now. Maybe some users want to preserve them, while others want to follow/dereference them...

@johnarok
Contributor

johnarok commented Dec 3, 2018

@jandubois I have to admit, testing with the latest pull works perfectly for all combinations, both with dep and go mod. Could you please add the snippet below at the beginning of the Appveyor.yml file?

# set git config to clone with symlinks enabled on windows
init:
  - git config --global core.symlinks true

As for preferences, I prefer the default to be the current behavior: fail when proper access is not available to create symlinks. Following (dereferencing) could be the modifier. Thoughts?

@jandubois
Contributor Author

@johnarok I've added the appveyor config you requested. Tests are still passing, but I can't tell if appveyor actually does support symlinks or not (they may be running a version of git that ignores that option).

My thoughts on making symlink support configurable:

  • For extraction I don't see any need for configuration: if the archive contains a symlink, it should be extracted as a symlink. If this isn't possible, then extraction should fail. The option to skip errors should be enough to deal with this.

  • For creating archives, having an option to follow symlinks vs. storing them verbatim makes sense. The question then becomes: "What is the default behavior?"
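
The choice between following a symlink and storing it verbatim comes down to which stat call supplies the file info when the archive is created. The sketch below is only an illustration of that distinction; statForArchive and its followSymlinks parameter are hypothetical and not part of this PR or of the archiver API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// statForArchive is a hypothetical helper (not from the PR). With
// followSymlinks set it describes the file the link points at (os.Stat);
// otherwise it describes the link itself (os.Lstat).
func statForArchive(path string, followSymlinks bool) (os.FileInfo, error) {
	if followSymlinks {
		return os.Stat(path)
	}
	return os.Lstat(path)
}

// demo creates a file and a symlink to it, then reports whether each mode
// sees a symlink: first when storing verbatim, then when following.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "follow-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "quote1.txt"), []byte("hello"), 0o644); err != nil {
		return false, false, err
	}
	link := filepath.Join(dir, "link.txt")
	if err := os.Symlink("quote1.txt", link); err != nil {
		return false, false, err
	}

	stored, err := statForArchive(link, false)
	if err != nil {
		return false, false, err
	}
	followed, err := statForArchive(link, true)
	if err != nil {
		return false, false, err
	}
	return stored.Mode()&os.ModeSymlink != 0, followed.Mode()&os.ModeSymlink != 0, nil
}

func main() {
	storedIsLink, followedIsLink, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("verbatim sees symlink:", storedIsLink)
	fmt.Println("follow sees symlink:", followedIsLink)
}
```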

I agree with @johnarok that the current behavior implemented by this PR is the desirable default. Only concern is that previously creating ZIP archives would have followed symlinks (I think, I haven't tested), and maintaining backwards compatibility is a good thing.

For TAR archives the code before this PR used to store the full path to the original file as a symlink, which is totally broken and not worth preserving.

My preferred way forward would be to switch the version to 4.0 and change the default mode for ZIP archive creation.

Adding an option for following symlinks is independent functionality and should be requested and/or implemented in a separate issue/pr.

@mholt
Owner

mholt commented Dec 4, 2018

@jandubois I think I agree with almost all of what you've proposed!

The archiver.go file has two functions which I kept from the old versions:

As you can see there, the only references to those functions are in writing tar archives. I'm not sure if they're handled correctly, I haven't tested them with the v3 code that we currently have.

In short, I'd love to see this package brought up to speed with a standard/consistent way of handling symlinks (and hard links maybe? doesn't have to be in this PR) for Zip and Tar types (and extracting them from Rar, I suppose).

So far, I think we've decided:

  • Handling of symlinks should be configurable. This is now possible as a struct field on the relevant types: Zip and Tar, mostly. It would only apply to creation?

  • Symlinks contained in archives should be extracted as symlinks, at least as the default behavior. Makes sense to me.

Not too concerned about backwards compatibility. I think we pretty much broke symlink support with v3, so I don't think that skipping or adding it would be breaking either way. We might be able to keep it in the v3 tree.

So, with that said, how should we proceed on this PR? What else needs to be done?

@jandubois
Contributor Author

This PR uses writeNewSymbolicLink for both TAR and ZIP files consistently, so I think that is already "up to speed".

I'm not sure if preserving hard links is an important goal. It is not supported by ZIP at all, afaik.

Even with tarballs it is a problematic feature, e.g. you cannot selectively extract a later hardlink unless you have extracted the file under the first hardlink name already. Hardlink support may also be platform specific; I don't know if there is a module that provides a device/inode view of Windows filesystems.

So to me it feels somewhat at odds with the project value of

It is not meant to be a replacement for specific archive format tools like tar, zip, etc. that have lots of features and customizability.

Either way, supporting hardlinks would be completely different from softlinks, so should be a separate issue. Personally I feel it is not worth the effort.

So, with that said, how should we proceed on this PR? What else needs to be done?

If you are fine with making the current behavior the default, then I think this PR should be ready for merging, and a new Github issue should be opened to add an option to dereference symbolic links while creating archives.

I'm personally not interested in that feature, so probably wouldn't want to implement it: the change itself is rather straight-forward, but I'm not sure how you would want to re-arrange the tests to accommodate testing different option settings. Once the test framework for this is in place, I would be happy to add the conditionals to the actual code itself though. 😄

@mholt
Owner

mholt commented Dec 4, 2018

Excellent. You're right, hard links are probably not a priority at all, and maybe should even be removed entirely from the code. As for this PR currently, I will begin a review now.

Just to be sure I understand clearly going into this: this PR archives symlinks as symlinks, and does not follow them. (Right?)

Owner

@mholt mholt left a comment

Just a couple initial questions from my high-level pass through the code. Tests look good. Most things look good, I just have a couple questions about code organization and understanding symlink support in zip files.

zip.go Outdated
@@ -186,6 +186,15 @@ func (z *Zip) extractNext(to string) error {
	if !ok {
		return fmt.Errorf("expected header to be zip.FileHeader but was %T", f.Header)
	}
	if (header.FileInfo().Mode() & os.ModeSymlink) != 0 {
		buffer := make([]byte, header.FileInfo().Size())
Owner

This seems to be very different from how it works in tar archives; while I know that this PR writes the target filepath as the symlink's contents, is it safe to assume that every zip archiver will do that? Is there risk here of reading in a potentially huge file?

Contributor Author

I'm not 100% sure; I simply went by the symlink code in archive/zip test: https://github.com/golang/go/blob/b0a53d2/src/archive/zip/writer_test.go#L54-L59

I believe this is what Info-ZIP refers to as the "generic symlink support" in the later versions. Earlier versions (and PKZIP) had platform specific extensions for symlinks (so symlinks archived on one platform could not be extracted on another one), but I think those have been abandoned.

I'm not sure if this is necessary, but I can add a check for some reasonable maximum path length, if you prefer. If you want that, what should the limit be? 2^16 seems to be way beyond what I would expect to see for real pathnames, but small enough not to cause memory issues...
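
For reference, a minimal sketch of the convention discussed here: the symlink entry is written with os.ModeSymlink set and the link target as its contents, and the reader caps how much it will read back. The 2^16 limit is the value floated above, not something the PR enforces, and the names writeSymlinkZip, readLinkTarget, roundTrip and maxLinkTargetLen are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
)

// maxLinkTargetLen is an assumed cap on the link target size.
const maxLinkTargetLen = 1 << 16

// writeSymlinkZip builds an in-memory zip with one symlink entry whose
// contents are the link target.
func writeSymlinkZip(name, target string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	hdr := &zip.FileHeader{Name: name}
	hdr.SetMode(os.ModeSymlink | 0o755)
	fw, err := w.CreateHeader(hdr)
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write([]byte(target)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readLinkTarget returns the target stored in a symlink entry and refuses
// to read more than maxLinkTargetLen bytes.
func readLinkTarget(f *zip.File) (string, error) {
	if f.Mode()&os.ModeSymlink == 0 {
		return "", fmt.Errorf("%s is not a symlink", f.Name)
	}
	rc, err := f.Open()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(io.LimitReader(rc, maxLinkTargetLen+1))
	if err != nil {
		return "", err
	}
	if len(data) > maxLinkTargetLen {
		return "", fmt.Errorf("link target of %s exceeds %d bytes", f.Name, maxLinkTargetLen)
	}
	return string(data), nil
}

// roundTrip writes a symlink entry and reads its target back.
func roundTrip(name, target string) (string, error) {
	data, err := writeSymlinkZip(name, target)
	if err != nil {
		return "", err
	}
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	if len(r.File) != 1 {
		return "", fmt.Errorf("expected 1 entry, got %d", len(r.File))
	}
	return readLinkTarget(r.File[0])
}

func main() {
	target, err := roundTrip("testdata/proverb3.txt", "proverbs/extra/proverb3.txt")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target)
}
```

Reading through io.LimitReader rather than allocating a buffer from the size in the header means a misleading header cannot force a large allocation.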

Owner

@mholt mholt Dec 4, 2018

Hmm, I suppose this is OK then (no need for size check at this time), but I would probably want to move this logic to the z.extractFile method, if possible.

Contributor Author

I put the logic into extractNext because it already had done the type assertion on header, but you are right, it must be in extractFile, otherwise Extract will not be able to extract symlinks. I'll do this (and duplicate the header assertion into extractFile).

Unfortunately this points out that test coverage of the Extract method is lacking. 😦

Owner

@mholt mholt left a comment

Thanks for the explanation. I think we can move one chunk of code into another function for a little better organization.

jandubois and others added 3 commits December 3, 2018 23:05
This is necessary so that calls from Extract() to extractFile()
will handle symlinks properly as well.
tar.go Outdated
@@ -338,7 +340,16 @@ func (t *Tar) Write(f File) error {
		return fmt.Errorf("missing file name")
	}

	hdr, err := tar.FileInfoHeader(f, f.Name())
	filename := f.Name()
Owner

FYI, previous commits in this change had left the filename (previously called linkTarget) empty for non-symlinks, which I think was probably a bug. Should be fixed now.

Contributor Author

No, it was intentional, the value is only used for symlinks (which is why I renamed it to linkTarget to make it more obvious), so the empty string in the non-symlink case would be appropriate: https://github.com/golang/go/blob/159797a/src/archive/tar/common.go#L646-L648

So I would prefer not to set it to a value that doesn't make any sense (this was the original bug, that the link was set to the source filename and not the link target). It is just confusing when you read the source and wonder why the field is set to this value.

But as I said, the value is unused for non-symlinks, so it doesn't really matter at runtime, but I would prefer to undo this change.
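
A small sketch of the behavior described here, assuming the usual pattern of os.Lstat plus os.Readlink: the second argument to tar.FileInfoHeader only ends up in the header's Linkname for symlinks, so it can stay empty for everything else. headerFor and demo are hypothetical names and this is not the code from the PR.

```go
package main

import (
	"archive/tar"
	"fmt"
	"os"
	"path/filepath"
)

// headerFor is a hypothetical helper (not from the PR): it builds a tar
// header for path and passes a link target only when path is a symlink.
func headerFor(path string) (*tar.Header, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return nil, err
	}
	var linkTarget string // stays empty for non-symlinks
	if info.Mode()&os.ModeSymlink != 0 {
		linkTarget, err = os.Readlink(path)
		if err != nil {
			return nil, err
		}
		linkTarget = filepath.ToSlash(linkTarget)
	}
	return tar.FileInfoHeader(info, linkTarget)
}

// demo returns the Linkname of a regular file's header, the Linkname of a
// symlink's header, and whether the symlink header has type TypeSymlink.
func demo() (string, string, bool, error) {
	dir, err := os.MkdirTemp("", "tar-header-demo")
	if err != nil {
		return "", "", false, err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "quote1.txt")
	if err := os.WriteFile(file, []byte("hello"), 0o644); err != nil {
		return "", "", false, err
	}
	link := filepath.Join(dir, "link.txt")
	if err := os.Symlink("quote1.txt", link); err != nil {
		return "", "", false, err
	}

	fileHdr, err := headerFor(file)
	if err != nil {
		return "", "", false, err
	}
	linkHdr, err := headerFor(link)
	if err != nil {
		return "", "", false, err
	}
	return fileHdr.Linkname, linkHdr.Linkname, linkHdr.Typeflag == tar.TypeSymlink, nil
}

func main() {
	fileLink, symLink, isSymlink, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("file Linkname=%q, symlink Linkname=%q, symlink type=%v\n", fileLink, symLink, isSymlink)
}
```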

Owner

Oh, you're right. Pushed a commit that fixes that. Thanks. Am ready to merge if you are!

Contributor Author

Am ready to merge if you are!

Sure, go ahead and let's close a bunch of open issues! 😄

I assume you will take care of filing issues for things discovered in this PR, if you feel they should be addressed at some point:

  • Add option to dereference symlinks before adding to archive

  • Add test coverage for Extract method

  • (maybe) Remove vestigial hardlink support

  • (maybe) Investigate weird permissions issue with ZIP files created on Windows

Owner

Yep, I think those are all suitable for other issues and PRs. Thanks again!

@mholt
Owner

mholt commented Dec 4, 2018

Thanks for the work! I took the liberty of finishing up a few minor changes. Tests are passing; let me know what you think, and if it works for you on your symlink-containing archives.

@mholt mholt merged commit 05009c5 into mholt:master Dec 4, 2018
@jandubois jandubois deleted the symlinks branch December 4, 2018 18:45