New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: sanitize the FileHeader.Name to remove path traversal ("../../") from zip files? #25849

Open
bradfitz opened this Issue Jun 12, 2018 · 19 comments

Comments

Projects
None yet
10 participants
@bradfitz
Member

bradfitz commented Jun 12, 2018

Go isn't directly affected by path traversal attacks in archives but programs written in Go might be.

In particular, if a Go program reads a malicious zip file, the archive/zip package will return the malicious filename in the FileHeader.FileName.

Perhaps we should sanitize it?

If we really need the raw/unsanitized version, we could copy it to a new field FileHeader.InsecureFileName or RawFilename or something.

/cc @dsnet @ianlancetaylor @FiloSottile @andybons @rsc

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jun 12, 2018

/cc @mholt too, author of github.com/mholt/archiver (pointed out by @dsnet), which properly sanitizes paths on its own. But maybe that shouldn't be its job.

@dsnet

This comment has been minimized.

Member

dsnet commented Jun 12, 2018

I'm of the opinion that it's the users responsibility to sanitize the path, as I have seen some crazy (and very rare) use-cases in the past actually depending on this behavior. We should document this though.

As a data point, a popular package that wraps the standard archive package is github.com/mholt/archiver, which handles it properly.

@dsnet

This comment has been minimized.

Member

dsnet commented Jun 12, 2018

Hmmm, as a counter-point, it seems that the logic in archiver was added precisely because of zip-slip:
mholt/archiver#65

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jun 12, 2018

I think we should be safe by default but provide access to the unsanitized value for the "crazy" use cases you mention.

@dsnet

This comment has been minimized.

Member

dsnet commented Jun 12, 2018

One proposal is to return a distinguishable error when an invalid path is encountered, but still parse out (or write) the full header. Odd use cases can check for that specific semantic error and ignore it, but most users will be protected.

@FiloSottile

This comment has been minimized.

Member

FiloSottile commented Jun 12, 2018

+1 on sanitizing FileName and adding InsecureFileName. Looks like this applies to archive/tar as well. We'll also have to make absolute paths relative.

However, note that it's not enough to prevent all path traversal attacks. For example, allowing symlinks with arbitrary targets (like ../../) in a tar archive can still results in an escape even if filenames are clean. Sanitizing symlink targets might break more things than filenames.

See https://blog.filippo.io/so-i-lost-the-password-of-my-nas/ section 3 for example.

@FiloSottile

This comment has been minimized.

Member

FiloSottile commented Jun 12, 2018

And it was already reported against github.com/mholt/archiver: mholt/archiver#65 (comment)

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jun 12, 2018

One proposal is to return a distinguishable error when an invalid path is encountered, but still parse out (or write) the full header.

I like that.

It'd mean only adding a new error variable (or type) to the package, and some docs on two functions.

However, note that it's not enough to prevent all path traversal attacks. For example, allowing symlinks with arbitrary targets (like ../../) in a tar archive can still results in an escape even if filenames are clean. Sanitizing symlink targets might break more things than filenames.

This is more work, but not terrible. We could track them all and still detect when another name would reference a path element with a link. Without that, though, fortunately the number of affected callers is probably fewer because you need to go out of your way to support symlinks. If the caller program is, say, a webapp letting users upload a zip of JPEGs, they probably didn't add in the symlink creation support.

Let's start with some more documentation for Go 1.11.

@gopherbot

This comment has been minimized.

gopherbot commented Jun 12, 2018

Change https://golang.org/cl/118335 mentions this issue: archive/zip: warn about FileHeader.Name being unvalidated on read

@slrz

This comment has been minimized.

slrz commented Jun 12, 2018

+1 on sanitizing FileName and adding InsecureFileName. Looks like this applies to archive/tar as well. We'll also have to make absolute paths relative.

I'm not comfortable with doing this for archive/tar. Absolute pathnames are common usage there and writing to /etc/passwd is often intended. Returning an error where there was none before is going to break existing programs.

@mholt

This comment has been minimized.

mholt commented Jun 12, 2018

I probably should not have open-sourced my archiver package, since I only use it to compress, and only in non-adversarial environments. Sorry if it caused any trouble. (I don't have much time to maintain it lately, so if anyone wants to do so, I'll make you a collaborator.)

Thanks for thinking of this aspect of security, though, from the standard library perspective. Admittedly it's hard to do path sanitization right, since there are some use cases, as others have mentioned, where "unsafe" sanitization may be desired. And I keep getting path cleanup wrong myself (did at least 3 times in Caddy too).

Whatever you decide, I vote for sane defaults with the option to shoot yourself in the foot if need be, with strongly-worded warnings in the godocs. :)

gopherbot pushed a commit that referenced this issue Jun 13, 2018

archive/zip: warn about FileHeader.Name being unvalidated on read
Updates #25849

Change-Id: I09ee928b462ab538a9d38c4e317eaeb8856919f2
Reviewed-on: https://go-review.googlesource.com/118335
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>

dna2github added a commit to dna2fork/go that referenced this issue Jun 14, 2018

archive/zip: warn about FileHeader.Name being unvalidated on read
Updates golang#25849

Change-Id: I09ee928b462ab538a9d38c4e317eaeb8856919f2
Reviewed-on: https://go-review.googlesource.com/118335
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Jun 19, 2018

@odeke-em

This comment has been minimized.

Member

odeke-em commented Jun 22, 2018

/cc @mikesamuel too

@mikesamuel

This comment has been minimized.

Contributor

mikesamuel commented Jun 22, 2018

It seems that there are 2 use cases:

  • We have a tarball from a source that we trust with arbitrary current-user file-system access
  • We have a tarball of which we are suspicious but we can interact with carefully as long as it's self-contained.

IIUC, it seems rare that part of a tarball would be trusted more than another part.

If so, we probably know this when we open the tarball.

Instead of adding more file information to individual entries, could we treat this as an optional is self contained check?

An individual entry might be treated as invalid if the tarball is supposed to be self contained and either

  • unpacking the tarball into an empty subdirectory would place the entries content out of the subdirectory or
  • an entry has the symlink bit set and unpacking the entry would cause the referent (whether it currently exists) to be outside the directory.
@mikesamuel

This comment has been minimized.

Contributor

mikesamuel commented Jun 25, 2018

@odeke-em

How hard is the symlink problem to disentangle?

Resolving symlink targets might require isDirectory checks so it might require accessing multiple entries to tell whether a symlink entry reaches outside the directory to which entry . refers.

That means that you have to detect and reject symlink cycles.

Are isDirectory checks the only kind of transitive analysis required?

isDirectory becomes easier to reason about if you also assume that symlink targets that don't map to any entry are not directories. Otherwise, you might have to do a worst-case analysis.

I did some experimenting with symlink chains below. I think this means that we can do self-containedness checks even where an attacker can cause two or more tarballs to untar into the same containing directory since an attacker doesn't defeat any analysis by constructing a symlink chain where even elements are from one tarball and odd from another.

A symlink chain that reaches past `.` must have one link that reaches past `.`

Question: Given a tarball with a symlink structure like the below, would untarring into ${UNTAR_DIR} cause cat ${UNTAR_DIR/c} to access ${UNTAR_DIR}/../../etc/shadow?

a/a/a/a/a -> ../../../etc/shadow
b -> a/a/a/a/
c -> b/a

The answer seems to be no.

$ mkdir /tmp/ln-exp; \
  cd /tmp/ln-exp/; \
  mkdir -p a/a/a/a/; \
  mkdir a/etc/; \
  echo 'self-contained' > a/etc/shadow; \
  cd a/a/a/a; \
  ln -s ../../../etc/shadow a; \
  cd -; \
  ln -s a/a/a/a/ b; \
  ln -s b/a c; \
  cat c
self-contained
$ find . \( -type l -o -type f \) -exec ls -l {} \;
-rw-r--r--  1 msamuel  wheel  15 Jun 25 14:40 ./a/etc/shadow
lrwxr-xr-x  1 msamuel  wheel  19 Jun 25 14:40 ./a/a/a/a/a -> ../../../etc/shadow
lrwxr-xr-x  1 msamuel  wheel  3 Jun 25 14:40 ./c -> b/a
lrwxr-xr-x  1 msamuel  wheel  8 Jun 25 14:40 ./b -> a/a/a/a/

@rsc rsc added early-in-cycle and removed release-blocker labels Jun 25, 2018

@rsc rsc modified the milestones: Go1.11, Go1.12 Jun 25, 2018

@rsc

This comment has been minimized.

Contributor

rsc commented Jun 25, 2018

I'd like to see an answer that covers archive/zip, archive/tar, and anything else in a consistent way. We can take our time. This is a very very very old "problem".

@mikesamuel

This comment has been minimized.

Contributor

mikesamuel commented Jun 27, 2018

How about this:

archive/tar

  • fork NewReader into
    • NewReader which creates a reader with an internal bit set indicating it should be self contained
    • NewTrustedArchiveReader which creates a reader with that bit unset.
  • Change func (tr *Reader) Read(b []byte) (int, error) so that it reports an error for any entry that violates self containment as defined in #25849 (comment)
  • Make sure that the Sys() is nil for anyos.FileInfo from FileInfo() for an entry that violates self-containment. Alternatively, produce an error.

archive/zip

  • same as archive/tar
  • fork other Open methods and/or possibly associate a trustedness bit with File
@rsc

This comment has been minimized.

Contributor

rsc commented Sep 26, 2018

Separate "NewTrustedArchiveReader" etc seems like overkill. A new InsecureName or RawName, as proposed above, combined with sanitizing the current Name field, seems like enough. But then you need to define what "sanitized" means. Is it just ".." or do we care about other things, like files named COM1 or with backslashes or ... ?

@dsnet

This comment has been minimized.

Member

dsnet commented Sep 26, 2018

I'm not a fan of duplicated header fields.

  • When writing, it is not clear whether you are required to populate both Name and InsecureName.
  • In the case of archive/tar you would need both InsecureName and InsecureLinkname.
  • When reading, how does duplicated fields help the security issue? If I'm reading a file entry, and it is not "self-contained", does the reader just put an empty Name in the header and populate the InsecureName instead? Is it my responsibility to check that Name is not set, but that InsecureName is? If the suggestion is a "sanitized" name, how could that work for "../foo.txt"?

The more I think about this, I like my suggestion of a non-fatal distinguishable error (#25849 (comment)).

@rsc

This comment has been minimized.

Contributor

rsc commented Nov 28, 2018

This seems OK to leave until Go 1.13.

@rsc rsc removed this from the Go1.12 milestone Nov 28, 2018

@rsc rsc added this to the Go1.13 milestone Nov 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment