Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: archive/zip: add ability to read local file header #40354

Open
dstaley opened this issue Jul 22, 2020 · 3 comments
Open

proposal: archive/zip: add ability to read local file header #40354

dstaley opened this issue Jul 22, 2020 · 3 comments
Labels
Projects
Milestone

Comments

@dstaley
Copy link

@dstaley dstaley commented Jul 22, 2020

In a zip archive, file information is stored in two locations: the central directory header, and a local file header located prior to the file's data. In certain circumstances, a user might want to read information from the local file header as opposed to solely the central directory header. The current implementation relies solely on the central directory header to compute the FileHeader struct.

Instances in which a user might want to have access to the local header include:

  1. comparing the central and local file names to detect potential malicious activity (such as naming a file with a .txt extension in the central directory header, but with .exe in the local file header)
  2. detecting potential archive corruption (such as when a file contains a trailing slash in the central directory header, but the correct character in the local file header)

Currently, if a user wanted to read the local file header, they would need to implement their own zip reader, as no export would allow them the raw access to the underlying reader necessary to read the local file header.

Ideally, the solution would not introduce additional unnecessary reads for users who don't care about the local file header. One potential solution would be the addition of the following function:

func (f *File) ReadLocalHeader() (FileHeader, error) {
	var buf [fileHeaderLen]byte
	if _, err := f.zipr.ReadAt(buf[:], f.headerOffset); err != nil {
		return FileHeader{}, err
	}
	b := readBuf(buf[:])
	if sig := b.uint32(); sig != fileHeaderSignature {
		return FileHeader{}, ErrFormat
	}
	h := FileHeader{}
	h.ReaderVersion = b.uint16()
	h.Flags = b.uint16()
	h.Method = b.uint16()
	h.ModifiedTime = b.uint16()
	h.ModifiedDate = b.uint16()
	h.CRC32 = b.uint32()
	h.CompressedSize = b.uint32()
	h.UncompressedSize = b.uint32()
	h.CompressedSize64 = uint64(h.CompressedSize)
	h.UncompressedSize64 = uint64(h.UncompressedSize)
	filenameLen := int(b.uint16())

	d := make([]byte, filenameLen)
	if _, err := f.zipr.ReadAt(d[:], f.headerOffset+fileHeaderLen); err != nil {
		return FileHeader{}, err
	}
	h.Name = string(d[:filenameLen])

	return h, nil
}

This would allow users to explicitly request a read of the local file header for a given zip.File.

@gopherbot gopherbot added this to the Proposal milestone Jul 22, 2020
@gopherbot gopherbot added the Proposal label Jul 22, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 22, 2020

CC @dsnet

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jul 22, 2020
@dsnet
Copy link
Member

@dsnet dsnet commented Jul 23, 2020

The duplication of information in ZIP combined with the lack of implementations that verify that the two are consistent has resulted in a large number of ZIP files that have inconsistent local vs central header information. I believe that the de facto state of most implementations is to treat the central header as authoritative anytime they differ.

I don't see a way to provide this feature without a potentially significant change to the API. While I understand the desire for being able to verify ZIP file integrity for security purposes, I'm not sure it's a compelling enough use-case to warrant this additional API. Let's say that we verify consistency of (centrally known) local headers with the central headers, there are still weird corner cases that people can theoretically exploit for malicious purposes. For example, what do you do with local files that do not exist in the central directory? Such a situation still presents a possible security hole. I'm not sure this implementation would want to support such a use case at it would require changing the ZIP decoder from being random-access, to be streaming based.

@dstaley
Copy link
Author

@dstaley dstaley commented Jul 23, 2020

Just to be absolutely clear, I'm not suggesting that the archive/zip package perform validation by default (or even by request), simply that it provide access to the local file header should the user wish to perform that validation themselves. The provided suggestion is a single new API method, which I think is probably the least-significant modification that could be made and still reasonably support this functionality. An alternative would be providing access to the underlying reader, but I'm not 100% sure you could do that and still have safe concurrent access (though I'd love to be proven wrong there!).

For example, what do you do with local files that do not exist in the central directory?

As of now, I believe it's impossible to even detect this state with the current implementation. However the proposed API (providing access to the local header of a zip.File generated from the central header) also wouldn't provide the information necessary to detect this scenario. I do agree that that specific scenario would likely require major modifications, and likely isn't worth implementing in the standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.