Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: zip64 extra headers problems #33116

Open
philip-firstorder opened this issue Jul 15, 2019 · 18 comments
Open

archive/zip: zip64 extra headers problems #33116

philip-firstorder opened this issue Jul 15, 2019 · 18 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@philip-firstorder
Copy link

philip-firstorder commented Jul 15, 2019

What version of Go are you using (go version)?

$ go version
go1.12.1 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/philip/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/philip/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/nd/dfzcj9w53w35kbvytmjwczg80000gn/T/go-build036595790=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Created 2 blank files:

$ mkfile -n 5000000000 ~/Desktop/5GB_big_empty_file
$ mkfile -n 5000 ~/Desktop/5KB_small_empty_file

Archived them with Go/archive/zip in the EXACT ORDER as above using:

fileHeader := &zip.FileHeader{
	Method: zip.Store, // no compression, very important
}

Then inspected the result in terminal with zipinfo -v and the contents with Hex Fiend:.

Then I tested the archive against all popular unarchivers and spend the last 2 weeks double-checking with their respective developers to confirm the problem:

What did you expect to see?

Document: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

4.5.3

The order of the fields in the zip64 extended
information record is fixed, but the fields MUST
only appear if the corresponding Local or Central
directory record field is set to 0xFFFF or 0xFFFFFFFF.

Not documented here is what to do when the size is EXACTLY 0xFFFFFFFF. But since the main purpose of archiving files is to make them as small as possible, then it makes more sense to NOT add a useless zip64 with a 0 value taking 8 extra data bytes for nothing.

Thus the Central-directory extra fields should contain only 16 and 8 data bytes, see correct 7zip code, for ONLY the fields > 0xFFFFFFFF:

Entry 1. Only 16 bytes for compressed and uncompressed sizes

Central directory entry #1:
---------------------------
  5GB_big_empty_file

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
...
  compressed size:                                5000000000 bytes
  uncompressed size:                              5000000000 bytes
...
  The central-directory extra field contains:
  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 16 data bytes:
    00 f2 05 2a 01 00 00 00 00 f2 05 2a 01 00 00 00.

Entry 2. Only 8 bytes for the offset of local header

Central directory entry #2:
---------------------------
  5KB_small_empty_file

  offset of local header from start of archive:   5000000068
                                                  (000000012A05F244h) bytes
...
  compressed size:                                5000 bytes
  uncompressed size:                              5000 bytes
...
  The central-directory extra field contains:
  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
    44 f2 05 2a 01 00 00 00.

What did you see instead?

Go wrongly adds 24 data bytes in the extra fields for both files:

Central directory entry #1:
---------------------------
  5GB_big_empty_file

  offset of local header from start of archive:   0 <= this should NOT be added
                                                  (0000000000000000h) bytes
...
  compressed size:                                5000000000 bytes
  uncompressed size:                              5000000000 bytes
...
  The central-directory extra field contains:
  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 24 data bytes:
    00 f2 05 2a 01 00 00 00 00 f2 05 2a 01 00 00 00 00 00 00 00 00 00 00 00.

Central directory entry #2:
---------------------------
  5KB_small_empty_file

  offset of local header from start of archive:   5000000081
                                                  (000000012A05F251h) bytes
 ...
  compressed size:                                5000 bytes <= this should NOT be added
  uncompressed size:                              5000 bytes <= this should NOT be added
 ...
  The central-directory extra field contains:
  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 24 data bytes:
    88 13 00 00 00 00 00 00 88 13 00 00 00 00 00 00 51 f2 05 2a 01 00 00 00.

This causes errors or warning messages in some unarchivers.

Source code explanation

src/archive/zip/writer.go#L101

if h.isZip64() || h.offset >= uint32max { // <= This check should be strictly > uint32max
	// the file needs a zip64 header. store maxint in both
	// 32 bit size fields (and offset later) to signal that the
	// zip64 extra header should be used. // <= These 3 fields should NOT be all set together
	b.uint32(uint32max) // compressed size  // <= set only if h.CompressedSize64 > uint32max
	b.uint32(uint32max) // uncompressed size  // <= set only if h.UncompressedSize64 > uint32max

	// append a zip64 extra block to Extra
	var buf [28]byte // 2x uint16 + 3x uint64 // <= These 3x uint64 fields should NOT be all set together
	eb := writeBuf(buf[:])
	eb.uint16(zip64ExtraID)
	eb.uint16(24) // size = 3x uint64 // <= This could be either 8, 16 or 24
	eb.uint64(h.UncompressedSize64) // <= set only if h.UncompressedSize64 > uint32max
	eb.uint64(h.CompressedSize64) // <= set only if h.CompressedSize64 > uint32max
	eb.uint64(h.offset) // <= set only if h.offset > uint32max
	h.Extra = append(h.Extra, buf[:]...)
} else {
	b.uint32(h.CompressedSize) // <= not needed if checked above
	b.uint32(h.UncompressedSize) // <= not needed if checked above
}

Proposed Solution

Replacing the above code with this should solve the problems.

if h.CompressedSize64 > uint32max {
	b.uint32(uint32max)
} else {
	b.uint32(uint32(h.CompressedSize))
}
if h.UncompressedSize64 > uint32max {
	b.uint32(uint32max)
} else {
	b.uint32(uint32(h.UncompressedSize))
}

// append a zip64 extra block to Extra
if h.CompressedSize64 > uint32max || h.UncompressedSize > uint32max || h.offset > uint32max {

	zip64ExtraSize := uint16(0)
	if h.CompressedSize64 > uint32max {
		zip64ExtraSize += 8
	}
	if h.UncompressedSize64 > uint32max {
		zip64ExtraSize += 8
	}
	if h.offset > uint32max {
		zip64ExtraSize += 8
	}

	var buf []byte
	buf = make([]byte, 4+zip64ExtraSize) // 2x uint16 + zip64ExtraSize
	eb := writeBuf(buf[:])
	eb.uint16(zip64ExtraID)
	eb.uint16(zip64ExtraSize)
	if h.CompressedSize64 > uint32max {
		eb.uint64(h.UncompressedSize64)
	}
	if h.UncompressedSize64 > uint32max {
		eb.uint64(h.CompressedSize64)
	}
	if h.offset > uint32max {
		eb.uint64(h.offset)
	}
	h.Extra = append(h.Extra, buf[:]...)
}
philip-firstorder added a commit to firstorder-gmbh/go that referenced this issue Jul 15, 2019
The central-directory extra fields can contain either 8, 16 or 24 data bytes for ONLY the fields > uint32max (0xFFFFFFFF).

They should not be all set when either CompressedSize, UncompressedSize or offset  > uint32max

Fixes golang#33116
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/186178 mentions this issue: archive/zip: fix zip64 extra headers problems

philip-firstorder added a commit to firstorder-gmbh/go that referenced this issue Jul 15, 2019
Should be h.UncompressedSize64 > uint32max

Fixes golang#33116
@odeke-em odeke-em changed the title zip64 extra headers problems archive/zip: zip64 extra headers problems Jul 15, 2019
@odeke-em
Copy link
Member

Thank you for this report @philip-firstorder as well as for the CL, and welcome to the Go project!

I shall kindly page @dsnet to take a look.

@odeke-em odeke-em added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 15, 2019
@philip-firstorder
Copy link
Author

@odeke-em Thank you for forwarding this. I will be using my fork in production till this gets released.

@philip-firstorder
Copy link
Author

Hi @odeke-em, I quoted the official documentation in the description of this issue.

I also think I understand why the confusion with
4.3.9.2 which says:

However ZIP64 format MAY be used regardless of the size of a file.

This refers to the local file headers and data descriptors, which are usually ignored by legacy unarchivers in favor of the central directory records at the end of the zip file.

4.3.6 Overall .ZIP file format:

  [local file header 1]
  [encryption header 1]
  [file data 1]
  [data descriptor 1]
  . 
  .
  .
  [local file header n]
  [encryption header n]
  [file data n]
  [data descriptor n]
  [archive decryption header] 
  [archive extra data record] 
  [central directory header 1]
  .
  .
  .
  [central directory header n]
  [zip64 end of central directory record]
  [zip64 end of central directory locator] 
  [end of central directory record]

The bug I discovered was in the central directory code, which should follow the 4.5.3 chapter in the documentation, not 4.3.9.2.

Does that make sense?

@odeke-em
Copy link
Member

odeke-em commented Nov 10, 2019 via email

@odeke-em
Copy link
Member

odeke-em commented Mar 3, 2020

How’s it going @philip-firstorder? Go1.15 development is currently in session. I left some feedback on your change. Please take a look, thanks!

@philip-firstorder
Copy link
Author

Hello @odeke-em. Unfortunately my knowledge in unit-testing in Go is limited. Can you please refer to the original programmer of the zip testing code?
Since I implemented the fix in my code I never had an issue again, so I can pretty mich guarantee that it will work.
Also, Keka, The Unarchiver and some other vendors all released fix updates for this issue.

@odeke-em
Copy link
Member

odeke-em commented Mar 3, 2020 via email

@philip-firstorder
Copy link
Author

Alright, I will look over that and get back at you.

@odeke-em
Copy link
Member

odeke-em commented Mar 3, 2020

For easy reference, here is the content that I suggested https://go-review.googlesource.com/c/go/+/186178/2#message-edf24149016442dc4300e59c92570de4d8c049d9 with the test and the commit message update,
and then down below I found for you relevant parts of the zip spec to quote in your commit message https://go-review.googlesource.com/c/go/+/186178/2#message-28c353e3634d77f238f682169c7f0dc9007f5442

I wanted to say again, thank you for filing this issue, for investigating it and for owning it and sending a change fixing. You’ve got this and I look forward to interacting with you even more!

@odeke-em odeke-em added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 9, 2020
@odeke-em
Copy link
Member

odeke-em commented Apr 9, 2020

@philip-firstorder kindly pinging you as the Go development tree closes in 3 weeks,
and it has been more than a month since my last ping :)

@philip-firstorder
Copy link
Author

@philip-firstorder kindly pinging you as the Go development tree closes in 3 weeks,
and it has been more than a month since my last ping :)

Hi, sorry for the delay but can some programmer at Google do the testing? My knowledge and time hinder me to further this issue.

@odeke-em
Copy link
Member

odeke-em commented Apr 9, 2020 via email

@philip-firstorder
Copy link
Author

You could go right ahead and start the CL from scratch with the changes you want to make, no problem for me. Sorry I cannot be of more help here.

@worksg
Copy link

worksg commented Oct 25, 2020

source code

if h.offset > uint32max {
b.uint32(uint32max)
} else {
b.uint32(uint32(h.offset))
}

A simple solution

I think the following code will solve the problem, I tested it and it works fine.

 if h.isZip64() || h.offset >= uint32max {  
 	b.uint32(uint32max) 
 } else { 
 	b.uint32(uint32(h.offset)) 
 } 

@philip-firstorder
Copy link
Author

source code

if h.offset > uint32max {
b.uint32(uint32max)
} else {
b.uint32(uint32(h.offset))
}

A simple solution

I think the following code will solve the problem, I tested it and it works fine.

 if h.isZip64() || h.offset >= uint32max {  
 	b.uint32(uint32max) 
 } else { 
 	b.uint32(uint32(h.offset)) 
 } 

Not enough, will probably not work with 7zip. My solution is in production for 1 year now and was tested against thousands archives without error.

Also my solution was validated by the programmers from 7zip and Keka, so I'm quite confident it's the right one.

@jboelter
Copy link

source code

if h.offset > uint32max {
b.uint32(uint32max)
} else {
b.uint32(uint32(h.offset))
}

A simple solution

I think the following code will solve the problem, I tested it and it works fine.

 if h.isZip64() || h.offset >= uint32max {  
 	b.uint32(uint32max) 
 } else { 
 	b.uint32(uint32(h.offset)) 
 } 

Just ran across the same issue w/ >4gb files in zip archives created by archive/zip. I came to the same solution of writing 0xFFFFFFFF for the offset when using a Zip64 extended attribute.

The proposed fix from @philip-firstorder also looks reasonable to only write a field in the extended attribute if > 0xFFFFFFFF.

How extensive is the compatibility and regression suite with existing built-in OS 'unzip'? (e.g. Windows 10 'Extract All..' from the explorer context menu).

klauspost pushed a commit to klauspost/compress that referenced this issue Jan 25, 2021
See also golang/go#33116

The offset must be written into the central directory
as 0xFFFFFFFF if it's going to be written in the zip64
extended information field.

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

The following is the layout of the zip64 extended
information "extra" block. If one of the size or
offset fields in the Local or Central directory
record is too small to hold the required data,
a Zip64 extended information record is created.
The order of the fields in the zip64 extended
information record is fixed, but the fields MUST
only appear if the corresponding Local or Central
directory record field is set to 0xFFFF or 0xFFFFFFFF.

Co-authored-by: Joshua Boelter <joshua.boelter@intel.com>
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
@arthurgustin
Copy link

arthurgustin commented Sep 29, 2022

Hello, I have some issues too with the generation of big zips:

  • Can't open them in the ubuntu standard GUI
  • Windows is able to open and extract but still shows a warning

I have tested the proposed fixes but it does not change the outcome

I have tested the integrity with 4 cli:

unzipada (OK)
unzipada -t file.zip
UnZipAda * minimal standalone unzipping tool Demo for the Zip-Ada library, by G. de Montmollin Library version 58 dated 20-Aug-2022 URL: https://unzip-ada.sourceforge.io/| This software is free and open-source. | It is provided "as is", WITHOUT WARRANTY OF ANY KIND. | For the full license wording, see the header (copyright & MIT license) | appearing on top of this software's source files. | In doubt, check the file: zip.ads
Testing archive file.zip Name
Method Compressed size Uncompressed size ------------------------- --------- --------------- ----------------- bigfile.txt Store 4294967337 : 100% of 4294967337 [........] 1 entries ------ Total ------ 4294967337 : 100% of 4294967337Test: no error foundStatistics per Zip sub-format ("method"): Store ... 1 files, 100% of all data; compr.-to-decompr. ratio: 100%Time elapsed : 30.94 sec Archive successfully processed (or empty archive, or no archive!)

zip (OK)
zip -Tv file.zip
zip warning: needs unzip 4.5 on system type 0: bigfile.txt
Archive: file.zip
testing: bigfile.txt
OK No errors detected in compressed data of file.zip. test of file.zip OK

unzip (OK)
unzip -tv file.zip
Archive: file.zip
testing: bigfile.txt
OK No errors detected in compressed data of file.zip.

7z (KO)
7z t file.zip7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 p7zip Version 16.02 (locale=fr_FR.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (806EC),ASM,AES-NI)
Scanning the drive for archives: 1 file, 4294967617 bytes (4097 MiB)
Testing archive: file.zip
ERRORS: Headers Error-- Path = file.zip Type = zip
ERRORS: Headers Error Physical Size = 4294967617 64-bit = + Archives with Errors: 1Open Errors: 1

Tools to repair the zip work fine

zip -vF file.zip --out repaired.zip

Sample code to reproduce

		f, err := os.Create("/tmp/test.zip")
		if err != nil {
			panic(err)
		}
		w := zip.NewWriter(f)
		token := make([]byte, 0xffffffff+42)
		rand.Read(token)
		fw, err := w.CreateHeader(&zip.FileHeader{
			Name:               "bigfile.txt",
			Method:             zip.Store,
			UncompressedSize64: uint64(len(token)),
			CompressedSize64:   uint64(len(token)),
		})
		if err != nil {
			panic(err)
		}

		_, err = fw.Write(token)
		if err != nil {
			panic(err)
		}
		err = w.Close()
		if err != nil {
			panic(err)
		}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants