Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: add support for extended timestamps #18359

Closed
bobjalex opened this issue Dec 17, 2016 · 60 comments
Closed

archive/zip: add support for extended timestamps #18359

bobjalex opened this issue Dec 17, 2016 · 60 comments
Assignees
Milestone

Comments

@bobjalex
Copy link

@bobjalex bobjalex commented Dec 17, 2016

What version of Go are you using (go version)?

go version go1.8beta2 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\GoLib_beta
set GORACE=
set GOROOT=C:\Go_beta
set GOTOOLDIR=C:\Go_beta\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\Bob\AppData\Local\Temp\go-build778111039=/tmp/go-build -gno-record-gcc-switches
set CXX=g++
set CGO_ENABLED=1
set PKG_CONFIG=pkg-config
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2

What did you do?

I use zip archives in contexts that require correct file times, and are required to be compatible with non-Go zip libraries. Go's default archive/zip writing is incompatible with many other zip libraries and programs (e.g. Java's and Python's provided libraries, the info-zip zip program distributed with Linux, ...).

The problem:

  • Most zippers store local time into the main (MSDOS-FAT-type) header time fields, and Go stores
    UTC, causing times to be wrong when unzipped by other programs.
  • Go 8 stores UTC into the extended time fields, which is universal and a good thing (Go 7 does
    not store an extended time field).
  • Before Go 8, I could work around the issue by offsetting the file dates to cause local time to be stored.
  • With Go 8, I can't do that because it also offsets the extended time, which is incorrect.
  • The typical way that zippers work is to store local regular times and, if extended times are
    supported, UTC extended times.
  • But I can't make that happen with archive/zip in Go 8. It unconditionally stores the same time in
    both extended and non-extended fields -- UTC.
  • Since some unzippers use extended times and some don't, results are inconsistent,
    making the problem is actually worse in Go 8 than in prior versions.

Some solutions:
(1) Change archive/zip to store local non-extended times.
- But that might violate the Go 1 principle (???)
(2) Provide an option (in a Go 1 compatible way) that specifies whether the main non-extended
time fields should be local or UTC, defaulting to the current way. Like a public boolean field in
the writer and reader.

I favor (2), since it is flexible and would not violate the Go 1 principle. Default operation would be exactly as pre-Go 8 versions.

What did you expect to see?

Extended time stored as UTC and old (main) time fields stored as local time.

What did you see instead?

All file times stored as UTC.

@dsnet
Copy link
Member

@dsnet dsnet commented Dec 17, 2016

This was something that was annoying me about the current zip.FileHeader. I think it was an API mistake for there to be a ModifiedTime and ModifiedDatefield storing the time as MS-DOS formatted uint16s.

The path forward is probably to introduce another field ModTime time.Time and deprecate the MS-DOS timestamps in the same way we dealt with the uint32 sized fields.

EDIT: We'll need a different name since ModTime is already taken as the field name.

@dsnet dsnet added the NeedsFix label Dec 17, 2016
@dsnet dsnet added this to the Go1.9Maybe milestone Dec 17, 2016
@dsnet
Copy link
Member

@dsnet dsnet commented Dec 17, 2016

Also, to understand your problem more clearly. Is this problem something blocking you for Go 1.8 release? I don't think we can improve time handling in the zip package for the remainder of Go 1.8, but if this is problematic, we can revert the current change and go for a more complete round-trip fix in Go 1.9.

@dsnet dsnet modified the milestone: Go1.9Maybe Dec 17, 2016
@bradfitz bradfitz modified the milestones: Go1.8, Go1.9Maybe Dec 19, 2016
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 19, 2016

Let's partially roll this back for 1.8 and fix it fully in 1.9.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Dec 19, 2016

This problem is not blocking me -- I have several ways to work around it.

In fiddling with this problem, I created a new zip module derived from the 1.8 one that solves the problem and is compatible with older versions. I added 2 new function that allow specification of whether the MSDOS 16-bit time and date fields should store as UTC (as it does now) or local (as many existing zip programs and libraries do):
func FileInfoHeaderLocal(fi os.FileInfo, localTime bool) (*FileHeader, error) // for writing
func OpenReaderLocal(name string, localTime bool) (*ReadCloser, error) // for reading

That seems to be all that is necessary (with an handful of other hidden code changes to make that work, but not very much, really).

My new version seems to be working OK for my purposes, but has not been exhaustively QA'd.

I attempted to attach my code -- feel free to use these changes if suitable, or point out improvements.

newzip.zip

@dsnet
Copy link
Member

@dsnet dsnet commented Dec 19, 2016

In light of #18378, I'm planning on reverting the whole change and just fix it fully for Go 1.9.

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 20, 2016

CL https://golang.org/cl/34651 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 20, 2016
This change reverts the following CLs:
	CL/18274: handle mtime in NTFS/UNIX/ExtendedTS extra fields
	CL/30811: only use Extended Timestamp on non-zero MS-DOS timestamps

We are reverting support for extended timestamps since the support was not
not complete. CL/18274 added full support for reading extended timestamp fields
and minimal support for writing them. CL/18274 is incomplete because it made
no changes to the FileHeader struct, so timezone information was lost when
reading and/or writing.

While CL/18274 was a step in the right direction, we should provide full
support for high precision timestamps in both the reader and writer.
This will probably require that we add a new field of type time.Time.
The complete fix is too involved to add in the time remaining for Go 1.8
and will be completed in Go 1.9.

Updates #10242
Updates #17403
Updates #18359
Fixes #18378

Change-Id: Icf6d028047f69379f7979a29bfcb319a02f4783e
Reviewed-on: https://go-review.googlesource.com/34651
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@dsnet dsnet modified the milestones: Go1.9, Go1.8 Dec 20, 2016
@dsnet dsnet changed the title archive/zip: file times don't play nicely with other zip libs and programs archive/zip: add support for extended timestamps Dec 20, 2016
@dsnet
Copy link
Member

@dsnet dsnet commented Dec 20, 2016

Re-purposing this issue to be about adding full support for extended timestamps. The solution we come up with should:

  • Probably add a new field of type time.Time to FileHeader
  • Not cause gibberish to be encoded if that timestamp is zero (#17403)
  • Continue to solve the use-case in #10242
  • Beware of the bug discovered in #18378
  • Not cause any problems that the author of this issue originally ran into.
@bobjalex
Copy link
Author

@bobjalex bobjalex commented Dec 20, 2016

Reposting the modified archive/zip code I posted in this thread with changes to handle local time in the 16-bit timestamp fields, to allow compatibility with much existing zip software. Found and fixed a bug in setting proper file time in some unzipped files. Now it's perfect :-)
newzip.zip

@dsnet
Copy link
Member

@dsnet dsnet commented Dec 20, 2016

I'll take a look at this when the Go1.9 development starts up. Thanks for your help.

@mattn
Copy link
Member

@mattn mattn commented Feb 6, 2017

@dsnet I hope this issue fix. And be sure in come in go1.9 .

Probably add a new field of type time.Time to FileHeader

What the reason for this? As my understanding, Go still have problem that doesn't handle extended file timestamp. And it should be fixed. So I think we don't have to keep compatibility.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Feb 6, 2017

@mattn
Copy link
Member

@mattn mattn commented Feb 7, 2017

Please, either change to store local time in the non-extended timestamp, or
provide a mechanism so that it's at least possible to do so.

Store local-time in the non-extended timestamp is not good idea. It will make problem. When make zip file at 00:00 local time in several country, the timestamp should always pointed different time. Because it should have time offset. The field always be stored UTC.

In my opinion, the timestamp handled extra-field MUST be default in archive/zip. For example, unzip command show the offset-ed timestamp always with unzip -l.

c:\dev\unzip\unzip60\tmp>dir ..\README
 Volume in drive C has no label.
 Volume Serial Number is 207E-613D

 Directory of c:\dev\unzip\unzip60

2009/04/20  07:37            18,337 README
               1 File(s)         18,337 bytes
               0 Dir(s)  262,483,783,680 bytes free

c:\dev\unzip\unzip60\tmp>zip test.zip ..\README
  adding: ../README (deflated 58%)

c:\dev\unzip\unzip60\tmp>unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    18337  2009-04-20 07:37   ../README
---------                     -------
    18337                     1 file

c:\dev\unzip\unzip60\tmp>unzip test.zip
Archive:  test.zip
warning:  skipped "../" path component(s) in ../README
  inflating: README

c:\dev\unzip\unzip60\tmp>dir README
 Volume in drive C has no label.
 Volume Serial Number is 207E-613D

 Directory of c:\dev\unzip\unzip60\tmp

2009/04/20  07:37            18,337 README
               1 File(s)         18,337 bytes
               0 Dir(s)  262,483,570,688 bytes free

BTW, What is the usecase that you want to handle non-extended timestamp?

If there are people who want to use non-extended timestamp, we may have to provide API to access non-extended timestamp. But I think extended timestamp should be default (f.ModTime). The code always behave as same in anywhere, anytimes. We shouldn't add workaround adding time offset in code of application.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Feb 7, 2017

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Feb 8, 2017

@dsnet dsnet modified the milestones: Go1.10, Go1.9 May 22, 2017
@mattn
Copy link
Member

@mattn mattn commented May 26, 2017

Well, I want to fix this until 1.9 . If someone will go to right way instead me, please do it. Or please tell me what is right fix. I'm probably confusing.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented May 26, 2017

@mattn
Copy link
Member

@mattn mattn commented May 27, 2017

As far as I can read your opinion, I guess that you mean below.

When zip timestamp is written as "2016/01/02 15:04:05", ft.ModTime() should be 2016/01/02 15:04:05 JST for japanese local.

Right? Then patch will be below.

diff --git a/src/archive/zip/reader_test.go b/src/archive/zip/reader_test.go
index dfaae78436..8a09f897bd 100644
--- a/src/archive/zip/reader_test.go
+++ b/src/archive/zip/reader_test.go
@@ -15,7 +15,6 @@ import (
 	"regexp"
 	"strings"
 	"testing"
-	"time"
 )
 
 type ZipTest struct {
@@ -369,13 +368,8 @@ func readTestFile(t *testing.T, zt ZipTest, ft ZipTestFile, f *File) {
 	}
 
 	if ft.Mtime != "" {
-		mtime, err := time.Parse("01-02-06 15:04:05", ft.Mtime)
-		if err != nil {
-			t.Error(err)
-			return
-		}
-		if ft := f.ModTime(); !ft.Equal(mtime) {
-			t.Errorf("%s: %s: mtime=%s, want %s", zt.Name, f.Name, ft, mtime)
+		if mt := f.ModTime(); mt.Format("01-02-06 15:04:05") != ft.Mtime {
+			t.Errorf("%s: %s: mtime=%s, want %s", zt.Name, f.Name, mt, ft.Mtime)
 		}
 	}
 
diff --git a/src/archive/zip/struct.go b/src/archive/zip/struct.go
index 0be210e8e7..aff4d4b665 100644
--- a/src/archive/zip/struct.go
+++ b/src/archive/zip/struct.go
@@ -160,7 +160,7 @@ func msDosTimeToTime(dosDate, dosTime uint16) time.Time {
 		int(dosTime&0x1f*2),
 		0, // nanoseconds
 
-		time.UTC,
+		time.Local,
 	)
 }
 
@@ -168,7 +168,7 @@ func msDosTimeToTime(dosDate, dosTime uint16) time.Time {
 // The resolution is 2s.
 // See: http://msdn.microsoft.com/en-us/library/ms724274(v=VS.85).aspx
 func timeToMsDosTime(t time.Time) (fDate uint16, fTime uint16) {
-	t = t.In(time.UTC)
+	t = t.In(time.Local)
 	fDate = uint16(t.Day() + int(t.Month())<<5 + (t.Year()-1980)<<9)
 	fTime = uint16(t.Second()/2 + t.Minute()<<5 + t.Hour()<<11)
 	return
@rsc
Copy link
Contributor

@rsc rsc commented Jun 16, 2017

Hi @bobjalex. I do appreciate your willingness to help. I checked with our lawyers and it's OK for us to accept the diff here, since you said we can use it, provided you complete a contributor CLA as discussed in https://golang.org/doc/contribute.html#cla (the "Contributor License Agreement" section). Can you do that?

Thanks very much.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Jun 16, 2017

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Jun 29, 2017

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jun 29, 2017

I confirm that we have the CLA on file for @bobjalex.

@rsc
Copy link
Contributor

@rsc rsc commented Jul 6, 2017

OK, thanks @bobjalex. @dsnet, can you look at this for Go 1.10?

@mattn
Copy link
Member

@mattn mattn commented Jul 19, 2017

@bobjalex something problems for create CL?

@dsnet dsnet self-assigned this Sep 28, 2017
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 1, 2017

Change https://golang.org/cl/74970 mentions this issue: archive/zip: add FileHeader.Modified field

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 1, 2017

@bobjalex, take a look at CL/74970 to see if it suites your needs.
It maintains backwards compatibility where UTC is typically the timezone used in the legacy timestamp fields.

However, since the FileHeader.Modified field is newly added, you can always set it directly to control the timezone used for the date.

fh, _ := zip.FileInfoHeader(fi) // This uses UTC for Modified to be backwards compatible
fh.Modified = dh.Modified.In(time.Local)
@mattn
Copy link
Member

@mattn mattn commented Nov 2, 2017

@dsnet Thank you for working this. I'm totally agreed to your CL. One question, how do you think about zipfs issue? We have to convert Modified for each items in the zip file?

#18359 (comment)

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 2, 2017

Why would this be an issue with zipfs? The time returned will be returned as UTC as before, and will be an accurate point in time (assuming the underlying zip has extended timestamps).

Rather, I should ask, why would the user care for the original timezone of the underlying zip file?

Essentially, my CL should not change the behavior of zipFI.ModTime in any way (other than being correct if zip has extended timestamps).

@mattn
Copy link
Member

@mattn mattn commented Nov 2, 2017

This is a file that archived using zip command.

foo.zip

file timestamp of write.go is 2017/11/02 10:58.

image

and file write.go in the zip archive have same timestamp.

image

Using CL74970 and following code.

package main

import (
	"archive/zip"
	"io/ioutil"
	"log"
	"os"
)

func ZipFile(writer *zip.Writer, zipPath string, targetFile string) error {
	f, err := os.Open(targetFile)
	if err != nil {
		return err
	}
	defer f.Close()

	body, err := ioutil.ReadAll(f)

	if err != nil {
		return err
	}

	info, err := os.Stat(targetFile)
	if err != nil {
		return err
	}

	header, _ := zip.FileInfoHeader(info)
	header.Name = zipPath

	zf, err := writer.CreateHeader(header)
	if err != nil {
		return err
	}

	if _, err := zf.Write(body); err != nil {
		return err
	}
	return nil
}

func main() {
	f, err := os.Create("my.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	zw := zip.NewWriter(f)

	if ferr := ZipFile(zw, "write.go", "write.go"); ferr != nil {
		panic(ferr)
	}

	if ferr := zw.Close(); ferr != nil {
		panic(ferr)
	}
}

my.zip

image

the timestamp of write.go in the zip archive is not same.

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 2, 2017

Just to confirm, this was the same behavior before CL74970, right? If so, this is what I expect since Windows doesn't understand the extended timestamps.

If you do:

	header, _ := zip.FileInfoHeader(info)
	header.Name = zipPath
	header.Modified = header.Modified.In(time.Local)

Does that fix it?

@mattn
Copy link
Member

@mattn mattn commented Nov 2, 2017

Does that fix it?

Yes. it fixed correctly. Does it require to call header.Modifierd.In to be handled in our timestamp for Reader and Writer both?

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 2, 2017

Does it require to call header.Modified.In(time.Local)

When writing a zip file, it will be required. The old zip behavior was to use UTC in the legacy timestamp fields by default. The Go1 compatibility agreement binds us to preserve that behavior since it's not a bug on Go's writer.

@mattn
Copy link
Member

@mattn mattn commented Nov 2, 2017

Okay, I understand. Thank you. BTW, is it possible to add Writer#Locale to set whole timezone automatically in the headers? It should be UTC.

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 2, 2017

Writer.Locale is a reasonable API change, but let's not include this in this release. While it is clunkier to do header.Modified.In(time.Local) for every file, I'm don't think it's that bad.

If we had Writer.Locale, it would still take a single line to set it. On the other hand, you typically add a file to a zip archive in a for loop, so it would still be a single line to call header.Modified.In(time.Local).

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Nov 2, 2017

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 4, 2017

I'll be submitting the CL on Monday, since it's already past the freeze. Be sure to provide feedback before then.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Nov 4, 2017

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 5, 2017

Thought 1: The mini-spec does not say what the time zone of Modified is if
there are no extended timestamps? (I'm assuming UTC in the following
comments.)

(By mini-spec, I assume you mean the Go documentation?). In the absence of extended timestamps, Modified will always be UTC.

Thought 2: I'm trying to reason how an application that chooses to
interpret a legacy time field as local time would work:

The encoding of FileHeader.Modified.Zone is not so you can detect whether extended timestamps are present, but more so that round-trip encoding of the FileHeader produces a similar ZIP file, where the legacy time field should be identical to the original for most reasonable cases (assuming the values differ exactly by the timezone offset).

Would the above snippet always work?

No. If the ZIP file was created in the UTC timezone itself, the current API does not provide enough information regarding whether extended timestamps were present or not.

If I understand you correctly, your main use case is that (in the lack of an extended timestamp), you wanted the reader to interpret the legacy field as a local time.

If I remove the offset != 0 case in struct.go:173, then you should be able to determine when extended timestamps are present except for the case where the legacy and extended timestamp fields differ in a unreasonable way (e.g., difference exceeds -12 or +14 hours).

Alternatively, you could check FileHeader.Extra, but I'm not a fond of how much low-level detail the ZIP API exposes to the user.

Other notes:

  • If there are multiple extended timestamps, the last one is used, even if every single timestamp listed says something different from each other.
@bobjalex
Copy link
Author

@bobjalex bobjalex commented Nov 5, 2017

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 6, 2017

I'm going to modify the conditional on struct.go:173 since it's a simple change. Thus, to determine whether extended timestamps are present, you can simply just check whether FileHeader.Modified.Location() == time.UTC or not.

Not to disparage your use-case, but I personally believe the number of users that really care that much about accurate timestamps, are not common, so I'm reluctant to add more API surface, especially for a feature of ZIP that was so poorly designed and inconsistent across the world.

Your feedback is helpful, though.

@bobjalex
Copy link
Author

@bobjalex bobjalex commented Nov 6, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.