Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: backport 'fix nanotime for macOS Sierra, again' to go 1.6.x maybe #17234

Closed
liggitt opened this issue Sep 26, 2016 · 20 comments

Comments

Projects
None yet
10 participants
@liggitt
Copy link
Contributor

commented Sep 26, 2016

If there is another Go 1.6.x release, consider adding the fix for #16570

(spawned from #16354 (comment))

@bradfitz

This comment has been minimized.

Copy link
Member

commented Sep 26, 2016

It's not our normal policy to backport non-security bug fixes to a non-current release branch.

@bradfitz bradfitz added this to the Go1.6.4 milestone Sep 26, 2016

@broady

This comment has been minimized.

Copy link
Member

commented Sep 27, 2016

Perhaps more importantly, we should do a 1.4.4.

As far as I could tell, the main reason to do a 1.6.4 was Docker, and they've moved on to 1.7.1 now.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Sep 27, 2016

Backporting for 1.4.4 is #16352

@smarterclayton

This comment has been minimized.

Copy link

commented Sep 30, 2016

Understanding that this is not normal policy, Go 1.7 is new enough in channels that distributions will have a hard time forcing everyone to upgrade to a newer Go runtime (and introducing large amounts of risk of uncovered issues), and so Mac OS being broken for Go 1.6 causes a fairly large amount of pain in the ecosystem.

The risk profile of upgrading something like Kubernetes to Go 1.7 in point releases without a long soak period on 1.7 increases the risk of regressions in other areas. Each new Go release has taken 1-2 months on Kubernetes to achieve satisfactory soak, and while the community has been fairly good at identifying and closing regressions on new major versions, it becomes very difficult from a release process to justify an upgrade mid-cycle.

We are considering carrying a backport for Fedora/CentOS/RHEL Go 1.6 to enable cross-compilation and unbreak those building on 1.6, but we really do prefer to keep the patch list as small as possible.

@vadimsht

This comment has been minimized.

Copy link

commented Sep 30, 2016

I'd also like to add that any code that targets Appengine Standard Environment must compile on Go 1.6 (because GAE runtime is stuck at Go 1.6.2). So code that targets GAE can't be built locally on macOS Sierra anymore (because 1.6.3 compiler crashes).

@mikekap

This comment has been minimized.

Copy link

commented Oct 9, 2016

Could this technically be a security bug? The root cause is a write to (somewhat) arbitrary memory. I'm not sure if it's possible, but could that be used to cause "interesting" kinds of memory/stack corruption?

@bradfitz

This comment has been minimized.

Copy link
Member

commented Oct 9, 2016

@mikekap, nothing surprises me anymore security-vulnerability-wise, but it seems a little contrived.

@copumpkin

This comment has been minimized.

Copy link

commented Oct 18, 2016

See #17490 for a broader related proposal

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 28, 2016

The release policy, documented at https://golang.org/doc/devel/release.html#policy, says that:

Each major Go release obsoletes and ends support for the previous one. For example, if Go 1.5 has been released, then it is the current release and Go 1.4 and earlier are no longer supported. We fix critical problems in the current release as needed by issuing minor revisions (for example, Go 1.5.1, Go 1.5.2, and so on).

As a special case, we issue minor revisions for critical security problems in both the current release and the previous release. For example, if Go 1.5 is the current release then we will issue minor revisions to fix critical security problems in both Go 1.4 and Go 1.5 as they arise. See the security policy for more details.

I do not believe we should issue a Go 1.6.4 containing any fixes for macOS Sierra, for two main reasons.

First, doing so would give the appearance of support for Go 1.6 when in fact Go 1.6 is now unsupported.

Second, we can make no guarantee that the nanotime crash is the only problem with Go 1.6 on Sierra. If we fix this quite obvious problem, Go 1.6 on Sierra may be left with subtler problems. Since relatively few people will be running Go 1.6 on Sierra (and Go 1.6 is unsupported), those problems are likely to go unsolved for a longer amount of time. Better to have everyone on Sierra using the same release: Go 1.7.

An obvious question is why we are comfortable making an exception for critical security problems. The answer is that security fixes are both important and minimal. If you're running Go 1.6.3 on Ubuntu 16.10 and happy and we identify a serious security problem, we will write the simplest, most minimal, most obviously correct fix we can, to boost confidence that running a hypothetical Go 1.6.4 on Ubuntu 16.10 will be exactly the same as running Go 1.6.3, except for the few lines of changes that fix the security problem. In this example, to a very large extent, the testing and qualification you've done for Go 1.6.3 on Ubuntu 16.10 carries forward to the hypothetical Go 1.6.4 on Ubuntu 16.10: only a few lines of code have changed in the entire stack (the security fix itself).

In contrast, the testing and qualification you may have done for Go 1.6.3 on El Capitan does not carry forward to a hypothetical Go 1.6.4 on Sierra. In addition to the 1-line patch in Go 1.6.3 that would stop gettimeofday from crashing, a 4 gigabytes update has been applied to the underlying operating system. The entire stack is very different. Putting aside the sheer size, It's also not uncommon or unexpected in general for operating system updates to break Go. If you are supporting a product written in Go and distributed for OS X / macOS, I would expect that you'd do QA testing for each different operating system version you intend to support. Instead of putting effort into testing a hypothetical unsupported Go 1.6.4 on Sierra, you could instead put that same effort into testing the real, supported Go 1.7.3 on Sierra.

Now that I've laid out the reasons you should not use Go 1.6.3+timefix on Sierra, I had intended to create a Gerrit CL in order to provide a Git ref for those who still insisted on using it. However, updating my laptop to Sierra and applying the time fix to Go 1.6.3 basically confirmed everything I said above. While the reliable crash is gone, less obvious bugs remain. It took me four tries to run all.bash successfully. The first time, an HTTP test got an unexpected "connection reset by peer". The second time, a compiled program crashed at startup in the guts of the memory allocator with "fatal error: MSpanList_Insert". The third time, a different HTTP test got an unexpected "connection reset by peer". The fourth time, all.bash passed.

It seems to me that Go failing unpredictably is worse than Go failing reliably. I don't think I will post a CL to make it easier to point this particular loaded gun at your foot. If you insist, you can recreate it with:

# For a broken Go-on-Sierra toolchain...
cd go
git checkout go1.6.3
git cherry-pick 2da5633e
echo go1.6.3+timefix >VERSION

Google has been running Go 1.7 in production by default since the first release candidate in early July:(that's the definition of a release candidate.

I encourage all current Go 1.6 users to update to Go 1.7, especially on Sierra. Go 1.7 is a supported release and will receive updates as needed if additional Sierra problems are found (until Go 1.8 is released in February 2017).

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 28, 2016

@ianlancetaylor points out that if you cherry-pick another fix from the Go 1.7 release branch, then with both those fixes Go 1.6.3 does appear to run successfully on macOS 10.12 Sierra.

As I had intended originally, I've created a Gerrit CL containing the fixes, so that people who understand all the risks and still want to use Go 1.6.3 plus patches on Sierra can do so without having to apply patches themselves. I want to emphasize again that this code is at best very lightly tested - all.bash passes - and that Go 1.6 is unsupported: any functionality problems found in Go 1.6 at this point will not be fixed. If you do want to use a toolchain built from this CL, please do your own testing on Sierra to make sure that the code meets your needs.

With that caveat, you can get this modified toolchain using:

git clone -b go1.6.3 https://go.googlesource.com/go go163sierra
cd go163sierra
git fetch origin refs/changes/54/32254/1 && git checkout FETCH_HEAD
cd src
./all.bash
../bin/go version

all.bash should print "ALL TESTS PASSED" at the end, and the final ../bin/go version command should print:

$ ../bin/go version
go version go1.6.3+unsupported-sierra-fixes-20161028 darwin/amd64
$
@dimes

This comment has been minimized.

Copy link

commented Nov 13, 2016

Anyone using App Engine is unable to upgrade because App Engine is stuck on 1.6.2. This means that anyone using App Engine can no longer build or test their code locally using the App Engine SDK. This is a huge deal. How can you leave your customers hanging like that?

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 13, 2016

@dimes, yours is the first mention of App Engine in this bug. Don't assume malice here. I think the Go core team weren't aware that App Engine was stuck on Go 1.6, and I bet the Go App Engine team is unaware of this bug at all. (Unless somebody has told them)

But you raise a good point. I'll discuss with people.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 13, 2016

@dimes, actually, I hear this is fixed in the Go App Engine SDK in version 1.9.46 (2016-11-03) from https://cloud.google.com/appengine/docs/go/download

Can you confirm?

@dimes

This comment has been minimized.

Copy link

commented Nov 13, 2016

Hmm, I'm not really sure what the App Engine SDK does internally, but if I use SDK version 1.9.46 with Go 1.6.3 I can build and run App Engine stuff. However, that Go 1.6.3 is useless for building non-App Engine stuff, so I guess the solution is to have two versions of Go installed. Not great, but better than nothing. Thanks for the help.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 13, 2016

Hmm, I'm not really sure what the App Engine SDK does internally, but if I use SDK version 1.9.46 with Go 1.6.3 I can build and run App Engine stuff.

Great. Thanks for confirming.

However, that Go 1.6.3 is useless for building non-App Engine stuff, so I guess the solution is to have two versions of Go installed. Not great, but better than nothing.

Is that a new situation, that the App Engine SDK can't be used for building non-App Engine things?

If not, I'm not sure how it's a relevant statement to the nanotime backport issue.

If it is, that's news to me, but that's probably an issue for App Engine. Or maybe it's intentional. It's been a few years since I've actively used App Engine now. I've also always just had a dozen GOROOTs laying around also, so I'm not sure what the expected usage pattern is.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 13, 2016

I don't know what "use SDK version 1.9.46 with Go 1.6.3" means. The SDK comes with its own copy of the Go toolchain. That one happens to be based on Go 1.6.3. When you run the goapp command, you get that one.

I don't believe the SDK looks at or uses any other copy of Go you have installed. It sounds like you are saying you have a separate copy of Go 1.6.3 installed; the SDK shouldn't care one way or the other about that one.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 13, 2016

I don't know what "use SDK version 1.9.46 with Go 1.6.3" means.

I read it as "SDK version 1.9.46 (which comes with Go 1.6.3)".

@dimes

This comment has been minimized.

Copy link

commented Nov 13, 2016

I don't know what "use SDK version 1.9.46 with Go 1.6.3" means.

This means that I have both the SDK and Go 1.6.3 installed with my GOROOT pointing to the 1.6.3 installation.

I don't believe the SDK looks at or uses any other copy of Go you have installed.

I know the SDK is supposed to use its own version of Go, but if I set my GOROOT to point to Go 1.7, then compilation fails. Maybe this is an SDK bug?

Edit: Just tested unsetting GOROOT and App Engine still fails to build (error cannot find package "context"). So it seems that the installed version of Go needs to be 1.6.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2016

@dimes, can you give the exact command you're running and the exact output? This probably belongs in a new issue (maybe in a new issue tracker) but I'll suggest where once I can see a bit more about the problem. Thanks.

@dimes

This comment has been minimized.

Copy link

commented Nov 14, 2016

The exact command I use is goapp serve staging.yaml.

When my $PATH has the binary for Go 1.7, i.e. which go => /usr/local/go/bin/go, I get the following error:

ERROR    2016-11-14 10:54:35,646 go_runtime.py:181] Failed to build Go application: (Executed command: go build -tags appenginevm -o /var/folders/8z/9sj7_02d4g36f_ksvbblpjb00000gq/T/tmpSs02_zappengine-go-bin/_ah_exe)

../../../../../golang.org/x/net/context/go17.go:10:2: cannot find package "context" in any of:
    /usr/local/go_appengine/goroot/src/context (from $GOROOT)
    /Users/me/go/src/context (from $GOPATH)
../../../../../golang.org/x/net/http2/go17.go:13:2: cannot find package "net/http/httptrace" in any of:
    /usr/local/go_appengine/goroot/src/net/http/httptrace (from $GOROOT)
    /Users/me/go/src/net/http/httptrace (from $GOPATH)

If Go 1.6 (which go => /usr/local/go-1.6.3/bin/go) is on my $PATH, everything works perfectly fine with app engine. But then any non-app engine stuff won't work due to the Sierra bug.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.