Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: time.Sleep not monotonic if you import os/user & have used setcap to give binding capabilities #42150

Closed
Radisovik opened this issue Oct 22, 2020 · 11 comments

Comments

@Radisovik
Copy link

@Radisovik Radisovik commented Oct 22, 2020

What version of Go are you using (go version)?

Tried both these versions and both reproduced the issue
go version go1.12.7 linux/amd64
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN="/home/eric/go/bin"
GOCACHE="/home/eric/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/eric/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go1.12.7"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go1.12.7/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build154110967=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"fmt"
	"os/user"
	"time"
)

func main() {
	start := time.Now()
	for {
		fmt.Printf("%v Uptime: %v\n", time.Now(), time.Now().Sub(start))
		time.Sleep(5 * time.Second)
	}
	cu := user.Current
	fmt.Printf("never get here %v", cu)
}

If I compile that into a binary, and give the bind capability to the binary via:

setcap cap_net_bind_service=+ep <binary>

And then I set the clock backwards 5minutes.

When I took out the import to os/user -- the problem goes away.

What did you expect to see?

I would expect to continue to see the "Uptime" messages to be printed every 5 seconds

What did you see instead?

Output stopped

@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Oct 22, 2020

perhaps CGo related due to os/user import but I can't reproduce

19:32:08 ~/testrepo-199 0:00:00
main » go build .

19:32:16 ~/testrepo-199 0:00:01
main » sudo setcap cap_net_bind_service=+ep testrepo-199

19:32:20 ~/testrepo-199 0:00:00
main » ./testrepo-199
2020-10-22 19:32:27.263129711 +0200 CEST m=+0.000057595 Uptime: 154ns
2020-10-22 19:32:32.263498017 +0200 CEST m=+5.000426013 Uptime: 5.000368853s
2020-10-22 19:32:37.263981624 +0200 CEST m=+10.000909642 Uptime: 10.000852462s
2020-10-22 19:20:00.410276497 +0200 CEST m=+15.001422131 Uptime: 15.001364944s
2020-10-22 19:20:05.410748758 +0200 CEST m=+20.001894382 Uptime: 20.001837211s
2020-10-22 19:20:10.411244842 +0200 CEST m=+25.002390468 Uptime: 25.002333275s
^C

19:20:11 ~/testrepo-199 0:12:16
main » cat main.go
package main

import (
	"fmt"
	"os/user"
	"time"
)

func main() {
	start := time.Now()
	for {
		fmt.Printf("%v Uptime: %v\n", time.Now(), time.Now().Sub(start))
		time.Sleep(5 * time.Second)
	}
	cu := user.Current
	fmt.Printf("never get here %v", cu)
}

while the program was running:

19:32:23 ~ 0:00:00
» sudo timedatectl set-ntp 0

19:32:30 ~ 0:00:00
» sudo timedatectl set-time 19:20:00
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 22, 2020

If you import os/user, and set the capability, and run the program, and set the clock backward, you say that the output stops? Do you mean that the program keeps running, it just doesn't print anything? What happens if you wait until the clock catches up?

A call to time.Sleep will use the monotonic clock (clock_gettime with CLOCK_MONOTONIC) so it's hard to see why changing the wall clock time would matter. At least in 1.15 the core sleep step will be a call to epoll_wait which passes a timeout in milliseconds, so again it's hard to see why the wall clock time matters.

It might help to run your program, in the failing case, under strace -f, and attach the output here.

@ianlancetaylor ianlancetaylor changed the title Sleep not monotonic if you import os/user & have used setcap to give binding capabilities runtime: time.Sleep not monotonic if you import os/user & have used setcap to give binding capabilities Oct 22, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Oct 22, 2020
@Radisovik
Copy link
Author

@Radisovik Radisovik commented Oct 22, 2020

If you import os/user, and set the capability, and run the program, and set the clock backward, you say that the output stops? Do you mean that the program keeps running, it just doesn't print anything? What happens if you wait until the clock catches up?

In this incarnation we don't return from the sleep. When this was originally encountered on a full sized application every sleep that was in progress also waited for (original sleep time + time we set the clock back). We did still see inbound web requests being served.. so the whole application wasn't stuck.. just things stuck on sleep.

I should add that while we didn't reproduce this in the sample code provided.. a ticker also suffered the lack of monotomic clock-ness.

A call to time.Sleep will use the monotonic clock (clock_gettime with CLOCK_MONOTONIC) so it's hard to see why changing the wall clock time would matter. At least in 1.15 the core sleep step will be a call to epoll_wait which passes a timeout in milliseconds, so again it's hard to see why the wall clock time matters.

We were able to reproduce this in:

go version go1.12.7 linux/amd64
go version go1.15.3 linux/amd64

If that helps isolate.

So far we can reproduce this on our embedded device plus a vm that is running a standard ubuntu (18.04?).. while another ubuntu box doesn't exhibit the problem. Very much feels like a "per system" problem. I'm not sure how go determines if a monotomic clock is available..etc.. maybe that is busted up.

It might help to run your program, in the failing case, under strace -f, and attach the output here.

Let me do that.

@Radisovik
Copy link
Author

@Radisovik Radisovik commented Oct 22, 2020

strace -f

Well.. the plot thickens. Doing a:

strace -f <ourbinary>

The problem goes away. Tried it several times with several pairs of eyes.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 22, 2020

I do see a potential problem here. The Go runtime will check for the VDSO clock_gettime call, and use that if available. Through 1.15, if there is no VDSO, the Go runtime will call the fixed gettimeofday call at 0xffffffffff600000. But gettimeofday uses wall clock time whereas clock_gettime uses monotonic time. So this suggests that when you set the capability, and when you link dynamically, it somehow disables the VDSO. I do not know why that would happen.

Interestingly, this has been fixed on tip for the future 1.16 release, by https://golang.org/cl/257982. So while I don't know what is going wrong, it's possible that this will be fixed in 1.16. You might try applying that patch to your 1.15 sources to see if it helps.

@Radisovik
Copy link
Author

@Radisovik Radisovik commented Oct 22, 2020

I do see a potential problem here. The Go runtime will check for the VDSO clock_gettime call, and use that if available. Through 1.15, if there is no VDSO, the Go runtime will call the fixed gettimeofday call at 0xffffffffff600000. But gettimeofday uses wall clock time whereas clock_gettime uses monotonic time. So this suggests that when you set the capability, and when you link dynamically, it somehow disables the VDSO. I do not know why that would happen.

Interestingly, this has been fixed on tip for the future 1.16 release, by https://golang.org/cl/257982. So while I don't know what is going wrong, it's possible that this will be fixed in 1.16. You might try applying that patch to your 1.15 sources to see if it helps.

I do remember something about setcap possibly mucking with LD_LIBRARY_PATH in an attempt to avoid escalation -- but my skillset didn't let me dig deeper into that angle.
Building statically did clear things up. So that is another possible workaround.
I don't mind building go from source and giving that ago. What branch should I build from?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 22, 2020

Either release-branch.go1.15 with https://golang.org/cl/257982 applied on top, or just plain master. Thanks.

@Radisovik
Copy link
Author

@Radisovik Radisovik commented Oct 23, 2020

Either release-branch.go1.15 with https://golang.org/cl/257982 applied on top, or just plain master. Thanks.

Master does indeed rectify the problem.

Is there anything else I can do to help out? It is very puzzling how os/user works into things..

It looks like our workarounds are:

  1. static link
  2. wait for 1.16
  3. stop using os/user and any code that also uses os/user (shell out for whoami..etc..)
  4. don't use setcap.. instead forward traffic with some port forwarding magic
@prattmic
Copy link
Member

@prattmic prattmic commented Oct 23, 2020

FWIW, I dug into the kernel a bit and verified that it does unconditionally include the VDSO auxv entry, so it shouldn't be missing at the time of exec.

It seems most likely then that the dynamic linker would be removing / changing the entry, though I've never heard of such behavior (some sort of "protection" for "setuid" binaries?). Do you know which dynamic linker your system is using? I looked at glibc but didn't see anything along these lines at a quick glance.

This would align with importing "os/user", as that will cause the program to become dynamically linked by default.

@prattmic
Copy link
Member

@prattmic prattmic commented Oct 23, 2020

And to your question, it seems like static linking (by disabling cgo, I assume that is what you mean?) seems like the cleanest workaround to me, provided the pure Go versions of "os/user" and "net" are sufficient for your application.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 23, 2020

Since this will be fixed in the upcoming Go 1.16 release, I'm going to close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants