NTP sync causing periodic crashes #143

Closed
monkeini opened this issue May 29, 2018 · 10 comments

@monkeini

NTP sync was introduced in 0.1.0 via #83.

Since upgrading to 0.1.0+ (including the latest version), we periodically see the following:

F0529 18:25:05.766404 1 server.go:44] read udp [--IP REDACTED--]->128.138.141.172:123: i/o timeout
goroutine 27 [running]:
github.com/appscode/guard/vendor/github.com/golang/glog.stacks(0xc420339900, 0xc42019a320, 0x69, 0xa0)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:766 +0xcf
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).output(0x22972e0, 0xc400000003, 0xc4200ad4a0, 0x21e0a82, 0x9, 0x2c, 0x0)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:717 +0x322
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).printDepth(0x22972e0, 0xc400000003, 0x1, 0xc420b04f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:646 +0x12a
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).print(0x22972e0, 0xc400000003, 0xc420b04f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:637 +0x5a
github.com/appscode/guard/vendor/github.com/golang/glog.Fatal(0xc420b04f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:1125 +0x53
github.com/appscode/guard/server.Server.ListenAndServe.func1(0xc420526480, 0xc420366900)
	/go/src/github.com/appscode/guard/server/server.go:44 +0xf1
created by github.com/appscode/guard/server.Server.ListenAndServe
	/go/src/github.com/appscode/guard/server/server.go:41 +0xe1b

Looking up 128.138.141.172 with host shows that this is utcnist2.colorado.edu.

I see the default for clock-check-interval is five minutes. Our errors are not that frequent, nor are they regular, so I don't think this can be put down to an overly restrictive security group on our part. It happens approximately 10-20 times per week.

Ideally - i/o timeouts should be dealt with, rather than any error here causing a fatal exception.

Alternatively - a simple way to turn this off should be provided. Best I can see right now is to make clock-check-interval big.
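
To illustrate the first suggestion, here is a minimal sketch of a clock check that treats an i/o timeout as "skip this round" instead of a fatal error. It is not guard's actual code: queryFunc, checkSkew, timeoutError, the demo queries and the 2 minute limit are all hypothetical names and values chosen for illustration, and the NTP lookup itself is replaced by stub functions.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// queryFunc is a hypothetical stand-in for the NTP lookup. It returns the
// measured offset between the NTP server's clock and the local clock.
type queryFunc func() (time.Duration, error)

// timeoutError mimics the "i/o timeout" error from the log for this demo.
// It satisfies net.Error so it can be detected the same way a real one would.
type timeoutError struct{}

func (timeoutError) Error() string   { return "i/o timeout" }
func (timeoutError) Timeout() bool   { return true }
func (timeoutError) Temporary() bool { return true }

// maxSkew is an assumed limit used only in this sketch.
const maxSkew = 2 * time.Minute

// Stub queries standing in for real NTP responses.
var (
	timeoutQuery queryFunc = func() (time.Duration, error) { return 0, timeoutError{} }
	okQuery      queryFunc = func() (time.Duration, error) { return 150 * time.Millisecond, nil }
	skewedQuery  queryFunc = func() (time.Duration, error) { return -5 * time.Minute, nil }
)

// checkSkew reports skipped=true when the lookup timed out, so the caller can
// log it and retry at the next interval. It returns an error only for
// non-timeout failures or when a measured skew exceeds the limit.
func checkSkew(query queryFunc, limit time.Duration) (skipped bool, err error) {
	offset, qerr := query()
	if qerr != nil {
		var nerr net.Error
		if errors.As(qerr, &nerr) && nerr.Timeout() {
			return true, nil // transient network problem, not a clock problem
		}
		return false, qerr
	}
	if offset < 0 {
		offset = -offset
	}
	if offset > limit {
		return false, fmt.Errorf("clock skew %v exceeds limit %v", offset, limit)
	}
	return false, nil
}

func main() {
	for _, q := range []queryFunc{timeoutQuery, okQuery, skewedQuery} {
		skipped, err := checkSkew(q, maxSkew)
		fmt.Println("skipped:", skipped, "err:", err)
	}
}
```

With this shape, the caller can decide to exit only when a skew was actually measured, and merely log a warning when the NTP server could not be reached.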

@thomaspeitz
Contributor

thomaspeitz commented May 30, 2018

The same thing is happening here:

F0530 09:30:28.518536       1 server.go:44] read udp [--IP REDACTED--]:48306->54.229.222.210:123: i/o timeout
goroutine 48 [running]:
github.com/appscode/guard/vendor/github.com/golang/glog.stacks(0xc4203a7f00, 0xc4201a0320, 0x6a, 0xa0)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:766 +0xcf
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).output(0x22972e0, 0xc400000003, 0xc4200b3550, 0x21e0a82, 0x9, 0x2c, 0x0)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:717 +0x322
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).printDepth(0x22972e0, 0xc400000003, 0x1, 0xc4204a7f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:646 +0x12a
github.com/appscode/guard/vendor/github.com/golang/glog.(*loggingT).print(0x22972e0, 0xc400000003, 0xc4204a7f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:637 +0x5a
github.com/appscode/guard/vendor/github.com/golang/glog.Fatal(0xc4204a7f98, 0x1, 0x1)
	/go/src/github.com/appscode/guard/vendor/github.com/golang/glog/glog.go:1125 +0x53
github.com/appscode/guard/server.Server.ListenAndServe.func1(0xc420ac3ac0, 0xc4206d2320)
	/go/src/github.com/appscode/guard/server/server.go:44 +0xf1
created by github.com/appscode/guard/server.Server.ListenAndServe
	/go/src/github.com/appscode/guard/server/server.go:41 +0xe1b

In my case, the host is mail.thefrown.net.

On production, restarts happen quite often: 33 times yesterday. I will try increasing clock-check-interval and see what happens.

EDIT: Did not help...

@monkeini
Author

@tsupertramp that host does not look NTP-related? I also note that anonymous statistics are sent unless you turn them off with --analytics=false. Might be worth trying in combination?

@tamalsaha
Contributor

#144 will increase the max allowed skew to 2 min and run the check every 10 min. If you want to disable it, set --max-clock-skew=0.

@thomaspeitz
Contributor

guard-56c647654-vg25n 1/1 Running 32 5h

32 restarts in 5h on two different clusters seems pretty high:

      - args:
        - run
        - --v=3
        - --tls-ca-file=/etc/guard/pki/ca.crt
        - --tls-cert-file=/etc/guard/pki/tls.crt
        - --tls-private-key-file=/etc/guard/pki/tls.key
        - --auth-providers=github
        - --max-clock-skew=0m
        - --clock-check-interval=10m
        - --analytics=false
        image: appscode/guard:0.1.2

@tamalsaha
Contributor

tamalsaha commented Jun 4, 2018

@tsupertramp, have you tried setting --max-clock-skew=0 to disable this check? When a restart happens, have you logged into the host machine and checked whether the clock was actually skewed?

@thomaspeitz
Contributor

The time looks good.

Ah, sorry, I accidentally set it to 0m instead of 0. Let's see what this brings.

Appreciate your fast response time!

@thomaspeitz
Contributor

thomaspeitz commented Jun 4, 2018

Happened again:

F0604 13:52:07.845624 1 server.go:44] Time skew between NTP(2018-06-04 13:52:07.845755747 +0000 UTC m=+600.050490959) & machine(2018-06-04 13:52:07.845584735 +0000 UTC m=+600.050319810) exceedes limit

The logs show --clock-check-interval="10m0s" and --max-clock-skew="0s", which was configured with --max-clock-skew=0.
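
The two timestamps in that log line differ by roughly 171 microseconds, which suggests why a limit of 0s does not disable the check. The sketch below works through the arithmetic. It assumes the check is a strict "skew > limit" comparison, which is a guess based on the log message and not confirmed from guard's source; skewBetween and exceedsLimit are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamps copied from the log line above.
var (
	ntpTime     = time.Date(2018, time.June, 4, 13, 52, 7, 845755747, time.UTC)
	machineTime = time.Date(2018, time.June, 4, 13, 52, 7, 845584735, time.UTC)
)

// skewBetween returns the absolute difference between two clock readings.
func skewBetween(a, b time.Time) time.Duration {
	d := a.Sub(b)
	if d < 0 {
		d = -d
	}
	return d
}

// exceedsLimit assumes a strict "skew > limit" comparison. This is an
// assumption made for illustration, not guard's confirmed behavior.
func exceedsLimit(a, b time.Time, limit time.Duration) bool {
	return skewBetween(a, b) > limit
}

func main() {
	fmt.Println("skew:", skewBetween(ntpTime, machineTime))
	fmt.Println("exceeds 0s limit:", exceedsLimit(ntpTime, machineTime, 0))
	fmt.Println("exceeds 2m limit:", exceedsLimit(ntpTime, machineTime, 2*time.Minute))
}
```

Under that assumption, any two clocks that are not identical to the nanosecond exceed a 0s limit, so the process exits at the first successful check, while the same skew is far below a 2 minute limit.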

@tamalsaha
Contributor

Oh! Sorry, I misspoke. You need to set --clock-check-interval=0 to disable.

See here:

@thomaspeitz
Contributor

Running for 4h without problems. Seems to "fix" it. Thanks a lot.

@tamalsaha
Contributor

Fixed by https://github.com/appscode/guard/releases/tag/0.1.3
