
1.3 crash picks#10526

Closed
ncdc wants to merge 9 commits into openshift:master from ncdc:1.3-crash-picks

Conversation

Andy Goldstein added 9 commits August 18, 2016 22:13
Protect access to the original writer. Panic if anything has written into
the original writer, or if the writer has been hijacked, when the timeout fires.
The Pop method should allow a caller to requeue an item while still holding the
FIFO lock, to avoid races on deletes.
…ed as read-only.

Send a node event when this happens and hint to the administrator
about the remediation.
Previously it tried to use a nil pod variable if an error was returned
from the pod update call.

(note that the rest of 26680 is unit test related, and not picked here)
@ncdc
Contributor Author

ncdc commented Aug 19, 2016

[test]

@eparis eparis mentioned this pull request Aug 19, 2016
@ingvagabund
Member

ingvagabund commented Aug 19, 2016

With respect to the origin master HEAD (e5178ec):

Only this one is not cherry-picked yet; the rest from Eric's list is:

@openshift-bot
Contributor

Evaluated for origin test up to b6e8f68

@ncdc
Contributor Author

ncdc commented Aug 19, 2016

25308 isn't a clean cherry-pick because the history command changed after 1.3, but we could probably massage it into our code.

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8220/)

@derekwaynecarr
Member

All the picks look good to me. The pick for 29672 is fine, but I could not see what underlying issue it fixed; it seemed like a code-hygiene change, since there was no reference to a material panic it prevented.

@ingvagabund
Member

$ GOPATH=/root go test -race -cover -covermode atomic -timeout 120s github.com/openshift/origin/pkg/cmd/server/origin
I0819 14:39:24.137802 7405 :1] &{dev [wheel group-impersonater] map[]} is acting as &{system:admin [some-group] map[]}
I0819 14:39:24.138378 7405 :1] &{dev [wheel system-group-impersonater] map[]} is acting as &{system:admin [some-system:group] map[]}
I0819 14:39:24.138841 7405 :1] &{dev [wheel] map[]} is acting as &{system:admin [system:authenticated] map[]}
I0819 14:39:24.139295 7405 :1] &{dev [regular-impersonater] map[]} is acting as &{tester [system:authenticated system:authenticated:oauth] map[]}
I0819 14:39:24.141462 7405 :1] &{dev [sa-impersonater] map[]} is acting as &{system:serviceaccount:foo:default [system:serviceaccounts system:serviceaccounts:foo system:authenticated] map[]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x91fc39]

goroutine 111 [running]:
panic(0x392bca0, 0xc82000e0f0)
/usr/lib/golang/src/runtime/panic.go:481 +0x3ff
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned.(*limitRanges).List(0xc82039d220, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned/limit_ranges.go:56 +0x109
github.com/openshift/origin/pkg/image/admission.NewLimitVerifier.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3f7e318, ...)
/root/src/github.com/openshift/origin/pkg/image/admission/imagestream_limits.go:23 +0xbb
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*ListWatch).List(0xc82019ebe0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:80 +0x86
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch(0xc82069c680, 0xc820058d20, 0x0, 0x0)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:258 +0x3d0
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run.func1()
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:202 +0x54
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc82019ec20)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:84 +0x2b
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc82019ec20, 0x3b9aca00, 0x0, 0x445101, 0xc820058d20)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:85 +0xca
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.Until(0xc82019ec20, 0x3b9aca00, 0xc820058d20)
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x51
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run
/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:205 +0x327
FAIL github.com/openshift/origin/pkg/cmd/server/origin 0.287s

@ingvagabund
Member

So "UPSTREAM: 29743: Fix race condition found in JitterUntil" is the culprit

@ingvagabund
Member

Putting runtime.HandleCrash() back gets rid of the panic. It was removed due to kubernetes/kubernetes#29743 (diff): "Remove this function. We are switching to a world where it's safe for apiserver to panic, since it will be restarted by kubelet."

That works for the containerized world. So either put the call back or skip the cherry-pick, as @derekwaynecarr suggests.

@timothysc

timothysc commented Aug 19, 2016

Isn't this actually exposing a real issue though? /cc @liggitt

@timothysc

@ingvagabund Given where we are in the release cycle, I think putting it back makes sense, but IMHO we need to re-evaluate in the next cycle which issues we are actually glossing over.

@derekwaynecarr
Member

@timothysc - I agree it could cover a real issue. I have decided to carry a patch that keeps HandleCrash, and we can open an issue to evaluate that carry after the release.

see #10541

@derekwaynecarr
Member

closing in favor of #10541

