panic happened when add podgroup watch #1679

Closed
Crazybean-lwb opened this issue Nov 1, 2022 · 0 comments · Fixed by #1680


Crazybean-lwb commented Nov 1, 2022

kubeadm version: v1.20.11
Kubernetes version: 1.20.11
I built a training-operator image that includes this merged PR.
PyTorchJobs can now use the gang scheduler. However, the training controller panics with a nil pointer dereference while it builds or watches PyTorchJobs.
I traced the panic to the statement that logs the event object at https://github.com/kubeflow/training-operator/blob/master/pkg/common/util/reconciler.go#L113
Some case that the code does not expect seems to occur there.

The errors are as follows:

time="2022-10-31T11:55:15Z" level=info msg="PyTorchJob=ddp-vol, ReplicaType=Worker expected=1, running=1, succeeded=0, failed=0, Replicas=1"
E1031 11:56:00.164678       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 796 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x185a700?, 0x2a0c1a0})
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000064150?})
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/runtime/runtime.go:49 +0x75
panic({0x185a700, 0x2a0c1a0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).Logf(0xc0003dca00?, 0x1213940?, {0x1ade9f4?, 0xc000f92050?}, {0xc000d29b48?, 0xc00003c000?, 0x47?})
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:337 +0x22
github.com/sirupsen/logrus.(*Entry).Debugf(...)
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:347
github.com/kubeflow/training-operator/pkg/common/util.OnDependentUpdateFunc.func1({{0x1d5b0d0?, 0xc00045e000?}, {0x1d5b0d0?, 0xc000624d80?}})
	/workspace/pkg/common/util/reconciler.go:113 +0x4fd
sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Update(...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/predicate/predicate.go:88
sigs.k8s.io/controller-runtime/pkg/source/internal.EventHandler.OnUpdate({{0x1d445a0, 0xc0002c5c20}, {0x1d4d7c8, 0xc00099c1a0}, {0xc0008a0d70, 0x1, 0x1}}, {0x1a6fb40?, 0xc00045e000}, {0x1a6fb40, ...})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/source/internal/eventsource.go:88 +0x43d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/shared_informer.go:816 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00046ea20?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007bf738?, {0x1d2b040, 0xc000722c00}, 0x1, 0xc00091a540)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00046ed50?, 0x3b9aca00, 0x0, 0x90?, 0xc0005a0420?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*processorListener).run(0xc00036c280?)
	/go/pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:73 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:71 +0x85
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14eb0c2]

goroutine 796 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000064150?})
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x185a700, 0x2a0c1a0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).Logf(0xc0003dca00?, 0x1213940?, {0x1ade9f4?, 0xc000f92050?}, {0xc000d29b48?, 0xc00003c000?, 0x47?})
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:337 +0x22
github.com/sirupsen/logrus.(*Entry).Debugf(...)
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:347
github.com/kubeflow/training-operator/pkg/common/util.OnDependentUpdateFunc.func1({{0x1d5b0d0?, 0xc00045e000?}, {0x1d5b0d0?, 0xc000624d80?}})
	/workspace/pkg/common/util/reconciler.go:113 +0x4fd
sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Update(...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/predicate/predicate.go:88
sigs.k8s.io/controller-runtime/pkg/source/internal.EventHandler.OnUpdate({{0x1d445a0, 0xc0002c5c20}, {0x1d4d7c8, 0xc00099c1a0}, {0xc0008a0d70, 0x1, 0x1}}, {0x1a6fb40?, 0xc00045e000}, {0x1a6fb40, ...})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/source/internal/eventsource.go:88 +0x43d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/shared_informer.go:816 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00046ea20?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007bf738?, {0x1d2b040, 0xc000722c00}, 0x1, 0xc00091a540)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00046ed50?, 0x3b9aca00, 0x0, 0x90?, 0xc0005a0420?)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*processorListener).run(0xc00036c280?)
	/go/pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:73 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:71 +0x85
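
The trace ends in `(*Entry).Logf`, called through `Debugf` from the update predicate. That is consistent with `Debugf` being invoked on a nil `*logrus.Entry`, although the trace alone does not prove it. Below is a minimal sketch of that failure mode and one way to guard against it, assuming the logger is chosen per object kind and that some kind (for example a PodGroup) has no matching branch. All names here (`Entry`, `Pod`, `Service`, `PodGroup`, `loggerFor`, `describeUpdate`) are hypothetical stand-ins built on the standard library only. They are not the training-operator or logrus APIs, and this is not necessarily how #1680 fixes it.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a log entry type such as *logrus.Entry.
type Entry struct {
	prefix string
}

// Debugf reads a field of the receiver, so calling it on a nil *Entry
// panics with a nil pointer dereference, like the panic in the trace.
func (e *Entry) Debugf(format string, args ...any) string {
	return e.prefix + fmt.Sprintf(format, args...)
}

// Hypothetical object kinds delivered by update events.
type Pod struct{ Name string }
type Service struct{ Name string }
type PodGroup struct{ Name string }

// loggerFor picks an entry by object kind. The default branch is the guard:
// without it, a newly watched kind would leave the caller with a nil entry.
func loggerFor(obj any) *Entry {
	switch o := obj.(type) {
	case Pod:
		return &Entry{prefix: "pod=" + o.Name + " "}
	case Service:
		return &Entry{prefix: "service=" + o.Name + " "}
	default:
		return &Entry{prefix: fmt.Sprintf("kind=%T ", o)}
	}
}

// describeUpdate mimics the logging call made inside an update predicate.
func describeUpdate(obj any) string {
	return loggerFor(obj).Debugf("update event received")
}

func main() {
	fmt.Println(describeUpdate(Pod{Name: "worker-0"}))
	// A PodGroup has no dedicated branch, but still gets a usable entry.
	fmt.Println(describeUpdate(PodGroup{Name: "ddp-vol"}))
}
```

With a fallback like this, adding a watch on a new resource kind cannot leave the predicate holding a nil logger. An explicit nil check before the log call would be an alternative guard.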

Originally posted by @liuweibin6566396837 in #1666 (comment)
