Nil pointer exception during update when jobsubmitter pod is lost #130

Closed
pjthepooh opened this issue Oct 13, 2021 · 6 comments

Comments

@pjthepooh
Contributor

When updating a chart, the job submitter pod is terminated, and the Flink operator then hits a nil pointer dereference and crashes. This doesn't happen in v0.1.15 (though that version has another issue on update) and seems to have been introduced in v0.1.16. From the trace, it looks related to https://github.com/spotify/flink-on-k8s-operator/pull/120/files#diff-1d53f2b19e0186f6437a431a19916402d6c59a531c7a959981e7e7305e342677R304 @regadas

{"level":"info","ts":1634156619.167147,"logger":"controllers.FlinkCluster","msg":"Failed to extract job submit result","cluster":"myCluster","error":"failed to get logs for pod : resource name may not be empty"}
{"level":"info","ts":1634156619.1740642,"logger":"controllers.FlinkCluster","msg":"Observed Flink job status list","cluster":"myCluster","jobs":[{"jid":"8f14e333b9720528b574012c77a8175d","state":"CANCELED","name":"myApp","start-time":1634156476498,"end-time":1634156541413,"duration":64915}]}

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x142da9b]

goroutine 403 [running]:
github.com/spotify/flink-on-k8s-operator/controllers.(*ClusterStateObserver).observeFlinkJobStatus(0xc000d358e8, 0xc000d35c98, 0xc000d35610)
	/workspace/controllers/flinkcluster_observer.go:304 +0x69b
github.com/spotify/flink-on-k8s-operator/controllers.(*ClusterStateObserver).observeJob(0xc000d358e8, 0xc000d35c98)
	/workspace/controllers/flinkcluster_observer.go:265 +0x3eb
github.com/spotify/flink-on-k8s-operator/controllers.(*ClusterStateObserver).observe(0xc000d358e8, 0xc000d35c98)
	/workspace/controllers/flinkcluster_observer.go:208 +0xdc5
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterHandler).reconcile(0xc000d35c28, {0x17b2a1d, 0x25dbd20}, {{{0xc0002780d8, 0x203000}, {0xc000278120, 0xc0006aa540}}})
	/workspace/controllers/flinkcluster_controller.go:142 +0x285
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterReconciler).Reconcile(0xc0008f6a80, {0x19c2478, 0xc00031b0e0}, {{{0xc0002780d8, 0x11}, {0xc000278120, 0x11}}})
	/workspace/controllers/flinkcluster_controller.go:84 +0x309
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00090caa0, {0x19c23d0, 0xc000574000}, {0x162e8c0, 0xc000aa6400})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00090caa0, {0x19c23d0, 0xc000574000})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:214 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.6/pkg/internal/controller/controller.go:210 +0x354
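
For readers unfamiliar with this class of crash, here is a minimal Go sketch of the failure mode the log and trace suggest: the submitter pod is gone, so a pointer that is assumed to be set is nil when it gets dereferenced. This is not the operator's actual code or its fix. `Pod`, `JobSubmitter`, `submitterPodName`, `errSubmitterPodLost`, and the pod name `myCluster-job-submitter` are all hypothetical names made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for the job submitter pod.
// It is NOT a type from flink-on-k8s-operator.
type Pod struct {
	Name string
}

// JobSubmitter is a hypothetical observed state. Pod is nil when the
// submitter pod has been terminated, for example during an update.
type JobSubmitter struct {
	Pod *Pod
}

// errSubmitterPodLost is a hypothetical sentinel error for the lost-pod case.
var errSubmitterPodLost = errors.New("job submitter pod not found")

// submitterPodName returns the pod name, or an error when the pod is
// missing, instead of dereferencing a nil pointer and panicking.
func submitterPodName(s *JobSubmitter) (string, error) {
	if s == nil || s.Pod == nil {
		return "", errSubmitterPodLost
	}
	return s.Pod.Name, nil
}

func main() {
	// Pod lost: an unguarded lost.Pod.Name would panic with SIGSEGV here.
	lost := &JobSubmitter{}
	if _, err := submitterPodName(lost); err != nil {
		fmt.Println("skipping submit-result extraction:", err)
	}

	// Pod present: the name can be read safely.
	running := &JobSubmitter{Pod: &Pod{Name: "myCluster-job-submitter"}}
	if name, err := submitterPodName(running); err == nil {
		fmt.Println("reading logs from pod:", name)
	}
}
```

Returning an error lets a reconcile loop log the condition and retry later rather than crash the whole process, which is the general idea behind guarding optional observed state.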
@regadas
Contributor

regadas commented Oct 14, 2021

Thanks for the report! There is indeed a bug that I missed. I could fix it, but #107 should address both this and #85.

@pjthepooh
Contributor Author

pjthepooh commented Oct 15, 2021

@regadas #107 was merged along with the v0.2.0 release. Has this been fixed there as well?

Is there a dev channel (e.g. Slack) for this project that I can join? We are actively using this operator.

@elanv
Contributor

elanv commented Oct 16, 2021

> @regadas #107 has merged along with v0.2.0 release. Has this been fixed there as well?
>
> Is there a dev channel (e.g. slack) for this project I can join? We are actively using this operator

@pjthepooh I created the PR, and I think it resolves many issues, including this one.

@pjthepooh
Contributor Author

This issue seems to have disappeared with release v0.2.0.

@regadas
Contributor

regadas commented Oct 21, 2021

Hi @pjthepooh, sorry, I missed this one!

Yes, this issue is now addressed in v0.2.0. Please use v0.2.2, as it fixes some regressions.

> Is there a dev channel (e.g. slack) for this project I can join?

Yes! I just created one: #flink-operator on the Spotify FOSS Slack. You can get an invite here. I'll update the docs with this.

> We are actively using this operator

Can you update this? https://github.com/spotify/flink-on-k8s-operator/tree/master/docs/who_is_using.md

Thanks!

regadas closed this as completed Oct 21, 2021
@synhershko

Looks like this is still happening? #160
