Fail to resubmit job when job parameters change #85

clouddra · 2021-08-30T04:41:04Z

Hi, I am getting the following errors in the operator pod when I try to upgrade a job with different parameters:

Observed job submitter	{"cluster": <cluster name>, "state": "nil"}
Observed job submitter pod list	{"cluster":  <cluster name>, "state": {"metadata":{},"items":[]}}
Failed to extract job submit result	{"cluster": "<cluster name>", "error": "job pod found, but no termination log found even though submission completed"}
github.com/spotify/flink-on-k8s-operator/controllers.(*ClusterStateObserver).observe
	/workspace/controllers/flinkcluster_observer.go:206
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterHandler).reconcile
	/workspace/controllers/flinkcluster_controller.go:137
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterReconciler).Reconcile
	/workspace/controllers/flinkcluster_controller.go:81
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:214

I have observed this whenever I try to upgrade a job cluster with changes to flinkProperties or to the job's volume mounts.

The previous submitter pods are terminated, but no new submitter pods are spun up. The previously running job is also cancelled, and no new job replaces it.

Would appreciate some pointers on resolving this issue. Thanks!

clouddra · 2021-08-30T16:27:18Z

It seems this is a regression that only happens in the latest release; it works fine in v0.1.12.

clouddra changed the title ~~Fail to submit job when job parameters change~~ Fail to resubmit job when job parameters change Aug 30, 2021

elanv mentioned this issue Sep 25, 2021

Elaborate savepoint and update features #107

Merged

pjthepooh mentioned this issue Oct 13, 2021

Nil pointer exception during update when jobsubmitter pod is lost #130

Closed

regadas closed this as completed in #107 Oct 15, 2021
