Fail to resubmit job when job parameters change #85

Closed
clouddra opened this issue Aug 30, 2021 · 1 comment · Fixed by #107

clouddra commented Aug 30, 2021

Hi, I am getting the following errors in the operator pod when trying to upgrade a job with different parameters:

Observed job submitter	{"cluster": <cluster name>, "state": "nil"}
Observed job submitter pod list	{"cluster":  <cluster name>, "state": {"metadata":{},"items":[]}}
Failed to extract job submit result	{"cluster": "<cluster name>", "error": "job pod found, but no termination log found even though submission completed"}
github.com/spotify/flink-on-k8s-operator/controllers.(*ClusterStateObserver).observe
	/workspace/controllers/flinkcluster_observer.go:206
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterHandler).reconcile
	/workspace/controllers/flinkcluster_controller.go:137
github.com/spotify/flink-on-k8s-operator/controllers.(*FlinkClusterReconciler).Reconcile
	/workspace/controllers/flinkcluster_controller.go:81
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:214

I have observed this whenever I try upgrading a job cluster with changes to flinkProperties or the job volume mounts.

The previous submitter pods are terminated and no new submitter pods are spun up. The previously running job is also cancelled, with no new job replacing it.

Would appreciate some pointers on resolving this issue. Thanks!

@clouddra clouddra changed the title Fail to submit job when job parameters change Fail to resubmit job when job parameters change Aug 30, 2021

clouddra commented Aug 30, 2021

This seems to be a regression that only happens in the latest release. It works fine in v0.1.12.
