Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STRMCMP-675] Enable status subresource in CRD #126

Merged
merged 8 commits into from
Nov 13, 2019

Conversation

mwylde
Copy link
Contributor

@mwylde mwylde commented Nov 13, 2019

This PR enables the status subresource for the FlinkApplication CRD. This makes it harder for users to accidentally overwrite status fields while trying to update their applications (an issue we've seen internally). For CRDs with the status subresource enabled, updates to the base resource will ignore changes to the status subrecord, and updates the status subresource will ignore changes to the spec and metadata.

The bulk of this PR is a set of changes that enables that one. Previously, the operator would modify the SavepointInfo field of the spec during updates. This goes against the general principle that the user should own the spec, and prevented a move to the status subresource as the operator would need to update both the spec and status.

Instead, we introduce a new field spec.savepointPath that contains the savepoint that will be restored on the initial job submission (the existing spec.savepointInfo.savepointLocation is retained for backwards-compatibility, but will be removed in a future CRD version). Then we introduce two new fields on the status: status.savepointPath (which contains the path of the savepoint that the operator has created during the update process) and status.savepointTrigger (which contains the trigger for the savepoint the operator is creating). This allows us to cleanly separate out user intent (which savepoint should we initially start from) from the operator's internal state and processes.

Note: this change is slightly incompatible with the existing operator. This update should not be deployed to a cluster where there are active flinkapplication updates occurring — i.e., all flinkapplications should be in a Running or DeployFailed state.

@kumare3
Copy link

kumare3 commented Nov 13, 2019

You might want to make sure you support both, as older versions do no support the status subresource

@mwylde
Copy link
Contributor Author

mwylde commented Nov 13, 2019

Thanks for calling that out. We currently document support to 1.9 but we don't really have a way to validate below 1.10, and without testing I suspect we've introduced incompatibilities already. There also doesn't seem to be a good way to support both (especially while keeping semantics equivalent) and other operators, like the Spark-On-K8s-Operator have made the same decision to require it.

The status subresource was introduced in 1.10 (behind a feature flag), so I've updated the README with that as our minimum supported version. I will also make sure to document this change in the release notes when we release 0.4.

@anandswaminathan
Copy link
Contributor

Thanks a lot. This is super awesome.

Just one question: Is this backward compatible, as in with existing applications running, and we deploy this, will there be any issues? I believe applications in Running state will be fine but ones in Savepointing may be affected - as TriggerID is removed from spec. Is that the only scenario ?

@mwylde
Copy link
Contributor Author

mwylde commented Nov 13, 2019

Yeah, the upgrade should not cause issues so long as no deploys are in progress.

@mwylde mwylde merged commit f499e7f into master Nov 13, 2019
@mwylde mwylde deleted the micah_status_subresource branch November 13, 2019 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants