feat: Add annotations to perform connector/task restart operations #4114
Conversation
Can one of the admins verify this patch?
Thanks for the PR. This looks good to me. I left one question. You should also add it to the CHANGELOG.md under the 0.21.0 release. It would also be great to have at least some very simple docs explaining how this should be used.
log.warn("{}: Failed to restart connector {}. {}", reconciliation, connectorName, throwable.getMessage());
    return Future.succeededFuture();
})
.compose(ignored -> apiClient.statusWithBackOff(new BackOff(200L, 2, 10), host, port,
Not sure I understand why we call this at the end. Could you please explain?
I'm guessing because ultimately `maybeCreateOrUpdateConnector` needs to return the status. But it would probably be better to make `pauseResume`, `maybeRestartConnector` and `maybeRestartConnectorTask` all return `Future<Void>`, and make the call to `apiClient.statusWithBackOff` directly from `maybeCreateOrUpdateConnector`.
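The refactoring suggested here can be sketched using the JDK's `CompletableFuture` as a stand-in for Vert.x `Future` (method names are borrowed from the discussion, but the bodies are illustrative, not the operator's actual code): the restart step completes with no payload, and the status is fetched exactly once at the end of the chain.

```java
import java.util.concurrent.CompletableFuture;

public class RestartFlow {
    // Stand-in for a Future<Void>-returning restart step: no payload on success.
    static CompletableFuture<Void> maybeRestartConnector(String name, StringBuilder log) {
        log.append("restart;");
        return CompletableFuture.completedFuture(null);
    }

    // Stand-in for apiClient.statusWithBackOff: yields the connector state.
    static CompletableFuture<String> statusWithBackOff(String name, StringBuilder log) {
        log.append("status;");
        return CompletableFuture.completedFuture("RUNNING");
    }

    // maybeCreateOrUpdateConnector composes the Void-returning step and then
    // fetches the status exactly once, at the end of the chain.
    static CompletableFuture<String> maybeCreateOrUpdateConnector(String name, StringBuilder log) {
        return maybeRestartConnector(name, log)
                .thenCompose(ignored -> statusWithBackOff(name, log));
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        String status = maybeCreateOrUpdateConnector("my-connector", log).join();
        System.out.println(log + " -> " + status); // restart;status; -> RUNNING
    }
}
```

The point of the shape: only the outermost method knows the caller needs a status, so the intermediate steps stay status-free.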
Yeah, that is why I wondered about this. If it is about the status for the KafkaConnector status field, it should probably be called only at the end.
Thanks for the review - Tom is correct here & I have updated to move the `apiClient.statusWithBackOff` call to the end of the compose chains.
 * @return the ID of the task to be restarted if the provided KafkaConnector resource instance has the strimzi.io/restart-task annotation, or -1 otherwise.
 */
protected int getRestartTaskAnnotationTaskID(CustomResource resource, String connectorName) {
    return Annotations.intAnnotation(resource, ANNO_STRIMZI_IO_RESTART_TASK, -1);
I wonder if we should support a list of ids. It would make it simpler to restart several failed tasks.
Thanks for the review. I have started looking into this - it should be possible to restart tasks on a `KafkaConnector` resource using an annotation like `strimzi.io/restart-tasks: "0 1 2"`, although it does raise some questions:

- What should happen if one of the restart calls fails (e.g. maybe there is no task 2)? Should the annotation be altered to only include the task IDs that were not restarted, so that they can be attempted on the next reconcile?
- What should the annotation look like for `KafkaMirrorMaker2` resources? The code currently supports annotations that address a single MM2 connector task, like `strimzi.io/restart-connector-tasks: "<MM2-connector-name>:<task-id>"`. Do you think it should allow multiple MM2 connectors to be specified? Or just multiple tasks for a single connector? For example `strimzi.io/restart-connector-tasks: "<MM2-connector-name>:0 1 2"`.
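A sketch of what parsing such a multi-task annotation could look like. Note the `strimzi.io/restart-tasks` annotation and this helper are hypothetical - the PR as written only supports a single task ID:

```java
import java.util.ArrayList;
import java.util.List;

public class RestartTasksAnnotation {
    // Parse a hypothetical strimzi.io/restart-tasks value such as "0 1 2"
    // into a list of task IDs, silently skipping malformed tokens.
    static List<Integer> parseTaskIds(String annotationValue) {
        List<Integer> ids = new ArrayList<>();
        if (annotationValue == null || annotationValue.isBlank()) {
            return ids;
        }
        for (String token : annotationValue.trim().split("\\s+")) {
            try {
                ids.add(Integer.parseInt(token));
            } catch (NumberFormatException e) {
                // Not a valid task ID - ignore it
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseTaskIds("0 1 2")); // [0, 1, 2]
    }
}
```

Skipping malformed tokens rather than failing the whole annotation is just one possible policy; as the questions above note, the failure-handling behaviour would need to be decided.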
These are good points. I guess it might be a good idea to leave this idea for now and see how this stuff is used in practice. It seems like it would be easy enough to add at a later date.
Force-pushed from eefc53e to 40ab8ed.
Latest commit adds the CHANGELOG update & some initial documentation. It also updates the code to move the `apiClient.statusWithBackOff` call to the end of the compose chains.
.compose(ignored -> removeRestartTaskAnnotation(reconciliation, resource),
    throwable -> {
        // Ignore restart failures - just try again on the next reconcile
        log.warn("{}: Failed to restart connector task {}:{}. {}", reconciliation, connectorName, taskID, throwable.getMessage());
I have my doubts about this, particularly in the case that the connector task doesn't exist (i.e. a bad ID, presumably an HTTP 404). If it's only in the logs then the user might not realise why the annotation wasn't removed. For transient errors it will likely succeed, so leaving it in the annotation for another try is not a bad idea. But for permanent errors like non-existence it's more of a problem, especially if the user doesn't have access to the logs. I think we should propagate the error as a warning in the status.

The error case doesn't seem to be covered by the tests.
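One way to surface the failure, sketched with a minimal stand-in for a status condition. The real operator uses the Strimzi `Condition` model; the type and names here are illustrative only:

```java
import java.util.Optional;

public class RestartWarning {
    // Minimal illustrative stand-in for a CR status condition.
    record Condition(String type, String reason, String message) { }

    // On a failed restart call, produce a Warning condition for the CR status
    // instead of only logging, so users without log access can see the cause.
    static Optional<Condition> warningFor(String connectorName, Throwable failure) {
        if (failure == null) {
            return Optional.empty(); // nothing went wrong, no condition needed
        }
        return Optional.of(new Condition("Warning", "RestartFailed",
                "Failed to restart connector " + connectorName + ": " + failure.getMessage()));
    }

    public static void main(String[] args) {
        warningFor("my-connector", new RuntimeException("HTTP 404"))
                .ifPresent(c -> System.out.println(c.reason() + ": " + c.message()));
    }
}
```

A transient failure would produce the same warning but clear itself on the next successful reconcile, whereas a permanent one (bad task ID) would keep the condition visible.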
The latest commit adds warning status conditions to the CR when the restart or restartTask REST calls fail. I have also added failure test cases.
One nit. But LGTM otherwise.
Force-pushed from e3717e2 to 456fe3e.
Thanks for the PR.
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
(Just FYI: The failing regression test is not related to this PR)
@@ -476,28 +480,34 @@ protected KafkaConnectApi getKafkaConnectApi() {
 * @param apiClient The client instance.
 * @param connectorName The connector name.
 * @param connectorSpec The desired connector spec.
 * @param resource The resource that defines the connector.
 * @return A Future whose result, when successfully completed, is a map of the current connector state.
Needs amending
Done
To restart the connector, annotate the `KafkaConnector` resource in Kubernetes. For example, using `kubectl annotate`:

[source,shell,subs=+quotes]
kubectl annotate KafkaConnector _KafkaConnector-name_ strimzi.io/restart=true
I wonder if this should be `restart-connector`, so it's consistent with `restart-task`?
Similar questions around this were raised in the proposal discussion:

- Add connector and connector task restart proposal proposals#8 (comment)
- Add connector and connector task restart proposal proposals#8 (comment)
- Add connector and connector task restart proposal proposals#8 (comment)

For `KafkaConnector`, this PR has the 2 annotations which apply to the resource they are being applied to:

- `strimzi.io/restart=true`
- `strimzi.io/restart-task=<TASK-ID>`

And for `KafkaMirrorMaker2`, this PR supports annotations that refer to a single connector or task amongst potentially many that are managed by the resource:

- `strimzi.io/restart-connector=<MM2-CONNECTOR-NAME>`
- `strimzi.io/restart-connector-task=<MM2-CONNECTOR-NAME>:<TASK-ID>`

With this style it's clear which annotations apply to which resource (and which connector/task is being restarted).

If people think it would be clearer, I am fine to change this to use the same annotations for both `KafkaConnector` and `KafkaMirrorMaker2`, e.g.:

For `KafkaConnector`:

- `strimzi.io/restart-connector=true`
- `strimzi.io/restart-connector-task=<TASK-ID>`

For `KafkaMirrorMaker2`:

- `strimzi.io/restart-connector=<MM2-CONNECTOR-NAME>`
- `strimzi.io/restart-connector-task=<MM2-CONNECTOR-NAME>:<TASK-ID>`

What do you think?
I think I prefer the second option, even though it means the values are different between KafkaConnector and KMM2 while the keys are the same. WDYT @ppatierno ?
Not sure, I think I prefer the first option, but I don't have a strong opinion tbh. Both of them are valid. My doubt on the second option is that it could be confusing for users to see the same annotation names but with different values (bool vs string), even if in two different resources. Anyway, I am fine with both; in the end it's a matter of documenting the difference.
A nit and a question, but otherwise lgtm.
Force-pushed from 456fe3e to 274a8cc.
@tombentley @ajborley Is there still something to do on this PR? Or can it be merged now? From the last comment it is not completely clear to me if there is still something to do here or not.

There may be another commit to use the same annotation keys for both KafkaConnectors and KafkaMirrorMaker2s - I was waiting for @ppatierno to respond to Tom's latest comment (#4114 (comment)) as similar thoughts were discussed in the proposal PR.

@ppatierno Please have a look ^^^
Other than the comment I left, the PR looks good to me anyway.
We have one vote for using the same annotation keys and one vote for leaving the PR as-is with different annotation keys for KafkaConnector and KafkaMirrorMaker2 resources. I don't have strong feelings either way!
- This commit implements the first part of the 'Restarting Kafka Connect connectors and tasks' proposal (https://github.com/strimzi/proposals), adding new annotations that cause the operator to restart connectors or tasks. The annotations can be applied to the KafkaConnector and KafkaMirrorMaker2 custom resources. The annotation acts as a trigger for a single restart call by the operator, and is removed from the CR when the restart REST API call succeeds. Signed-off-by: Andrew Borley <borley@uk.ibm.com>
- Add doc, CHANGELOG - Reduce calls to connector status. Signed-off-by: Andrew Borley <borley@uk.ibm.com>
- Also fix accidental doc deletion. Signed-off-by: Andrew Borley <borley@uk.ibm.com>
Force-pushed from 274a8cc to d81b208.
I talked about it with Tom and Paolo and I guess it is fine as it is. I rebased it to solve the conflict and if the tests pass I will merge it. Great PR ... thanks.

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

Thanks for the PR.

@scholzj was it included in the 0.21 release? thanks!
I will include it if we find some bugs to do RC2. But if the RC1 is released as GA it will be in 0.22. |
I think end-of-February is still the current plan. We need to finish the preparation for the CRDv1 ... I think once that is ready we will start with the release.
Type of change
Description
This PR implements the first part of the 'Restarting Kafka Connect connectors and tasks' proposal (https://github.com/strimzi/proposals/blob/master/007-restarting-kafka-connect-connectors-and-tasks.md), adding new annotations that cause the operator to restart connectors or tasks. The annotations can be applied to the KafkaConnector and KafkaMirrorMaker2 custom resources. An annotation acts as a trigger for a single restart call by the operator, and is removed from the CR when the restart REST API call succeeds.
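The trigger-and-remove semantics described above can be sketched as follows, using a plain `Map` for the CR's annotations. This is a simplification of the operator's actual reconcile logic, kept dependency-free for illustration:

```java
import java.util.Map;

public class RestartTrigger {
    static final String ANNO_RESTART = "strimzi.io/restart";

    // One reconcile pass: if the restart annotation is "true", perform the
    // restart call and, only when it succeeds, remove the annotation so the
    // restart is triggered exactly once. On failure the annotation is left
    // in place and the next reconcile retries.
    static boolean reconcile(Map<String, String> annotations, Runnable restartCall) {
        if (!"true".equals(annotations.get(ANNO_RESTART))) {
            return false; // no restart requested
        }
        try {
            restartCall.run();
            annotations.remove(ANNO_RESTART); // success: clear the trigger
            return true;
        } catch (RuntimeException e) {
            return false; // transient or permanent failure: keep the trigger
        }
    }

    public static void main(String[] args) {
        Map<String, String> annos = new java.util.HashMap<>();
        annos.put(ANNO_RESTART, "true");
        System.out.println(reconcile(annos, () -> { })); // true
        System.out.println(annos.containsKey(ANNO_RESTART)); // false
    }
}
```

Because the annotation is only cleared after a successful REST call, a failed restart is naturally retried on the next reconcile, which is the behaviour the review discussion settled on.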
Checklist
Please go through this checklist and make sure all applicable tasks have been done