-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/auto restart connectors #7500
Feat/auto restart connectors #7500
Conversation
Can one of the admins verify this patch? |
87fb664
to
68d9a21
Compare
68d9a21
to
b75fd15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and sorry it took so long to get to it. I left some comments which we need to look into. Also, the tests failed => I restarted them in case it were just some flaky tests, but they seeemd related.
@EqualsAndHashCode(callSuper = true) | ||
public class KafkaMirrorMaker2ConnectorSpec extends AbstractConnectorSpec { | ||
private static final long serialVersionUID = 1L; | ||
|
||
private AutoRestart autoRestartConnectorsAndTasks = new AutoRestart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we deviate here from the proposal. The proposal suggests to have it once per KafkaMirrorMaker2
CR - so that would be added to KafkaMirrorMaker2Spec
class. But here you put it per connector.
In general, I do not have problem with that. But:
- It should be discussed and made clear since it deviates from the approved proposal
- If this is the way we want to go, then you can make the code simple and just have the
directly in the
autoRestart: enabled: false
AbstractConnectorSpec
and use the same APi for both of them.
CC @strimzi/maintainers @ajborley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a problem with that either - I think it's unlikely that a user would want to switch autoRestart off for one MM2 connector and not the others, but at least this way they have the option.
(thanks for tagging me in here Jakub, and thanks @ThomasDangleterre for picking this up! :) )
@@ -824,6 +883,13 @@ protected JsonObject asJson(KafkaConnectorSpec spec, KafkaConnectorConfiguration | |||
return connectorConfigJson.put("connector.class", spec.getClassName()); | |||
} | |||
|
|||
protected Future<KafkaConnectorStatus> getPreviousKafkaConnectorStatus(CrdOperator<KubernetesClient, KafkaConnector, KafkaConnectorList> resourceOperator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we got the resource with its latest status on the beginning? Can't we use that instead of querying the API server again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I saw, Kafka Connector statuses are recreated from scratch at each reconcile.
We get connector and tasks statuses from Connect API before auto restarting but it is lacking the auto restart status as it is stored in the resource and the status of the resource is null when we call maybeCreateOrUpdateConnector
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are. But on the beginning, there is always a full KafkaConnector
resource including the status. I'm not sure how easy or hard would it be to pass it around (currently, I think only the .spec
part is passed around) -> maybe it would not be possible because of the duality with the Mirror Maker 2 connectors.
...tor/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java
Outdated
Show resolved
Hide resolved
...tor/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java
Outdated
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/status/KafkaMirrorMaker2Status.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this new feature being added, I had a few small suggested changes to wording etc. The main question I had though was around where in the reconciliation flow we are doing the auto restart and what that means for the connector status. It would be useful to have a test where the connector fails initially, is restarted successfully and we see the status is now ready.
...tor/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I left a couple of comments.
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
...tor/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java
Show resolved
Hide resolved
I know I come pretty late in the process but after reviewing the proposal, I've got a few questions:
I've not checked the PR yet, I'll try to take a look tomorrow. |
Yeah, good point. It should not be anabled by default. The API classes should not have the default object but should be null by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I made some suggestions on the doc. Can re-review for any additions and changes.
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/status/AutoRestartStatus.java
Outdated
Show resolved
Hide resolved
92102c8
to
cbc11e0
Compare
api/src/main/java/io/strimzi/api/kafka/model/KafkaMirrorMaker2ConnectorSpec.java
Outdated
Show resolved
Hide resolved
api/src/main/java/io/strimzi/api/kafka/model/KafkaConnectorSpec.java
Outdated
Show resolved
Hide resolved
packaging/helm-charts/helm3/strimzi-kafka-operator/crds/048-Crd-kafkamirrormaker2.yaml
Outdated
Show resolved
Hide resolved
bf3c488
to
fc3f0ad
Compare
Thanks everyone for your feedbacks 🙂 Does anyone know how I can test this method ? I didn't find any related tests for the underlying MirrorMaker2's connectors. Is there something I'm missing or Kafka Connector tests are enough ? |
/azp run regression |
Azure Pipelines successfully started running 1 pipeline(s). |
Two ideas come to my mind ... make it possible to configure the current time into the method. So instead of testing in a realtime, you can call the method with some artificial timestamp. E.g. always pass
TBH, I'm not sure. IMHO the Mirror Maker connectors are passed into the connector operator. So maybe there are no other tests? |
Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Co-authored-by: Kate Stanley <11195226+katheris@users.noreply.github.com> Signed-off-by: Thomas Dangleterre <75978678+ThomasDangleterre@users.noreply.github.com> Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com> Signed-off-by: Thomas Dangleterre <75978678+ThomasDangleterre@users.noreply.github.com> Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Co-authored-by: PaulRMellor <47596553+PaulRMellor@users.noreply.github.com> Signed-off-by: Thomas Dangleterre <75978678+ThomasDangleterre@users.noreply.github.com> Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
… true Signed-off-by: ThomasDangleterre <thomas.dangleterre@decathlon.com>
4fef123
to
d985ec7
Compare
Factoring out some notion of the "current time" for any code which would depend on the current time is usually the right approach. That could simply be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few nits, but overall I'm happy with this. Thanks!
operator-common/src/main/java/io/strimzi/operator/common/operator/resource/StatusUtils.java
Outdated
Show resolved
Hide resolved
operator-common/src/main/java/io/strimzi/operator/common/operator/resource/StatusUtils.java
Outdated
Show resolved
Hide resolved
operator-common/src/main/java/io/strimzi/operator/common/operator/resource/StatusUtils.java
Outdated
Show resolved
Hide resolved
148b2e7
to
5bd3e8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The build failed because of an unused import:
[ERROR] /home/vsts/work/1/s/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java:79:8: Unused import - java.time.temporal.ChronoUnit. [UnusedImports]
Can you please fix it?
@katheris Do you have something more for this PR? |
…tor/resource/StatusUtils.java Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com> Signed-off-by: Thomas Dangleterre <75978678+ThomasDangleterre@users.noreply.github.com>
5bd3e8b
to
df17ca5
Compare
/azp run regression |
Azure Pipelines successfully started running 1 pipeline(s). |
@ThomasDangleterre Thanks a lot for this PR, this is a great addition! |
@scholzj is this live in 0.32 or do we need to wait until 0.33? |
Hi @kmcrawford, |
Type of change
Description
Implements Automatically restarting FAILED connectors and tasks - proposal 007
Related issue : #2621
Checklist
Please go through this checklist and make sure all applicable tasks have been done