Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ST] Fix MigrationST #9954

Merged
merged 2 commits into from
Apr 16, 2024
Merged

[ST] Fix MigrationST #9954

merged 2 commits into from
Apr 16, 2024

Conversation

im-konge
Copy link
Member

@im-konge im-konge commented Apr 11, 2024

Type of change

  • Bugfix

Description

This PR fixes few things failing in the STs in the MigrationST class:

  • from time to time, producer fails to send one or two messages, so I extended the delivery.timeout.ms (together with request.timeout.ms) to 30s, which should be sufficient in case of disconnection from the node
  • long wait before the continuous clients are finished - changed the delay to 500 ms (from 1000ms)
  • issue with deletion of KafkaTopic(s) -> the issue is only with topics from the ZK mode, when we migrated to fully KRaft, from time to time UTO wasn't able to delete the KafkaTopic(s) and because of that the clean-up of our STs was failing. For now I just turned off finalizers in UTO, but this is being investigated
  • small timeout for movement of Kafka cluster state to KRaftDualWriting and other states -> extended to 5 minutes, however it's still not enough from time to time, depending on the KafkaRoller and when the Pods are rolled. This is being investigated as well
  • small timeout for migration AZP -> increased to 240minutes.

Other than this I decreased the wait timeout for "default" resource deletion -> from 3 minutes to 2, as it should be sufficient timeout for resources like CRDs, KafkaTopic, KafkaUser, Secret, ConfigMap, etc.

Checklist

  • Make sure all tests pass

Signed-off-by: Lukas Kral <lukywill16@gmail.com>
@im-konge
Copy link
Member Author

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@im-konge im-konge added this to the 0.41.0 milestone Apr 11, 2024
@im-konge
Copy link
Member Author

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@im-konge
Copy link
Member Author

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@im-konge im-konge marked this pull request as ready for review April 12, 2024 12:07
Signed-off-by: Lukas Kral <lukywill16@gmail.com>
@im-konge
Copy link
Member Author

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@henryZrncik henryZrncik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Job! Increase in request.timeout.ms and delivery.timeout.ms seems reasonable to me. Also the cleaning logs seems to be clean of any errors now, so thanks for looking on that.

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming it works.
At the same time you should plan investigations on the stuff already mentioned in the description. This PR sounds like a temporary workaround to me if I got it correctly.

@im-konge
Copy link
Member Author

At the same time you should plan investigations on the stuff already mentioned in the description. This PR sounds like a temporary workaround to me if I got it correctly.

The only thing that needs investigation is the deletion of topics by UTO.
The timeout for movement to KRaftDualWriting state is due to KAFKA-16563.
I will check the issue with topics further - to track it, I created #9975.

@im-konge im-konge merged commit d939954 into strimzi:main Apr 16, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants