
Fix Go duration parsing in kafkaCleaner test #2352

Open
delthas wants to merge 2 commits into development/2.14 from bugfix/ZENKO-5218/fix-duration-parsing-kafkacleaner

delthas (Contributor) commented Mar 13, 2026

Summary

The kafkaCleaner end-to-end test (features/zzz.kafkaCleaner.feature) reads
KafkaCleanerInterval from the Zenko CR's spec.kafkaCleaner.interval field,
which is a Go duration string (e.g. "1m"). The test used parseInt("1m") to
convert this to seconds, but parseInt stops at the first non-numeric character
and silently returned 1 instead of 60.
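The failure mode is easy to reproduce in isolation. This is an illustrative snippet, not code from the PR: parseInt keeps the leading digits and silently discards the unit, while Number rejects the whole string.

```typescript
// Value of spec.kafkaCleaner.interval in the CI Zenko CR (a Go duration string).
const interval = "1m";

// parseInt reads digits until the first character it cannot parse,
// so the "m" unit is dropped without any error.
const parsed = parseInt(interval, 10);
console.log(parsed); // 1, not 60

// Number() is stricter: it rejects the whole string, which at least
// surfaces the problem as NaN instead of a plausible-looking wrong number.
const viaNumber = Number(interval);
console.log(Number.isNaN(viaNumber)); // true
```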

This caused the test to compute a timeout of ~60s (1 × 6000ms × 10) instead
of the intended ~3600s (60 × 6000ms × 10), giving the kafkacleaner only one
cleaning cycle to process all topics rather than about sixty. Under normal conditions one
cycle was often enough, but when operator reconciliation triggered topic
recreation mid-run, the kafkacleaner needed several cycles to catch up, and the
test failed with:

AssertionError: Kafka cleaner did not clean the topics within the expected time

Fix

Replace parseInt with a parseDurationToSeconds utility that properly handles
Go-style duration strings, including compound durations ("2h45m"), fractional
values ("1.5s"), and all standard time units (ns, us/µs, ms, s, m, h).

For the current "1m" value, this correctly returns 60 seconds, restoring the
intended timeout window.
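As a rough illustration of what such a helper involves, here is a minimal TypeScript sketch. It reuses the name parseDurationToSeconds from the description but is not the PR's actual implementation. It assumes Go's time.ParseDuration grammar (an optional sign, then one or more decimal-number-plus-unit components, with a bare "0" allowed) and accumulates nanoseconds before converting, so that common values such as "500ms" come out exact.

```typescript
// Nanoseconds per unit, for the unit suffixes Go's time.ParseDuration accepts.
const NS_PER_UNIT: Record<string, number> = {
  ns: 1,
  us: 1e3,
  "µs": 1e3, // U+00B5 MICRO SIGN
  "μs": 1e3, // U+03BC GREEK SMALL LETTER MU
  ms: 1e6,
  s: 1e9,
  m: 60e9,
  h: 3600e9,
};

/**
 * Converts a Go-style duration string to seconds (sketch).
 *
 *   "1m"    => 60
 *   "15s"   => 15
 *   "1.5s"  => 1.5
 *   "2h45m" => 9900
 *   "500ms" => 0.5
 *
 * Throws on input Go would reject, e.g. "1" (missing unit) or "1x".
 */
function parseDurationToSeconds(input: string): number {
  let rest = input;
  let sign = 1;
  if (rest[0] === "-" || rest[0] === "+") {
    sign = rest[0] === "-" ? -1 : 1;
    rest = rest.slice(1);
  }
  // Like Go, accept a bare "0" without a unit.
  if (rest === "0") {
    return 0;
  }
  if (rest === "") {
    throw new Error(`invalid duration: "${input}"`);
  }
  // One component = decimal number (optional fraction) + unit.
  // "ms" is listed before "m" so that "5ms" matches as milliseconds.
  const component = /(\d*\.?\d*)(ns|us|µs|μs|ms|s|m|h)/g;
  let totalNs = 0;
  let pos = 0;
  while (pos < rest.length) {
    component.lastIndex = pos;
    const match = component.exec(rest);
    // Each component must start exactly where the previous one ended and
    // contain at least one digit; otherwise the string is malformed.
    if (match === null || match.index !== pos || !/\d/.test(match[1])) {
      throw new Error(`invalid duration: "${input}"`);
    }
    totalNs += parseFloat(match[1]) * NS_PER_UNIT[match[2]];
    pos = component.lastIndex;
  }
  return sign * (totalNs / 1e9);
}
```

The examples in the doc comment double as the kind of "duration => result" documentation that is later suggested in review.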

Issue: ZENKO-5218

The KafkaCleanerInterval parameter is a Go duration string (e.g. "1m")
read from the Zenko CR spec.kafkaCleaner.interval. The test used
parseInt("1m") to parse it, which silently returned 1 instead of 60,
since parseInt stops at the first non-numeric character.

This made the test time out after ~60s (1 * 6000ms * 10) instead of
~3600s (60 * 6000ms * 10), giving the kafkacleaner only one cleaning
cycle to process all topics instead of about sixty. Under normal conditions
one cycle was often enough, but when operator reconciliation caused
topic recreation during the run, the kafkacleaner needed several
cycles to catch up, causing the test to fail with:
"Kafka cleaner did not clean the topics within the expected time"

Replace parseInt with a proper Go duration parser that handles
compound durations (e.g. "2h45m"), fractional values, and all
standard Go time units (ns, us, ms, s, m, h).

Issue: ZENKO-5218

bert-e (Contributor) commented Mar 13, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e (Contributor) commented Mar 13, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

SylvainSenechal (Contributor) left a comment

Hey, I'm approving, just some thoughts:
This new helper function is kinda ugly and will probably only be used once, so:

  • I was thinking we could almost just drop the "KafkaCleanerInterval" parameter and give that function a 300s timeout.
  • If you do keep parseDurationToSeconds, consider adding a few documentation examples showing which result the function returns for a given duration string.

More importantly: in this codebase we have Zenko.yaml, which sets kafkaCleaner.interval to 1m. Considering that our tests probably don't produce that many messages (maybe a few hundred), and that the failure was probably a case where the kafka cleaner started only a few seconds before the wrong 60s timeout (instead of the 600s we had), I would suggest lowering the value to somewhere between 15s and 30s.

Drop the KafkaCleanerInterval parameter and parseDurationToSeconds
helper in favor of a hardcoded 300s timeout and 30s check interval.
Lower the kafkaCleaner interval in the CI Zenko CR from 1m to 15s
so the cleaner runs more frequently during tests.

Issue: ZENKO-5218
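
The second commit switches to fixed values. As a sketch of the retry behavior discussed in this thread, assuming a hypothetical waitUntil helper (not from the PR or any library), polling every 30s for up to 300s could look like this:

```typescript
// Hypothetical helper (not from the PR): re-run `check` until it returns true,
// waiting `intervalMs` between attempts, and fail once `timeoutMs` has elapsed.
async function waitUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) {
      return; // condition met: stop polling
    }
    const remainingMs = deadline - Date.now();
    if (remainingMs <= 0) {
      throw new Error("Kafka cleaner did not clean the topics within the expected time");
    }
    // Never sleep past the deadline, so the last check happens on time.
    await new Promise<void>((resolve) => setTimeout(resolve, Math.min(intervalMs, remainingMs)));
  }
}

// Usage with the values from the second commit; topicsAreCleaned is a
// placeholder for whatever check the step definition performs:
// await waitUntil(topicsAreCleaned, 300_000, 30_000);
```

Because the check is retried until the deadline, a cleaner pass that starts just before a single check no longer fails the test on its own; only the overall budget matters.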

delthas (Contributor, Author) commented Mar 13, 2026

Right. Testing with 15s, timeout of 5 minutes. Let's see how it goes.

 storageClassName: "standard"
 kafkaCleaner:
-  interval: 1m
+  interval: 15s

Contributor

Why do we want to reduce it from 1m to 15s?

Contributor Author

See discussion above: #2352 (review)

Contributor

The 1m value was chosen to be conservative, to avoid too much load: is there a significant need or benefit to lowering it?

Regarding: "situation where that kafka cleaner would start a few seconds before the wrong 60sec timeout instead of 600sec that we had, I would suggest lowering the value, somewhere between 15s and 30"

The test used to work (before we changed to a duration, I guess), and it needs a retry to ensure we eventually see the cleanup, whatever the value. It now fails because there is no retry: changing the period will not change this.

 async function (this: Zenko) {
-  const kfkcIntervalSeconds = parseInt(this.parameters.KafkaCleanerInterval);
-  const checkInterval = kfkcIntervalSeconds * (1000 + 5000);
+  const checkIntervalMs = 30_000;

Contributor

Why hardcode the value here?
This creates coupling with the CR, which:

  • makes it harder to maintain
  • will further prevent running the tests against any Zenko or Artesca deployment

Hence the solution that was put in place to "derive" the timeout from the configured behavior.

Why not go with the solution you describe in the PR title, and simply parse the "unit"?
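
One hedged way to keep the timeout derived from the CR, as suggested here, is to compute it from the parsed interval. The sketch below assumes the interval has already been converted to seconds (for example with a Go-duration parser like the one from the first commit); cleanerTimeoutMs, cycles and marginMs are hypothetical names and defaults, not values from the PR.

```typescript
// Hypothetical sketch: derive the test timeout from the cleaner interval, so the
// test keeps working whatever spec.kafkaCleaner.interval is set to.
// `cycles` and `marginMs` are assumed defaults, not values taken from the PR.
function cleanerTimeoutMs(intervalSeconds: number, cycles = 10, marginMs = 5_000): number {
  // Reject NaN (e.g. from Number("1m")) and non-positive values loudly instead
  // of silently producing a tiny or meaningless timeout.
  if (!Number.isFinite(intervalSeconds) || intervalSeconds <= 0) {
    throw new Error(`invalid kafkaCleaner interval: ${intervalSeconds}s`);
  }
  // Allow `cycles` full cleaner passes, each with some slack for processing.
  return cycles * (intervalSeconds * 1_000 + marginMs);
}
```

With the CI value of 15s this gives 200,000 ms (about 3.3 minutes); with 1m it gives 650,000 ms. Unlike the original `kfkcIntervalSeconds * (1000 + 5000)`, the 5s margin here is added once per cycle instead of being multiplied by the interval.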

Contributor Author

See discussion above: #2352 (review)

I don't have a strong opinion on this; either my first commit or @SylvainSenechal's solution works for me.

Contributor

I'd really rather keep coupling low, especially since we are on a path (cf. all the tickets) to allow running Zenko tests "everywhere" (locally, against Artesca...).

bert-e (Contributor) commented Mar 16, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following reviewers are expecting changes from the author, or must review again:

delthas requested a review from francoisferrand March 16, 2026 08:44

4 participants