Fix Go duration parsing in kafkaCleaner test#2352
Fix Go duration parsing in kafkaCleaner test#2352delthas wants to merge 2 commits intodevelopment/2.14from
Conversation
The KafkaCleanerInterval parameter is a Go duration string (e.g. "1m")
read from the Zenko CR spec.kafkaCleaner.interval. The test used
parseInt("1m") to parse it, which silently returned 1 instead of 60,
since parseInt stops at the first non-numeric character.
This made the test timeout after ~60s (1 * 6000ms * 10) instead of
~600s (60 * 6000ms * 10), giving the kafkacleaner only one cleaning
cycle to process all topics instead of ten. Under normal conditions
one cycle was often enough, but when operator reconciliation caused
topic recreation during the run, the kafkacleaner needed several
cycles to catch up, causing the test to fail with:
"Kafka cleaner did not clean the topics within the expected time"
Replace parseInt with a proper Go duration parser that handles
compound durations (e.g. "2h45m"), fractional values, and all
standard Go time units (ns, us, ms, s, m, h).
Issue: ZENKO-5218
Hello delthas,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
|
SylvainSenechal
left a comment
There was a problem hiding this comment.
Hey I'm approving just some thoughts :
This new helper function is kinda ugly and will probably only be used once,
- I was thinking we could almost just drop the "KafkaCleanerInterval" parameters and just give that function 300sec to timeout.
- If you do keep the parseDurationSeconds, consider a few documentation example of what "duration: string" => result this function would process
More importantly : In that codebaase, we have Zenko.yaml, and it's got the kafkaCleaner.interval set to 1m, considering that all the tests we've got probably don't produce that many messages (maybe a few hundreds), and that the issue was probably situation where that kafka cleaner would start a few seconds before the wrong 60sec timeout instead of 600sec that we had, I would suggest lowering the value, somewhere between 15s and 30
Drop the KafkaCleanerInterval parameter and parseDurationToSeconds helper in favor of a hardcoded 300s timeout and 30s check interval. Lower the kafkaCleaner interval in the CI Zenko CR from 1m to 15s so the cleaner runs more frequently during tests. Issue: ZENKO-5218
Right. Testing with 15s, timeout of 5 minutes. Let's see how it goes. |
| storageClassName: "standard" | ||
| kafkaCleaner: | ||
| interval: 1m | ||
| interval: 15s |
There was a problem hiding this comment.
Why do we want to reduce from 1m to 15s ?
There was a problem hiding this comment.
1min was done to be conservative, and avoid too much load : is there a significant need or benefit to lower the value?
situation where that kafka cleaner would start a few seconds before the wrong 60sec timeout instead of 600sec that we had, I would suggest lowering the value, somewhere between 15s and 30
the test used to work (before we changed to a duration I guess), and need to have the retry to ensure we eventually see the cleanup -whatever the value. It now fails because there is no retry : changing the period will not change this....
| async function (this: Zenko) { | ||
| const kfkcIntervalSeconds = parseInt(this.parameters.KafkaCleanerInterval); | ||
| const checkInterval = kfkcIntervalSeconds * (1000 + 5000); | ||
| const checkIntervalMs = 30_000; |
There was a problem hiding this comment.
Why hardcode the value here?
This creates coupling with the CR, which
- makes it harder to maintain
- will further prevent running the tests against any zenko or artesca
hence the solution that was put in place to "derive" the timeout from the behavior configured.
Why not go with the solution you describe in the PR title, and simply parse the "unit"?
There was a problem hiding this comment.
See discussion above: #2352 (review)
I don't have a strong opinion on this, either my first commit or @SylvainSenechal solution work for me.
There was a problem hiding this comment.
I'd really rather keep coupling low : esp. since we are on a path (c.f. all the tickets) to allow running Zenko tests "everywhere" (locally, against artesca...)
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
The following reviewers are expecting changes from the author, or must review again: |
Summary
The
kafkaCleanerend-to-end test (features/zzz.kafkaCleaner.feature) readsKafkaCleanerIntervalfrom the Zenko CR'sspec.kafkaCleaner.intervalfield,which is a Go duration string (e.g.
"1m"). The test usedparseInt("1m")toconvert this to seconds, but
parseIntstops at the first non-numeric characterand silently returned
1instead of60.This caused the test to compute a timeout of ~60s (
1 × 6000ms × 10) insteadof the intended ~600s (
60 × 6000ms × 10), giving the kafkacleaner only onecleaning cycle to process all topics rather than ten. Under normal conditions one
cycle was often enough, but when operator reconciliation triggered topic
recreation mid-run, the kafkacleaner needed several cycles to catch up — and the
test failed with:
Fix
Replace
parseIntwith aparseDurationToSecondsutility that properly handlesGo-style duration strings — including compound durations (
"2h45m"), fractionalvalues (
"1.5s"), and all standard time units (ns,us/µs,ms,s,m,h).For the current
"1m"value, this correctly returns60seconds, restoring theintended 10-cycle timeout window.
Issue: ZENKO-5218