Improve UTO logs #9718

Merged
merged 2 commits into from Feb 29, 2024
Contributor

@fvaleri fvaleri commented Feb 20, 2024

This change improves the UTO logs. It now logs every periodic reconciliation and every batch reconciliation, and it logs individual topic reconciliations when they fail. All YAML payloads have been replaced with the topic name.

Log example: uto.log. This should close #9465.


The KafkaTopic informer resync configuration had a small issue that caused some periodic reconciliations to be skipped: they often triggered every 4 minutes instead of every 2 minutes.

The informer interval acts like a heartbeat, and each handler's resync fires on some multiple of that heartbeat. The closer the two values are, the more likely the handler is to skip an informer interval. Setting both intervals to the same value generates just enough skew that, when the informer checks whether the handler is ready for a resync, it sees that the handler still needs another couple of microseconds and defers it to the next informer-level resync.

This is fixed by introducing a small, fixed resync period for the informer. The resync operation runs entirely in memory and is a no-op most of the time, so the shorter period causes no harm.
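
The skipping behaviour described above can be reproduced with a toy model. The sketch below is not the informer implementation: `resync_times`, `gaps`, the 5-microsecond skew, and the 10-second informer period are all hypothetical values chosen for illustration. It only assumes that the informer ticks at a fixed period and resyncs a handler when the handler's own deadline has passed.

```python
SECOND = 1_000_000  # all times are integers in microseconds


def resync_times(informer_period_us, handler_period_us, skew_us, horizon_us):
    """Return the tick times at which the handler is actually resynced.

    Toy model (an assumption, not the real informer code): the informer
    ticks every informer_period_us; on each tick the handler is resynced
    only if its deadline has passed. The deadline is re-armed skew_us
    after the tick, which models the few microseconds of drift.
    """
    fired = []
    next_due = handler_period_us + skew_us
    tick = informer_period_us
    while tick <= horizon_us:
        if tick >= next_due:
            fired.append(tick)
            next_due = tick + skew_us + handler_period_us
        tick += informer_period_us
    return fired


def gaps(times):
    """Return the intervals between consecutive resyncs."""
    return [b - a for a, b in zip(times, times[1:])]


# Equal periods: every other heartbeat is missed by 5 microseconds.
equal = resync_times(120 * SECOND, 120 * SECOND, 5, 1200 * SECOND)

# Small informer period (hypothetical 10 s): the handler is at most
# one heartbeat late.
small = resync_times(10 * SECOND, 120 * SECOND, 5, 1200 * SECOND)
```

In this model, `gaps(equal)` is 240 seconds throughout, matching the 4-minute reconciliations that were observed, while `gaps(small)` is 130 seconds: the handler is late by at most one informer period, so a smaller informer period brings the effective interval closer to the configured 2 minutes.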

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
@fvaleri fvaleri added this to the 0.40.0 milestone Feb 20, 2024
Member

@scholzj scholzj left a comment

I guess it is a good start. The other operators normally log the reconciliation of each resource at the INFO level. That is useful to detect whether, for example, the KafkaTopic resource has the right label, etc. But this shows that something is going on, and it is not excessive. So as far as I'm concerned, we can start with this and see what more we need in the future.

Member

@tombentley tombentley left a comment

A couple of minor points, but this LGTM.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
@scholzj scholzj merged commit 3e3643d into strimzi:main Feb 29, 2024
13 checks passed
@fvaleri fvaleri deleted the improve-uto-logs branch February 29, 2024 15:20
Successfully merging this pull request may close these issues.

[UTO] needs to use log levels in a better way
5 participants