stable-2.3 | kata-monitor: increase delay before syncing with the container manager #3554
Conversation
When we detect a new kata sandbox from the sbs fs, we add it to the sandbox cache to retrieve metrics. We also schedule a sync with the container manager, which we consider the source of truth: if the kata pod is not yet ready, the container manager will not report it and we will drop it from our cache. We will add it back only when we re-sync, i.e., when we get an event from the sbs fs (which means a kata pod has been terminated or a new one has been started).

Since we use the sync with the container manager only to remove pods from the cache, we can wait longer before syncing (and so reduce the chance of missing a kata pod just because it was not ready yet). Let's raise the waiting time before starting the sync timer.

Fixes: #3550

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
/test
Looks good to me.
Wonder if we could make that a configurable option in a later iteration?
A heads-up that may affect your PR: in the Architecture Committee meeting of January 25th, 2022, the committee agreed to enable GitHub's "Dismiss stale pull request approvals when new commits are pushed" setting. This means that if your PR is rebased or updated, any approvals already given will be erased. To minimize friction with the new approach, please consider adding a note on the changes made before the rebase / force-push, and also ping the reviewers for a subsequent round of reviews. Thanks for your understanding! Related issue: kata-containers/community#249
Thanks for reviewing!
Yeah, I had that thought too.
lgtm as a workaround, and I agree it's not worth pulling the full fix down to the stable branch.
Thanks, @fgiudici!
/retest-s390x
This is meant to address #3550 on the stable branch.
Proposal
We are stable, so we probably don't want to backport #3553 here.
Since we use the sync with the container manager basically just to remove pods from the cache, we can wait longer before syncing (and so reduce the chance of missing a kata pod just because it was not ready yet) without missing any metrics. The sync will just clean up the sandbox cache, dropping stale entries.
So, let's raise the waiting time before starting the sync timer to a higher value. This will not fix the issue (if a pod has not completely started when the sync timer fires, we will miss its metrics until a new kata pod starts or is removed), but it will make the issue much less likely to occur.
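For illustration, the delayed-sync idea can be sketched in Go roughly like this. The names, the 200 ms value, and the timer-reset logic are assumptions for the sketch, not the actual kata-monitor code; the point is only that the sync timer fires after a longer delay, and that delaying a sync which only evicts stale entries cannot lose metrics for running sandboxes:

```go
package main

import (
	"sync"
	"time"
)

// syncDelay is the time to wait before syncing the sandbox cache with
// the container manager. Raising it reduces the chance of dropping a
// kata pod that the container manager does not report yet because the
// pod is not fully ready. (Illustrative value, not the real constant.)
const syncDelay = 200 * time.Millisecond

// sandboxCache is a minimal stand-in for kata-monitor's sandbox cache.
type sandboxCache struct {
	mu        sync.Mutex
	sandboxes map[string]bool
	timer     *time.Timer
}

// onSandboxEvent records a sandbox detected from the sbs fs and
// (re)arms the sync timer. Since the sync only removes stale entries,
// delaying it never loses metrics for running sandboxes.
func (sc *sandboxCache) onSandboxEvent(id string, syncFn func()) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.sandboxes[id] = true
	if sc.timer != nil {
		sc.timer.Stop()
	}
	sc.timer = time.AfterFunc(syncDelay, syncFn)
}
```

With this shape, a sandbox is visible in the cache (and thus in metrics) as soon as the sbs fs event arrives, while the container-manager sync runs only after the delay has elapsed.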