-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rerun Kubelet Scraper When Data is Populated #633
Conversation
2a8ca24
to
03a5cd9
Compare
cmd/nri-kubernetes/main.go
Outdated
return fmt.Errorf("retrieving kubelet data: %w", err) | ||
if kubeletScraper.IsMaxRerunReached(kubeletReruns) { | ||
return fmt.Errorf("retrieving kubelet data: %w", err) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we can omit the else statement since the if above has a return? can help readability a bit. overall looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, remove the else
@@ -8,7 +8,7 @@ sources: | |||
- https://github.com/newrelic/nri-kubernetes/tree/main/charts/newrelic-infrastructure | |||
- https://github.com/newrelic/infrastructure-agent/ | |||
|
|||
version: 3.12.0 | |||
version: 3.12.1 | |||
appVersion: 3.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make changes to the agent, I believe we want to bump both the appVersion
and the chart version
.
That said, I am new to helm charts and the nri-kubernetes
release process, so I'm less clear on how this works in practice. 😅 @nserrino - do we want to make these chart changes in this PR or wait until we do a release next Wednesday?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nserrino I made several updates to address the lints, including the version change, after you approval. :) So pls review the PR again. I will revert the version change if it is not proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the chart change before the PR merge
This PR addresses the issue when Kubelet becomes temporarily unavailable, no Kubelet metrics are populated, and the error "kubelet data was not populated after trying all endpoints" from Kubelet scraper is returned to main.
The current behavior of K8s agent main is that if any (ksm, kubelet, controlplane) scrapers returns any error, the K8s agent will Immediately fail and exit with code 5.
Instead of exiting immediately when kubelet is not reachable, we can set a ScraperMaxReruns (e.g., 4 times) at the scraper level. This will improve customer experience when kubelet is temporarily not reachable.