Rerun Kubelet Scraper When Data is Populated #633

Merged: 7 commits, Feb 22, 2023

@xqi-nr xqi-nr commented Feb 17, 2023

This PR addresses the case where Kubelet becomes temporarily unavailable: no Kubelet metrics are populated, and the Kubelet scraper returns the error "kubelet data was not populated after trying all endpoints" to main.
Currently, if any scraper (ksm, kubelet, controlplane) returns an error, the K8s agent fails immediately and exits with code 5.

Instead of exiting immediately when Kubelet is not reachable, we can set a ScraperMaxReruns limit (e.g., 4) at the scraper level. This improves the customer experience when Kubelet is temporarily unreachable.
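The rerun idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `scraper` type, its `run` field, and `scrapeWithReruns` are hypothetical, and only the names `ScraperMaxReruns` and `IsMaxRerunReached` and the error strings come from the PR itself. Whether the real agent reruns right away or waits for its next scrape interval is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotPopulated mirrors the error text quoted in the PR description.
var errNotPopulated = errors.New("kubelet data was not populated after trying all endpoints")

// scraper is a hypothetical stand-in for the real Kubelet scraper.
type scraper struct {
	maxReruns int          // plays the role of ScraperMaxReruns
	run       func() error // one scrape attempt (assumed signature)
}

// IsMaxRerunReached reports whether the rerun budget is used up.
func (s *scraper) IsMaxRerunReached(reruns int) bool {
	return reruns >= s.maxReruns
}

// scrapeWithReruns keeps rerunning the scraper until it succeeds or the
// rerun budget is exhausted. It returns the number of reruns performed.
func scrapeWithReruns(s *scraper) (int, error) {
	reruns := 0
	for {
		err := s.run()
		if err == nil {
			return reruns, nil
		}
		if s.IsMaxRerunReached(reruns) {
			return reruns, fmt.Errorf("retrieving kubelet data: %w", err)
		}
		reruns++
	}
}

func main() {
	attempts := 0
	s := &scraper{
		maxReruns: 4,
		run: func() error {
			attempts++
			if attempts < 3 {
				return errNotPopulated // Kubelet still unavailable
			}
			return nil // Kubelet recovered
		},
	}
	reruns, err := scrapeWithReruns(s)
	fmt.Println(reruns, err) // prints "2 <nil>"
}
```

With a budget of 4, a scraper that never recovers is attempted five times in total (the first run plus four reruns) before the wrapped error reaches the caller.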

@xqi-nr xqi-nr requested a review from a team as a code owner February 17, 2023 00:34
-	return fmt.Errorf("retrieving kubelet data: %w", err)
+	if kubeletScraper.IsMaxRerunReached(kubeletReruns) {
+		return fmt.Errorf("retrieving kubelet data: %w", err)
+	} else {

Looks like we can omit the else, since the if above returns? That would help readability a bit. Overall looks good!

Contributor Author

Thanks, removed the else.
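For illustration, the shape the reviewer is suggesting looks roughly like this. It is a sketch only: `handleScrapeError` and its `maxReached` parameter are hypothetical names, and the actual code that follows the `if` in the PR is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotPopulated mirrors the error text quoted in the PR description.
var errNotPopulated = errors.New("kubelet data was not populated after trying all endpoints")

// handleScrapeError is a hypothetical helper showing the early-return form.
// maxReached stands in for kubeletScraper.IsMaxRerunReached(kubeletReruns).
func handleScrapeError(err error, maxReached bool) error {
	if maxReached {
		return fmt.Errorf("retrieving kubelet data: %w", err)
	}
	// No else is needed: the branch above has already returned, so the
	// rerun path can continue at the normal indentation level.
	return nil
}

func main() {
	// Budget not exhausted: the error is swallowed so the scraper can rerun.
	fmt.Println(handleScrapeError(errNotPopulated, false) == nil)

	// Budget exhausted: the wrapped error is returned to the caller.
	err := handleScrapeError(errNotPopulated, true)
	fmt.Println(errors.Is(err, errNotPopulated))
}
```

Because `%w` wraps the original error, callers can still match it with `errors.Is` after the message has been prefixed.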

@@ -8,7 +8,7 @@ sources:
- https://github.com/newrelic/nri-kubernetes/tree/main/charts/newrelic-infrastructure
- https://github.com/newrelic/infrastructure-agent/

-version: 3.12.0
+version: 3.12.1
appVersion: 3.6.0
Contributor

@htroisi htroisi Feb 17, 2023

If we make changes to the agent, I believe we want to bump both the appVersion and the chart version.

That said, I am new to helm charts and the nri-kubernetes release process, so I'm less clear on how this works in practice. 😅 @nserrino - do we want to make these chart changes in this PR or wait until we do a release next Wednesday?

Contributor Author

@nserrino I made several updates to address the lint issues, including the version change, after your approval. :) So please review the PR again. I will revert the version change if it is not appropriate.

Contributor Author

I reverted the chart change before the PR merge

@xqi-nr xqi-nr merged commit 5641443 into main Feb 22, 2023
@xqi-nr xqi-nr deleted the add_scraper_retries branch February 22, 2023 18:11
3 participants