Flow: services terminate too early during process shutdown #266
Comments
I'll also add that it's not obvious what a clean fix for this problem looks like. Maybe the Flow controller needs a dedicated scheduler just for services, so that services can be terminated later?
Hi there 👋 On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025. To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
I'm also not sure what a clean solution would look like. It can get complicated when we discuss a cluster of collectors doing a rolling restart. There are a few obvious options:
Probably the only realistic solution is to "not fall behind" 😄 For users affected by this, in the short term it might be best to look into why the collector is taking so long to send the samples. Some other, "cleaner", solutions could be:
What's wrong?
When Flow is shutting down, it does the following:
Because some components take a while to terminate (such as `prometheus.remote_write`, which waits to flush buffered metrics), the process may run for a period of time without services available.

This has a few implications, but the main issue is that the HTTP service is terminated nearly immediately, preventing users from collecting metrics during the shutdown process. This is observable as scrape failures: the agent is still running but can't be scraped.
Steps to reproduce
System information
N/A
Software version
v0.39.1
Configuration
No response
Logs
No response