
Federation errors logged so often disk was filled #1042

Closed
TheTincho opened this Issue Aug 30, 2015 · 9 comments

TheTincho (Contributor) commented Aug 30, 2015

Hi,

Since I enabled federation in a group of servers, the log started to fill up pretty fast with errors like this one:

time="2015-08-30T10:04:51Z" level=warning msg="Ignoring sample with out-of-order timestamp for fingerprint 3cba2e147c172b70 (up{instance=\"<redacted>:9100\", job=\"node\", monitor=\"<redacted>\"}): 1440929077.209 is not after 1440929077.209" file=storage.go line=564 

On the one hand, I don't know why I am getting this error; but the main problem is that these messages were being logged a few times per second, and my /var partition filled up in no time. I think they should be rate-limited, or otherwise bounded, so this does not happen again.

Thanks.

TheTincho (Contributor, Author) commented Aug 30, 2015

The error seems to appear when a metric is not being updated on the leaf Prometheus server because the target is down. Unless I am missing something here, this looks like a bug: the timestamp is not going back in time, it is always the same value...
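For illustration, this is roughly what the leaf's /federate endpoint keeps returning for such a stale series on every federation scrape (labels and timestamp taken from the log line above, with the timestamp converted to milliseconds; the value of 0 is only an assumption, being what you would expect for a down target):

up{instance="<redacted>:9100", job="node", monitor="<redacted>"} 0 1440929077209

Because the target is down, the sample never changes, so the global server keeps trying to append the same timestamp and logs the warning each time.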

fabxc (Member) commented Aug 30, 2015

Are you running 0.15.1? This should no longer happen in master.


TheTincho (Contributor, Author) commented Aug 30, 2015


Yes, 0.15.1 on all machines.


brian-brazil (Member) commented Aug 30, 2015

That indicates a misconfiguration, then. Could you be pulling in the same time series from two targets?

TheTincho (Contributor, Author) commented Aug 30, 2015

Ah, that might be the problem then. I am scraping the leaf Prometheus and its node_exporter directly, and also federating the leaf, while the leaf Prometheus itself scrapes itself and its node_exporter... I did not think this would be a problem.
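For reference, a rough sketch of what that overlap looks like on the global server (written in present-day config syntax rather than the 0.15-era one; job names, hostnames and the match[] selector are made up for illustration):

scrape_configs:
  # Federation pulls in everything the leaf already scraped,
  # including the leaf's own 'node' job, with the original labels
  # and timestamps preserved...
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'
    static_configs:
      - targets: ['leaf-prometheus:9090']

  # ...while the same node_exporter is also scraped directly, so
  # series such as up{job="node", instance="leaf-host:9100"} can
  # arrive twice, from two sources, with different timestamps.
  - job_name: 'node'
    static_configs:
      - targets: ['leaf-host:9100']

The directly scraped up series and the federated copy of the leaf's up series can then collide on identical labels, which is the kind of clash described in the previous comment.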

In any case, I believe the logging issue might deserve some attention, as the growth rate is alarming :)

fabxc (Member) commented Aug 30, 2015

Do you mean in other places too, or just this instance? As I said, this should no longer happen in master.


TheTincho (Contributor, Author) commented Aug 30, 2015

No, I just meant this one; if it is already fixed post-0.15.1, then this can be closed.

fabxc (Member) commented Aug 31, 2015

To be clear, this is not necessarily a misconfiguration. If your federation frequency is higher than that of some federated time series, this warning would still be logged (in 0.15.1).
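For example, a setup like the following would still trigger the warning even without any duplicate targets, because three out of every four federation scrapes see an unchanged sample (two separate config files; intervals and names are made up, written in present-day syntax):

# On the leaf server: the 'node' series only get a new sample every 60s.
scrape_configs:
  - job_name: 'node'
    scrape_interval: 60s
    static_configs:
      - targets: ['leaf-host:9100']

# On the global server: federation runs every 15s, so most scrapes
# return the previous sample with exactly the same timestamp.
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
    static_configs:
      - targets: ['leaf-prometheus:9090']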

Just ping again if this is not resolved with the next version for some reason.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
