
Dashboards flatline and frequent WAL truncation #3489

Closed
smd1000 opened this Issue Nov 17, 2017 · 8 comments

smd1000 commented Nov 17, 2017

What did you do?
Viewing dashboards.

What did you expect to see?
Datapoints

What did you see instead? Under which circumstances?
Queries flatline. I also see frequent block compaction.

Environment
amzn-linux

  • System information:

Linux 4.9.58-18.51.amzn1.x86_64 x86_64

  • Prometheus version:

prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98)
build user: root@615b82cb36b6
build date: 20171108-07:11:59
go version: go1.9.2

  • Alertmanager version:

alertmanager, version 0.9.1 (branch: HEAD, revision: 9f5f4b2a516d35cfaf196530b277f1d109254569)
build user: root@3b87d661c3dd
build date: 20170929-12:59:03
go version: go1.9

  • Prometheus configuration file:
global:
  scrape_interval:     60s
  evaluation_interval: 60s

  external_labels:
      monitor: 'prometheus-monitor'

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
    scheme: http
    timeout: 10s

rule_files:
  - "rules/*"

scrape_configs:

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'aws'
    static_configs:
      - targets: ['localhost:9273']

  - job_name: 'app'
    static_configs:
      - targets: ['localhost:9126']

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']
  • Logs:
level=info ts=2017-11-14T20:08:57.186437172Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-11-14T20:08:57.186506925Z caller=main.go:216 build_context="(go=go1.9.2, user=root@615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-11-14T20:08:57.186536909Z caller=main.go:217 host_details="(Linux 4.9.58-18.51.amzn1.x86_64 #1 SMP Tue Oct 24 22:44:07 UTC 2017 x86_64 ip-10-124-2-209 (none))"
level=info ts=2017-11-14T20:08:57.189515259Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-11-14T20:08:57.18954558Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-11-14T20:08:57.189517046Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=warn ts=2017-11-14T20:09:05.440305942Z caller=head.go:317 component=tsdb msg="unknown series references in WAL samples" count=865
level=info ts=2017-11-14T20:09:05.465688773Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-11-14T20:09:05.46574615Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-11-14T20:09:05.46762271Z caller=main.go:371 msg="Server is ready to receive requests."
level=info ts=2017-11-14T21:00:01.454529363Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510682400000 maxt=1510689600000
level=info ts=2017-11-14T21:00:03.158424768Z caller=head.go:345 component=tsdb msg="head GC completed" duration=95.023716ms
level=info ts=2017-11-14T21:00:03.55238007Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=393.872988ms
level=info ts=2017-11-14T21:00:03.744823258Z caller=compact.go:361 component=tsdb msg="compact blocks" count=2 mint=1510639200000 maxt=1510682400000
level=info ts=2017-11-14T23:00:01.455737134Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510689600000 maxt=1510696800000
level=info ts=2017-11-14T23:00:04.313731712Z caller=head.go:345 component=tsdb msg="head GC completed" duration=185.897575ms
level=info ts=2017-11-14T23:00:04.972420021Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=658.592641ms
level=debug ts=2017-11-15T00:32:49.070495729Z caller=scrape.go:663 component="target manager" scrape_pool=newrelic target=http://localhost:9126/metrics msg="Scrape failed" err="context deadline exceeded"
level=info ts=2017-11-15T01:00:01.455823233Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510696800000 maxt=1510704000000
level=info ts=2017-11-15T01:00:05.279454176Z caller=head.go:345 component=tsdb msg="head GC completed" duration=138.282245ms
level=info ts=2017-11-15T01:00:05.791411398Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=511.871365ms
level=info ts=2017-11-15T01:00:06.094684591Z caller=compact.go:361 component=tsdb msg="compact blocks" count=3 mint=1510682400000 maxt=1510704000000
level=info ts=2017-11-15T03:00:01.454705402Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510704000000 maxt=1510711200000
level=info ts=2017-11-15T03:00:04.386439244Z caller=head.go:345 component=tsdb msg="head GC completed" duration=145.12478ms
level=info ts=2017-11-15T03:00:05.096143089Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=709.617418ms
level=info ts=2017-11-15T05:00:01.455064802Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510711200000 maxt=1510718400000
level=info ts=2017-11-15T05:00:04.589628187Z caller=head.go:345 component=tsdb msg="head GC completed" duration=156.182415ms
level=info ts=2017-11-15T05:00:05.420742151Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=831.028291ms
level=info ts=2017-11-15T07:00:01.454496331Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510718400000 maxt=1510725600000
level=info ts=2017-11-15T07:00:04.464419419Z caller=head.go:345 component=tsdb msg="head GC completed" duration=221.557482ms
level=info ts=2017-11-15T07:00:05.294324507Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=829.815786ms
level=info ts=2017-11-15T07:00:05.556352877Z caller=compact.go:361 component=tsdb msg="compact blocks" count=3 mint=1510704000000 maxt=1510725600000
level=info ts=2017-11-15T09:00:01.456681837Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510725600000 maxt=1510732800000
level=info ts=2017-11-15T09:00:05.1374666Z caller=head.go:345 component=tsdb msg="head GC completed" duration=150.822873ms
level=info ts=2017-11-15T09:00:05.702010324Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=564.45725ms
level=info ts=2017-11-15T11:00:01.455254659Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510732800000 maxt=1510740000000
level=info ts=2017-11-15T11:00:04.193177332Z caller=head.go:345 component=tsdb msg="head GC completed" duration=177.823958ms
level=info ts=2017-11-15T11:00:04.976260491Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=782.99052ms
level=info ts=2017-11-15T13:00:01.455945864Z caller=compact.go:361 component=tsdb msg="compact blocks" count=1 mint=1510740000000 maxt=1510747200000
level=info ts=2017-11-15T13:00:04.730959109Z caller=head.go:345 component=tsdb msg="head GC completed" duration=194.332063ms
level=info ts=2017-11-15T13:00:05.426657472Z caller=head.go:354 component=tsdb msg="WAL truncation completed" duration=695.610444ms

gouthamve commented Nov 17, 2017

Hi, I'm not sure what you mean by dashboards flatlining, but compactions are supposed to run every 2 hours and, judging from your logs, they are completely fine.

Could you check if the scrapes are successful and if the targets are returning the right data?
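
A couple of queries in the Prometheus expression browser can help confirm this. These use standard built-in metrics; the job label below is just an example taken from the pasted config:

# 1 if the last scrape of a target succeeded, 0 if it failed
up{job="app"}

# any targets currently failing their scrapes
up == 0

# samples returned per scrape; a sudden drop suggests the target
# stopped exposing (some of) its metrics
scrape_samples_scraped{job="app"}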


gouthamve commented Nov 17, 2017

Closing this as it doesn't look like an issue with Prometheus and looks more like a configuration error / usage question.
It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

Please feel free to re-open if you think this is a Prometheus bug.

gouthamve closed this Nov 17, 2017


smd1000 commented Nov 17, 2017

Does anything in the configuration I pasted above look incorrect? I rarely receive any scrape errors, and their times do not coincide with the flatlining. I posted this here because we had not previously seen behavior like this in Prometheus over the last 3-4 months.

Here's an attachment of what "flatlining" looks like.
[Screenshot attachment: screen shot 2017-11-17 at 9.59.14 AM]


anthu commented Nov 17, 2017

You can configure how "null" values are handled in Grafana.

To me it looks like your application is either not providing this metric for some time or returning the same value during that period (a metric-update issue on the app side, or no requests at all), and Grafana is simply connecting the datapoints according to your "null value handling" configuration.

What does the graph look like if you zoom into the "flat" time range?
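
One way to tell those two cases apart is in the Prometheus expression browser; the metric name below is only a placeholder for whatever is graphed on the flat panel:

# samples ingested per 5-minute window; if this drops to zero or the
# series disappears, Prometheus stopped receiving the metric
count_over_time(your_metric_name[5m])

# number of value changes over the last hour; 0 means the target kept
# reporting the same value the whole time
changes(your_metric_name[1h])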


smd1000 commented Nov 17, 2017

It looks like it's returning the same metric value. The null value handling is not set to "connected".


andrey-kozyrev commented Mar 5, 2018

Same problem for me. Flat lines appear for some time.
Logs:
pms_1 | level=debug ts=2018-03-05T13:53:42.30764359Z caller=scrape.go:676 component="scrape manager" scrape_pool=finagle target=http://akz.local:20001/admin/prometheusMetrics msg="Scrape failed" err="context deadline exceeded"
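
"context deadline exceeded" means the scrape did not finish within the scrape timeout (10s by default), so that scrape is dropped entirely and the series show gaps. If the endpoint is merely slow rather than down, raising the per-job timeout can help; a minimal sketch using the job name and target from the log line above:

scrape_configs:
  - job_name: 'finagle'
    scrape_timeout: 30s        # must not exceed scrape_interval
    metrics_path: /admin/prometheusMetrics
    static_configs:
      - targets: ['akz.local:20001']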


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
