
No chunk files generated and retention doesn't work #5484

Closed
saiharshitachava opened this Issue Apr 18, 2019 · 2 comments

saiharshitachava commented Apr 18, 2019

**What did you do?** I deployed Prometheus to collect data from a Docker Swarm of 80 nodes and set the retention time to 30m, but it didn't work and filled up the disk.

**What did you expect to see?** I expected Prometheus to create the TSDB structure (chunks, meta.json, etc.).

**What did you see instead? Under which circumstances?**
Only the lock file and the WAL are generated, and retention doesn't work.

Environment

  • System information:

    Linux 4.4.155-94.50-default x86_64

  • Prometheus version:

    /prometheus $ prometheus --version
    prometheus, version 2.9.1 (branch: HEAD, revision: ad71f27)
    build user: root@09f919068df4
    build date: 20190416-17:50:04
    go version: go1.12.4

  • Alertmanager version:

    Not using it yet.

  • Prometheus configuration file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'nas-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:

  - job_name: 'node-exporter_nas'
    dns_sd_configs:
    - names:
      - 'tasks.nodeexporter_nas'
      type: 'A'
      port: 9100


  - job_name: 'cadvisor_nas'
    dns_sd_configs:
    - names:
      - 'tasks.cadvisor_nas'
      type: 'A'
      port: 8080

  - job_name: 'prometheus_nas'
    dns_sd_configs:
    - names:
      - 'tasks.prometheus_nas'
      type: 'A'
      port: 9090

  • Alertmanager configuration file:
None
  • Logs: Nothing else; on a successful run we should see checkpoint and compaction messages, but I don't see anything:
level=info ts=2019-04-18T14:58:09.442Z caller=main.go:321 msg="Starting Prometheus" version="(version=2.9.1, branch=HEAD, revision=ad71f2785fc321092948e33706b04f3150eee44f)"
level=info ts=2019-04-18T14:58:09.442Z caller=main.go:322 build_context="(go=go1.12.4, user=root@09f919068df4, date=20190416-17:50:04)"
level=info ts=2019-04-18T14:58:09.442Z caller=main.go:323 host_details="(Linux 4.4.155-94.50-default #1 SMP Tue Sep 11 13:04:00 UTC 2018 (bc8c7c0) x86_64 387b947e640b (none))"
level=info ts=2019-04-18T14:58:09.442Z caller=main.go:324 fd_limits="(soft=65536, hard=65536)"
level=info ts=2019-04-18T14:58:09.443Z caller=main.go:325 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-04-18T14:58:09.444Z caller=main.go:640 msg="Starting TSDB ..."
level=info ts=2019-04-18T14:58:09.444Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-04-18T14:58:09.452Z caller=main.go:655 msg="TSDB started"
level=info ts=2019-04-18T14:58:09.452Z caller=main.go:724 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2019-04-18T14:58:09.453Z caller=main.go:751 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2019-04-18T14:58:09.454Z caller=main.go:609 msg="Server is ready to receive web requests."


saiharshitachava commented Apr 19, 2019

Any help here would be appreciated; I am wondering what the issue is. It works like a charm in the other cluster but fails here. Could this be system-related?


simonpasquier commented Apr 19, 2019

If you're limited by disk space, you might try the --storage.tsdb.retention.size flag (note that it is experimental and may change in the future), but in any case Prometheus needs some headroom to store the WAL directory.
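For reference, retention is configured via command-line flags rather than in prometheus.yml. A minimal sketch of how both retention flags could be passed when starting Prometheus 2.9 (the paths and the 10GB value are illustrative assumptions, not taken from this deployment):

```shell
# Time-based retention deletes TSDB blocks older than the given duration.
# Note: by default Prometheus only cuts blocks from the head every 2 hours,
# so with a 30m retention the WAL still grows until the first block is
# persisted and compaction can start deleting data.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.time=30m \
  --storage.tsdb.retention.size=10GB
```

The size-based flag caps the total bytes used by persisted blocks; it does not include the WAL, which is why some extra disk headroom is still needed.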

I'm closing it for now. If you have further questions, please use our user mailing list, which you can also search.
