
Data corruption using Prometheus Docker v2.0.0-alpha.2 image on NFS #2805

Closed
bvis opened this issue Jun 4, 2017 · 27 comments

bvis commented Jun 4, 2017

What did you do?
I started a Prometheus task in a Docker Swarm cluster with the following config:

prometheus:
    image: prom/prometheus:v2.0.0-alpha.2
    ports:
      - "9090:9090"
    networks:
      - monitoring
    volumes:
      - prometheus:/prometheus
    command: -config.file=/run/secrets/prometheus.yml -web.external-url=http://prometheus2.${CLUSTER_DOMAIN}
    secrets:
      - source: "monitoring_prometheus.yml"
        target: "prometheus.yml"
        uid: "0"
        gid: "0"
        mode: 0400
      - source: "monitoring_alert.rules_nodes"
        target: "alert.rules_nodes"
        uid: "0"
        gid: "0"
        mode: 0400
      - source: "monitoring_alert.rules_service-groups"
        target: "alert.rules_service-groups"
        uid: "0"
        gid: "0"
        mode: 0400
      - source: "monitoring_alert.rules_tasks"
        target: "alert.rules_tasks"
        uid: "0"
        gid: "0"
        mode: 0400
    deploy:
      mode: replicated
      replicas: 1
      resources:
        limits:
          cpus: '0.50'
          memory: 1024M
        reservations:
          cpus: '0.50'
          memory: 128M

These "secrets" are just the configuration file and alert rule files used by Prometheus.

What did you expect to see?

I expected to see it working. 0:)

What did you see instead? Under which circumstances?

I found that the service was not running (since it was running as a "beta" service, I didn't have any monitoring on it), and when checking the logs I saw:

"time="2017-06-04T22:15:10Z" level=error msg="Opening storage failed: read meta information data/01BHMHDEMCSNG02KJKPC8M5YH0: open data/01BHMHDEMCSNG02KJKPC8M5YH0/meta.json: no such file or directory" source="main.go:89""

Environment

It's running in a Swarm cluster in AWS (eu-west-1) across 3 different AZs.
It's running as a Swarm service with 1 task.
The data is stored in an EFS filesystem using the rexray/efs plugin.

  • System information:

Linux 4.10.0-21-generic x86_64

  • Prometheus version:
prometheus, version 2.0.0-alpha.2 (branch: master, revision: ab0ce4a8d9858956b37b545cfb84bb4edb5d7776)
  build user:       root@fda0efffe2cf
  build date:       20170524-15:34:23
  go version:       go1.8.1
  • Prometheus configuration file:
global:
  scrape_interval:     30s
  evaluation_interval: 30s

  labels:
    cluster: swarm
    replica: "1"

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus'

rule_files:
  - "alert.rules_nodes"
  - "alert.rules_tasks"
  - "alert.rules_service-groups"

scrape_configs:
  - job_name: 'prometheus'
    dns_sd_configs:
    - names:
      - 'tasks.prometheus'
      type: 'A'
      port: 9090

  - job_name: 'cadvisor'
    dns_sd_configs:
    - names:
      - 'tasks.cadvisor'
      type: 'A'
      port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.node-exporter'
      type: 'A'
      port: 9100

  - job_name: 'docker-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.docker-exporter'
      type: 'A'
      port: 4999

This means I'm using Swarm's built-in DNS service discovery to autodiscover the task endpoints for Prometheus to consume, something I've been doing with previous versions.
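
For anyone debugging this kind of setup, here is a minimal sketch for checking what Swarm's DNS actually answers, assuming it is run from a container attached to the same monitoring overlay network and using the service names from the config above (adjust them for your stack). Each name should resolve to one A record per running task, which is exactly what the dns_sd_configs rely on.

package main

import (
    "fmt"
    "net"
)

func main() {
    // Service names are taken from the scrape config above; adjust for your stack.
    services := []string{"tasks.prometheus", "tasks.cadvisor", "tasks.node-exporter", "tasks.docker-exporter"}
    for _, name := range services {
        addrs, err := net.LookupHost(name)
        if err != nil {
            fmt.Printf("%s: lookup failed: %v\n", name, err)
            continue
        }
        // One address per running task; these are the endpoints Prometheus will scrape.
        fmt.Printf("%s -> %v\n", name, addrs)
    }
}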

cornelf commented Jun 5, 2017

Confirmed bug for my use case as well: ECS with EFS-backed storage.

level=error msg="Opening storage failed: read meta information /prometheus/01BHMHCQ64M68EATD9YAZTAH29: open /prometheus/01BHMHCQ64M68EATD9YAZTAH29/meta.json: no such file or directory" source="main.go:89"
brian-brazil (Member) commented Jun 5, 2017

Does the same happen without EFS?

We strongly recommend not using NFS or other networked filesystems.

cornelf commented Jun 5, 2017

@brian-brazil thanks for the amazingly quick feedback.

Is there any chance that Prometheus 2 will (eventually) support network-attached storage? Or is the goal only to support SAN rather than NAS, making EBS the recommended storage solution for the Prometheus time-series DB?
We took this wishful path based on the blog post prometheus-2.0-storage-layer-optimization, but by "network volumes" Fabian may have been referring to network block devices only.

brian-brazil (Member) commented Jun 5, 2017

We support working POSIX filesystems, and recommend they be local for reliability and performance.

NFS is not known for being a working POSIX filesystem.

cornelf commented Jun 5, 2017

@les FYI

fabxc (Member) commented Jun 5, 2017

cornelf commented Jun 5, 2017

I switched the alpha-2 instance to EBS and will let you know whether the issue is still there.

fabxc (Member) commented Jun 6, 2017

@gouthamve do you have any hunch what could possibly cause us to not write (or even delete) a meta.json file? The code seems super straightforward in that regard.

fabxc modified the milestone: v2.x Jun 6, 2017

fabxc added the dev-2.0 label Jun 6, 2017

gouthamve (Member) commented Jun 12, 2017

I think the issue is only with EFS, but EFS claims to be POSIX compliant. There are some docs on its consistency guarantees: http://docs.aws.amazon.com/efs/latest/ug/using-fs.html#consistency, but I am not able to understand why meta.json is not being written.

Maybe the changes are not being written because we are not closing the directory (ref: prometheus/tsdb#96).
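
For context on what "not closing the directory" would mean in practice, here is a minimal illustrative sketch of the directory-durability pattern that hypothesis points at; it is not the actual tsdb code, and it does not claim that prometheus/tsdb#96 does exactly this. On a POSIX filesystem, a newly created or renamed file's directory entry is only guaranteed to survive a crash once the parent directory itself has been synced, and NFS/EFS weaken the surrounding guarantees further.

package main

import (
    "log"
    "os"
)

// syncDir fsyncs a directory so that entries created or renamed inside it
// (for example a freshly written meta.json) become durable.
func syncDir(dir string) error {
    d, err := os.Open(dir)
    if err != nil {
        return err
    }
    defer d.Close()
    return d.Sync()
}

func main() {
    // "." stands in for a block directory; illustration only.
    if err := syncDir("."); err != nil {
        log.Fatal(err)
    }
}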

fabxc (Member) commented Jun 12, 2017

Interesting possibility. Let's see if this keeps getting reported after a release with the fix.

stp-ip commented Aug 25, 2017

Seems like we are running into the same issue on an NFS v4 mount.

cainelli commented Sep 14, 2017

Apparently, we ran into this issue with EFS and version v2.0.0-beta.0.

sbadakhc commented Sep 23, 2017

Seeing this also using an NFS mount. A fix would be appreciated!

brian-brazil (Member) commented Sep 23, 2017

Per the above, the issue here is NFS, and thus the fix is not to use NFS.

sbadakhc commented Sep 24, 2017

Fair play, but unfortunately I don't have a choice. What I did was ramp up the sampling interval to 1 min, and it has now been running longer, although for how long I'm not sure.

Thanks.

saily commented Sep 25, 2017

I have the same issue when running prom/prometheus:v2.0.0-beta.5 on Kubernetes with NFS as data storage. How do you guys recommend running Prometheus on Kubernetes?

brian-brazil changed the title from "Data corruption using Prometheus Docker v2.0.0-alpha.2 image" to "Data corruption using Prometheus Docker v2.0.0-alpha.2 image on NFS" Oct 3, 2017

sbadakhc commented Oct 10, 2017

Is there a way to get Prometheus to send data via a network socket rather than write to disk? If I could stream the data via a network socket, I could have it written to non-NFS storage and pick it up from there.

adrissss commented Oct 17, 2017

It happened to us with v2.0.0-rc.0 + Docker Swarm + EFS volume.

Since then, Prometheus won't restart anymore.
I tried deleting the most recent metrics directory, which contained an empty meta.json, but then I get this on startup:

level=info ts=2017-10-17T16:13:18.1300919Z caller=main.go:213 msg="Starting prometheus" version="(version=2.0.0-rc.0, branch=HEAD, revision=012e52e3f9a0c1741b160498615a29bd5d723028)"
level=info ts=2017-10-17T16:13:18.130276035Z caller=main.go:214 build_context="(go=go1.9.1, user=root@d94ce23e8b9b, date=20171005-14:42:30)"
level=info ts=2017-10-17T16:13:18.130340863Z caller=main.go:215 host_details="(Linux 4.9.36-moby #1 SMP Tue Jul 11 02:00:07 UTC 2017 x86_64 f23d8f1a0fd5 (none))"
level=info ts=2017-10-17T16:13:18.136561486Z caller=main.go:307 msg="Starting TSDB"
level=info ts=2017-10-17T16:13:18.136571317Z caller=web.go:378 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-10-17T16:13:18.138752945Z caller=targetmanager.go:68 component="target manager" msg="Starting target manager..."

And it stays there for hours, with no other error message. A Grafana query while in this state gives:

{
  "status": "error",
  "errorType": "execution",
  "error": "TSDB not ready",
  "message": "TSDB not ready"
}

Any suggestions to make it start again without losing all the stored metrics?
Thanks!

grobie (Member) commented Oct 17, 2017

@adrissss This just looks like you ran into the WAL read issue. At SoundCloud we saw startup times of over 90 minutes on our busy servers. Fabian has improved the WAL read times drastically, and the same server now boots in 2 minutes. All these changes are part of rc.1; please update and report back.

adrissss commented Oct 17, 2017

Thanks for your reply, @grobie. It could be reading the WAL as you said, as I see 15-25% of CPU time in I/O. I've updated to 2.0.0-rc.1, and initially everything seemed exactly the same, but after about 25 minutes the Prometheus container just died. So I'd say that whatever it was doing before, it is now doing it faster.

Actually, I've seen that its memory usage grows continuously until it reaches the limit I set for the container:

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM % 
3bb397bb43c8        7.42%               5.706GiB / 6.348GiB   89.89%

And it dies when it reaches exactly those 6.348 GiB (on an 8 GB node hosting just this Prometheus container). The whole process then starts again, as the service is configured to restart on failure:

ID                  NAME                          IMAGE                         NODE                                        DESIRED STATE       CURRENT STATE              ERROR                         PORTS
rndh682d15jy        monitoring_prometheus.1       prom/prometheus:v2.0.0-rc.1   ip-172-31-3-23.eu-west-1.compute.internal   Running             Running 3 minutes ago
p7kfbe0m64p4         \_ monitoring_prometheus.1   prom/prometheus:v2.0.0-rc.1   ip-172-31-3-23.eu-west-1.compute.internal   Shutdown            Failed 3 minutes ago       "task: non-zero exit (137)"
n12uie1shrmo         \_ monitoring_prometheus.1   prom/prometheus:v2.0.0-rc.1   ip-172-31-3-23.eu-west-1.compute.internal   Shutdown            Failed 26 minutes ago      "task: non-zero exit (137)"

Any thoughts?
Cheers!

fabxc (Member) commented Oct 17, 2017

That indeed sounds like an OOM kill. Prometheus mmaps all storage data, so externally it appears to just keep using more and more memory. However, the operating system can take this memory back immediately as soon as another process needs it, without killing Prometheus.

Running containers with memory limits in Kubernetes confirmed this behavior works as expected, i.e. the memory occupied by mmap'd data is capped by the container's limits without killing the process. This should not differ here in general.

Could you monitor the Prometheus instance itself and tell us what the go_memstats_heap_inuse_bytes metric shows? If this one keeps growing as well, it indicates another problem. Also, please share some details on the size of the Prometheus instance.
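
For anyone who wants to check that metric quickly without a second Prometheus scraping this one, here is a minimal sketch against the HTTP API, assuming the instance is reachable on localhost:9090 (adjust the URL for your deployment). It simply prints the raw JSON answer for go_memstats_heap_inuse_bytes.

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "net/url"
)

func main() {
    // Query the instant value of the heap-in-use metric via the v1 query API.
    q := url.QueryEscape("go_memstats_heap_inuse_bytes")
    resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    // A value that only ever grows (unlike the mmap-driven RSS) would point
    // at a real heap problem rather than page-cache accounting.
    fmt.Println(string(body))
}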

haraldschilly commented Nov 7, 2017

Hi, we're running 2.0.0-rc.3 and I got here because I just saw this:

level=error ts=... caller=main.go:323 msg="Opening storage failed" err="read meta information /data/01....: unexpected end of JSON input

I don't know what happened, but yes, it could be OOM-related. We're not running this on NFS. I'll look at what the go_memstats_heap_inuse_bytes metric says.

Edit: in case someone is curious, the meta.json file was completely empty.

haraldschilly commented Nov 7, 2017

Well, go_memstats_heap_inuse_bytes has been bouncing up and down in the range of 2 to 4 GB over the last 7 days.

fabxc (Member) commented Nov 12, 2017

The memory behavior sounds normal. I cannot put my finger on your error right now. We write the meta.json file in one go and only make the block visible after that has completed. Theoretically, even a crash should not cause this.

However, we made some fixes that should have addressed the NFS-related issues. Since you are not running NFS, I'll close this here. Please open a new issue if you encounter it again.
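
For readers following along, the "written in one go and only made visible after that" behaviour fabxc describes is the usual write-to-temp-then-rename pattern; the following is a minimal illustrative sketch of that pattern, not the actual tsdb implementation. The final name only appears once the contents are fully written and synced, so a crash should leave either the old file or no file, never a truncated one; the rename step is exactly the kind of guarantee NFS is weak on.

package main

import (
    "log"
    "os"
    "path/filepath"
)

// atomicWriteFile writes data to a temporary file, syncs it, and then renames
// it to its final name so readers never observe a partially written file.
func atomicWriteFile(dir, name string, data []byte) error {
    tmp := filepath.Join(dir, name+".tmp")
    f, err := os.Create(tmp)
    if err != nil {
        return err
    }
    if _, err := f.Write(data); err != nil {
        f.Close()
        os.Remove(tmp)
        return err
    }
    if err := f.Sync(); err != nil { // make the temp file's contents durable
        f.Close()
        os.Remove(tmp)
        return err
    }
    if err := f.Close(); err != nil {
        os.Remove(tmp)
        return err
    }
    // The rename is the atomic "publish" step on a POSIX filesystem.
    return os.Rename(tmp, filepath.Join(dir, name))
}

func main() {
    // "." stands in for a block directory; illustration only.
    if err := atomicWriteFile(".", "meta.json", []byte(`{"version":1}`)); err != nil {
        log.Fatal(err)
    }
}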

fabxc closed this Nov 12, 2017

auhlig commented Nov 30, 2017

Happened with prometheus:v2.0.0 using NFS:

level=info ts=2017-11-30T13:53:15.774449669Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-11-30T13:53:15.774567774Z caller=main.go:216 build_context="(go=go1.9.2, user=root@615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-11-30T13:53:15.774584415Z caller=main.go:217 host_details="(Linux 4.13.9-coreos #1 SMP Thu Oct 26 03:21:00 UTC 2017 x86_64 prometheus-frontend-4217608546-6mkiw (none))"
level=info ts=2017-11-30T13:53:15.77544454Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-11-30T13:53:15.776060323Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-11-30T13:53:15.776080166Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=error ts=2017-11-30T13:53:16.931485157Z caller=main.go:323 msg="Opening storage failed" err="read meta information /prometheus/01BZEP0ZW3E3W53ASV0K62VCH4: open /prometheus/01BZEP0ZW3E3W53ASV0K62VCH4/meta.json: no such file or directory"

Config can be found here.

Happened twice. One incident was related to a node reboot, which might have caused an ungraceful shutdown of Prometheus. Today it happened again without any obvious reason. I had to delete the folder to get it working again. Memory usage is consistent.

EDIT: Happened again with 2 separate Prometheus instances. Out of nowhere. Memory usage consistent. Nothing obvious.

Any help would be much appreciated @fabxc.

anguslees commented Feb 13, 2018

I can confirm that Prometheus 2.1.0 still chokes on these directories.

FWIW, the issue is caused/exacerbated by NFS's handling of deleted-while-open files.

level=error ts=2018-02-13T11:15:45.089150199Z caller=main.go:579 err="Opening storage failed read meta information /prometheus/01C3F05JJQFGNVC9E9N028SVP3: open /prometheus/01C3F05JJQFGNVC9E9N028SVP3/meta.json: no such file or directory"
core@localhost ~ $ sudo find /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c/01C3F05JJQFGNVC9E9N028SVP3 -ls      
33033501    4 drwxr-xr-x   3 nobody   nobody       4096 Feb  8 15:30 /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c/01C3F05JJQFGNVC9E9N028SVP3
33033502    4 drwxr-xr-x   2 nobody   nobody       4096 Feb  8 15:30 /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c/01C3F05JJQFGNVC9E9N028SVP3/chunks
33033504 58896 -rw-r--r--   1 nobody   nobody   60307522 Jan 10 03:00 /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c/01C3F05JJQFGNVC9E9N028SVP3/chunks/.nfs0000000001f80d2000000001
33033503 8160 -rw-r--r--   1 nobody   nobody    8352682 Jan 10 03:00 /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c/01C3F05JJQFGNVC9E9N028SVP3/.nfs0000000001f80d1f00000002

If it matters at all, these are nfs4 mounts:

core@localhost ~ $ grep nfs /proc/mounts
192.168.0.10:/home/kube/monitoring-prometheus-data-pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c /var/lib/kubelet/pods/ecb68feb-10ae-11e8-b1d0-02500b02b531/volumes/kubernetes.io~nfs/pvc-35d43d9d-f4ec-11e7-a8b1-02120902b07c nfs4 rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.127,local_lock=none,addr=192.168.0.10 0 0

I need to think further about what's causing these to be created, but I suspect Prometheus' block loading should ignore these "empty" directories rather than getting all excited and aborting; something like the sketch below.
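
As an illustration of that kind of tolerant scan (not Prometheus' actual block-loading code), the sketch below lists block directories under the data dir and skips, with a warning, any that have no readable meta.json instead of aborting startup; /prometheus is assumed from the container paths in the logs above.

package main

import (
    "fmt"
    "log"
    "os"
    "path/filepath"
)

// usableBlocks returns the block directories under dataDir that have a
// readable meta.json, warning about (and skipping) the rest.
func usableBlocks(dataDir string) ([]string, error) {
    entries, err := os.ReadDir(dataDir)
    if err != nil {
        return nil, err
    }
    var blocks []string
    for _, e := range entries {
        if !e.IsDir() {
            continue
        }
        metaPath := filepath.Join(dataDir, e.Name(), "meta.json")
        if _, err := os.Stat(metaPath); err != nil {
            // Likely a leftover from an interrupted write or an NFS
            // silly-rename (.nfsXXXX) situation: warn and move on.
            log.Printf("skipping block %s: %v", e.Name(), err)
            continue
        }
        blocks = append(blocks, filepath.Join(dataDir, e.Name()))
    }
    return blocks, nil
}

func main() {
    blocks, err := usableBlocks("/prometheus")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("loadable blocks:", blocks)
}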

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
