Prometheus doesn't ignore lost+found directory when bind mounting to container /prometheus #953

Closed
hookenz opened this Issue Aug 3, 2015 · 8 comments

hookenz commented Aug 3, 2015

Hi, great project. I've just started using it and found an issue.

If you dedicate a whole filesystem to Prometheus data and bind-mount its root directory to /prometheus, Prometheus won't start.

To work around it, you have to remove lost+found first (fsck will recreate it later), or create a directory beneath the filesystem root and bind-mount from there.

For example:

~$ docker run --name prometheus -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml -v /var/lib/prometheus:/prometheus prom/prometheus
prometheus, version 0.15.1 (branch: stable, revision: 64349aa)
  build user:       @bfbdc5abbb9f
  build date:       20150727-15:58:57
  go version:       1.4.2
time="2015-08-03T05:51:44Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" file=main.go line=173 
time="2015-08-03T05:51:44Z" level=error msg="Error opening memory series storage: could not detect storage version on disk, assuming version 0, need version 1 - please wipe storage or run a version of Prometheus compatible with storage version 0" file=main.go line=116
~ $ ls -l /var/lib/prometheus
total 16
drwx------ 2 root root 16384 Aug  3 04:54 lost+found

~$ rmdir /var/lib/prometheus/lost+found
~$ docker run --name prometheus -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.conf  prom/prometheus
prometheus, version 0.15.1 (branch: stable, revision: 64349aa)
  build user:       @bfbdc5abbb9f
  build date:       20150727-15:58:57
  go version:       1.4.2
time="2015-08-03T05:48:49Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" file=main.go line=173 
time="2015-08-03T05:48:49Z" level=info msg="Loading series map and head chunks..." file=storage.go line=263 
time="2015-08-03T05:48:49Z" level=info msg="0 series loaded." file=storage.go line=268 
time="2015-08-03T05:48:49Z" level=info msg="Starting target manager..." file=targetmanager.go line=75 
time="2015-08-03T05:48:49Z" level=info msg="Listening on :9090" file=web.go line=186
...

I assume it expects an empty directory. But in some cases you might want to hand it a whole, otherwise empty filesystem. In my case this is a Ceph RBD volume, but it could equally be an NFS share or similar, I suppose.
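
The subdirectory workaround mentioned above could look roughly like this. It is only a sketch: the helper name `prepare_data_dir` and the `data` subdirectory name are assumptions, not something from this thread, and the `docker run` line is shown as a comment because it is just the earlier command with the volume path changed.

```shell
#!/bin/sh
# Sketch of the subdirectory workaround. The helper name and the "data"
# subdirectory are assumptions made for illustration.

# prepare_data_dir ROOT: create ROOT/data if needed and print its path.
prepare_data_dir() {
    root=$1
    mkdir -p "$root/data" || return 1
    printf '%s\n' "$root/data"
}

# Example (not executed here): bind-mount the subdirectory instead of the
# filesystem root, so lost+found stays outside the storage directory.
#
#   data_dir=$(prepare_data_dir /var/lib/prometheus)
#   docker run --name prometheus -p 9090:9090 \
#       -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
#       -v "$data_dir:/prometheus" prom/prometheus
```

With this layout lost+found still exists on the volume, but Prometheus never sees it.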

beorn7 self-assigned this Aug 3, 2015

Member

beorn7 commented Aug 3, 2015

Indeed. If Prometheus finds anything in the directory but doesn't find the marker files it needs to establish sanity, it will show the above behavior.

This conservative approach was taken because older versions did not have a VERSION file at all, so "anything but not a VERSION file == probably older storage version".

Not sure if we want to be less conservative here or require the directory to be truly empty. (You could use a subdirectory on the Ceph or NFS share in your case.)
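
To make the conservative rule concrete, here is a rough shell sketch of the decision as described above. It is not Prometheus code: the function name `storage_state` and the three labels it prints are made up for illustration, and only the VERSION file name comes from this thread.

```shell
#!/bin/sh
# Sketch of the conservative check; not the actual Prometheus logic.
# Function name and printed labels are hypothetical.

# storage_state DIR prints one of:
#   versioned  - a VERSION marker file exists
#   empty      - nothing at all in DIR
#   assume-v0  - DIR has content but no VERSION file
storage_state() {
    dir=$1
    if [ -e "$dir/VERSION" ]; then
        echo versioned
    elif [ -z "$(ls -A "$dir")" ]; then
        echo empty
    else
        echo assume-v0
    fi
}
```

A freshly formatted filesystem that contains only lost+found falls into the third case, which matches the error in the original report.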

Contributor

matthiasr commented Aug 3, 2015

I would vote to be less conservative and ignore anything that won't be used, i.e. if the marker doesn't exist, check whether anything in the directory would collide with the names Prometheus might use, and if nothing does, keep going.
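
As a sketch of what such a collision check might look like: the helper `has_collision` is hypothetical, and the caller has to supply the list of names, since the thread does not say which names the storage layer actually uses (`series_dir` in the example is a placeholder).

```shell
#!/bin/sh
# Sketch of a collision check; the helper name is hypothetical and the
# list of reserved names must come from the caller.

# has_collision DIR NAME...: succeed if any NAME already exists in DIR.
has_collision() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            return 0
        fi
    done
    return 1
}

# Example (placeholder names):
#   if has_collision /prometheus VERSION series_dir; then
#       echo "refusing to initialise: existing entries would be overwritten"
#   fi
```

Under this rule unrelated entries such as lost+found are simply ignored.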

Member

brian-brazil commented Aug 3, 2015

I think we should be strict on this (though an exception for lost+found would be reasonable), as a user may try to have multiple things share the storage directory, which would end badly and cause support effort for us.
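
The strict variant with a single exception could be sketched like this. Again this is only an illustration, and the function name `is_effectively_empty` is an assumption.

```shell
#!/bin/sh
# Sketch of the strict rule with a lost+found exception; the function
# name is hypothetical.

# is_effectively_empty DIR: succeed when DIR contains nothing except,
# optionally, an entry named exactly lost+found.
is_effectively_empty() {
    # -F: fixed string, -x: match the whole line, -v: invert the match
    [ -z "$(ls -A "$1" | grep -v -x -F 'lost+found')" ]
}
```

Any other stray entry still makes the check fail, so two applications sharing one directory would be caught.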

davidchua commented Mar 2, 2017

I'm having the same problem. I've deleted lost+found and I'm using Prometheus v1.3.0.

Basically I was running Prometheus on Kubernetes with /prometheus/data mounted from a persistent AzureDisk.

The AzureDisk ran out of space, so I brought it down, resized it and reattached it.

time="2017-03-02T10:00:09Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:247"
time="2017-03-02T10:00:09Z" level=error msg="Error opening memory series storage: could not detect storage version on disk, assuming version 0, need version 1 - please wipe storage or run a version of Prometheus compatible with storage version 0" source="main.go:181"

Member

beorn7 commented Mar 2, 2017

Please upgrade to 1.5.2. This has been fixed in the meantime.

davidchua commented Mar 10, 2017

@beorn7 I've upgraded to 1.5.2 and I'm getting the following error now:

time="2017-03-10T07:04:18Z" level=error msg="Error opening memory series storage: readdirent: input/output error" source="main.go:182"

Member

beorn7 commented Mar 10, 2017

It cannot read a directory. Perhaps a permission problem? This doesn't seem to be a problem related to the original issue, but a configuration problem on your side. It makes way more sense to discuss this problem on the users mailing list as others can help you there and benefit from any solution provided.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
