ElasticSearch 1.6 fails to start with an empty/missing fstab #12018

Closed
peikk0 opened this issue Jul 3, 2015 · 27 comments
Labels
:Core/Infra/Core Core issues without another label

Comments

@peikk0

peikk0 commented Jul 3, 2015

Hi,

I'm running ES in a FreeBSD jail, so there is no fstab and no mount point visible. Since I upgraded ES to version 1.6, it no longer starts: it fails to obtain a lock, and the underlying error is "Mount point not found in fstab" (file permissions are OK).

[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] version[1.6.0], pid[42071], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-03 12:22:10,088][INFO ][node                     ] [Awesome Android] initializing ...
[2015-07-03 12:22:10,092][INFO ][plugins                  ] [Awesome Android] loaded [], sites []
[2015-07-03 12:22:10,133][ERROR][bootstrap                ] Exception
org.elasticsearch.ElasticsearchIllegalStateException: Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:158)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:162)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.io.IOException: failed to obtain lock on /var/db/elasticsearch/mon/nodes/49
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:145)
        ... 5 more
Caused by: java.io.IOException: Mount point not found in fstab
        at sun.nio.fs.BsdFileStore.findMountEntry(BsdFileStore.java:86)
        at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65)
        at sun.nio.fs.BsdFileStore.<init>(BsdFileStore.java:40)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:53)
        at sun.nio.fs.BsdFileSystemProvider.getFileStore(BsdFileSystemProvider.java:37)
        at sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
        at java.nio.file.Files.getFileStore(Files.java:1413)
        at org.elasticsearch.env.NodeEnvironment.getFileStore(NodeEnvironment.java:256)
        at org.elasticsearch.env.NodeEnvironment.access$000(NodeEnvironment.java:62)
        at org.elasticsearch.env.NodeEnvironment$NodePath.<init>(NodeEnvironment.java:75)
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:134)
        ... 5 more

Why does it even need to access /etc/fstab just to obtain a lock?

@clintongormley added the discuss and :Core/Infra/Core (Core issues without another label) labels Jul 5, 2015
@clintongormley

I think it looks up the filesystem type to determine whether the disk is an SSD or not, as that changes the default number of merge threads.

@mikemccand any ideas here?

@mikemccand
Contributor

For 1.x we use this for diagnostics (logging the filesystem type, mount point, and free space for each path.data on node init) and from JmxFsProbe (pulling "fs" node stats). This was done in #10502 and #10527 ...

In 2.0 we also log the spins detection (SSD or not). Currently ES does not set its merge scheduler defaults according to spins; rather, we always use aggressive settings (more than one merge thread). I'm not sure we should change to Lucene's defaults... that could be a sudden change for ES users upgrading.

Before, JmxFsProbe would only call Files.getFileStore API when you asked for fs stats (and sigar wasn't used), so you wouldn't hit this unless you pulled fs stats w/o sigar, but now we cache the FileStore on init instead.
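
For readers unfamiliar with the API named in the stack trace, here is a minimal sketch of the kind of lookup described above: resolving the `FileStore` behind a data path and reporting its type, name, and usable space. The class `FileStoreInfo` and its `describe` helper are hypothetical names for illustration, not Elasticsearch code; only `Files.getFileStore` and the `FileStore` accessors are real JDK API.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileStoreInfo {
    // Hypothetical helper: resolves the file store backing `path` and formats
    // the diagnostics mentioned above (store name, filesystem type, free space).
    // Files.getFileStore is the call that throws the IOException in the trace.
    public static String describe(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return String.format("path=%s store=%s type=%s usable_bytes=%d",
                path, store.name(), store.type(), store.getUsableSpace());
    }

    public static void main(String[] args) throws IOException {
        // Pass the data directory as the first argument; defaults to the
        // current directory.
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describe(dataPath));
    }
}
```

Running this inside the jail against the data directory should show whether the JVM can resolve the file store at all, without starting Elasticsearch.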

@clintongormley

@mikemccand so what should we do in this case (FreeBSD jail) where there is no fstab?

@mikemccand
Contributor

@peikk0 Can you configure your jail to include an fstab? I don't have much experience with FreeBSD jails, but from some quick googling it seems like this is possible, e.g. https://forums.freebsd.org/threads/jail-conf.34741/

@peikk0
Author

peikk0 commented Jul 7, 2015

Those fstab files are used by the host and are not visible inside the jail. Jails are not allowed to mount anything by default anyway and don't have access to devices; otherwise they could compromise the host and other jails.

Anyway, even if I create a fake fstab in the jail, it still won't start:

yavin ~ # mount
corellia/usr/jails/mon on / (zfs, local, noatime, nfsv4acls)
yavin ~ # cat /etc/fstab
corellia/usr/jails/mon / zfs noauto 0 0
yavin ~ # service elasticsearch console
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] version[1.6.0], pid[3165], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-07 17:19:02,750][INFO ][node                     ] [Mantra] initializing ...
[2015-07-07 17:19:02,754][INFO ][plugins                  ] [Mantra] loaded [], sites []
{1.6.0}: Initialization Failed ...
- ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/db/elasticsearch/mon]]
        IOException[failed to obtain lock on /var/db/elasticsearch/mon/nodes/49]
                IOException[Mount point not found in fstab]

IMO, a proper fix would be to catch the exception and use a safe default in this case.

@rmuir
Contributor

rmuir commented Jul 7, 2015

I don't think that's necessarily safe. It's abnormal that you cannot retrieve information from the filestore. I don't think we should hide the error condition.

@rmuir
Contributor

rmuir commented Jul 7, 2015

Also keep in mind that, I think, the filestore is used to know the amount of disk space.

@peikk0
Author

peikk0 commented Jul 7, 2015

/etc/fstab is not a reliable source anyway: it defines what should be mounted, not what actually is. How about simply using the output of mount, or something equivalent? (There is no /proc/mounts on FreeBSD either, or /proc/ at all.)
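
One way to check what the JVM itself can see, as opposed to what /etc/fstab declares, is to enumerate the file stores of the default file system. This is a diagnostic sketch only: `VisibleFileStores` is a hypothetical name, and how the JDK gathers this list is platform-specific.

```java
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.ArrayList;
import java.util.List;

public class VisibleFileStores {
    // Returns one "name (type)" line per file store the JVM reports for the
    // default file system. The set of entries depends on the platform and on
    // what the process is allowed to see.
    public static List<String> list() {
        List<String> lines = new ArrayList<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            lines.add(store.name() + " (" + store.type() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        list().forEach(System.out::println);
    }
}
```

If the store backing the data path is missing from this list when the program runs inside the jail or chroot, that suggests the JVM cannot see the mount, which is the situation this issue describes.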

@rmuir
Contributor

rmuir commented Jul 7, 2015

Those are issues to take up with the BSD port of OpenJDK, IMO. We are just using the only way in Java to do it, and that's to call Files.getFileStore.

@clintongormley

It sounds like this configuration can't be supported - we need access to this info, and we rely on Java to provide it.

Closing

@djneades

djneades commented Oct 9, 2015

I just ran into this problem – it’s exceedingly unhelpful behaviour for those of us wishing to jail elasticsearch.

@rmuir
Contributor

rmuir commented Oct 9, 2015

You need to open issues with Oracle about it. There is nothing we can do.

@peikk0
Author

peikk0 commented Oct 9, 2015

I've read it works fine with OpenJDK 8, though I haven't tried it yet.

@djneades

djneades commented Oct 9, 2015

@peikk0, thank you for the pointer. Unfortunately, that doesn’t seem to be the case, at least in my configuration (I have a data directory nullfs mounted into the jail’s directory hierarchy). I have OpenJDK8 installed (and uninstalled OpenJDK7 just to make sure that JDK 8 is being used), but I still encounter the problem.

@djneades

djneades commented Oct 9, 2015

Having investigated this a little further, it seems that setting the jail’s enforce_statfs property to 1 allows Elasticsearch to obtain the information it needs (at least when running with OpenJDK 8). From the jail(8) man page:

enforce_statfs
             This determines what information processes in a jail are able to
             get about mount points.  It affects the behaviour of the following
             syscalls: statfs(2), fstatfs(2), getfsstat(2), and fhstatfs(2) (as
             well as similar compatibility syscalls).  When set to 0, all mount
             points are available without any restrictions.  When set to 1, only
             mount points below the jail's chroot directory are visible.  In
             addition to that, the path to the jail's chroot directory is
             removed from the front of their pathnames.  When set to 2
             (default), above syscalls can operate only on a mount-point where
             the jail's chroot directory is located.

Perhaps this information will be useful to anyone else who encounters this issue.

@peikk0
Author

peikk0 commented Oct 9, 2015

Good catch! I just tried it and it works with OpenJDK 7 too!

@rmuir
Contributor

rmuir commented Oct 9, 2015

Nice solution! I think we should return this information in the error if we hit an exception trying to pull the filestores on FreeBSD. I will take care of it.
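
The idea of a platform-specific hint could be sketched roughly as follows. This is an illustration only, not the actual Elasticsearch change: the class `FileStoreHint`, the `withHint` helper, and the message wording are all hypothetical. The lookup still fails; the exception just carries a pointer to the jail parameter.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileStoreHint {
    // Hypothetical wrapper around the real JDK call Files.getFileStore.
    // On failure it rethrows, adding a FreeBSD-specific hint where relevant.
    public static FileStore getFileStore(Path path) throws IOException {
        try {
            return Files.getFileStore(path);
        } catch (IOException e) {
            throw withHint(e, System.getProperty("os.name"));
        }
    }

    // Returns a more descriptive exception on FreeBSD, keeping the original
    // as the cause; on other platforms the original is returned unchanged.
    public static IOException withHint(IOException cause, String osName) {
        if (osName != null && osName.startsWith("FreeBSD")) {
            return new IOException(
                "Unable to retrieve mount point data; if running inside a "
                + "FreeBSD jail, try setting enforce_statfs=1", cause);
        }
        return cause;
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(getFileStore(dataPath));
    }
}
```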

rmuir self-assigned this Oct 9, 2015
@djneades

djneades commented Oct 9, 2015

@peikk0, thank you for the confirmation, that’s good to know.

@rmuir, an informational message would no doubt be very helpful, good call!

rmuir added a commit to rmuir/elasticsearch that referenced this issue Oct 15, 2015
…e_statfs=1

We can't track disk usage in this situation, failing is the correct thing to do.
But we can give a FreeBSD-specific error message, so the user can set the
necessary jail parameters, versus a vague IOException.

Closes elastic#12018
rmuir added a commit that referenced this issue Oct 24, 2015
…e_statfs=1

We can't track disk usage in this situation, failing is the correct thing to do.
But we can give a FreeBSD-specific error message, so the user can set the
necessary jail parameters, versus a vague IOException.

Closes #12018
@marcoc610

I have experienced the same problem on Linux (so no jails, no chroot) when path.data is in a Btrfs subvolume that is not mounted.
Actually, I think it is more of a Java issue, but I wondered whether there is a workaround for Btrfs subvolumes too. There is one: mounting the subvolumes explicitly solved the issue for me.

@hydrapolic

This is still an issue on Linux.

@remram44

remram44 commented Mar 9, 2017

I am running into this on Linux. I don't think it is right for elasticsearch to go through that much magic to determine whether locking should work. At the very least, I'd expect it to try to lock rather than exit with the error "Failed to obtain node lock" even though the conditions for locking are right.
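
In the stack trace at the top of this issue, the IOException from the file store lookup is what ends up wrapped as "failed to obtain lock". To tell the two concerns apart, a standalone probe can check whether plain file locking works in the data directory, independent of any file store lookup. This sketch uses the JDK's `FileChannel.tryLock`; the class name `LockProbe` and the file name `probe.lock` are made up for illustration and are not taken from Elasticsearch.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LockProbe {
    // Returns true if an exclusive lock on <dir>/probe.lock could be
    // acquired, false if another process already holds it. An IOException
    // means the file could not be created or locking is unsupported there.
    public static boolean canLock(Path dir) throws IOException {
        Path lockFile = dir.resolve("probe.lock");
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // lock is held by another process
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("lockable: " + canLock(dir));
    }
}
```

If this probe succeeds in the data directory while Elasticsearch still reports a node lock failure, the file store lookup rather than the lock itself is the likely culprit.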

@dakrone
Member

dakrone commented Mar 9, 2017

@remram44 are you running this on Linux with no virtualization? Without an /etc/fstab file? Maybe you can give us more information about your setup.

@remram44

remram44 commented Mar 9, 2017

I'm running from a chroot, so there is no entry in /proc/mounts for /. Adding an fstab with fake info gives ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/lib/elasticsearch/elasticsearch]]

@remram44

remram44 commented Mar 9, 2017

If I move the directory I chroot into, create an empty directory where it was, and mount the new location onto the old one with mount -o bind, I can get elasticsearch to start. Tell me again why you need elasticsearch's root to be its own filesystem? Surely if the rest of the world can lock files without it, elasticsearch can manage? Why add so much magic when it will only break things?

@remram44

remram44 commented Mar 9, 2017

strace log

@samimb

samimb commented May 14, 2018

Was this ever solved? I'm trying to set up Elasticsearch in an iocage jail with Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 1.8.0_162

The iocage jail is running directly on an SSD, with /var/db/elasticsearch mounted as nullfs from the host on a mechanical drive.

@renormalist

With elasticsearch 7.3.1, I still need to apply the bind-mount workaround from @remram44 above.
