This repository has been archived by the owner on Apr 12, 2022. It is now read-only.

Elasticsearch 6: Failed to create node environment #187

Closed
csdev opened this issue Aug 3, 2018 · 14 comments

@csdev

csdev commented Aug 3, 2018

Bug Description

Suppose we have a data directory that is not owned by the elasticsearch user, so we set group permissions on it as per the documentation:

sudo mkdir esdatadir
sudo chmod g+rwx esdatadir
sudo chgrp 1000 esdatadir

Then we run the elasticsearch container:

docker run -v `pwd`/esdatadir:/usr/share/elasticsearch/data -v `pwd`/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:6.3.2

This results in the following error:

[2018-08-03T17:03:09,439][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.2.jar:6.3.2]
Caused by: java.lang.IllegalStateException: Failed to create node environment
        at org.elasticsearch.node.Node.<init>(Node.java:273) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.2.jar:6.3.2]
        ... 6 more
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:385) ~[?:?]
        at java.nio.file.Files.createDirectory(Files.java:682) ~[?:?]
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:789) ~[?:?]
        at java.nio.file.Files.createDirectories(Files.java:775) ~[?:?]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:203) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:270) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.2.jar:6.3.2]
        ... 6 more

However, this used to work correctly on elasticsearch 5. For example, this command runs successfully:

docker run -v `pwd`/esdatadir:/usr/share/elasticsearch/data -v `pwd`/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:5.6.10

It seems like the newer containers require something more than just group permissions on the data directory, so either the documentation is wrong or there is a bug somewhere.

Environment

$ docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.47-56.37.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 30.64GiB
Name:
ID: HMOB:C52U:DFYD:4SFE:Q27M:O5IV:7BM2:OZAB:Q27H:NOSI:ITT5:6AFZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
@dliappis
Contributor

dliappis commented Aug 4, 2018

I presume that the UID of the user you are issuing the commands as is not 1000? I was able to reproduce this by creating a local data dir and using something like sudo chown -R 1500:1000 esdatadir. Looking at the tests, it seems we are only testing the case with a random UID but GID 0, so this might be a genuine mistake/bug in the documentation. If you use chgrp -R 0 esdatadir it should work fine. chown -R 1000:1000 esdatadir will also work.
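
To see which of these cases a given directory falls into, a small sketch like the one below may help. It assumes GNU stat (the -c format option), and the helper name dir_ids is made up for illustration. It only prints the numeric owner, group, and permission bits, since those numbers are what the container process gets checked against.

```shell
# Hypothetical helper: print "UID:GID MODE" for a path, using numeric IDs.
# Assumes GNU coreutils stat (the -c option); BSD/macOS stat uses -f instead.
dir_ids() {
    stat -c '%u:%g %a' "$1"
}

# Example: inspect a scratch directory prepared roughly like the one in the
# report (created without sudo here, so the owner is the current user).
demo=$(mktemp -d)
chmod g+rwx "$demo"
dir_ids "$demo"
```

An output along the lines of 0:1000 775 would correspond to the failing setup from the report, while a group of 0 or an owner of 1000 should correspond to the two working variants mentioned above.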

@currycan

I hit the same problem when I mount the container's logs dir to a local directory, and I get the following:
es_node1 | OpenJDK 64-Bit Server VM warning: Cannot open file logs/gc.log due to Permission denied

@jarpy
Contributor

jarpy commented Aug 20, 2018

It appears that the entrypoint is not correctly setting the primary group of the user to 1000. We are using the --userspec feature of GNU chroot(8) to change the UID to 1000 before starting Elasticsearch. What we didn't notice in the past is that this does not automatically set the GID of the resulting process to that user's primary group:

$ docker run --rm -it docker.elastic.co/elasticsearch/elasticsearch:6.3.2 bash

[root@d92b5ede3193 elasticsearch]# chroot --userspec=1000 / id
uid=1000(elasticsearch) gid=0(root) groups=0(root)

We can, however, explicitly set the GID of the process using the same technique:

[root@d92b5ede3193 elasticsearch]# chroot --userspec=1000:1000 / id
uid=1000(elasticsearch) gid=1000(elasticsearch) groups=1000(elasticsearch)

I suggest we do this in the entrypoint.
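
For anyone trying to follow why the process GID matters here, the classic Unix permission check can be sketched as a small shell function. This is a simplified model and not the kernel's actual logic: the function name can_create_in is made up for illustration, and it ignores root's override, supplementary groups, ACLs, and security modules.

```shell
# Hypothetical helper (simplified model of the Unix permission check).
# Usage: can_create_in DIR_UID DIR_GID DIR_MODE PROC_UID PROC_GID
# Succeeds if a process with the given UID/GID could create an entry in a
# directory with the given numeric owner, group, and octal mode.
can_create_in() {
    dir_uid=$1
    dir_gid=$2
    mode=$((0$3))          # leading 0 makes the shell read the mode as octal
    proc_uid=$4
    proc_gid=$5

    if [ "$proc_uid" -eq "$dir_uid" ]; then
        bits=$(( (mode >> 6) & 7 ))   # owner bits apply
    elif [ "$proc_gid" -eq "$dir_gid" ]; then
        bits=$(( (mode >> 3) & 7 ))   # group bits apply
    else
        bits=$(( mode & 7 ))          # "other" bits apply
    fi

    # Creating an entry needs both write (2) and search/execute (1).
    [ $(( bits & 3 )) -eq 3 ]
}

# Directory from the report: owner 0 (root), group 1000, mode 775.
# Process as started with --userspec=1000: uid 1000, gid 0.
if can_create_in 0 1000 775 1000 0; then echo "allowed"; else echo "denied"; fi

# Same directory, process started with --userspec=1000:1000.
if can_create_in 0 1000 775 1000 1000; then echo "allowed"; else echo "denied"; fi
```

Under this model the first call falls through to the "other" bits (r-x) and is denied, while the second matches the directory's group and is allowed, which is consistent with the id output shown above.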

@jarpy
Contributor

jarpy commented Aug 20, 2018

Alternatively, now that we prefer GID 0 in some circumstances (OpenShift), it may be better to amend the docs and deprecate all use of, and references to, GID 1000. What do you think, @dliappis?

@jeremycod

jeremycod commented Nov 24, 2018

Hi @dliappis ,

My docker-compose looks like this:

elasticsearch:
     container_name: elasticsearch
     image: ${ES_VERSION_MANIFEST}
     environment:
          - LOGSPOUT=ignore
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
          - cluster.name=docker-prosolo-elasticsearch
          - bootstrap.memory_lock=true
     #command: -Des.node.name="Prosolo-elasticsearch"
     ulimits:
          memlock:
             soft: -1
             hard: -1
     networks:
          - localnet
     ports:
         - 9200:9200
         - 9300:9300
     volumes:
         - ./elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/elasticsearch.yml
         - elasticsearch_volume:/usr/share/elasticsearch/data
volumes:
  cassandra_volume:
    external:
      name: cassandra_data_volume_${VCS_BRANCH}
  mysql_volume:
    external:
      name: mysql_data_volume_${VCS_BRANCH}
  elasticsearch_volume:
    external:
      name: elasticsearch_data_volume_${VCS_BRANCH}

My local user ID is 1000, so the workaround of changing ownership to 1000:1000 for /usr/share/elasticsearch doesn't work. Is there any other solution?

Thanks

@jarpy
Contributor

jarpy commented Nov 26, 2018

Have you tried changing the group of the files to 0?

@jeremycod

> Have you tried changing the group of the files to 0?

You mean 1000:0 ? I just tried it, and it doesn't work.

@jarpy
Contributor

jarpy commented Nov 26, 2018

To be honest, I'm not sure I fully understand the behaviour you are seeing. This issue is about file permissions for the data directory, but it looks like you are using a named volume for the data dir. That should not have any problem. Can you expand a bit on what's not working for you?

@dliappis
Contributor

> Alternatively, now that we prefer GID 0 in some circumstances (OpenShift), it may be better to amend the docs and deprecate all use of, and references to, GID 1000. What do you think, @dliappis?

@jarpy sorry this took time to get back to. I vaguely remember that I was happy with the chroot --userspec=1000 / id default behavior but it's quite some time ago so I might be altogether wrong. I agree, in my opinion it's best to deprecate all uses and refs to gid 1000 in our docs.

@jeremycod

> To be honest, I'm not sure I fully understand the behaviour you are seeing. This issue is about file permissions for the data directory, but it looks like you are using a named volume for the data dir. That should not have any problem. Can you expand a bit on what's not working for you?

@jarpy I'm using a named volume for the data dir, but I see exactly the same exception as in the original post: the node environment cannot be created because of the AccessDeniedException on /usr/share/elasticsearch/data/nodes. I've manually created these directories and tried to change the owner of the elasticsearch directory and all its children. It doesn't work for me since I'm user 1000.

@jarpy
Contributor

jarpy commented Dec 3, 2018

I don't think that your user account being UID 1000 is a problem. My UID is usually 1000, too. That's only in the host UID namespace, however. Unless you have explicitly arranged otherwise, the container is running in its own UID namespace. It doesn't know about "you", only the "elasticsearch" user, which is UID 1000 in that namespace.

@jeremycod

> I don't think that your user account being UID 1000 is a problem. My UID is usually 1000, too. That's only in the host UID namespace, however. Unless you have explicitly arranged otherwise, the container is running in its own UID namespace. It doesn't know about "you", only the "elasticsearch" user, which is UID 1000 in that namespace.

I'm not sure how it is possible that the container is not aware of the host user. These files are on the host machine, and when I change ownership to 1000, it is assigned to my username. It doesn't make sense to me that a Docker container can override the ownership of files that live on the host machine, which is assigned by the host OS, and take ownership of them. I'm not a Docker expert, so I can't claim this for sure. It's just my thought.

@jarpy
Contributor

jarpy commented Dec 5, 2018

> I'm not sure how it is possible that container is not aware of host user.

It's all about Linux namespaces. These are really the essence of containers. The PID namespace is also really important, for example. That's how your process can think it's PID 1 inside the container, but can also be seen as some other number outside. UIDs within the container are completely separate from those on the host. That model does break down a little, however, if you share a filesystem between the host and the container. Then, you have to start worrying about matching up the UIDs. We recommend that people avoid sharing filesystems if possible and instead use storage that is dedicated to the container, like Docker named volumes (which you are doing).

If there are any existing files in those volumes, then it's best to make them UID:GID 1000:0. The presence of a host user with UID 1000 has no effect. Better still, just recreate the volumes and let Elasticsearch create the files it needs. I assume there is no data in them, since the process won't start?
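
If it helps, here is a rough sketch for spotting files that don't match that advice. The helper name list_mismatched is made up for illustration, and it assumes a find that supports the numeric -uid and -gid tests (GNU and BSD find both do). It only lists entries and changes nothing.

```shell
# Hypothetical helper: list entries under DIR whose numeric owner or group
# differs from the expected UID/GID. The defaults follow the advice above
# (UID 1000, GID 0); both can be overridden for testing.
# Usage: list_mismatched DIR [UID] [GID]
list_mismatched() {
    dir=$1
    want_uid=${2:-1000}
    want_gid=${3:-0}
    # Print anything that is NOT owned by want_uid or NOT in group want_gid.
    find "$dir" \( ! -uid "$want_uid" -o ! -gid "$want_gid" \) -print
}
```

Pointed at a data directory, empty output would suggest everything is already 1000:0; anything that is printed is a candidate for a chown, or a reason to recreate the volume as suggested.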

What is the exact error that you are seeing?

@rjernst
Member

rjernst commented Jun 25, 2019

This issue hasn't had any feedback in 6 months. Additionally, maintenance of the docker files for elasticsearch has moved to the elasticsearch repo. Please open any new bug reports there. As we will be archiving this repository, I am going to close this issue.

@rjernst rjernst closed this as completed Jun 25, 2019