Service failing to start RHEL 7 #77053

Closed
passdev-sc opened this issue Aug 31, 2021 · 20 comments · Fixed by #80651
Labels
>bug :Core/Infra/Core Core issues without another label :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Core/Infra Meta label for core/infra team Team:Delivery Meta label for Delivery team

Comments

@passdev-sc

passdev-sc commented Aug 31, 2021

Elasticsearch version (bin/elasticsearch --version):
Version: 7.14.0, Build: default/rpm/dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1/2021-07-29T20:49:32.864135063Z, JVM: 16.0.1

Plugins installed: []
None

JVM version (java -version):
Bundled version is:
openjdk version "16.0.1" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-16.0.1+9 (build 16.0.1+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-16.0.1+9 (build 16.0.1+9, mixed mode, sharing)

Note: I have not changed any settings, so the bundled JDK should be the one in use.

OS version (uname -a if on a Unix-like system):
Linux 1025093 3.10.0-957.21.2.el7.x86_64 #1 SMP Tue May 28 09:26:43 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Red Hat Enterprise Linux Server 7.9 (Maipo)

Description of the problem including expected versus actual behavior:
After installing Elasticsearch following the "Installing from the RPM repository" instructions at https://www.elastic.co/guide/en/elasticsearch/reference/7.14/rpm.html#rpm-repo, we can start Elasticsearch directly via bin/elasticsearch (with warnings). However, starting it as a service fails with the error below.

Internal exceptions (20 events):
Event: 4.735 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd312da0}: 'void java.lang.invoke.DirectMethodHandle$Holder.invokeStaticInit(java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bd312da0)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.778 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd430120}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStaticInit(java.lang.Object)'> (0x00000017bd430120)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.905 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd886a60}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, int, java.lang.Object)'> (0x00000017bd886a60)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.910 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd8bfa80}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, java.lang.Object, int, java.lang.Object, java.lang.Object)'> (0x00000017bd8bfa80)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.911 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd8c77a8}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, int, java.lang.Object)'> (0x00000017bd8c77a8)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.912 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd8cbca0}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, int, java.lang.Object)'> (0x00000017bd8cbca0)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.917 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd8f9388}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStaticInit(java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bd8f9388)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.927 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd93d878}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, int, java.lang.Object, java.lang.Object)'> (0x00000017bd93d878)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 4.927 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bd941a08}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, int, java.lang.Object, java.lang.Object)'> (0x00000017bd941a08)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 5.389 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bde97770}: 'int java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bde97770)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 5.470 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bc149e30}: 'java.lang.Object java.lang.invoke.Invokers$Holder.invokeExact_MT(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bc149e30)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 5.955 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bc94c528}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bc94c528)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.071 Thread 0x00007f319c0254a0 Implicit null exception at 0x00007f318c096c54 to 0x00007f318c097224
Event: 6.200 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb603f80}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000017bb603f80)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.224 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb69b510}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStaticInit(java.lang.Object, java.lang.Object, long, java.lang.Object)'> (0x00000017bb69b510)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.225 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb6a1b78}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, long)'> (0x00000017bb6a1b78)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.225 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb6a58e8}: 'java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, long, java.lang.Object)'> (0x00000017bb6a58e8)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.253 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb7970a0}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, int)'> (0x00000017bb7970a0)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.254 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb79b598}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, int)'> (0x00000017bb79b598)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]
Event: 6.254 Thread 0x00007f319c0254a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000017bb79f358}: 'java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object, int, java.lang.Object)'> (0x00000017bb79f358)
thrown [./src/hotspot/share/interpreter/linkResolver.cpp, line 790]

Warnings received when starting Elasticsearch directly:

java.lang.UnsatisfiedLinkError: /tmp/elasticsearch-2615699647026920288/jna10243734774257956315.tmp: /tmp/elasticsearch-2615699647026920288/jna10243734774257956315.tmp: failed to map segment from shared object: Operation not permitted
        at jdk.internal.loader.NativeLibraries.load(Native Method) ~[?:?]
        at jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:383) ~[?:?]
        at jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:227) ~[?:?]
        at jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:169) ~[?:?]
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2383) ~[?:?]
        at java.lang.Runtime.load0(Runtime.java:746) ~[?:?]
        at java.lang.System.load(System.java:1857) ~[?:?]
        at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:1019) ~[jna-5.7.0-1.jar:5.7.0 (b0)]
        at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:989) ~[jna-5.7.0-1.jar:5.7.0 (b0)]
        at com.sun.jna.Native.<clinit>(Native.java:195) ~[jna-5.7.0-1.jar:5.7.0 (b0)]
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName(Class.java:375) ~[?:?]
        at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:34) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:102) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:116) [elasticsearch-cli-7.14.0.jar:7.14.0]
        at org.elasticsearch.cli.Command.main(Command.java:79) [elasticsearch-cli-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) [elasticsearch-7.14.0.jar:7.14.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81) [elasticsearch-7.14.0.jar:7.14.0]

See this forum thread for related discussion:
https://discuss.elastic.co/t/service-will-not-start-rhe7/282591/5

The current hypothesis is that the error occurs because Elasticsearch cannot assume that the SystemdPlugin is running.
https://github.com/elastic/elasticsearch/blob/master/modules/systemd/src/main/java/org/elasticsearch/systemd/Libsystemd.java#L21-L26

Steps to reproduce:

  1. service elasticsearch start
@passdev-sc passdev-sc added >bug needs:triage Requires assignment of a team area label labels Aug 31, 2021
@pugnascotia
Contributor

The error message when running Elasticsearch directly looks like the problem discussed here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/executable-jna-tmpdir.html

So you could try mounting /tmp without noexec, or setting an alternative temporary directory via the ES_TMPDIR environment variable, and see whether that resolves the issue.
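To check whether a given directory sits on a noexec mount, one option is to look up its covering mount in /proc/mounts. The following is a minimal sketch, not part of Elasticsearch; `mount_for_path` and `is_noexec` are hypothetical helper names, and it assumes the usual /proc/mounts field layout (device, mount point, type, options, ...) while ignoring bind mounts, symlinks and escaped characters in mount points.

```python
import os
from typing import List, Optional, Tuple


def mount_for_path(path: str, mounts_text: str) -> Optional[Tuple[str, List[str]]]:
    """Return (mount_point, options) of the mount that contains `path`.

    Simplification: the covering mount is taken to be the longest mount point
    that is `path` itself or a path-component prefix of it; later lines win on
    ties because they are mounted on top of earlier ones. Bind mounts,
    symlinks and octal-escaped characters (e.g. spaces) are not handled.
    """
    best: Optional[Tuple[str, List[str]]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"  # "/" stays "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def is_noexec(path: str, mounts_text: str) -> bool:
    """True if the mount containing `path` carries the noexec option."""
    found = mount_for_path(path, mounts_text)
    return found is not None and "noexec" in found[1]


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        text = f.read()
    for p in ("/tmp", "/var/lib/elasticsearch/tmp"):
        print(p, "noexec" if is_noexec(p, text) else "exec allowed")
```

Matching on path-component prefixes (rather than plain string prefixes) avoids treating a path like /tmpfoo as living under the /tmp mount.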

Regarding the first error, that one certainly is strange. SystemdPlugin will be running unless the environment variable ES_SD_NOTIFY has been overridden to false. We don't have a stack trace in this case that points to Libsystemd, unlike the stack trace in the forum post.

I'd like to resolve the tmp dir issue first, and see if that affects the other problem - they may have the same root cause.

@pugnascotia pugnascotia added the :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts label Aug 31, 2021
@elasticmachine elasticmachine added the Team:Delivery Meta label for Delivery team label Aug 31, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@passdev-sc
Author

@pugnascotia We would prefer to keep /tmp mounted noexec. Setting ES_TMPDIR causes Elasticsearch to fail when run directly, with the same error as the service above (i.e. NoSuchMethodError).

@passdev-sc
Author

For completeness, below is the output of a direct invocation with ES_TMPDIR set:

[0.007s][warning][logging] Output options for existing outputs are ignored.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f12965be152, pid=44781, tid=45033
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK-16.0.1+9 (16.0.1+9) (build 16.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK-16.0.1+9 (16.0.1+9, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [jna15201070723724713805.tmp+0x13152]  ffi_prep_closure_loc+0x32
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /var/log/elasticsearch/hs_err_pid44781.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
2021-08-31 13:43:42,968283 UTC [45087] INFO  Main.cc@106 Parent process died - ML controller exiting
Aborted

I have attached the hs_err_pid44781.log file
hs_err_pid44781.log

If you require anything else, please let me know.

@pugnascotia
Contributor

You may be running into #77014 - can you also try setting the LIBFFI_TMPDIR environment variable?

@passdev-sc
Author

Thanks for the suggestion, but setting the TMPDIR has no effect

@pugnascotia
Contributor

Can you attach the config you have tried, with the environment variables above?

@passdev-sc
Author

Contents of /etc/sysconfig/elasticsearch

################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#ES_JAVA_HOME=

# Elasticsearch configuration directory
# Note: this setting will be shared with command-line tools
ES_PATH_CONF=/etc/elasticsearch

#TMP DIR
ES_TMPDIR=/var/lib/elasticsearch/tmp
LIBFFI_TMPDIR=/var/lib/elasticsearch/tmp

# Elasticsearch PID directory
PID_DIR=/var/run/elasticsearch



# Additional Java OPTS
#ES_JAVA_OPTS="-Djava.io.tmpdir=$ES_TMPDIR -Djna.tmpdir=$ES_TMPDIR"
#ES_JAVA_OPTS="-Djna.boot.library.path=$ES_TMPDIR -Djna.debug_load.jna=true -Djna.debug_load=true -Djava.io.tmpdir=$ES_TMPDIR -Djna.tmpdir=$ES_TMPDIR"

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

################################
# Elasticsearch service
################################

# SysV init.d
#
# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process
ES_STARTUP_SLEEP_TIME=5

################################
# System properties
################################

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence
#MAX_OPEN_FILES=65535

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.memory_lock: true' option
# in elasticsearch.yml.
# When using systemd, LimitMEMLOCK must be set in a unit file such as
# /etc/systemd/system/elasticsearch.service.d/override.conf.
#MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
#MAX_MAP_COUNT=262144

@pugnascotia
Contributor

Can you also check that the filesystem containing /var/lib/elasticsearch/tmp isn't mounted with noexec?

If it isn't, we may need to use strace -f to figure out what is going on when the segfault happens.

@passdev-sc
Author

passdev-sc commented Aug 31, 2021

/var/lib/elasticsearch/tmp is on a filesystem mounted with exec, so that doesn't appear to be the issue. As an aside, why does running the process directly without ES_TMPDIR not cause a problem?

Dump of strace -f
strace_dump.log

@DaveCTurner
Contributor

> setting the TMPDIR has no effect

Did you set $TMPDIR or $LIBFFI_TMPDIR? We did a bit of digging and it turns out $LIBFFI_TMPDIR isn't supported yet, but I believe setting TMPDIR=/var/lib/elasticsearch/tmp will help (in addition to setting ES_TMPDIR). If it doesn't, could you send another strace dump and the full hs_err_pid*.log file?

> why would running the process directly without ES_TMPDIR not cause an issue?

I suspect this completely disables JNA, rather than enabling it in a semi-broken state.

@passdev-sc
Author

Setting TMPDIR appears to have made no difference.

See the attached strace dump:
strace_dump.log

@DaveCTurner
Contributor

DaveCTurner commented Sep 1, 2021

It doesn't look like it's seeing $TMPDIR:

[pid 44250] statfs("/selinux",  <unfinished ...>
[pid 44250] <... statfs resumed>0x7f2f4dbf52c0) = -1 ENOENT (No such file or directory)
[pid 44250] open("/proc/mounts", O_RDONLY <unfinished ...>
[pid 44250] <... open resumed>)         = 56
[pid 44250] fstat(56, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 44250] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 44250] <... mmap resumed>)         = 0x7f2f4dbfd000
[pid 44250] read(56,  <unfinished ...>
[pid 44250] <... read resumed>"rootfs / rootfs rw 0 0\nsysfs /sy"..., 1024) = 1024
[pid 44250] read(56,  <unfinished ...>
[pid 44250] <... read resumed>"l,nosuid,nodev,noexec,relatime,d"..., 1024) = 1024
[pid 44250] close(56)                   = 0
[pid 44250] munmap(0x7f2f4dbfd000, 4096 <unfinished ...>
[pid 44250] <... munmap resumed>)       = 0
[pid 44250] open("/tmp/ffizfQkLs", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600 <unfinished ...>
[pid 44250] <... open resumed>)         = 56
[pid 44250] unlink("/tmp/ffizfQkLs" <unfinished ...>
[pid 44250] <... unlink resumed>)       = 0
[pid 44250] write(56, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096 <unfinished ...>
[pid 44250] <... write resumed>)        = 4096
[pid 44250] mmap(NULL, 4096, PROT_READ|PROT_EXEC, MAP_SHARED, 56, 0 <unfinished ...>
[pid 44250] <... mmap resumed>)         = -1 EPERM (Operation not permitted)
[pid 44250] close(56 <unfinished ...>
[pid 44250] <... close resumed>)        = 0
[pid 44250] open("/var/tmp/ffiunwzFv", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600 <unfinished ...>
[pid 44250] <... open resumed>)         = 56
[pid 44250] unlink("/var/tmp/ffiunwzFv" <unfinished ...>
[pid 44250] <... unlink resumed>)       = 0
[pid 44250] write(56, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096 <unfinished ...>
[pid 44250] <... write resumed>)        = 4096
[pid 44250] mmap(NULL, 4096, PROT_READ|PROT_EXEC, MAP_SHARED, 56, 0 <unfinished ...>
[pid 44250] <... mmap resumed>)         = -1 EPERM (Operation not permitted)
[pid 44250] close(56 <unfinished ...>
[pid 44250] <... close resumed>)        = 0
[pid 44250] open("/dev/shm/ffiFfQ8zy", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600 <unfinished ...>
[pid 44250] <... open resumed>)         = 56
[pid 44250] unlink("/dev/shm/ffiFfQ8zy") = 0
[pid 44250] write(56, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 44250] mmap(NULL, 4096, PROT_READ|PROT_EXEC, MAP_SHARED, 56, 0) = -1 EPERM (Operation not permitted)
[pid 44250] close(56)                   = 0
[pid 44250] open("/nonexistent/ffi2s6VuB", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600) = -1 ENOENT (No such file or directory)

The check for /selinux and the read of /proc/mounts come from selinux_enabled_check(). libffi then tries these locations in turn, looking for somewhere it can write executable code:

  • $TMPDIR
  • /tmp
  • /var/tmp
  • /dev/shm
  • $HOME

We see it try /tmp, /var/tmp, /dev/shm and /nonexistent (presumably $HOME), but not $TMPDIR, so this variable must not be set in the environment of this process.
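The search described in this comment can be modelled with a short, simplified sketch. It assumes only the candidate list given above ($TMPDIR, /tmp, /var/tmp, /dev/shm, $HOME) and imitates the probe visible in the strace: create a temporary file, write one page, then mmap it with PROT_READ|PROT_EXEC and MAP_SHARED. The function names are hypothetical, and this is not libffi's implementation, which also performs the SELinux check and other steps not shown here.

```python
import mmap
import os
import tempfile
from typing import Callable, List, Mapping, Optional

PAGE = 4096  # the strace shows one 4096-byte page being written and mapped


def candidate_dirs(env: Mapping[str, str]) -> List[str]:
    """Candidate directories in the order listed above; unset or empty
    variables are skipped."""
    dirs = []
    if env.get("TMPDIR"):
        dirs.append(env["TMPDIR"])
    dirs += ["/tmp", "/var/tmp", "/dev/shm"]
    if env.get("HOME"):
        dirs.append(env["HOME"])
    return dirs


def can_map_exec(directory: str) -> bool:
    """Imitate the strace probe: create an unnamed temporary file in
    `directory`, write one page of zeros, then mmap it PROT_READ|PROT_EXEC
    with MAP_SHARED. On a noexec mount the mmap fails with EPERM."""
    try:
        with tempfile.TemporaryFile(dir=directory) as f:
            f.write(b"\0" * PAGE)
            f.flush()
            m = mmap.mmap(f.fileno(), PAGE, flags=mmap.MAP_SHARED,
                          prot=mmap.PROT_READ | mmap.PROT_EXEC)
            m.close()
        return True
    except (OSError, ValueError, AttributeError):
        # OSError: missing directory, no permission, or EPERM from mmap.
        # AttributeError: platform without mmap.PROT_EXEC / MAP_SHARED.
        return False


def find_exec_tmpdir(env: Mapping[str, str],
                     usable: Callable[[str], bool] = can_map_exec) -> Optional[str]:
    """Return the first candidate directory that passes `usable`, else None."""
    for d in candidate_dirs(env):
        if usable(d):
            return d
    return None


if __name__ == "__main__":
    print(find_exec_tmpdir(os.environ))
```

With the reporter's environment (no TMPDIR, HOME=/nonexistent, and /tmp, /var/tmp and /dev/shm all refusing executable mappings), every candidate fails and the function returns None, which is consistent with the strace ending at /nonexistent.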

@passdev-sc
Author

passdev-sc commented Sep 1, 2021

We tried setting $TMPDIR manually (via export), but this doesn't appear to make a difference.

strace_dump.log

@pugnascotia pugnascotia added :Core/Infra/Core Core issues without another label and removed needs:triage Requires assignment of a team area label labels Sep 2, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Sep 2, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rjernst
Member

rjernst commented Sep 2, 2021

I don’t think straces are helping here. We know this error is the result of JNA trying to load its native library from /tmp, which is mounted noexec. Can you please provide the Elasticsearch log file? That should help determine what Elasticsearch is seeing.

@passdev-sc
Author

OK, elasticsearch.log is empty, but attached are the files that were added to the log folder:

hs_err_pid4435.log
gc.log
gc.log.00.log

@rjernst
Member

rjernst commented Sep 2, 2021

Are you still starting Elasticsearch directly via bin/elasticsearch, or using systemd? If the former, what exact command line and environment are you using?

@passdev-sc
Author

Further update: the service actually starts with TMPDIR set (sorry, I didn't try this before). Starting it directly does not.
Service cmd: service elasticsearch start [WORKS NOW]
Direct: sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch [DOES NOT WORK]

@passdev-sc
Author

passdev-sc commented Sep 2, 2021

Further update: if we keep TMPDIR but remove ES_TMPDIR (and LIBFFI_TMPDIR), it works both when run directly and as a service.

In summary, invoking Elasticsearch directly with ES_TMPDIR set causes a failure, and the service requires TMPDIR to be set.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 11, 2021
Today if `libffi` cannot allocate pages of memory which are both
writeable and executable then it will attempt to write code to a
temporary file. Elasticsearch configures itself a suitable temporary
directory for use by JNA but by default `libffi` won't find this
directory and will try various other places. In certain configurations,
none of the other places that `libffi` tries are suitable. With older
versions of JNA this would result in a `SIGSEGV`; since elastic#80617 the JVM
will exit with an exception.

With this commit we use the `LIBFFI_TMPDIR` environment variable to
configure `libffi` to use the same directory as JNA for its temporary
files if they are needed.

Closes elastic#18272
Closes elastic#73309
Closes elastic#74545
Closes elastic#77014
Closes elastic#77053
Largely supersedes elastic#77285
Relates elastic#80617
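The idea behind this fix, defaulting LIBFFI_TMPDIR to the directory Elasticsearch already uses for JNA, can be sketched as building an environment for the child JVM process. This is a hypothetical Python illustration, not the code from #80651; `with_libffi_tmpdir` and its parameters are made up for this example, and keeping a user-supplied LIBFFI_TMPDIR is an assumption.

```python
import os
from typing import Dict, Mapping


def with_libffi_tmpdir(env: Mapping[str, str], jna_tmpdir: str) -> Dict[str, str]:
    """Hypothetical helper: return a copy of `env` for the JVM child process
    in which LIBFFI_TMPDIR points at the JNA temporary directory.

    Assumption: a non-empty LIBFFI_TMPDIR that is already set is kept.
    """
    child = dict(env)  # copy, so the caller's environment is not modified
    if not child.get("LIBFFI_TMPDIR"):
        child["LIBFFI_TMPDIR"] = jna_tmpdir
    return child


if __name__ == "__main__":
    # ES_TMPDIR and this fallback path are illustrative only.
    tmpdir = os.environ.get("ES_TMPDIR", "/var/lib/elasticsearch/tmp")
    print(with_libffi_tmpdir(os.environ, tmpdir)["LIBFFI_TMPDIR"])
```

The returned dict could be passed as `env=` to `subprocess.Popen`, so the variable is visible to the JVM (and to libffi inside it) without changing the parent process's environment.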
DaveCTurner added a commit that referenced this issue Nov 15, 2021
Co-authored-by: Rory Hunter <roryhunter2@gmail.com>
elasticsearchmachine pushed a commit that referenced this issue Nov 15, 2021
Co-authored-by: Rory Hunter <roryhunter2@gmail.com>
elasticsearchmachine pushed a commit that referenced this issue Nov 15, 2021
* Set LIBFFI_TMPDIR at startup (#80651)

* Fix incorrect SSL usage

Co-authored-by: Rory Hunter <roryhunter2@gmail.com>

5 participants