Elasticsearch 8.11.1 RPM restarts process during update #102103

holgr · 2023-11-13T20:44:12Z

Elasticsearch Version

8.11.1

Installed Plugins

No response

Java Version

bundled

OS Version

Rocky 9.2 5.14.0-70.26.1.el9_0.x86_64

Problem Description

While updating several nodes from ES 8.11.0 to 8.11.1, I noticed that the nodes tried to restart the elasticsearch process. In previous versions this was not the case; restarting required a manual step. Additionally, the restart failed on all nodes I updated. A manual systemctl restart elasticsearch worked, however, and the nodes are back up.

Steps to Reproduce

Install the elasticsearch-8.11.1 RPM from https://artifacts.elastic.co/packages/8.x/yum/

Logs (if relevant)

Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Consumed 21min 32.788s CPU time.

ceeeekay · 2023-11-13T22:40:15Z

I've also run into this issue with the .deb installer. Normally I would preinstall the upgrade and then restart the nodes in an orderly fashion. However, this upgrade to 8.11.1 caused my entire cluster to restart at the same time and required manual intervention.

holgr · 2023-11-13T22:41:40Z

Yep, that's my exact workflow as well. Glad I started on a cluster where it didn't really matter.

elasticsearchmachine · 2023-11-20T13:39:44Z

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira · 2023-11-20T22:55:38Z

I was unable to replicate this when upgrading 8.10.0 to 8.11.1. The only scenario in which the service should be restarted on upgrade is if RESTART_ON_UPGRADE=true in /etc/sysconfig/elasticsearch and by default that is commented out. Was this perhaps the case? Can you provide which version you upgraded from?

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html#rpm-configuring

holgr · 2023-11-20T23:04:41Z

The ones described above went from 8.11.0 to 8.11.1. I've since upgraded one additional cluster. It came from 8.10.4 (I think? – it was on the last 8.10.x version) and didn't have this issue.

RESTART_ON_UPGRADE=true is commented out on all of these nodes.

ceeeekay · 2023-11-20T23:08:54Z

I have the same, but in /etc/default/elasticsearch in my case. Also upgrading 8.11.0 to 8.11.1.

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#ES_RESTART_ON_UPGRADE=true
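
For anyone who wants to check this across many nodes, here is a minimal Python sketch that reports whether restart-on-upgrade is actually active in an environment file. It accepts both variable names quoted in this thread (RESTART_ON_UPGRADE and ES_RESTART_ON_UPGRADE) and assumes that only the exact value true enables the restart, as the comment in the file above says. The function name is hypothetical and the parsing is simplified (trailing inline comments are not handled), so treat it as a rough check rather than a reimplementation of the packaging scripts.

```python
import re

# Variable names as they appear in this thread; treating both as valid is an assumption.
SETTING_NAMES = ("RESTART_ON_UPGRADE", "ES_RESTART_ON_UPGRADE")

# Matches simple shell-style assignments such as NAME=value or export NAME=value.
ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def restart_on_upgrade_enabled(env_file_text: str) -> bool:
    """Return True only if an uncommented assignment sets the value to exactly 'true'."""
    enabled = False
    for raw_line in env_file_text.splitlines():
        line = raw_line.strip()
        # Blank lines and commented-out lines have no effect.
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match is None:
            continue
        name = match.group(1)
        value = match.group(2).strip().strip("\"'")
        if name in SETTING_NAMES:
            # A later assignment overrides an earlier one, as in a sourced shell file.
            enabled = value == "true"
    return enabled
```

It could be fed the contents of /etc/sysconfig/elasticsearch (RPM) or /etc/default/elasticsearch (deb), for example via pathlib.Path(...).read_text(). With the two lines pasted above it returns False, which matches what both of us see in our configs.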

mark-vieira · 2023-11-20T23:20:22Z

I wasn't able to reproduce this going from 8.11.0 to 8.11.1 either. There's nothing in our packaging scripts that would call systemctl unless RESTART_ON_UPGRADE is set. Something must have triggered the restart though. Can you share any of the journalctl log entries immediately prior to the start failure? I'm wondering if there's anything that might indicate what triggered the restart.

ceeeekay · 2023-11-21T00:49:30Z

Here's what I have: https://gist.github.com/ceeeekay/8e407092ef24ac89dec897c1c1748e1a

Seems to be looking for x-pack-core-8.11.0.jar and not finding it.

mark-vieira · 2023-11-21T01:08:41Z

This sounds suspiciously like the server actually errored during the upgrade and systemd attempted to restart it. It looks like we tried to load a class from a jar that no longer exists because it was replaced during the upgrade. This is likely a real problem.

@rjernst Thoughts here? This was likely always a problem. We don't really support in-place distribution upgrades, and we might even document this somewhere, but this is exactly what a Linux package upgrade is doing. I'm wondering if doing a package upgrade on a running node should be supported, or if we should stop, upgrade, then restart to avoid this scenario.

Second, I'm wondering if there have been some changes that make this more likely. Are we loading service providers more often? Or not caching results of these when we probably should? I would expect issues with trying to load classes from jars on disk to be rare for nodes that have been running for any length of time. It does indeed look like there were some recent changes in 8.11 to the code that appears in the stack trace linked above.

ceeeekay · 2023-11-21T01:21:05Z

FWIW in several years of upgrading ES I have never seen a node stop during the package upgrade, but it happened simultaneously to 12 nodes this time (I lost all my data nodes, one of my masters, and two ingest nodes).

Usually, by the time I've finished all my rolling restarts, the last node will have been running fine for an hour after the package upgrade occurred.

holgr · 2023-12-07T16:29:41Z

Updating this to mention that the same is also happening for me with the upgrade to 8.11.2-1.

Update: It also seems to be stumbling over this one:

[2023-12-07T16:31:15,533][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [nodename] fatal error in thread [elasticsearch[nodename][management][T#4]], exiting
java.util.ServiceConfigurationError: Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber
[..]

Looking up component versions through SPI should not change. This commit captures the component versions of the running node once during startup, rather than every time node info is called. closes elastic#102103

rjernst · 2023-12-13T17:58:56Z

Since loading classes could in theory happen at any time, we won't ever be able to stop this error from occurring completely. However, in this specific case, I think it is more likely to happen now because the component versions are loaded by SPI every time the node info api is called. I've opened #103408 to fix that.
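
To illustrate the difference between looking something up from disk on every call and capturing it once at startup, here is a small Python sketch. It is an analogy only, not the Elasticsearch code: the class names, the file name, and the name=version file format are all made up for the example, and a plain text file stands in for the versioned jar that the package upgrade replaces.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def read_component_versions(resource: Path) -> Dict[str, str]:
    """Parse 'name=version' lines from a file on disk (stand-in for an SPI lookup)."""
    versions = {}
    for line in resource.read_text().splitlines():
        name, _, version = line.partition("=")
        if name.strip():
            versions[name.strip()] = version.strip()
    return versions


class PerCallNodeInfo:
    """Re-reads the resource on every info() call."""

    def __init__(self, resource: Path) -> None:
        self._resource = resource

    def info(self) -> Dict[str, str]:
        return read_component_versions(self._resource)


class CachedNodeInfo:
    """Reads the resource once at construction ("startup") and serves the cached copy."""

    def __init__(self, resource: Path) -> None:
        self._versions = read_component_versions(resource)

    def info(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the cached state.
        return dict(self._versions)


def simulate_upgrade() -> Tuple[bool, Dict[str, str]]:
    """Return (per_call_failed, cached_result) after the resource disappears."""
    with tempfile.TemporaryDirectory() as tmp:
        resource = Path(tmp) / "component-versions-old.txt"  # hypothetical file
        resource.write_text("example_component=1\n")  # made-up content
        per_call = PerCallNodeInfo(resource)
        cached = CachedNodeInfo(resource)
        # The "package upgrade": the old versioned file is removed from disk.
        resource.unlink()
        try:
            per_call.info()
            per_call_failed = False
        except FileNotFoundError:
            per_call_failed = True
        return per_call_failed, cached.info()
```

In this sketch the per-call variant fails as soon as the file is gone, while the cached variant keeps answering from memory. As noted above, caching only narrows the window; anything else that is loaded lazily from a replaced file can still fail.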

DomDaigle · 2023-12-13T21:15:21Z

Hi, I got the same problem today upgrading from 8.11.1 to 8.11.3. I didn't have this issue on my last upgrade from 8.10.4 to 8.11.1.

Error from journalctl:

Dec 13 11:17:59 slqelk001 systemd-entrypoint[1116]: ERROR: Elasticsearch exited unexpectedly, with exit code 1
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

Error from elasticsearch server log at that same time:

"@timestamp": "2023-12-13T16:17:58.659Z"
"logger": "org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler"
"error": {
      "stack_trace": "java.util.ServiceConfigurationError: Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.loadNextProfile(SPIClassIterator.java:136)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.hasNext(SPIClassIterator.java:148)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.PluginsService.createExtensions(PluginsService.java:389)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.PluginsService.loadServiceProviders(PluginsService.java:344)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.NodeService.findComponentVersions(NodeService.java:143)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.NodeService.info(NodeService.java:124)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.admin.cluster.node.info.TransportNodesInfoAction.nodeOperation(TransportNodesInfoAction.java:82)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.admin.cluster.node.info.TransportNodesInfoAction.nodeOperation(TransportNodesInfoAction.java:31)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:204)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:565)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat 
org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$3.onResponse(SecurityServerTransportInterceptor.java:618)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$3.onResponse(SecurityServerTransportInterceptor.java:607)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:455)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:1028)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:994)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$9(AuthorizationService.java:469)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.authorizeClusterAction(RBACEngine.java:185)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:459)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:435)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$3(AuthorizationService.java:322)\n\tat 
org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:177)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$0(RBACEngine.java:150)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.lambda$getRoles$4(CompositeRolesStore.java:194)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.lambda$getRole$5(CompositeRolesStore.java:212)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.lambda$buildRole$0(RoleReferenceIntersection.java:49)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:56)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.buildRoleFromRoleReference(CompositeRolesStore.java:292)\n\tat 
org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.lambda$buildRole$1(RoleReferenceIntersection.java:53)\n\tat java.base/java.lang.Iterable.forEach(Iterable.java:75)\n\tat org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.buildRole(RoleReferenceIntersection.java:53)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRole(CompositeRolesStore.java:210)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:187)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:146)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:338)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda$inbound$1(ServerTransportFilter.java:113)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:95)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticatorChain.authenticateAsync(AuthenticatorChain.java:94)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:261)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:199)\n\tat 
org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.authenticate(ServerTransportFilter.java:126)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:104)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:629)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:1020)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)\n\tat java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:171)\n\tat java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)\n\tat 
java.base/java.nio.file.Files.readAttributes(Files.java:1853)\n\tat java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1445)\n\tat java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:724)\n\tat java.base/java.util.zip.ZipFile.<init>(ZipFile.java:251)\n\tat java.base/java.util.zip.ZipFile.<init>(ZipFile.java:180)\n\tat java.base/java.util.jar.JarFile.<init>(JarFile.java:345)\n\tat java.base/sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:100)\n\tat java.base/sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)\n\tat java.base/sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:168)\n\tat java.base/sun.net.www.protocol.jar.JarFileFactory.getOrCreate(JarFileFactory.java:91)\n\tat java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:110)\n\tat java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:153)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.loadNextProfile(SPIClassIterator.java:112)\n\t... 57 more\n",
      "type": "java.util.ServiceConfigurationError",
      "message": "Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber"
    }
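
When sifting through logs from several nodes, a small helper can pull out which jar the error refers to. The following Python sketch assumes the message format shown in the logs in this thread ("Error loading SPI class list from URL: jar:file://...jar!/..."); the function name is hypothetical, and the regular expression is only as general as those samples.

```python
import re
from typing import List

# Captures the filesystem path of the jar inside a jar:file:// URL,
# e.g. jar:file:///usr/share/.../some.jar!/META-INF/services/...
JAR_URL = re.compile(r"jar:file://(/[^!\s]+?\.jar)!/")


def jars_named_in_spi_errors(log_text: str) -> List[str]:
    """Return the distinct jar paths referenced by jar:file:// URLs, in first-seen order."""
    matches = JAR_URL.findall(log_text)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(matches))
```

Each returned path can then be checked with os.path.exists on the affected node; a path that no longer exists is consistent with the jar having been replaced by the package upgrade while the old process was still running.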

mark-vieira · 2023-12-14T16:21:32Z

The core issue causing the exception should be fixed in 8.12, but this still means that folks upgrading from 8.11.x to 8.12.x will be susceptible to this. Something to keep in mind.

holgr added the >bug and needs:triage labels Nov 13, 2023

andreidan added the :Delivery/Packaging label and removed the needs:triage label Nov 20, 2023

elasticsearchmachine added the Team:Delivery label Nov 20, 2023

mark-vieira self-assigned this Nov 20, 2023

mark-vieira added the team-discuss label Dec 7, 2023

rjernst mentioned this issue Dec 13, 2023

Cache component versions #103408

Merged

rjernst closed this as completed in #103408 Dec 14, 2023
