Elasticsearch 8.11.1 RPM restarts process during update #102103

Closed
holgr opened this issue Nov 13, 2023 · 14 comments · Fixed by #103408
Labels
>bug, :Delivery/Packaging, Team:Delivery, team-discuss

Comments

@holgr

holgr commented Nov 13, 2023

Elasticsearch Version

8.11.1

Installed Plugins

No response

Java Version

bundled

OS Version

Rocky 9.2 5.14.0-70.26.1.el9_0.x86_64

Problem Description

While updating several nodes from ES 8.11.0 to 8.11.1, I noticed that the nodes tried to restart the elasticsearch process. In previous versions this did not happen; the restart was a manual step. Additionally, the restart failed on all nodes I updated. A manual systemctl restart elasticsearch worked, however, and the nodes are back up.

Steps to Reproduce

Install the elasticsearch-8.11.1 RPM from https://artifacts.elastic.co/packages/8.x/yum/

Logs (if relevant)

Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
Nov 13 21:21:09 hostname systemd[1]: elasticsearch.service: Consumed 21min 32.788s CPU time.
holgr added the >bug and needs:triage labels Nov 13, 2023
@ceeeekay

I've also run into this issue with the .deb installer. Normally I would preinstall the upgrade and then restart the nodes in an orderly fashion; however, this upgrade caused my entire cluster to restart at the same time on 8.11.1 and required manual intervention.

@holgr
Author

holgr commented Nov 13, 2023

Yep, that's my exact workflow as well. Glad I started on a cluster where it didn't really matter.

andreidan added the :Delivery/Packaging label and removed the needs:triage label Nov 20, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

elasticsearchmachine added the Team:Delivery label Nov 20, 2023
mark-vieira self-assigned this Nov 20, 2023
@mark-vieira
Contributor

I was unable to replicate this when upgrading 8.10.0 to 8.11.1. The only scenario in which the service should be restarted on upgrade is if RESTART_ON_UPGRADE=true is set in /etc/sysconfig/elasticsearch, and by default that line is commented out. Was this perhaps the case? Can you tell me which version you upgraded from?

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html#rpm-configuring

@holgr
Author

holgr commented Nov 20, 2023

The ones described above went from 8.11.0 to 8.11.1. I've since upgraded one additional cluster. It came from 8.10.4 (I think? – it was on the last 8.10.x version) and didn't have this issue.

RESTART_ON_UPGRADE=true is commented out on all of these nodes.

@ceeeekay

I have the same, but in /etc/default/elasticsearch in my case. Also upgrading 8.11.0 to 8.11.1.

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#ES_RESTART_ON_UPGRADE=true
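
For anyone checking many nodes, the rule stated in that comment (only an uncommented value of exactly true enables the restart) can be sketched in Python. This is a minimal sketch, not the packaging scripts' own logic: the function name restart_on_upgrade_enabled is hypothetical, the two variable names are simply the ones quoted in this thread, and "last assignment wins" is an assumption about how the file is read.

```python
import re

# Variable names as quoted in this thread (RPM and deb reports); assumptions elsewhere.
KEYS = ("RESTART_ON_UPGRADE", "ES_RESTART_ON_UPGRADE")


def restart_on_upgrade_enabled(env_text: str) -> bool:
    """Return True only if an uncommented line sets one of KEYS to exactly 'true'."""
    enabled = False
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out lines have no effect
        match = re.match(r"^(?:export\s+)?(\w+)=(.*)$", line)
        if match and match.group(1) in KEYS:
            value = match.group(2).strip().strip("\"'")
            enabled = value == "true"  # assumption: the last assignment wins
    return enabled


sample = """# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#ES_RESTART_ON_UPGRADE=true
"""
print(restart_on_upgrade_enabled(sample))  # the setting is commented out here
```

Feeding it the contents of /etc/sysconfig/elasticsearch or /etc/default/elasticsearch from each node would confirm whether the package itself could have requested the restart.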

@mark-vieira
Contributor

I wasn't able to reproduce this going from 8.11.0 to 8.11.1 either. There's nothing in our packaging scripts that would call systemctl unless RESTART_ON_UPGRADE is set. Something must have triggered the restart, though. Can you share any of the journalctl log entries from immediately before the start failure? I'm wondering if anything there indicates what triggered the restart.

@ceeeekay

Here's what I have: https://gist.github.com/ceeeekay/8e407092ef24ac89dec897c1c1748e1a

Seems to be looking for x-pack-core-8.11.0.jar and not finding it.

@mark-vieira
Contributor

mark-vieira commented Nov 21, 2023

This sounds suspiciously like the server actually errored during the upgrade and systemd attempted to restart it. It looks like we tried to load a class from a jar that no longer exists because it was replaced during the upgrade. This is likely a real problem.

@rjernst Thoughts here? This was likely always a problem. We don't really support in-place distribution upgrades, and we might even document this somewhere, but that is exactly what a Linux package upgrade does. I'm wondering if doing a package upgrade on a running node should be supported, or if we should stop, upgrade, then restart to avoid this scenario.

Second, I'm wondering if there have been some changes that make this more likely. Are we loading service providers more often? Or not caching the results when we probably should? I would expect issues with loading classes from jars on disk to be rare for nodes that have been running for any length of time. It does indeed look like there were some recent changes in 8.11 to the code in the stack trace linked above.
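
The failure mode described in this comment can be reproduced outside Elasticsearch. The sketch below is a Python analogy, not Elasticsearch code, and assumes a POSIX filesystem such as Linux: a handle opened before a file is deleted keeps working, while a fresh open by path fails. That is roughly the position of a long-running process that goes back to disk for a jar the package manager has since removed. The file name and function name are illustrative only.

```python
import os
import tempfile


def demo_replace_under_running_process() -> tuple[bytes, str]:
    """Simulate a package upgrade deleting a versioned file while a process uses it."""
    with tempfile.TemporaryDirectory() as workdir:
        old_jar = os.path.join(workdir, "x-pack-core-8.11.0.jar")  # illustrative name
        with open(old_jar, "wb") as f:
            f.write(b"old contents")

        handle = open(old_jar, "rb")  # opened at "startup"
        os.remove(old_jar)            # the "upgrade" removes the old file (allowed on POSIX)

        still_readable = handle.read()  # the existing handle still sees the old data
        handle.close()

        try:
            open(old_jar, "rb").close()  # a late lookup by path
            late_lookup = "ok"
        except FileNotFoundError as exc:
            late_lookup = type(exc).__name__
        return still_readable, late_lookup


print(demo_replace_under_running_process())
```

This matches the expectation stated above: resources that were opened early tend to survive an in-place upgrade, and only lookups that reopen the file by path afterwards run into trouble.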

@ceeeekay

ceeeekay commented Nov 21, 2023

FWIW in several years of upgrading ES I have never seen a node stop during the package upgrade, but it happened simultaneously to 12 nodes this time (I lost all my data nodes, one of my masters, and two ingest nodes).

Usually, by the time I've finished all my rolling restarts, the last node will have been running fine for an hour after the package upgrade occurred.

@holgr
Author

holgr commented Dec 7, 2023

Updating this to mention that the same is also happening for me with the upgrade to 8.11.2-1.

Update: It also seems to be stumbling over this one:

[2023-12-07T16:31:15,533][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [nodename] fatal error in thread [elasticsearch[nodename][management][T#4]], exiting
java.util.ServiceConfigurationError: Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber
[..]

rjernst added a commit to rjernst/elasticsearch that referenced this issue Dec 13, 2023
Looking up component versions through SPI should not change. This commit
captures the component versions of the running node once during startup,
rather than every time node info is called.

closes elastic#102103
@rjernst
Member

rjernst commented Dec 13, 2023

Since loading classes could in theory happen at any time, we won't ever be able to stop this error from occurring completely. However, in this specific case, I think it is more likely to happen now because the component versions are loaded by SPI every time the node info api is called. I've opened #103408 to fix that.
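
The difference between looking something up on every call and capturing it once at startup can be sketched as follows. This is a Python analogy under stated assumptions, not the change in #103408: read_component_versions, NodeInfoPerCall and NodeInfoCaptured are hypothetical names, and a zip file with a made-up META-INF/services entry stands in for a module jar.

```python
import os
import tempfile
import zipfile

SERVICE_ENTRY = "META-INF/services/example.ComponentVersionNumber"  # stand-in entry name


def read_component_versions(jar_path: str) -> list[str]:
    """Open the archive by path and read the provider list (fails if the file is gone)."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.read(SERVICE_ENTRY).decode().split()


class NodeInfoPerCall:
    """Goes back to disk on every info() call."""

    def __init__(self, jar_path: str):
        self.jar_path = jar_path

    def info(self) -> list[str]:
        return read_component_versions(self.jar_path)


class NodeInfoCaptured:
    """Reads once at construction ("startup") and serves the captured value afterwards."""

    def __init__(self, jar_path: str):
        self._versions = read_component_versions(jar_path)

    def info(self) -> list[str]:
        return list(self._versions)


def demo() -> tuple[str, list[str]]:
    with tempfile.TemporaryDirectory() as workdir:
        jar = os.path.join(workdir, "module-1.0.jar")
        with zipfile.ZipFile(jar, "w") as archive:
            archive.writestr(SERVICE_ENTRY, "example.VersionA\nexample.VersionB\n")
        per_call, captured = NodeInfoPerCall(jar), NodeInfoCaptured(jar)
        os.remove(jar)  # the "package upgrade" removes the old jar
        try:
            per_call.info()
            outcome = "ok"
        except FileNotFoundError:
            outcome = "FileNotFoundError"
        return outcome, captured.info()


print(demo())
```

As noted above, capturing the value once only removes one frequent trigger; any other lazy load that reopens a replaced jar can still fail.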

@DomDaigle

DomDaigle commented Dec 13, 2023

Hi, I hit the same problem today upgrading from 8.11.1 to 8.11.3. I didn't have this issue on my last upgrade, from 8.10.4 to 8.11.1.

ERROR from journalctl

Dec 13 11:17:59 slqelk001 systemd-entrypoint[1116]: ERROR: Elasticsearch exited unexpectedly, with exit code 1
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

Error from elasticsearch server log at that same time:

"@timestamp": "2023-12-13T16:17:58.659Z"
"logger": "org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler"
"error": {
      "stack_trace": "java.util.ServiceConfigurationError: Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.loadNextProfile(SPIClassIterator.java:136)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.hasNext(SPIClassIterator.java:148)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.PluginsService.createExtensions(PluginsService.java:389)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.PluginsService.loadServiceProviders(PluginsService.java:344)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.NodeService.findComponentVersions(NodeService.java:143)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.NodeService.info(NodeService.java:124)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.admin.cluster.node.info.TransportNodesInfoAction.nodeOperation(TransportNodesInfoAction.java:82)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.admin.cluster.node.info.TransportNodesInfoAction.nodeOperation(TransportNodesInfoAction.java:31)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:204)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:565)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat 
org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$3.onResponse(SecurityServerTransportInterceptor.java:618)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$3.onResponse(SecurityServerTransportInterceptor.java:607)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:455)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:1028)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService$AuthorizationResultListener.onResponse(AuthorizationService.java:994)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$9(AuthorizationService.java:469)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.authorizeClusterAction(RBACEngine.java:185)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeAction(AuthorizationService.java:459)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.maybeAuthorizeRunAs(AuthorizationService.java:435)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorize$3(AuthorizationService.java:322)\n\tat 
org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:177)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.lambda$resolveAuthorizationInfo$0(RBACEngine.java:150)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.lambda$getRoles$4(CompositeRolesStore.java:194)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.lambda$getRole$5(CompositeRolesStore.java:212)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.lambda$buildRole$0(RoleReferenceIntersection.java:49)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:56)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.buildRoleFromRoleReference(CompositeRolesStore.java:292)\n\tat 
org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.lambda$buildRole$1(RoleReferenceIntersection.java:53)\n\tat java.base/java.lang.Iterable.forEach(Iterable.java:75)\n\tat org.elasticsearch.xcore@8.11.1/org.elasticsearch.xpack.core.security.authz.store.RoleReferenceIntersection.buildRole(RoleReferenceIntersection.java:53)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRole(CompositeRolesStore.java:210)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.store.CompositeRolesStore.getRoles(CompositeRolesStore.java:187)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizationInfo(RBACEngine.java:146)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:338)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda$inbound$1(ServerTransportFilter.java:113)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:95)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticatorChain.authenticateAsync(AuthenticatorChain.java:94)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:261)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:199)\n\tat 
org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.authenticate(ServerTransportFilter.java:126)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:104)\n\tat org.elasticsearch.security@8.11.1/org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:629)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:1020)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)\n\tat java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:171)\n\tat java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)\n\tat 
java.base/java.nio.file.Files.readAttributes(Files.java:1853)\n\tat java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1445)\n\tat java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:724)\n\tat java.base/java.util.zip.ZipFile.<init>(ZipFile.java:251)\n\tat java.base/java.util.zip.ZipFile.<init>(ZipFile.java:180)\n\tat java.base/java.util.jar.JarFile.<init>(JarFile.java:345)\n\tat java.base/sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:100)\n\tat java.base/sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)\n\tat java.base/sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:168)\n\tat java.base/sun.net.www.protocol.jar.JarFileFactory.getOrCreate(JarFileFactory.java:91)\n\tat java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:110)\n\tat java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:153)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.spi.SPIClassIterator.loadNextProfile(SPIClassIterator.java:112)\n\t... 57 more\n",
      "type": "java.util.ServiceConfigurationError",
      "message": "Error loading SPI class list from URL: jar:file:///usr/share/elasticsearch/modules/x-pack-core/x-pack-core-8.11.1.jar!/META-INF/services/org.elasticsearch.action.admin.cluster.node.info.ComponentVersionNumber"
    }
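
The systemd lines quoted at the top of this comment have a regular shape, so a saved journal (for example the output of journalctl -u elasticsearch) can be scanned for nodes that died during a package upgrade. The following is a rough Python sketch; the function name find_service_failures is hypothetical and the pattern is only fitted to the lines quoted in this thread, so other systemd versions or log formats may need adjustments.

```python
import re

# Fitted to the systemd lines quoted in this thread; other formats are not covered.
FAILURE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd\[1\]: "
    r"elasticsearch\.service: Main process exited, code=exited, status=(?P<status>\S+)$"
)


def find_service_failures(journal_text: str) -> list[dict[str, str]]:
    """Return one dict per 'Main process exited' line for elasticsearch.service."""
    hits = []
    for line in journal_text.splitlines():
        match = FAILURE.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits


sample = """Dec 13 11:17:59 slqelk001 systemd-entrypoint[1116]: ERROR: Elasticsearch exited unexpectedly, with exit code 1
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Dec 13 11:17:59 slqelk001 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
"""
for hit in find_service_failures(sample):
    print(hit["host"], hit["when"], hit["status"])
```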

rjernst added a commit that referenced this issue Dec 14, 2023
rjernst added a commit to rjernst/elasticsearch that referenced this issue Dec 14, 2023
elasticsearchmachine pushed a commit that referenced this issue Dec 14, 2023
@mark-vieira
Contributor

mark-vieira commented Dec 14, 2023

The core issue causing the exception should be fixed in 8.12, but this still means that folks upgrading from 8.11.x to 8.12.x will be susceptible to this. Something to keep in mind.
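
When planning upgrades across many nodes, a small check can flag those still running the series reported in this thread, so they can be stopped before the package is replaced. This is a minimal Python sketch: the function names are hypothetical, and the 8.11.x rule only encodes the reports above. It is not an official compatibility statement, since a lazy load can in principle fail on other versions too.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse '8.11.2' or a package-style '8.11.2-1' into (major, minor, patch)."""
    major, minor, patch = version.split("-")[0].split(".")
    return int(major), int(minor), int(patch)


def reported_affected(running: str) -> bool:
    """True if the running node is on 8.11.x, the series reported in this thread."""
    major, minor, _patch = parse_version(running)
    return (major, minor) == (8, 11)


for version in ("8.10.4", "8.11.0", "8.11.2-1", "8.12.0"):
    print(version, reported_affected(version))
```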


7 participants