Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NoClassDefFoundError can be thrown from LaunchedClassLoader when threads are interrupted #40096

Closed
liaozan opened this issue Mar 25, 2024 · 14 comments
Assignees
Labels
type: regression A regression from a previous release
Milestone

Comments

@liaozan
Copy link
Contributor

liaozan commented Mar 25, 2024

I am running SpringBoot 3.2.4 on jdk21.
I wrote a startup script that stops the application gracefully and restarts it
While stopping the application, the following error is reported

17:52:42.567  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'webServerGracefulShutdown'

java.lang.NoClassDefFoundError: org/springframework/boot/web/server/GracefulShutdownCallback
	at org.springframework.boot.web.context.WebServerGracefulShutdownLifecycle.stop(WebServerGracefulShutdownLifecycle.java:62)
	at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
	at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
	at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
	at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
	at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
	at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.springframework.boot.web.server.GracefulShutdownCallback
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
	at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
	at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 13 common frames omitted

17:53:12.571  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 2147482623 ends with 1 bean still running after timeout of 30000ms: [webServerGracefulShutdown]
17:53:12.577  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'webServerStartStop'

java.lang.NoClassDefFoundError: org/apache/catalina/Lifecycle$SingleUse
	at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:254)
	at org.apache.catalina.core.StandardService.removeConnector(StandardService.java:298)
	at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.lambda$removeServiceConnectors$1(TomcatWebServer.java:171)
	at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.doWithConnectors(TomcatWebServer.java:190)
	at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.removeServiceConnectors(TomcatWebServer.java:168)
	at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.stop(TomcatWebServer.java:354)
	at org.springframework.boot.web.servlet.context.WebServerStartStopLifecycle.stop(WebServerStartStopLifecycle.java:53)
	at org.springframework.context.SmartLifecycle.stop(SmartLifecycle.java:117)
	at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
	at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
	at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
	at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
	at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
	at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.apache.catalina.Lifecycle$SingleUse
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
	at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
	at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 20 common frames omitted

17:53:42.577  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 2147481599 ends with 1 bean still running after timeout of 30000ms: [webServerStartStop]
17:53:42.578  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'redisConnectionFactoryVirtualThreads'

java.lang.NoClassDefFoundError: org/springframework/data/util/Optionals
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.lambda$resetConnection$1(LettuceConnectionFactory.java:1135)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.lambda$doInLock$20(LettuceConnectionFactory.java:1459)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1458)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.resetConnection(LettuceConnectionFactory.java:1133)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.stop(LettuceConnectionFactory.java:901)
	at org.springframework.context.SmartLifecycle.stop(SmartLifecycle.java:117)
	at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
	at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
	at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
	at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
	at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
	at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.springframework.data.util.Optionals
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
	at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
	at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 19 common frames omitted

17:54:12.579  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 0 ends with 1 bean still running after timeout of 30000ms: [redisConnectionFactoryVirtualThreads]
17:54:12.582  WARN nHook .s.b.f.s.DisposableBeanAdapter: Custom destroy method 'shutdown' on bean with name 'lettuceClientResources' propagated an exception: java.lang.NoClassDefFoundError: io/netty/util/concurrent/ImmediateEventExecutor
17:54:12.582  INFO nHook c.j.c.c.c.CanalEventFetcher   : Canal fetcher ad_order_info ready to stop
17:54:12.582  INFO nHook c.j.c.c.c.CanalEventFetcher   : Canal fetcher ad_order_info stop normally
17:54:12.585  INFO nHook c.z.hikari.HikariDataSource   : HikariPool-1 - Shutdown initiated...
17:54:12.589  INFO nHook c.z.hikari.HikariDataSource   : HikariPool-1 - Shutdown completed.

Here is the start script

/bin/bash /bin/bash

APP_NAME="crm"
JAR_NAME="${APP_NAME}.jar"
TARGET_DIR="/data/${APP_NAME}"
JAR_PATH="${TARGET_DIR}/package.tgz"
PID_FILE="${TARGET_DIR}/${APP_NAME}.pid"
curtime() {
    date +'%Y-%m-%d %H:%M:%S'
}
kill_process() {
    if [ -f "$PID_FILE" ]; then
        pid=$(cat "$PID_FILE")
    elif pgrep -f "$JAR_NAME" >/dev/null; then
        pid=$(pgrep -f "$JAR_NAME")
    pid=$(pgrep -f "$JAR_NAME")
    echo "$(curtime) found old $JAR_NAME process, PID: $pid"
    if [ -n "$pid" ]; then
        kill "$pid"
        while ps -p "$pid" >/dev/null; do sleep 1; done
        rm -rf "$PID_FILE"
        echo "$(curtime) old process stopped"
    fi
}
tar xzf "$JAR_PATH" -C "$TARGET_DIR" --overwrite > /dev/null
kill_process
nohup java -XX:-OmitStackTraceInFastThrow -XX:+HeapDumpOnOutOfMemoryError -jar "$TARGET_DIR/$JAR_NAME" --spring.profiles.active=test > / dev/null 2>&1 &
new_pid=$!
echo $new_pid > "$PID_FILE"
echo "$(curtime) New process started, PID: $new_pid"
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Mar 25, 2024
@wilkinsona
Copy link
Member

Thanks for the report. Unfortunately, as things stand, there are too many unknowns for us to efficiently investigate the problem. Can you please provide a complete example that reproduces it?

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Mar 25, 2024
@liaozan
Copy link
Contributor Author

liaozan commented Mar 25, 2024

I also wish I could provide an EXAMPLE, unfortunately this error it is not 100% reproducible.
As it stands, the problem is more likely to occur when the application is running a little longer. If it's just restarted right after a short period of time, it's hardly ever reproducible!

@liaozan
Copy link
Contributor Author

liaozan commented Mar 25, 2024

I'll try to see if I can provide a 100% reproduction of the demo
Is there anything else I should provide other than a demo?

@wilkinsona
Copy link
Member

I'm afraid we really need a demo as it's very hard to diagnose a problem that we cannot reproduce.

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 25, 2024
@liaozan
Copy link
Contributor Author

liaozan commented Mar 25, 2024

Okay, I'll try to provide a minimal reproduction of the example

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 25, 2024
@liaozan
Copy link
Contributor Author

liaozan commented Mar 27, 2024

I did my best to try to provide an example that could be reproduced consistently, but it is completely unreproducible on my own computer. But on CentOS 7.9, it will fail to terminate properly without any regularity

@liaozan
Copy link
Contributor Author

liaozan commented Mar 27, 2024

I think my problem is very similar to #38930, the difference is that I'm using the latest SpringBoot 3.2.4

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 27, 2024
@scottfrederick scottfrederick added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 27, 2024
@wilkinsona
Copy link
Member

wilkinsona commented Mar 27, 2024

Yes, it does look similar to #39930. Unfortunately, that doesn't help us here as no reproducer was provided there either. The class loading failures would suggest that either the class loader has been closed (that should not have happened at this point in the shutdown process) or the jar file has been deleted while the app is running. The latter's out of Spring Boot's control and, judging by your script above, should not be happening here. That leaves the class loader being closed too soon. To confirm or eliminate that possibility, I'm afraid we'll need something that reproduces the problem.

@jocull
Copy link

jocull commented Mar 29, 2024

I am seeing a similar problem in an application that interrupts threads when workers need to rebalance. I am highly suspicious of #37668 but don't know where to start.

  • We already migrated to JDK 17 and didn't have issues with Spring Boot 2.7.x
  • When migrating to 3.2.x now we see ClassNotFoundException or NoClassDefFoundError, but this is highly correlated to the timings of the interrupts.
  • Nothing very helpful shows in the logs, but the interrupts can be traced and we see these errors appear immediately after.

Is it possible that the interrupts are hitting the classloaders and breaking definition lookups? It would be a similar case to what is described here: https://stackoverflow.com/q/56866573/97964

How can I help trace this further?

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 29, 2024
@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 29, 2024
@jocull
Copy link

jocull commented Apr 2, 2024

I have reproduce this locally with our application, but I cannot share the internal business code that will do it. Here is the gist of the operations in the meantime:

  1. Create a pool of threads. Get them doing some work that will require class loading (maybe they make web requests or something?)
  2. Interrupt them all (like Thread.interrupt()). You probably need a lot of class loading to be happening to catch it at the right point in time.
  3. Give them new jobs - classes will fail to load with various exceptions leading to the app being permanently unhealthy
  4. Repeat this in some kind of tight loop, maybe once per second you restart all the jobs.

As another point of information, using the "classic" class loader seems to alleviate the issue on initial testing. I still have to verify this further to be certain that it's improved.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Apr 2, 2024
@scottfrederick scottfrederick added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Apr 2, 2024
@philwebb philwebb removed status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged labels Apr 4, 2024
@philwebb philwebb added the type: regression A regression from a previous release label Apr 4, 2024
@philwebb philwebb added this to the 3.2.x milestone Apr 4, 2024
@philwebb
Copy link
Member

philwebb commented Apr 4, 2024

Thanks very much for the extra details @jocull. The Thread.interrupt() details make me wonder if it's a similar underling cause as #38611

@HamzaAkbar067
Copy link

Yes, it appears like of #39930. However, we will get some help if it reproduce again.

@philwebb philwebb changed the title Can not GracefulShutdown application NoClassDefFoundError can be thrown from LaunchedClassLoader when threads are interrupted Apr 16, 2024
@philwebb
Copy link
Member

I think I can see why this can happen, but I've not been able to replicate it. We're going to try changing the underlying way that we access file data.

philwebb added a commit that referenced this issue Apr 16, 2024
Rename the internal `FileChannelDataBlock` to `FileDataBlock` since we
want to fallback to a `RandomAccessFile` when a thread is interrupted.

See gh-40096
@philwebb
Copy link
Member

@jocull @liaozan I've just pushed something I hope will fix this. If you have the chance, please try the SNAPSHOT before the Thursday release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: regression A regression from a previous release
Projects
None yet
Development

No branches or pull requests

8 participants