Managed executor thread has the AppClassLoader on HttpClient dependent task #41311

antoniomacri · 2024-06-19T14:16:45Z

Describe the bug

I'm passing the ManagedExecutor to the Java HttpClient in order for it to run its tasks. However, when I set a "small" CPU limit on the docker container (via docker run or k8s CPU limit) in dependent tasks the contextual classloader is the JVM jdk.internal.loader.ClassLoaders$AppClassLoader. As a consequence, I have ClassNotFoundExceptions.

Expected behavior

The classloader on threads running dependent tasks should be the RunnerClassLoader from Quarkus.

Actual behavior

The classloader on threads running dependent tasks is the JVM jdk.internal.loader.ClassLoaders$AppClassLoader.

How to Reproduce?

Clone https://github.com/antoniomacri/quarkus-reproducer-http-client-classloader.git

Then run

./build-image.sh && ./run-image.sh

On another shell run:

curl http://localhost:8080/test

The expected output should be:

io.quarkus.bootstrap.runner.RunnerClassLoader

but instead it is:

jdk.internal.loader.ClassLoaders$AppClassLoader

If I set --cpus=2.1 or greater, the classloader is correct.

Output of `uname -a` or `ver`

No response

Output of `java -version`

No response

Quarkus version or git rev

3.11.2

Build tool (ie. output of `mvnw --version` or `gradlew --version`)

No response

Additional information

Running MacOS on M2 Pro with 12 cores

The text was updated successfully, but these errors were encountered:

antoniomacri · 2024-06-20T07:28:59Z

Setting -XX:ActiveProcessorCount=N, instead of running in a docker container with cpu limit, produces the same results.

geoand · 2024-06-21T08:31:01Z

Thanks for reporting.

I believe this is not a Quarkus issue

antoniomacri · 2024-06-21T20:13:38Z

Hi @geoand. Could you expand on that?

geoand · 2024-06-22T12:21:24Z

We don't do anything specific for either the HTTP client or respond specifically to container limits.

I would suggest trying to reproduce something similar (essentially the executor not being taken into account) with plain Java

antoniomacri · 2024-06-23T06:57:44Z

I may be wrong, but the executor seems OK to me. From this code:

    @PostConstruct
    void createHttpClient() {
        this.httpClient = HttpClient.newBuilder()
                .executor(managedExecutor)
                .version(HttpClient.Version.HTTP_2)
                .build();

        managedExecutor.supplyAsync(() -> "").thenApplyAsync(response -> {
            Log.infof("PostConstruct thread=%s", Thread.currentThread().hashCode());
            Log.infof("PostConstruct classLoaderAfter=%s", Thread.currentThread().getContextClassLoader());
            return Thread.currentThread().getContextClassLoader().getClass().getName();
        }, managedExecutor);
    }


    @GET
    @Produces({MediaType.TEXT_PLAIN})
    public CompletableFuture<String> test() {
        URI uri = URI.create("http://www.columbia.edu/~fdc/sample.html");
        HttpRequest request = HttpRequest.newBuilder().GET().uri(uri).build();

        Log.infof("classLoaderBefore=%s", Thread.currentThread().getContextClassLoader());
        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString()).thenApplyAsync(response -> {
            Log.infof("thread=%s", Thread.currentThread().hashCode());
            Log.infof("classLoaderAfter=%s", Thread.currentThread().getContextClassLoader());
            return Thread.currentThread().getContextClassLoader().getClass().getName();
        }, managedExecutor);
    }

I get this log:

2024-06-23 08:46:15,765 INFO  [ClassloadingResource] (executor-thread-1) PostConstruct thread=10217343
2024-06-23 08:46:15,765 INFO  [ClassloadingResource] (executor-thread-1) PostConstruct classLoaderAfter=io.quarkus.bootstrap.runner.RunnerClassLoader@1
2024-06-23 08:46:15,766 INFO  [ClassloadingResource] (vert.x-eventloop-thread-1) classLoaderBefore=io.quarkus.bootstrap.runner.RunnerClassLoader@1
2024-06-23 08:46:16,093 INFO  [ClassloadingResource] (executor-thread-2) thread=1224931749
2024-06-23 08:46:16,094 INFO  [ClassloadingResource] (executor-thread-2) classLoaderAfter=jdk.internal.loader.ClassLoaders$AppClassLoader@3d71d552

The thread appears to be from the managed executor.

The problem seems related to the classloader not being set/propagated on thread context.

Correct?

dmlloyd · 2024-06-24T16:00:53Z

This does seem like a real bug to me. Our thread pools should always be propagating the correct class loader, so it's odd that it's being erased somehow.

geoand · 2024-06-25T07:49:47Z

cc @manovotn for the ManagedExecutor part

manovotn · 2024-06-25T10:32:39Z

cc @manovotn for the ManagedExecutor part

Looking at this part:

        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString()).thenApplyAsync(response -> {
            Log.infof("thread=%s", Thread.currentThread().hashCode());
            Log.infof("classLoaderAfter=%s", Thread.currentThread().getContextClassLoader());
            return Thread.currentThread().getContextClassLoader().getClass().getName();
        }, managedExecutor);

The ManagedExecutor will capture contexts from the thread where it is invoked.
You firstly call httpClient.sendAsync and IIUIC, this goes outside of Quarkus and can use different TCCL. The ManagedExecutor then just captures whatever was used and keeps using it within the subsequent thenApplyAsync call.

Now, the above is what I think happens, I need to get a closer look this afternoon to verify that claim.

antoniomacri · 2024-06-25T12:51:37Z

Thanks @manovotn. I set the .executor(managedExecutor) on the HttpClient builder, so it should already have the correct TCCL, I think.

manovotn · 2024-06-25T13:02:14Z

Thanks @manovotn. I set the .executor(managedExecutor) on the HttpClient builder, so it should already have the correct TCCL, I think.

Yes, I noticed that. I am looking deeper into it just now.
What's off is this - below is a string representation of the ManagedExecutor you're getting and it includes the list of contexts it is propagating (which by default should be all that it finds providers for):

io.smallrye.context.SmallRyeManagedExecutor, with maxAsync: -1, with maxQueued: -1, with cleared contexts: [], with propagated contexts: [io.quarkus.arc.runtime.context.ArcContextProvider@34e1e537], with unchanged contexts: [],  with injection point name: no-ip

This looks like only ArcContextProvider is present and so only CDI context would be propagated and not the TCCL.

FTR, this section of docs describes what contexts are (conditionally) present in Quarkus. The one handling TCCL should be the Application context.
I am not yet sure why it isn't present...

manovotn · 2024-06-25T15:15:24Z

I am not yet sure why it isn't present...

Ok, so normally SR CP has a dedicated module that performs this (here) but Quarkus doesn't seem to use this at all, so I assume our thread pool does this elsewhere automagically.
@dmlloyd out of curiosity, any pointers as to where our thread pools handle the tccl setup?

Anyhow, I did try adding the SR module manually and even with that in place the same issue appears - propagation actually happens but at some point the TCCL underneath changes and the propagation carries that information forward.
In other words, this doesn't seem like the problem of propagation or of ManagedExecutor.

I cannot put my finger on the exact spot which causes the issue but it seems to be originating inside HttpClientImpl.
It appears their (async) tasks are handled via "selector manager thread" (see HttpClientImpl$SelectorManager JDK class) which then delegates to the user provided executor - however, the context capture happens at this point and you therefore read the TCCL from their thread instead of the original one in your reproducer class.

I won't pretend I understand why is this happening and especially why it isn't happening when you don't set the CPU limits so ideas are welcome 🤔

dmlloyd · 2024-06-25T15:48:41Z

I don't know about how any of the higher-level frameworks do it, but at the lowest level the thread pool (EnhancedQueueExecutor) captures the TCCL of the calling thread at the time that Executor#execute (or any other task-submit method) is called. So if it's wrong at one call site, the "wrongness" could spread through task propagation as those tasks in turn execute more tasks. If it originates in the JDK, that would explain a lot.

manovotn · 2024-06-25T21:17:59Z

I see.
That corresponds to what the SR impl does just at a lower level.

The biggest mystery here is that the reproducer actually works if you increase the --cpus=2.1 option with docker...

dmlloyd · 2024-06-25T21:58:25Z

Poking around in the JDK, it looks like it only uses the delegate executor sometimes. Sometimes you're running from the selector manager thread which doesn't necessarily do anything with TCCL and is controlled completely within the HttpClient implementation. I guess the problem likely originates here. Maybe having more threads available increases the likelihood of the source "infecting" task running on a safe thread pool thread instead of the selector thread?

It doesn't look like there's any way to access that thread, in any event, other than through a subscription to a socket flow event (like read- or write-ready events).

manovotn · 2024-06-25T23:29:08Z

Poking around in the JDK, it looks like it only uses the delegate executor sometimes. Sometimes you're running from the selector manager thread which doesn't necessarily do anything with TCCL and is controlled completely within the HttpClient implementation. I guess the problem likely originates here. Maybe having more threads available increases the likelihood of the source "infecting" task running on a safe thread pool thread instead of the selector thread?

Ok that seems to confirm my earlier findings, thanks for looking into it as well!

It doesn't look like there's any way to access that thread, in any event, other than through a subscription to a socket flow event (like read- or write-ready events).

I think there is no guarantee that the executor you pass to the HttpClient will "just execute the task" on it - it has no notion of possible context propagation we are trying to do and hence can affect the result in certain ways, TCCL included 🤷

geoand · 2024-06-26T05:04:24Z

Thanks for reporting.

I believe this is not a Quarkus issue

In addition to the findings of Matej and David, another reason I still believe this is that we've never encountered such an issue with any Quarkus managed component

dmlloyd · 2024-06-26T12:47:27Z

A workaround could be to wrap the injected ManagedExecutor with your own executor that simply initializes the TCCL before submitting the task.

antoniomacri · 2024-06-28T14:15:42Z

Poking around in the JDK, it looks like it only uses the delegate executor sometimes. Sometimes you're running from the selector manager thread which doesn't necessarily do anything with TCCL and is controlled completely within the HttpClient implementation. I guess the problem likely originates here. Maybe having more threads available increases the likelihood of the source "infecting" task running on a safe thread pool thread instead of the selector thread?

By "selector thread" you mean the thread that is awoken when the socket receives data? Any pointer to the source code?

However it seems deterministic. I've never encountered the issue with -XX:ActiveProcessorCount=3 but always had it with -XX:ActiveProcessorCount=2.

A workaround could be to wrap the injected ManagedExecutor with your own executor that simply initializes the TCCL before submitting the task.

Something like this?

private static Executor createContextualExecutor(Executor innerExecutor) {
    final ClassLoader classLoaderToPropagate = Thread.currentThread().getContextClassLoader();
    return command -> {
        Thread currentThread = Thread.currentThread();
        ClassLoader old = currentThread.getContextClassLoader();
        currentThread.setContextClassLoader(classLoaderToPropagate);
        try {
            innerExecutor.execute(command);
        } finally {
            currentThread.setContextClassLoader(old);
        }
    };
}

Thanks!

manovotn · 2024-06-28T15:28:32Z

By "selector thread" you mean the thread that is awoken when the socket receives data? Any pointer to the source code?

This is code directly in the JDK's HttpClientImpl
Take a look at the private static final class SelectorManager defined inside jdk.internal.net.http.HttpClientImpl.

However it seems deterministic. I've never encountered the issue with -XX:ActiveProcessorCount=3 but always had it with -XX:ActiveProcessorCount=2.

Yes, the reproducer works well in that regard (thanks for taking the time to make it!).
What David means, or at least what I observed as well, is that the selector thread is not unconditionally used but the choice is outside of our reach - JDK internals seem to make that decision, maybe with the processor limitation in mind 🤷

Something like this?

Yes, that should work

manovotn · 2024-07-04T10:27:38Z

Closing this issue as I don't think we can do more here.
Feel free to reopen if you feel otherwise.

antoniomacri added the kind/bug Something isn't working label Jun 19, 2024

manovotn added kind/question Further information is requested and removed kind/bug Something isn't working labels Jul 4, 2024

manovotn closed this as completed Jul 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Managed executor thread has the AppClassLoader on HttpClient dependent task #41311

Managed executor thread has the AppClassLoader on HttpClient dependent task #41311

antoniomacri commented Jun 19, 2024

antoniomacri commented Jun 20, 2024 •

edited

Loading

geoand commented Jun 21, 2024

antoniomacri commented Jun 21, 2024

geoand commented Jun 22, 2024

antoniomacri commented Jun 23, 2024

dmlloyd commented Jun 24, 2024

geoand commented Jun 25, 2024

manovotn commented Jun 25, 2024

antoniomacri commented Jun 25, 2024

manovotn commented Jun 25, 2024

manovotn commented Jun 25, 2024

dmlloyd commented Jun 25, 2024

manovotn commented Jun 25, 2024

dmlloyd commented Jun 25, 2024

manovotn commented Jun 25, 2024

geoand commented Jun 26, 2024

dmlloyd commented Jun 26, 2024

antoniomacri commented Jun 28, 2024

manovotn commented Jun 28, 2024 •

edited

Loading

manovotn commented Jul 4, 2024

Managed executor thread has the AppClassLoader on HttpClient dependent task #41311

Managed executor thread has the AppClassLoader on HttpClient dependent task #41311

Comments

antoniomacri commented Jun 19, 2024

Describe the bug

Expected behavior

Actual behavior

How to Reproduce?

Output of uname -a or ver

Output of java -version

Quarkus version or git rev

Build tool (ie. output of mvnw --version or gradlew --version)

Additional information

antoniomacri commented Jun 20, 2024 • edited Loading

geoand commented Jun 21, 2024

antoniomacri commented Jun 21, 2024

geoand commented Jun 22, 2024

antoniomacri commented Jun 23, 2024

dmlloyd commented Jun 24, 2024

geoand commented Jun 25, 2024

manovotn commented Jun 25, 2024

antoniomacri commented Jun 25, 2024

manovotn commented Jun 25, 2024

manovotn commented Jun 25, 2024

dmlloyd commented Jun 25, 2024

manovotn commented Jun 25, 2024

dmlloyd commented Jun 25, 2024

manovotn commented Jun 25, 2024

geoand commented Jun 26, 2024

dmlloyd commented Jun 26, 2024

antoniomacri commented Jun 28, 2024

manovotn commented Jun 28, 2024 • edited Loading

manovotn commented Jul 4, 2024

Output of `uname -a` or `ver`

Output of `java -version`

Build tool (ie. output of `mvnw --version` or `gradlew --version`)

antoniomacri commented Jun 20, 2024 •

edited

Loading

manovotn commented Jun 28, 2024 •

edited

Loading