
io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest fails on MacOS M1 #28033

Open
holly-cummins opened this issue Sep 17, 2022 · 9 comments
Labels
env/m1 Impacts Apple M1 machines kind/bug Something isn't working

Comments

@holly-cummins
Contributor

Describe the bug

Note: This test has been disabled on M1 as part of #27156

This is the error:

[INFO] Running io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
2022-09-17 13:00:53,398 INFO  [io.qua.ely.sec.pro.dep.ElytronPropertiesProcessor] (build-12) Configuring from MPRealmConfig
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) quarkus-smallrye-graphql-deployment 999-SNAPSHOT on JVM (powered by Quarkus 999-SNAPSHOT) started in 0.336s. Listening on: http://localhost:8081
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) Profile test activated. 
2022-09-17 13:00:53,960 INFO  [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-17 13:00:57,265 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-29) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
[zillions of repeated log warnings]
2022-09-17 13:00:57,278 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files
2022-09-17 13:00:57,278 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-32) Retrying connect to {}->http://localhost:8081
2022-09-17 13:00:58,267 WARN  [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
        at java.base/sun.nio.ch.Net.accept(Native Method)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:119)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:116)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
        at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:116)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:154)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:79)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:833)

2022-09-17 13:00:59,268 WARN  [io.net.cha.DefaultChannelPipeline] (vert.x-acceptor-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
        at java.base/sun.nio.ch.Net.accept(Native Method)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:119)
        at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:116)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
        at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:116)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:154)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:79)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:833)

2022-09-17 13:00:59,365 INFO  [io.quarkus] (main) quarkus-smallrye-graphql-deployment stopped in 0.087s
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.169 s <<< FAILURE! - in io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest
[ERROR] io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest.concurrentAllFilmsOnly  Time elapsed: 5.291 s  <<< ERROR!

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

Re-enable the test, and then run

mvn verify -f extensions/smallrye-graphql/deployment

It can also be reproduced on the Mac CI.
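When trying to reproduce, it can help to watch how close the test JVM gets to its per-process file descriptor limit. A minimal sketch (the `FdStats` class is made up for illustration, not part of the Quarkus test suite) using the JDK's `com.sun.management.UnixOperatingSystemMXBean`:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical monitoring helper: reports how many file descriptors the
// current JVM holds open, and the per-process limit, on Unix-like systems.
public class FdStats {

    public static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean unix) {
            return unix.getOpenFileDescriptorCount();
        }
        return -1; // not a Unix JVM (e.g. Windows), counts unavailable
    }

    public static long maxFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean unix) {
            return unix.getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open=" + openFds() + " / max=" + maxFds());
    }
}
```

Printing this periodically while the test runs would show whether the count climbs toward the limit (macOS defaults are often much lower than Linux defaults).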

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@holly-cummins holly-cummins added the kind/bug Something isn't working label Sep 17, 2022
@quarkus-bot quarkus-bot bot added area/graphql area/smallrye env/m1 Impacts Apple M1 machines labels Sep 17, 2022
@holly-cummins holly-cummins changed the title io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest fails on M1 io.quarkus.smallrye.graphql.deployment.ConcurrentAuthTest fails on MacOS M1 Sep 17, 2022
holly-cummins added a commit to holly-cummins/quarkus that referenced this issue Sep 17, 2022
@gastaldi
Contributor

I've seen this error happen sporadically with other tests as well.

Can you reproduce this issue when running this single test only?

@gsmet
Member

gsmet commented Sep 17, 2022

Yeah, I wonder if we are hitting the infamous file descriptor exhaustion issue you have all had with macOS.

@holly-cummins
Contributor Author

I didn't raise a defect until I'd reproduced it locally ... but now I can't reproduce it locally (even after re-enabling the test). So I suspect Guillaume may be right about the file descriptor issue.

holly-cummins added a commit to holly-cummins/quarkus that referenced this issue Sep 18, 2022
@cescoffier
Member

I just built the graphql extension on my M1 - no problem.

@holly-cummins
Contributor Author

holly-cummins commented Sep 18, 2022

I re-enabled the test and ran a build, and the same thing happened - the build overall was a bit unhappy and hung. That complicates diagnosis, but https://github.com/quarkusio/quarkus/actions/runs/3076919532/jobs/4971564330 shows "too many open files" in the smallrye-graphql deployment module. I'll add the guard again, with an extra condition so the test is only disabled in CI. We certainly won't have less M1 coverage that way than we do now.

2022-09-18T12:25:23.8743260Z 2022-09-18 08:25:23,873 INFO  [io.quarkus] (main) Profile test activated. 
2022-09-18T12:25:23.8743750Z 2022-09-18 08:25:23,873 INFO  [io.quarkus] (main) Installed features: [cdi, hibernate-validator, jaeger, resteasy-reactive, security, security-properties-file, smallrye-context-propagation, smallrye-graphql, smallrye-metrics, smallrye-opentracing, vertx]
2022-09-18T12:25:26.4024660Z 2022-09-18 08:25:26,210 INFO  [org.apa.htt.imp.cli.DefaultHttpClient] (pool-183-thread-3) I/O exception (java.net.SocketException) caught when connecting to {}->http://localhost:8081: Too many open files

(When I had the test disabled, I had a CI run which was clean apart from #28035. So there's clearly some intermittency, and possibly a machine-specific issue, but it seems to be specific to that test, based on a sample of 4.)

holly-cummins added a commit to holly-cummins/quarkus that referenced this issue Sep 18, 2022
Also specify Mac OS

Reverse disabling test

Re-disable test
@cescoffier
Member

The "too many open files" error is just happening at that point because we reach the limit. So it's either a leak somewhere, or just life (Maven opens lots of files).

It would be interesting to run the build with a profiler, or to collect JFR events (there is probably an event tracking file open/close).

You may need to configure your ulimit - but that's something I generally avoid because, after that, you may see different behavior during load tests (as you will be able to open more connections).
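The JFR idea above could be sketched as follows (the `FdRecording` class is hypothetical; `jdk.FileRead` and `jdk.SocketRead` are standard JFR events, but whether a dedicated file open/close event exists depends on the JDK version, so read events are used here as a proxy for descriptor activity):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

// Sketch: record file and socket read events around a workload with the
// programmatic JFR API, then dump the recording for offline analysis.
public class FdRecording {

    public static Path record(Runnable workload) throws Exception {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.FileRead").withThreshold(Duration.ZERO);
            recording.enable("jdk.SocketRead").withThreshold(Duration.ZERO);
            recording.start();
            workload.run();
            Path out = Path.of("concurrent-auth.jfr");
            // Inspect later with: jfr print --events jdk.SocketRead concurrent-auth.jfr
            recording.dump(out);
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = record(() -> { });
        System.out.println(out + " exists: " + Files.exists(out));
    }
}
```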

holly-cummins added a commit to holly-cummins/quarkus that referenced this issue Sep 19, 2022
@holly-cummins
Contributor Author

I agree that it happens at that point because we hit the limit, but the interesting thing is that we don't hit the limit at all without that test. So that test/underlying code must be particularly hungry (or leaky) in its file consumption. The good news is it perhaps narrows down the scope of what we will need to profile, following @cescoffier's suggestions.

This issue can track the investigation to decide whether there's something we can/should change on our side, or whether the test and implementation are hungry-but-correct and we need to adjust the machine configuration.

@cescoffier
Member

That is weird. We would need to track how many file descriptors are opened by this test, or whether we have a leak.
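One way to check for a leak (a hypothetical probe, not existing Quarkus code) is to snapshot the open descriptor count before and after the suspect test and look at the delta, which should settle back near zero once connections are closed:

```java
import java.lang.management.ManagementFactory;

// Hypothetical leak probe: measures the change in open file descriptors
// across a task on Unix-like JVMs.
public class FdLeakProbe {

    public static long delta(Runnable task) {
        long before = count();
        task.run();
        // Give cleaners/finalizers a chance to release any forgotten streams
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count() - before;
    }

    private static long count() {
        var os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean unix) {
            return unix.getOpenFileDescriptorCount();
        }
        return -1; // unavailable on non-Unix JVMs; the delta would be meaningless
    }

    public static void main(String[] args) {
        System.out.println("no-op delta = " + delta(() -> { }));
    }
}
```

A persistently positive delta for the `ConcurrentAuthTest` workload would point to a leak rather than mere hunger.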

geoand pushed a commit to holly-cummins/quarkus that referenced this issue Sep 19, 2022
holly-cummins added a commit to holly-cummins/quarkus that referenced this issue Sep 19, 2022
igorregis pushed a commit to igorregis/quarkus that referenced this issue Oct 16, 2022
igorregis pushed a commit to igorregis/quarkus that referenced this issue Oct 16, 2022
igorregis pushed a commit to igorregis/quarkus that referenced this issue Oct 16, 2022
igorregis pushed a commit to igorregis/quarkus that referenced this issue Oct 17, 2022