[CI] ESSingleNodeTestCase failed with java.net.BindException: Address already in use #102870

Closed
DaveCTurner opened this issue Dec 1, 2023 · 4 comments
Labels
:Delivery/Build (Build or test infrastructure), low-risk (An open issue or test failure that is a low risk to future releases), Team:Delivery (Meta label for Delivery team), >test-failure (Triaged test failures from CI)

Comments

@DaveCTurner
Contributor

DaveCTurner commented Dec 1, 2023

Apparently we failed to bind to a port, even though we were given a range of 100 ports? It seems unlikely that this has anything to do with this specific test; it looks more like a test infra problem.

Build scan:
https://gradle-enterprise.elastic.co/s/d72etqezvbpz6/tests/:x-pack:plugin:security:internalClusterTest/org.elasticsearch.xpack.security.authc.pki.PkiOptionalClientAuthTests/testRestClientWithoutClientCertificate

Reproduction line:

gradlew ':x-pack:plugin:security:internalClusterTest' --tests "org.elasticsearch.xpack.security.authc.pki.PkiOptionalClientAuthTests.testRestClientWithoutClientCertificate" -Dtests.seed=56E4D6BA925A014E -Dtests.locale=es-GT -Dtests.timezone=Etc/GMT+10 -Druntime.java=21

Applicable branches:
This specific failure was on 8.11 but seems like it could happen on main too?

Reproduces locally?:
Didn't try

Failure history:
https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testRestClientWithoutClientCertificate'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.xpack.security.authc.pki.PkiOptionalClientAuthTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))

Failure excerpt:

org.elasticsearch.transport.BindTransportException: Failed to bind to 127.0.0.1:[60516-60616]

  at org.elasticsearch.transport.TcpTransport.bindToPort(TcpTransport.java:512)
  at org.elasticsearch.transport.TcpTransport.bindServer(TcpTransport.java:473)
  at org.elasticsearch.transport.netty4.Netty4Transport.doStart(Netty4Transport.java:154)
  at org.elasticsearch.xpack.core.security.transport.netty4.SecurityNetty4Transport.doStart(SecurityNetty4Transport.java:126)
  at org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4ServerTransport.doStart(SecurityNetty4ServerTransport.java:62)
  at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:50)
  at org.elasticsearch.transport.TransportService.doStart(TransportService.java:331)
  at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:50)
  at org.elasticsearch.node.Node.start(Node.java:1489)
  at org.elasticsearch.test.ESSingleNodeTestCase.newNode(ESSingleNodeTestCase.java:275)
  at com.carrotsearch.randomizedtesting.RandomizedContext.runWithPrivateRandomness(RandomizedContext.java:187)
  at com.carrotsearch.randomizedtesting.RandomizedContext.runWithPrivateRandomness(RandomizedContext.java:211)
  at org.elasticsearch.test.ESSingleNodeTestCase.startNode(ESSingleNodeTestCase.java:86)
  at org.elasticsearch.test.ESSingleNodeTestCase.setUp(ESSingleNodeTestCase.java:121)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: java.net.BindException: Address already in use: bind

    at sun.nio.ch.Net.bind0(Net.java:-2)
    at sun.nio.ch.Net.bind(Net.java:565)
    at sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:344)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:301)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:600)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:579)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:260)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:1583)

DaveCTurner added the :Delivery/Build and >test-failure labels Dec 1, 2023
elasticsearchmachine added the blocker and Team:Delivery labels Dec 1, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@DaveCTurner
Contributor Author

https://gradle-enterprise.elastic.co/s/5ouzdltwx22ty looks similar, although this one was an internal-cluster test

DaveCTurner added the low-risk label and removed the blocker label Dec 2, 2023
@mark-vieira
Contributor

> https://gradle-enterprise.elastic.co/s/5ouzdltwx22ty looks similar, although this one was an internal-cluster test

The initial test above was also an internal cluster test.

So presumably we're giving the test a range of 100 ports to bind to and all of them are taken? Even if we do get overlaps due to our naive algorithm I can't imagine enough tests running in parallel such that all 100 ports are in use. There must be something else on the system consuming these ports. I wonder if there is a way to catch such an error and call out to lsof to see what's taking up these ports at the time?
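One way the idea of catching the bind failure and calling out to lsof could look is sketched below. This is only an illustration under stated assumptions, not how Elasticsearch's `TcpTransport` handles it: the class `PortRangeDiagnostics` and its methods `bindInRange`, `lsofCommand` and `describeListeners` are hypothetical names, the sketch uses a plain `ServerSocket` rather than Netty, and it assumes `lsof` is on the `PATH` (it is a Unix tool, so on Windows a different command such as `netstat` would be needed).

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch.
final class PortRangeDiagnostics {

    /** Builds an lsof command that lists TCP listeners on ports from..to (inclusive). */
    static List<String> lsofCommand(int from, int to) {
        // -n and -P skip host and port name lookups; -sTCP:LISTEN keeps only listening sockets.
        return List.of("lsof", "-nP", "-iTCP:" + from + "-" + to, "-sTCP:LISTEN");
    }

    /**
     * Tries every port in [from, to] and returns the first socket that binds.
     * If none binds, throws a BindException whose message includes the lsof output.
     */
    static ServerSocket bindInRange(InetAddress host, int from, int to) throws IOException {
        IOException last = null;
        for (int port = from; port <= to; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(host, port));
                return socket;
            } catch (IOException e) {
                socket.close(); // release the unbound socket before trying the next port
                last = e;
            }
        }
        BindException failure = new BindException(
            "Failed to bind to " + host.getHostAddress() + ":[" + from + "-" + to + "]\n"
                + describeListeners(from, to));
        if (last != null) {
            failure.initCause(last);
        }
        throw failure;
    }

    /** Runs lsof and returns its output, or a short note if it could not be run. */
    static String describeListeners(int from, int to) {
        try {
            Process process = new ProcessBuilder(lsofCommand(from, to))
                .redirectErrorStream(true)
                .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            process.waitFor();
            return output.isBlank() ? "(lsof reported no listeners in this range)" : output;
        } catch (IOException e) {
            return "(could not run lsof: " + e.getMessage() + ")";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "(interrupted while running lsof)";
        }
    }

    public static void main(String[] args) throws IOException {
        // Port range taken from the failure excerpt in this issue.
        try (ServerSocket socket = bindInRange(InetAddress.getLoopbackAddress(), 60516, 60616)) {
            System.out.println("Bound to port " + socket.getLocalPort());
        }
    }
}
```

The diagnostics are only gathered after every port in the range has failed, so the extra process launch costs nothing on the normal path. Note that lsof may not show sockets owned by other users unless it runs with sufficient privileges, so an empty listing would not prove the ports were free.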

@slobodanadamovic
Contributor

The underlying issue is the same as in #102349, so I'm closing this one as a duplicate in favour of #102349, which has more context and steps to reproduce.

slobodanadamovic closed this as not planned Dec 21, 2023