RemoteClusterConnectionTests#testCloseWhileConcurrentlyConnecting leaks a connection #33756

Closed
DaveCTurner opened this issue Sep 17, 2018 · 4 comments · Fixed by #35038
Labels
:Distributed/Network Http and internode communication implementations >test-failure Triaged test failures from CI v5.6.12

Comments

@DaveCTurner
Contributor

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-unix-compatibility/os=oraclelinux/2/console failed as follows:

java.lang.AssertionError: still open connections: {{seed_node_1}{CslcNVg2RUmfUOjafi8V-g}{OCHf8nq9Q-afBlEmBTuW8A}{127.0.0.1}{127.0.0.1:10301}=[org.elasticsearch.test.transport.MockTransportService$6@3f3b53e]}
	at __randomizedtesting.SeedInfo.seed([E0882D34B487919C:E6C871A1A79BE13A]:0)
	at org.elasticsearch.test.transport.MockTransportService.doClose(MockTransportService.java:865)
	at org.elasticsearch.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:109)
	at org.elasticsearch.transport.RemoteClusterConnectionTests.testCloseWhileConcurrentlyConnecting(RemoteClusterConnectionTests.java:716)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.lang.Thread.run(Thread.java:748)
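
The assertion above is thrown from MockTransportService.doClose, which reports any connection still registered to a node when the transport is closed. The following is a minimal, hypothetical Java sketch of that kind of bookkeeping. The classes ConnectionTrackingSketch, TrackingTransport and Connection are illustrative assumptions, not MockTransportService's actual code. Every opened connection is recorded per node, closing it removes the record, and closing the transport fails if any record is left.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, NOT the real MockTransportService: a transport that
// records every connection it opens, per node, and treats any connection
// still recorded at close time as a leak.
public class ConnectionTrackingSketch {

    /** Stand-in for a transport connection; closing it unregisters it from its owner. */
    static final class Connection implements AutoCloseable {
        private final String node;
        private final TrackingTransport owner;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        Connection(String node, TrackingTransport owner) {
            this.node = node;
            this.owner = owner;
        }

        @Override
        public void close() {
            // Only the first close() unregisters the connection.
            if (closed.compareAndSet(false, true)) {
                owner.onConnectionClosed(node, this);
            }
        }
    }

    static final class TrackingTransport implements AutoCloseable {
        // node -> connections opened to that node and not yet closed
        private final Map<String, Set<Connection>> openConnections = new HashMap<>();

        synchronized Connection openConnection(String node) {
            Connection connection = new Connection(node, this);
            openConnections.computeIfAbsent(node, n -> new HashSet<>()).add(connection);
            return connection;
        }

        synchronized void onConnectionClosed(String node, Connection connection) {
            Set<Connection> forNode = openConnections.get(node);
            if (forNode != null) {
                forNode.remove(connection);
                if (forNode.isEmpty()) {
                    openConnections.remove(node);
                }
            }
        }

        synchronized int openConnectionCount() {
            int count = 0;
            for (Set<Connection> forNode : openConnections.values()) {
                count += forNode.size();
            }
            return count;
        }

        @Override
        public synchronized void close() {
            // Same shape as the check that failed in this issue.
            if (!openConnections.isEmpty()) {
                throw new AssertionError("still open connections: " + openConnections);
            }
        }
    }

    public static void main(String[] args) {
        TrackingTransport transport = new TrackingTransport();
        transport.openConnection("seed_node_0").close(); // opened and closed: fine
        transport.openConnection("seed_node_1");         // never closed: leaked
        try {
            transport.close();
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a connection that a concurrent connect attempt opens but never closes (for example, because the component was closed mid-connect) stays registered, and the final close() fails with the same kind of message as in the trace.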

Similar failures occurred on other branches, although their logs are no longer accessible:

July 4th 2018: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+java-time+feature-branch-periodic/169/console

June 18th 2018: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/6511/console

The REPRODUCE WITH line does not reproduce the failure for me:

./gradlew :core:test \
  -Dtests.seed=E0882D34B487919C \
  -Dtests.class=org.elasticsearch.transport.RemoteClusterConnectionTests \
  -Dtests.method="testCloseWhileConcurrentlyConnecting" \
  -Dtests.security.manager=true \
  -Dtests.locale=de-LU \
  -Dtests.timezone=America/Argentina/Jujuy
DaveCTurner added :Distributed/Network Http and internode communication implementations >test-failure Triaged test failures from CI v7.0.0 v5.6.12 labels Sep 17, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@iverase
Contributor

iverase commented Sep 21, 2018

We have another failure with a slightly different message:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=amazon/14/console

I cannot reproduce it locally.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 26, 2018
* This should surface what errors are thrown on CI
and in org.elasticsearch.transport.RemoteClusterConnection.ConnectHandler#collectRemoteNodes
(the sequence of caught error in the last catch block and moving on to the next seed node
seems to be the only path by which the errors logged in elastic#33756 could come about)
* Relates elastic#33756
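
The commit message above points at the path in ConnectHandler#collectRemoteNodes where an error is caught and the handler moves on to the next seed node. As a general illustration only (this is not the actual RemoteClusterConnection code, and not the fix from #35038), the sketch below shows a catch-and-continue loop over seed nodes and the point where such a loop must release a connection it already opened. SeedFallbackSketch, Conn, FakeConn and connectToFirstReachable are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: names and structure are illustrative, not
// RemoteClusterConnection's real code.
public class SeedFallbackSketch {

    /** Minimal stand-in for a connection that needs a handshake after opening. */
    interface Conn extends AutoCloseable {
        void handshake() throws Exception;

        @Override
        void close(); // narrowed: no checked exception
    }

    /** Test double: optionally fails its handshake and records whether it was closed. */
    static final class FakeConn implements Conn {
        final String seed;
        final boolean failHandshake;
        boolean closed;

        FakeConn(String seed, boolean failHandshake) {
            this.seed = seed;
            this.failHandshake = failHandshake;
        }

        @Override
        public void handshake() throws Exception {
            if (failHandshake) {
                throw new Exception("handshake with " + seed + " failed");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Tries each seed in order. On failure it logs and moves on to the next
     * seed, but first closes any connection it already opened for the failed one.
     */
    static Conn connectToFirstReachable(List<String> seeds, Function<String, Conn> opener) {
        for (String seed : seeds) {
            Conn conn = null;
            try {
                conn = opener.apply(seed);
                conn.handshake();
                return conn;
            } catch (Exception e) {
                System.err.println("failed to connect to " + seed + ", trying next seed: " + e.getMessage());
                if (conn != null) {
                    conn.close(); // skipping this would leave the connection open
                }
            }
        }
        throw new IllegalStateException("no seed node was reachable");
    }

    public static void main(String[] args) {
        List<FakeConn> created = new ArrayList<>();
        Conn result = connectToFirstReachable(
                List.of("seed_node_0", "seed_node_1"),
                seed -> {
                    FakeConn conn = new FakeConn(seed, seed.equals("seed_node_0"));
                    created.add(conn);
                    return conn;
                });
        System.out.println("connected via " + ((FakeConn) result).seed
                + "; first connection closed: " + created.get(0).closed);
    }
}
```

Closing in the catch block keeps each failed attempt self-contained, so moving on to the next seed never leaves a half-opened connection behind for a leak check to find.
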
original-brownbear added a commit that referenced this issue Sep 27, 2018
* This should surface what errors are thrown on CI
and in org.elasticsearch.transport.RemoteClusterConnection.ConnectHandler#collectRemoteNodes
(the sequence of caught error in the last catch block and moving on to the next seed node
seems to be the only path by which the errors logged in #33756 could come about)
* Relates #33756
original-brownbear added a commit that referenced this issue Sep 27, 2018
@original-brownbear
Member

I removed the 7.0.0 label. The problem described in this issue can no longer occur in 7.x: since 2464b68, the remote cluster connection in that branch no longer uses the mock TCP transport directly. Because the mock TCP transport's open-connection method isn't called, the transport doesn't track those connections, so this failure cannot happen in that version (or in 6.x).
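
The reasoning above relies on the leak check only seeing connections opened through the mock transport itself. A minimal hypothetical sketch of that idea follows; BypassTrackingSketch, RecordingTransport and RealTransport are illustrative names, not Elasticsearch classes. A delegating transport records nodes only in its own openConnection, so a caller that goes straight to the delegate leaves nothing for a leak check to find.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a delegating test transport that records only the
// connections opened through its own openConnection method.
public class BypassTrackingSketch {

    interface Transport {
        Object openConnection(String node);
    }

    static final class RealTransport implements Transport {
        @Override
        public Object openConnection(String node) {
            return new Object(); // stand-in for a real connection
        }
    }

    static final class RecordingTransport implements Transport {
        final Transport delegate;
        final Set<String> trackedNodes = new HashSet<>();

        RecordingTransport(Transport delegate) {
            this.delegate = delegate;
        }

        @Override
        public Object openConnection(String node) {
            trackedNodes.add(node); // tracking happens only on this path
            return delegate.openConnection(node);
        }
    }

    public static void main(String[] args) {
        RecordingTransport mock = new RecordingTransport(new RealTransport());
        mock.openConnection("seed_node_0");          // recorded
        mock.delegate.openConnection("seed_node_1"); // bypasses the recorder
        System.out.println("tracked: " + mock.trackedNodes); // prints "tracked: [seed_node_0]"
    }
}
```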

original-brownbear removed their assignment Oct 25, 2018
kcm pushed a commit that referenced this issue Oct 30, 2018
@original-brownbear
Member

This will be fixed by #35038.
