[CI] Connection failures in repository-s3:integTestMinioRunner credentials tests #32208

Closed
cbuescher opened this issue Jul 19, 2018 · 4 comments
Assignees
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

@cbuescher
Member

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=ubuntu&&virtual/1190/consoleFull

I cannot reproduce this locally, so it might be transient. Filing an issue in case there are more occurrences of this:

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Register a repository with a non existing bucket}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/teardown}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Restore a non existing snapshot}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Delete a non existing snapshot}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Get a non existing snapshot}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*

REPRODUCE WITH: ./gradlew :plugins:repository-s3:integTestMinioRunner \
  -Dtests.seed=180F1CCF65203FAC \
  -Dtests.class=org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=repository_s3/20_repository_permanent_credentials/Register a repository with a non existing client}" \
  -Dtests.security.manager=true \
  -Dtests.locale=lt-LT \
  -Dtests.timezone=Asia/Pontianak \
  -Dtests.rest.blacklist=repository_s3/30_repository_temporary_credentials/*,repository_s3/40_repository_ec2_credentials/*
09:06:01 Suite: org.elasticsearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT
09:06:01   1> [2018-07-19T14:05:56,864][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] [test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials}]: before test
09:06:01   1> [2018-07-19T14:05:56,967][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] initializing REST clients against [http://[::1]:40403]
09:06:01   1> [2018-07-19T14:05:57,518][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] initializing yaml client, minimum es version: [6.4.0] master version: [6.4.0] hosts: [http://[::1]:40403]
09:06:01   1> [2018-07-19T14:05:58,178][INFO ][o.e.r.s.RepositoryS3ClientYamlTestSuiteIT] Stash dump on test failure [{
09:06:01   1>   "stash" : {
09:06:01   1>     "body" : {
09:06:01   1>       "error" : {
09:06:01   1>         "root_cause" : [
09:06:01   1>           {
09:06:01   1>             "type" : "repository_verification_exception",
09:06:01   1>             "reason" : "[repository_permanent] path [integration_test] is not accessible on master node",
09:06:01   1>             "stack_trace" : "RepositoryVerificationException[[repository_permanent] path [integration_test] is not accessible on master node]; nested: IOException[Unable to upload object [integration_test/tests-Wze0pZCRRB6KLdsRZAwZOg/master.dat] using a single upload]; nested: SdkClientException[Unable to execute HTTP request: Connect to 127.0.0.1:60920 [/127.0.0.1] failed: Connection refused (Connection refused)]; nested: HttpHostConnectException[Connect to 127.0.0.1:60920 [/127.0.0.1] failed: Connection refused (Connection refused)]; nested: ConnectException[Connection refused (Connection refused)];
09:06:01   1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:644)
09:06:01   1> 	at org.elasticsearch.repositories.RepositoriesService.lambda$verifyRepository$2(RepositoriesService.java:218)
09:06:01   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624)
09:06:01   1> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
09:06:01   1> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
09:06:01   1> 	at java.base/java.lang.Thread.run(Thread.java:844)
09:06:01   1> Caused by: java.io.IOException: Unable to upload object [integration_test/tests-Wze0pZCRRB6KLdsRZAwZOg/master.dat] using a single upload
09:06:01   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:199)
09:06:01   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$2(S3BlobContainer.java:100)
09:06:01   1> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
09:06:01   1> 	at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:48)
09:06:01   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:98)
09:06:01   1> 	at org.elasticsearch.common.blobstore.BlobContainer.writeBlobAtomic(BlobContainer.java:102)
09:06:01   1> 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:639)
09:06:01   1> 	... 5 more
09:06:01   1> Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:60920 [/127.0.0.1] failed: Connection refused (Connection refused)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
09:06:01   1> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247)
09:06:01   1> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
09:06:01   1> 	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1738)
09:06:01   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$executeSingleUpload$7(S3BlobContainer.java:196)
09:06:01   1> 	at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57)
09:06:01   1> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
09:06:01   1> 	at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56)
09:06:01   1> 	at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:195)
09:06:01   1> 	... 11 more
09:06:01   1> Caused by: org.apache.http.conn.HttpHostConnectException: Connect to 127.0.0.1:60920 [/127.0.0.1] failed: Connection refused (Connection refused)
09:06:01   1> 	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:158)
09:06:01   1> 	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
09:06:01   1> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:06:01   1> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
09:06:01   1> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:06:01   1> 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
09:06:01   1> 	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
09:06:01   1> 	at com.amazonaws.http.conn.$Proxy28.connect(Unknown Source)
09:06:01   1> 	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
09:06:01   1> 	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
09:06:01   1> 	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
09:06:01   1> 	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
09:06:01   1> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
09:06:01   1> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
09:06:01   1> 	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1236)
09:06:01   1> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
09:06:01   1> 	... 25 more
@cbuescher cbuescher added >test-failure Triaged test failures from CI :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure labels Jul 19, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@cbuescher cbuescher added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Jul 19, 2018
@DaveCTurner
Contributor

DaveCTurner commented Aug 6, 2018

I am guessing that Minio failed to start up properly for some reason, but we close its stdout after finding the port to which it is bound so all I see is this:

 You are running an older version of Minio released 2 weeks ago 
 Update: Run `minio update` 


Endpoint:  http://127.0.0.1:60920

I wondered if perhaps it couldn't bind to that port since it's hard-wired:

final String minioAddress = "127.0.0.1:60920"

However, when I tried this, the output from Minio looked like the following:

ERROR Unable to start the server: Port :9000 is already in use.
      > Please ensure no other program uses the same address/port.

This suggests it's something else, although not with certainty. I am not confident that it logs anything useful to the console in this situation, but I think that's the first thing we should try. There's also a chance that it falls over because stdout is closed after it's printed the address to which it's bound but before it's finished printing other things.
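To make the "keep reading stdout" idea concrete, here is a minimal sketch in Java. It is not the code from the build; the class and method names (`MinioOutputDrainer`, `awaitEndpoint`, `awaitRemainingOutput`) are hypothetical, and it assumes only that the process prints a line starting with `Endpoint:` as in the output quoted above. Instead of closing the stream once the endpoint is found, it keeps draining the remaining lines on a background thread so they are available when a test fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper: reads a child process's output until it announces its
// endpoint, then keeps draining the stream instead of closing it.
final class MinioOutputDrainer {
    // Every line seen so far, kept so a failure report can include it.
    private final List<String> captured = new CopyOnWriteArrayList<>();
    private Thread drainer;

    // Blocks until a line starting with "Endpoint:" appears and returns the
    // address that follows it. Assumes the line format shown in this issue.
    String awaitEndpoint(BufferedReader output) throws IOException {
        String line;
        while ((line = output.readLine()) != null) {
            captured.add(line);
            String trimmed = line.trim();
            if (trimmed.startsWith("Endpoint:")) {
                startDraining(output);
                return trimmed.substring("Endpoint:".length()).trim();
            }
        }
        throw new IOException("output ended before an endpoint was printed: " + captured);
    }

    // Keeps consuming output in the background so the child never writes to a
    // closed pipe and later error messages are not lost.
    private void startDraining(BufferedReader output) {
        drainer = new Thread(() -> {
            try {
                String line;
                while ((line = output.readLine()) != null) {
                    captured.add(line);
                }
            } catch (IOException e) {
                captured.add("reading output failed: " + e);
            }
        }, "minio-output-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    // Waits for the output stream to end (for a real process, that means it
    // has exited or closed stdout) and returns everything captured.
    List<String> awaitRemainingOutput() throws InterruptedException {
        if (drainer != null) {
            drainer.join();
        }
        return captured;
    }
}
```

With a real process, the reader would wrap `process.getInputStream()`; the captured lines could then be dumped alongside the test failure.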

@dnhatn
Copy link
Member

dnhatn commented Aug 13, 2018

ywelsch added a commit that referenced this issue Aug 14, 2018
Minio does not support dynamic ports. The workaround here is to scan for a free port first. This is
not foolproof, but as we don't expect too many of these builds to run at once on the same machine,
this should do the trick.

Closes #32701
Closes #32208
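For readers unfamiliar with the workaround described in this commit message, a rough sketch of "scan for a free port first" might look like the following. This is an illustration only, not the committed code: the class name `FreePortScanner`, the method `findFreePort` and the idea of passing an explicit range are assumptions. The port is considered free if a server socket can be bound to it; the socket is closed again before returning, which is why the approach is "not foolproof".

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper: walks a port range and returns the first port that a
// server socket can currently be bound to.
final class FreePortScanner {
    static int findFreePort(InetAddress address, int from, int to) throws IOException {
        for (int port = from; port <= to; port++) {
            // Binding succeeds only if nothing else is listening on the port.
            try (ServerSocket probe = new ServerSocket(port, 1, address)) {
                // The probe socket is closed on the way out, so another
                // process can still grab the port before the caller uses it.
                return port;
            } catch (IOException e) {
                // Port is in use (or could not be bound); try the next one.
            }
        }
        throw new IOException("no free port between " + from + " and " + to);
    }
}
```

The returned port would then be passed to the server on its command line. The gap between closing the probe socket and the server binding is the window in which another process can take the port.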
@danielmitterdorfer
Member

The error "Unable to start the server: Port :60920 is already in use" has happened again in the last few months (after this issue was closed) with the following build ids:

  • 20181015194135-D85743D4 (6.4)
  • 20181010213026-2AE444DC (6.x)
  • 20181010060625-F13DC04D (6.x)
  • 20180927090303-A166204A (6.x)

I wonder whether this might be caused by a timing issue between the socket being closed by the code that determines a free port:

javax.net.ServerSocketFactory.getDefault().createServerSocket(port, 1, InetAddress.getByName(minioAddress)).close()

and Minio attempting to bind to that port soon after. I also wonder whether, instead of opening a server socket, we could attempt to open a TCP client socket to that port. Iff the connection fails, the port is considered to be free.
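A minimal sketch of the client-socket probe suggested here, under stated assumptions: the class name `PortProbe`, the method `isPortFree` and the timeout parameter are hypothetical and not taken from the build. Because the probe never binds the port itself, it leaves no recently closed listening socket behind. It still cannot rule out another process binding the port between the probe and Minio starting.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: treats a port as free if nothing accepts a TCP
// connection on it.
final class PortProbe {
    static boolean isPortFree(InetAddress address, int port, int timeoutMillis) {
        try (Socket client = new Socket()) {
            client.connect(new InetSocketAddress(address, port), timeoutMillis);
            // Something accepted the connection, so the port is in use.
            return false;
        } catch (IOException e) {
            // Connection refused (or timed out): nothing appears to be
            // listening. Note that a timeout is also treated as "free" here.
            return true;
        }
    }
}
```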

@original-brownbear original-brownbear self-assigned this Oct 21, 2018
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 26, 2018
* Binding to `0` gives us free ports that are assigned sequentially by Linux making collisions much less likely compared to manually finding a free port in a range
* Closes elastic#32208
original-brownbear added a commit that referenced this issue Oct 26, 2018
* Binding to `0` gives us free ports that are assigned sequentially by Linux making collisions much less likely compared to manually finding a free port in a range
* Closes #32208
kcm pushed a commit that referenced this issue Oct 30, 2018
* Binding to `0` gives us free ports that are assigned sequentially by Linux making collisions much less likely compared to manually finding a free port in a range
* Closes #32208
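As an illustration of what "binding to `0`" means in the commits above, here is a short sketch. It is not the committed change; the class `EphemeralPort` and its methods are hypothetical names. Passing port `0` to a server socket asks the operating system to choose a free port, which can then be read back with `getLocalPort()`. Since Minio needs a concrete port on its command line, the sketch closes the socket and returns the number, so a small window for collisions remains.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper: lets the operating system pick a free port.
final class EphemeralPort {
    // Port 0 means "any free port"; the OS assigns one at bind time.
    static ServerSocket bindEphemeral(InetAddress address) throws IOException {
        return new ServerSocket(0, 1, address);
    }

    // Returns an OS-chosen port number after releasing the socket, so that
    // the port can be handed to a server that must bind it itself.
    static int pickFreePort(InetAddress address) throws IOException {
        try (ServerSocket socket = bindEphemeral(address)) {
            return socket.getLocalPort();
        }
    }
}
```

A caller might build the server address as `"127.0.0.1:" + EphemeralPort.pickFreePort(InetAddress.getLoopbackAddress())` and pass it to the process being started.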