Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] S3BlobStoreRepositoryTests testRequestStats failing #88841

Open
fcofdez opened this issue Jul 27, 2022 · 2 comments
Open

[CI] S3BlobStoreRepositoryTests testRequestStats failing #88841

fcofdez opened this issue Jul 27, 2022 · 2 comments
Assignees
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@fcofdez
Copy link
Contributor

fcofdez commented Jul 27, 2022

Build scan:
https://gradle-enterprise.elastic.co/s/mkusffe2imt3i/tests/:plugins:repository-s3:internalClusterTest/org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests/testRequestStats

Reproduction line:
./gradlew ':plugins:repository-s3:internalClusterTest' --tests "org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests.testRequestStats" -Dtests.seed=E93397D14D4A1FEE -Dtests.locale=de-DE -Dtests.timezone=Asia/Chongqing -Druntime.java=17 -Dtests.fips.enabled=true

Applicable branches:
7.17

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests&tests.test=testRequestStats

Failure excerpt:

java.lang.AssertionError: SDK sent [{ListObjects=15, PutMultipartObject=36, PutObject=127, GetObject=129}] calls and handler measured [{PutObject=126, ListObjects=15, GetObject=129, PutMultipartObject=36}] calls expected:<{PutObject=126, ListObjects=15, GetObject=129, PutMultipartObject=36}> but was:<{ListObjects=15, PutMultipartObject=36, PutObject=127, GetObject=129}>

  at __randomizedtesting.SeedInfo.seed([E93397D14D4A1FEE:F6B723C8789934D1]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.elasticsearch.repositories.blobstore.ESMockAPIBasedRepositoryIntegTestCase.testRequestStats(ESMockAPIBasedRepositoryIntegTestCase.java:237)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

@fcofdez fcofdez added Team:Distributed Meta label for distributed team :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Jul 27, 2022
@fcofdez fcofdez self-assigned this Jul 27, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

fcofdez added a commit to fcofdez/elasticsearch that referenced this issue Sep 8, 2022
fcofdez added a commit that referenced this issue Sep 8, 2022
thecoop pushed a commit to thecoop/elasticsearch that referenced this issue Sep 9, 2022
@fcofdez
Copy link
Contributor Author

fcofdez commented Oct 20, 2022

After adding more logging in #89912 I've been trying to figure out what happens here and it looks like somehow the connection is closed before the stats collector handler is called, in that case the SDK retries an extra time. See the logs:

  1> [2022-10-16T20:31:04,330][DEBUG][c.a.request              ] [node_t0] Sending Request: PUT http://127.0.0.1:43125 /bucket/test/1/meta-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 97e83af6-38bd-
a874-f90b-8094c050d850, Content-Length: 644, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_Co
rporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,347][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/meta-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 97e83af6-38bd
-a874-f90b-8094c050d850, Content-Length: 644, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_C
orporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, ) <--- This happens without receiving a response
  1> [2022-10-16T20:31:04,374][DEBUG][c.a.request              ] [node_t0] Sending Request: PUT http://127.0.0.1:43125 /bucket/test/1/indices/oqJtOmwkQI6m_QujVtEz2A/meta-lfQ15IMBiJVIEG-QpOyx.dat Headers: (amz-sdk-
invocation-id: e60d4d9a-d071-bba2-940b-3d1134e719c0, Content-Length: 705, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-
2238 java/19 vendor/Oracle_Corporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,376][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,376][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/meta-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 97e83af6-38bd
-a874-f90b-8094c050d850, Content-Length: 644, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_C
orporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,383][DEBUG][c.a.request              ] [node_t0] Sending Request: PUT http://127.0.0.1:43125 /bucket/test/1/snap-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 86848675-4392-
87b6-9ff1-7283aa6a880d, Content-Length: 376, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_Co
rporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,399][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,400][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/snap-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 86848675-4392
-87b6-9ff1-7283aa6a880d, Content-Length: 376, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_C
orporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,403][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,404][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/indices/oqJtOmwkQI6m_QujVtEz2A/meta-lfQ15IMBiJVIEG-QpOyx.dat Headers: (amz-sdk
-invocation-id: e60d4d9a-d071-bba2-940b-3d1134e719c0, Content-Length: 705, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36
-2238 java/19 vendor/Oracle_Corporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,451][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,451][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/snap-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 86848675-4392
-87b6-9ff1-7283aa6a880d, Content-Length: 376, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_C
orporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,491][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,491][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/indices/oqJtOmwkQI6m_QujVtEz2A/meta-lfQ15IMBiJVIEG-QpOyx.dat Headers: (amz-sdk
-invocation-id: e60d4d9a-d071-bba2-940b-3d1134e719c0, Content-Length: 705, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36
-2238 java/19 vendor/Oracle_Corporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )
  1> [2022-10-16T20:31:04,512][DEBUG][c.a.request              ] [node_t0] Received successful response: 200, AWS Request ID: null
  1> [2022-10-16T20:31:04,513][DEBUG][c.a.request              ] [node_t0] x-amzn-RequestId: not available
  1> [2022-10-16T20:31:04,513][DEBUG][c.a.request              ] [node_t0] AWS Request ID: not available
  1> [2022-10-16T20:31:04,575][DEBUG][c.a.request              ] [node_t0] Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Internal Server Error (Service: Amazon S3; Status Code: 500; E
rror Code: 500 Internal Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
  1> [2022-10-16T20:31:04,576][DEBUG][c.a.request              ] [node_t0] Retrying Request: PUT http://127.0.0.1:43125 /bucket/test/1/meta-95tImZOxS8OIUXJ2xOP5bQ.dat Headers: (amz-sdk-invocation-id: 97e83af6-38bd
-a874-f90b-8094c050d850, Content-Length: 644, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.12.270 Linux/3.10.0-1160.76.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/19+36-2238 java/19 vendor/Oracle_C
orporation cfg/retry-mode/legacy, x-amz-acl: private, x-amz-storage-class: STANDARD, )

I'm not entirely sure what's causing this, any exception thrown within the test handlers should log an error and that's not happening here, therefore something is messing around with the connection.

@volodk85 volodk85 added the low-risk An open issue or test failure that is a low risk to future releases label Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

3 participants