CI Failure (HTTP max retries querying cloud storage status) in EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures #10364

Closed
andrwng opened this issue Apr 26, 2023 · 3 comments · Fixed by #10491
Assignees
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages

Comments

@andrwng
Contributor

andrwng commented Apr 26, 2023

https://buildkite.com/redpanda/redpanda/builds/27780#0187a64c-f0d3-407b-976f-b6b12bc2dee2

Module: rptest.tests.e2e_shadow_indexing_test
Class:  EndToEndShadowIndexingTestWithDisruptions
Method: test_write_with_node_failures
Arguments:
{
  "cloud_storage_type": 2
}
test_id:    rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS
status:     FAIL
run time:   43.911 seconds


    ConnectionError(MaxRetryError("HTTPConnectionPool(host='docker-rp-21', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3ecb961210>: Failed to establish a new connection: [Errno 111] Connection refused'))"))
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f3ecb961210>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='docker-rp-21', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3ecb961210>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 95, in wrapped
    self.redpanda.stop_and_scrub_object_storage()
  File "/root/tests/rptest/services/redpanda.py", line 2951, in stop_and_scrub_object_storage
    wait_until(all_partitions_uploaded_manifest,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/redpanda.py", line 2925, in all_partitions_uploaded_manifest
    status = self._admin.get_partition_cloud_storage_status(
  File "/root/tests/rptest/services/admin.py", line 907, in get_partition_cloud_storage_status
    return self._request("GET",
  File "/root/tests/rptest/services/admin.py", line 334, in _request
    r = self._session.request(verb, url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='docker-rp-21', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3ecb961210>: Failed to establish a new connection: [Errno 111] Connection refused'))
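The bottom frames show why a single refused connection ends the whole test. `wait_until` calls the condition `all_partitions_uploaded_manifest`, the admin request inside it raises, and `wait_until` re-raises that exception instead of polling again. Below is a minimal sketch of one mitigation. It assumes the condition is a zero-argument callable and wraps it so that connection errors count as "not satisfied yet". The helper name `tolerate_connection_errors` is hypothetical and is not part of rptest or ducktape.

`requests.exceptions.ConnectionError` is not a subclass of Python's builtin `ConnectionError`. When the condition uses `requests`, pass that exception type explicitly in `exc_types`.

```python
import functools


def tolerate_connection_errors(condition, exc_types=(ConnectionError,)):
    """Wrap a zero-argument wait_until-style condition (hypothetical helper).

    Exceptions listed in exc_types are treated as "condition not met yet",
    so the surrounding wait loop keeps polling instead of aborting.
    """
    @functools.wraps(condition)
    def wrapped():
        try:
            return condition()
        except exc_types:
            # e.g. the node's admin API is not listening while it restarts
            return False
    return wrapped


def query_restarting_node():
    # Stand-in for an admin API call against a node that is down.
    raise ConnectionRefusedError(111, "Connection refused")


safe_condition = tolerate_connection_errors(query_restarting_node)
print(safe_condition())  # False
```

Keep `exc_types` narrow (connection errors only) rather than catching `Exception`. Swallowing every error would hide real failures until the timeout fires.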
@andrwng andrwng added kind/bug Something isn't working ci-failure area/cloud-storage Shadow indexing subsystem sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low. labels Apr 26, 2023
@andrwng
Contributor Author

andrwng commented Apr 26, 2023

sev/medium: the test includes some process restarts, so retries and some instability at the HTTP layer may be expected. Still, it's a little surprising to see the HTTP endpoint bogged down for this long, so it's worth investigating.
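If refused connections during restarts are expected, the wait loop itself can absorb them. Below is a minimal sketch that assumes a deadline-based poll similar in spirit to ducktape's `wait_until`. The function `wait_until_tolerant` and its parameters are hypothetical. Connection errors are retried until the deadline, and only then is the last one surfaced as the cause of a `TimeoutError`. The simulated endpoint refuses its first three connections to stand in for a restart window.

```python
import time


def wait_until_tolerant(condition, timeout_sec, backoff_sec=0.1,
                        exc_types=(ConnectionError,)):
    """Poll condition() until it returns truthy or timeout_sec elapses.

    Hypothetical helper: exceptions in exc_types are retried rather than
    propagated; the last one is chained onto the final TimeoutError.
    """
    deadline = time.monotonic() + timeout_sec
    last_exc = None
    while time.monotonic() < deadline:
        try:
            if condition():
                return
            last_exc = None
        except exc_types as e:
            last_exc = e  # node may be restarting; try again after backoff
        time.sleep(backoff_sec)
    raise TimeoutError(f"condition not met within {timeout_sec}s") from last_exc


attempts = {"n": 0}


def status_endpoint_up():
    # Simulated admin endpoint: refuses the first three connections.
    attempts["n"] += 1
    if attempts["n"] <= 3:
        raise ConnectionRefusedError(111, "Connection refused")
    return True


wait_until_tolerant(status_endpoint_up, timeout_sec=5, backoff_sec=0.01)
print(attempts["n"])  # 4
```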

@michael-redpanda
Contributor

@jcsp
Contributor

jcsp commented May 2, 2023

This appears to be a recurrence of the problem that #10196 was meant to fix.

jcsp added a commit to jcsp/redpanda that referenced this issue May 2, 2023
The run() method was calling Admin.ready without handling
connection errors.  This would be an unhandled exception
that caused us to fall out of run() prematurely without
waiting for nodes to be ready.

Fixes redpanda-data#10364
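The commit message describes the failure mode: an exception from `Admin.ready` escaped `run()`, so `run()` could return before the nodes were ready. Below is a minimal sketch of the intended behaviour. It assumes a per-node readiness probe passed in as a callable. `wait_for_nodes_ready` and `is_ready` are hypothetical names, not the actual rptest code. A connection error marks the node as not ready yet instead of ending the wait.

```python
import time


def wait_for_nodes_ready(nodes, is_ready, timeout_sec=30.0, backoff_sec=0.5,
                         exc_types=(ConnectionError,)):
    """Block until is_ready(node) is truthy for every node (hypothetical helper).

    A connection error means the node's admin API is not accepting
    connections yet, so the node stays pending instead of aborting the wait.
    """
    pending = set(nodes)
    deadline = time.monotonic() + timeout_sec
    while True:
        for node in sorted(pending):  # iterate over a copy so we can discard
            try:
                if is_ready(node):
                    pending.discard(node)
            except exc_types:
                pass  # not listening yet; poll again next round
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nodes not ready: {sorted(pending)}")
        time.sleep(backoff_sec)


refusals_left = {"rp-1": 0, "rp-2": 2}


def fake_ready(node):
    # Simulated readiness probe: rp-2 refuses its first two connections.
    if refusals_left[node] > 0:
        refusals_left[node] -= 1
        raise ConnectionRefusedError(111, "Connection refused")
    return True


wait_for_nodes_ready(["rp-1", "rp-2"], fake_ready, backoff_sec=0.01)
print(refusals_left)  # {'rp-1': 0, 'rp-2': 0}
```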
@jcsp jcsp self-assigned this May 2, 2023
@jcsp jcsp added sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages and removed sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low. labels May 2, 2023
travisdowns pushed a commit to travisdowns/redpanda that referenced this issue May 5, 2023
3 participants