rptest: log kubectl errors #16181
    @property
    def logger(self) -> Logger:
        return self._redpanda.logger

    def _cmd(self, cmd):
Right now I only do this for _cmd, but I'll extend it to other command invocations in another change. This should be enough to get the debug output you need.
     def _cmd(self, cmd):
         # Log and run
         ssh_prefix = self._ssh_prefix()
         remote_cmd = ssh_prefix + cmd
         self._redpanda.logger.info(remote_cmd)
-        return subprocess.check_output(remote_cmd)
+        try:
+            return subprocess.check_output(remote_cmd, stderr=subprocess.PIPE)
The issue was two-fold: we don't capture stderr, and when an exception occurs we don't log anything. The exception text only includes the command and the exit code, not the output.
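For readers following along, here is a minimal standalone sketch of the pattern described above: capture stderr, and on failure log the return code together with both streams before re-raising. It is not the code from this PR. The helper name `run_and_log` and the logger setup are assumptions made for illustration, and the log layout only loosely mirrors the output quoted later in this thread.

```python
import logging
import subprocess
import sys


def run_and_log(cmd, logger):
    """Run cmd (a list of arguments); on failure log rc, stdout and stderr.

    Hypothetical helper for illustration, not the method from the PR.
    """
    logger.info(cmd)
    try:
        # stderr=PIPE attaches the child's stderr to the exception on failure;
        # check_output already captures stdout.
        return subprocess.check_output(cmd, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        logger.info(
            f"Command failed (rc={e.returncode}).\n"
            f"--------- stdout -----------\n{e.stdout.decode(errors='replace')}"
            f"--------- stderr -----------\n{e.stderr.decode(errors='replace')}")
        raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("kubectl-sketch")
    # Stand-in for a failing remote command: writes to stderr and exits 1.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('unknown command\\n'); sys.exit(1)"]
    try:
        run_and_log(failing, log)
    except subprocess.CalledProcessError:
        pass  # the output was already logged above
```

Re-raising keeps the caller's error handling unchanged; the only difference is that the logs now carry the output and not just the return code.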
@@ -283,6 +284,20 @@ def test_cloud(self):
        self.logger.info(f'deleting topic {topic_name}')
        rpk.delete_topic(topic_name)

    def test_kubectl_tool(self):
Little test for kubectl, may move it somewhere else later.
new failures in https://buildkite.com/redpanda/redpanda/builds/43970#018d2316-216a-483c-b523-a159984aa7a4:
new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2884-5678-4d6e-831f-6863ce41d6ac:
new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2884-567e-4f67-b7d1-fa26871ea8b7:
new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f48d-4bb2-95de-b6d8b1b8bfe7:
new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f490-453c-b9cb-3a8a710d3b52:
new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f494-455e-bf6f-1015ea7667c8:
new failures in https://buildkite.com/redpanda/redpanda/builds/44122#018d3410-5dd6-4984-97ff-92da15724c40:
new failures in https://buildkite.com/redpanda/redpanda/builds/44153#018d3714-be1c-4f2b-b045-bac7cd5b245a:
new failures in https://buildkite.com/redpanda/redpanda/builds/44153#018d3724-4d52-4845-be86-2253b1b53698:
Test spoiler block. Some details are hidden: these are details
/ci-repeat
Capture kubectl stderr and ensure stdout and stderr are logged when a command fails, since otherwise you'll just have the return code to go on.
Force-pushed: 5f08ce4 to aaff2e5
Force-push aaff2e5 fixes an issue in the added self-test when running in non-cloud.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44122#018d33ff-66b0-45e8-8e3e-8957e13e1a34
LGTM
just a nit: looks like output is only on stderr. could this be interleaving of concurrent log output?
i ran:
ducktape \
--debug \
--globals=/home/ubuntu/redpanda/tests/globals.json \
--cluster=ducktape.cluster.json.JsonCluster \
--cluster-file=/home/ubuntu/redpanda/tests/cluster.json \
--test-runner-timeout=3600000 \
tests/rptest/tests/services_self_test.py::KubectlSelfTest
and it looks like the only output is from the stderr of the tsh command:
[INFO - 2024-01-23 04:01:32,549 - kubectl - _cmd - lineno:145]: ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--identity=/tmp/machine-id/identity', 'redpanda@cmkt5nckqcttq57qgir0-agent', 'kubectl', 'foobar']
[INFO - 2024-01-23 04:01:35,098 - kubectl - _cmd - lineno:149]: Command failed (rc=1).
--------- stdout -----------
--------- stderr -----------
error: unknown command "foobar" for "kubectl"
ERROR: Process exited with status 1
whereas running the tsh command and discarding stdout:
tsh ssh --proxy=proxy.tp.redpanda.com --auth github --tty redpanda@cmkt5nckqcttq57qgir0-agent kubectl foobar > /dev/null
outputs
ERROR: Process exited with status 1
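The same check can be scripted without a shell redirect. This is a rough sketch that uses a stand-in child process, not tsh or kubectl, so the stream contents are made up; it only shows how to see which stream each line arrives on and what is left once stdout is discarded.

```python
import subprocess
import sys

# Stand-in child (an assumption for illustration): one line on stdout,
# one line on stderr, exit status 1.
CHILD = [
    sys.executable, "-c",
    "import sys; print('to stdout'); "
    "print('to stderr', file=sys.stderr); sys.exit(1)",
]


def split_streams(cmd):
    """Return (rc, stdout, stderr) with the two streams captured separately."""
    p = subprocess.run(cmd, stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE, text=True)
    return p.returncode, p.stdout, p.stderr


def stderr_only(cmd):
    """Equivalent of `cmd > /dev/null`: drop stdout, keep stderr."""
    p = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.PIPE, text=True)
    return p.stderr


if __name__ == "__main__":
    rc, out, err = split_streams(CHILD)
    print(f"rc={rc} stdout={out!r} stderr={err!r}")
    print(f"with stdout discarded: {stderr_only(CHILD)!r}")
```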
Failure is missing flink: https://redpandadata.slack.com/archives/C02LZGSS66M/p1706006707948709
@andrewhsu wrote:
Hmm, looking into this. I thought all the output ended up on stderr here, but your test shows that it's split. I'll look into it.
@andrewhsu the difference is because
The last line: If I run
Without
Let me know if I need to force-merge given the flink failure.
/ci-repeat