rptest: log kubectl errors #16181

Merged

travisdowns
Member

Capture kubectl stderr and ensure stdout and stderr are logged when a command fails, since otherwise you'll just have the return code to go on.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

    @property
    def logger(self) -> Logger:
        return self._redpanda.logger

    def _cmd(self, cmd):
Member Author

Right now I only do this for _cmd, but I'll extend it to other command invocations in another change. This should be enough to get the debug output you need.

     def _cmd(self, cmd):
         # Log and run
         ssh_prefix = self._ssh_prefix()
         remote_cmd = ssh_prefix + cmd
         self._redpanda.logger.info(remote_cmd)
-        return subprocess.check_output(remote_cmd)
+        try:
+            return subprocess.check_output(remote_cmd, stderr=subprocess.PIPE)
Member Author

The issue was two-fold: we don't capture stderr, and when an exception occurs we don't log anything, and the exception text only includes the command and the failure code, not the output.
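
For illustration only, here is a minimal, self-contained sketch of the pattern described above. It is not the actual diff: the function name `run_logged` and the logger name are hypothetical, and the banner lines simply mirror the log output quoted later in this conversation.

```python
import logging
import subprocess

# Hypothetical logger name; the real code logs through the redpanda service logger.
logger = logging.getLogger("kubectl_sketch")


def run_logged(cmd: list[str]) -> bytes:
    """Run cmd and return its stdout; on failure, log both streams and re-raise."""
    logger.info(cmd)
    try:
        # stderr=PIPE captures the child's stderr so it is attached to the
        # CalledProcessError instead of being lost.
        return subprocess.check_output(cmd, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        # The default exception text has only the command and return code,
        # so log the captured output explicitly before re-raising.
        logger.info(
            "Command failed (rc=%d).\n"
            "--------- stdout -----------\n%s"
            "--------- stderr -----------\n%s",
            e.returncode,
            e.stdout.decode(errors="replace"),
            e.stderr.decode(errors="replace"),
        )
        raise
```

Re-raising keeps the caller's behavior unchanged; the only difference from a bare `check_output` call is that the captured output reaches the log.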

@@ -283,6 +284,20 @@ def test_cloud(self):
         self.logger.info(f'deleting topic {topic_name}')
         rpk.delete_topic(topic_name)

     def test_kubectl_tool(self):
Member Author

Little test for kubectl, may move it somewhere else later.

@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 19, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/43970#018d2316-216a-483c-b523-a159984aa7a4:

"rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage_with_partition_moves.cleanup_policy=delete"

new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2884-5678-4d6e-831f-6863ce41d6ac:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Delete==True.SpilloverManifestUploaded==True.TS_Spillover_ManifestDeleted==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True.SpilloverManifestUploaded==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2884-567e-4f67-b7d1-fa26871ea8b7:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Read==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Delete==True.SpilloverManifestUploaded==True.TS_Spillover_ManifestDeleted==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f48d-4bb2-95de-b6d8b1b8bfe7:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.TS_Timequery==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f490-453c-b9cb-3a8a710d3b52:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Delete==True.SpilloverManifestUploaded==True.TS_Spillover_ManifestDeleted==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True.SpilloverManifestUploaded==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44046#018d2894-f494-455e-bf6f-1015ea7667c8:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.S3.test_case=.TS_Read==True.TS_Timequery==True.SpilloverManifestUploaded==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44122#018d3410-5dd6-4984-97ff-92da15724c40:

"rptest.tests.simple_e2e_test.SimpleEndToEndTest.test_leader_acks"

new failures in https://buildkite.com/redpanda/redpanda/builds/44153#018d3714-be1c-4f2b-b045-bac7cd5b245a:

"rptest.tests.flink_basic_test.FlinkBasicTests.test_transaction_workload"

new failures in https://buildkite.com/redpanda/redpanda/builds/44153#018d3724-4d52-4845-be86-2253b1b53698:

"rptest.tests.flink_basic_test.FlinkBasicTests.test_transaction_workload"

@travisdowns
Member Author

#16026

@travisdowns
Member Author

travisdowns commented Jan 20, 2024

Test spoiler block

Some details are hidden these are details

@travisdowns
Member Author

/ci-repeat

@travisdowns
Member Author

/ci-repeat

@travisdowns travisdowns requested review from a team and removed request for a team January 22, 2024 20:22
@travisdowns
Member Author

Force-push aaff2e5 fixes an issue in the added self-test when running in a non-cloud environment.

Member

@andrewhsu andrewhsu left a comment

LGTM

just a nit: looks like output is only on stderr. could this be interleaving of concurrent log output?

i ran:

ducktape \
  --debug \
  --globals=/home/ubuntu/redpanda/tests/globals.json \
  --cluster=ducktape.cluster.json.JsonCluster \
  --cluster-file=/home/ubuntu/redpanda/tests/cluster.json \
  --test-runner-timeout=3600000 \
  tests/rptest/tests/services_self_test.py::KubectlSelfTest

and looks like only output from stderr of tsh command:

[INFO  - 2024-01-23 04:01:32,549 - kubectl - _cmd - lineno:145]: ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--identity=/tmp/machine-id/identity', 'redpanda@cmkt5nckqcttq57qgir0-agent', 'kubectl', 'foobar']
[INFO  - 2024-01-23 04:01:35,098 - kubectl - _cmd - lineno:149]: Command failed (rc=1).
--------- stdout -----------
--------- stderr -----------
error: unknown command "foobar" for "kubectl"
ERROR: Process exited with status 1

whereas running the tsh command throwing out stdout:

tsh ssh --proxy=proxy.tp.redpanda.com --auth github --tty redpanda@cmkt5nckqcttq57qgir0-agent kubectl foobar > /dev/null

outputs

ERROR: Process exited with status 1

@travisdowns
Member Author

@andrewhsu wrote:

just a nit: looks like output is only on stderr. could this be interleaving of concurrent log output?

Hmm, looking into this. I thought all the output ended up on stderr here, but your test shows that it's split. I'll look into it.

@travisdowns
Member Author

travisdowns commented Jan 23, 2024

@andrewhsu the difference is because --tty was in your by-hand command line test:

tsh ssh --proxy=proxy.tp.redpanda.com --auth github --tty redpanda@cmkt5nckqcttq57qgir0-agent kubectl foobar > /dev/null

--tty makes all command output appear on stdout, regardless of which stream the underlying command wrote it to, because the remote tty absorbs both streams and returns the combined tty output on stdout to the local end.

The last line: ERROR: Process exited with status 1 comes from tsh itself, not the remote end, so it appears on stderr regardless of --tty.

If I run KubectlTool._cmd with --tty I get equivalent results to your by-hand result:

--------- stdout -----------
error: unknown command "foobar" for "kubectl"
--------- stderr -----------
ERROR: Process exited with status 1

Without --tty the error: line appears on stderr because that's where kubectl is really putting it (you can verify this by running tsh ssh into the agent and repeating the same test in bash).
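
As a rough local analogy (no tsh, no kubectl, and no real tty involved), the sketch below runs a stand-in child process that writes one line to each stream. Capturing with separate pipes keeps the streams distinct, as in the run without --tty; folding stderr into stdout with `stderr=subprocess.STDOUT` loosely mimics what happens when a single tty channel carries both. The names `CHILD`, `run_split` and `run_merged` are hypothetical and exist only for this illustration.

```python
import subprocess
import sys

# Stand-in child (an assumption, not kubectl): one line on stdout, one on stderr.
# stdout is flushed before stderr is written so the merged order is deterministic.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
    "sys.stderr.write('err\\n')",
]


def run_split() -> tuple[str, str]:
    """Separate pipes: each stream arrives on its own channel."""
    p = subprocess.run(
        CHILD, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    return p.stdout, p.stderr


def run_merged() -> tuple[str, str]:
    """One shared channel: stderr is folded into stdout, so the reader
    can no longer tell which stream a line came from."""
    p = subprocess.run(
        CHILD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # p.stderr is None when stderr is redirected into stdout.
    return p.stdout, p.stderr or ""
```

Here `run_split()` returns `("out\n", "err\n")`, while `run_merged()` returns `("out\nerr\n", "")`: the same bytes, but the stream boundary is gone once both share one channel.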

@piyushredpanda
Contributor

Let me know if I need to force-merge given the flink failure.

@piyushredpanda
Contributor

/ci-repeat

@piyushredpanda piyushredpanda merged commit afd8ebf into redpanda-data:dev Jan 23, 2024
14 of 17 checks passed