rptest: add ability to restart pods in RedpandaServiceCloud #16319

Merged
merged 7 commits into from
Feb 7, 2024

@andrewhsu andrewhsu commented Jan 27, 2024

Fixes https://github.com/redpanda-data/core-internal/issues/1001

Add RedpandaServiceCloud.restart_pod() and RedpandaServiceCloud.rolling_restart_pods() to aid in tests that need to simulate a graceful restart of these k8s resources.

Implemented using the kubectl delete pod command to allow k8s to gracefully manage the termination process.
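As a rough illustration of the delete-to-restart approach described above, here is a minimal Python sketch. It is not the code from this PR: `build_delete_pod_args`, `run_kubectl`, and the injectable `kubectl` parameter are hypothetical names chosen for the example, and it assumes a `kubectl` binary on the PATH that is already configured for the target cluster. Only the argument list (`delete`, `pod`, the pod name, `-n=redpanda`) mirrors the snippet quoted in the review.

```python
import subprocess
from typing import Callable, List, Sequence


def build_delete_pod_args(pod_name: str, namespace: str = "redpanda") -> List[str]:
    """Build the kubectl arguments that delete a single pod (hypothetical helper)."""
    if not pod_name:
        raise ValueError("pod_name must be non-empty")
    return ["delete", "pod", pod_name, f"-n={namespace}"]


def run_kubectl(args: Sequence[str]) -> str:
    """Run kubectl with the given arguments and return its stdout.

    Assumes kubectl is installed and pointed at the right cluster.
    """
    completed = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return completed.stdout


def restart_pod(
    pod_name: str, kubectl: Callable[[Sequence[str]], str] = run_kubectl
) -> str:
    """Delete the pod and rely on its controller to recreate it."""
    return kubectl(build_delete_pod_args(pod_name))
```

Passing the runner in as a parameter keeps the sketch testable without a cluster: a test can substitute a fake callable and assert on the arguments it receives. Deleting a pod this way asks Kubernetes for a graceful termination rather than an immediate kill, which is why it suits tests that simulate a graceful restart.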

Also added a simple self-test of these two methods, redpanda_cloud_tests/rolling_restart_test.py, to the cloud test suite.

Verified with a tier-3-aws-v2-arm cluster:

ducktape \
  --debug \
  --globals=/home/ubuntu/redpanda/tests/globals.json \
  --cluster=ducktape.cluster.json.JsonCluster \
  --cluster-file=/home/ubuntu/redpanda/tests/cluster.json \
  tests/rptest/redpanda_cloud_tests/rolling_restart_test.py::RollingRestartTest

output:

test_id:    rptest.redpanda_cloud_tests.rolling_restart_test.RollingRestartTest.test_restart_pod
status:     PASS
run time:   1 minute 49.533 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_id:    rptest.redpanda_cloud_tests.rolling_restart_test.RollingRestartTest.test_rolling_restart
status:     PASS
run time:   4 minutes 40.765 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
========================================================================================================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.8.18
session_id:       2024-01-30--018
run time:         6 minutes 30.331 seconds
tests run:        2
passed:           2
flaky:            0
failed:           0
ignored:          0
opassed:          0
ofailed:          0
========================================================================================================================================================================================

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

delete_cmd = ['delete', 'pod', pod_name, '-n=redpanda']
self.logger.info(
    f'deleting pod {pod_name} so the cluster can recreate it')
# kubectl delete pod rp-clo88krkqkrfamptsst0-0 -n=redpanda

from docs, this sends a SIGTERM

@andrewhsu andrewhsu marked this pull request as ready for review January 27, 2024 00:45
@andrewhsu

example test run in description, ready for review

@@ -733,7 +733,7 @@ def test_ht003_kgofailure(self):
timeout_sec=self.msg_timeout)

# Run a rolling restart.
self.stage_rolling_restart()

test_ht003_kgofailure() is the only method that uses stage_rolling_restart(), so I've replaced stage_rolling_restart() with the service-level rolling_restart_pods().
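For readers unfamiliar with the pattern, a service-level rolling restart can be sketched as "restart one pod, wait for it to become ready, then move on to the next". The Python below is only an assumption about the general shape, not the implementation merged here: `wait_until`, the `delete_pod` and `is_ready` callables, and the default timeout are all hypothetical.

```python
import time
from typing import Callable, Iterable, List


def wait_until(
    condition: Callable[[], bool], timeout_sec: float, backoff_sec: float = 0.1
) -> None:
    """Poll condition until it returns True; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_sec
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(backoff_sec)


def rolling_restart_pods(
    pod_names: Iterable[str],
    delete_pod: Callable[[str], None],
    is_ready: Callable[[str], bool],
    timeout_sec: float = 300.0,
) -> List[str]:
    """Restart pods one at a time, waiting for each to be ready again.

    delete_pod and is_ready are injected (hypothetical) callables, so the
    loop itself does not depend on kubectl or a live cluster.
    """
    restarted: List[str] = []
    for name in pod_names:
        delete_pod(name)
        # Do not touch the next pod until this one reports ready.
        wait_until(lambda: is_ready(name), timeout_sec)
        restarted.append(name)
    return restarted
```

Restarting strictly one pod at a time keeps the rest of the cluster serving traffic during the restart; the timeout turns a pod that never comes back into a test failure instead of a hang.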

@andrewhsu

putting this into draft to resolve merge conflicts

@andrewhsu andrewhsu marked this pull request as draft January 30, 2024 18:10
@andrewhsu andrewhsu force-pushed the rolling-restart-test branch 2 times, most recently from d7c3e47 to 5224800 Compare January 30, 2024 19:34
@andrewhsu andrewhsu marked this pull request as ready for review January 30, 2024 19:38
@andrewhsu

addressed merge conflicts from PR #16232. re-ran tests, stuff still works. ready for review

@vbotbuildovich

new failures in https://buildkite.com/redpanda/redpanda/builds/44492#018d5c25-3521-4364-831e-6dfa8ee0484d:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True.SpilloverManifestUploaded==True"

@travisdowns travisdowns left a comment

LGTM

@travisdowns travisdowns merged commit 6dadde6 into redpanda-data:dev Feb 7, 2024
17 checks passed
@andrewhsu andrewhsu deleted the rolling-restart-test branch February 8, 2024 00:15