Introduced partition shutdown watchdog timer #16067

Merged: mmaslankaprv merged 3 commits into redpanda-data:dev from the shutdown-tracking branch on Jan 12, 2024

mmaslankaprv (Member) commented Jan 11, 2024

Introduced a watchdog that tracks partition shutdown state. It is
intended to make partition shutdown issues easier to debug. The watchdog
tracks the state of partitions that were requested to be stopped or
removed. If a partition's state is not updated for longer than a
configurable threshold (30 seconds by default), the watchdog emits an
error log entry informing the user about the problem.
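
To make the mechanism above concrete, here is a minimal sketch in standard C++ rather than Seastar. It is not the code from this PR: the class `shutdown_watchdog`, its methods, and the stage strings are hypothetical names chosen for illustration. The only details taken from the description are that state updates are tracked per partition and that a partition is reported once it has gone longer than a configurable threshold without an update.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Hypothetical tracker; none of these names come from the PR.
class shutdown_watchdog {
public:
    explicit shutdown_watchdog(std::chrono::seconds threshold)
      : _threshold(threshold) {
        assert(threshold.count() > 0);
    }

    // Record that a partition entered a new shutdown stage at time `now`.
    void update(
      const std::string& ntp, std::string stage, clock_type::time_point now) {
        _states[ntp] = state{std::move(stage), now};
    }

    // Forget a partition once its shutdown has completed.
    void finished(const std::string& ntp) { _states.erase(ntp); }

    // Return one message per partition whose state has not changed for
    // longer than the threshold. A real watchdog would log these as errors.
    std::vector<std::string> check(clock_type::time_point now) const {
        std::vector<std::string> stalled;
        for (const auto& [ntp, st] : _states) {
            if (now - st.last_update > _threshold) {
                stalled.push_back(ntp + " stuck in stage '" + st.stage + "'");
            }
        }
        return stalled;
    }

private:
    struct state {
        std::string stage;
        clock_type::time_point last_update;
    };

    std::chrono::seconds _threshold;
    std::map<std::string, state> _states;
};
```

Passing the current time into `update()` and `check()` explicitly keeps the sketch deterministic; in a real service the check would presumably run from a timer and read the clock itself.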

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

Signed-off-by: Michal Maslanka <michal@redpanda.com>
ztlpn previously approved these changes Jan 11, 2024
src/v/cluster/partition_manager.cc (outdated, resolved)
src/v/config/configuration.cc (outdated, resolved)
@@ -101,6 +107,11 @@ partition_manager::get_topic_partition_table(
    return rs;
}

ss::future<> partition_manager::start() {
    maybe_arm_shutdown_watchdog();

nit: I think this is a good use case for arm_periodic()
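
For readers unfamiliar with the suggestion: Seastar timers offer `arm_periodic()` in addition to one-shot `arm()`, so a callback does not have to re-arm the timer itself. The sketch below illustrates the difference with a toy, manually driven timer. `manual_timer` is a hypothetical stand-in written for this example, not `seastar::timer`, and it is an assumption that the reviewed code re-armed a one-shot timer from its callback.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <utility>

using namespace std::chrono_literals;
using duration = std::chrono::milliseconds;

// Toy timer driven by a fake clock. NOT seastar::timer; it only mimics
// the arm / arm_periodic distinction.
class manual_timer {
public:
    void set_callback(std::function<void()> cb) { _cb = std::move(cb); }

    // Fire once, `delta` from now.
    void arm(duration delta) {
        _deadline = _now + delta;
        _period.reset();
    }

    // Fire every `delta`, starting `delta` from now.
    void arm_periodic(duration delta) {
        _deadline = _now + delta;
        _period = delta;
    }

    void cancel() {
        _deadline.reset();
        _period.reset();
    }

    // Advance the fake clock and run every expiration that falls due.
    void advance(duration by) {
        const duration target = _now + by;
        while (_deadline && *_deadline <= target) {
            _now = *_deadline;
            if (_period) {
                _deadline = _now + *_period; // periodic: schedule next tick
            } else {
                _deadline.reset(); // one-shot: disarm before the callback
            }
            if (_cb) {
                _cb(); // the callback may re-arm the timer
            }
        }
        _now = target;
    }

private:
    duration _now{0};
    std::optional<duration> _deadline;
    std::optional<duration> _period;
    std::function<void()> _cb;
};
```

With the one-shot variant the callback must call `arm()` again on every tick; with `arm_periodic()` the callback contains only the check itself, which is the simplification the review comment points at.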

src/v/cluster/partition_manager.cc (outdated, resolved)
Introduced a configuration property that will be used by the
`cluster::partition_manager` shutdown watchdog mechanism.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Introduced a watchdog that tracks partition shutdown state. It is
intended to make partition shutdown issues easier to debug. The watchdog
tracks the state of partitions that were requested to be stopped or
removed. If a partition's state is not updated for longer than a
configurable threshold (30 seconds by default), the watchdog emits an
error log entry informing the user about the problem.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
mmaslankaprv merged commit 7eb30b2 into redpanda-data:dev on Jan 12, 2024
19 checks passed
mmaslankaprv deleted the shutdown-tracking branch on January 12, 2024 14:05
vbotbuildovich (Collaborator) commented:

/backport v23.3.x

vbotbuildovich (Collaborator) commented:

/backport v23.2.x

vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16067-v23.3.x-915 remotes/upstream/v23.3.x
git cherry-pick -x 166dda6018d5d70fc393fafac40afc59fca6ec36 f66bf5b1802a9bbaeb58529018c74079c516da9d 3adfdfc8fd5ac1dd52a09f6e4617e0208dd46a07

Workflow run logs.

vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16067-v23.2.x-475 remotes/upstream/v23.2.x
git cherry-pick -x 166dda6018d5d70fc393fafac40afc59fca6ec36 f66bf5b1802a9bbaeb58529018c74079c516da9d 3adfdfc8fd5ac1dd52a09f6e4617e0208dd46a07

Workflow run logs.
