CORE-2243 archival: Start housekeeping jobs only on a leader #17839
Conversation
new failures in https://buildkite.com/redpanda/redpanda/builds/47747#018ed409-d50d-4ed3-840d-040c8a962826:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed742-3310-4b68-bf4e-7edc815aa70b:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed732-2b87-4e6b-9492-f342733acfb3:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed742-3313-4907-ae38-5f5bb6ad94f6:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed742-330b-4d3d-98c9-5b4576cc4b8b:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed732-2b84-4005-b48b-0cefcca05581:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed742-330d-4356-9e00-906352050678:
new failures in https://buildkite.com/redpanda/redpanda/builds/47767#018ed732-2b80-45bb-b8ae-d2bf7f397ebe:
new failures in https://buildkite.com/redpanda/redpanda/builds/47768#018ed874-08c7-4bc9-929d-8cba3f4c0129:
new failures in https://buildkite.com/redpanda/redpanda/builds/47820#018ee3a4-3efb-4aff-8c34-402654f0ae72:
Can you add the annotations for the corresponding Jira issues, as well as the GitHub issues that are duplicates of this? I think I saw a few more fly by in CI failure triage recently.
Force-pushed from 44cacb1 to 4090e5e
@@ -369,6 +355,18 @@ ss::future<> ntp_archiver::upload_until_abort() {
    }
    vlog(_rtclog.debug, "upload loop synced in term {}", _start_term);
Do we need to check the may_begin_uploads() condition here once again? We just yielded, and the sync might have advanced the term, right?
Nothing bad should happen, but it would make future debugging easier if we log "upload loop exited because term changed" instead of logging "upload loop synced in term <<current_term-1>>", enabling the merger and scrubber, and exiting after starting upload_until_term_change.
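To illustrate the point, here is a minimal, hypothetical sketch (plain C++, not Redpanda's actual code) of re-checking the term condition after a suspension point such as sync():

```cpp
// Hypothetical sketch: after an await point such as sync(), the term may
// have advanced, so the loop should re-check its precondition before
// proceeding. Names (may_begin_uploads, start_term) mirror the discussion
// above, but the types are illustrative.
#include <cassert>

struct archiver_state {
    int current_term = 1; // term reported by the consensus layer
    int start_term = 1;   // term captured when the upload loop started

    // stand-in for ntp_archiver::may_begin_uploads()
    bool may_begin_uploads() const { return current_term == start_term; }
};

// One decision point of the upload loop, taken right after sync() resumes.
bool proceed_after_sync(const archiver_state& st) {
    if (!st.may_begin_uploads()) {
        // would log: "upload loop exited because term changed"
        return false;
    }
    // would log: "upload loop synced in term {start_term}"
    return true;
}
```

The sketch shows why the re-check matters: the condition that held before the suspension point is not guaranteed to hold after it.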
@@ -315,11 +300,12 @@ void ntp_archiver::notify_leadership(std::optional<model::node_id> leader_id) {
    if (is_leader) {
        _leader_cond.signal();
    }
    if (_local_segment_merger) {
While reviewing this part, another question popped up: why don't we set _start_term here? I believe _start_term has a bit of an overloaded meaning, but judging from the way it is used, it would be better to set it synchronously here instead of in places like upload_until_abort, where it is uncertain at which point it will execute.
The idea is to set it only after we have synchronized with the STM. Before that, _start_term doesn't have any meaning.
We do set it before the sync if you read upload_until_abort. Are we doing it wrong there?
My suggestion comes from the fact that we signal the leader_cond here, so our intention is to start the machinery in the term we were notified in. For all later terms we will get separate notifications anyway.
I believe this leads to safer code that is easier to reason about, compared to how it is now: we get the leadership and signal _leader_cond, while in parallel there is another fiber in an unknown state recording who knows which _start_term. I think it is possible for this notify_leadership method to be called multiple times before the upload loop even starts, so it would run with is_leader true and the wrong _start_term.
It's unlikely in practice for a seastar task to be blocked that long, but the code is hard to reason about nevertheless.
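The race described here can be modeled with a small, self-contained sketch (illustrative names only, not Redpanda's real types): the notification path only signals, while the term is recorded later by the upload fiber, so two back-to-back notifications can leave the fiber observing only the latest term:

```cpp
// Hypothetical model of the race: notify_leadership() signals a flag,
// while the term is captured asynchronously by the upload fiber. If
// leadership is notified twice before the fiber wakes, the first term
// is never observed by the fiber.
#include <cassert>

struct archiver_model {
    int parent_term = 0;           // term reported by the partition
    int start_term = -1;           // recorded by the upload fiber, not here
    bool leader_signalled = false;

    void notify_leadership(bool is_leader, int term) {
        parent_term = term;
        if (is_leader) {
            leader_signalled = true; // signal only; start_term untouched
        }
    }

    // The upload fiber eventually runs and records whatever term it sees.
    void upload_fiber_wakes() {
        start_term = parent_term;
        leader_signalled = false;
    }
};
```

In this model, notifying for term 5 and then term 6 before the fiber wakes means the fiber only ever records term 6, which is the "who knows which _start_term" situation described above.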
Are you suggesting to enable/disable the housekeeping jobs only here and not in 'notify_leadership'?
This is not what I said, but I like it more the way it is now 👍
Force-pushed from 3f3ff42 to 51097a1
Requesting changes at least for the confusing scrubber test change.
@@ -367,7 +353,22 @@ ss::future<> ntp_archiver::upload_until_abort() {
    if (!is_synced.has_value()) {
        continue;
    }

    vlog(_rtclog.debug, "upload loop synced in term {}", _start_term);
nit: Change this line to vlog(_rtclog.debug, "upload loop synced, current term: {}", _parent.term());
@@ -621,6 +621,9 @@ def test_scrubber(self, cloud_storage_type):
    self._produce()
    self._assert_no_anomalies()

    self.redpanda.set_cluster_config(
If scrubbing is disabled, what is the purpose of the above self._assert_no_anomalies()? Should it be moved after the cluster config is changed?
It checks that we don't start with any anomalies, I suppose.
But you just disabled scrubbing at start, so why do we still have the assert in place? It makes no sense to me.
I reverted this change.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47805#018ee2a3-c9df-4732-9838-23327f3d34c6
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47820#018ee3a4-3f04-4a76-856b-cb93f40adf57
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47820#018ee3c3-5375-4b7a-a4b6-efe49d2a0706
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47863#018ee7a6-140b-425a-9ae6-54350d0fa7b5
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47863#018ee7b5-2b5b-4ffe-938d-eb3fe673e8c0
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47863#018ee7b5-2b56-4e4e-9fa1-38fade5a3e93
Force-pushed from 51097a1 to c9ce764
Currently, the housekeeping jobs are enabled in the 'start' method. First, the 'sync' method of the archival metadata STM is called. We don't check the result of the 'sync' call, which is a mistake. Normally, we call the 'start' method of the 'ntp_archiver' when the partition is already a leader, so this code works as expected. But if the partition is moved to another shard, we could potentially create and start the archiver when the partition is not a leader yet. In this case the 'sync' call returns 'nullopt'. The housekeeping jobs are not enabled because _parent.is_leader() returns 'false' in this case.

Then, when the partition becomes a leader, the 'notify_leadership' method is invoked. This method enables the housekeeping jobs. The problem is that the 'sync' method of the archival STM may not have been called yet, so the adjacent_segment_merger starts reuploading segments based on a stale manifest.

The fix delays enablement of the housekeeping jobs until the background loop is started and 'sync' is successfully called. The 'notify_leadership' method can enable or disable the jobs after that.
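The fix described above can be sketched with a standalone model (illustrative names, not the real ntp_archiver API): housekeeping jobs are enabled only after a successful sync in the background loop, and notify_leadership merely toggles jobs that already exist:

```cpp
// Minimal sketch of the fix (illustrative, not Redpanda's actual code):
// jobs are created/enabled by the background loop only after sync()
// returns a value; notify_leadership() may only toggle existing jobs.
#include <cassert>
#include <optional>

struct archiver {
    bool jobs_created = false;
    bool jobs_enabled = false;
    bool is_leader = false;

    // stand-in for archival_metadata_stm::sync(): nullopt when not a leader
    std::optional<int> sync() {
        return is_leader ? std::optional<int>{42} : std::nullopt;
    }

    // One iteration of the background upload loop.
    void upload_loop_iteration() {
        auto synced = sync();
        if (!synced.has_value()) {
            return;               // not a leader yet: keep jobs disabled
        }
        if (!jobs_created) {
            jobs_created = true;  // create merger/scrubber lazily
        }
        jobs_enabled = true;      // safe: the manifest is up to date now
    }

    void notify_leadership(bool leader) {
        is_leader = leader;
        if (jobs_created) {
            jobs_enabled = leader; // may only toggle already-created jobs
        }
    }
};
```

In this model a leadership notification arriving before the loop has synced no longer enables the jobs, which is the stale-manifest scenario the fix closes.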
Log all headers in situations where we fail to parse them
Force-pushed from c9ce764 to 6bd70e0
Increase the timeout and scrubbing frequency.
Force-pushed from 6bd70e0 to 7747c48
/backport v23.3.x
Extend error logging for the HeadObject request
Fixes #17750
Refs #15528
Backports Required
Release Notes