As discussed in #13962 (comment), we need a mechanism to clear obsolete CDC generation data because we send the entire contents of CDC_GENERATIONS_V3 as a part of the group 0 snapshot. We need to prevent these snapshots from endlessly growing as we introduce new generations over time.
The first step of implementing such a mechanism is making CDC_GENERATIONS_V3 single-partition and ordered by timeuuid. This change makes it possible to clear old generations efficiently by inserting a single range tombstone. #15163 has already introduced this change.
The second (and last) step is actually clearing old generations. The proposed solution is to make the topology coordinator periodically insert a range tombstone covering all generations that are:
- published,
- not current (the generation's timeuuid does not equal current_cdc_generation_uuid),
- older than now() - 24 h, where now() is the current time point of the coordinator's clock.
The first two requirements exist because we cannot delete unpublished or current generations. The last requirement accounts for clock discrepancies between nodes with a large safety margin: we must ensure that every node's clock is already past the timestamp of any generation we remove.
Tombstones can be inserted directly by the topology coordinator, but its CDC publisher fiber, introduced in #15281, seems like a better candidate for this task and can be adjusted to do it.