remote partition_manifest validation for topic recovery #16915
Conversation
Force-pushed from 6ea2d18 to de76f5f
Force-pushed from 2b779c2 to 338a119
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5ba2-f06b-4254-944c-8f621456b8ca:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5ba2-f06e-4180-b3a8-65034779a9d0:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5ba2-f067-4f2b-88c7-069a925426c0:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5ba2-f071-43b5-b667-bd2aa49f158f:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5bb4-6654-4dfa-8349-e920128e17e2:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5bb4-665b-4689-b62f-152e1645b0c1:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5bb4-6658-4926-a2da-c93177cd2c1b:
new failures in https://buildkite.com/redpanda/redpanda/builds/46492#018e5bb4-6656-4377-8779-0cf4949f088a:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e0a-4a9c-47db-8868-807498b8dbfe:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e0a-4a94-40ed-b0a7-6143a871d571:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e0a-4aa0-43dd-a7bc-96bd7db712e5:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e0a-4a99-4e35-a423-269656f9ef63:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e1c-34e7-4b8f-a78e-a84eed8c4ce4:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e1c-34ef-4c03-a951-f8ebde0cd29b:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e1c-34ec-4da9-a494-3ddde49d513c:
new failures in https://buildkite.com/redpanda/redpanda/builds/46523#018e5e1c-34ea-49ca-b480-2c2e5c118ba8:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61e6-c6cb-4fca-a45b-d3cbeeefa737:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61e6-c6c8-45c0-be1d-eb9cb0d6cf85:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61e6-c6c5-4606-a862-8228315cfe14:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61f9-fa6d-4d1e-8c80-810f4abe6242:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61f9-fa6a-4301-a79c-38f16f6b0550:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61f9-fa64-4d64-941f-b783c746bf60:
new failures in https://buildkite.com/redpanda/redpanda/builds/46568#018e61f9-fa67-48b5-ab62-9c2d18000713:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76e5-1db9-4459-884e-3cda38001308:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76e5-1db3-4876-91dc-3e97ffba9112:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76e5-1dbe-4cb0-bad6-94ded757fb18:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76f7-fcd1-441a-a00f-0362c30c0046:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76f7-fcd7-496f-a234-a2ab12290165:
new failures in https://buildkite.com/redpanda/redpanda/builds/46717#018e76f7-fcd4-4ea2-a1a5-2ff4321e2916:
new failures in https://buildkite.com/redpanda/redpanda/builds/46933#018e821a-f5a5-45c6-8123-08a27295983c:
Force-pushed from 291d47f to 6396208
nobody seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Force-pushed from 6396208 to a6bf506
Force push to fix conflicts and restart CI.
The high level structure seems reasonable. Left a bunch of nits, and some questions about how this is exposed via configs.
Also I think the topic recovery validator and new bits in the anomaly detector could use some unit testing.
Minor comments/questions.
How do you feel about test coverage, and what testing do you think we need to do?
Agree with pretty much all of Andrew's comments as well.
Force-pushed from a6bf506 to fc65155
Force push: rebase, fixing merge conflicts, addressed some of the comments. TBD:
waiting for CI to check regressions
Force-pushed from 64a4d10 to 0fdd16c
Fixed merge conflict, added a unit test for the anomalies detector, structured topic_recovery_validator with an internal class to factor out common parts. TBD: extend TopicRecoveryTest to test more combinations of recovery checks.
max_num_segments limits the total number of segment_meta entries that will be checked in a run. To check a segment_meta, the code verifies the existence of its segment in remote storage; this is done with a HEAD request. The limit is counted from most recent to oldest, so a depth of 1 causes only the newest segment to be checked, and an appropriately low number limits the scrub to the STM manifest section alone.

We could limit by:
- num segments
- offset
- total time

A limit by num segments correlates directly with the speed of the operation while being deterministic. Offset could map better to what the user wants, but the current implementation of reverse iteration in segment_meta_cstore makes it somewhat involved to implement. Total time is a strong guarantee, but the final result depends on the load of the system.

This commit implements a limit by number of segments with the parameter max_num_segments. This mode is easier to reason about, and other PRs can implement the other two modes.
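A minimal standalone sketch of that newest-to-oldest walk, using stand-in types rather than the actual Redpanda classes (segment_exists_in_cloud stands in for the HEAD request issued through cloud_storage::remote):

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct segment_meta {
    std::string remote_path; // hypothetical field: object key of the segment
};

// Stand-in for the HEAD request against object storage.
bool segment_exists_in_cloud(const segment_meta&) { return true; }

// Walk segment_meta entries from newest to oldest and stop after
// max_num_segments existence checks; fail fast on the first missing segment.
bool validate_newest_segments(
  const std::vector<segment_meta>& segments, size_t max_num_segments) {
    size_t checked = 0;
    for (auto it = segments.rbegin();
         it != segments.rend() && checked < max_num_segments;
         ++it, ++checked) {
        if (!segment_exists_in_cloud(*it)) {
            return false;
        }
    }
    return true;
}
```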
Force-pushed from dc06b08 to 4119e98
Last failure was interesting, I had to manually cast the enum before serializing it:

inline void rjson_serialize(
  json::Writer<json::StringBuffer>& w, const cluster::recovery_checks& rc) {
    w.StartObject();
    // TODO investigate the reason. seems like a manual cast to uint16_t of
    // rc.mode enum is needed, otherwise we get an assertion at decoding time,
    // when we try to read an unsigned but the json value is not tagged as
    // unsigned
    write_member(
      w,
      "mode",
      static_cast<std::underlying_type_t<decltype(rc.mode)>>(rc.mode));
    write_member(w, "max_segment_depth", rc.max_segment_depth);
    w.EndObject();
}
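For what it's worth, the same pattern reproduces with plain rapidjson (a standalone illustration of the cast, not the Redpanda write_member helper):

```cpp
#include <rapidjson/stringbuffer.h>
#include <rapidjson/writer.h>

#include <cstdint>
#include <iostream>
#include <type_traits>

enum class validation_mode : uint16_t { no_check = 0, check_existence = 1 };

int main() {
    rapidjson::StringBuffer sb;
    rapidjson::Writer<rapidjson::StringBuffer> w(sb);
    auto mode = validation_mode::check_existence;
    w.StartObject();
    w.Key("mode");
    // Write the underlying integral type so the JSON value is tagged as an
    // unsigned number; a scoped enum has no implicit conversion anyway.
    w.Uint(static_cast<std::underlying_type_t<validation_mode>>(mode));
    w.EndObject();
    std::cout << sb.GetString() << "\n"; // {"mode":1}
}
```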
The #17198 issue is known.
In my last review I hadn't considered adding the option to the create request rather than the topic property, and the more I think about it the more natural it seems. WDYT?
class partition_validator {
public:
    // Each partition gets a separate retry_chain_node and a logger tied to it.
    // From this a common retry_chain_logger is created
Do you think it's worth unit testing this partition validator on its own? Defining it in the .cc file seems to discourage testing
Good point. @andijcr we want unit testing to be the first line of defense for testing new code.
However, if we have some testing that we feel will be exercising this, I'm OK if we do this as a follow-up rather than continuing to iterate on this large PR. If we don't think any of this code is getting exercised, then we need to do that.
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.[test_prevent_topic_recovery|test_many_partitions] are written to test this code. I'll get a unit test going like the one for anomalies_detector. The dimensions are [validation mode, cardinality 4], [partition damage, cardinality ~6], [download results, cardinality ~3].
auto validation_map = co_await maybe_validate_recovery_topic(
  assignable_config, bucket, _cloud_storage_api.local(), _as.local());
if (std::ranges::any_of(
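The diff hunk is truncated at the std::ranges::any_of call; a hedged guess at how the condition might conclude, with the result names taken from the commit messages further down (the predicate and the error path are assumptions, not the actual code):

```cpp
if (std::ranges::any_of(validation_map, [](const auto& entry) {
        // passed and no_manifest are acceptable for recovery; failure and
        // download_issue should block topic creation (see commit notes below).
        return entry.second != validation_result::passed
               && entry.second != validation_result::no_manifest;
    })) {
    // hypothetical error path: refuse to create the recovery topic
    co_return errc::topic_operation_error;
}
```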
Should we consider the timeout here too?
You mean the timeout for the create_topics operations?
Yes, we should (https://github.com/redpanda-data/core-internal/issues/1221).
In the ducktape local environment, and from some testing also on a cluster with S3, the current defaults seem to be OK.
Setting the cluster default validation to no_check would work as an escape hatch in case this operation becomes the bottleneck.
Maybe I can return an optional<map<partition_id, validation_res>> to remove the runtime cost in the no_check case.
partition_manifest_exists chains a check for a serde manifest and a JSON manifest
Describes how to perform validation for a partition. One of: manifest existence only; manifest file integrity and metadata check on the contained objects; no_check.
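Those three options suggest an enum roughly along these lines (a sketch; only no_check and check_manifest_own_metadata_only appear verbatim elsewhere in this thread, the existence-only name is a placeholder):

```cpp
#include <cstdint>

// Sketch of the per-partition validation mode described above.
enum class recovery_validation_mode : uint16_t {
    no_check = 0,                     // skip validation entirely
    check_manifest_existence,         // HEAD request for the partition manifest only
    check_manifest_own_metadata_only, // decode the manifest and verify its segment_meta
};
```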
Force-pushed from 4119e98 to 5cf31d6
Cluster-level default values:
- cloud_storage_recovery_topic_validation_mode is the recovery_validation_mode
- cloud_storage_recovery_topic_validation_depth is the validation depth, meaningful when validation_mode is model::recovery_validation_mode::check_manifest_own_metadata_only
- cloud_storage_recovery_topic_force_ovveride_cfg is the escape hatch to use cluster-level defaults instead of the topic properties
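A sketch of how the effective settings could be resolved from those cluster properties (accessor names are assumed to mirror the property names, as with other entries in config::shard_local_cfg()):

```cpp
// Sketch: read the cluster-level defaults; the depth is only meaningful for
// the metadata-checking mode.
auto mode = config::shard_local_cfg()
              .cloud_storage_recovery_topic_validation_mode.value();
auto depth = config::shard_local_cfg()
               .cloud_storage_recovery_topic_validation_depth.value();
if (mode != model::recovery_validation_mode::check_manifest_own_metadata_only) {
    depth = 0; // depth is ignored for existence-only and no_check modes
}
```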
Force-pushed from 5cf31d6 to 36f0e99
As discussed, removed recovery_checks from topic_properties and updated the rest of the code. Now checks are performed based on cluster properties.
@@ -988,6 +988,40 @@ ss::future<download_result> remote::segment_exists(
      existence_check_type::segment);
}

ss::future<remote::partition_manifest_existence>
remote::partition_manifest_exists(
why do we need a dedicated method to check manifests?
It's to perform an existence check (HEAD instead of GET).
Since the partition_manifest could be in serde or JSON format, I chained the checks here.
I guess it would be difficult to find a manifest.json in the wild, but I don't think we have a policy that prevents reading very old data.
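A standalone sketch of that chaining; head_object() stands in for the HEAD request issued through cloud_storage::remote, and the path construction is simplified (the real paths include a hash prefix and revision):

```cpp
#include <string>

enum class manifest_existence { missing, exists_serde, exists_json };

// Stand-in for the HEAD request against object storage.
bool head_object(const std::string& key);

// Prefer the serde manifest and fall back to the legacy JSON manifest.
manifest_existence partition_manifest_exists_sketch(const std::string& prefix) {
    if (head_object(prefix + "/manifest.bin")) {
        return manifest_existence::exists_serde;
    }
    if (head_object(prefix + "/manifest.json")) {
        return manifest_existence::exists_json;
    }
    return manifest_existence::missing;
}
```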
, rev_id_{rev_id}
, op_rtc_{retry_chain_node{
    as,
    300s,
Setting timeout in the root rtc is very often an error. It only works if the object is used immediately after creation.
You can create a temporary rtc based on this one right before invoking a method of _remote and pass the timeout to its c-tor.
So something like:
opt_rtc_{as} {...} // have to check syntax
and in do_validate_manifest_existence() and do_validate_manifest_metadata():
auto rtc = retry_chain_node(&opt_rtc_, 300s, config::shard_local_cfg().cloud_storage_initial_backoff_ms.value())?
Would the op_logger_ still work like this, or do I need to move it after rtc?
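Expanding that question into a sketch of the reviewer's suggestion (the retry_chain_node constructor argument order and the retry_chain_logger wiring would need checking, as the comment above notes; names other than those already in the thread are assumptions):

```cpp
// Root node owned by the validator: carries only the abort_source, no timeout.
retry_chain_node op_rtc_{as};

// Inside do_validate_manifest_existence() / do_validate_manifest_metadata():
// a short-lived child node carries the per-operation timeout and backoff, and
// is the one passed down to the cloud_storage::remote call.
retry_chain_node op_local_rtc{
  &op_rtc_,
  300s,
  config::shard_local_cfg().cloud_storage_initial_backoff_ms.value()};
// a logger tied to the short-lived node (wiring assumed)
retry_chain_logger op_logger{validator_log, op_local_rtc};
```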
New function that will perform validation for all the partitions of a topic, in parallel. Returns a map(partition_id, validation_result) that can be used to decide whether to block topic recovery in case of a fatal error.
The function will read topic_property::recovery_checks and the cluster defaults to perform either a manifest existence check or metadata checks, via the anomalies_detector class. Parallelism is limited, and the checks are performed with a long timeout to be resilient against backoff from cloud storage.
For each partition the possible result can be:
- passed <- validation successful
- no_manifest <- no manifest in cloud storage, allowed
- failure <- some inconsistencies that need intervention
- download_issue <- likely misconfigured cloud storage or a service issue
passed/no_manifest can be accepted for recovery, while failure/download_issue should raise an error. Error logs will point out partitions that did not pass the check.
If any partition fails validation, stop creation. A missing manifest is not considered a failure for the purpose of recovery.
For consistency between versions, be explicit about the default behavior of checks during recovery.
Simple test to ensure that a higher number of topics/partitions is not detrimental to the system
and an optional topic to restore, and an optional dict of topic properties to add during recovery.
test that missing segments will stop recovery early
When the check mode is no_check, validation is skipped. This commit reduces the runtime cost by not populating the map<partition_id, validation_result>. On the caller side, this is already interpreted as "validation ok".
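The shape described there might look roughly like this (return and parameter types are stand-ins; only the optional<map<...>> idea comes from the discussion above):

```cpp
// Sketch: std::nullopt means "no_check, nothing was validated", which the
// caller already treats as "validation ok", so the map is never allocated.
std::optional<std::map<model::partition_id, validation_result>>
maybe_validate_recovery_topic_sketch(recovery_validation_mode mode /*, ... */) {
    if (mode == recovery_validation_mode::no_check) {
        return std::nullopt; // skip all per-partition work
    }
    std::map<model::partition_id, validation_result> results;
    // ... run per-partition validation in parallel and populate results ...
    return results;
}
```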
Force-pushed from 36f0e99 to 0ce9a2a
Updated on comments; the major change is matching partition_probe to determine which segment_meta anomaly to consider fatal.
Introduced a new function to perform metadata validation for a remote topic, to be used during the process of topic recovery.
For each partition, it checks whether the manifest exists in the cloud and optionally checks that the file can be decoded and is self-consistent up to the N most recent segment_meta.
For each partition, the S3 cost is 1 HeadObject request for the partition_manifest, OR 1 GetObject request plus 1 HeadObject request per segment_meta (N in total).
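As a rough worked example of that cost (numbers chosen purely for illustration): for a topic with 100 partitions and a validation depth of N = 10, existence-only checking issues about 100 HeadObject requests in total, while the metadata check issues about 100 GetObject requests plus 100 × 10 = 1,000 HeadObject requests.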
In general, we could limit the depth of the check by:
- num segments: correlates directly with the speed of the operation while being deterministic.
- offset: could map better to what the user wants, but it's more challenging to implement with the current reverse-iteration implementation of segment_meta_cstore.
- total time: a strong guarantee, but the final result depends on the load of the system.
This PR implements a max_num_segments limit, as it's easier to reason about. A follow-up can implement the other two modes if there is a request for this.
The check is performed in parallel for each partition, with a cap on concurrency.
The result can be passed, no_manifest, failure, or download_issue (see above).
The validation mode is driven by a new (nullopt) topic property and a cluster-level default + force flag.
A new topic_property allows users to define validation manually by editing the topic_manifest.json file; otherwise, cluster-level defaults will be used.
The last one is an escape hatch to override recovery_checks with the cluster defaults.
Fixes https://github.com/redpanda-data/core-internal/issues/1138
Fixes https://github.com/redpanda-data/core-internal/issues/1139
Backports Required
Release Notes
Improvements