write caching - raft - follow up fixes #17215

Merged
merged 12 commits into from
Mar 27, 2024

@bharathv bharathv commented Mar 20, 2024

Summary of changes:

  • QoL fixes like better debug/trace logging around voting/notifications etc.
  • Generic code cleanup in raft without any logical changes.
  • Adds replication_monitor, a new way to track the replication status of locally appended offsets. This data structure unifies tracking of all waiters (write caching or otherwise) in one place. This was needed because majority replication must rely on committed offset updates to detect truncation (see individual commits for details), but it was awkward for the write caching code to listen for committed offset updates. The new data structure tracks everything in one place, streamlining the logic.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@bharathv
Contributor Author

/ci-repeat 5

@bharathv
Contributor Author

/dt

@vbotbuildovich
Collaborator

vbotbuildovich commented Mar 20, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/46522#018e5e04-1404-41b2-a5b6-987a58ab3512:

"rptest.tests.control_character_flag_test.ControlCharacterPermittedAfterUpgrade.test_upgrade_from_pre_v23_2.initial_version=.23.1.1"

new failures in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2ac-44f6-ab65-712c14cdc1b5:

"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_fuzz_admin_ops"

new failures in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-eda8-402c-b84f-e5e1a628e3ed:

"rptest.tests.control_character_flag_test.ControlCharacterPermittedAfterUpgrade.test_upgrade_from_pre_v23_2.initial_version=.23.1.1"

new failures in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-eda7-40c0-9175-9a9f6d448725:

"rptest.tests.control_character_flag_test.ControlCharacterPermittedAfterUpgrade.test_upgrade_from_pre_v23_2.initial_version=.23.1.1"

new failures in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-edb1-4476-a60c-b136c7214250:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5f04-4c58-b7e3-d2c5999ce67e:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_forced_cancellation"
"rptest.tests.simple_e2e_test.SimpleEndToEndTest.test_consumer_interruption"
"rptest.tests.partition_movement_test.PartitionMovementTest.test_dynamic.num_to_upgrade=2"
"rptest.tests.pandaproxy_test.PandaProxyAutoAuthTest.test_restarts.move_controller_leader=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5eff-4431-b459-9cd5e3383395:

"rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=1.restart=False.controller_snapshots=True"
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.schema_registry_test.SchemaRegistryAutoAuthTest.test_restarts.move_controller_leader=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5efb-441b-9693-92742b312ef2:

"rptest.tests.scram_test.ScramBootstrapUserTest.test_invalid_scram_mechanism.mechanism=sCrAm-ShA-512.expect_fail=True"
"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_cancelling_partition_move_x_core.replication_factor=1.unclean_abort=True.recovery=restart_recovery.compacted=False"
"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_cancelling_partition_move_x_core.replication_factor=3.unclean_abort=True.recovery=restart_recovery.compacted=False"
"rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_working_node.delete_topic=True.tick_interval=5000"
"rptest.tests.partition_movement_test.PartitionMovementTest.test_bootstrapping_after_move.num_to_upgrade=0"
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5f08-4109-b463-01e42b9a206e:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.timely_shutdown_test.ShutdownTest.test_timely_shutdown_with_failures"
"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e6552-8dae-4cc2-8012-d1af9a434817:

"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

new failures in https://buildkite.com/redpanda/redpanda/builds/46609#018e6552-8da9-4d7d-b23a-59b3fd8ffb39:

"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes"

new failures in https://buildkite.com/redpanda/redpanda/builds/46628#018e6766-1dc7-40a8-9ecb-da678bdf4c04:

"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

new failures in https://buildkite.com/redpanda/redpanda/builds/46628#018e6777-e975-41b3-8660-8ce82db2be8d:

"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

new failures in https://buildkite.com/redpanda/redpanda/builds/46741#018e7849-24e5-46ee-bb20-cea35735d606:

"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

new failures in https://buildkite.com/redpanda/redpanda/builds/46753#018e78ce-723e-488c-92c1-76dc8a27a9ac:

"rptest.tests.e2e_iam_role_test.STSRoleFetchTests.test_write"

new failures in https://buildkite.com/redpanda/redpanda/builds/46844#018e7da0-6931-4717-9d10-c68e2b1df898:

"rptest.tests.data_transforms_test.DataTransformsTest.test_tracked_offsets_cleaned_up"

@vbotbuildovich
Collaborator

vbotbuildovich commented Mar 21, 2024

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46522#018e5e04-1404-41b2-a5b6-987a58ab3512

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46522#018e5e04-1401-4548-bdfe-ee78396dda63

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46522#018e5e16-5d11-45b4-8be4-4e3b40ca9479

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2a8-4d36-b322-6a79b91e410a

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2ac-4ff4-b94a-75755f52e2b3

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2ab-4e33-b8bd-e47123c83092

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-edb2-4ca7-a0e5-4470fee92d8f

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2a5-4da4-af06-a5dfa6a27bde

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5ef8-f2a4-4807-b2dc-64ce7aaad813

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-eda7-4d32-a944-0cc48f294136

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46535#018e5efc-eda8-4b2d-8e3c-a0f00548134e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5f04-4c58-b7e3-d2c5999ce67e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5eff-4431-b459-9cd5e3383395

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5efb-441b-9693-92742b312ef2

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46609#018e652a-5f08-4109-b463-01e42b9a206e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46609#018e6552-8da7-4caf-a5e6-9f4e65ad2096

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46753#018e78e0-7762-4af0-b230-862816e4d545

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46844#018e7da0-6935-4295-beb8-7d49a4e18022

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46866#018e7f3b-974a-4f18-a2a5-1d88db3e5cab

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46866#018e7f4d-830c-4c57-a0e6-f22ba17c8574

@bharathv
Contributor Author

/ci-repeat 5

@bharathv
Contributor Author

/dt

@bharathv bharathv force-pushed the wc_followups branch 2 times, most recently from ad6dc92 to e897352 on March 22, 2024 15:40
@bharathv
Contributor Author

/dt

@bharathv
Contributor Author

/dt

@bharathv
Contributor Author

/dt

@bharathv bharathv marked this pull request as ready for review March 26, 2024 02:37
Comment on lines 130 to 131
// - There is new entry from a greater term replacing the
// appended entries.
Member

This comment doesn't seem to reflect the code; I think it should say "different term".

Contributor Author

That is what I meant by "greater term", but renamed it to "different term" as suggested.

This is a raft internal utility that keeps track of all pending
replication waiters. The waiter here is a replicate_entries_stm that
successfully appended something to the local log and is waiting for the
append to resolve as either a successful replication or a truncation.
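
To make the idea concrete, here is a minimal sketch under a simplified model; it is not the actual Redpanda class. Waiters are keyed by the offset they appended, and each committed offset update resolves the waiters at or below that offset. A waiter is reported as replicated if the entry now stored at its offset still carries the term it was appended in, and as truncated if an entry from a different term has replaced it. The names `replication_monitor_sketch`, `wait_result`, `wait_for`, `on_committed_offset`, and `term_at` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical outcome of waiting on a locally appended offset.
enum class wait_result { replicated, truncated };

// Illustrative sketch only; the real replication_monitor may differ.
class replication_monitor_sketch {
public:
    using callback = std::function<void(wait_result)>;
    // Returns the term of the entry currently stored at an offset.
    using term_lookup = std::function<std::int64_t(std::int64_t)>;

    // Register a waiter for an entry appended locally at `offset` in `term`.
    void wait_for(std::int64_t offset, std::int64_t term, callback cb) {
        _waiters.emplace(offset, waiter{term, std::move(cb)});
    }

    // Resolve every waiter whose offset is at or below `committed`.
    void on_committed_offset(std::int64_t committed, const term_lookup& term_at) {
        auto end = _waiters.upper_bound(committed);
        for (auto it = _waiters.begin(); it != end; it = _waiters.erase(it)) {
            // Same term: our entry was committed. Different term: it was
            // replaced, so the original append was truncated.
            bool same_term = term_at(it->first) == it->second.term;
            it->second.cb(same_term ? wait_result::replicated
                                    : wait_result::truncated);
        }
    }

    // Number of waiters that have not been resolved yet.
    std::size_t pending() const { return _waiters.size(); }

private:
    struct waiter {
        std::int64_t term;
        callback cb;
    };
    // Ordered by offset so a committed offset update resolves a prefix.
    std::multimap<std::int64_t, waiter> _waiters;
};
```

Keeping the waiters in an ordered container means one committed offset update can resolve a whole prefix of waiters in a single pass, which is one plausible way to track everything in one place.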
mmaslankaprv previously approved these changes Mar 27, 2024
Prior to this change, empty appends silently passed with incorrect
last offsets set. This is not possible via the Kafka path because we have
checks that ensure the records are non-empty. Certain internal
paths that use raft APIs directly can potentially try to replicate empty
data, in which case an appropriate error is now thrown.
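
The guard described in this commit message can be illustrated with a small sketch, assuming a toy in-memory log rather than the raft API. The `toy_log` type, its `append` method, and the choice of `std::invalid_argument` are hypothetical; the source only says that an appropriate error is thrown.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory log, used only to illustrate the empty-append check.
struct toy_log {
    std::vector<std::string> records;

    // Appends a batch and returns the offset of the last appended record.
    // An empty batch has no meaningful last offset, so it is rejected
    // instead of silently returning a wrong value.
    std::int64_t append(const std::vector<std::string>& batch) {
        if (batch.empty()) {
            throw std::invalid_argument("refusing to append an empty batch");
        }
        records.insert(records.end(), batch.begin(), batch.end());
        return static_cast<std::int64_t>(records.size()) - 1;
    }
};
```

Failing loudly keeps callers from acting on a last offset that does not correspond to any record they wrote.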
@piyushredpanda piyushredpanda merged commit 8aabffa into redpanda-data:dev Mar 27, 2024
17 checks passed
@bharathv bharathv self-assigned this Mar 27, 2024
4 participants