Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: assert when appending empty batch #14745

Merged

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Nov 3, 2023

Empty batches have been seen to be not handled well in some cases (e.g. in v23.1 and below, read replicas could produce empty batches). This introduces an assert ensuring we no longer produce them.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@andrwng
Copy link
Contributor Author

andrwng commented Nov 6, 2023

/cdt

@andrwng
Copy link
Contributor Author

andrwng commented Nov 8, 2023

/cdt
rp_version=build
dt-log-level=debug

@andrwng
Copy link
Contributor Author

andrwng commented Nov 11, 2023

/ci-repeat 5

@andrwng andrwng marked this pull request as ready for review November 11, 2023 08:57
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Nov 11, 2023

new failures in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdc5-6e31-45e1-b7db-783b087a47d1:

"rptest.tests.offset_for_leader_epoch_archival_test.OffsetForLeaderEpochArchivalTest.test_querying_archive"

new failures in https://buildkite.com/redpanda/redpanda/builds/46954#018e833f-7228-4e5f-9fdc-1c82a6ebe265:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c39-433d-9587-d3c808011d16:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.compaction_e2e_test.CompactionWithRecoveryTest.test_tx_compaction_with_recovery"

new failures in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c3c-4155-9c6a-402d0e0ddbb0:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_fuzz_admin_ops"

new failures in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c33-495b-95ac-cedb85c2d351:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_rack_awareness"

new failures in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c36-4acf-8de8-8ae748679aea:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes"
"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_cancelling_partition_move_x_core.replication_factor=3.unclean_abort=False.recovery=restart_recovery.compacted=True"

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Nov 11, 2023

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdc5-6e31-45e1-b7db-783b087a47d1

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdc5-6e35-454b-8ba8-722c6f715291

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdc5-6e30-4af5-a5bd-dac7361af2dc

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdd3-1575-487c-b130-ea12e377c7d2

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/40917#018bbdd3-157a-49c5-95de-54e8a3c88fc1

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46954#018e833f-7228-4e5f-9fdc-1c82a6ebe265

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c3c-4155-9c6a-402d0e0ddbb0

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c39-433d-9587-d3c808011d16

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46954#018e8351-6c36-4acf-8de8-8ae748679aea

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47099#018e8c68-0097-41c1-ad02-7ef185333795

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47422#018eac14-198f-4e4f-ad3c-653d1cf0cdfb

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47432#018eacf1-0882-4635-a59d-23a406342bcf

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47461#018eaf4b-db78-4b71-b34d-eea4b7cb975b

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47461#018eaf6e-e1a4-4624-8e66-6e9c5e532dea

@dotnwat
Copy link
Member

dotnwat commented Mar 28, 2024

is this still useful? seems like maybe?

@andrwng andrwng force-pushed the storage-empty-batch-append-assert branch from cffa497 to 1424e44 Compare March 28, 2024 03:16
@andrwng
Copy link
Contributor Author

andrwng commented Mar 28, 2024

is this still useful? seems like maybe?

Whoops! I don't recall if there was anything that fell out of the CDT runs. Rebased, and will rerun.

@andrwng andrwng force-pushed the storage-empty-batch-append-assert branch from 1424e44 to 8811027 Compare March 29, 2024 21:13
dotnwat
dotnwat previously approved these changes Apr 1, 2024
Empty batches have been seen to be not handled well in some cases (e.g.
in v23.1 and below, read replicas could produce empty batches). This
introduces an assert ensuring we no longer produce them.

Note that specifically, an empty batch refers to the offsets, where the
inclusive bounds of the batch are [N, N-1].

Here's an example of one bug caused by empty batches:
redpanda-data@036f04d
@andrwng andrwng force-pushed the storage-empty-batch-append-assert branch from 77a94bc to 6753247 Compare April 5, 2024 05:14
@andrwng andrwng requested review from dotnwat and bharathv April 5, 2024 22:35
@dotnwat
Copy link
Member

dotnwat commented Apr 6, 2024

Presumably none of the write-caching stuff would invalidate this assertion? cc @bharathv @nvartolomei

Copy link
Contributor

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably none of the write-caching stuff would invalidate this assertion?

Nope.

@andrwng andrwng merged commit d4b5a07 into redpanda-data:dev Apr 6, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants