-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v23.2.x] Backport of #12544 #12596 #12644 #12586 #12737 #12368 #13480 #13784
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Switches the tree to operate on ss::shared_ptr<log>. This is a purely mechnical preparation for the next commit which will remove the pimpl wrapper around. Making this change first means that the two modifications do not need to be in the same commit. The oddities that arise (e.g. optional<shared_ptr>) will be cleaned-up in a subsequent commit. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 5c9e845) Conflicts: src/v/raft/recovery_stm.cc src/v/storage/log_manager.cc src/v/storage/tests/storage_e2e_test.cc
The get_impl wrapper is removed in the next commit as it is now unnecessary. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 98da038)
Now that holders of an ss::shared_ptr alreayd have a polymorphic pointer to the underlying implementation there is no need for an extra accessor. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 4674479) Conflicts: src/v/storage/log_manager.cc
Now a nullptr can serve the same purpose. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit a6df0a1)
Not strictly necessary, but a nice safety net since the only users of the log should be holding a shared pointer anyway. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit f722387)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 89f3ba3)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 9b8a051) Conflicts: src/v/storage/log_manager.cc
Exposes the segment set through the log interface, and removes the dynamic_cast to disk_log_impl. The segment is forward declared as its header includes some bits that wreck compilation due to naming conflicts in other parts of the tree. This will be addressed later. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 17f5354)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 5a18e56)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit a750cc9)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit a85ece4)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit f5fbe1c)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 8023f90)
Exposes the probe. The probe is forward declared because the probe.h header brings a long some bits that wreck compilation in a few places with naming clases, which will be fixed up later. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit d57a518)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 1dcbc8b)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 98c4d4d)
8f790f3
to
3d24ead
Compare
This can be useful in tests that want to truncate up to the max collectible offset. (cherry picked from commit 3f31502)
Some tests may want to control housekeeping manually. This makes this easier by optionally ommitting the start() of the housekeeping loop. (cherry picked from commit a4f4bc2)
Current tests use acks=1 because there's usually just one node. I have an upcoming test that will use multiple nodes, and I'd like to wait on acks. (cherry picked from commit 827c511)
When instantiating multiple fixtures, new fixtures overwrite node properties like the Kafka port. This makes it annoying to use Kafka transports against multiple fixtures. This patch makes this easier by caching the port and using it in the make_kafka_client() call. read_replica_test is updated to accommodate this change. (cherry picked from commit 4d71121)
I have an upcoming test that will use multiple replicas. (cherry picked from commit 3c03b82)
Similar to cluster_test_fixture, but brings in tiered storage. (cherry picked from commit 6443a8b)
Currently we have cooperative_spin_wait_with_timeout that throws if a condition isn't met after a given timeout. The way this typically shows up in tests is the exception is caught by the test fiber, the test's destructor is called, and the generic timeout exception is logged after everything is cleaned up. This makes it difficult to track down issues because it's rarely clear what timed out. This patch adds a new boost_require_eventually macro to BOOST_FAIL on timeout, indicating the log line that failed. Tests that use this will timeout like: ``` INFO 2023-08-03 12:53:16,023 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1 /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(272): fatal error: in "test_delete_from_stm_consume": Timed out at /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc:272 INFO 2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners INFO 2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections ``` Instead of logs like below that don't indicate what timed out: ``` INFO 2023-08-03 12:56:48,354 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1 INFO 2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners INFO 2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections ... INFO 2023-08-03 12:56:49,003 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 0 connections unknown location(0): fatal error: in "test_delete_from_stm_consume": seastar::timed_out_error: timedout /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(270): last checkpoint /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(256): Leaving test case "test_delete_from_stm_consume"; testing time: 4194410us ``` I don't love that boost permeates the interface, but until we move to another test framework, it seems reasonable to tie the implementation and the macro name to boost. (cherry picked from commit 372c159)
This will make it easier to figure out what is timing out, when it happens. (cherry picked from commit de08acd)
Seems to have been partially cleaned up in redpanda-data#9121. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit cfd6325)
The additional point where data is updated seems to have been overlooked in redpanda-data@26fb548#diff-708b13fb33ad235d5c420e38131e7188daeee60ff6a4e1054ed480a36142ccb2 Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 3981467)
When reclaimable space is calculated the size above local retention is cached so that it is available to the health report subsystem. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit dd8fbc8)
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit fb8657b)
Tests that the reclaimable local size from cloud storage topic is reflected back through health report. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit ac6a307)
This happens in several places so factoring it out into a reusable method. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit 9f6ada4) Conflicts: src/v/storage/log_manager.cc
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit fa9fb6d)
(cherry picked from commit 9728f4d)
The reason is twofold: 1. We have another option for controlling the concurrency of the movement batch - partition_autobalancing_concurrent_moves which is easier to understand 2. Part of initial motivation is gone - in the past we waited for the whole batch to finish before scheduling the next movements so it was important for the batch to finish in a reasonable amount of time, but now we can schedule new movement while the old ones are still in progress. (cherry picked from commit e9721e2)
With the introduction of space management data eviction becomes lazy: partitions can grow beyond their eviction limit if there is sufficient space to do it and only if disk space pressure becomes high, this "extra" data is evicted. For partition balancing this means two things: 1) partitions sizes can vary widely between replicas and 2) when predicting how much space the partition will take on a node if it is moved there, we must take into account the possibility of reclaim (otherwise we won't allow moves that are perfectly possible if reclaim is applied). So this commit we 1) store current sizes in a per-replica map instead of a single value and 2) use non-reclaimable size when predicting the partition size on the target replica. (cherry picked from commit 3480ba8)
(cherry picked from commit 794d280)
(cherry picked from commit dde8911)
k8s issues are #13669 |
mmaslankaprv
approved these changes
Sep 29, 2023
dotnwat
approved these changes
Oct 3, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of PR #12544
Backport of PR #12596
Backport of PR #12644
Backport of PR #12586
Backport of PR #12737
Backport of PR #12368
Backport of PR #13480