Fix oversized allocations in `kafka::fetch_session_cache` #26299

ballard26 · 2025-05-30T21:12:56Z

This PR adds a `memory_usage_lower_bound` function for `chunked_hash_map`, which returns a lower-bound estimate of the memory allocated by the map. It also switches fetch sessions from `absl::flat_hash_map` to `chunked_hash_map` to avoid an oversized allocation.
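Why the container switch helps can be shown with simple arithmetic. `absl::flat_hash_map` keeps all of its slots in one contiguous backing array, so its largest single allocation grows with capacity, while a chunked layout splits slots into fixed-size pieces. The sketch below is a simplified model, not Redpanda's `chunked_hash_map`; the 128 KiB `max_chunk_bytes` ceiling and both helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical per-allocation ceiling, for illustration only.
constexpr std::size_t max_chunk_bytes = 128 * 1024;

// Flat layout: every slot lives in one array, so the largest single
// allocation is the whole table.
constexpr std::size_t largest_alloc_contiguous(std::size_t capacity, std::size_t slot_bytes) {
    return capacity * slot_bytes;
}

// Chunked layout: slots are split into fixed-size chunks, so the largest
// single allocation is one chunk (or the whole table if it fits in one chunk).
// A slot bigger than the ceiling still gets a chunk of its own.
constexpr std::size_t largest_alloc_chunked(std::size_t capacity, std::size_t slot_bytes) {
    std::size_t slots_per_chunk = std::max<std::size_t>(1, max_chunk_bytes / slot_bytes);
    return std::min(capacity, slots_per_chunk) * slot_bytes;
}

// 1 Mi slots of 48 bytes: one 48 MiB block vs. chunks of at most 128 KiB.
static_assert(largest_alloc_contiguous(std::size_t{1} << 20, 48) == 48 * 1024 * 1024);
static_assert(largest_alloc_chunked(std::size_t{1} << 20, 48) <= max_chunk_bytes);
```

Total memory is roughly the same in both layouts; what changes is the size of the biggest individual request the allocator has to satisfy.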

Backports Required

Release Notes

none

StephanDollberg · 2025-05-30T21:32:10Z

src/v/container/chunked_hash_map.h

+size_t
+memory_usage_lower_bound(const chunked_hash_map<K, V, Hash, EqualTo>& m) {
+    return m.bucket_count()
+             * sizeof(typename chunked_hash_map<K, V>::bucket_type)


I guess this undercounts the bucket vector capacity, but it should be good enough, especially with large values.

ballard26 · 2025-05-30

Yeah, as far as I can tell there isn't a publicly accessible way to get the bucket vector capacity, unfortunately. Otherwise the bound could have been made a lot tighter.
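The undercount discussed in this thread can be made concrete with a toy container. This sketch assumes a dense-map layout (a bucket array plus a packed array of key/value pairs) and assumes the estimate also counts stored values, since the real function body is truncated in the review snippet. `toy_dense_map`, `memory_usage_lower_bound_sketch`, and `memory_usage_with_capacity` are hypothetical names, not Redpanda's `chunked_hash_map` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dense hash map: a bucket index array plus a
// packed array of key/value pairs. This is NOT Redpanda's chunked_hash_map.
template<typename K, typename V>
struct toy_dense_map {
    struct bucket {
        std::uint32_t dist_and_fingerprint;
        std::uint32_t value_idx;
    };
    std::vector<bucket> buckets;
    std::vector<std::pair<K, V>> values;

    std::size_t bucket_count() const { return buckets.size(); }
    std::size_t size() const { return values.size(); }
};

// Lower bound: counts only in-use slots, in the spirit of
// bucket_count() * sizeof(bucket_type); spare capacity is invisible here.
template<typename K, typename V>
std::size_t memory_usage_lower_bound_sketch(const toy_dense_map<K, V>& m) {
    using B = typename toy_dense_map<K, V>::bucket;
    return m.bucket_count() * sizeof(B) + m.size() * sizeof(std::pair<K, V>);
}

// Tighter estimate, possible only because std::vector exposes capacity().
template<typename K, typename V>
std::size_t memory_usage_with_capacity(const toy_dense_map<K, V>& m) {
    using B = typename toy_dense_map<K, V>::bucket;
    return m.buckets.capacity() * sizeof(B)
           + m.values.capacity() * sizeof(std::pair<K, V>);
}

// Usage: reserve more than is used, so capacity exceeds size.
inline void demo_lower_bound() {
    toy_dense_map<int, long> m;
    m.buckets.reserve(64);
    m.buckets.resize(16);
    m.values.reserve(32);
    m.values.resize(10);
    assert(memory_usage_lower_bound_sketch(m) <= memory_usage_with_capacity(m));
}
```

Because `size() <= capacity()` for every vector, the size-based figure can only underestimate; without access to the bucket vector's capacity, that is the best bound available.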

vbotbuildovich · 2025-05-31T00:49:35Z

Retry command for Build#66646

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/shard_placement_test.py::ShardPlacementTest.test_node_join@{"disable_license":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false}
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy@{"cloud_storage_type":2,"short_retention":false}
tests/rptest/tests/partition_balancer_test.py::PartitionBalancerTest.test_rack_awareness
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":false,"with_iceberg":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":false,"with_iceberg":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":true,"with_chunked_compaction":false,"with_iceberg":false}
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy@{"cloud_storage_type":2,"short_retention":true}
tests/rptest/tests/partition_balancer_test.py::PartitionBalancerTest.test_unavailable_nodes
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false}
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy@{"cloud_storage_type":1,"short_retention":false}
tests/rptest/tests/partition_balancer_test.py::PartitionBalancerTest.test_fuzz_admin_ops

vbotbuildovich · 2025-05-31T03:50:03Z

CI test results

test results on build#66646

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason
AvailabilityTests	test_recovery_after_catastrophic_failure		ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#0197235f-e26b-413b-bae0-f6bc1ec80e13	FLAKY	18/21	upstream reliability is '100.0'. current run reliability is '85.71428571428571'. drift is 14.28571 and the allowed drift is set to 50. The test should PASS
CloudRetentionTest	test_cloud_retention	{"cloud_storage_type": 2, "max_consume_rate_mb": null}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FLAKY	20/21	upstream reliability is '97.51552795031056'. current run reliability is '95.23809523809523'. drift is 2.27743 and the allowed drift is set to 50. The test should PASS
ShadowIndexingWhileBusyTest	test_create_or_delete_topics_while_busy	{"cloud_storage_type": 2, "short_retention": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FAIL	0/1	The test has failed across all retries
ShadowIndexingWhileBusyTest	test_create_or_delete_topics_while_busy	{"cloud_storage_type": 1, "short_retention": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FAIL	0/21	The test has failed across all retries
ShadowIndexingWhileBusyTest	test_create_or_delete_topics_while_busy	{"cloud_storage_type": 2, "short_retention": true}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4fe2-afd5-ef66b99fc85a	FAIL	0/11	The test has failed across all retries
PartitionBalancerTest	test_fuzz_admin_ops		ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FAIL	0/1	The test has failed across all retries
PartitionBalancerTest	test_rack_awareness		ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FAIL	0/1	The test has failed across all retries
PartitionBalancerTest	test_unavailable_nodes		ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4fe2-afd5-ef66b99fc85a	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": false, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4fe2-afd5-ef66b99fc85a	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": true}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": false, "mixed_versions": true, "with_chunked_compaction": false, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4fe2-afd5-ef66b99fc85a	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": false, "mixed_versions": true, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": true, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821d-46c0-9f11-830bed867b50	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 1, "enable_failures": true, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FAIL	0/1	The test has failed across all retries
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 2, "enable_failures": true, "mixed_versions": true, "with_chunked_compaction": false, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4fe2-afd5-ef66b99fc85a	FLAKY	7/21	upstream reliability is '98.86363636363636'. current run reliability is '33.33333333333333'. drift is 65.5303 and the allowed drift is set to 50. The test should FAIL
RandomNodeOperationsTest	test_node_operations	{"cloud_storage_type": 1, "enable_failures": true, "mixed_versions": true, "with_chunked_compaction": true, "with_iceberg": false}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#01972364-821e-4816-9431-4b97ee55bfe8	FLAKY	11/21	upstream reliability is '100.0'. current run reliability is '52.38095238095239'. drift is 47.61905 and the allowed drift is set to 50. The test should PASS
ShardPlacementTest	test_node_join	{"disable_license": true}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#0197235f-e270-46f6-b45d-af2487bc43b2	FAIL	0/1	The test has failed across all retries
TxAtomicProduceConsumeTest	test_basic_tx_consumer_transform_produce	{"with_failures": true}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66646#0197235f-e270-46f6-b45d-af2487bc43b2	FLAKY	20/21	upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
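For the FLAKY rows, the reason strings are consistent with drift = upstream reliability minus current-run reliability (passes/runs as a percentage), compared against the allowed drift of 50. For example, 7/21 passes is 33.33%, and 98.86 - 33.33 ≈ 65.53 > 50, hence "should FAIL". A minimal sketch of that inferred rule follows; the function names and the strict `>` comparison are assumptions, not the CI bot's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Inferred from the "reason" column only; not the CI bot's implementation.

// Pass rate of the current run as a percentage (e.g. 7/21 -> 33.33...).
double current_reliability(int passed, int runs) {
    return 100.0 * passed / runs;
}

// Positive drift means the test passed less often than upstream.
double drift(double upstream_reliability, int passed, int runs) {
    return upstream_reliability - current_reliability(passed, runs);
}

// The reason strings appear to print drift rounded to five decimal places.
double rounded_drift(double upstream_reliability, int passed, int runs) {
    return std::round(drift(upstream_reliability, passed, runs) * 1e5) / 1e5;
}

// Assumed rule: a flaky test "should FAIL" when drift exceeds the allowed drift.
std::string verdict(double upstream_reliability, int passed, int runs,
                    double allowed_drift = 50.0) {
    return drift(upstream_reliability, passed, runs) > allowed_drift ? "FAIL" : "PASS";
}

// Spot checks against two FLAKY rows of the build#66646 table.
inline void check_examples() {
    assert(verdict(98.86363636363636, 7, 21) == "FAIL");  // drift 65.5303
    assert(verdict(100.0, 11, 21) == "PASS");              // drift 47.61905
}
```

Negative drift (as in build#66659, where the run beat upstream) always yields PASS under this rule.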

test results on build#66659

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason
PartitionReassignmentsTest	test_add_partitions_with_inprogress_reassignments		ducktape	https://buildkite.com/redpanda/redpanda/builds/66659#0197293e-97e4-45b9-aa0e-ec4e0a4b9869	FLAKY	19/21	upstream reliability is '88.39848675914249'. current run reliability is '90.47619047619048'. drift is -2.0777 and the allowed drift is set to 50. The test should PASS

test results on build#66661

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason
CloudStorageTimingStressTest	test_cloud_storage_with_partition_moves	{"cleanup_policy": "compact,delete"}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66661#01972ca0-bb45-4f81-9588-70ed11323374	FLAKY	19/21	upstream reliability is '97.84482758620689'. current run reliability is '90.47619047619048'. drift is 7.36864 and the allowed drift is set to 50. The test should PASS
NodePoolMigrationTest	test_migrating_redpanda_nodes_to_new_pool	{"balancing_mode": "off", "cleanup_policy": "compact,delete", "test_mode": "tiered_storage"}	ducktape	https://buildkite.com/redpanda/redpanda/builds/66661#01972ca0-bb45-4c5c-919a-94c05432cd3f	FLAKY	20/21	upstream reliability is '98.828125'. current run reliability is '95.23809523809523'. drift is 3.59003 and the allowed drift is set to 50. The test should PASS

ballard26 requested review from StephanDollberg and dotnwat May 30, 2025 21:12

github-actions bot added the area/redpanda label May 30, 2025

StephanDollberg previously approved these changes May 30, 2025

ballard26 dismissed StephanDollberg’s stale review via 652633a June 1, 2025 00:27

ballard26 force-pushed the CORE-10162 branch from 35fff0c to 652633a June 1, 2025 00:27

github-actions bot added the area/build label Jun 1, 2025

ballard26 added 2 commits May 31, 2025 20:27

container: add memory_usage_lower_bound function for chunked hash maps

07453db

kafka/server: switch underlying_t to chunked_hash_map in fetch_session_cache

8bb7227

ballard26 force-pushed the CORE-10162 branch 2 times, most recently from fb32224 to 8bb7227 June 1, 2025 00:28

kafka/server: switch underlying_t to chunked_hash_map in fetch_session

7b0ac8c

ballard26 requested a review from StephanDollberg June 1, 2025 17:29

StephanDollberg approved these changes Jun 2, 2025

ballard26 merged commit 4bfbc43 into redpanda-data:dev Jun 2, 2025
17 checks passed

ballard26 deleted the CORE-10162 branch June 2, 2025 16:37
