
demote ERROR message to DEBUG for timequerys at the edge of spillover retention #16302

Conversation

@andijcr (Contributor) commented on Jan 26, 2024

Consider this trace:
A list_offsets request with a timestamp hits the spillover region, but at the same time retention kicks in and the first few spillover manifests become eligible for GC.
In this case the timequery still starts from the first spillover manifest, but if it lands on one of the collectible manifests, the failure is logged as an ERROR and no result is returned.
The Kafka client is fine with this outcome, but the ERROR line trips the log checks in tests.

In theory we could restrict the search space (as was attempted here), but we would be dealing with suspension points at an unstable moment and could hit various edge cases (what if retention reclaims the whole spillover region?).
So this PR recognizes this category of out_of_range errors and logs a DEBUG message instead of an ERROR.

The change is in the last commit; the previous ones are minor things found while studying this trace.
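The gist of the change, as a minimal self-contained sketch (the names, signature, and log text below are hypothetical stand-ins, not the actual remote_partition.cc code): the out_of_range outcome of a spillover timequery is treated as the expected race with retention and logged at DEBUG, while any other failure keeps its ERROR log.

```cpp
#include <cstdio>
#include <optional>

// Hypothetical stand-in for cloud_storage::error_outcome.
enum class error_outcome { success, out_of_range, failure };

struct timequery_result { long long offset; };

// Decide how to report a spillover timequery outcome. out_of_range is the
// expected race with retention, so it is logged at DEBUG and the query simply
// returns no result; any other failure keeps the ERROR log.
std::optional<timequery_result>
report_spillover_query(error_outcome ec, long long timestamp) {
    switch (ec) {
    case error_outcome::success:
        return timequery_result{0}; // placeholder: real code returns the found offset
    case error_outcome::out_of_range:
        std::printf(
          "DEBUG: spillover timequery at ts=%lld raced with retention, "
          "returning no result\n",
          timestamp);
        return std::nullopt;
    default:
        std::printf(
          "ERROR: failed to query spillover manifests, ts=%lld\n", timestamp);
        return std::nullopt;
    }
}

int main() {
    // Example: the out_of_range outcome from the annotated trace below.
    report_spillover_query(error_outcome::out_of_range, 1705821612756LL);
}
```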

Annotated trace below:

// list_offset request with timestamp at the edge of retention
TRACE 2024-01-21 07:20:50,130 [shard 0:main] kafka - requests.cc:96 - [172.16.16.9:52058] processing name:list_offsets, key:2, version:4 for KgoVerifierSeqConsumer-0-139932971794832, mem_units: 8182, ctx_size: 41
TRACE 2024-01-21 07:20:50,130 [shard 0:main] kafka - handler.h:69 - [client_id: {KgoVerifierSeqConsumer-0-139932971794832}] handling list_offsets v4 request {replica_id=-1 isolation_level=0 topics={{name={test-topic} partitions={{partition_index=0 current_leader_epoch=1 timestamp={timestamp: 1705821612756} max_num_offsets=0}}}}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:659 - Linearizable barrier requested. Log state: {start_offset:15197, committed_offset:15637, committed_offset_term:1, dirty_offset:15637, dirty_offset_term:1}, flushed offset: 15637
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:689 - Sending empty append entries request to {id: {3}, revision: {28}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] rpc - transport.cc:352 - Dispatched request with sequence: 15205, correlation_idx: 15205, pending queue_size: 0, target_address: {host: docker-rp-24, port: 33145}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:689 - Sending empty append entries request to {id: {2}, revision: {28}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] rpc - transport.cc:352 - Dispatched request with sequence: 15206, correlation_idx: 15206, pending queue_size: 0, target_address: {host: docker-rp-10, port: 33145}
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:345 - Append entries response: {node_id: {id: {2}, revision: {28}}, target_node_id{id: {1}, revision: {28}}, group: {1}, term:{1}, last_dirty_log_index:{15637}, last_flushed_log_index:{15637}, last_term_base_offset:{-9223372036854775808}, result: success, may_recover:0}
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:443 - Updated node {id: {2}, revision: {28}} last committed log index: 15637
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:569 - Updated node {id: {2}, revision: {28}} match 15637 and next 15638 indices
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:743 - Linearizable offset: 15637

// proceed with the timequery to learn the offset; this will return nullopt
DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cluster - partition.cc:553 - timequery (cloud) {kafka/test-topic/0} t={timestamp: 1705821612756} max_offset(k)=15198


DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1113 - remote partition make_reader invoked (waiting for units), config: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}, num segments 38
TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1125 - remote partition make_reader invoked (units acquired), config: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}, num segments 38
TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:226 - Constructing reader {kafka/test-topic/0}

// partition_record_batch_reader_impl::start will eventually *fail* at init_cursor. the query will be performed by model::timestamp
	DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:231 - abort_source is set
	TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:250 - partition_record_batch_reader_impl:: start: creating cursor: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}
	
	// the cursor is being initialized over the range [_stm_manifest.get_archive_clean_offset(), last offset]; the stm range is [_stm_manifest.get_start_offset(), last offset]
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:602 - creating_cursor: begin: 7543, end: 15594, stm_range[{14142}/15594]

		// seek(query)
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:178 - Manifest is not initialized

		// _view.get_materialized_manifest(q) (it will return a value)
			// the query has to be performed in the spillovers (in_stm returns false: _stm_manifest.get_spillover_map().last_segment()->max_timestamp < ts)
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:873 - Checking timestamp {timestamp: 1705821612756} using timequery
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:873 - Checking timestamp {timestamp: 1705821612756} using timequery
			// search_spillover_manifests(q)
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:1472 - search_spillover_manifest query: {{timestamp: 1705821612756}}, num manifests: 6, first: {timestamp: 1705821614475}, last: {timestamp: 1705821644471}
				// the search starts from the first manifest in the list and returns a manifest spanning [6231, 7542]; that is below the cursor range, and in fact below the _stm_manifest.archive_clean_offset_range()
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:1311 - Found spillover manifest meta: {{is_compacted: false, size_bytes: 42133123, base_offset: 6231, committed_offset: 7542, base_timestamp: {timestamp: 1705821614475}, max_timestamp: {timestamp: 1705821619470}, delta_offset: 161, ntp_revision: 28, archiver_term: 1, segment_term: 1, delta_offset_end: 193, sname_format: {v3}, metadata_size_hint: 3128}}
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - materialized_manifest_cache.cc:220 - Cache GET will return [{kafka/test-topic/0}:6231]
			// the returned manifest is not in range: it spans a range that comes before the cursor
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:268 - STM manifest range: [6231/7542], cursor range: [7543/15594]
		// reject the result because it is not in the range of the cursor
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:226 - Manifest is not in the specified range, range: [7543/15594]

		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:634 - failed to seek to {{timestamp: 1705821612756}}, offset out of valid range
	ERROR 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:551 - Failed to query spillover manifests: cloud_storage::error_outcome:10, query: {{timestamp: 1705821612756}}
	TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:255 - partition_record_batch_reader_impl:: start: created cursor: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}

TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:261 - Destructing reader {kafka/test-topic/0}
DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1167 - timequery: 0 batches
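
To tie the trace together, here is a condensed, self-contained model of the failing path (types and helpers are simplified stand-ins reconstructed from the trace, not the real async_manifest_view implementation): the spillover search still returns the already-reclaimable manifest [6231, 7542], which lies outside the cursor range [7543, 15594], so the seek reports out_of_range.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

// Simplified stand-ins for the spillover manifest metadata and cursor range.
struct manifest_meta {
    int64_t base_offset;
    int64_t committed_offset;
    int64_t base_timestamp;
    int64_t max_timestamp;
};

struct cursor_range {
    int64_t begin; // archive clean offset at cursor-creation time
    int64_t end;
};

// The timestamp belongs to the STM manifest only if it is newer than the last
// spillover segment; otherwise the spillover region has to be searched.
bool in_stm(int64_t ts, const std::vector<manifest_meta>& spillover) {
    return spillover.empty() || spillover.back().max_timestamp < ts;
}

// The search starts from the first spillover manifest, even if retention has
// already made the leading manifests eligible for GC.
std::optional<manifest_meta>
search_spillover(int64_t ts, const std::vector<manifest_meta>& spillover) {
    for (const auto& m : spillover) {
        if (ts <= m.max_timestamp) {
            return m;
        }
    }
    return std::nullopt;
}

// The returned manifest is accepted only if it overlaps the cursor range; the
// manifest from the trace spans [6231, 7542] while the cursor covers
// [7543, 15594], so this check fails and the seek reports out_of_range.
bool in_cursor_range(const manifest_meta& m, const cursor_range& c) {
    return m.committed_offset >= c.begin && m.base_offset <= c.end;
}

int main() {
    const int64_t ts = 1705821612756;
    const std::vector<manifest_meta> spillover = {
      {6231, 7542, 1705821614475, 1705821619470},
      // ... five more manifests, up to max_timestamp 1705821644471
    };
    const cursor_range cursor{7543, 15594};

    if (!in_stm(ts, spillover)) {
        auto m = search_spillover(ts, spillover);
        std::printf(
          "accepted=%d\n", m.has_value() && in_cursor_range(*m, cursor)); // prints 0
    }
}
```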

Fixes #15489
Fixes #16026

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

The previous if block is a bouncer condition of the form if (is_stm(q)), so (!in_stm(q)) is assured afterwards.

(cherry picked from commit ecc24d2)
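
A tiny illustration of that bouncer pattern (hypothetical names, not the PR diff): the first if returns early when the query can be served from the STM manifest, so the code after it may rely on the negated condition without re-checking it.

```cpp
// Hypothetical illustration only, not the PR's actual code.
struct query {
    bool stm;
};

bool is_stm(const query& q) { return q.stm; }

const char* dispatch(const query& q) {
    if (is_stm(q)) {
        return "serve from the STM manifest"; // bouncer: STM queries stop here
    }
    // Invariant at this point: !is_stm(q); only the spillover path remains.
    return "serve from the spillover manifests";
}
```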
@andijcr changed the title from "Issue/15489/timequery racing with retention" to "demote ERROR message to DEBUG for timequerys at the edge of spillover retention" on Jan 26, 2024
@andijcr marked this pull request as ready for review on January 26, 2024 13:54
@andijcr requested a review from Lazin on January 26, 2024 13:55
Log a warning if, when trying to get an async_manifest_view_cursor for a timequery, the result is out_of_range.
Such an error can happen when retention kicks in and deletes some or all spillover manifests: in that case there is a window where the manifests are reclaimable but still kept in memory, and the timequery hits one of the manifests in the reclaimable range.

Handling this as a warning with no result is acceptable because the Kafka client issuing the request can handle the failure, and handling this edge case otherwise would increase the complexity of the call stack.
@andijcr force-pushed the issue/15489/timequery_racing_with_retention branch from 9c723c0 to 4ca7281 on January 26, 2024 14:01
log_start_offset.value()());
co_return;
}
&& ss::visit(
Contributor:

why is ss::visit better than holds_alternative/get?

Contributor:

OK, now I see that it's not the full change
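
For context, a generic illustration of the trade-off the question is about, using the standard-library equivalents (std::visit vs std::holds_alternative/std::get) rather than Seastar's ss::visit or the PR's actual types: visitation forces every alternative to be handled in one place, while manual holds_alternative/get dispatch can silently miss a newly added alternative.

```cpp
#include <iostream>
#include <variant>

struct tick {};
struct message { int payload; };
using event = std::variant<tick, message>;

// Helper to build an overload set from lambdas (C++17).
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

void handle_with_visit(const event& e) {
    // Every alternative must be handled; this is checked at compile time.
    std::visit(overloaded{
        [](const tick&) { std::cout << "tick\n"; },
        [](const message& m) { std::cout << "message " << m.payload << "\n"; },
    }, e);
}

void handle_with_get(const event& e) {
    // Manual dispatch: a newly added alternative silently falls into the else.
    if (std::holds_alternative<message>(e)) {
        std::cout << "message " << std::get<message>(e).payload << "\n";
    } else {
        std::cout << "tick\n";
    }
}

int main() {
    handle_with_visit(event{message{42}});
    handle_with_get(event{tick{}});
}
```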

@vbotbuildovich (Collaborator)

new failures in https://buildkite.com/redpanda/redpanda/builds/44350#018d4656-2621-46f4-8e9b-d464c2fe8463:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS"

@andijcr (Author) commented on Jan 26, 2024

The failure is likely unrelated: "RuntimeError: Internal object storage scrub detected fatal anomalies: [{'ns': 'kafka', 'topic': '__consumer_offsets', 'partition': 15, 'revision_id': 32, 'missing_segments': ['f9fd300e/kafka/__consumer_offsets/15_32/44-219-21700-4-v1.log.5'], 'last_complete_scrub_at': 1706284164049}]"
Investigating...

@piyushredpanda merged commit a96ca93 into redpanda-data:dev on Jan 26, 2024
14 of 17 checks passed
@vbotbuildovich (Collaborator)

/backport v23.3.x

@vbotbuildovich (Collaborator)

/backport v23.2.x

@vbotbuildovich (Collaborator)

Oops! Something went wrong.

Workflow run logs.

@vbotbuildovich (Collaborator)

Oops! Something went wrong.

Workflow run logs.
