
demote ERROR message to DEBUG for timequerys at the edge of spillover retention #16302

Conversation

@andijcr (Contributor) commented on Jan 26, 2024

Consider this trace:
A list_offsets request with a timestamp hits the spillover region, but at the same time retention kicks in and the first few spillover manifests become eligible for GC.
In this case the timequery still starts from the first spillover manifest, but if it lands on one of the collectible manifests, the failure is logged as an ERROR and no result is returned.
The Kafka client is fine with this outcome, but the ERROR line trips the log checks in tests.

In theory we could restrict the search space (as was attempted here), but we would be dealing with suspension points at an unstable moment and could hit various edge cases (what if retention reclaims the whole spillover region?).
So this PR recognizes this category of out_of_range errors and logs a DEBUG message instead of an ERROR.

The change is in the last commit; the previous ones are minor things found while studying this trace.
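The gist of the change, as a minimal self-contained sketch (the names, signature, and log text below are hypothetical stand-ins, not the actual remote_partition.cc code): the out_of_range outcome of a spillover timequery is treated as the expected race with retention and logged at DEBUG, while any other failure keeps its ERROR log.

```cpp
#include <cstdio>
#include <optional>

// Hypothetical stand-in for cloud_storage::error_outcome.
enum class error_outcome { success, out_of_range, failure };

struct timequery_result { long long offset; };

// Decide how to report a spillover timequery outcome. out_of_range is the
// expected race with retention, so it is logged at DEBUG and the query simply
// returns no result; any other failure keeps the ERROR log.
std::optional<timequery_result>
report_spillover_query(error_outcome ec, long long timestamp) {
    switch (ec) {
    case error_outcome::success:
        return timequery_result{0}; // placeholder: real code returns the found offset
    case error_outcome::out_of_range:
        std::printf(
          "DEBUG: spillover timequery at ts=%lld raced with retention, "
          "returning no result\n",
          timestamp);
        return std::nullopt;
    default:
        std::printf(
          "ERROR: failed to query spillover manifests, ts=%lld\n", timestamp);
        return std::nullopt;
    }
}

int main() {
    // Example: the out_of_range outcome from the annotated trace below.
    report_spillover_query(error_outcome::out_of_range, 1705821612756LL);
}
```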

Annotated trace below:

// list_offset request with timestamp at the edge of retention
TRACE 2024-01-21 07:20:50,130 [shard 0:main] kafka - requests.cc:96 - [172.16.16.9:52058] processing name:list_offsets, key:2, version:4 for KgoVerifierSeqConsumer-0-139932971794832, mem_units: 8182, ctx_size: 41
TRACE 2024-01-21 07:20:50,130 [shard 0:main] kafka - handler.h:69 - [client_id: {KgoVerifierSeqConsumer-0-139932971794832}] handling list_offsets v4 request {replica_id=-1 isolation_level=0 topics={{name={test-topic} partitions={{partition_index=0 current_leader_epoch=1 timestamp={timestamp: 1705821612756} max_num_offsets=0}}}}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:659 - Linearizable barrier requested. Log state: {start_offset:15197, committed_offset:15637, committed_offset_term:1, dirty_offset:15637, dirty_offset_term:1}, flushed offset: 15637
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:689 - Sending empty append entries request to {id: {3}, revision: {28}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] rpc - transport.cc:352 - Dispatched request with sequence: 15205, correlation_idx: 15205, pending queue_size: 0, target_address: {host: docker-rp-24, port: 33145}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:689 - Sending empty append entries request to {id: {2}, revision: {28}}
TRACE 2024-01-21 07:20:50,130 [shard 1:main] rpc - transport.cc:352 - Dispatched request with sequence: 15206, correlation_idx: 15206, pending queue_size: 0, target_address: {host: docker-rp-10, port: 33145}
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:345 - Append entries response: {node_id: {id: {2}, revision: {28}}, target_node_id{id: {1}, revision: {28}}, group: {1}, term:{1}, last_dirty_log_index:{15637}, last_flushed_log_index:{15637}, last_term_base_offset:{-9223372036854775808}, result: success, may_recover:0}
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:443 - Updated node {id: {2}, revision: {28}} last committed log index: 15637
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:569 - Updated node {id: {2}, revision: {28}} match 15637 and next 15638 indices
TRACE 2024-01-21 07:20:50,131 [shard 1:main] raft - [group_id:1, {kafka/test-topic/0}] consensus.cc:743 - Linearizable offset: 15637

// proceed with the timequery to learn the offset; this will return nullopt
DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cluster - partition.cc:553 - timequery (cloud) {kafka/test-topic/0} t={timestamp: 1705821612756} max_offset(k)=15198


DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1113 - remote partition make_reader invoked (waiting for units), config: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}, num segments 38
TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1125 - remote partition make_reader invoked (units acquired), config: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}, num segments 38
TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:226 - Constructing reader {kafka/test-topic/0}

// partition_record_batch_reader_impl::start will eventually *fail* at init_cursor. the query will be performed by model::timestamp
	DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:231 - abort_source is set
	TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:250 - partition_record_batch_reader_impl:: start: creating cursor: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}
	
	// the cursor is being initialized over the range [_stm_manifest.get_archive_clean_offset(), last offset]; the stm range is [_stm_manifest.get_start_offset(), last offset]
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:602 - creating_cursor: begin: 7543, end: 15594, stm_range[{14142}/15594]

		// seek(query)
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:178 - Manifest is not initialized

		// _view.get_materialized_manifest(q) (it will return a value)
			// the query has to be performed in the spillovers (in_stm returns false: _stm_manifest.get_spillover_map().last_segment()->max_timestamp < ts)
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:873 - Checking timestamp {timestamp: 1705821612756} using timequery
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:873 - Checking timestamp {timestamp: 1705821612756} using timequery
			// search_spillover_manifests(q)
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:1472 - search_spillover_manifest query: {{timestamp: 1705821612756}}, num manifests: 6, first: {timestamp: 1705821614475}, last: {timestamp: 1705821644471}
				// the search starts from the first manifest in the list and returns a manifest spanning [6231, 7542]; that is below the cursor range, and in fact below the _stm_manifest.archive_clean_offset_range()
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:1311 - Found spillover manifest meta: {{is_compacted: false, size_bytes: 42133123, base_offset: 6231, committed_offset: 7542, base_timestamp: {timestamp: 1705821614475}, max_timestamp: {timestamp: 1705821619470}, delta_offset: 161, ntp_revision: 28, archiver_term: 1, segment_term: 1, delta_offset_end: 193, sname_format: {v3}, metadata_size_hint: 3128}}
				DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - materialized_manifest_cache.cc:220 - Cache GET will return [{kafka/test-topic/0}:6231]
			// the returned manifest is not in range: it spans a range that comes before the cursor
			DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:268 - STM manifest range: [6231/7542], cursor range: [7543/15594]
		// reject the result because it is not in the range of the cursor
		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:226 - Manifest is not in the specified range, range: [7543/15594]

		DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber2 kafka/test-topic/0] - async_manifest_view.cc:634 - failed to seek to {{timestamp: 1705821612756}}, offset out of valid range
	ERROR 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:551 - Failed to query spillover manifests: cloud_storage::error_outcome:10, query: {{timestamp: 1705821612756}}
	TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:255 - partition_record_batch_reader_impl:: start: created cursor: {start_offset:{7350}, max_offset:{15198}, min_bytes:0, max_bytes:2048, type_filter:batch_type::raft_data, first_timestamp:{timestamp: 1705821612756}, bytes_consumed:0, over_budget:0, strict_max_bytes:0, skip_batch_cache:0, abortable:1, aborted:0}

TRACE 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber663 kafka/test-topic/0] - remote_partition.cc:261 - Destructing reader {kafka/test-topic/0}
DEBUG 2024-01-21 07:20:50,131 [shard 1:main] cloud_storage - [fiber3 kafka/test-topic/0] - remote_partition.cc:1167 - timequery: 0 batches
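
To tie the trace together, here is a condensed, self-contained model of the failing path (types and helpers are simplified stand-ins reconstructed from the trace, not the real async_manifest_view implementation): the spillover search still returns the already-reclaimable manifest [6231, 7542], which lies outside the cursor range [7543, 15594], so the seek reports out_of_range.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

// Simplified stand-ins for the spillover manifest metadata and cursor range.
struct manifest_meta {
    int64_t base_offset;
    int64_t committed_offset;
    int64_t base_timestamp;
    int64_t max_timestamp;
};

struct cursor_range {
    int64_t begin; // archive clean offset at cursor-creation time
    int64_t end;
};

// The timestamp belongs to the STM manifest only if it is newer than the last
// spillover segment; otherwise the spillover region has to be searched.
bool in_stm(int64_t ts, const std::vector<manifest_meta>& spillover) {
    return spillover.empty() || spillover.back().max_timestamp < ts;
}

// The search starts from the first spillover manifest, even if retention has
// already made the leading manifests eligible for GC.
std::optional<manifest_meta>
search_spillover(int64_t ts, const std::vector<manifest_meta>& spillover) {
    for (const auto& m : spillover) {
        if (ts <= m.max_timestamp) {
            return m;
        }
    }
    return std::nullopt;
}

// The returned manifest is accepted only if it overlaps the cursor range; the
// manifest from the trace spans [6231, 7542] while the cursor covers
// [7543, 15594], so this check fails and the seek reports out_of_range.
bool in_cursor_range(const manifest_meta& m, const cursor_range& c) {
    return m.committed_offset >= c.begin && m.base_offset <= c.end;
}

int main() {
    const int64_t ts = 1705821612756;
    const std::vector<manifest_meta> spillover = {
      {6231, 7542, 1705821614475, 1705821619470},
      // ... five more manifests, up to max_timestamp 1705821644471
    };
    const cursor_range cursor{7543, 15594};

    if (!in_stm(ts, spillover)) {
        auto m = search_spillover(ts, spillover);
        std::printf(
          "accepted=%d\n", m.has_value() && in_cursor_range(*m, cursor)); // prints 0
    }
}
```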

Fixes #15489
Fixes #16026

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

The previous if block is a bouncer condition of the form if (is_stm(q)), so (!in_stm(q)) is assured afterwards.

(cherry picked from commit ecc24d2)
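
A tiny illustration of that bouncer pattern (hypothetical names, not the PR diff): the first if returns early when the query can be served from the STM manifest, so the code after it may rely on the negated condition without re-checking it.

```cpp
// Hypothetical illustration only, not the PR's actual code.
struct query {
    bool stm;
};

bool is_stm(const query& q) { return q.stm; }

const char* dispatch(const query& q) {
    if (is_stm(q)) {
        return "serve from the STM manifest"; // bouncer: STM queries stop here
    }
    // Invariant at this point: !is_stm(q); only the spillover path remains.
    return "serve from the spillover manifests";
}
```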
@andijcr changed the title from "Issue/15489/timequery racing with retention" to "demote ERROR message to DEBUG for timequerys at the edge of spillover retention" on Jan 26, 2024
@andijcr marked this pull request as ready for review on January 26, 2024 13:54
@andijcr requested a review from Lazin on January 26, 2024 13:55
Log a warning if, when trying to get an async_manifest_view_cursor for a timequery, the result is out_of_range.
Such an error can happen when retention kicks in and deletes some or all spillover manifests: in that case there is a window where the manifests are reclaimable but still kept in memory, and the timequery hits one of the manifests in the reclaimable range.

Handling this as a warning with no result is acceptable because the Kafka client issuing the request can handle the failure, and handling this edge case otherwise would increase the complexity of the call stack.
@andijcr force-pushed the issue/15489/timequery_racing_with_retention branch from 9c723c0 to 4ca7281 on January 26, 2024 14:01
log_start_offset.value()());
co_return;
}
&& ss::visit(
Contributor:

why is ss::visit better than holds_alternative/get?

Contributor:

OK, now I see that it's not the full change
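
For context, a generic illustration of the trade-off the question is about, using the standard-library equivalents (std::visit vs std::holds_alternative/std::get) rather than Seastar's ss::visit or the PR's actual types: visitation forces every alternative to be handled in one place, while manual holds_alternative/get dispatch can silently miss a newly added alternative.

```cpp
#include <iostream>
#include <variant>

struct tick {};
struct message { int payload; };
using event = std::variant<tick, message>;

// Helper to build an overload set from lambdas (C++17).
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

void handle_with_visit(const event& e) {
    // Every alternative must be handled; this is checked at compile time.
    std::visit(overloaded{
        [](const tick&) { std::cout << "tick\n"; },
        [](const message& m) { std::cout << "message " << m.payload << "\n"; },
    }, e);
}

void handle_with_get(const event& e) {
    // Manual dispatch: a newly added alternative silently falls into the else.
    if (std::holds_alternative<message>(e)) {
        std::cout << "message " << std::get<message>(e).payload << "\n";
    } else {
        std::cout << "tick\n";
    }
}

int main() {
    handle_with_visit(event{message{42}});
    handle_with_get(event{tick{}});
}
```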

@vbotbuildovich (Collaborator)

new failures in https://buildkite.com/redpanda/redpanda/builds/44350#018d4656-2621-46f4-8e9b-d464c2fe8463:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS"

@andijcr (Author) commented on Jan 26, 2024

The failure is likely unrelated: "RuntimeError: Internal object storage scrub detected fatal anomalies: [{'ns': 'kafka', 'topic': '__consumer_offsets', 'partition': 15, 'revision_id': 32, 'missing_segments': ['f9fd300e/kafka/__consumer_offsets/15_32/44-219-21700-4-v1.log.5'], 'last_complete_scrub_at': 1706284164049}]"
Investigating...

@piyushredpanda merged commit a96ca93 into redpanda-data:dev on Jan 26, 2024
14 of 17 checks passed
@vbotbuildovich (Collaborator)

/backport v23.3.x

@vbotbuildovich (Collaborator)

/backport v23.2.x

@vbotbuildovich (Collaborator)

Oops! Something went wrong.

Workflow run logs.

@vbotbuildovich (Collaborator)

Oops! Something went wrong.

Workflow run logs.
