cloud_storage: Timequery starts from the beginning of the read-replica #16479
Labels
area/cloud-storage (shadow indexing subsystem), kind/bug, sev/high (loss of availability, pathological performance degradation, recoverable corruption)
Version & Environment
Redpanda version: v23.2.8
What went wrong?
From the client's perspective, the `rpk consume` command hangs and then fails with a timeout error. The command is executed against a read replica and uses a timestamp as the query parameter. The timestamp is larger than the timestamp of any segment in the manifest. Redpanda starts scanning the partition from the first segment available in the manifest. It correctly ignores all segments in all spillover manifests, because their timestamps are smaller, but it then performs a full scan of the partition.
What should have happened instead?
The command should return data batches.
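When the query timestamp is beyond every segment, a lower-bound lookup over the segments' `max_timestamp` column should resolve to the end of the log, so the consumer waits at the tail for new batches instead of re-reading the whole partition from the first segment. A minimal sketch of that lookup (illustrative Python with made-up values, not Redpanda's implementation):

```python
import bisect

def first_segment_to_scan(segment_max_timestamps, query_ts):
    """Index of the first segment whose max_timestamp >= query_ts.

    If the query timestamp is beyond every segment, the result equals
    len(segment_max_timestamps): start from the end of the log (and wait
    for new data) instead of scanning from the first segment.
    Hypothetical helper for illustration only.
    """
    return bisect.bisect_left(segment_max_timestamps, query_ts)

# Monotonic max_timestamp column (epoch ms), as in the manifest described below.
max_ts = [1706000000000, 1706100000000, 1706302795000]

print(first_segment_to_scan(max_ts, 1706050000000))  # 1: start at the second segment
print(first_segment_to_scan(max_ts, 1706511600000))  # 3 == len(max_ts): past the end
```

Because the column is monotonic (confirmed in the report's manifest description), the binary search is valid and the past-the-end case never needs to touch segment data at all.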
How to reproduce the issue?
Issue a timequery with a timestamp larger than the timestamp of the last segment in the manifest and check the logs.
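One way to construct such an out-of-range timestamp is to take "now" plus a margin, as sketched below. The `rpk topic consume -o @<epoch-ms>` offset syntax mentioned in the comments is an assumption to verify against your rpk version:

```python
import time

# Epoch milliseconds one day in the future: guaranteed to be larger than the
# max_timestamp of every existing segment, which is what triggers the bug.
query_ts_ms = int(time.time() * 1000) + 24 * 60 * 60 * 1000
print(query_ts_ms)

# Then consume from the read replica starting at that timestamp, e.g.
# (the -o @<epoch-ms> flag syntax is an assumption; check
# `rpk topic consume --help` for your rpk version):
#
#   rpk topic consume <topic> -o @<query_ts_ms>
```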
Additional information
The partition manifest has the following characteristics:
- 12446351348
- `archive_start_offset` is 0
- nine spillover manifests are present
- the `base_timestamp` and `max_timestamp` columns are monotonic
- `max_timestamp` is greater than `base_timestamp` for every segment in the manifest

The timestamp in the query is Jan 29 2024 02:00:00, which is `1706511600000`. This value is larger than the `max_timestamp` of the last segment, which is `1706302795000`.

When the query is executed, we see the following lines in the log. The client is scanning segments and discarding data batches because their timestamps are too small:
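The timestamp arithmetic above can be double-checked: the epoch value decodes to 07:00 UTC on Jan 29 2024 (i.e. 02:00 in a UTC-5 zone) and lies roughly 58 hours past the newest segment, so no segment in the manifest can satisfy the query:

```python
from datetime import datetime, timezone

query_ts_ms = 1706511600000     # timestamp used in the query
last_max_ts_ms = 1706302795000  # max_timestamp of the last segment

print(datetime.fromtimestamp(query_ts_ms / 1000, tz=timezone.utc))  # 2024-01-29 07:00:00+00:00
print(round((query_ts_ms - last_max_ts_ms) / 3_600_000, 1))         # 58.0 hours of gap
assert query_ts_ms > last_max_ts_ms
```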
There are a lot of messages like this in the log:
The client doesn't see any data batches. Eventually it gives up waiting and closes the connection. On the tiered-storage side we can see this in the log:
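What those discard messages amount to is a linear scan that reads and throws away every batch in the partition; since no batch can ever match the query timestamp, the reader exhausts the log and the client times out. A toy model of that path (illustrative Python, not Redpanda's code):

```python
def scan_for_timestamp(batch_timestamps, query_ts):
    """Linear scan from the start of the partition, discarding every batch
    older than the query timestamp (the buggy path, modeled on the log output).

    Returns (matching_timestamp_or_None, number_of_discarded_batches).
    """
    discarded = 0
    for ts in batch_timestamps:
        if ts < query_ts:
            discarded += 1  # corresponds to one "timestamp too small" log line
            continue
        return ts, discarded
    return None, discarded  # whole partition read, nothing returned

# With a query timestamp beyond all data, every batch is read and discarded.
batches = list(range(0, 1_000_000, 10))
print(scan_for_timestamp(batches, 10_000_000))  # (None, 100000)
```

This is why the fix matters for read replicas: the cost is proportional to the entire partition in tiered storage, not to the amount of data actually returned.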