Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v22.3.x] cloud_storage: fix timequery edge cases #9876

Merged

Conversation

vbotbuildovich
Copy link
Collaborator

Backport of PR #9815
Fixes #9875,

jcsp added 5 commits April 6, 2023 12:13
The local GC test issues many requests in parallel,
which can make it awkward to read logs.  On failure,
issue some queries sequentially to the offsets that failed,
which also gives us a measurement of whether the failure
was transient or permanent.

Related: redpanda-data#9669
(cherry picked from commit 8ddab78)
_start_offset is only set once some background housekeeping
happens: on a freshly created partition with a few records
produced into it, it can be -1.  In this case when we read
the start timestamp, we should not try to find a segment
containing this negative start offset.

We may later refactor to avoid having non-empty logs carry
a confusingly -1 start_offset.

(cherry picked from commit f440c89)
... so that it may be executed against Apache Kafka
to check the compatibility of the behavior.

(cherry picked from commit 8693ad2)
This was incorrectly expecting a -1 when querying below
the start of the log: the Kafka-compatible behavior is
to return the first offset in the log with a timestamp
greater than the query (i.e. the lwm of the log)

(cherry picked from commit e2e4186)
...where the queried timestamp falls equal to
the local log start_offset, or between the local log's
start_timestamp() and start_offset.

The existing code was incorrectly assuming that start_timestamp
pointed to the same record as start_offset, whereas actually
start_timestamp gives the timestamp at the start of the first
segment, and start_offset may point to some location within
a segment.

This would result in timequery returning an offset too high
if there was data in tiered storage that could have found the
timestamp at a lower offset, or returning an offset when it should
have returned null, if there was not tiered storage data available
before start_offset.

Fixes redpanda-data#9669

(cherry picked from commit f3f77f0)
@vbotbuildovich vbotbuildovich added this to the v22.3.x-next milestone Apr 6, 2023
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Apr 6, 2023
@piyushredpanda piyushredpanda modified the milestones: v22.3.x-next, v22.3.16 Apr 10, 2023
@BenPope
Copy link
Member

BenPope commented Apr 10, 2023

Ping

@BenPope BenPope requested review from jcsp and VladLazar April 10, 2023 20:35
Methods 'rbegin()' and 'rend()' of the partition_manifest are only used
to get last value in the manifest. The methods are replaced with
'last_segment' method that returns optional value which may contain a
segment_meta value of the last uploaded segment in the partition.

(cherry picked from commit 78a70b3)
@jcsp jcsp marked this pull request as ready for review April 12, 2023 19:06
@piyushredpanda
Copy link
Contributor

Failure appears to be #10024

Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the only failure is a scaling_up_test which isn't related to cloud storage bits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda kind/backport PRs targeting a stable branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants