Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: fix cont lsn jump on vectored read path #7412

Merged
merged 2 commits into from Apr 18, 2024

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Apr 17, 2024

Problem

Vectored read path may return an image that's newer than the request lsn under certain circumstances.

  LSN
    ^
    |
    |
500 | ------------------------- -> branch point
400 |        X
300 |        X
200 | ------------------------------------> requested lsn
100 |        X
    |---------------------------------> Key

Legend:
* X - page images

The vectored read path inspects each ancestor timeline one by one starting from the current one.
When moving into the ancestor timeline, the current code resets the current search lsn (called cont_lsn in code)
to the lsn of the ancestor timeline (here).

For instance, if the request lsn was 200, we would:

  1. Look into the current timeline and find nothing for the key
  2. Descend into the ancestor timeline and set cont_lsn=500
  3. Return the page image at LSN 400

Myself and Christian find it very unlikely for this to have happened in prod since the vectored read path
is always used at the last record lsn.

This issue was found by a regress test during the work to migrate get page handling to use the vectored
implementation. I've applied my fix to that wip branch and it fixed the issue.

Summary of changes

The fix is to set the current search lsn to the min between the requested LSN and the ancestor lsn.
Hence, at step 2 above we would set the current search lsn to 200 and ignore the images above that.

A test illustrating the bug is also included. Fails without the patch and passes with it.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Apr 17, 2024

2766 tests run: 2648 passed, 0 failed, 118 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 28.0% (6443 of 22999 functions)
  • lines: 46.8% (45431 of 97156 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
11a1d23 at 2024-04-18T15:24:36.919Z :recycle:

@VladLazar VladLazar force-pushed the vlad/fix-cont-lsn-jump branch 3 times, most recently from 383699f to 5cd17cf Compare April 17, 2024 18:15
@VladLazar VladLazar requested a review from problame April 17, 2024 18:30
@VladLazar VladLazar marked this pull request as ready for review April 17, 2024 18:30
@VladLazar VladLazar requested a review from a team as a code owner April 17, 2024 18:30
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks convincing enough, but I cannot make the connection to the actual vectored read path change (min).

Does the old Timeline::get_value_reconstruct_data have this problem? There, we move to ancestor if cont_lsn is less than ancestor_lsn, and cont_lsn is originally request_lsn + 1. Okay, I think these are now aligned.

@VladLazar
Copy link
Contributor Author

The test looks convincing enough, but I cannot make the connection to the actual vectored read path change (min).

Does the old Timeline::get_value_reconstruct_data have this problem? There, we move to ancestor if cont_lsn is less than ancestor_lsn, and cont_lsn is originally request_lsn + 1. Okay, I think these are now aligned.

It's quite tricky to see the alignment between the read paths since they're structured differently. I think going through the example in the cover letter is the best way of understanding the issue. The test produces a very similar setup, albeit with a lot of boilerplate. Let me know if you wish to chat about it - the bug is quite obvious once you know about it :)

@VladLazar
Copy link
Contributor Author

I'll merge now to get some validation for the change before next weeks release.

@VladLazar VladLazar merged commit 6eb946e into main Apr 18, 2024
53 checks passed
@VladLazar VladLazar deleted the vlad/fix-cont-lsn-jump branch April 18, 2024 17:40
@hlinnaka
Copy link
Contributor

Great explanation and test case!

It took me a while to understand why the non-vectored path is not suffering from the same bug. I believe it's because `get_reconstruct_data' has this in the loop:


            // Recurse into ancestor if needed
            if is_inherited_key(key) && Lsn(cont_lsn.0 - 1) <= timeline.ancestor_lsn {
                trace!(
                    "going into ancestor {}, cont_lsn is {}",
                    timeline.ancestor_lsn,
                    cont_lsn
                );

                timeline_owned = timeline.get_ready_ancestor_timeline(ctx).await?;
                timeline = &*timeline_owned;
                prev_lsn = None;
                continue 'outer;
            }

Wouldn't hurt to explicitly test for the same scenario in the non-vectored case too. Or maybe we already have a test for that?

It would be useful to cross-check that get_reconstruct_data produces the same result as get_vectored. For example, at the end of get_reconstruct_data, call get_vectored with a corresponding single-item KeySpace, and assert that the result is the same.

@VladLazar
Copy link
Contributor Author

It took me a while to understand why the non-vectored path is not suffering from the same bug. I believe it's because `get_reconstruct_data' has this in the loop:


            // Recurse into ancestor if needed
            if is_inherited_key(key) && Lsn(cont_lsn.0 - 1) <= timeline.ancestor_lsn {
                trace!(
                    "going into ancestor {}, cont_lsn is {}",
                    timeline.ancestor_lsn,
                    cont_lsn
                );

                timeline_owned = timeline.get_ready_ancestor_timeline(ctx).await?;
                timeline = &*timeline_owned;
                prev_lsn = None;
                continue 'outer;
            }

Right. The non-vectored read path does not touch the cont_lsn when descending into the ancestor timeline. cont_lsn should never increase for it.

Wouldn't hurt to explicitly test for the same scenario in the non-vectored case too. Or maybe we already have a test for that?

It would be useful to cross-check that get_reconstruct_data produces the same result as get_vectored. For example, at the end of get_reconstruct_data, call get_vectored with a corresponding single-item KeySpace, and assert that the result is the same.

We already have this cross checking mechanism in place (see validate_get_vectored_impl). It is currently enabled for regression tests and and in staging, but disabled in prod.

For context, I caught this bug with the cross validation mechanism on my wip branch to migrate the get page requests to use the vectored path. The non-vectored path was returning the correct result, but it's a fair point that the test could be updated to exercise the non-vectored read path too.

@VladLazar VladLazar mentioned this pull request Apr 23, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants