Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pageserver): generate basebackup from aux file v2 storage #7517

Merged
merged 15 commits into from
May 7, 2024

Conversation

skyzh
Copy link
Member

@skyzh skyzh commented Apr 25, 2024

Problem

This pull request adds the new basebackup read path + aux file write path. In the regression test, all logical replication tests are run with matrix aux_file_v2=false/true.

Summary of changes

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Apr 25, 2024

3042 tests run: 2915 passed, 0 failed, 127 skipped (full report)


Flaky tests (4)

Postgres 16

  • test_vm_bit_clear_on_heap_lock: debug

Postgres 15

  • test_partial_evict_tenant[relative_spare]: release

Postgres 14

  • test_partial_evict_tenant[relative_equal]: release
  • test_partial_evict_tenant[relative_spare]: release

Code coverage* (full report)

  • functions: 31.2% (6252 of 20016 functions)
  • lines: 46.7% (46832 of 100259 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
801550e at 2024-05-07T16:38:45.689Z :recycle:

@skyzh skyzh mentioned this pull request Apr 25, 2024
24 tasks
@skyzh skyzh force-pushed the skyzh/scan-interface branch 2 times, most recently from dbe3d95 to 06c3377 Compare April 30, 2024 14:08
@skyzh skyzh marked this pull request as ready for review April 30, 2024 15:41
@skyzh skyzh requested a review from a team as a code owner April 30, 2024 15:41
@skyzh skyzh requested review from VladLazar and removed request for a team April 30, 2024 15:41
@skyzh
Copy link
Member Author

skyzh commented Apr 30, 2024

All failed tests pass locally so I assume it's something wrong with vectored get env variable in CI. Fix in #7569.

@skyzh skyzh mentioned this pull request May 2, 2024
5 tasks
@skyzh skyzh force-pushed the skyzh/aux-file-v2 branch 2 times, most recently from dd0a2fa to 48907cb Compare May 2, 2024 19:26
@skyzh
Copy link
Member Author

skyzh commented May 2, 2024

rebased to the new scan implementation pull request and added aux_file_v2 test matrix for regress tests.

Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The looks good to me, but I have questions about the testing.
Assuming a bug with aux files v2, I don't think there is a guarantee that postgress will generate an error. Have you considered validating the two aux files implementations against each other (at least in the tests)?

pageserver/src/pgdatadir_mapping.rs Outdated Show resolved Hide resolved
Base automatically changed from skyzh/scan-interface to main May 3, 2024 14:43
skyzh added 9 commits May 3, 2024 10:43
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member Author

skyzh commented May 3, 2024

Added a new cross-validation mode to aux file policy flag, waiting for CI runs.

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh added 3 commits May 3, 2024 11:26
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member Author

skyzh commented May 3, 2024

added 26efd47 to fix a bug on the code path of using vectored get for sequential get. It should throw a missing key error, but it doesn't.

Think about the case if a key does not exist. In the code path, we will create a ValueReconstructState of empty image and empty WAL, which then gets passed to reconstruct_value, and throwing "base image for {key} at {request_lsn} not found" PageReconstructError::Other error. This case is actually a missing key error.

Therefore, if the cached image is not found, we do not add it to the value reconstruct state to avoid calling into reconstruct_value.

It's unknown why this case only happens when v1+v2 are both enabled. Probably due to the difference of layer map in two cases, where in one case, the vectored get code path could catch the missing key error earlier than the point that reconstruct_value gets called. We only run cross validation instead of only running v2.

@skyzh skyzh requested a review from VladLazar May 3, 2024 17:17
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just a few nits.

pageserver/src/tenant/timeline.rs Show resolved Hide resolved
test_runner/fixtures/neon_fixtures.py Outdated Show resolved Hide resolved
pageserver/src/pgdatadir_mapping.rs Outdated Show resolved Hide resolved
pageserver/src/pgdatadir_mapping.rs Outdated Show resolved Hide resolved
pageserver/src/pgdatadir_mapping.rs Show resolved Hide resolved
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh enabled auto-merge (squash) May 7, 2024 16:10
@skyzh skyzh merged commit 017c34b into main May 7, 2024
53 checks passed
@skyzh skyzh deleted the skyzh/aux-file-v2 branch May 7, 2024 16:30
conradludgate pushed a commit that referenced this pull request May 8, 2024
This pull request adds the new basebackup read path + aux file write
path. In the regression test, all logical replication tests are run with
matrix aux_file_v2=false/true.

Also fixed the vectored get code path to correctly return missing key
error when being called from the unified sequential get code path.
---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants