Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: fix unlogged relations with sharding #7454

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Apr 22, 2024

Problem

INIT_FORKNUM blocks must be stored on shard 0 to enable including them in basebackup.

This issue can be missed in simple tests because creating an unlogged table isn't sufficient -- to repro I had to create an index on an unlogged table (then restart the endpoint).

Closes: #7451

Summary of changes

  • Add a reproducer for the issue.
  • Tweak the condition for key_is_shard0 to include anything that isn't a normal relation block and any normal relation block whose forknum is INIT_FORKNUM.
  • To enable existing databases to recover from the issue, add a special case that omits relations if they were stored on the wrong INITFORK. This enables postgres to start and the user to drop the table and recreate it.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels Apr 22, 2024
@jcsp jcsp requested a review from a team as a code owner April 22, 2024 09:38
@jcsp jcsp requested a review from koivunej April 22, 2024 09:38
@koivunej koivunej mentioned this pull request Apr 22, 2024
Copy link

github-actions bot commented Apr 22, 2024

2772 tests run: 2654 passed, 0 failed, 118 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 28.1% (6460 of 23024 functions)
  • lines: 46.8% (45586 of 97442 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
e55cdbb at 2024-04-22T11:54:47.808Z :recycle:

@jcsp
Copy link
Contributor Author

jcsp commented Apr 22, 2024

Pushed a change to how the workaround for existing broken tenants works: we can't directly know whether a given relation is problematic, we have to try reading it and ignore the error if it's a key that could have this issue.

This works with the real life example data we have.

@jcsp jcsp enabled auto-merge (squash) April 22, 2024 11:17
@jcsp jcsp merged commit 0bd1618 into main Apr 22, 2024
48 of 49 checks passed
@jcsp jcsp deleted the jcsp/issue-7451-unlogged-relations branch April 22, 2024 11:47
koivunej pushed a commit that referenced this pull request Apr 22, 2024
## Problem

- #7451 

INIT_FORKNUM blocks must be stored on shard 0 to enable including them
in basebackup.

This issue can be missed in simple tests because creating an unlogged
table isn't sufficient -- to repro I had to create an _index_ on an
unlogged table (then restart the endpoint).

Closes: #7451 

## Summary of changes

- Add a reproducer for the issue.
- Tweak the condition for `key_is_shard0` to include anything that isn't
a normal relation block _and_ any normal relation block whose forknum is
INIT_FORKNUM.
- To enable existing databases to recover from the issue, add a special
case that omits relations if they were stored on the wrong INITFORK.
This enables postgres to start and the user to drop the table and
recreate it.
jcsp added a commit that referenced this pull request Apr 29, 2024
jcsp added a commit that referenced this pull request Apr 30, 2024
PR #7454 included a workaround that let any existing bugged databases
start up. Having used that already, we may now

Closes: #7480
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: unlogged relations (INIT_FORKNUM) not handled properly on sharded tenants
2 participants