Skip to content

Conversation

@ephemeralsad
Copy link
Contributor

@ephemeralsad ephemeralsad commented Jun 2, 2025

Problem

Removed nodes can re-add themselves on restart if not properly tombstoned. We need a mechanism (e.g. soft-delete flag) to prevent this, especially in cases where the node is unreachable.

More details there: #12036

Summary of changes

  • Introduced NodeLifecycle enum to represent node lifecycle states.
  • Added a string representation of NodeLifecycle to the nodes table.
  • Implemented node removal using a tombstone mechanism.
  • Introduced /debug/v1/tombstone* handlers to manage the tombstone state.

@ephemeralsad ephemeralsad force-pushed the ephemeralsad/deletion-tumbstones branch 3 times, most recently from dca4d59 to b87860e Compare June 2, 2025 06:45
@github-actions
Copy link

github-actions bot commented Jun 2, 2025

8459 tests run: 7877 passed, 0 failed, 582 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 32.2% (9018 of 27988 functions)
  • lines: 48.5% (80021 of 165092 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
cee59e7 at 2025-06-06T06:00:23.378Z :recycle:

@ephemeralsad ephemeralsad changed the title Deletion tumbstones Deletion tombstones Jun 2, 2025
@ephemeralsad ephemeralsad changed the title Deletion tombstones storcon: Deletion tombstones Jun 2, 2025
@ephemeralsad ephemeralsad force-pushed the ephemeralsad/deletion-tumbstones branch from b87860e to 348dc4c Compare June 2, 2025 08:51
@ephemeralsad ephemeralsad marked this pull request as ready for review June 2, 2025 09:11
@ephemeralsad ephemeralsad requested a review from a team as a code owner June 2, 2025 09:11
@ephemeralsad ephemeralsad requested review from erikgrinaker and removed request for erikgrinaker June 2, 2025 09:11
@ephemeralsad ephemeralsad linked an issue Jun 2, 2025 that may be closed by this pull request
@jcsp jcsp requested a review from VladLazar June 2, 2025 13:46
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Got some points around robustness. Let me know if you wanna walk through anything on a call.


Let's add a test that covers our main scenario: preventing a flaky node from re-attaching by marking it soft deleted. Bonus points for covering concurrency between re-attach and node deletion. test_storage_controller.py is a good starting point.


Please update the cover letter with a description of the problem we are solving here.
It's better to have actual descriptions in the commit history than GH links.

@ephemeralsad ephemeralsad changed the title storcon: Deletion tombstones storcon: Introduce deletion tombstones to support flaky node scenario Jun 4, 2025
@ephemeralsad ephemeralsad force-pushed the ephemeralsad/deletion-tumbstones branch from 1ae2b74 to 17926bf Compare June 5, 2025 12:38
@ephemeralsad ephemeralsad requested a review from a team as a code owner June 5, 2025 12:38
@ephemeralsad ephemeralsad requested review from MMeent and knizhnik June 5, 2025 12:38
@github-actions
Copy link

github-actions bot commented Jun 5, 2025

If this PR added a GUC in the Postgres fork or neon extension,
please regenerate the Postgres settings in the cloud repo:

make NEON_WORKDIR=path/to/neon/checkout \
  -C goapp/internal/shareddomain/postgres generate

If you're an external contributor, a Neon employee will assist in
making sure this step is done.

@ephemeralsad ephemeralsad force-pushed the ephemeralsad/deletion-tumbstones branch from 17926bf to 7deb448 Compare June 5, 2025 13:02
@ephemeralsad ephemeralsad removed the request for review from a team June 5, 2025 13:03
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - give it a go in staging and update the docs for PS decom if required.


Small piece of feedback: we usually resolve review comments with a link to the commit that fixed it. This makes it easier for the reviewer to know if the change was applied and where.

@ephemeralsad ephemeralsad added this pull request to the merge queue Jun 6, 2025
Merged via the queue into main with commit 590301d Jun 6, 2025
102 checks passed
@ephemeralsad ephemeralsad deleted the ephemeralsad/deletion-tumbstones branch June 6, 2025 10:23
erikgrinaker pushed a commit that referenced this pull request Jun 6, 2025
…#12096)

## Problem

Removed nodes can re-add themselves on restart if not properly
tombstoned. We need a mechanism (e.g. soft-delete flag) to prevent this,
especially in cases where the node is unreachable.

More details there: #12036

## Summary of changes

- Introduced `NodeLifecycle` enum to represent node lifecycle states.
- Added a string representation of `NodeLifecycle` to the `nodes` table.
- Implemented node removal using a tombstone mechanism.
- Introduced `/debug/v1/tombstone*` handlers to manage the tombstone
state.
skyzh pushed a commit that referenced this pull request Jun 6, 2025
…#12096)

## Problem

Removed nodes can re-add themselves on restart if not properly
tombstoned. We need a mechanism (e.g. soft-delete flag) to prevent this,
especially in cases where the node is unreachable.

More details there: #12036

## Summary of changes

- Introduced `NodeLifecycle` enum to represent node lifecycle states.
- Added a string representation of `NodeLifecycle` to the `nodes` table.
- Implemented node removal using a tombstone mechanism.
- Introduced `/debug/v1/tombstone*` handlers to manage the tombstone
state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storcon: Pageserver removal tombstones

3 participants