[epic] dry-run horizon database truncation #5236

mollykarcher · 2024-03-07T14:56:30Z

We'll be truncating SDF Horizon's history retention to 1 year later this year. To our knowledge, most partners that enable history retention keep 1-3 months of history, so there may be issues that only appear with this data profile (a large retention window, but not full history) that we simply haven't seen or heard of yet.

We should dry-run the truncation and mirror traffic to it for some amount of time, observing the performance impact and resolving any issues that arise from this process. Given the timing and the need to continue using staging to test/issue releases of Horizon prior to the truncation, we should not be doing this on the staging cluster and will need to spin up a new/independent one.

At a minimum:

- Spin up another staging-like cluster of Horizon (https://github.com/stellar/ops/issues/2900)
- Upgrade PostgreSQL 12 ➡️ 16 (services/horizon: upgrade psql support to most recent versions #4831)
- Enable reaping on that instance and set the retention to 1 year
  - There are different ways to accomplish this, and we need time to evaluate them. For example, we could turn on reaping on the whole DB and see what happens (which may result in a lockup due to the massive amount of data that needs to be reaped, plus a possible full vacuum), or we could start from scratch, ingest a year+ of data, and then enable reaping, or there may be other options.
  - After discussion, it appears we must approach this by reaping the whole DB, because reingestion may take on the order of months. The hope 🤞 is that because the database will be much smaller, the full vacuum will be feasible without any extra operational concerns.
  - The periodic reaping frequency should be configurable (services/horizon: Reap in batches of 100k ledgers per second, to play nicely with others #3823)
- Document any operational steps performed throughout this process, as we'll repeat them for the blue/green prod clusters eventually
- Mirror traffic from production to this cluster
- Brainstorm how we could (or if we need to) simulate load from transaction submission
- Observe, identify, and resolve (or defer/prioritize) any performance degradation
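
To make the "reap the whole DB in batches" idea above concrete, here is a minimal sketch of how a 1-year retention window could be turned into a ledger count and split into bounded deletion ranges. This is not Horizon's actual reaper: the function names are hypothetical, and the 5-second average ledger close time is an assumption used only to estimate how many ledgers a year covers.

```python
from typing import Iterator, Tuple

# Assumption: ledgers close roughly every 5 seconds on average.
SECONDS_PER_LEDGER = 5
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def retention_in_ledgers(seconds: int = SECONDS_PER_YEAR,
                         seconds_per_ledger: int = SECONDS_PER_LEDGER) -> int:
    """Approximate number of ledgers covered by a retention window."""
    return seconds // seconds_per_ledger


def reap_batches(oldest: int, latest: int, retention: int,
                 batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) ledger ranges to delete, oldest first.

    Hypothetical helper: everything with sequence <= latest - retention
    falls outside the window, and each yielded range spans at most
    batch_size ledgers so a single delete stays bounded.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cutoff = latest - retention
    start = oldest
    while start <= cutoff:
        end = min(start + batch_size - 1, cutoff)
        yield (start, end)
        start = end + 1
```

Under these assumptions a 1-year window is about 6.3 million ledgers. With a batch size of 100k ledgers (the figure mentioned in #3823), reaping a full-history database down to that window would take several hundred bounded deletes rather than one massive one.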

mollykarcher added the objective-7 label Mar 7, 2024

mollykarcher added this to the Sprint 44 milestone Mar 7, 2024

mollykarcher changed the title ~~dry-run horizon database truncation~~ [epic] dry-run horizon database truncation Mar 19, 2024

mollykarcher modified the milestones: Sprint 44, platform sprint 45 Mar 27, 2024

mollykarcher assigned Shaptic and tamirms Mar 27, 2024

Shaptic mentioned this issue Apr 5, 2024

services/horizon: Make reaping batch sizes configurable via --history-retention-reap-count. #5272

Merged

mollykarcher modified the milestones: platform sprint 45, platform sprint 46 Apr 24, 2024
