[epic] dry-run horizon database truncation #5236

Open
8 tasks
mollykarcher opened this issue Mar 7, 2024 · 0 comments
mollykarcher commented Mar 7, 2024

We'll be truncating SDF Horizon's history retention to 1 year later this year. To our knowledge, most partners that enable history retention keep 1-3 months of history, so there may be issues that only appear with this data profile (partial history, but a large retention window) that we simply haven't seen or heard of yet.

We should dry-run the truncation and mirror traffic to it for some amount of time, observing the performance impact and resolving any issues that arise. Because we still need staging to test and issue Horizon releases before the truncation, we should not do this on the staging cluster and will need to spin up a new, independent one.

At a minimum:

  • Spin up another staging-like cluster of Horizon (https://github.com/stellar/ops/issues/2900)
  • Upgrade PostgreSQL 12 ➡️ 16 (services/horizon: upgrade psql support to most recent versions #4831)
  • Enable reaping on that instance and set the retention to 1 year
    • There are different ways to accomplish this, and we need time to evaluate them. For example, we could turn on reaping on the whole DB and see what happens (which may cause a lockup given the massive amount of data to reap, plus a possible full vacuum), or we could start from scratch, ingest a year+ of data, and then enable reaping. There may be other options as well.
    • After discussion, it appears we must approach this by reaping the whole DB, because reingestion may take on the order of months. The hope 🤞 is that because the database will be much smaller, the full vacuum will be feasible without any extra operational concerns.
    • The periodic reaping frequency should be configurable (services/horizon: Reap in batches of 100k ledgers per second, to play nicely with others #3823)
  • Document every operational step performed during this process, since we'll eventually repeat it for the blue/green prod clusters
  • Mirror traffic from production to this cluster
  • Brainstorm how we could (or if we need to) simulate load from transaction submission
  • Observe, identify, and resolve (or defer/prioritize) any performance degradation
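
To make the "1 year of retention" and "reap in batches" items above more concrete, here is a rough sizing sketch. It is not Horizon code. It assumes the retention window is expressed as a number of ledgers and that ledgers close about every 5 seconds on average (an assumed figure, not one taken from this issue). `retention_ledgers` and `reap_batches` are hypothetical helper names, and the 100k batch size is borrowed from the title of #3823.

```python
from datetime import timedelta

# Assumption: average ledger close time. Not stated in this issue; adjust as needed.
ASSUMED_CLOSE_TIME = timedelta(seconds=5)


def retention_ledgers(window, close_time=ASSUMED_CLOSE_TIME):
    """Approximate number of ledgers closed during `window`."""
    return window // close_time


def reap_batches(oldest, cutoff, batch_size):
    """Yield inclusive (start, end) ledger ranges covering [oldest, cutoff), oldest first."""
    start = oldest
    while start < cutoff:
        end = min(start + batch_size, cutoff) - 1
        yield (start, end)
        start = end + 1


if __name__ == "__main__":
    one_year = retention_ledgers(timedelta(days=365))
    print("ledgers to retain for ~1 year:", one_year)

    # Hypothetical ledger numbers, for illustration only.
    latest_ledger = 50_000_000
    cutoff = latest_ledger - one_year  # everything below this would be reaped
    batches = list(reap_batches(1, cutoff, 100_000))
    print("reap batches of 100k ledgers:", len(batches))
```

Under these assumptions, the batch count gives a first-order feel for how long a whole-DB reap would run at a given batch rate, which is one input for deciding whether the periodic reaping frequency needs tuning.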
@mollykarcher mollykarcher added this to the Sprint 44 milestone Mar 7, 2024
@mollykarcher mollykarcher changed the title dry-run horizon database truncation [epic] dry-run horizon database truncation Mar 19, 2024
Projects
Status: In Progress