Pausing and unpausing

Connor Mendenhall edited this page Dec 20, 2021 · 2 revisions

The key principle to keep in mind when pausing and unpausing the running system is that reading storage diffs from the head of a syncing chain is faster than backfilling them. Both procedures are sequenced to avoid disrupting the syncing of storage diffs.

Pausing the system

  1. Shut down Postgraphile and the public API.
  2. Stop any vdb-execute containers (Oasis and MCD). This will pause the data transformation layer.
  3. Stop the vdb-header-sync container. This will stop syncing headers to the synchronization layer.
    1. If the node this container is using is not shared infrastructure, it's OK to shut it down as well. (However, this is not currently the case in production: the header sync service uses a shared node.)
  4. Pause the statediffing geth nodes.
    1. I suspect this is already automated through EBS snapshots or similar, but it's wise to ensure there's a recent storage volume backup for these nodes before pausing them. In a failure scenario, it would be possible to restore geth chaindata from an EBS volume snapshot and sync back up to the head of the chain.
  5. Stop the vdb-extract-diffs containers.
  6. Stop Postgres and save an RDS snapshot.
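
The ordering above is the important part, so it may help to encode it as data rather than rely on memory. The following is a minimal sketch, not an existing tool: the `Step` type, the step names, and the `run_plan` helper are all hypothetical, and the `execute` callback stands in for whatever actually stops each service in your environment.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    name: str          # short identifier (hypothetical, for illustration)
    description: str   # what the operator does at this step


# Order mirrors the pause procedure above.
PAUSE_STEPS: List[Step] = [
    Step("stop-api", "Shut down Postgraphile and the public API"),
    Step("stop-execute", "Stop the vdb-execute containers (Oasis and MCD)"),
    Step("stop-header-sync", "Stop the vdb-header-sync container"),
    Step("pause-geth", "Confirm a recent volume backup, then pause the statediffing geth nodes"),
    Step("stop-extract-diffs", "Stop the vdb-extract-diffs containers"),
    Step("snapshot-postgres", "Stop Postgres and save an RDS snapshot"),
]


def run_plan(steps: List[Step], execute: Callable[[Step], bool]) -> List[str]:
    """Run steps in order; stop at the first step whose executor returns False.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for step in steps:
        if not execute(step):
            break
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    # Dry run: print each step instead of touching any infrastructure.
    def dry_run(step: Step) -> bool:
        print(f"[dry-run] {step.name}: {step.description}")
        return True

    run_plan(PAUSE_STEPS, dry_run)
```

Stopping at the first failed step keeps later steps (for example, the Postgres snapshot) from running while an earlier service is still writing.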

Unpausing the system

This procedure is not quite the opposite of pausing, because transforming storage diffs relies in some cases on event data. To restart the system, we'll want to catch up to events and headers first, then re-enable diff processing.

  1. Restore Postgres from the saved RDS snapshot.
  2. Start any vdb-execute containers (Oasis and MCD). These will start listening for events and diffs to transform.
  3. Start the vdb-header-sync container (and the Ethereum node it's reading from, if it's been shut down). This will begin syncing headers from the last synced block up to the head of the chain. The execute processes will transform missing events. Transforming storage diffs relies on information from events to generate storage keys, so it's important to do this step before restarting diff processing, and to allow it to catch up fully to the head of the chain.
    1. Note: if new contracts have been added since the system was paused, there are two options. Add the contracts and transformers to the vdb-execute configuration before starting the container (just like onboarding new collateral types), or let events sync, add the new contracts/transformers, and run a subsequent event backfill before proceeding.
  4. For each region with a statediffing geth node and a vdb-extract-diffs container:
    1. Start the vdb-extract-diffs container for this region.
    2. Unpause the statediffing geth node for this region.
    3. Storage diffs should be stored and transformed as the node syncs to the head of the chain.
  5. Re-enable Postgraphile and the API.
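
Step 3 asks the operator to let header sync catch up fully before moving on to step 4. One way to make that gate explicit is a small polling loop. This is a sketch under stated assumptions: `wait_for_header_sync` is a hypothetical helper, and the two callables (`chain_head` and `last_synced_header`) are placeholders for however you read the node's current block number and the latest synced header. It checks headers only; confirming that the execute processes have finished transforming missing events would need a similar, separate check.

```python
import time
from typing import Callable


def wait_for_header_sync(
    chain_head: Callable[[], int],
    last_synced_header: Callable[[], int],
    max_lag: int = 0,
    poll_seconds: float = 15.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until synced headers are within max_lag blocks of the chain head.

    Returns True once caught up, False if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        lag = chain_head() - last_synced_header()
        if lag <= max_lag:
            return True
        sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Simulated run: the synced header advances on each poll.
    synced = iter([90, 95, 100])
    caught_up = wait_for_header_sync(
        chain_head=lambda: 100,
        last_synced_header=lambda: next(synced),
        sleep=lambda _seconds: None,  # skip real sleeping in the demo
    )
    print("caught up:", caught_up)
```

Because the chain head keeps advancing, a small nonzero `max_lag` may be more practical than waiting for an exact match.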