docs: design doc for auto-removal of unused launcher image hashes #3488
Proposes a usage-based TTL mechanism (refresh-on-attestation, read-time filtering, lazy sweep in verify_tee) to evict launcher hashes no participant has used for 14 days, replacing the unanimous-vote-only removal path for routine cleanup. Part of #3381
Pull request overview
Adds a design document proposing usage-based expiry for launcher image hashes so the contract can automatically evict stale allowed_launcher_image_hashes entries over time, reducing long-term accumulation without requiring unanimous removal votes.
Changes:
- Introduces a draft design for adding `added`/`last_attested` timestamps to allowed launcher entries and expiring them after `max(added, last_attested) + TTL`.
- Proposes “refresh-on-use” via `submit_participant_info` and read-time filtering to reject expired hashes immediately.
- Describes an inline, lazy sweep during `verify_tee` as housekeeping to delete expired entries from storage.
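As a rough illustration of the expiry rule and read-time filtering summarized above, here is a minimal plain-Rust sketch. The `LauncherEntry` struct, its field types, and the helper names `is_expired` and `live_hashes` are assumptions made for illustration, not the contract's actual types; timestamps are taken to be seconds.

```rust
/// Hypothetical per-entry metadata; field names follow the summary above.
#[derive(Debug, Clone, PartialEq)]
pub struct LauncherEntry {
    pub hash: [u8; 32],
    /// When the hash was voted in (seconds, assumed unit).
    pub added: u64,
    /// Most recent attestation verified against this hash (seconds).
    pub last_attested: u64,
}

/// An entry is expired once `max(added, last_attested) + ttl` lies strictly in the past.
pub fn is_expired(entry: &LauncherEntry, now: u64, ttl: u64) -> bool {
    let last_use = entry.added.max(entry.last_attested);
    last_use.saturating_add(ttl) < now
}

/// Read-time filtering: expired entries are skipped even if they are still in storage.
pub fn live_hashes(entries: &[LauncherEntry], now: u64, ttl: u64) -> Vec<[u8; 32]> {
    entries
        .iter()
        .filter(|e| !is_expired(e, now, ttl))
        .map(|e| e.hash)
        .collect()
}
```

Because the filter runs on every read, an expired hash stops being accepted the moment its deadline passes, whether or not storage has been cleaned up yet.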
- A hash backing a **valid attestation is never expired**: TTL (14d) >
  attestation validity (7d), so any valid attestation refreshed its entry
  within 7 days. Enforced as a config validation (`TTL > 7d`).
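The invariant quoted above follows from a one-line argument: a valid attestation is at most 7 days old, so `last_attested >= now - 7d`, and with `TTL > 7d` the deadline `last_attested + TTL` is still in the future. A minimal sketch of the corresponding config check is below; the constant `ATTESTATION_VALIDITY_SECONDS`, the `ConfigError` type, and the function name are hypothetical, and the strict comparison mirrors the quoted `TTL > 7d`.

```rust
/// Assumed attestation validity window (7 days), per the quoted invariant.
pub const ATTESTATION_VALIDITY_SECONDS: u64 = 7 * 24 * 60 * 60;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    TtlTooShort { ttl_seconds: u64, min_exclusive: u64 },
}

/// Accepts a launcher-hash TTL only if it strictly exceeds the attestation
/// validity window, so a hash backing a valid attestation cannot expire.
pub fn validate_launcher_ttl(ttl_seconds: u64) -> Result<u64, ConfigError> {
    if ttl_seconds > ATTESTATION_VALIDITY_SECONDS {
        Ok(ttl_seconds)
    } else {
        Err(ConfigError::TtlTooShort {
            ttl_seconds,
            min_exclusive: ATTESTATION_VALIDITY_SECONDS,
        })
    }
}
```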
## Open questions

1. **TTL = 14 days** implies *vote a launcher in at most 14 days before
   migrating to it*. Acceptable, or prefer 30 days?
2. **Inline sweep vs. detached promise**: the issue's acceptance criteria
   suggest running cleanup in a detached promise, so that a mass-expiry can
   never consume enough gas to fail the transaction it piggybacks on. This doc
   instead sweeps inline in `verify_tee`, betting that
   `allowed_launcher_images` stays tiny forever (entries only enter via rare
   operator votes; sweeping ~4 entries is negligible gas). Does anyone see a
   future where this list grows to many entries? If not, inline stands.
3. **Lazy sweep**: `verify_tee` deletes expired entries from storage
   (keeping at least the newest). Housekeeping only; expiry is already
   enforced by (2).
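The sweep described in the quoted step could look roughly like the following plain-Rust sketch. The `Entry` type and `sweep_expired` name are hypothetical, entries are assumed to have unique `id`s, and "newest" is assumed to mean the entry with the largest `added` timestamp.

```rust
/// Hypothetical stored entry for this sketch; `id` stands in for the hash.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: u32,
    pub added: u64,
    pub last_attested: u64,
}

/// Deletes expired entries but always keeps the newest one (largest `added`),
/// so the list is never emptied. Returns how many entries were removed.
pub fn sweep_expired(entries: &mut Vec<Entry>, now: u64, ttl: u64) -> usize {
    let newest_id = match entries.iter().max_by_key(|e| e.added) {
        Some(e) => e.id,
        None => return 0, // nothing stored, nothing to sweep
    };
    let before = entries.len();
    entries.retain(|e| {
        // Keep the newest unconditionally; keep others only while not expired.
        e.id == newest_id || e.added.max(e.last_attested).saturating_add(ttl) >= now
    });
    before - entries.len()
}
```

Since this is housekeeping only, skipping or aborting the sweep changes what sits in storage but not which hashes are accepted.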
I’m afraid we could potentially run out of gas if we inline it. Instead, we could introduce a public contract method, similar to this one, that cleans up the expired entries from storage:
mpc/crates/contract/src/tee/proposal.rs
Lines 195 to 200 in dd1695a
Alternatively, we could spawn it as a separate self-call via `Promise::new(...).function_call(CLEAN_EXPIRED_LAUNCHER_HASHES, ...).detach()`, similar to what we do here:
mpc/crates/contract/src/lib.rs
Lines 1088 to 1136 in cf72a5a
I prefer not to add a new public API. We already have a lot.
I don't think that running out of gas is a real concern, since the launcher whitelist is very small (and we control the size via voting), but I'm OK with moving this to a private method and using a Promise if you think it is better.
Let's get another reviewer on this.
A private method with a promise seems like a manageable situation. If this starts accumulating and running out of gas it doesn't bring down any other functionality. I'm happy with that solution.
Updated the doc to go with the detached Promise + `#[private]` method (your option b): `verify_tee` spawns a detached self-call to a new `#[private]` `clean_expired_launcher_hashes`, with gas reserved via a `clean_expired_launcher_hashes_tera_gas` config knob like the existing post-resharing cleanups. Skipped the public-method option to avoid growing the public API. Since read-time filtering already enforces expiry, a failed/out-of-gas sweep is harmless and just retries on the next `verify_tee`.
netrome
left a comment
One thought so far, haven't read the full proposal. Will continue later in the afternoon.
The insight: **not using a launcher is itself a vote.** Every node already
resubmits an attestation hourly (`submit_participant_info`), proving which
launcher it runs. The contract can observe disuse and evict stale hashes
without any vote.
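The refresh-on-attestation idea in the quoted passage can be sketched in plain Rust as follows. The `AllowedLaunchers` and `Timestamps` types and the method names are hypothetical stand-ins for contract state, and the sketch deliberately leaves out attestation verification and the expiry check.

```rust
use std::collections::BTreeMap;

pub type LauncherHash = [u8; 32];

/// Hypothetical per-hash timestamps (seconds, assumed unit).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timestamps {
    pub added: u64,
    pub last_attested: u64,
}

/// Hypothetical stand-in for the contract's allowed-launcher state.
#[derive(Default)]
pub struct AllowedLaunchers {
    entries: BTreeMap<LauncherHash, Timestamps>,
}

impl AllowedLaunchers {
    /// Records a newly voted-in hash; an existing entry is left untouched.
    pub fn vote_in(&mut self, hash: LauncherHash, now: u64) {
        self.entries
            .entry(hash)
            .or_insert(Timestamps { added: now, last_attested: now });
    }

    /// Called after an attestation has verified against `hash`: bumps
    /// `last_attested` (never backwards). Returns false for unknown hashes.
    pub fn record_attestation(&mut self, hash: &LauncherHash, now: u64) -> bool {
        match self.entries.get_mut(hash) {
            Some(ts) => {
                ts.last_attested = ts.last_attested.max(now);
                true
            }
            None => false,
        }
    }

    pub fn get(&self, hash: &LauncherHash) -> Option<Timestamps> {
        self.entries.get(hash).copied()
    }
}
```

With hourly resubmission, any launcher that at least one node still runs keeps its `last_attested` within roughly an hour of the current time, so only hashes nobody runs drift toward the TTL.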
Good point. I wonder if we should do the same for node image hashes. I don't think those should be auto-removed since we may have long periods during upgrades where we want to allow two concurrent node versions.
If we do want to add such a thing for nodes, then we will need to add a voting mechanism to remove the nodes (same as we have for the launcher), to prevent an outdated node from staying forever.
Yeah exactly, which does make sense to me. In practice I'd expect all operators to update in due time and auto-eviction working well.
We also need to think about whether this is in addition to, or instead of, the automatic 7-day removal we have.
I wouldn't change this for now, but we can think of this for the future.
netrome
left a comment
Looks good to me, as long as we're aligned that the implementation of the lazy sweep should be through a promise. Could be worth highlighting in this doc, but we could also view it as an implementation detail. I'm fine with both.
Replaces the inline verify_tee sweep with a detached self-call to a new #[private] clean_expired_launcher_hashes method, so cleanup can never fail the host transaction and adds no public API. Resolves the open question on inline vs. promise cleanup.
Pull request overview
Design-only PR introducing a draft proposal for usage-based expiry of
Changes:
Reviewed changes
Per-file summary
Findings
I verified the doc's load-bearing claims against the current code:
Non-blocking (design clarifications worth folding in before implementation):
✅ Approved: design is internally consistent, safety invariants check out against the actual constants, and the detached-promise sweep matches the established
Part of #3381
Design-only PR: proposes usage-based expiry for `allowed_launcher_image_hashes`:
- A hash counts as used whenever an attestation (`submit_participant_info`) verifies against it; the contract refreshes that entry's `last_attested` timestamp there. Nodes already send these, so no node-side changes.
- An entry expires when `max(added, last_attested) + TTL < now`; TTL defaults to 14 days (config field, validated `>= DEFAULT_EXPIRATION_DURATION_SECONDS`).
- Expired entries are removed by a detached self-call from `verify_tee` to a `#[private]` cleanup method (no new public API, cannot fail the host tx).
- `vote_remove_launcher_hash` stays as a manual early-removal override; measurements and MPC docker-image expiry are out of scope.

Remaining open question for reviewers: TTL length (14 days vs. 30; see the Open questions section). Implementation follows in a separate PR.