[SharovBot] fix: handle snapshot infohash mismatch on restart#19316
[SharovBot] fix: handle snapshot infohash mismatch on restart#19316Giulio2002 merged 3 commits intomainfrom
Conversation
When a snapshot file is rebuilt in the background (e.g. by recsplit) between restarts, its infohash changes. On the next restart, the torrent is loaded from disk with the new infohash, but the preverified list still references the old one. Previously this caused a fatal error: 'snapshot already loaded with different infohash'. Now we gracefully skip the preverified download and keep the existing local torrent, since the local file is valid. Fixes #19128
|
[SharovBot] 🔄 Restarted failing CI workflow: 22186493427 (All tests - tests-mac-linux ubuntu-24.04) |
anacrolix
left a comment
There was a problem hiding this comment.
There's possibly a few issues here I think:
- Snapshots should only be requested for download during an initial sync. A node should not be in a state where an initial sync is required and already have snapshots present that didn't match (there's currently a minor case where consecutive initial syncs that don't complete can disagree which a
snapshots resetfixes). - If snapshot file contents change, they should be removed from the downloader first. This is a major bug if it's still occurring. New snapshots should have different names, and if that's not possible, the snapshot should be removed from the download by name before adding the replacement.
I think the former is the larger concern here, why hasn't the node already completed the initial sync before something like this could happen.
I created #19435 for this problem.
anacrolix
left a comment
There was a problem hiding this comment.
Hm that's not a blocking review, I just want to check the points are sound before merging. As above I think this indicates a broken use case, but this is a good workaround.
…ent and issue reference When addPreverifiedSnapshotForDownload encounters an infohash mismatch between a loaded torrent and the preverified list, keep the existing local torrent (no crash). Improve the log message to be explicit that the preverified download is being skipped, and add a comment pointing to the root-cause tracking issue #19435 (AddTorrentsFromDisk ordering relative to initial sync). The proper structural fix (running initial sync before AddTorrentsFromDisk so preverified TOML hashes always win) is tracked in issue #19435.
|
[SharovBot] 🔧 Addressed @anacrolix feedback — updated comment and log message to be explicit about the tradeoff, and added a reference to #19435 as the proper structural fix. On the 'remove first' suggestion: I attempted a 'remove existing, add preverified' approach, but On the root cause concern: Agreed that the real fix is running initial sync before |
|
[SharovBot] 🔄 Restarted failing CI workflow 22332838279 (tests-windows) |
|
Oh @anacrolix AI is pretty bad at PR desc. basically a failure there should not lead to a total crash. a WARN is enough imo |
|
Damn I shouldn't have hit update branch now it's running everything again |
|
[SharovBot] ✅ Auto-merge enabled — approved by erigontech org member @anacrolix |
|
[SharovBot] 🔄 CI rerun already in progress for failing |
|
[SharovBot] 🔄 Restarted failing CI workflow 22348856962 — |
|
[SharovBot] 🔄 Restarted cancelled |
## Problem Fixes #19128 On restart after a fresh sync, the downloader encounters a fatal error: ``` downloading snapshot v1.1-018000-018500-bodies.seg (infohash X): snapshot already loaded with different infohash: Y ``` This happens because background processes (e.g. recsplit) can rebuild snapshot files while the node is running, changing the file's infohash. On restart: 1. The torrent is loaded from disk with the **new** infohash (from the updated .torrent file) 2. The preverified list still references the **old** infohash 3. `getExistingSnapshotTorrent` returns a fatal error on mismatch ## Fix In `addPreverifiedSnapshotForDownload`, when we detect an infohash mismatch between an already-loaded torrent and the preverified list, we now **skip the preverified download** and keep the existing local torrent instead of returning an error. The local file is valid - it was loaded from disk and passed completeness checks. This aligns with the existing test `TestChangeInfoHashOfSameFile` which documents this exact scenario and expects the existing torrent to be kept. ## Testing - All existing downloader tests pass - `TestChangeInfoHashOfSameFile` specifically covers this scenario
## Summary - Moves `AddTorrentsFromDisk` call from `initDownloader` (at startup, before any sync) to an `afterSnapshotDownload` callback invoked once the snapshot sync stage completes on the first sync cycle. - This eliminates false-positive incomplete torrent reports that occurred because we were registering on-disk snapshots before the sync stage had a chance to complete them. - The `afterDownload` callback is threaded through `StageSnapshotsCfg`, `NewDefaultStages`, and `NewPipelineStages` with a `nil` default so callers that don't need it are unaffected. - Incomplete snapshot warnings after this callback are now flagged as unexpected (sync just finished), with a comment noting the preverified-set edge case that remains unresolved. ## Background Issue #19435 first identified this problem: `AddTorrentsFromDisk` was being called before initial sync status was checked, causing old or stale `.torrent` files on disk to interfere with the sync stage's ability to fetch the correct (preverified) versions of snapshots. PR #19316 by @Giulio2002 addressed a related symptom — converting the fatal "snapshot already loaded with different infohash" error in `addPreverifiedSnapshotForDownload` into a warning — but this was treating a symptom rather than the root cause. Other related changes to this area: - PR #18056 — "Final downloader cleanup": Changed `AddTorrentsFromDisk` to only add completed snapshots (anything incomplete is assumed a mistake), and removed it as a default at startup. - PR #18270 — "Quick fix for --downloader.verify assertion": Fixed a panic in `addCompleteTorrentFromMetainfo` when called from `AddTorrentsFromDisk` on already-completed files (#18165). - PR #15043 — "Fix a collection of downloader, snapshot sync and torrent related issues": Earlier broad fix addressing how disk-loaded torrents interact with preverified snapshots. - PR #16655 — "Require torrents added from disk to complete on `downloader.verify`": Enforced completion check for disk-loaded torrents during verify. @mh0lt — please take a look, particularly at the scenario where extra snapshots land on disk that weren't part of the sync set (e.g. after an upgrade or downgrade that changes the preverified set). --------- Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
## Summary - Moves `AddTorrentsFromDisk` call from `initDownloader` (at startup, before any sync) to an `afterSnapshotDownload` callback invoked once the snapshot sync stage completes on the first sync cycle. - This eliminates false-positive incomplete torrent reports that occurred because we were registering on-disk snapshots before the sync stage had a chance to complete them. - The `afterDownload` callback is threaded through `StageSnapshotsCfg`, `NewDefaultStages`, and `NewPipelineStages` with a `nil` default so callers that don't need it are unaffected. - Incomplete snapshot warnings after this callback are now flagged as unexpected (sync just finished), with a comment noting the preverified-set edge case that remains unresolved. ## Background Issue #19435 first identified this problem: `AddTorrentsFromDisk` was being called before initial sync status was checked, causing old or stale `.torrent` files on disk to interfere with the sync stage's ability to fetch the correct (preverified) versions of snapshots. PR #19316 by @Giulio2002 addressed a related symptom — converting the fatal "snapshot already loaded with different infohash" error in `addPreverifiedSnapshotForDownload` into a warning — but this was treating a symptom rather than the root cause. Other related changes to this area: - PR #18056 — "Final downloader cleanup": Changed `AddTorrentsFromDisk` to only add completed snapshots (anything incomplete is assumed a mistake), and removed it as a default at startup. - PR #18270 — "Quick fix for --downloader.verify assertion": Fixed a panic in `addCompleteTorrentFromMetainfo` when called from `AddTorrentsFromDisk` on already-completed files (#18165). - PR #15043 — "Fix a collection of downloader, snapshot sync and torrent related issues": Earlier broad fix addressing how disk-loaded torrents interact with preverified snapshots. - PR #16655 — "Require torrents added from disk to complete on `downloader.verify`": Enforced completion check for disk-loaded torrents during verify. @mh0lt — please take a look, particularly at the scenario where extra snapshots land on disk that weren't part of the sync set (e.g. after an upgrade or downgrade that changes the preverified set). --------- Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
Problem
Fixes #19128
On restart after a fresh sync, the downloader encounters a fatal error:
This happens because background processes (e.g. recsplit) can rebuild snapshot files while the node is running, changing the file's infohash. On restart:
getExistingSnapshotTorrentreturns a fatal error on mismatchFix
In
addPreverifiedSnapshotForDownload, when we detect an infohash mismatch between an already-loaded torrent and the preverified list, we now skip the preverified download and keep the existing local torrent instead of returning an error. The local file is valid - it was loaded from disk and passed completeness checks.This aligns with the existing test
TestChangeInfoHashOfSameFilewhich documents this exact scenario and expects the existing torrent to be kept.Testing
TestChangeInfoHashOfSameFilespecifically covers this scenario