Skip to content

[SharovBot] fix: handle snapshot infohash mismatch on restart#19316

Merged
Giulio2002 merged 3 commits intomainfrom
fix/snapshot-infohash-mismatch-on-restart
Feb 24, 2026
Merged

[SharovBot] fix: handle snapshot infohash mismatch on restart#19316
Giulio2002 merged 3 commits intomainfrom
fix/snapshot-infohash-mismatch-on-restart

Conversation

@Giulio2002
Copy link
Copy Markdown
Collaborator

Problem

Fixes #19128

On restart after a fresh sync, the downloader encounters a fatal error:

downloading snapshot v1.1-018000-018500-bodies.seg (infohash X): snapshot already loaded with different infohash: Y

This happens because background processes (e.g. recsplit) can rebuild snapshot files while the node is running, changing the file's infohash. On restart:

  1. The torrent is loaded from disk with the new infohash (from the updated .torrent file)
  2. The preverified list still references the old infohash
  3. getExistingSnapshotTorrent returns a fatal error on mismatch

Fix

In addPreverifiedSnapshotForDownload, when we detect an infohash mismatch between an already-loaded torrent and the preverified list, we now skip the preverified download and keep the existing local torrent instead of returning an error. The local file is valid - it was loaded from disk and passed completeness checks.

This aligns with the existing test TestChangeInfoHashOfSameFile which documents this exact scenario and expects the existing torrent to be kept.

Testing

  • All existing downloader tests pass
  • TestChangeInfoHashOfSameFile specifically covers this scenario

When a snapshot file is rebuilt in the background (e.g. by recsplit) between
restarts, its infohash changes. On the next restart, the torrent is loaded
from disk with the new infohash, but the preverified list still references
the old one.

Previously this caused a fatal error: 'snapshot already loaded with different
infohash'. Now we gracefully skip the preverified download and keep the
existing local torrent, since the local file is valid.

Fixes #19128
@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔄 Restarted failing CI workflow: 22186493427 (All tests - tests-mac-linux ubuntu-24.04)

Copy link
Copy Markdown
Contributor

@anacrolix anacrolix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's possibly a few issues here I think:

  • Snapshots should only be requested for download during an initial sync. A node should not be in a state where an initial sync is required and already have snapshots present that didn't match (there's currently a minor case where consecutive initial syncs that don't complete can disagree which a snapshots reset fixes).
  • If snapshot file contents change, they should be removed from the downloader first. This is a major bug if it's still occurring. New snapshots should have different names, and if that's not possible, the snapshot should be removed from the download by name before adding the replacement.

I think the former is the larger concern here, why hasn't the node already completed the initial sync before something like this could happen.

I created #19435 for this problem.

Copy link
Copy Markdown
Contributor

@anacrolix anacrolix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm that's not a blocking review, I just want to check the points are sound before merging. As above I think this indicates a broken use case, but this is a good workaround.

…ent and issue reference

When addPreverifiedSnapshotForDownload encounters an infohash mismatch between
a loaded torrent and the preverified list, keep the existing local torrent (no
crash). Improve the log message to be explicit that the preverified download is
being skipped, and add a comment pointing to the root-cause tracking issue
#19435 (AddTorrentsFromDisk ordering relative to initial sync).

The proper structural fix (running initial sync before AddTorrentsFromDisk so
preverified TOML hashes always win) is tracked in issue #19435.
@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔧 Addressed @anacrolix feedback — updated comment and log message to be explicit about the tradeoff, and added a reference to #19435 as the proper structural fix.

On the 'remove first' suggestion: I attempted a 'remove existing, add preverified' approach, but TestChangeInfoHashOfSameFile explicitly tests the opposite — it documents that a node with a local snapshot should keep its version when a new preverified hash is released (e.g. re-compressed release). That test would fail with the remove-first approach.

On the root cause concern: Agreed that the real fix is running initial sync before AddTorrentsFromDisk so preverified TOML hashes always win. That structural change is what #19435 tracks. This PR is a minimal stopgap to prevent the fatal crash while that deeper fix is implemented — with an explicit code comment now pointing to #19435.

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔄 Restarted failing CI workflow 22332838279 (tests-windows)

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

Giulio2002 commented Feb 24, 2026

Oh @anacrolix AI is pretty bad at PR desc. basically a failure there should not lead to a total crash. a WARN is enough imo

@anacrolix
Copy link
Copy Markdown
Contributor

Damn I shouldn't have hit update branch now it's running everything again

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] ✅ Auto-merge enabled — approved by erigontech org member @anacrolix

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔄 CI rerun already in progress for failing tests-mac-linux (macos-15) (run 22348856962) — monitoring for auto-merge.

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔄 Restarted failing CI workflow 22348856962 — TestShutterBlockBuilding on macos-15 is unrelated to this PR's changes (snapshot infohash fix); appears to be a flaky integration test.

@Giulio2002
Copy link
Copy Markdown
Collaborator Author

[SharovBot] 🔄 Restarted cancelled Unit tests workflow 22348856935 (linux (macos-latest) was cancelled, blocking auto-merge).

@Giulio2002 Giulio2002 merged commit a3b76e9 into main Feb 24, 2026
31 of 35 checks passed
@Giulio2002 Giulio2002 deleted the fix/snapshot-infohash-mismatch-on-restart branch February 24, 2026 15:05
sudeepdino008 pushed a commit that referenced this pull request Mar 4, 2026
## Problem

Fixes #19128

On restart after a fresh sync, the downloader encounters a fatal error:
```
downloading snapshot v1.1-018000-018500-bodies.seg (infohash X): snapshot already loaded with different infohash: Y
```

This happens because background processes (e.g. recsplit) can rebuild
snapshot files while the node is running, changing the file's infohash.
On restart:
1. The torrent is loaded from disk with the **new** infohash (from the
updated .torrent file)
2. The preverified list still references the **old** infohash
3. `getExistingSnapshotTorrent` returns a fatal error on mismatch

## Fix

In `addPreverifiedSnapshotForDownload`, when we detect an infohash
mismatch between an already-loaded torrent and the preverified list, we
now **skip the preverified download** and keep the existing local
torrent instead of returning an error. The local file is valid - it was
loaded from disk and passed completeness checks.

This aligns with the existing test `TestChangeInfoHashOfSameFile` which
documents this exact scenario and expects the existing torrent to be
kept.

## Testing

- All existing downloader tests pass
- `TestChangeInfoHashOfSameFile` specifically covers this scenario
AskAlexSharov added a commit that referenced this pull request Mar 16, 2026
## Summary

- Moves `AddTorrentsFromDisk` call from `initDownloader` (at startup,
before any sync) to an `afterSnapshotDownload` callback invoked once the
snapshot sync stage completes on the first sync cycle.
- This eliminates false-positive incomplete torrent reports that
occurred because we were registering on-disk snapshots before the sync
stage had a chance to complete them.
- The `afterDownload` callback is threaded through `StageSnapshotsCfg`,
`NewDefaultStages`, and `NewPipelineStages` with a `nil` default so
callers that don't need it are unaffected.
- Incomplete snapshot warnings after this callback are now flagged as
unexpected (sync just finished), with a comment noting the
preverified-set edge case that remains unresolved.

## Background

Issue #19435 first identified this problem: `AddTorrentsFromDisk` was
being called before initial sync status was checked, causing old or
stale `.torrent` files on disk to interfere with the sync stage's
ability to fetch the correct (preverified) versions of snapshots.

PR #19316 by @Giulio2002 addressed a related symptom — converting the
fatal "snapshot already loaded with different infohash" error in
`addPreverifiedSnapshotForDownload` into a warning — but this was
treating a symptom rather than the root cause.

Other related changes to this area:
- PR #18056 — "Final downloader cleanup": Changed `AddTorrentsFromDisk`
to only add completed snapshots (anything incomplete is assumed a
mistake), and removed it as a default at startup.
- PR #18270 — "Quick fix for --downloader.verify assertion": Fixed a
panic in `addCompleteTorrentFromMetainfo` when called from
`AddTorrentsFromDisk` on already-completed files (#18165).
- PR #15043 — "Fix a collection of downloader, snapshot sync and torrent
related issues": Earlier broad fix addressing how disk-loaded torrents
interact with preverified snapshots.
- PR #16655 — "Require torrents added from disk to complete on
`downloader.verify`": Enforced completion check for disk-loaded torrents
during verify.

@mh0lt — please take a look, particularly at the scenario where extra
snapshots land on disk that weren't part of the sync set (e.g. after an
upgrade or downgrade that changes the preverified set).

---------

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
lupin012 pushed a commit that referenced this pull request Mar 17, 2026
## Summary

- Moves `AddTorrentsFromDisk` call from `initDownloader` (at startup,
before any sync) to an `afterSnapshotDownload` callback invoked once the
snapshot sync stage completes on the first sync cycle.
- This eliminates false-positive incomplete torrent reports that
occurred because we were registering on-disk snapshots before the sync
stage had a chance to complete them.
- The `afterDownload` callback is threaded through `StageSnapshotsCfg`,
`NewDefaultStages`, and `NewPipelineStages` with a `nil` default so
callers that don't need it are unaffected.
- Incomplete snapshot warnings after this callback are now flagged as
unexpected (sync just finished), with a comment noting the
preverified-set edge case that remains unresolved.

## Background

Issue #19435 first identified this problem: `AddTorrentsFromDisk` was
being called before initial sync status was checked, causing old or
stale `.torrent` files on disk to interfere with the sync stage's
ability to fetch the correct (preverified) versions of snapshots.

PR #19316 by @Giulio2002 addressed a related symptom — converting the
fatal "snapshot already loaded with different infohash" error in
`addPreverifiedSnapshotForDownload` into a warning — but this was
treating a symptom rather than the root cause.

Other related changes to this area:
- PR #18056 — "Final downloader cleanup": Changed `AddTorrentsFromDisk`
to only add completed snapshots (anything incomplete is assumed a
mistake), and removed it as a default at startup.
- PR #18270 — "Quick fix for --downloader.verify assertion": Fixed a
panic in `addCompleteTorrentFromMetainfo` when called from
`AddTorrentsFromDisk` on already-completed files (#18165).
- PR #15043 — "Fix a collection of downloader, snapshot sync and torrent
related issues": Earlier broad fix addressing how disk-loaded torrents
interact with preverified snapshots.
- PR #16655 — "Require torrents added from disk to complete on
`downloader.verify`": Enforced completion check for disk-loaded torrents
during verify.

@mh0lt — please take a look, particularly at the scenario where extra
snapshots land on disk that weren't part of the sync set (e.g. after an
upgrade or downgrade that changes the preverified set).

---------

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sporadic could not start execution service due to error snapshot already loaded with different infohash

2 participants