
perf: improve daemon sync speed#65

Merged
sn1f3rt merged 12 commits into master from fix/slow-daemon-sync
May 12, 2026

Conversation

@sn1f3rt
Member

@sn1f3rt sn1f3rt commented Mar 22, 2026

Summary

P2P / network throughput

  • Increases BLOCKS_SYNCHRONIZING_DEFAULT_COUNT from 20 → 100
  • Implements adaptive sync in get_block_sync_size(): requests up to 256 blocks/batch when far behind the tip, scaling down to exactly blocks_behind when within 100–256 blocks of the target, then falling back to 100 at tip
  • Caps batch size at BLOCKS_SYNCHRONIZING_SAFE_BATCH_COUNT (256) — CNA v3/v4/v5 look up block data at height - 256; a larger uncommitted write batch would make those lookups invisible to new read-only transactions
  • Raises P2P_DEFAULT_SYNC_SEARCH_CONNECTIONS_COUNT from 2 → 4
  • Raises P2P_DEFAULT_LIMIT_RATE_UP from 2048 → 8192 KB/s to match the download rate limit (the previous asymmetry was causing the outbound throttle to fire on seed nodes)
  • Moves CURRENCY_PROTOCOL_MAX_OBJECT_REQUEST_COUNT from cryptonote_protocol_handler.h into cryptonote_config.h
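The adaptive sizing described above can be sketched as follows. The constant values come from the summary; the exact branch structure is an assumption for illustration, not the PR's actual code:

```cpp
#include <cstdint>

// Constants named in the PR summary.
constexpr uint64_t BLOCKS_SYNCHRONIZING_DEFAULT_COUNT    = 100;
constexpr uint64_t BLOCKS_SYNCHRONIZING_SAFE_BATCH_COUNT = 256;

// Sketch of the adaptive logic: far behind the tip -> 256-block
// batches; within 100..256 blocks of the target -> exactly
// blocks_behind; at or near the tip -> the 100-block default.
uint64_t get_block_sync_size(uint64_t height, uint64_t target_height)
{
  if (target_height <= height)
    return BLOCKS_SYNCHRONIZING_DEFAULT_COUNT;
  const uint64_t blocks_behind = target_height - height;
  if (blocks_behind > BLOCKS_SYNCHRONIZING_SAFE_BATCH_COUNT)
    return BLOCKS_SYNCHRONIZING_SAFE_BATCH_COUNT;
  if (blocks_behind >= BLOCKS_SYNCHRONIZING_DEFAULT_COUNT)
    return blocks_behind;
  return BLOCKS_SYNCHRONIZING_DEFAULT_COUNT;
}
```

The effect is that a node syncing from genesis requests full 256-block batches until it closes within 256 blocks of the target, then shrinks the final requests to fit.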

LMDB / block cache

  • Fixes sync crash at HF7 (block 173500): build_block_cache now reuses the active batch write transaction instead of opening a fresh read-only txn that cannot see uncommitted blocks
  • Introduces block_cache_data compact struct (96 → 56 bytes per entry), storing only the four fields accessed by get_cna_v4_data / get_cna_v5_data — reduces L3 cache pressure at lower chain heights
  • Eliminates 5 random LMDB reads per block for CNA v2 hash lookups by routing them through the block cache

Weight limit / median computation

  • Replaces hash-based long-term weight cache validation with height comparison, saving one get_block_hash_from_height LMDB read per block on the incremental path
  • Passes the long-term median computed in get_next_long_term_block_weight directly into update_next_cumulative_weight_limit to avoid recomputing it; uses std::optional<uint64_t> to make presence explicit (replaces a uint64_t == 0 sentinel)
  • Skips short-term weight median recomputation for non-final blocks within a sync batch (mid_batch); the 50k-block long-term median is stable within a 256-block batch, so the stale limit safely covers intra-batch validation

Closes #64

@sn1f3rt sn1f3rt added the bug Something isn't working label Mar 22, 2026
@sn1f3rt sn1f3rt self-assigned this Mar 22, 2026
@sn1f3rt sn1f3rt changed the title fix: improve daemon sync speed perf: improve daemon sync speed Mar 22, 2026
@sn1f3rt sn1f3rt added enhancement New feature or request and removed bug Something isn't working labels Mar 22, 2026
@github-actions

github-actions Bot commented Mar 22, 2026

Build Artifacts

Target Download
nerva-linux-armv7 Download
nerva-linux-armv8 Download
nerva-windows-x32 Download
nerva-linux-i686 Download
nerva-windows-x64 Download
nerva-linux-x86_64 Download
nerva-macos-x64 Download
nerva-macos-armv8 Download
nerva-freebsd-x86_64 Download
nerva-android-armv8 Download

10 succeeded, 0 failed | View workflow run

@benevanoff
Contributor

These seem like reasonable changes. Did you make any measurements for how much this actually improves the sync time?

@sn1f3rt
Member Author

sn1f3rt commented Mar 23, 2026

These seem like reasonable changes. Did you make any measurements for how much this actually improves the sync time?

I've asked Rob, Grey and others to test this out - will update the PR description when I have the results.

sn1f3rt added 5 commits March 24, 2026 21:55
…ing adaptive sync

Increases BLOCKS_SYNCHRONIZING_DEFAULT_COUNT from 20 to 100 for a 5x
throughput improvement, implements real adaptive sync logic in
get_block_sync_size() that scales up to 1000 blocks/batch when far
behind the chain tip, and raises P2P_DEFAULT_SYNC_SEARCH_CONNECTIONS_COUNT
from 2 to 4 to double peer discovery parallelism.

Closes #64
The adaptive sync introduced COMMAND_RPC_GET_BLOCKS_FAST_MAX_COUNT (1000)
as the ceiling for P2P block request batches, but the P2P protocol enforces
CURRENCY_PROTOCOL_MAX_OBJECT_REQUEST_COUNT (500). Nodes requesting >500 blocks
are immediately dropped by all peers, breaking initial sync from genesis.

Cap the adaptive batch size at CURRENCY_PROTOCOL_MAX_OBJECT_REQUEST_COUNT and
move that constant to cryptonote_config.h where it belongs alongside the other
sync-related constants.
CNA v3/v4/v5 PoW variants look up block data at height-256 using a
fresh read-only LMDB transaction. When blocks are added in a batch
write transaction, that read-only txn cannot see uncommitted blocks
in the same batch.

With a 500-block batch starting below HF9 (height 240500), any block
at position >= 257 in the batch will attempt to read block data that
is still uncommitted, causing MDB_NOTFOUND. The PoW hash then uses
garbage stack data, the block fails the "enough proof of work" check,
and the peer is dropped and blocked.

Fix: cap the adaptive sync batch size at 256 (BLOCKS_SYNCHRONIZING_SAFE_BATCH_COUNT).
A batch of <=256 blocks can only look up data at heights strictly
below the batch start, which are always committed.

The P2P protocol limit (CURRENCY_PROTOCOL_MAX_OBJECT_REQUEST_COUNT=500)
is unchanged; 256 is well within that limit.
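The visibility argument above can be checked with a little arithmetic (zero-based positions here, where the commit text counts from 1; names are illustrative):

```cpp
#include <cstdint>

// CNA v3/v4/v5 read block data at (height - 256). For a batch starting
// at height `batch_start`, the block at zero-based position `pos` has
// height batch_start + pos and looks up batch_start + pos - 256. That
// lookup is still inside the uncommitted batch (invisible to a fresh
// read-only txn) exactly when it lands at or above batch_start,
// i.e. when pos >= 256.
constexpr uint64_t POW_LOOKBACK = 256;

constexpr bool lookup_hits_uncommitted_batch(uint64_t pos)
{
  return pos >= POW_LOOKBACK;
}

// Hence a batch is safe iff every position is below 256,
// i.e. batch_size <= 256.
constexpr bool batch_is_safe(uint64_t batch_size)
{
  return batch_size <= POW_LOOKBACK;
}
```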
build_block_cache stores mdb_block_info (96 bytes/entry) but
get_cna_v4_data and get_cna_v5_data only read four fields:
hash, timestamp, diff_lo, and coins (56 bytes total).

Introduce a compact block_cache_data struct with only those fields,
replacing std::vector<mdb_block_info> with std::vector<block_cache_data>.

Cache size at height 930k: ~49 MB (down from ~85 MB)
Cache size at current tip (~4.1M): ~218 MB (down from ~374 MB)

On CPUs with >=48 MB L3, the full 930k cache now fits in L3,
restoring compute-bound behaviour that was lost when the cache grew
past L3 capacity at the HF12 boundary.
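A minimal sketch of the compact entry. The PR names the four retained fields (hash, timestamp, diff_lo, coins); the exact field types below are assumptions chosen to match the stated 56-byte total:

```cpp
#include <cstdint>

// Hypothetical layout of the compact cache entry: 32-byte hash plus
// three 8-byte fields = 56 bytes, versus 96 bytes for mdb_block_info.
struct block_cache_data
{
  unsigned char hash[32]; // block hash
  uint64_t      timestamp; // block timestamp
  uint64_t      diff_lo;   // low 64 bits of cumulative difficulty
  uint64_t      coins;     // already-generated coins
};

static_assert(sizeof(block_cache_data) == 56, "compact entry is 56 bytes");

// Approximate cache footprint (MiB) at a given chain height.
constexpr double cache_mb(uint64_t height, uint64_t entry_size)
{
  return double(height) * double(entry_size) / (1024.0 * 1024.0);
}
```

Plugging in the heights from the commit message: 930k entries at 56 bytes is about 49.7 MiB, and at 96 bytes about 85.1 MiB, matching the stated figures.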
@sn1f3rt sn1f3rt force-pushed the fix/slow-daemon-sync branch from 31b094e to ba8931a March 24, 2026 16:25
R0BC0D3R and others added 4 commits March 29, 2026 16:08
…ost-HF12

This caches the median computed in get_next_long_term_block_weight and passes it to update_next_cumulative_weight_limit, avoiding a duplicate 50k-window calculation per block. Improves post-HF12 sync speed by ~37%.
…reads per block

get_cna_v2_data previously called get_block_hash_from_height() five times,
each requiring a random LMDB read. During initial sync these are cold reads
that stall the CPU waiting on I/O.

build_block_cache() already loads a contiguous window of block hashes into
memory for get_cna_v4_data and get_cna_v5_data. Calling it at the start of
get_cna_v2_data (a no-op if the cache is already current) and reading h0–h4
from m_block_cache eliminates all 5 LMDB reads per block for CNA v2 PoW.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
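The cache read described in this commit can be sketched as below. Types and names are illustrative, not the real DB layer; the point is that h0–h4 become array indexes into a contiguous window rather than five random LMDB reads:

```cpp
#include <array>
#include <cstdint>
#include <vector>

using hash32 = std::array<unsigned char, 32>;

// Toy model of the hash window that build_block_cache() maintains.
struct block_cache
{
  uint64_t            base_height = 0; // height of hashes[0]
  std::vector<hash32> hashes;          // contiguous window of hashes

  // Caller ensures `height` lies inside the window; build_block_cache
  // is a no-op when the cache is already current.
  const hash32& hash_at(uint64_t height) const
  {
    return hashes.at(height - base_height);
  }
};

// Demo helper: fill a small window and read one entry back by height.
inline unsigned char first_byte_at(uint64_t height)
{
  block_cache c;
  c.base_height = 100;
  c.hashes.resize(5);
  c.hashes[3][0] = 0xAB; // hash for height 103
  return c.hash_at(height)[0];
}
```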
sn1f3rt added a commit that referenced this pull request Apr 6, 2026
The daemon's getblocks.bin RPC handler returns up to
COMMAND_RPC_GET_BLOCKS_FAST_MAX_COUNT (1000) blocks per request. The
wallet held m_daemon_rpc_mutex for the entire duration of each response,
meaning one 1000-block fetch (potentially several seconds on Windows)
blocked all other wallet RPC operations — balance queries, height
updates, etc. — causing the GUI to show timeouts during wallet scan.

Note: the adaptive sync cap introduced in PR #65 applies only to P2P
block propagation via get_block_sync_size(); it has no effect on the
wallet getblocks.bin endpoint, which always used the 1000-block server
default.

Fix: add an optional max_count field to COMMAND_RPC_GET_BLOCKS_FAST
request (0 = server default, backwards-compatible). The wallet sets it
to BLOCKS_SYNCHRONIZING_DEFAULT_COUNT (100), reducing each mutex hold
to a fraction of its previous duration and allowing status calls to
interleave between batches.
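A sketch of the server-side clamp implied by this commit message (function name is illustrative; the constant is from the text). The 0-means-default convention keeps old clients unaffected:

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint64_t COMMAND_RPC_GET_BLOCKS_FAST_MAX_COUNT = 1000;

// max_count == 0 preserves the old behaviour, so clients that never
// set the new field see no change; any other value is honoured up to
// the server's hard ceiling.
uint64_t effective_block_count(uint64_t max_count)
{
  if (max_count == 0)
    return COMMAND_RPC_GET_BLOCKS_FAST_MAX_COUNT;
  return std::min(max_count, COMMAND_RPC_GET_BLOCKS_FAST_MAX_COUNT);
}
```

With the wallet sending max_count = 100, each response shrinks to a tenth of the old worst case, which is what bounds the mutex hold time.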
R0BC0D3R and others added 3 commits April 15, 2026 16:56
…ailure at HF7 (block 173500)

build_block_cache was always opening a new read-only LMDB transaction,
which cannot see blocks written in the current uncommitted batch
transaction. At HF7 (block 173500), CNA v2 PoW is first activated and
calls build_block_cache to read the preceding blocks — all of which are
in the active batch and invisible to a read-only txn. This caused a DB
error, block rejection, peer ban, and sync halt.

Fix: reuse m_write_txn when a batch is active, matching the established
TXN_BLOCK_PREFIX pattern used elsewhere in this file. Also move
m_block_cache_height.store() inside the critical section to close a
secondary race condition.
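The visibility rule behind the fix can be modelled with a toy store. This is a deliberately simplified stand-in for LMDB snapshot semantics, not the real DB code: a read-only txn sees only committed data, while reading through the active write txn also sees the uncommitted batch:

```cpp
#include <cstdint>
#include <map>

// Toy model: `committed` is what a fresh read-only txn can see;
// `pending` holds blocks written in the active batch write txn.
struct toy_db
{
  std::map<uint64_t, int> committed;
  std::map<uint64_t, int> pending;
  bool batch_active = false;

  // Mirrors the fix: route reads through the write txn when a batch
  // is active, so uncommitted blocks in the current batch are found.
  bool get(uint64_t height, int& out) const
  {
    if (batch_active)
    {
      auto it = pending.find(height);
      if (it != pending.end()) { out = it->second; return true; }
    }
    auto it = committed.find(height);
    if (it == committed.end()) return false; // MDB_NOTFOUND analogue
    out = it->second;
    return true;
  }
};

// Demo helper: is `height` visible given the txn-selection rule?
inline bool visible(uint64_t height, bool batch_active)
{
  toy_db db;
  db.batch_active   = batch_active;
  db.committed[100] = 1;
  db.pending[101]   = 2; // written in the current uncommitted batch
  int v = 0;
  return db.get(height, v);
}
```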
m_prepare_height is reset to 0 inside prepare_handle_incoming_blocks
before the function returns, so by the time handle_block_to_main_chain
runs the condition `new_height < m_prepare_height + m_prepare_nblocks`
reduces to `new_height < m_prepare_nblocks`, which is false for any
block past the genesis batch.

Introduce m_batch_start_height, set alongside m_prepare_nblocks at the
start of the prepare phase and not reset mid-function, so the mid_batch
guard correctly covers all non-final blocks in each sync batch.
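A sketch of the corrected guard (simplified state, illustrative names): m_batch_start_height is set once at the start of the prepare phase and never reset mid-function, unlike m_prepare_height, which was zeroed before the check ran:

```cpp
#include <cstdint>

struct batch_state
{
  uint64_t batch_start_height = 0; // set at prepare, not reset mid-batch
  uint64_t prepare_nblocks    = 0; // blocks in the current sync batch

  // True for every non-final block in the current batch: the batch
  // covers heights [batch_start_height, batch_start_height + nblocks),
  // and new_height after adding the final block equals the upper bound.
  bool mid_batch(uint64_t new_height) const
  {
    return prepare_nblocks != 0 &&
           new_height < batch_start_height + prepare_nblocks;
  }
};
```

With the old code, the left-hand anchor collapsed to 0, so the guard was false for essentially every block and the optimization never fired.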
…d long-term median

Using 0 as "not computed" silently misfires if the long-term median ever
evaluates to exactly zero (theoretically impossible in a live chain, but
a hidden invariant). Replace with std::optional<uint64_t> so presence is
explicit, and clean up the NULL/nullptr inconsistency in the default arg.
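The sentinel problem can be shown in a few lines (names are illustrative, not the PR's signatures): with a `uint64_t == 0` sentinel, a median that legitimately computes to 0 is indistinguishable from "not computed", whereas `std::optional` makes absence and a zero value distinct:

```cpp
#include <cstdint>
#include <optional>

// If a precomputed median is present, use it; otherwise fall back to
// the (expensive) recomputation. A present value of 0 is honoured,
// which the old 0-as-sentinel scheme could not express.
uint64_t resolve_median(std::optional<uint64_t> precomputed,
                        uint64_t (*recompute)())
{
  return precomputed.has_value() ? *precomputed : recompute();
}
```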
@sn1f3rt sn1f3rt merged commit a01db4b into master May 12, 2026
14 checks passed
@sn1f3rt sn1f3rt deleted the fix/slow-daemon-sync branch May 12, 2026 18:07
sn1f3rt added a commit that referenced this pull request May 12, 2026

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

investigate: slow daemon sync times

4 participants