Skip to content

Fix signet indexing #63

Merged
arminsabouri merged 1 commit intopayjoin:masterfrom
Mshehu5:data_length
Apr 22, 2026
Merged

Fix signet indexing #63
arminsabouri merged 1 commit intopayjoin:masterfrom
Mshehu5:data_length

Conversation

@Mshehu5
Copy link
Copy Markdown
Contributor

@Mshehu5 Mshehu5 commented Apr 21, 2026

closes: #62
This PR fixes blk file parsing by using Bitcoin Core’s recorded logical file size (BlockFileInfo.size) instead of the raw filesystem length. That prevents the parser from reading into the preallocated tail of the active blk*.dat file, which can look like garbage after XOR decoding and cause unexpected EOF errors.

Reasoning for this approach are here #62 (comment)

Message for reviwers:
After this fix indexing is working with results below, but if a better approach is available would appreciate suggestions

tx-indexer on  data_length [$!?] is 📦 v0.1.0 via 🦀 v1.92.0-nightly 
❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 1000
   Compiling tx-indexer-primitives v0.1.0 (/home/shehu/Documents/os/tx-indexer/src/crates/primitives)
   Compiling tx-indexer-fingerprints v0.1.0 (/home/shehu/Documents/os/tx-indexer/src/crates/fingerprints)
   Compiling tx-indexer-pipeline v0.1.0 (/home/shehu/Documents/os/tx-indexer/src/crates/pipeline)
   Compiling tx-indexer-heuristics v0.1.0 (/home/shehu/Documents/os/tx-indexer/src/crates/heuristics)
   Compiling bitcoin-transaction-indexer v0.1.0 (/home/shehu/Documents/os/tx-indexer)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.84s
     Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 1000`
Indexing last 1001 blocks (depth: 1000) from /home/shehu/.bitcoin/signet
Indexed 90835 transactions in 1001 blocks (27.46s)

--- RBF signaling analysis (5.49s) ---
Transactions signaling RBF: 58863/90835 (64.8%)
cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 1000  31.64s user 4.81s system 107% cpu 33.976 total

tx-indexer on  data_length [$!?] is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 34s 
❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 100 
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 100`
Indexing last 101 blocks (depth: 100) from /home/shehu/.bitcoin/signet
Indexed 13253 transactions in 101 blocks (7.11s)

--- RBF signaling analysis (860.65ms) ---
Transactions signaling RBF: 7659/13253 (57.8%)
cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 100  7.68s user 0.68s system 102% cpu 8.137 total

Comment thread src/crates/primitives/src/parser.rs Outdated
Comment thread src/crates/primitives/src/parser.rs Outdated
Comment thread src/crates/primitives/src/parser.rs Outdated
pub file_no: u32,
pub height_first: u32,
pub height_last: u32,
pub data_len: Option<u32>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it optional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s optional because not every parser path has a known logical blk file size. its a fallaback inase we get NONE. None means “fall back to the full file length”

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean not every blk file has a logical size? It seems like its always set to Some() in method that creates the hints?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes those are always Some(info.size). The Option is only for the parser fallback path when no hints are provided where we still allow BlkFileHint::default() / empty hints and do not know the logical size up front.

check parser.rs line 47

pub paths: dense::IndexPaths,
pub spk_db: SledScriptPubkeyDb,
/// Blk file layout hints: `(file_no, height_first, height_last)`, sorted by `file_no`.
/// Blk file layout hints derived from Bitcoin Core's block index.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a regression. I would revert and just reference the new struct BlkFileHints

Comment thread src/crates/primitives/src/parser.rs Outdated
pub file_no: u32,
pub height_first: u32,
pub height_last: u32,
pub data_len: Option<u32>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A rustdoc for this would be useful. esp. since later in this PR its re-assigned a var called used_len?

Copy link
Copy Markdown
Contributor

@bc1cindy bc1cindy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

could we add a test that builds a synthetic blk file with garbage past data_len and asserts the parse stops cleanly?

that would pin down the signet regression against future changes

@arminsabouri
Copy link
Copy Markdown
Collaborator

good catch!

could we add a test that builds a synthetic blk file with garbage past data_len and asserts the parse stops cleanly?

that would pin down the signet regression against future changes

Yes to adding a test for this. It can live with the other blk file test fixtures. I am not sure how to get actually generate this fixture.

@bc1cindy
Copy link
Copy Markdown
Contributor

I am not sure how to get actually generate this fixture.

maybe using one that already exists and appending zeros?

@arminsabouri
Copy link
Copy Markdown
Collaborator

I am not sure how to get actually generate this fixture.

maybe using one that already exists and appending zeros?

I don't think that will work. You would also need to update the leveldb index

@arminsabouri
Copy link
Copy Markdown
Collaborator

Sorry @Mshehu5 with #61 merged there are some conflicts

Use Bitcoin Core's recorded blk file size instead of the raw
filesystem length when parsing block files.

This prevents the parser from reading into the preallocated tail of
the active blk file, which can appear as garbage after XOR decoding
and trigger unexpected EOF errors.
@Mshehu5
Copy link
Copy Markdown
Contributor Author

Mshehu5 commented Apr 22, 2026

Sorry @Mshehu5 with #61 merged there are some conflicts

Thanks for the heads up. Merged conflicts addressed good to go

Copy link
Copy Markdown
Collaborator

@arminsabouri arminsabouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK 2edce7d

@arminsabouri arminsabouri merged commit ae5cc43 into payjoin:master Apr 22, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix signet indexing

3 participants