-
-
Couldn't load subscription status.
- Fork 106
Description
Problem
Recent debugging of #2010 and #1996 revealed that tracing state propagation across peers is difficult because:
- Scattered logging - State changes logged at different levels in different modules
- Missing context - Hard to correlate "peer A has version X" with "peer B has version Y"
- No timeline view - Can't easily see the sequence of state changes across multiple peers
- Difficult to query - Can't filter for "show me all state changes for contract K across all peers"
This made debugging feel like "whack-a-mole combined with groundhog day" because it was hard to understand what state each peer had at each point in time.
Proposal
Add structured operation tracing that makes state propagation visible and queryable. This should integrate with the existing tracing infrastructure but add specific span/event patterns for contract operations.
Design
1. Operation Spans with Structured Fields
Wrap each high-level operation in a span with consistent fields:
let span = tracing::info_span!(
"contract_operation",
op_type = "put", // put, update, get
tx = %transaction_id,
contract = %key,
peer = %peer_id,
role = "initiator", // initiator, target, forwarder
initial_version = ?old_version,
final_version = tracing::field::Empty, // filled later
);
let _enter = span.enter();
// ... perform operation ...
span.record("final_version", &new_version);2. State Change Events
Emit structured events at key points:
tracing::info!(
event = "state_cached",
contract = %key,
peer = %peer_id,
version = version,
size_bytes = state.len(),
source = "local_merge", // local_merge, network_receive, client_put
"Cached contract state locally"
);
tracing::info!(
event = "state_forwarded",
contract = %key,
from_peer = %source_peer,
to_peer = %target_peer,
version = version,
"Forwarded contract state to target peer"
);3. Multi-Peer Timeline View
With structured events, we can query logs to build timelines:
# Show all state changes for a specific contract across all peers
cat peer-*.log | grep 'contract="ABC"' | jq -s 'sort_by(.timestamp)'
# Show version progression for a contract
cat peer-*.log | grep 'event="state_cached"' | \
jq -s 'group_by(.peer) | map({peer: .[0].peer, versions: map(.version)})'4. Tracing Subscribers
Consider adding a custom tracing subscriber for development that:
- Aggregates events by contract key
- Displays a visual timeline of state propagation
- Highlights version mismatches or unexpected state
// Example output:
// Contract ABC timeline:
// T+0.0s [Peer1] PUT initiated (v0 -> v1)
// T+0.1s [Peer1] State cached locally (v1)
// T+0.2s [Peer1] Forwarded to Peer2 (v1)
// T+0.3s [Peer2] State received (v1)
// T+0.4s [Peer2] State cached locally (v1)
// ⚠ T+0.5s [Peer1] GET returned (v0) ← VERSION MISMATCH!Suggested Events to Track
PUT Operation
put_initiated- Client initiates PUTstate_cached- State cached locally before forwardingput_forwarded- Request forwarded to target peerput_received- Target peer receives PUTput_completed- PUT completes successfullyput_failed- PUT fails with error
UPDATE Operation
update_initiated- Client initiates UPDATEdelta_applied- Delta applied to local statestate_cached- Updated state cached locallyupdate_forwarded- Request forwarded to target peerupdate_received- Target peer receives UPDATEupdate_completed- UPDATE completes successfullyupdate_failed- UPDATE fails with error
GET Operation
get_initiated- Client initiates GETstate_served- State served from local cacheget_forwarded- Request forwarded to target peerstate_received- State received from networkget_completed- GET completes successfullyget_failed- GET fails or times out
State Management
state_merged- Two states merged (include both versions)version_incremented- Version incremented (old -> new)contract_seeded- Contract added to seed listcontract_unseeded- Contract removed from seed list
Benefits
- Easier debugging - See exactly what happened, when, and on which peer
- Reproduce issues - Timeline view makes it easy to understand failure sequences
- Performance analysis - Structured timing data for operation latencies
- Testing validation - Verify state propagation in integration tests
- Documentation - Events serve as living documentation of the protocol
Implementation Approach
- Start with PUT/UPDATE/GET operations in
crates/core/src/operations/ - Add structured spans and events at key points
- Create a test subscriber that validates state propagation in tests
- Consider a
--trace-contractsCLI flag for development builds - Document the event schema for external log analysis tools
Related Issues
- fix: client-initiated PUT requests not cached locally when forwarded to target peer #2010 - Client-initiated PUT not cached locally
- fix: persist contract state after PUT merge in upsert_contract_state #1996 - Incoming PUT not cached locally
- Add invariant assertions for contract operation state consistency #2013 - Add invariant assertions (complementary approach)
Example Use Cases
Use case 1: Debug why a peer serves stale state
# Find when peer last cached this contract
grep 'peer="ABC" event="state_cached"' peer.log | tail -1Use case 2: Verify state propagated correctly in test
assert_event_sequence(&logs, &[
("put_initiated", peer1),
("state_cached", peer1),
("put_forwarded", peer1),
("state_received", peer2),
("state_cached", peer2),
]);Use case 3: Find version inconsistencies
# Show all peers with their latest cached version for contract K
grep 'contract="K" event="state_cached"' *.log | \
jq -s 'group_by(.peer) | map({peer: .[0].peer, version: (map(.version) | max)})'[AI-assisted debugging and comment]
Metadata
Metadata
Assignees
Labels
Type
Projects
Status