Skip to content

Add structured operation tracing for state propagation visibility #2014

@sanity

Description

@sanity

Problem

Recent debugging of #2010 and #1996 revealed that tracing state propagation across peers is difficult because:

  1. Scattered logging - State changes logged at different levels in different modules
  2. Missing context - Hard to correlate "peer A has version X" with "peer B has version Y"
  3. No timeline view - Can't easily see the sequence of state changes across multiple peers
  4. Difficult to query - Can't filter for "show me all state changes for contract K across all peers"

This made debugging feel like "whack-a-mole combined with groundhog day" because it was hard to understand what state each peer had at each point in time.

Proposal

Add structured operation tracing that makes state propagation visible and queryable. This should integrate with the existing tracing infrastructure but add specific span/event patterns for contract operations.

Design

1. Operation Spans with Structured Fields

Wrap each high-level operation in a span with consistent fields:

let span = tracing::info_span!(
    "contract_operation",
    op_type = "put",           // put, update, get
    tx = %transaction_id,
    contract = %key,
    peer = %peer_id,
    role = "initiator",         // initiator, target, forwarder
    initial_version = ?old_version,
    final_version = tracing::field::Empty,  // filled later
);

let _enter = span.enter();
// ... perform operation ...
span.record("final_version", &new_version);

2. State Change Events

Emit structured events at key points:

tracing::info!(
    event = "state_cached",
    contract = %key,
    peer = %peer_id,
    version = version,
    size_bytes = state.len(),
    source = "local_merge",  // local_merge, network_receive, client_put
    "Cached contract state locally"
);

tracing::info!(
    event = "state_forwarded",
    contract = %key,
    from_peer = %source_peer,
    to_peer = %target_peer,
    version = version,
    "Forwarded contract state to target peer"
);

3. Multi-Peer Timeline View

With structured events, we can query logs to build timelines:

# Show all state changes for a specific contract across all peers
cat peer-*.log | grep 'contract="ABC"' | jq -s 'sort_by(.timestamp)'

# Show version progression for a contract
cat peer-*.log | grep 'event="state_cached"' | \
  jq -s 'group_by(.peer) | map({peer: .[0].peer, versions: map(.version)})'

4. Tracing Subscribers

Consider adding a custom tracing subscriber for development that:

  • Aggregates events by contract key
  • Displays a visual timeline of state propagation
  • Highlights version mismatches or unexpected state
// Example output:
// Contract ABC timeline:
// T+0.0s  [Peer1] PUT initiated (v0 -> v1)
// T+0.1s  [Peer1] State cached locally (v1)
// T+0.2s  [Peer1] Forwarded to Peer2 (v1)
// T+0.3s  [Peer2] State received (v1)
// T+0.4s  [Peer2] State cached locally (v1)
// ⚠ T+0.5s  [Peer1] GET returned (v0) ← VERSION MISMATCH!

Suggested Events to Track

PUT Operation

  • put_initiated - Client initiates PUT
  • state_cached - State cached locally before forwarding
  • put_forwarded - Request forwarded to target peer
  • put_received - Target peer receives PUT
  • put_completed - PUT completes successfully
  • put_failed - PUT fails with error

UPDATE Operation

  • update_initiated - Client initiates UPDATE
  • delta_applied - Delta applied to local state
  • state_cached - Updated state cached locally
  • update_forwarded - Request forwarded to target peer
  • update_received - Target peer receives UPDATE
  • update_completed - UPDATE completes successfully
  • update_failed - UPDATE fails with error

GET Operation

  • get_initiated - Client initiates GET
  • state_served - State served from local cache
  • get_forwarded - Request forwarded to target peer
  • state_received - State received from network
  • get_completed - GET completes successfully
  • get_failed - GET fails or times out

State Management

  • state_merged - Two states merged (include both versions)
  • version_incremented - Version incremented (old -> new)
  • contract_seeded - Contract added to seed list
  • contract_unseeded - Contract removed from seed list

Benefits

  1. Easier debugging - See exactly what happened, when, and on which peer
  2. Reproduce issues - Timeline view makes it easy to understand failure sequences
  3. Performance analysis - Structured timing data for operation latencies
  4. Testing validation - Verify state propagation in integration tests
  5. Documentation - Events serve as living documentation of the protocol

Implementation Approach

  1. Start with PUT/UPDATE/GET operations in crates/core/src/operations/
  2. Add structured spans and events at key points
  3. Create a test subscriber that validates state propagation in tests
  4. Consider a --trace-contracts CLI flag for development builds
  5. Document the event schema for external log analysis tools

Related Issues

Example Use Cases

Use case 1: Debug why a peer serves stale state

# Find when peer last cached this contract
grep 'peer="ABC" event="state_cached"' peer.log | tail -1

Use case 2: Verify state propagated correctly in test

assert_event_sequence(&logs, &[
    ("put_initiated", peer1),
    ("state_cached", peer1),
    ("put_forwarded", peer1),
    ("state_received", peer2),
    ("state_cached", peer2),
]);

Use case 3: Find version inconsistencies

# Show all peers with their latest cached version for contract K
grep 'contract="K" event="state_cached"' *.log | \
  jq -s 'group_by(.peer) | map({peer: .[0].peer, version: (map(.version) | max)})'

[AI-assisted debugging and comment]

Metadata

Metadata

Assignees

Labels

A-contractsArea: Contract runtime, SDK, and executionA-developer-xpArea: developer experienceA-networkingArea: Networking, ring protocol, peer discoveryE-mediumExperience needed to fix/implement: Medium / intermediateP-mediumMedium priorityT-enhancementType: Improvement to existing functionality

Type

No type

Projects

Status

Triage

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions