@sanity sanity commented Dec 5, 2025

Problem

When a contract is stored via PUT, the subsequent GET from the same node can fail with "contract not found" even though both operations happen on the same peer. This is a race condition between storing and fetching the contract.

Root cause: The stretto caching library (used in ContractStore) provides only eventual consistency. Inserts are buffered and applied by background threads, so they may not be immediately visible to subsequent get() calls. The stretto documentation for wait() describes the guarantee that closes this gap:

"Wait until all previous operations have been applied. This ensures that any insert or remove calls made before calling this method will be reflected in subsequent get calls."

The race condition:

  1. store_contract() calls cache.insert() for a new contract
  2. validate_state() immediately calls fetch_contract() to retrieve it
  3. fetch_contract() calls cache.get() but the insert hasn't been applied yet
  4. Returns "contract not found"

Solution

Call wait() after inserting into the cache to ensure the value is visible before returning from store_contract(). Per the documentation quoted above, wait() blocks until all previously issued cache operations have been applied; the overhead is expected to be minimal because that amounts to flushing the insert buffer.
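
The insert-then-wait pattern can be illustrated with a minimal, std-only sketch. This is a simplified model of a write-buffered cache, not stretto's implementation: `BufferedCache`, `Op`, and this `store_contract()` helper are hypothetical names introduced only for illustration. A background writer applies queued inserts in FIFO order, and `wait()` returns once the writer has processed everything queued before it.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, RwLock};
use std::thread;

/// Operations queued for the background writer thread.
enum Op {
    Insert(String, Vec<u8>),
    /// Marker: the writer acknowledges once everything queued before it is applied.
    Wait(Sender<()>),
}

/// Hypothetical stand-in for a write-buffered cache (a model, not stretto itself).
struct BufferedCache {
    map: Arc<RwLock<HashMap<String, Vec<u8>>>>,
    ops: Sender<Op>,
}

impl BufferedCache {
    fn new() -> Self {
        let map = Arc::new(RwLock::new(HashMap::new()));
        let (ops, rx) = mpsc::channel::<Op>();
        let writer_map = Arc::clone(&map);
        // Background writer: applies queued operations in FIFO order.
        thread::spawn(move || {
            for op in rx {
                match op {
                    Op::Insert(key, value) => {
                        writer_map.write().unwrap().insert(key, value);
                    }
                    Op::Wait(ack) => {
                        let _ = ack.send(());
                    }
                }
            }
        });
        Self { map, ops }
    }

    /// Queues the insert and returns immediately; the value may not be visible yet.
    fn insert(&self, key: &str, value: Vec<u8>) {
        self.ops
            .send(Op::Insert(key.to_string(), value))
            .expect("writer thread alive");
    }

    /// Blocks the calling thread until every previously queued operation is applied.
    fn wait(&self) {
        let (ack_tx, ack_rx) = mpsc::channel();
        self.ops.send(Op::Wait(ack_tx)).expect("writer thread alive");
        ack_rx.recv().expect("writer thread alive");
    }

    /// Reads only what the writer has already applied.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.read().unwrap().get(key).cloned()
    }
}

/// Hypothetical store step mirroring the fix: insert, then wait before returning.
fn store_contract(cache: &BufferedCache, key: &str, code: Vec<u8>) {
    cache.insert(key, code);
    cache.wait();
}

fn main() {
    let cache = BufferedCache::new();
    store_contract(&cache, "contract-a", vec![1, 2, 3]);
    // Once wait() has returned, the insert is visible to get().
    assert_eq!(cache.get("contract-a"), Some(vec![1, 2, 3]));
}
```

In this model, calling `get()` directly after `insert()` without `wait()` may or may not see the value, depending on whether the writer thread has run yet. That nondeterminism is the behavior described in the Problem section.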

Testing

  • cargo test -p freenet store_and_load passes
  • The existing unit test in contract_store.rs validates store-then-fetch works correctly

Fixes

Closes #2214

[AI-assisted - Claude]

The stretto caching library uses background threads for inserts,
providing eventual consistency. Without calling wait() after insert(),
the value may not be immediately visible to subsequent get() calls.

This caused a race condition where store_contract() would insert a
contract into the cache, then validate_state() would immediately try
to fetch it via fetch_contract() and fail with "contract not found"
because the background insert hadn't completed yet.

Fixes #2214

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@sanity sanity force-pushed the fix-2214-stretto-cache branch from a0c8259 to fc158c7 Compare December 5, 2025 04:20
// for inserts, and without wait() the value may not be immediately visible
// to subsequent get() calls (eventual consistency). This is critical because
// validate_state() needs to fetch the contract immediately after storing.
let _ = self.contract_cache.wait();
Collaborator Author

@claude What thread will this block?

@claude claude bot Dec 5, 2025

Analysis

The cache.wait() call blocks the current tokio worker thread that's executing the async function.

Call chain:

  1. store_contract() is called from verify_and_store_contract() (runtime.rs:1092)
  2. verify_and_store_contract() is an async fn running on the tokio async runtime
  3. When the tokio executor schedules this task on a worker thread, that worker thread executes the code
  4. The cache.wait() call is a synchronous blocking operation (not async)
  5. This blocks the tokio worker thread until the stretto background threads complete the buffered insert

Impact:

  • While one tokio worker thread is blocked waiting for the cache, other tokio worker threads can still process other async tasks
  • The blocking duration should be minimal (just waiting for the cache insert buffer to flush)
  • However, this is technically blocking async code, which is generally discouraged in tokio

Alternative approach:
If stretto provided an async wait() method (like async fn wait() or returning a future), that would be preferable as it would allow the tokio worker thread to handle other tasks while waiting. But since stretto's Cache is synchronous (not AsyncCache), and the wait time should be very short (just flushing the insert buffer), the current approach is acceptable.

Note: The code uses stretto::Cache (sync), not stretto::AsyncCache. The crate does have both variants available (Cargo.toml enables both "async" and "sync" features).

Collaborator Author

To clarify: while this technically blocks the tokio task, the important context is that this is not a cross-request race condition.

The store_contract() → validate_state() sequence happens within a single PUT request, synchronously. Looking at verify_and_store_contract() in runtime.rs:

// Line 1092: store the contract
self.runtime.contract_store.store_contract(contract)?;

// Line 1105: immediately validate (needs to fetch contract)
let result = self.runtime.validate_state(...)?;

The validate_state() call immediately fetches the contract we just stored via prepare_contract_call() → fetch_contract(). Without wait(), the buffered insert hasn't been processed yet, so get() returns None and we get "contract not found".

The window isn't "narrow" - it's essentially guaranteed to fail because it's two consecutive function calls in the same thread, and stretto buffers the insert for background processing.

[AI-assisted - Claude]

@sanity sanity added this pull request to the merge queue Dec 5, 2025
Merged via the queue into main with commit 8a3769f Dec 5, 2025
20 of 21 checks passed
@sanity sanity deleted the fix-2214-stretto-cache branch December 5, 2025 04:59
Development

Successfully merging this pull request may close these issues.

GET request fails to find locally cached contract after different client's PUT
