-
-
Notifications
You must be signed in to change notification settings - Fork 107
fix: wait for stretto cache insert before validation #2215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The stretto caching library uses background threads for inserts, providing eventual consistency. Without calling wait() after insert(), the value may not be immediately visible to subsequent get() calls. This caused a race condition where store_contract() would insert a contract into the cache, then validate_state() would immediately try to fetch it via fetch_contract() and fail with "contract not found" because the background insert hadn't completed yet. Fixes #2214 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
a0c8259 to
fc158c7
Compare
| // for inserts, and without wait() the value may not be immediately visible | ||
| // to subsequent get() calls (eventual consistency). This is critical because | ||
| // validate_state() needs to fetch the contract immediately after storing. | ||
| let _ = self.contract_cache.wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@claude What thread will this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude finished @sanity's task —— View job
Analysis
The cache.wait() call blocks the current tokio worker thread that's executing the async function.
Call chain:
store_contract()is called fromverify_and_store_contract()(runtime.rs:1092)verify_and_store_contract()is anasync fnrunning on the tokio async runtime- When the tokio executor schedules this task on a worker thread, that worker thread executes the code
- The
cache.wait()call is a synchronous blocking operation (not async) - This blocks the tokio worker thread until the stretto background threads complete the buffered insert
Impact:
- While one tokio worker thread is blocked waiting for the cache, other tokio worker threads can still process other async tasks
- The blocking duration should be minimal (just waiting for the cache insert buffer to flush)
- However, this is technically blocking async code, which is generally discouraged in tokio
Alternative approach:
If stretto provided an async wait() method (like async fn wait() or returning a future), that would be preferable as it would allow the tokio worker thread to handle other tasks while waiting. But since stretto's Cache is synchronous (not AsyncCache), and the wait time should be very short (just flushing the insert buffer), the current approach is acceptable.
Note: The code uses stretto::Cache (sync), not stretto::AsyncCache. The crate does have both variants available (Cargo.toml enables both "async" and "sync" features).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: while this technically blocks the tokio task, the important context is that this is not a cross-request race condition.
The store_contract() → validate_state() sequence happens within a single PUT request, synchronously. Looking at verify_and_store_contract() in runtime.rs:
// Line 1092: store the contract
self.runtime.contract_store.store_contract(contract)?;
// Line 1105: immediately validate (needs to fetch contract)
let result = self.runtime.validate_state(...)?;The validate_state() call immediately fetches the contract we just stored via prepare_contract_call() → fetch_contract(). Without wait(), the buffered insert hasn't been processed yet, so get() returns None and we get "contract not found".
The window isn't "narrow" - it's essentially guaranteed to fail because it's two consecutive function calls in the same thread, and stretto buffers the insert for background processing.
[AI-assisted - Claude]
Problem
When a contract is stored via PUT, the subsequent GET from the same node can fail with "contract not found" even though both operations happen on the same peer. This is a race condition between storing and fetching the contract.
Root cause: The
strettocaching library (used inContractStore) provides eventual consistency. Inserts are buffered to background threads and may not be immediately visible to subsequentget()calls. From the stretto documentation:The race condition:
store_contract()callscache.insert()for a new contractvalidate_state()immediately callsfetch_contract()to retrieve itfetch_contract()callscache.get()but the insert hasn't been applied yetSolution
Call
wait()after inserting into the cache to ensure the value is visible before returning fromstore_contract(). This adds minimal overhead since we're only blocking until this specific insert completes, not waiting for unrelated cache operations.Testing
cargo test -p freenet store_and_loadpassescontract_store.rsvalidates store-then-fetch works correctlyFixes
Closes #2214
[AI-assisted - Claude]