-
-
Notifications
You must be signed in to change notification settings - Fork 105
fix: make get operation resilient to local caching failures #2020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the GET operation's local caching logic by adding a state comparison check before attempting to cache received contracts. The change addresses issue #2018 by preventing version validation errors in contracts with strict version ordering requirements.
Key changes:
- Adds a local state check before caching to detect if the network state already matches local state
- Treats local caching failures as non-fatal warnings instead of operation errors
- Simplifies error handling by removing the complex forward-error-to-requester logic
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
672da6f to
9cafbea
Compare
Fixes #2018 ## Problem GET operations were failing when attempting to cache contracts that implement version-based state validation (like web-container-contract), even though the GET successfully retrieved the data from the network. ### Two Issues Fixed 1. **Redundant caching attempts**: When GET retrieved a contract from the network that was already cached locally with identical state, it would still attempt to cache via PutQuery. This triggered version validation in contracts that reject equal versions. 2. **Caching failure treated as critical error**: When local caching failed, the entire GET operation would fail and return an error to the client, even though the data was successfully retrieved from the network. Caching is an optimization for DHT seeding, not critical to fulfilling the GET request. ## Solution ### Layer 1: Prevention Before attempting PutQuery, check if local cached state matches the incoming state. If identical, skip caching entirely and just mark the contract as seeded if needed. ### Layer 2: Resilience If PutQuery fails for any reason, log a warning but continue with the GET operation. Return the successfully retrieved data to the client instead of treating the optional caching step as a critical error. ## Changes - `crates/core/src/operations/get.rs`: Modified GET operation to: - Query local state before attempting cache via PutQuery - Skip redundant caching when states match - Handle PutQuery failures gracefully by logging warning instead of failing the entire GET operation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
9cafbea to
f3c17a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reproducible in a regression test?
|
@iduartgomez - Good question about regression tests! Regression Test StatusShort answer: Not yet reproducible in automated tests, but this is a known systemic issue. The Testing ProblemThis bug should have been caught by existing integration tests like:
However, these tests are all currently ignored/disabled. See Issue #2021 for full audit. Why Tests Are DisabledThe ignore reasons reveal the underlying problems:
The tests were disabled because they exposed real bugs (like #2018), but the bugs weren't fixed - the tests were just ignored instead. Test Coverage PlanIssue #2021 tracks fixing all ignored tests. The plan:
This PRThis fix addresses one of the core issues preventing tests from working:
RecommendationMerge this PR to fix the immediate bug, then work on #2021 to restore proper test coverage. @sanity - correct me if I've mischaracterized the testing situation. [AI-assisted debugging and comment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure those tests specifically test this, I would recommend we add a test which specifically tests what is being fixed in this PR, so add that to the backlog.
Let's not block anyway.
|
@iduartgomez - Agreed! I'll create a backlog issue for a specific regression test. The test should verify:
I started working on an integration test but hit complexity with the test structure. The proper test should:
Will create a backlog issue tracking this specific test. [AI-assisted debugging and comment] |
Summary
Fixes #2018
GET operations should not fail when local caching fails. The data was successfully retrieved from the network - caching is an optimization, not a requirement.
Problem
Two related issues caused GET operations to fail unnecessarily:
Redundant caching attempts: When GET retrieved a contract that was already cached locally with identical state, it would still attempt PutQuery, triggering version validation errors in contracts that enforce strict version ordering (like web-container-contract).
Caching treated as critical: When PutQuery failed, the entire GET operation would fail and return an error to the client, even though the data was successfully retrieved.
Solution
Layer 1: Prevention
Before attempting PutQuery, query local state and compare. If states match, skip caching entirely (just mark as seeded if needed).
Layer 2: Resilience
If PutQuery fails for any reason, log a warning but continue - return the successfully retrieved data instead of failing the GET.
Changes
Modified
crates/core/src/operations/get.rs:970-1081:Testing
Why This Fix is Correct
The GET operation's primary job is to retrieve data. Local caching is a DHT optimization that should never cause a GET to fail when the retrieval succeeded.
[AI-assisted debugging and comment]
🤖 Generated with Claude Code