Skip to content

Conversation

@sanity
Copy link
Collaborator

@sanity sanity commented Nov 15, 2025

Summary

  • Publishing nodes now finalize PUTs immediately after they cache/seed the contract locally, so a SuccessfulPut from downstream peers is treated as best-effort acknowledgement. This fixes River/RPC callers hanging when remote gateways can’t dial the ephemeral callback port.
  • When the original request asked for subscribe=true, spin up the subscription child op right away (unless the parent already failed) so publish flows still attach live updates even without the downstream ack.
  • Add tracing around the “finish early” path to make the new behavior easy to diagnose in logs.
  • Verified live: latest freenet build succeeds (riverctl room create returns immediately), while freenet 0.1.37 reproduces the old timeout/contract-missing errors.

Testing

  • cargo test -p freenet --lib
  • technic: cargo run --bin riverctl -- room create --name "cli-test" --nickname "owner" (with current branch)
  • technic: cargo run --bin riverctl -- room create --name "old-test" --nickname "owner" against freenet 0.1.37 binary (reproduces failure)

@sanity sanity requested a review from iduartgomez November 15, 2025 14:19
origin: origin.clone(),
});
// When we're the origin node we already seeded the contract locally.
// Treat downstream SuccessfulPut messages as best-effort so River is unblocked.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

River is just one application, I dlnt think we need to mention it here

Copy link
Collaborator

@iduartgomez iduartgomez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think best effort for puts is basically lying to the application and is not a good design. If I am doing a put I expect it to be available in the network otherwise I would prefer to get a clear error code which says what is failing.

But that is me.

@sanity
Copy link
Collaborator Author

sanity commented Nov 16, 2025

@iduartgomez I understand the concern about misleading applications, but I think best-effort is the right model here.

The issue: Even if we waited for downstream acks (and even if all of them responded), that still doesn't guarantee retrievability. Those peers could crash, network could partition, etc. The guarantee is weak regardless.

What actually happens: Origin caches locally and sends SeekNode. Propagation continues in background. Waiting for acks doesn't make data more available - it just blocks the caller on network timing.

The semantics: PUT success means "I've accepted your data and started propagation," not "N replicas confirmed and retrievable." Applications that need stronger guarantees should verify via GET or track subscription updates.

Error codes: We still surface real failures (invalid contract, local storage failed). We're just not treating "downstream didn't ack yet" as a failure when propagation is ongoing.

Thoughts?

@iduartgomez
Copy link
Collaborator

I think there are two distinct categories between not eaiting for acks but the network works so at least I could initialize a put and the network is not working at all, but I cache locally and finalize the operation successfully. The second seems misleading to me completely no?

@sanity
Copy link
Collaborator Author

sanity commented Nov 16, 2025

@iduartgomez Good distinction. You're right - those are two different scenarios:

Case 1: Network works, don't wait for acks

  • We find a next hop and send SeekNode
  • Propagation happens, we just don't block waiting for confirmation
  • Returning success is honest - we initiated propagation

Case 2: Network completely down

  • closest_potentially_caching() returns None
  • We cache locally but can't propagate at all
  • Returning success here is misleading ✓ your concern

Proposed fix: Return an error in case 2. When no peers are reachable, fail with something like OpError::NoReachablePeers instead of completing successfully. That way applications know the difference between "propagating" vs "isolated/network broken."

This makes PUT semantics: "I've cached locally AND initiated network propagation" - both must succeed.

Does that address your concern?

@iduartgomez
Copy link
Collaborator

Sounds good.

@sanity sanity enabled auto-merge November 16, 2025 21:34
@sanity sanity added this pull request to the merge queue Nov 16, 2025
Merged via the queue into main with commit 7d64e35 Nov 16, 2025
11 checks passed
@sanity sanity deleted the issue-2086 branch November 16, 2025 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants