Open
Labels
- A-networking (Area: Networking, ring protocol, peer discovery)
- E-hard (Experience needed to fix/implement: Hard / a lot)
- P-high (High priority)
- T-bug (Type: Something is broken)
Description
Summary
PUT operations with subscribe:true still complete before the follow-on subscribe request is guaranteed to succeed. The race shows up as clients receiving PutResponse and moving on to UPDATE or waiting for notifications long before the network subscription is actually established (or has even started). PR #1767 patched around this with sleeps, but Nacho already pointed out we really need to treat PUT+subscribe as a single committed transaction.
Requirements
- Treat the auto-subscribe that hangs off a PUT as part of the same logical transaction/operation so that the client only receives PutResponse after the subscribe succeeds (or fails deterministically).
- If the subscribe fails, surface that error in the same response path instead of silently continuing, so the client can alert the user or retry.
- Update the RequestRouter tracking so PUT+subscribe are grouped under one transaction ID instead of two loosely-related ops.
- Ensure the solution plays nicely with recent router and subscription fixes (#1854, "fix: Critical subscription routing fixes (Phases 1-3)"; #1891, "Fix request router deduplication race with PUT operations (issue #1886)") and the proximity cache work (#1853, "feat: proximity-based update forwarding").
- Provide regression tests that would fail without the atomic behaviour (e.g. immediate UPDATE after PUT with subscribe:true, and notification assertions instead of sleeps).
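To make the first three requirements concrete, here is a minimal sketch of how PUT and its auto-subscribe could be tracked under one transaction ID, with the client reply withheld until both phases resolve. Every name in it (`PutTracker`, `Phase`, `ClientReply`, `SubscribeOutcome`, `TxId`) is hypothetical and does not mirror the actual RequestRouter types; it only illustrates the intended ordering.

```rust
use std::collections::HashMap;

/// Hypothetical transaction identifier; stands in for the router's real ID type.
type TxId = u64;

/// Hypothetical result of the follow-on subscribe.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SubscribeOutcome {
    Subscribed,
    Failed(String),
}

/// Hypothetical reply sent to the client once the whole transaction resolves.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ClientReply {
    PutResponse { tx: TxId, subscribe: Option<SubscribeOutcome> },
}

/// Phases of a single PUT(+subscribe) transaction.
#[derive(Debug, PartialEq, Eq)]
enum Phase {
    AwaitingPut { subscribe: bool },
    AwaitingSubscribe,
}

/// Tracks PUT and its subscribe under one transaction ID (assumed design).
#[derive(Default)]
struct PutTracker {
    pending: HashMap<TxId, Phase>,
}

impl PutTracker {
    fn start_put(&mut self, tx: TxId, subscribe: bool) {
        self.pending.insert(tx, Phase::AwaitingPut { subscribe });
    }

    /// Called when the PUT itself has been stored. Returns a reply only
    /// when no subscribe is outstanding for this transaction.
    fn on_put_stored(&mut self, tx: TxId) -> Option<ClientReply> {
        match self.pending.remove(&tx)? {
            Phase::AwaitingPut { subscribe: true } => {
                // Hold the reply back until the subscribe resolves.
                self.pending.insert(tx, Phase::AwaitingSubscribe);
                None
            }
            Phase::AwaitingPut { subscribe: false } => {
                Some(ClientReply::PutResponse { tx, subscribe: None })
            }
            other => {
                // Unexpected event for this phase: restore state, send nothing.
                self.pending.insert(tx, other);
                None
            }
        }
    }

    /// Called when the subscribe succeeds or fails. Success and failure
    /// both travel in the same PutResponse.
    fn on_subscribe_done(&mut self, tx: TxId, outcome: SubscribeOutcome) -> Option<ClientReply> {
        match self.pending.get(&tx) {
            Some(Phase::AwaitingSubscribe) => {
                self.pending.remove(&tx);
                Some(ClientReply::PutResponse { tx, subscribe: Some(outcome) })
            }
            _ => None,
        }
    }
}
```

A regression test against a tracker like this can assert that `on_put_stored` yields no reply while a subscribe is pending, which is the ordering the sleeps in #1767 only approximate.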
Open Questions
- Should PUT wait synchronously for the subscribe to complete, or should we change the API response to include an explicit subscribe status/result so clients can react without blocking the node’s op loop?
- How do we want to handle partial failures if PUT succeeds but subscribe cannot (e.g. no peers available)? Should we roll back the PUT or surface a structured warning?
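To help weigh the two questions above, here is a rough sketch of a response that carries an explicit subscribe status, plus a policy switch for the partial-failure case. The names `SubscribeStatus`, `PutResult`, `PartialFailurePolicy` and `finalize_put` are invented for illustration and are not part of the current API. The rollback branch only reports the decision; actually undoing a stored PUT is out of scope here.

```rust
/// Hypothetical subscribe status reported alongside the PUT result.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SubscribeStatus {
    NotRequested,
    Established,
    Failed { reason: String },
}

/// Hypothetical structured response for a PUT with optional subscribe.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PutResult {
    stored: bool,
    subscribe: SubscribeStatus,
}

/// The two options raised for partial failures (assumed naming).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartialFailurePolicy {
    /// Keep the PUT and surface a structured warning.
    KeepPutAndWarn,
    /// Treat the whole transaction as failed.
    RollBackPut,
}

/// Combines the subscribe result with the chosen policy.
/// `subscribe` is `None` when the client did not ask for a subscription.
fn finalize_put(
    subscribe: Option<Result<(), String>>,
    policy: PartialFailurePolicy,
) -> Result<PutResult, String> {
    match subscribe {
        None => Ok(PutResult { stored: true, subscribe: SubscribeStatus::NotRequested }),
        Some(Ok(())) => Ok(PutResult { stored: true, subscribe: SubscribeStatus::Established }),
        Some(Err(reason)) => match policy {
            PartialFailurePolicy::KeepPutAndWarn => Ok(PutResult {
                stored: true,
                subscribe: SubscribeStatus::Failed { reason },
            }),
            // Only signals the decision; the real rollback is not modelled.
            PartialFailurePolicy::RollBackPut => {
                Err(format!("put rolled back: subscribe failed: {}", reason))
            }
        },
    }
}
```

With `KeepPutAndWarn`, a client can see that the contract was stored but no notifications will arrive, and can retry the subscribe on its own. With `RollBackPut`, the client sees a single error for the whole operation.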
References
- #1767 (Fix subscribe:true flag for PUT operations): draft fix that highlights the timing gap.
- Comment from @iduartgomez on the race concern: Fix subscribe:true flag for PUT operations #1767 (comment).
Status
Triage