Requestor pausing probably needs a rearchitect, or should be removed entirely #160

@hannahhoward

This module supports pausing and resuming in a number of ways:

  • A responder can pause by calling PauseResponse in an OutgoingBlockHook
  • A responder can pause by calling PauseResponse directly on the Graphsync instance
  • A responder can resume by calling UnpauseResponse in a RequestUpdatedHook
  • A responder can resume by calling UnpauseResponse directly on the Graphsync Instance
  • A requestor can pause by calling PauseRequest on the Graphsync instance
  • A requestor can unpause by calling UnpauseRequest on the Graphsync instance

Of these, the requestor pause/unpause is by far the most complicated to implement, and produces unpredictable behavior.

Graphsync is designed to operate in an untrusted environment, and as such, responders can't simply accept commands from requestors to pause at any time. (I could DoS a responder by sending it requests and pausing each one, until it held too much memory for all my paused requests.)

I explored a number of ways to implement this, and eventually settled on a requestor dealing with pause/unpause by simply cancelling the request and sending it again with a do-not-send-cids extension.

There are a number of problems with this:

  1. It adds significant complexity to the requestor implementation, as we have to track all CIDs sent.
  2. It leads to unpredictable behavior -- as soon as we pause on the requestor side, we stop recording the CIDs we receive. However, the responder may send more CIDs before it gets the cancel message. This means that upon unpause, those CIDs are sent a second time. This is the cause of #158 ("On resume graphsync occasionally enqueues the same outgoing block twice for a single file").
  3. PauseRequest/UnpauseRequest do something very different at the protocol level than what their names suggest. Where pausing on the responder has a protocol-level response code, pausing in the requestor is not a concept covered by the protocol. The methods implement a cancel/restart more than a pause/unpause (but still with the same request ID -- so the calling module doesn't know that we've actually done this inside the module -- oy).
  4. This was all done before go-data-transfer had implemented restarts. Since go-data-transfer implements restarts by tracking CIDs and using do-not-send-cids, we're now tracking CIDs twice (go-graphsync doesn't track CIDs to disk, and it should combine its internal set with an external set passed by go-data-transfer correctly, but it's still kind of bizarre behavior).
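
To make problem 2 concrete, here is a minimal simulation of the race. It is only a sketch: it does not use go-graphsync at all, the function name and parameters are hypothetical, CIDs are plain strings, and it assumes every CID in the traversal is distinct.

```go
package main

import "fmt"

// simulateResume is a hypothetical model of one pause/resume cycle.
//   blocks:   CIDs in the order the responder sends them for a full traversal
//   pausedAt: how many blocks the requestor has recorded when it pauses
//   inFlight: how many more blocks the responder sends before it sees the cancel
// It returns the CIDs that reach the requestor more than once.
func simulateResume(blocks []string, pausedAt, inFlight int) []string {
	if pausedAt > len(blocks) {
		pausedAt = len(blocks)
	}
	end := pausedAt + inFlight
	if end > len(blocks) {
		end = len(blocks)
	}

	received := map[string]int{}   // delivery count per CID
	doNotSend := map[string]bool{} // CIDs the requestor recorded before pausing

	// Before the pause: blocks are received and recorded.
	for _, c := range blocks[:pausedAt] {
		received[c]++
		doNotSend[c] = true
	}
	// After the pause, before the cancel arrives: blocks are received
	// but no longer recorded.
	for _, c := range blocks[pausedAt:end] {
		received[c]++
	}
	// On unpause: the request is re-sent with the recorded set, and the
	// responder skips only those CIDs.
	var dups []string
	for _, c := range blocks {
		if doNotSend[c] {
			continue
		}
		received[c]++
		if received[c] > 1 {
			dups = append(dups, c)
		}
	}
	return dups
}

func main() {
	blocks := []string{"cidA", "cidB", "cidC", "cidD", "cidE"}
	fmt.Println(simulateResume(blocks, 2, 2)) // prints [cidC cidD]
	fmt.Println(simulateResume(blocks, 2, 0)) // prints []
}
```

With no in-flight blocks there are no duplicates, which is why the double-send only shows up occasionally.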

Pause/Unpause is part of the protocol on the responder side. There is a response code that indicates a response has been paused, and a mechanism for the client to ask the responder to unpause. It makes sense to support pause/unpause on the responder side.

However, I think that pause/unpause for the requestor should not be part of go-graphsync. We should enable primitives to do this via higher level code:

  • Cancelling requests (already supported by cancelling the context on the call to Request)
  • Sending arbitrary extensions in request updates - i.e. SendExtensionData(RequestID, ...ExtensionData)
  • Possibly add the ability to pause a response as well as unpause a response in a RequestUpdatedHook

This enables a few ways you might implement in go-data-transfer:

  • Implement Pause/Unpause for the requesting side via sending an extension through graphsync and reading it in a request updated hook and pausing there... meaning ultimately the responder pauses/unpauses
  • Implement Pause/Unpause via simply communicating on data transfer protocol and then pausing by calling PauseResponse on the graphsync instance directly.
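
As a rough illustration of the first option, the sketch below models the message flow with local stand-in types rather than the real go-graphsync API. Everything here is an assumption made for the example: the extension name "example/pause", the "pause"/"unpause" payloads, and the responder/requestor types. The direct method call stands in for a request update travelling over the network.

```go
package main

import "fmt"

// RequestID and ExtensionData are local stand-ins, not the go-graphsync types.
type RequestID int

type ExtensionData struct {
	Name string
	Data []byte
}

// pauseExtension is a hypothetical extension name chosen for this sketch.
const pauseExtension = "example/pause"

// responder owns the pause state; the requestor only asks.
type responder struct {
	paused map[RequestID]bool
}

// onRequestUpdated plays the role of a request updated hook: it reads the
// extensions on an update and pauses or unpauses the matching response.
func (r *responder) onRequestUpdated(id RequestID, exts ...ExtensionData) {
	for _, e := range exts {
		if e.Name != pauseExtension {
			continue // ignore extensions this hook does not understand
		}
		switch string(e.Data) {
		case "pause":
			r.paused[id] = true
		case "unpause":
			delete(r.paused, id)
		}
	}
}

// canSend reports whether the responder may send more blocks for id.
func (r *responder) canSend(id RequestID) bool { return !r.paused[id] }

// requestor forwards extension data to its peer as a request update.
type requestor struct {
	peer *responder
}

func (q *requestor) sendExtensionData(id RequestID, exts ...ExtensionData) {
	q.peer.onRequestUpdated(id, exts...)
}

func main() {
	resp := &responder{paused: map[RequestID]bool{}}
	req := &requestor{peer: resp}

	req.sendExtensionData(1, ExtensionData{Name: pauseExtension, Data: []byte("pause")})
	fmt.Println(resp.canSend(1)) // prints false

	req.sendExtensionData(1, ExtensionData{Name: pauseExtension, Data: []byte("unpause")})
	fmt.Println(resp.canSend(1)) // prints true
}
```

The point of this shape is that the request is never cancelled and the responder decides whether to honor the pause, so it stays in control of its own resources.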

Note: these implementations may still require a fair amount of complexity, as any pause initiated on the requesting side must account for the responding side sending more data before it receives the pause request.
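
One way the requesting side might account for this is to treat "pause requested" and "paused" as separate states and keep recording blocks in between. The sketch below is an assumption about how higher level code could do that; the type and method names are hypothetical and nothing here is existing go-graphsync or go-data-transfer API.

```go
package main

import "fmt"

type pauseState int

const (
	running        pauseState = iota
	pauseRequested            // pause sent, responder may still be sending
	paused                    // responder has confirmed the pause
)

// requestTracker is a hypothetical requestor-side record of one request.
type requestTracker struct {
	state pauseState
	seen  map[string]bool // every CID received, regardless of state
}

func newRequestTracker() *requestTracker {
	return &requestTracker{seen: map[string]bool{}}
}

// requestPause marks that a pause was sent but not yet confirmed.
func (t *requestTracker) requestPause() {
	if t.state == running {
		t.state = pauseRequested
	}
}

// onPauseConfirmed is called once the responder reports it has paused.
func (t *requestTracker) onPauseConfirmed() {
	if t.state == pauseRequested {
		t.state = paused
	}
}

// onBlock records a received CID in every state, so blocks that were in
// flight when the pause was requested are not forgotten. It returns true
// if the CID is new.
func (t *requestTracker) onBlock(cid string) bool {
	if t.seen[cid] {
		return false
	}
	t.seen[cid] = true
	return true
}

func main() {
	t := newRequestTracker()
	t.onBlock("cidA")
	t.requestPause()
	t.onBlock("cidB") // arrives after the pause was requested
	t.onPauseConfirmed()
	fmt.Println(t.state == paused, len(t.seen)) // prints true 2
	fmt.Println(t.onBlock("cidB"))              // prints false
}
```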

Alternatively, we can try to develop a pause/unpause request at the protocol level in go-graphsync so that we can at least more clearly define expected behavior.

Labels:

  • P2: Medium: Good to have, but can wait until someone steps up
  • effort/weeks: Estimated to take multiple weeks
  • exp/wizard: Extensive knowledge (implications, ramifications) required