dist_ffi fixes and receive error handling #3
I figured it'd be a good time to go back to basics and (re-)consider our network / FFI model. In particular, I wanted to make it so that `Receive` will actually return `Err=true` sometimes, namely when the socket is gone.

But then I realized that this doesn't make sense, since when the socket is gone, we try to reconnect! So there's a discussion we need to have about which behavior we want dist_ffi to have. Should a channel created via `Connect` transparently reconnect? If yes, should `Receive` on such a channel ever return an error (and when)? Also, when (if at all) should `Receive` on a channel created via `Listen` return an error?

From a uRPC perspective, it is quite nice that `Send` reconnects transparently, and we could not do much with the information that `Receive` failed anyway. So maybe we should rename `Err` in `ReceiveRet` to `Timeout` and set some arbitrary timeout (60s) just to keep the operation fallible (which is required from a formal perspective), but not even attempt to actually provide information about network errors on the `Receive` side?

Also, I realized the current "transparent reconnect" code is buggy, or rather, racy: there could be other `Send` operations going on while we reconnect. So I added a lock to `sender`, which I then also use to avoid having to copy the `data` unnecessarily.
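
To make the discussion concrete, here is a rough sketch of the two ideas above: a sender whose reconnect and write happen under one lock, and a `Receive` whose only reported failure is a timeout. This is not the code in this PR. The `Sender` type, its `dial` field, the `Receive` signature, and the `roundTrip` helper are all hypothetical, `net.Pipe` stands in for a real socket, and treating every read failure as `Timeout` is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// ReceiveRet follows the proposed shape: the receiver only learns "timed out".
type ReceiveRet struct {
	Timeout bool
	Data    []byte
}

// Sender is a hypothetical stand-in for dist_ffi's sender.
type Sender struct {
	mu   sync.Mutex
	dial func() (net.Conn, error) // assumed way to (re)connect
	conn net.Conn                 // nil while disconnected
}

// Send writes data and reconnects once if the write fails. The lock is held
// for the whole call, so no other Send can use the connection mid-reconnect.
func (s *Sender) Send(data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for attempt := 0; attempt < 2; attempt++ {
		if s.conn == nil {
			c, err := s.dial()
			if err != nil {
				return err
			}
			s.conn = c
		}
		if _, err := s.conn.Write(data); err == nil {
			return nil
		}
		s.conn.Close() // drop the broken connection and retry on a fresh one
		s.conn = nil
	}
	return errors.New("send failed after reconnect")
}

// Receive reads one chunk. Assumption: any read failure is reported as Timeout.
func Receive(c net.Conn, timeout time.Duration) ReceiveRet {
	buf := make([]byte, 1024)
	c.SetReadDeadline(time.Now().Add(timeout))
	n, err := c.Read(buf)
	if err != nil {
		return ReceiveRet{Timeout: true}
	}
	return ReceiveRet{Data: buf[:n]}
}

// roundTrip sends once, kills the peer, sends again (forcing a reconnect),
// and finally receives on an idle connection to trigger the timeout.
func roundTrip() ([]string, bool, error) {
	conns := make(chan net.Conn, 2) // server ends handed out by dial
	s := &Sender{dial: func() (net.Conn, error) {
		client, server := net.Pipe()
		conns <- server
		return client, nil
	}}
	errc := make(chan error, 1)

	go func() { errc <- s.Send([]byte("ping")) }()
	server := <-conns
	first := Receive(server, time.Second)
	if err := <-errc; err != nil {
		return nil, false, err
	}

	server.Close() // simulate the peer going away
	go func() { errc <- s.Send([]byte("pong")) }()
	server2 := <-conns
	second := Receive(server2, time.Second)
	if err := <-errc; err != nil {
		return nil, false, err
	}

	idle := Receive(server2, 50*time.Millisecond) // nothing more is sent
	return []string{string(first.Data), string(second.Data)}, idle.Timeout, nil
}

func main() {
	msgs, idleTimedOut, err := roundTrip()
	fmt.Println(msgs, idleTimedOut, err)
}
```

Note that in this sketch the second `Send` succeeds without the caller ever seeing the broken connection, which is the "transparent" part; whether the receiving side should be told anything beyond `Timeout` is exactly the open question above.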