implement stream abstraction to replace multiple RPC responses #271
Comments
Is the […]? In addition to multiple […], I don't feel the […].
That's my main worry with this approach.
Personally, I would tend toward a stream interface in place of it. An RPC having multiple responses feels a bit like a function returning multiple times; that can be done for the purpose of coroutines or generators or similar, but it isn't really a procedure call anymore, it would be "create remote iterator" or similar.
OK, let's not abuse RPC for this.
Hm, I am seeing some references to RPC streaming, streamed RPC responses, etc. in some extant RPC implementations, e.g. […]. My only worry about a "stream" abstraction parallel to the RPC abstraction is that I think RPC will end up feeling like just a degenerate case of a "stream". (I would be fine either way at this point; I'm just trying to further the discussion.)
One use case for "streamed responses" above that might be germane to our project is handling large files or blobs of data as a response to an RPC. Instead of sending the entire blob at once, we'd likely want to split it up and process one chunk at a time, so the entire thing doesn't have to be stored in the client's memory (e.g. reading a large file from the kvs?).
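The chunked approach above can be sketched in plain Python. This is only an illustration, not a flux API: `chunked_responses` and `client_consume` are invented names, and the empty-payload sentinel is just one possible way to mark end-of-stream.

```python
# Hypothetical sketch of chunked responses: the server side splits a large
# blob into fixed-size pieces, and the client processes one piece at a time
# instead of buffering the whole thing. None of these names are real flux APIs.

CHUNK_SIZE = 4  # tiny for illustration; a real service might use 1 MiB


def chunked_responses(blob, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of `blob`, ending with an empty sentinel chunk."""
    for i in range(0, len(blob), chunk_size):
        yield blob[i:i + chunk_size]
    yield b""  # empty payload signals end-of-stream


def client_consume(responses):
    """Process chunks incrementally; only one chunk is held at a time."""
    total = 0
    for chunk in responses:
        if not chunk:
            break
        total += len(chunk)  # stand-in for real per-chunk processing
    return total


print(client_consume(chunked_responses(b"hello world")))  # 11 bytes processed
```

The sentinel is one design choice; a real protocol might instead carry an explicit end-of-stream flag or error code in each response, as later discussed for cancellation.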
FWIW, stream support within mrpc, in combination with user-defined reduction/aggregation support, would make the system pretty similar to the MRNet stream idiom or (windowed) group aggregation in the data-analytics world. My point is perhaps that the current single-response RPC is one abstraction level, streaming is another, and streaming + aggregation, if we do this, is the next; each level would map to different use cases?
Interesting. The three that you list @grondo take a couple of different approaches. The second and third refer to streaming in the sense of repeated callbacks that retrieve successive chunks of a single larger entity until it is exhausted, while the first is more like what I originally was thinking, in that it is effectively a way for an RPC to act as a co-routine or generator function, and thus return a stream of distinct individual objects. Either of these could be useful functionality. Since the former is actually the same as calling a resumable function multiple times, and the latter is just a large, potentially broken-up response, they could reasonably be expressed as RPCs. That said, being able to send multiple responses representing separate objects in reply to one RPC, potentially interacting with other requests from the client, seems like something else, and would be better modeled as a stream or pipe in my opinion.
@trws, good points. I think I agree. If you consider the full response to an […]. In fact, I could imagine many services that you could abstract in this manner. I'm not saying it is right; I'm just pointing out why the line we're drawing caused a bit of confusion for me. I'm probably just getting lost in my own head. Also, given that RPC would be a special case of stream, should it be implemented as a stream?
Maybe this needs a little use-casing. For exec, I would think that a stream abstraction that could be used persistently by each end would be the most natural abstraction, since either side might encounter an event that should forward information to the other without a request. In principle it could be made more RPC- or REST-ish if the output items were buffered by default and only transmitted on request, but that would be a pretty painful way to handle an interactive connection. If we have multi-part or generator-style RPCs, then single-shot RPCs would probably be handled as a special case, as you note. There would likely be some performance/space advantage to a single-response-specific implementation, but I doubt it would be significant.
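The persistent, bidirectional stream described above can be modeled minimally as two queues, one per direction, so that either endpoint can send without a pending request from the other side. This is a sketch of the idea only; `Stream`, `send`, and `recv` are invented names, not part of any flux interface.

```python
# Minimal model of a bidirectional stream: two shared deques, one per
# direction. Either endpoint may send at any time, with no request/response
# pairing. All names here are hypothetical illustrations.
from collections import deque


class Stream:
    """One endpoint of a bidirectional channel; its peer sees the queues swapped."""

    def __init__(self, outbox=None, inbox=None):
        self.outbox = outbox if outbox is not None else deque()
        self.inbox = inbox if inbox is not None else deque()

    def peer(self):
        # The peer's inbox is our outbox and vice versa.
        return Stream(outbox=self.inbox, inbox=self.outbox)

    def send(self, msg):
        # No matchtag, no pending request: unsolicited sends are allowed.
        self.outbox.append(msg)

    def recv(self):
        return self.inbox.popleft() if self.inbox else None


client = Stream()
server = client.peer()
client.send("stdin: hello")   # client -> server without being asked
server.send("stdout: world")  # server -> client without a request
print(server.recv())          # stdin: hello
print(client.recv())          # stdout: world
```

This contrasts with the buffered, fetch-on-request variant mentioned above, where a `recv` would itself be an RPC round trip rather than a local dequeue.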
Oh, I just had a scary idea. What if one peer could call […]? Depending on the connector selected, these connections could be implemented in different ways, ranging from tunneling over the existing overlays to establishing dedicated point-to-point sockets. This would require some changes to allow a reactor to work with multiple flux_t handles. I think I am probably not thinking of some details, like how the "stream open" request handler would work, but wouldn't that be neat? Once open, if there were a way to reference the stream's identity in an RPC (like a […])
I really like the idea of being able to open a connection to a service like that; I'm not entirely sure it makes sense to have that be a flux_open target, however. Would that result in a complete handle that could be used as though one were connected to the local broker? I could see the machinery being almost entirely the same underneath, but […]
I was thinking there might be an advantage to allowing the connection method to be flexible (like the two methods mentioned: tunneling versus new dedicated sockets). The idea needs more careful thought. I didn't mean to derail the discussion on RPC variants, though; that is quite interesting.
Actually, one would need an existing handle to send the requests for opening a new one, so you are right.
@trws - yes, I agree with you. A "stream" actually maps pretty well onto exec, and it would be really nice if the stream handle were passed back to the client on each callback. I was mainly arguing as devil's advocate, and also trying to put myself in the head of someone stepping up to write a service for the first time and being confused about which parallel implementation to use. As @garlick mentioned, another interesting use case is […]. If you do implement kvs_watch as a stream, it would be tempting to implement […]. This probably wouldn't happen in reality, but I just wanted to make sure I get the point across.
Actually, what I guess I'm trying to say is: if you do implement streams, what are the benefits of keeping the RPC implementation around, given that streams would have a superset of RPC functionality? Won't this just make more code without any extra functionality? (I'm not saying there definitely isn't an arguable reason to keep RPC; I'm really just asking, because I seem to have confused myself here.)
The watch interfaces might actually make for a good example. As it is:

```python
def callback(key, json_str, arg, errnum):
    print('key changed key:', key, 'value:', json_str)

kvs.watch(flux_handle, callback, 'key')
```

This works, and it's very flexible, but it's kind of a pain in the head to reason about asynchronous programming like that. This, presumably, is why `watch_once` exists:

```python
print('key changed key:', key, 'value:',
      kvs.watch_once(flux_handle, key))
```

Having a stream interface would give us additional flexibility, at the cost of greater complexity on the client side at least, and on the server side as well if it's a variable two-way stream. I think I would still want to be able to do this on the client side, just for the sake of clarity:

```python
print('rpc reply payload:', f.rpc_send('topic'))
```

I'm betting that with a fully expressive non-blocking stream API, that would require at least one, if not multiple, callbacks and at least one extra user-visible handle.
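One hypothetical middle ground between the callback style and the blocking `watch_once` style is wrapping a callback-registration interface in a blocking iterator, so client code stays linear. In this sketch, `watch_as_stream` and `fake_watch` are invented stand-ins (with `fake_watch` substituting for a real watch service plus reactor loop); none of these are actual flux APIs.

```python
# Hypothetical sketch: turn a callback-style watch into a blocking generator,
# so updates can be consumed with an ordinary for-loop. All names invented.
import queue


def watch_as_stream(register_callback):
    """Adapt a callback-registration function into a blocking iterator."""
    q = queue.Queue()
    register_callback(q.put)  # each update is delivered into the queue
    while True:
        item = q.get()
        if item is None:      # None acts as an end-of-stream sentinel here
            return
        yield item


def fake_watch(deliver):
    # Stand-in for a real watch service: push a few updates, then terminate.
    for value in ("a", "b", "c"):
        deliver(value)
    deliver(None)


for update in watch_as_stream(fake_watch):
    print("update:", update)
```

The cost the comment above anticipates is visible even in this toy: the adapter needs an internal queue and a termination convention, i.e. extra user-visible machinery beyond a single blocking call.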
Sorry, I think I'm not making my point. I agree that the stream abstraction will work well for everything we're talking about. I'm not sure, though, why it is difficult to reason about async programming with […]. That being said, I'm used to low-level event-based programming with […].
I think I must have misinterpreted, or perhaps misconstrued, the situation. Using […] Somehow I think we've ended up talking at cross purposes, and may be envisioning different stream APIs, or perhaps just different use cases. If the stream API is a souped-up RPC with multiple responses, then keeping both and having them be incompatible would probably just be bloat. If the stream API were a bi-directional channel for arbitrary asynchronous messages between the two endpoints, then having RPCs would be a useful abstraction, possibly even for use inside of them, or implemented on top of them as a convenience.
Sorry, probably my fault. I was just trying to understand the various perspectives and learn something by asking questions. I understand your point about […].
I think this discussion has run its course. We should probably open bugs against RFC 6 for any further discussion of its deficiencies.
The KVS watch and exec stdio services send multiple responses to a single request (reusing its matchtag). This is not currently allowed by RFC 6 and is not supported by `flux_rpc()`. This or a similar idiom probably needs to be allowed.

One thing that is lacking in RFC 6 is a way to cancel an in-progress RPC. A cancellation protocol is desirable for singleton RPCs, and even more so for an RPC that has an open-ended number of responses. `flux_rpc()` would need to be adapted to allow multiple responses (multiple `flux_rpc_get()` calls). There probably also needs to be a way for the server side to terminate an RPC.
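One shape a cancellation protocol could take is a separate request that carries the matchtag of the in-progress RPC, telling the server to stop producing responses for it. The sketch below is purely illustrative; the `Server` class and its method names are invented, and only the matchtag idea comes from the discussion above.

```python
# Hypothetical cancellation sketch: a cancel request identifies the RPC to
# stop by its matchtag. Message framing and class names are invented here.


class Server:
    """Toy server tracking in-progress, open-ended RPCs by matchtag."""

    def __init__(self):
        self.active = {}  # matchtag -> topic of the pending response stream

    def handle_request(self, matchtag, topic):
        # Begin an open-ended series of responses for this matchtag.
        self.active[matchtag] = topic

    def handle_cancel(self, matchtag):
        # Stop further responses for that matchtag, if it is still active.
        # Returns True if something was cancelled, False otherwise, so the
        # cancel itself can be answered (e.g. with success or ENOENT).
        return self.active.pop(matchtag, None) is not None


srv = Server()
srv.handle_request(42, "kvs.watch")
print(srv.handle_cancel(42))  # True: the in-progress RPC is cancelled
print(srv.handle_cancel(42))  # False: already cancelled, nothing to do
```

Making cancel idempotent, as here, avoids races where the stream ends on its own just before the cancel arrives.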
An alternative is to create a new "stream" abstraction with explicit open/close calls. There are a number of ways this could be implemented, including a queue on the server side where messages (or "items") are fetched with an RPC. Possibly this could be a variant of #250 (named pipe service).
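The server-side-queue alternative could look something like the following, sketched in plain Python. `StreamService` and its `open`/`push`/`fetch`/`close` methods are invented for illustration; in a real design each call would ride on a flux RPC message rather than a direct method call.

```python
# Hypothetical sketch of the "queue on the server side" alternative: clients
# explicitly open a stream, fetch items one RPC at a time, and close it.
# All names are invented; this is not a flux API.
import itertools


class StreamService:
    """Toy server keeping a per-stream queue of pending items."""

    def __init__(self):
        self._streams = {}
        self._ids = itertools.count(1)

    def open(self):
        sid = next(self._ids)
        self._streams[sid] = []
        return sid  # handle the client references in subsequent requests

    def push(self, sid, item):
        self._streams[sid].append(item)  # server-side producer enqueues

    def fetch(self, sid):
        q = self._streams[sid]
        return q.pop(0) if q else None   # one fetch RPC retrieves one item

    def close(self, sid):
        del self._streams[sid]


svc = StreamService()
sid = svc.open()
svc.push(sid, "event-1")
svc.push(sid, "event-2")
print(svc.fetch(sid))  # event-1
print(svc.fetch(sid))  # event-2
print(svc.fetch(sid))  # None: queue drained
svc.close(sid)
```

Note that in this fetch-based variant every item costs a round trip, which is the trade-off against the unsolicited-send stream model discussed earlier in the thread.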