
Design feedback on lazy_load_blob #219

Open
manuel opened this issue Jan 30, 2024 · 14 comments

manuel commented Jan 30, 2024

I think the idea of enabling middleware to receive just the head of a request, forward it, and have only the final recipient return a response is great (out of curiosity, did you come up with that, or did you get it from somewhere else?)

However, I see an issue with the current implementation: AFAICT, only the last lazy_load_blob received by a process is remembered by the kernel. So what if the middleware needs to make another request before forwarding that lazy_load_blob to the next destination, e.g. to fetch some info?

dr-frmr (Member) commented Jan 30, 2024

Thank you for the feedback! I'm sure others have put together similar structures, however we were inspired to add this functionality purely via benchmarking various Wasm experiments and observing the bottleneck often being moving bytes into a process.

I think you're right that the current implementation is a bit unfinished. We've got a lot of work left to do before 1.0...

Looking at the kernel logic around process state, one can see we have a prompting_message and last_blob. Something I have been very interested in doing is restructuring the contexts map, which keeps track of outstanding requests, to also manage these two fields. I think it could be very intuitive to have a nesting logic where total context (including the lazy_load_blob) is popped off a stack as you receive responses.

The issue I'm having so far is that responses themselves can also contain blobs, of course, and I haven't figured out a good way for the programmer to "traverse" a "stack" of blobs/prompting_messages.
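The restructuring floated here — folding `prompting_message` and `last_blob` into the `contexts` map so that total context is popped off a stack as responses arrive — might look something like the following sketch. All type and field names are illustrative stand-ins, not the real kernel's.

```rust
// Hypothetical sketch: a per-process stack of message contexts, where each
// frame carries the prompting message and its optional lazy_load_blob.
// Receiving a response pops back to the frame that issued the matching request.

#[derive(Clone, Debug, PartialEq)]
struct LazyLoadBlob {
    mime: Option<String>,
    bytes: Vec<u8>,
}

#[derive(Clone, Debug)]
struct ContextFrame {
    request_id: u64,
    prompting_message: String, // stand-in for the real Message type
    blob: Option<LazyLoadBlob>,
}

#[derive(Default)]
struct ProcessState {
    contexts: Vec<ContextFrame>, // stack of outstanding contexts
}

impl ProcessState {
    /// Push a new frame when a request (with optional blob) goes out.
    fn on_request(&mut self, request_id: u64, msg: &str, blob: Option<LazyLoadBlob>) {
        self.contexts.push(ContextFrame {
            request_id,
            prompting_message: msg.to_string(),
            blob,
        });
    }

    /// Pop the frame whose request this response answers; the caller regains
    /// the context (including blob) that was live when the request went out.
    fn on_response(&mut self, request_id: u64) -> Option<ContextFrame> {
        let pos = self.contexts.iter().rposition(|f| f.request_id == request_id)?;
        Some(self.contexts.remove(pos))
    }
}
```

The open question from the comment above — responses carrying blobs of their own, and how a programmer traverses such a stack — is deliberately not answered by this sketch.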

manuel (Author) commented Jan 30, 2024

Each request could have a unique ID, and you could fetch a particular lazy_load_blob via that ID?

dr-frmr (Member) commented Jan 30, 2024

We have unique IDs for every message, but currently don't expose that to the process. The thought is that processes should use state of some kind to manage their data rather than keep track of a bunch of request IDs. I'm hesitant to change this property because it's been nice, so far, to treat messages as nothing more, and build state machines that react to them.

With regards to fetching, we've so far been able to avoid the issue of memory leakage by only tracking outgoing requests that expect a response (which must be associated with a timeout). If we added functionality to retrieve arbitrary data from past messages, wouldn't we need to start baking in some logic around expiry, which becomes difficult to mentally model as a process developer?

dr-frmr (Member) commented Jan 30, 2024

So what if the middleware needs to make another request before forwarding that lazy_load_blob to the next destination, e.g. to fetch some info?

One direct solution to this issue would be for the kernel to hold on to the last blob received for as long as no other blob comes in (most messages don't contain one). A developer could then assemble a protocol where the middleware receives a request with a blob, calls out to the forwarding destination, and gets a response; since that response carries no blob, the original was never discarded and can still be attached. This would be a very simple change to the kernel. The only downside I can think of right now is that working with an unfamiliar protocol might cause surprises where a blob gets introduced unexpectedly.

Of course, this doesn't work if your forwarding destination presents an unfriendly protocol. You can always bite the bullet and ingest the original blob, though...
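The "hold on to the last blob" rule above is small enough to sketch directly: the kernel only overwrites its saved blob when an incoming message actually carries one, so blob-less responses leave the original intact. Types here are illustrative, not the real kernel's.

```rust
// Sketch (hypothetical types): keep the previously received blob unless the
// new message brings its own.

#[derive(Clone, Debug, PartialEq)]
struct LazyLoadBlob {
    bytes: Vec<u8>,
}

struct Kernel {
    last_blob: Option<LazyLoadBlob>,
}

impl Kernel {
    fn on_message(&mut self, incoming_blob: Option<LazyLoadBlob>) {
        // Replace only when the new message has a blob of its own;
        // otherwise the original blob survives and can still be attached later.
        if incoming_blob.is_some() {
            self.last_blob = incoming_blob;
        }
    }
}
```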

nick1udwig (Contributor) commented Jan 30, 2024

So this most recent proposal of hanging onto the last blob is similar to something I was considering recently: holding onto the last prompting_message until one with expects_response != None or rsvp != None comes in. The downside is that this can lead to unexpected behavior if you have a lot of expects_response == None && rsvp == None Requests, because the process ends up holding very long-lived state that is no longer relevant to it.

The "one-only blob" is really nice for expiry, as you say above.

nick1udwig (Contributor) commented:

One way you can do the specific case asked about above right now is:

  1. Middleware A receives msg with blob; final destination is Process F
  2. Middleware A makes request to Process B to get add'l info; it attaches the blob via inherit = true
  3. Middleware A receives response from Process B, which also sets inherit = true to pass the same blob back along with the information
  4. Middleware A passes info + blob via inherit = true to Process F

As currently implemented, I think this will involve a fair amount of copying that blob around; in principle it should involve no copying of the blob and so should be efficient.

In practice, this is pretty fragile and annoying, and requires that Process B be controlled by the dev of Middleware A, since it has to cooperate to pass the blob back.

I think the general question still stands and it would be nice if this were easier.
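The four-step flow above can be modeled to show the "in principle no copying" point: treating `inherit = true` as passing a shared handle to the blob rather than cloning its bytes. The function names mirror the steps and are purely illustrative.

```rust
use std::rc::Rc;

// Illustrative only: modeling `inherit = true` as moving a shared handle
// to the blob between processes, never copying the underlying bytes.
type Blob = Rc<Vec<u8>>;

// Step 3: Process B cooperates by handing the same blob back with its answer.
fn process_b(blob: Blob) -> (String, Blob) {
    ("additional info".to_string(), blob)
}

// Step 4's destination: Process F receives info + blob.
fn process_f(_info: &str, blob: Blob) -> Blob {
    blob
}

// Steps 1-4 from Middleware A's perspective.
fn middleware_a(blob: Blob) -> Blob {
    // Step 2: ask Process B for additional info, attaching the blob by handle.
    let (info, blob) = process_b(blob);
    // Step 4: forward info + blob to Process F.
    process_f(&info, blob)
}
```

`Rc::ptr_eq` on the input and output handle confirms the bytes were never duplicated, which is the property an inherit-based kernel implementation would want to preserve.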

manuel (Author) commented Jan 30, 2024

One idea would be to have some sort of stack discipline: when making the additional request, the middleware tells the kernel "this new request is nested within the current request [the one with the lazy_load_blob]".

So the kernel knows that that lazy_load_blob must be kept around until the response to the nested request arrives (or a timeout occurs).
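The bookkeeping this proposal asks of the kernel is minimal and can be sketched as a single map from nested-request ID to the pinned original request. Everything here (names, types, the `send_nested` entry point) is a hypothetical illustration of the idea, not an existing API.

```rust
use std::collections::HashMap;

// Hypothetical: the original request (with its blob) that the kernel must
// keep alive while a nested request is outstanding.
struct SavedRequest {
    blob: Option<Vec<u8>>,
}

#[derive(Default)]
struct Kernel {
    // nested request ID -> the original request it was nested under
    pinned: HashMap<u64, SavedRequest>,
}

impl Kernel {
    /// Process marks a new outgoing request as nested within `original`;
    /// the kernel pins the original until the nested exchange resolves.
    fn send_nested(&mut self, nested_id: u64, original: SavedRequest) {
        self.pinned.insert(nested_id, original);
    }

    /// When the response to the nested request arrives (or its timeout
    /// fires), release the original request back to the process.
    fn on_response_or_timeout(&mut self, nested_id: u64) -> Option<SavedRequest> {
        self.pinned.remove(&nested_id)
    }
}
```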

dr-frmr (Member) commented Jan 30, 2024

Yeah, nesting is good. We have a field called inherit that marks whether a message "inherits" the ID, context, and blob of the previous message. We could alter the logic of inheritance to build a stack (it would just look like stacking instead of replacing when you inherit and assign a blob at the same time?)... then we just need a way to "pop" from it to consume the stack.

dr-frmr (Member) commented Jan 31, 2024

Man, I wrote a long thing out here that I liked, but closed the window and lost the text. The second time I write it will be better anyways...

dr-frmr (Member) commented Feb 2, 2024

Looking at all the logic below, which I've drafted a few times, I'm convincing myself not to add a stack. It's a layer of mental overhead we've been very productive without, so far. I'd like to explore other avenues for performance, rather than introduce this complexity just for the sake of blobs. They are cool, but the simplicity of handling a message stream directly is even cooler. Lastly, we can address the specific use-case in discussion here (and others) with @hosted-fornet's idea to "hang on to the last blob". We should definitely not toss away our single-frame until it gets replaced by something newer.

To be clear I think there are many stack-based solutions that can work, I'd just prefer to have processes handle this task in-house until we're certain that the kernel should provide such a thing. The performance cost can be addressed in entirely different ways in the future--we've been super lazy about optimizing around the Wasm boundary, preferring to get the userspace semantics right first...


We can add a stack for incoming messages. The main question is how to, as a process, choose between operating on incoming responses or the current "outstanding request" stack frame when inheriting blobs and context.

You must always be able to respond to all requests that expect a response. I'm happy with enforcing that they must be responded to in LIFO order. In general I think it's optimal for processes to respond to requests as they are received, and although the stack has utility for advanced use-cases, the majority of processes will only ever require the most recent message. Since processes are usually stateful, they store any necessary information about historical messages in their state rather than in their stack.

Incoming requests are added to the stack. Each request in the stack retains all of its information including blob. When a response is sent, requests are popped off the stack until one expects_response. The response is paired to that request, which is also removed. If the incoming request stack is empty, or does not contain any which expect a response, the response is dropped (matches current behavior).

Incoming responses are not added to the stack. Instead, they are matched against the outstanding request table. This is the current machinery in place and it works well.

The only issue now is inheritance in the case where a response has just been received and there are request(s) in the incoming request stack. The next message sent could either inherit the blob from the response, or the stack.

I think this is best resolved by having get_blob() act as a pop instead of just a fetch, but only on responses. Meaning: when you call get_blob(), if your most recent inbound message was a response, you get its Option<LazyLoadBlob>; if you call it again, you fetch from the top of the request stack.
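The response-pairing rule described above — pop incoming requests off the stack until one expects a response, pair the response to it, and drop the response if nothing on the stack expects one — can be sketched as a small function. Types are stand-ins for the real kernel's.

```rust
// Hypothetical sketch of pairing an outgoing response to the incoming
// request stack, in LIFO pop order.
struct IncomingRequest {
    id: u64,
    expects_response: bool,
}

/// Returns the ID of the request this response is delivered to, if any.
fn pair_response(stack: &mut Vec<IncomingRequest>) -> Option<u64> {
    while let Some(req) = stack.pop() {
        if req.expects_response {
            // Pair the response to this request, which is also removed.
            return Some(req.id);
        }
        // Requests that don't expect a response are discarded as we unwind.
    }
    // Empty stack, or nothing expected a response: the response is dropped
    // (matches current behavior).
    None
}
```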

manuel (Author) commented Feb 2, 2024

I think it's very powerful for a process to be able to respond to requests in an arbitrary order. I wouldn't easily give that up. (And you probably mean FIFO? The first received request must be the first for which a response is sent?)

Here's how I would do it:

If a request Req1 comes in, the process may make another request Req2, and mark that as nested within Req1.

For the kernel, this means, it will keep Req1 around (including its lazy_load_blob) until a response for Req2 is received or a timeout occurs. (This is the "stack" idea, but it's not really a stack - it never grows beyond one remembered request for a nested request.)

Any other requests or responses that are received in the meantime can be handled by the process as it wishes. It may make other nested requests for received requests - the kernel will keep them around, too.

Once the response for Req2 is received, the process may now retrieve the lazy_load_blob from Req1 or forward it (inherit).

But what if a middleware needs to make 2 or more requests before forwarding a request? It could be possible to send a nested request also on receipt of a response. So when the response for Req2 is received, the process can make another nested request Req3 for Req1, and the kernel will continue to keep it around, until a response for Req3 is received. This way, a process may make any number of requests (serially) during the handling of an original request.

In summary, there is not really a stack. A request will be kept around if during handling it, the process makes a nested request. It will also be kept around if the process makes another nested request during handling of a response to a nested request.

Of course, I am not familiar with your code, but this sounds quite simple and powerful to me.

The kernel would need a map request ID -> request for any requests for which a nested request was made and a second map request ID -> timestamp to handle timeouts.
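Those two maps, plus a timeout sweep, amount to only a few lines. This is a sketch of the bookkeeping as described, with plain integer ticks standing in for real timestamps; nothing here is the actual kernel's API.

```rust
use std::collections::HashMap;

// Hypothetical: one map from nested-request ID to the saved original
// request, and one from ID to a deadline used to expire entries.
#[derive(Default)]
struct NestedTable {
    saved: HashMap<u64, String>,  // nested ID -> saved original request
    deadline: HashMap<u64, u64>,  // nested ID -> expiry tick
}

impl NestedTable {
    fn insert(&mut self, id: u64, original: String, now: u64, timeout: u64) {
        self.saved.insert(id, original);
        self.deadline.insert(id, now + timeout);
    }

    /// Drop every saved request whose deadline has passed.
    fn sweep(&mut self, now: u64) {
        let expired: Vec<u64> = self
            .deadline
            .iter()
            .filter(|(_, &d)| d <= now)
            .map(|(&id, _)| id)
            .collect();
        for id in expired {
            self.saved.remove(&id);
            self.deadline.remove(&id);
        }
    }
}
```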

dr-frmr (Member) commented Feb 2, 2024

Ah yes I meant FIFO. Thanks for explaining the depth=1 nature of the "stack" idea here, it makes things easier for sure.

I'm wondering if it's possible for the kernel to nest automatically rather than make the process "mark" nested requests. If you receive a request that expects a response, is every subsequent message you send, until you make a response, nested? You can only ever issue a response to the last request you received--FIFO.

process calls `receive()`

1. process receives request(A)
    the context of the process is now entirely centered around (A), meaning that's the blob fetched, request ID etc
         a. process sends a response
            this response is issued to request(A), clearing it from kernel.
         b. process sends a request(B)
            request(A) is saved as it nests, and will be retained
            (A) is still the context until `receive()` gets called
            (A) will become the nested context if response(B) is received (see below)
         c. process does nothing (calls `receive()` again without sending any messages)
            request(A) is dropped from kernel and forgotten
                
2. process receives response(Z)
     the context of the process is centered around (Z), but if (Z) is nested under a request(Y), context(Y) is accessible
         a. process sends a response
             this response is issued to request(Y) if it exists and is dropped otherwise
         b. process sends a request(X)
             if request(Y) exists, it is saved as the nested request
             response(Z) is forgotten
         c. process does nothing until next `receive()`
             response(Z) and request(Y), if it exists, are both forgotten

Mostly writing this out for my own understanding--it seems good to me.
Now the last "problem", in case 2: if request(Y) exists, it's nested under (Z), which means calling get_blob() will give you (Z)'s blob. That means we need a new function to get_nested_blob() -- same function signature, returns option<lazy_load_blob>. Could this be the solution? Will this "just work"?

edit: Is it problematic that in case 2b, request(X) always nests under (Y)? Is there a case where one wants to discard (Y) while sending (X)? I think this may not be an issue at all, since if you loop through that case multiple times, the nesting isn't "adding up"--we're only ever holding onto 1 layer, which is fine. As @hosted-fornet has noted previously though there is a slight concern that the developer "forgets" where a nested request came from, but you have to intentionally access it anyways...

edit2: This whole thing has to interact with the inherit behavior properly. The goal of inherit is to be able to send a request, while having the response be targeted directly to the request previously.

process calls `receive()`

1. process receives request(A)
    the context of the process is now entirely centered around (A), meaning that's the blob fetched, request ID etc
         a. process sends a response
            this response is issued to request(A), clearing it from kernel.
         b. process sends a request(B)
            request(A) is saved as it nests, and will be retained
            (A) is still the context until `receive()` gets called
            (A) will become the nested context if response(B) is received (see below)
            **if request(B).inherit=true, the response is routed directly as a response to (A), skipping this process**
         c. process does nothing (calls `receive()` again without sending any messages)
            request(A) is dropped from kernel and forgotten
                
2. process receives response(Z)
     the context of the process is centered around (Z), but if (Z) is nested under a request(Y), context(Y) is accessible
         a. process sends a response
             this response is issued to request(Y) if it exists and is dropped otherwise
         b. process sends a request(X)
             if request(Y) exists, it is saved as the nested request
             response(Z) is forgotten
             **if request(X).inherit=true, the response is routed directly as a response to (Y) if it exists**
         c. process does nothing until next `receive()`
             response(Z) and request(Y), if it exists, are both forgotten

dr-frmr (Member) commented Feb 6, 2024

I'm implementing the above now and (perhaps selfishly) wondering if get_nested_blob() even makes sense to add. If you want to seamlessly inherit it, that works fine as described above. If you actually want to ingest it, it's not unreasonable to tell the developer that they should ingest it when they first get the request.

dr-frmr (Member) commented Feb 6, 2024

Another thing that's come up during implementation is that it's not possible to inherit a blob from a response.

The previous design allowed it, because it just used whatever message was received last. But with request inheritance used properly — responses always routed exactly where they are intended to end up, at the originating request — responses would always go to their final destination, so it would never make sense to inherit a blob from one anyway.

I think the overarching trade-off here is one of simplicity vs. optionality. Of course we could enable more general use by adding more fields, flags, etc, but it seems like we're already at the boundary where more choice would lead to an overcomplicated mental model... I'm inclined to say that more complex protocols should achieve their complexity by storing more information in their request and response bodies.

EDIT: No, being able to inherit response blobs is extremely useful, actually. There's a fairly common case that we already use in practice, where you receive a request, and in the course of responding to it, send other request(s), receive response(s), and want to inherit a response-blob for your final response, or even another intermediate request.
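The pattern in the EDIT — answer request R by sending a sub-request, receiving a response that carries a blob, and forwarding that blob in the final response to R — is easy to sketch. `inherit` is modeled here as simply reusing the received blob value; the types are illustrative, not the kernel's.

```rust
// Hypothetical sketch: inheriting a response's blob into the final response.
struct Response {
    body: String,
    blob: Option<Vec<u8>>,
}

/// While answering the original request, the process has made a sub-request
/// and received `sub_response`; it forwards that blob rather than its own.
fn finish_original_request(sub_response: Response) -> Response {
    Response {
        body: "final answer".to_string(),
        blob: sub_response.blob, // inherit the sub-response's blob
    }
}
```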
