Correlated request/response data API#93
Conversation
tombentley
left a comment
There was a problem hiding this comment.
Thanks Rob.
I left a few comments.
I also have a slightly larger point:
Currently we assume all filters are stateful (in the sense of keeping conversational state in their fields). We've sometimes talked about the value of being able to declare a filter as being stateless. And when we knew a filter was stateless we would be able to use a @Stateless (in the Netty sense) version of FilterHandler to invoke it. Having those mechanics would mean stateless filters would require constant memory, rather than memory proportional to the number of connections.
One thing compromise we might be making by doing what is proposed is that we have no way of knowing whether a filter will actually use this new mechanism or not. The proposal compromises the type safety of the builder API a bit for supporting such explicitly stateless filters in the future: Such filters would have to not use this API (in addition to not having any fields and so on). From a purely API-focussed PoV it would be a little bit nicer if we could have used static typing to avoid stateless filters having access to request builders with these stateful methods at all.
I don't really think that's feasible with the current filter API, however. Therefore what you're proposing here is OK. But it might be a good idea to call out this tradeoff explicitly in the proposal text.
|
|
||
| ### Implementation Considerations | ||
| * **Scoping:** The data is strictly scoped to the filter instance that created it. It is not shared with other filters in the chain to avoid implicit coupling. | ||
| * **Lifecycle Management:** The framework becomes responsible for "popping" the data. If a request is a zero-ack Produce request, the framework will proactively prevent or discard the stashed data to prevent leaks. If the Response Filter does not access the data, the framework will handle discarding the stashed data. |
There was a problem hiding this comment.
proactively prevent or discard
which is it?
There was a problem hiding this comment.
I've updated it to discard. If we made it eagerly throw an exception on trying to stash data for a zero-ack Produce request, then the author (presumably unaware of zero-ack's special-ness) might not encounter this problem until they've put their Filter out there in the world. It feels like discarding will cause less heartache.
| ```java | ||
| return context.requestFilterResultBuilder() | ||
| .forwardRequest(header, request) | ||
| .withContextualData(myStateObject) |
There was a problem hiding this comment.
I like the fluent API, but I'm not keen on the method name. I don't think a typical developer, even one with some knowledge of the Kafka protocol, would know what "context" meant without looking at the javadoc.
Using the word "correlation" or "correlated" would at least be a hint that this was something to do with correlationId. So perhaps something like withCorrelatedData is better?
But even that doesn't really say what it does clearly. However, that following that path suggests something like savingStateForResponse, which seems like a bit of a mouthful.
There was a problem hiding this comment.
Have updated to correlated language
Signed-off-by: Robert Young <robertyoungnz@gmail.com>
Signed-off-by: Robert Young <robertyoungnz@gmail.com>
6cbf6f0 to
0aef499
Compare
tombentley
left a comment
There was a problem hiding this comment.
Thanks @robobario, I think this looks good.
Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com> Signed-off-by: Robert Young <robertyoungnz@gmail.com>
| ## Motivation | ||
|
|
||
| Manual state management for correlated data is error-prone and introduces several risks: | ||
| * **Memory Leaks:** If a response never returns or a filter fails to "pop" the data, the internal map grows indefinitely. |
There was a problem hiding this comment.
Or even the io.kroxylicious.proxy.filter.ResponseFilter#shouldHandleResponse could even say no
|
|
||
| ### Implementation Considerations | ||
| * **Scoping:** The data is strictly scoped to the filter instance that created it. It is not shared with other filters in the chain to avoid implicit coupling. | ||
| * **Lifecycle Management:** The framework becomes responsible for "popping" the data. If a request is a zero-ack Produce request, the framework will proactively discard the stashed data to prevent leaks. If the Response Filter does not access the data, the framework will handle discarding the stashed data. |
There was a problem hiding this comment.
I'm happy with this approach, but did you consider making it an error to try to store state in a zero-ack produce-request. should it be a rejected alternative?
There was a problem hiding this comment.
I think that's a bad idea because when a filter developer forgets to gets the acks=0 case (and we've been guilty of that ourselves!) they release something which will break with an exception.
But there is a deeper question there: We're assuming that this state that we're keeping around for the response is only really using memory. But what if there are other resources in there, that are invisible to the runtime. Like a socket/connection/file descriptor? So I wonder if we should require that the data they associate should actually be Closable. That way we can guarantee that the runtime will fully clean it up even if the filter developer has forgotten that acks=0 is a thing.
Wdyt @robobario ?
| ### Implementation Considerations | ||
| * **Scoping:** The data is strictly scoped to the filter instance that created it. It is not shared with other filters in the chain to avoid implicit coupling. | ||
| * **Lifecycle Management:** The framework becomes responsible for "popping" the data. If a request is a zero-ack Produce request, the framework will proactively discard the stashed data to prevent leaks. If the Response Filter does not access the data, the framework will handle discarding the stashed data. | ||
| * **Data Structure:** The API will support a single state object. Filter authors can define a custom `Record` or `POJO` to hold multiple values, a complex Key-Value map is deemed unnecessary. |
There was a problem hiding this comment.
Do we want to say anything about stashing the request itself? A filter may stash the request, but there is no guarantee that request will carry any modifications made by filters upsteam of this one.
| **Response Path:** | ||
| The context made available to the response filter will provide access to the stashed object: | ||
| ```java | ||
| Optional<MyStateObject> data = context.correlatedData(); |
There was a problem hiding this comment.
what happens if the request filter calls this method? is it an error?
k-wall
left a comment
There was a problem hiding this comment.
couple of comments, but otherwise LGTM
Adds a proposal for a framework API to enable Filters to stash some Contextual data for a Request, that it can then access later when processing the Response.