-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete TransportCallbacks
and use RequestHandler
trait instead
#992
Delete TransportCallbacks
and use RequestHandler
trait instead
#992
Conversation
See private-attribution#987 for motivation. I had to decide whether I want to use dynamic dispatch vs clunky HTTP interfaces with another generic parameter propagated through the entire stack. I don't have a conslusive answer which way is better, both have significant downsides. Problems with DD approach that is proposed in this change: * Hard to keep `RequestHandler` trait object safe. No generics for `handle` method, use of `async_trait` etc. That removes the opportunity for some optimizations, namely using a trait to pass data down to the handler. It could be better if HTTP layer just passes the same structs it gets from HTTP layer without an extra conversion that must occur if dynamic dispatch is used. * Non zero-cost abstraction. To get data back from the handler, we have to use the same format, right now it is JSON but I doubt we can do better than binary serialization, which means more work to get the data out. * `Box<dyn Trait<....` is everywhere now. Problems with static dispatch (I will link a commit) is more code that requires a change. It is also not clear whether we can make it a zero-cost abstraction. It is mentioned in private-attribution#987 but I will reiterate it here that the reason for the intermediate layer data representation (betweeen HTTP and transport) is to support various delivery channels for IPA, that could potentially include something like CF workers. We don't seem to have an opportunity to rely on our network layer being HTTP in the long term.
e05f98e - version of |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #992 +/- ##
========================================
Coverage 89.65% 89.65%
========================================
Files 168 167 -1
Lines 22828 23012 +184
========================================
+ Hits 20467 20632 +165
- Misses 2361 2380 +19 ☔ View full report in Codecov by Sentry. |
/// ## Performance | ||
/// This implementation is far from being optimal. Between HTTP and transport layer, there exists | ||
/// one round of serialization and deserialization to properly represent the types. It is not critical | ||
/// to address, because MPC helpers have to handle a constant number of requests per query. Note | ||
/// that all requests tagged with [`crate::helpers::transport::RouteId::Records`] are not routed | ||
/// through [`RequestHandler`], so there is no penalty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this situation might improve somewhat if we stopped using axum. At somewhat increased risk of returning an invalid response, we can just give the transport layer a Vec<u8>
and a content type, and have it return those without considering whether the bytes are valid for the claimed content type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to have type safety. One of the challenges I faced before was lack of clear contract between API client and server. Here the situation is aggravated by having extra layers that don't support strong type safety. This is somewhat mitigated by e2e tests but it is easy to miss, if API is never hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely we should have type safety (in the sense of checking requests and responses against a well-defined schema) somewhere on the datapath. But also, somewhere along the path the data has to be in a raw form. What I was trying to suggest is that we think about where the transitions are and which communication layers are dealing with raw form vs. type-safe parsed form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me create an issue for that
self: &Arc<Self>, | ||
mut callbacks: L::Handler, | ||
handler: Option<Box<dyn RequestHandler<Identity = I>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this should be optional in the dynamic case, but required in the static case? (Can we not substitute the panicking handler in the cases where this would otherwise be None
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably need to be consistent and either use PanickingHandler
or Option
. I decided that latter is better suited to indicate that there is no handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that Option
probably makes more sense that PanickingHandler
, although if there's just a few test cases that want to omit a handler or something like that, then maybe better to use a stub in those cases than make everybody else deal with an Option
.
In any case, I don't have a strong opinion either way, I was just curious why it was different than the static case.
Sanitize failure was legit - so glad that we have these. I introduced a leak by having two Arc pointers pointing to each other. Resolving it was not trivial and introduced a bunch of new structs that help manage owning and non-owning ends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'm okay with either the static or dynamic version -- I think the cost of dynamic dispatch here is negligible, but the type parameters in the static version also seem within reason. Either one streamlines the transport implementations.
/// Cleans up the `records_stream` collection after drop to ensure this transport | ||
/// can process the next query even in case of a panic. | ||
struct ClearOnDrop { | ||
/// | ||
/// This implementation is a poor man's safety net and only works because we run | ||
/// one query at a time and don't use query identifiers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue to improve this? Besides the issues you note, it seems like it might belong in the app or query processor rather than in (a particular implementation of) the transport layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure where I want to put this - it feels like transport internal state must be self-managed. We currently share the same transport instance across all queries and therefore having a need to clean up the internal state. As anything can panic at any moment, this state management is repetitive and prone to errors.
Maybe a better model would create a transport per query and then any panic will cause thread to abort, gateway and request handler to be dropped and transport destroyed.
/// ## Performance | ||
/// This implementation is far from being optimal. Between HTTP and transport layer, there exists | ||
/// one round of serialization and deserialization to properly represent the types. It is not critical | ||
/// to address, because MPC helpers have to handle a constant number of requests per query. Note | ||
/// that all requests tagged with [`crate::helpers::transport::RouteId::Records`] are not routed | ||
/// through [`RequestHandler`], so there is no penalty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely we should have type safety (in the sense of checking requests and responses against a well-defined schema) somewhere on the datapath. But also, somewhere along the path the data has to be in a raw form. What I was trying to suggest is that we think about where the transitions are and which communication layers are dealing with raw form vs. type-safe parsed form.
self: &Arc<Self>, | ||
mut callbacks: L::Handler, | ||
handler: Option<Box<dyn RequestHandler<Identity = I>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that Option
probably makes more sense that PanickingHandler
, although if there's just a few test cases that want to omit a handler or something like that, then maybe better to use a stub in those cases than make everybody else deal with an Option
.
In any case, I don't have a strong opinion either way, I was just curious why it was different than the static case.
See #987 for motivation. I had to decide whether I want to use dynamic dispatch vs clunky HTTP interfaces with another generic parameter propagated through the entire stack. I don't have a conslusive answer which way is better, both have significant downsides.
Problems with DD approach that is proposed in this change:
RequestHandler
trait object safe. No generics forhandle
method, use ofasync_trait
etc. That removes the opportunity for some optimizations, namely using a trait to pass data down to the handler. It could be better if HTTP layer just passes the same structs it gets from HTTP layer without an extra conversion that must occur if dynamic dispatch is used.Box<dyn Trait<....
is everywhere now.Problems with static dispatch (I will link a commit) is more code that requires a change. It is also not clear whether we can make it a zero-cost abstraction.
It is mentioned in #987 but I will reiterate it here that the reason for the intermediate layer data representation (betweeen HTTP and transport) is to support various delivery channels for IPA, that could potentially include something like CF workers. We don't seem to have an opportunity to rely on our network layer being HTTP in the long term.