proxy: rebind services on connect errors #952
Conversation
proxy/src/bind.rs
Outdated
// try to connect again.
match ready {
    Err(ReconnectError::Connect(err)) => {
        error!("connect error to {:?}: {}", self.endpoint, err);
While I'd rather log loudly when connect errors occur, this seems to trigger several times during the test, because of how many times it retries before the metrics scrape notices the TCP connection event.
This could be reduced from an `error!` to a `warn!`, or lower... and/or we could also apply some backoff here so as to not loop back immediately, but after a second or something. (Currently, this yields to the executor, so any other work should be polled again first, and then this will get polled again...)
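The backoff idea could be sketched like this; a minimal, self-contained example of capped exponential delay growth. The `next_backoff` helper and its constants are illustrative, not part of the proxy's code:

```rust
use std::time::Duration;

/// Hypothetical helper: compute the next delay before re-polling after a
/// connect error, doubling each time up to a cap. Names and values are
/// assumptions for illustration only.
fn next_backoff(current: Option<Duration>) -> Duration {
    const INITIAL: Duration = Duration::from_millis(100);
    const MAX: Duration = Duration::from_secs(1);
    match current {
        // First failure: start small.
        None => INITIAL,
        // Subsequent failures: double the delay, capped at MAX.
        Some(d) => std::cmp::min(d * 2, MAX),
    }
}

fn main() {
    let mut delay = None;
    for _ in 0..5 {
        let next = next_backoff(delay);
        println!("{:?}", next);
        delay = Some(next);
    }
}
```

In the proxy this delay would presumably be driven by a timer future before the service reports itself ready again, rather than by sleeping.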
I agree that this could be quite verbose/noisy in some cases. An easy-ish solution would be to log only once per consecutive error, something like:
match ready {
    Ok(..) => {
        self.logged_err = false;
    }
    Err(ReconnectError::Connect(err)) => {
        if !self.logged_err {
            warn!(...);
            self.logged_err = true;
        }
    }
}
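To check that this suggestion behaves as intended, here is a runnable sketch of the once-per-consecutive-error pattern. The `Watcher` type and its counter stand in for the proxy's service and its `warn!` output; they are illustrative names, not from the PR:

```rust
/// Minimal model of the "log once per consecutive error" suggestion.
/// All names here are hypothetical stand-ins for the proxy's code.
struct Watcher {
    logged_err: bool,
    warnings: usize, // stands in for actual warn! log lines
}

impl Watcher {
    fn new() -> Self {
        Watcher { logged_err: false, warnings: 0 }
    }

    fn observe(&mut self, ready: Result<(), &str>) {
        match ready {
            Ok(()) => {
                // A successful poll re-arms logging, so the next
                // error after a recovery is logged again.
                self.logged_err = false;
            }
            Err(_err) => {
                if !self.logged_err {
                    self.warnings += 1; // would be warn!(...) in the proxy
                    self.logged_err = true;
                }
            }
        }
    }
}

fn main() {
    let mut w = Watcher::new();
    // Three consecutive errors produce a single warning...
    for _ in 0..3 {
        w.observe(Err("connect refused"));
    }
    // ...and an intervening success re-arms the logging.
    w.observe(Ok(()));
    w.observe(Err("connect refused"));
    println!("warnings logged: {}", w.warnings); // 2
}
```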
proxy/src/bind.rs
Outdated
@@ -217,16 +229,20 @@ where
         Reconnect::new(proxy)
     }

-    pub fn new_binding(&self, ep: &Endpoint, protocol: &Protocol) -> Binding<B> {
-        if protocol.can_reuse_clients() {
+    pub fn new_bound_service(&self, ep: &Endpoint, protocol: &Protocol) -> BoundService<B> {
TIOLI: this method was called `bind_service` before being renamed to `new_bound_service`; perhaps just change it back?
/// - If there is an error in the inner service (such as a connect error), we
///   need to throw it away and bind a new service.
pub struct BoundService<B: tower_h2::Body + 'static> {
    bind: Bind<Arc<ctx::Proxy>, B>,
Seems to me that, since `Bind::new_bound_service` takes `&self`, this could be

pub struct BoundService<'a, B: tower_h2::Body + 'static> {
    bind: &'a Bind<Arc<ctx::Proxy>, B>,
    // ...
}

and then we wouldn't have to clone in `new_bound_service`?
We can't; we need to return a `'static` Service, since it will live separately.
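The constraint above can be illustrated with a small, self-contained sketch: because the returned service outlives the call (and potentially the `Bind` itself), it must own its data, so a cheap `Arc` clone is used instead of a borrow. The types below are simplified stand-ins for the PR's:

```rust
use std::sync::Arc;

// Illustrative stand-ins for the PR's Bind / BoundService types.
struct Bind {
    config: Arc<String>,
}

// The bound service is stored and driven independently of `Bind`, so it
// owns its data ('static). A `&'a Bind` field would tie its lifetime to
// the borrow and fail to compile in the scope below.
struct BoundService {
    config: Arc<String>,
}

impl Bind {
    fn new_bound_service(&self) -> BoundService {
        // Cloning the Arc bumps a refcount; it does not copy the data.
        BoundService { config: self.config.clone() }
    }
}

fn main() {
    let svc = {
        let bind = Bind { config: Arc::new("endpoint".to_string()) };
        bind.new_bound_service()
        // `bind` is dropped here; the service lives on because it owns
        // (shares ownership of) its config.
    };
    println!("{}", svc.config);
}
```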
ah, yeah, you're right, nevermind!
proxy/src/bind.rs
Outdated
@@ -90,7 +102,7 @@ pub struct NormalizeUri<S> {
     inner: S
 }

-pub type Service<B> = Binding<B>;
+pub type Service<B> = BoundService<B>;
Is this type alias still necessary? It was originally added to shorten a really long return type for what's now called `Stack<B>`.
Consider either just renaming `BoundService<B>` to `Service<B>` and removing the type alias, or changing all consumers of this API to refer to `BoundService<B>` (which seems clearer, IMHO) and removing the type alias.
Actually, on trying this out, I found that other places using types from this module preferred the prefix: `bind::Protocol`, `bind::HttpRequest`, etc. So seeing `bind::Service` actually felt clear.
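The naming trade-off can be seen in a tiny self-contained sketch: a short alias reads well when consumers qualify it with the module path. The module and types here are illustrative, not the proxy's actual definitions:

```rust
// Simplified stand-in for the proxy's bind module.
mod bind {
    pub struct BoundService<B>(pub B);

    // A short alias; call sites qualify it as `bind::Service`,
    // mirroring `bind::Protocol`, `bind::HttpRequest`, etc.
    pub type Service<B> = BoundService<B>;
}

fn main() {
    // The module prefix carries the context, so the short name is clear.
    let svc: bind::Service<u32> = bind::BoundService(7);
    println!("{}", svc.0);
}
```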
///
/// # TODO
///
/// Buffering is currently unbounded and does not apply timeouts. This must be
Thanks for removing this out of date comment! 👍
// whoever owns this service will call `poll_ready` if they
// are still interested.
task::current().notify();
Ok(Async::NotReady)
What are the merits of doing this versus looping with the new state?
I tried to make that clear in the comment, so it seems I should add to it.
If we loop in here, then we will be eagerly setting up a new connection, even if the buffer wrapping this service determines that its queue of requests has run out. This instead allows the buffer to decide whether it should loop or not.
thanks.
I think that first sentence in the comment just needs to be rearranged a bit:
// This service isn't ready yet. Instead of trying to make it ready,
// schedule the task for notification so that the caller can
// determine whether readiness is still necessary (i.e. whether
// there are still requests to be sent).
... or something like that...
looks good to me!
7328f1f
to
87dc131
Compare
Instead of having connect errors destroy all buffered requests, this changes Bind to return a service that can rebind itself when there is a connect error. It won't try to establish the new connection itself, but waits for the buffer to poll again. Combining this with changes in tower-buffer to remove canceled requests from the buffer should mean that we won't loop on connect errors forever. Signed-off-by: Sean McArthur <sean@seanmonstar.com>
87dc131
to
7a31f59
Compare
The proxy's integration tests depend on the `net2` crate, which has been deprecated and replaced by `socket2`. Since `net2` is no longer actively maintained, `cargo audit` will warn us about it, so we should replace it with `socket2`. While I was making this change, I was curious why we were manually constructing and binding these sockets at all, rather than just using `tokio::net::TcpListener::bind`. After some archaeology, I determined that this was added in linkerd/linkerd2#952, which added a test that requires a delay between when a socket is _bound_ and when it starts _listening_. `tokio::net::TcpListener::bind` (as well as the `std::net` version) perform these operations together. Since this wasn't obvious from the test code, I went ahead and moved the new `socket2` version of this into a pair of functions, with comments explaining why we didn't just use `tokio::net`. Fixes linkerd/linkerd2#4891
Instead of having connect errors destroy all buffered requests,
this changes Bind to return a service that can rebind itself when
there is a connect error.
It won't try to establish the new connection itself, but waits for
the buffer to poll again. Combining this with changes in tower-buffer
to remove canceled requests from the buffer should mean that we
won't loop on connect errors forever.
Closes #899