
pd(oblivious): terminate client sync if it exceeds timeout #2887

Merged: 5 commits merged into main from grpc_timeout_params on Aug 1, 2023

Conversation

@erwanor (Member) commented on Aug 1, 2023:

This PR is part of a series of performance improvements that we are making to pd. This PR specifically:

  • terminates the compact_block_range streams if the outbound queue is full for more than 1s (see the sketch below)
  • adds a server-level timeout of 7s to all request handlers
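
For context, here's a minimal sketch of what the first change amounts to, assuming a tokio mpsc channel feeds each client's response stream; the function and channel setup here are illustrative, not the exact pd code:

use std::time::Duration;
use tokio::sync::mpsc;

/// Hypothetical worker loop feeding stream items to a single client.
/// If the client can't drain the outbound queue within 1s, we stop
/// sending and terminate the sync instead of buffering indefinitely.
async fn feed_stream<T>(tx: mpsc::Sender<T>, items: Vec<T>) {
    for item in items {
        // `send_timeout` errors if the channel stays full for the whole
        // duration (or if the receiver was dropped), ending the stream.
        if tx.send_timeout(item, Duration::from_secs(1)).await.is_err() {
            tracing::debug!("client lagging or gone; terminating sync");
            return;
        }
    }
}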

@erwanor temporarily deployed to smoke-test (August 1, 2023 19:08, with GitHub Actions)
@hdevalence (Member) commented:

Why do we want a server-level timeout for all request handlers? Won't that break the compact block subscription behavior we want?

@erwanor (Member, Author) commented on Aug 1, 2023:

I think you might be right, although in local testing, sync processes that last ~10s still work with aggressive timeouts (e.g. 1s).

My reasoning for including it: when I skimmed the grpc-timeout code, I noticed that the grpc-timeout middleware operates at the level of a client Request and applies the server-level timeout to the Response corresponding to that request. For a stream, that means the timeout covers the initial Response, but not the subsequent items that the server worker will send:

impl<S, ReqBody> Service<Request<ReqBody>> for GrpcTimeout<S>
where
    S: Service<Request<ReqBody>>,
    S::Error: Into<crate::Error>,
{
    type Response = S::Response;
    type Error = crate::Error;
    type Future = ResponseFuture<S::Future>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx).map_err(Into::into)
    }

    fn call(&mut self, req: Request<ReqBody>) -> Self::Future {
        // Parse the client-requested deadline from the `grpc-timeout`
        // header, falling back to `None` if it's missing or malformed.
        let client_timeout = try_parse_grpc_timeout(req.headers()).unwrap_or_else(|e| {
            tracing::trace!("Error parsing `grpc-timeout` header {:?}", e);
            None
        });

        // Use the shorter of the two durations, if either is set
        let timeout_duration = match (client_timeout, self.server_timeout) {
            (None, None) => None,
            (Some(dur), None) => Some(dur),
            (None, Some(dur)) => Some(dur),
            (Some(header), Some(server)) => {
                let shorter_duration = std::cmp::min(header, server);
                Some(shorter_duration)
            }
        };

        // The `sleep` future races the inner call: if it fires before the
        // first Response is produced, the request times out. Once the
        // Response (and its stream) exists, the timer no longer applies.
        ResponseFuture {
            inner: self.inner.call(req),
            sleep: timeout_duration
                .map(tokio::time::sleep)
                .map(OptionPin::Some)
                .unwrap_or(OptionPin::None),
        }
    }
}

That's why I included it in this PR, but it definitely warrants more scrutiny; maybe we should just cut it out.

@hdevalence (Member) commented:

No, I think that makes sense. Let's keep it in, but add a comment explaining that the timeout applies to the first item and not to the entire stream duration (noting that we want to support long-lived streams).
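
For reference, a sketch of how that comment might sit next to the timeout configuration, assuming pd wires the 7s timeout through tonic's transport Server; the builder function itself is illustrative:

use std::time::Duration;
use tonic::transport::Server;

// Hypothetical setup; the point is the `.timeout(...)` call and the
// comment wording requested above.
fn server_builder() -> Server {
    Server::builder()
        // NOTE: this server-level timeout bounds how long a handler may
        // take to produce its (first) response; it does not bound a
        // stream's total duration, so long-lived subscriptions like
        // `compact_block_range` keep working after the first item.
        .timeout(Duration::from_secs(7))
}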

@erwanor temporarily deployed to smoke-test (August 1, 2023 19:55, with GitHub Actions)
@erwanor erwanor merged commit b443f2c into main Aug 1, 2023
@erwanor erwanor deleted the grpc_timeout_params branch August 1, 2023 20:18
@erwanor erwanor self-assigned this Aug 1, 2023