
pd(oblivious): terminate client sync if it exceeds timeout #2887

Merged: 5 commits merged into main from grpc_timeout_params on Aug 1, 2023

Conversation

@erwanor (Member) commented on Aug 1, 2023:

This PR is part of a series of performance improvements that we are making to pd. This PR specifically:

  • terminates the compact_block_range streams if the outbound queue is full for more than 1s (see the sketch below)
  • adds a server-level timeout of 7s to all request handlers
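
For context, here's a minimal sketch of what the first change amounts to, assuming a tokio mpsc channel feeds each client's response stream; the function and channel setup here are illustrative, not the exact pd code:

use std::time::Duration;
use tokio::sync::mpsc;

/// Hypothetical worker loop feeding stream items to a single client.
/// If the client can't drain the outbound queue within 1s, we stop
/// sending and terminate the sync instead of buffering indefinitely.
async fn feed_stream<T>(tx: mpsc::Sender<T>, items: Vec<T>) {
    for item in items {
        // `send_timeout` errors if the channel stays full for the whole
        // duration (or if the receiver was dropped), ending the stream.
        if tx.send_timeout(item, Duration::from_secs(1)).await.is_err() {
            tracing::debug!("client lagging or gone; terminating sync");
            return;
        }
    }
}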

@erwanor temporarily deployed to smoke-test (August 1, 2023 19:08, with GitHub Actions)
@hdevalence (Member) commented:

Why do we want a server-level timeout for all request handlers? Won't that break the compact block subscription behavior we want?

@erwanor (Member, Author) commented on Aug 1, 2023:

I think you might be right, although in local testing, sync processes that last ~10s still work with aggressive timeouts (e.g. 1s).

My reasoning for including it: when I skimmed the grpc-timeout code, I noticed that the grpc-timeout middleware operates at the level of a client Request and applies the server-level timeout to the Response corresponding to that request. For a stream, that means the timeout covers the initial Response, but not the subsequent items that the server worker will send:

impl<S, ReqBody> Service<Request<ReqBody>> for GrpcTimeout<S>
where
    S: Service<Request<ReqBody>>,
    S::Error: Into<crate::Error>,
{
    type Response = S::Response;
    type Error = crate::Error;
    type Future = ResponseFuture<S::Future>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx).map_err(Into::into)
    }

    fn call(&mut self, req: Request<ReqBody>) -> Self::Future {
        // Parse the client-requested deadline from the `grpc-timeout`
        // header, falling back to `None` if it's missing or malformed.
        let client_timeout = try_parse_grpc_timeout(req.headers()).unwrap_or_else(|e| {
            tracing::trace!("Error parsing `grpc-timeout` header {:?}", e);
            None
        });

        // Use the shorter of the two durations, if either is set
        let timeout_duration = match (client_timeout, self.server_timeout) {
            (None, None) => None,
            (Some(dur), None) => Some(dur),
            (None, Some(dur)) => Some(dur),
            (Some(header), Some(server)) => {
                let shorter_duration = std::cmp::min(header, server);
                Some(shorter_duration)
            }
        };

        // The `sleep` future races the inner call: if it fires before the
        // first Response is produced, the request times out. Once the
        // Response (and its stream) exists, the timer no longer applies.
        ResponseFuture {
            inner: self.inner.call(req),
            sleep: timeout_duration
                .map(tokio::time::sleep)
                .map(OptionPin::Some)
                .unwrap_or(OptionPin::None),
        }
    }
}

That's why I included it in this PR, but it definitely warrants more scrutiny; maybe we should just cut it out.

@hdevalence (Member) commented:

No, I think that makes sense. Let's keep it in, but add a comment explaining that the timeout applies to the first item and not to the entire stream duration (noting that we want to support long-lived streams).
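
For reference, a sketch of how that comment might sit next to the timeout configuration, assuming pd wires the 7s timeout through tonic's transport Server; the builder function itself is illustrative:

use std::time::Duration;
use tonic::transport::Server;

// Hypothetical setup; the point is the `.timeout(...)` call and the
// comment wording requested above.
fn server_builder() -> Server {
    Server::builder()
        // NOTE: this server-level timeout bounds how long a handler may
        // take to produce its (first) response; it does not bound a
        // stream's total duration, so long-lived subscriptions like
        // `compact_block_range` keep working after the first item.
        .timeout(Duration::from_secs(7))
}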

@erwanor temporarily deployed to smoke-test (August 1, 2023 19:55, with GitHub Actions)
@erwanor erwanor merged commit b443f2c into main Aug 1, 2023
@erwanor erwanor deleted the grpc_timeout_params branch August 1, 2023 20:18
@erwanor erwanor self-assigned this Aug 1, 2023