
Error: IncompleteMessage: connection closed before message completed #2136

Closed
fhsgoncalves opened this issue Feb 20, 2020 · 10 comments

@fhsgoncalves commented Feb 20, 2020

Hey, I'm experiencing some weird behavior with the hyper client when using https.
Sometimes my app in production fails to perform the request, but the same request works most of the time. I ran a load test locally to try to reproduce the problem, and I could: it occurs in roughly 0.02% of requests.

I guessed it could be something related to hyper-tls, so I switched to hyper-rustls, but the same problem continued to occur.
So I tried hitting the url over http instead of https, and the error went away!

The error I receive from `hyper::Client::get` is: `hyper::Error(IncompleteMessage): connection closed before message completed`.

Here is a minimal working example that reproduces the error:

Cargo.toml:

```toml
[dependencies]
hyper = "0.13"
tokio = { version = "0.2", features = ["full"] }
hyper-tls = "0.4.1"
```

src/main.rs:

```rust
use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Client, Response, Server, Uri};
use hyper_tls::HttpsConnector;

pub type HttpClient = Client<HttpsConnector<hyper::client::connect::HttpConnector>>;

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 8100));
    let client = Client::builder().build::<_, hyper::Body>(HttpsConnector::new());

    let make_service = make_service_fn(move |_| {
        let client = client.clone();
        async move { Ok::<_, Infallible>(service_fn(move |_req| handle(client.clone()))) }
    });

    let server = Server::bind(&addr).serve(make_service);

    println!("Listening on http://{}", addr);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

async fn handle(client: HttpClient) -> Result<Response<Body>, hyper::Error> {
    let url = "https://url-here"; // CHANGE THE URL HERE!

    match client.get(url.parse::<Uri>().unwrap()).await {
        Ok(resp) => Ok(resp),
        Err(err) => {
            eprintln!("{:?} {}", err, err);
            Err(err)
        }
    }
}
```

PS: replace the url value with a valid https URL. In my tests I used a small file on AWS S3.

I performed a local load test using hey:

```
$ hey -z 120s -c 150 http://localhost:8100
```

Running the test for 2 minutes (-z 120s) was enough to see some errors appearing.

Could anyone help me out? If you need more information or anything else, just let me know.
Thank you!

@seanmonstar (Member)

This is just due to the racy nature of networking.

hyper has a connection pool of idle connections, and it selected one to send your request. Most of the time, hyper will receive the server's FIN and drop the dead connection from its pool. But occasionally, a connection will be selected from the pool and written to at the same time the server is deciding to close the connection. Since hyper already wrote some of the request, it can't really retry it automatically on a new connection, since the server may have acted already.
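
For reference, a hedged sketch of such an application-level workaround (this is not something hyper does itself): retry an idempotent GET once if the first attempt fails. It reuses the `HttpClient` alias from the example above.

```rust
// Hypothetical helper, not part of hyper: retry an idempotent GET once if the
// first attempt fails, e.g. because an idle pooled connection was closed by
// the server just as it was picked up for reuse.
async fn get_with_retry(
    client: &HttpClient,
    url: Uri,
) -> Result<Response<Body>, hyper::Error> {
    match client.get(url.clone()).await {
        // The second attempt will typically run on a different (often fresh) connection.
        Err(_first_err) => client.get(url).await,
        ok => ok,
    }
}
```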

@fhsgoncalves (Author)

Hey, thank you for the swift response!

I got it! So the connection is being reused, right? Is that due to the keep-alive option?
If so, should disabling that flag, or performing a retry on the app side, solve the issue?

Also, I could not reproduce the error when requesting a url over http. I tried many times without success; I could only reproduce the issue when requesting a url over https.

If that is the reason, I should have experienced the issue with http too, right?

@fhsgoncalves (Author)

I just found that aws s3 has a default max idle timeout of 20s, while hyper's default keep_alive_timeout is 90s.

Setting the keep_alive_timeout to less than 20s on the hyper client seems to have solved the problem!
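
For reference, a minimal sketch of that change against the hyper 0.13 client builder from the example above (15s is just an example value, anything comfortably below S3's ~20s idle timeout; in later hyper versions this knob was renamed to `pool_idle_timeout`):

```rust
use std::time::Duration;
use hyper_tls::HttpsConnector;

// Drop idle pooled connections before S3's ~20s server-side idle timeout,
// so a request is never written to a connection the server is about to close.
let client = hyper::Client::builder()
    .keep_alive_timeout(Duration::from_secs(15))
    .build::<_, hyper::Body>(HttpsConnector::new());
```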

Thank you, your explanation really helped me understand why this was happening!

@fhsgoncalves (Author) commented Feb 21, 2020

I was looking at the java aws client, and I saw that they use a max-idle-timeout of 60s, but there is a second property called validate-after-inactivity (5s by default) that allows the idle timeout to be that high.
Looking at the code, I saw that the http client they use supports this behavior.

Would it be possible to implement the same behavior in hyper? Does it make sense? 😄

@seanmonstar (Member)

I believe the "revalidation" it does is to poll that the connection is readable. In hyper, we already register for when the OS discovers the connection has hung up. The race would still exist if the "revalidation" happened at the same time the server was closing.

Rudo2204 added a commit to Rudo2204/rpl that referenced this issue Jun 9, 2021
```
Error: Request Error when talking to qbittorrent: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed

Caused by:
    0: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed
    1: connection closed before message completed
```
Issue: hyperium/hyper#2136
@ronanyeah

Anyone getting this with reqwest, try this:

```rust
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()?;
```

wyyerd/stripe-rs#172

@Rudo2204

Well, I tried to use `.pool_max_idle_per_host(0)` and I still got this error today.

@loyd commented Oct 28, 2021

Doesn't hyper take the Keep-Alive header into account?

I've faced this problem with the ClickHouse HTTP crate; however, ClickHouse sends `Keep-Alive: timeout=3`, so I don't understand why hyper doesn't handle it.

@seanmonstar, any ideas?

im-0 added a commit to im-0/solana that referenced this issue Aug 26, 2022
Setting pool idle timeout to a value smaller than watchtower's poll
interval can fix following error:

	[2022-08-25T04:03:22.811160892Z INFO  solana_watchtower] Failure 1 of 3: solana-watchtower testnet: Error: rpc-error: error sending request for url (https://api.testnet.solana.com/): connection closed before message completed

It looks like this happens because either RPC servers or ISPs drop HTTP
connections without properly notifying the client in some cases.

Similar issue: hyperium/hyper#2136.
im-0 added a commit to im-0/solana that referenced this issue Sep 16, 2022
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022
mergify bot pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022 (cherry picked from commit 798975f)
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 17, 2022 (cherry picked from commit 798975f)
flomonster added a commit to OpenRailAssociation/osrd that referenced this issue Aug 11, 2023
This fix is linked to this hyper issue hyperium/hyper#2136
It can't be reproduced locally and is a frequent occurrence in the CI.
Note: It might be a better fix than this one.
BlackDex added a commit to BlackDex/vaultwarden that referenced this issue Aug 13, 2023
Some optimizations in regards to downloading favicons.

I also encountered some issues with accessing some sites where the
connection got dropped or closed early. This seems to be a reqwest/hyper
thing, hyperium/hyper#2136. This is now also
fixed.

General:

- Decreased struct size
- Decreased memory allocations
- Optimized the tokenizer a bit more to only emit tags when all attributes are there and valid.

reqwest/hyper connection issue:
The following changes helped solve the connection issues to some sites.
The end result is that some icons can now always be downloaded instead of only sometimes.

- Enabled some extra reqwest features, `deflate` and `native-tls-alpn`
  (which do not bring in any extra crates since other crates already enable them, but they were not active for Vaultwarden itself)
- Configured reqwest to have a max amount of idle pool connections per host
- Configured reqwest to time out idle connections after 10 seconds
BlackDex added a commit to BlackDex/vaultwarden that referenced this issue Aug 13, 2023
github-merge-queue bot pushed a commit to OpenRailAssociation/osrd that referenced this issue Aug 14, 2023
@joleeee commented Nov 19, 2023

I think it's unexpected for most people that this isn't automatically retried. If I ask the library to get a website for me, I expect it not to fail just because the keep-alive timed out. If it uses keep-alive by default, it should also be able to handle it properly, right?

Am I misunderstanding anything here? I think closing this as completed is misleading :-)

digizeph added a commit to bgpkit/bgpkit-broker that referenced this issue Nov 19, 2023
@cschramm commented Nov 20, 2023

> it should also be able to handle it properly?

It's just not possible conceptually, is it? See #2136 (comment)

Rather, the application developer has to decide whether the request should be retried, e.g. if it's a well-behaved GET request, or whether it's visible at the application level that the request did or did not have the desired effect yet.

Side note: there seem to be some weird servers that silently time out connections, meaning they do not close the connection when their timeout is reached but unconditionally close it as soon as it gets reused later. While you can counteract that with a suitable `pool_idle_timeout`, I think it would be possible for the client to trigger that behavior before sending an actual request. It would still be as racy as any connection if it does not trigger it, though.
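
For reqwest users, a minimal sketch of that counter-measure (the 10 second value is just an assumed example; pick something shorter than the server's own idle timeout):

```rust
use std::time::Duration;

// Expire idle pooled connections before the server's silent timeout, so a
// request never lands on a connection the server will drop on first reuse.
let client = reqwest::Client::builder()
    .pool_idle_timeout(Duration::from_secs(10))
    .build()?;
```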

daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Feb 19, 2024
### Description

Applies the fix in
#2384 everywhere
an `HttpClient` is constructed via rusoto.

It lowers the S3 timeout to 15s based on tips in [this
thread](hyperium/hyper#2136 (comment)),
to avoid `Error during dispatch: connection closed before message
completed` errors. Note that we'll probably still run into these issues,
but less frequently
([source](rusoto/rusoto#1766 (comment))).


ltyu pushed a commit to ltyu/hyperlane-monorepo that referenced this issue Mar 13, 2024
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue May 30, 2024
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Jun 4, 2024 (backport of #3283)