Connect Timeout confusion #832

Closed
victoraugustolls opened this issue Feb 28, 2020 · 29 comments
Labels
question Further information is requested

Comments

@victoraugustolls

victoraugustolls commented Feb 28, 2020

Hi!

I'm quite confused about the connect timeout. I've observed that some of my applications happen to throw a ConnectTimeout exception in httpx/backends/asyncio.py, in this part (lines 201, 202):

except asyncio.TimeoutError:
    raise ConnectTimeout()

What I find strange is that my applications are monitored with NewRelic and Microsoft Application Insights, and although they say a request took something like 5s, I can find the request in the origin and it took just a few milliseconds.

So my question is this: is there something in the origin application that can cause this timeout? Maybe some blocking code? Or incorrect AsyncClient creation?

Any help would be appreciated, thanks!

Note: my applications are built with FastAPI.

@florimondmanca florimondmanca added the question Further information is requested label Feb 28, 2020
@florimondmanca
Member

Interesting, thanks!

they say a request took something like 5s, I can find the request in the origin and it took just a few milliseconds.

Questions:

  • I'm not entirely sure what "the request in the origin" means exactly here — does it refer to an entire client.get() call?
  • Is the ConnectTimeout expected in your case, or is it a spurious error that's popping up and you're not sure why?

My two hypotheses so far are:

a) These monitoring backends have limited support for asyncio, or…
b) There's something (a synchronous operation?) blocking the event loop (perhaps causing the timeout in the first place?) that makes these backends unable to correctly pick up the start time of the request call.
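
For illustration, a minimal self-contained sketch of hypothesis (b), not taken from the app in question: a synchronous call that freezes the event loop can make an operation that only needs a few milliseconds trip an asyncio timeout, which is exactly the shape of a spurious ConnectTimeout.

import asyncio
import time

async def connect():
    # Simulates a "connect" that needs a few event-loop iterations (~10 ms total).
    for _ in range(10):
        await asyncio.sleep(0.001)
    return "connected"

async def blocker():
    await asyncio.sleep(0.005)
    time.sleep(2)  # a synchronous call (e.g. instrumentation) freezes the loop for 2 s

async def main():
    asyncio.ensure_future(blocker())
    try:
        print(await asyncio.wait_for(connect(), timeout=1.0))
    except asyncio.TimeoutError:
        # The 1 s deadline expired while the loop was frozen, even though
        # connect() only needed ~10 ms of actual work.
        print("spurious timeout")

asyncio.run(main())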

Would you be able to share:

  • A screenshot of the flame graph (assuming that's what NewRelic/Insights give you)?
  • A snippet of the code surrounding the request call? (ideally a minimal repro example, stripped down of any bits deemed irrelevant)

@victoraugustolls
Author

victoraugustolls commented Feb 28, 2020

Hi @florimondmanca! Thanks for the response. I should add that, after removing the newrelic dependency, I'm no longer seeing the ConnectTimeout error, so there was probably some blocking code there, but I'll wait and see if the exception happens again.

Is the ConnectTimeout expected in your case, or is it a spurious error that's popping up and you're not sure why?

Not expected :/

Right now what I'm seeing is this:
<class 'httpx.exceptions.ProtocolError'> : <ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:1999, additional_data:None>

and

[Errno 104] Connection reset by peer

and

<class 'KeyError'> : HTTPConnection(origin=Origin(scheme='https' host='[FILTERED]' port=443))

@victoraugustolls
Author

For reference, here is the HTTPX wrapper class that I use:

from httpx import AsyncClient, Response, Timeout
from opencensus.trace import execution_context
from opencensus.trace.span import SpanKind
from opencensus.trace.status import Status
from opencensus.trace.tracer import Tracer

from .call import HTTPCall


class HTTPService(object):
    __slots__ = "_timeout", "_client", "_uri"

    _timeout: Timeout
    _client: AsyncClient
    _uri: str

    def __init__(
        self,
        *,
        timeout=Timeout(connect_timeout=20, read_timeout=20, write_timeout=20),
        client: AsyncClient,
        uri: str
    ):
        self._timeout = timeout
        self._client = client
        self._uri = uri

    async def call(
        self, call: HTTPCall, client: AsyncClient = None
    ) -> Response:
        if client is None:
            client = self._client

        return await self._call(call, client)

    async def _call(self, call: HTTPCall, client: AsyncClient) -> Response:
        tracer: Tracer = execution_context.get_opencensus_tracer()
        url = self._uri + call.endpoint

        span = tracer.start_span(call.name)

        span.span_kind = SpanKind.CLIENT
        span.add_attribute("http.method", call.method)
        span.add_attribute("http.url", url)
        span.add_attribute("http.host", self._uri)
        span.add_attribute("http.route", call.endpoint)

        try:
            r: Response = await client.request(
                method=call.method,
                url=url,
                auth=call.auth,
                headers=call.headers,
                params=call.params,
                json=call.body,
            )
        except Exception as e:
            span.set_status(Status.from_exception(e))
            span.add_attribute("error", str(e))
            raise
        else:
            span.add_attribute("http.status_code", r.status_code)
            for header in call.tracing_headers:
                span.add_attribute(header, r.headers[header])
            return r
        finally:
            tracer.end_span()

@victoraugustolls
Author

victoraugustolls commented Feb 28, 2020

One more observation: when I disable HTTP/2 in a client, the number of connect timeouts to one specific host is a lot higher than when it's enabled. When enabled I have almost no timeouts, but I do have the errors described above.

@victoraugustolls
Author

Idea: maybe I'm hitting #634 and the underlying connection can't be acquired because close is hanging? I know this specific host is built with .NET Core 2.2, which (being Microsoft) could result in the same effect as the linked issue.

@victoraugustolls
Author

victoraugustolls commented Feb 29, 2020

More news: creating the client with PoolLimits(soft_limit=0, hard_limit=None) basically made most of the errors disappear 🤔
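
For reference, a minimal sketch of that client construction (httpx 0.11-era API; as I understand it, soft_limit is the number of connections kept alive in the pool, so 0 disables keep-alive):

import httpx

# soft_limit=0: no connections are kept alive in the pool;
# hard_limit=None: no cap on the number of concurrent connections.
client = httpx.AsyncClient(
    pool_limits=httpx.PoolLimits(soft_limit=0, hard_limit=None),
)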

@freis

freis commented Feb 29, 2020

@victoraugustolls that wrapper approach seems nice. Could you share what you are doing in the call file?

@victoraugustolls
Author

@freis sure:

from typing import Optional, Tuple, List


class HTTPCall(object):
    __slots__ = (
        "name",
        "method",
        "endpoint",
        "auth",
        "headers",
        "params",
        "body",
        "tracing_headers",
    )

    name: str
    method: str
    endpoint: str
    auth: Optional[Tuple[str, str]]
    headers: Optional[dict]
    params: Optional[dict]
    body: Optional[dict]
    tracing_headers: List[str]

    def __init__(
        self,
        name: str,
        method: str,
        endpoint: str,
        auth: Optional[Tuple[str, str]] = None,
        headers: Optional[dict] = None,
        params: Optional[dict] = None,
        body: Optional[dict] = None,
        tracing_headers: Optional[List[str]] = None,
    ):
        if tracing_headers is None:
            tracing_headers = []

        self.name = name
        self.method = method
        self.endpoint = endpoint
        self.auth = auth
        self.headers = headers
        self.params = params
        self.body = body
        self.tracing_headers = tracing_headers
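
To make the two classes above concrete, a hypothetical usage sketch (the URI, endpoint, and names are made up):

from httpx import AsyncClient

async def example() -> None:
    client = AsyncClient()
    service = HTTPService(client=client, uri="https://api.example.com")
    call = HTTPCall(name="get-user", method="GET", endpoint="/users/42")
    # Traces a span around the request and returns the httpx Response.
    response = await service.call(call)
    print(response.status_code)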

@victoraugustolls
Author

@florimondmanca a new error has appeared 😢

<class 'httpx.exceptions.ConnectionClosed'>: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

@freis

freis commented Mar 1, 2020

Yeah I got something similar using trio also

@victoraugustolls
Author

victoraugustolls commented Mar 1, 2020

A snippet of the code surrounding the request call? (ideally a minimal repro example, stripped down of any bits deemed irrelevant)

@florimondmanca I made a sample project to help find the issue!

https://github.com/victoraugustolls/httpx-timeout

@victoraugustolls
Author

victoraugustolls/httpx-timeout

Added an update with an endpoint for aiohttp and another for httpx. I only receive timeouts on the httpx one.

@florimondmanca
Member

@victoraugustolls Are you using the latest HTTPX release? Several issues you mentioned (ProtocolError, ConnectionClosed, etc.) remind me of past issues that should have been resolved in bug fix releases.

Are you still experiencing the spurious connect timeouts?

Also FYI — the HTTPService code looks okay. Though I see you’re still using a tracing library — are you sure the library correctly supports async execution?

@victoraugustolls
Author

victoraugustolls commented Mar 1, 2020

Are you using the latest HTTPX release?

Yes! 0.11.1

Are you still experiencing the spurious connect timeouts?

Yes I am, randomly

Though I see you’re still using a tracing library — are you sure the library correctly supports async execution?

I tried without any tracing and it had no effect. In the sample repository I removed all tracing, leaving just basic client usage, and I can reproduce the random ConnectTimeout error with stress tests, but not with aiohttp.

Also, the ConnectTimeout seems to happen only with one specific host. When I tried swapping my endpoint from async to sync and from httpx to requests, it worked flawlessly.

Another observation: if in my application I use the same client for all my HTTPServices (a total of 4), I receive the errors described above. If I use one client per HTTPService, the specific host (which is HTTP/2 enabled) gets a lot more ConnectTimeout errors and the other ones work flawlessly.

Can this be some kind of race condition inside the connection pool caused by the HTTP/2 implementation?
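
For clarity, the two configurations being compared look roughly like this (hypothetical sketch reusing the HTTPService class from earlier in the thread; the URIs stand in for the four real hosts):

from httpx import AsyncClient

uris = ["https://a.example", "https://b.example", "https://c.example", "https://d.example"]

# Configuration 1: a single client shared by all four HTTPServices.
shared = AsyncClient()
shared_services = [HTTPService(client=shared, uri=uri) for uri in uris]

# Configuration 2: one client per HTTPService.
per_client_services = [HTTPService(client=AsyncClient(), uri=uri) for uri in uris]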

@florimondmanca
Member

@victoraugustolls Are we OK to close this, now that we've tracked down a reproducible issue in #841?

@victoraugustolls
Author

I still receive timeouts with http2 disabled, but I think we can close this now and come back to it later. What do you think?

@tomchristie
Member

It'd be interesting to test httpcore against this too, since there's a bit more work in there now around e.g. ensuring that concurrent HTTP/2 requests end up on the same connection.

@victoraugustolls
Author

To help, I would gladly deploy the microservice where I see this behaviour with a version that uses httpcore, before it is officially released.

@victoraugustolls
Author

@florimondmanca one point is that in this issue I reported many types of errors, and described receiving ConnectTimeout errors that make no sense.

In #841 the errors are ReadTimeout and KeyError, which may or may not turn out to be the same problem.

@florimondmanca
Member

florimondmanca commented Mar 4, 2020

@victoraugustolls

Also, the ConnectTimeout seems to happen only with one specific host. When I tried swapping my endpoint from async to sync and from httpx to requests, it worked flawlessly.

This is a very interesting point. Should we perhaps look deeper into what we might be doing differently from Requests when establishing a connection in the sync case? Our sync implementation is currently backed by urllib3, so if you're experiencing timeouts in the sync case then there must be something Requests does that we don't — most likely a special edge case for certain kinds of hosts.

@victoraugustolls
Author

I can try making the endpoint sync again, but using the sync client instead of the requests library, to see what happens. As soon as I do this I'll post the result here!

@victoraugustolls
Author

victoraugustolls commented Mar 6, 2020

@florimondmanca @tomchristie using httpcore-interface here, the issues have been mitigated. Right now, when issuing a high volume of requests, I get a high number of Error: <class 'httpcore._exceptions.ConnectTimeout'> errors ONLY with http2 disabled on the client. It's relevant to say that with aiohttp I have no timeout errors. They seem to appear only on HTTP/1.1 with httpx.

@victoraugustolls
Author

victoraugustolls commented Mar 6, 2020

With HTTP/2 I only receive Error: <class 'h2.exceptions.TooManyStreamsError'> Max outbound streams is 100, 100 open, but that was expected, at least until this is merged.
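
Until that lands, a possible mitigation (my suggestion, not something httpx provides): cap in-flight requests below the advertised stream limit with a semaphore.

import asyncio
import httpx

semaphore = asyncio.Semaphore(100)  # h2's default max concurrent streams per connection

async def fetch(client: httpx.AsyncClient, url: str) -> httpx.Response:
    # Holding the semaphore keeps concurrent requests (and therefore
    # open HTTP/2 streams) at or below the limit.
    async with semaphore:
        return await client.get(url)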

@tomchristie
Member

They seem to appear only on HTTP/1.1 with httpx.

Right, and presumably then only because we have tighter defaults.

@victoraugustolls
Author

Right, and presumably then only because we have tighter defaults.

If you are referring to the timeout defaults, I'm using this Client and Session:

client = httpx.AsyncClient(
    http2=False,
    pool_limits=httpx.PoolLimits(soft_limit=10, hard_limit=50000)
)
session = aiohttp.ClientSession(
    timeout=aiohttp.ClientTimeout(5.0)
)
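
To also rule out timeout defaults in this comparison, httpx's timeout could be pinned explicitly to match the 5 s given to aiohttp. A sketch, assuming Timeout accepts a single default value in this httpx version:

import httpx

client = httpx.AsyncClient(
    http2=False,
    timeout=httpx.Timeout(5.0),  # assumption: one value covers connect/read/write
    pool_limits=httpx.PoolLimits(soft_limit=10, hard_limit=50000),
)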

@victoraugustolls
Author

victoraugustolls commented Mar 6, 2020

Also, I receive the following errors with HTTP/2 connections under stress:

Error: <class 'httpcore._exceptions.ProtocolError'> <ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:1999, additional_data:None>
Error: <class 'h2.exceptions.ProtocolError'> Invalid input ConnectionInputs.RECV_HEADERS in state ConnectionState.CLOSED
Error: <class 'httpcore._exceptions.ProtocolError'> <ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:1999, additional_data:None>
Error: <class 'h2.exceptions.ProtocolError'> Invalid input ConnectionInputs.RECV_HEADERS in state ConnectionState.CLOSED
Error: <class 'h2.exceptions.ProtocolError'> Invalid input ConnectionInputs.RECV_DATA in state ConnectionState.CLOSED

@florimondmanca
Member

florimondmanca commented Jun 20, 2020

Hi @victoraugustolls, is this still an issue as of 0.13? Curious whether any of the improvements from the past four months have helped reduce these spurious connect timeouts.

@victoraugustolls
Author

Hi @florimondmanca, sorry for the (very) late response. I just deployed a version of my API with the newest release and I must say it works flawlessly.

The only error I receive is this one: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE, but at this point I don't think this is a bug in httpx, rather a flaw in the remote server.

If you want, I can give more info about this error.

@tomchristie
Member

If you want, I can give more info about this error.

@victoraugustolls That'd be great, yup. Is there an easy, minimal way to reproduce it?
Even if it is a server error, it'd be helpful to dig into it and make sure we're raising an appropriate exception.
