responseClassifier not working with http/1.0 requests #409
Comments
Does this configuration work properly when configured without a responseClassifier? I expect that this issue isn't related to response classifiers. From the router's perspective, retrying shouldn't be any different from issuing two requests. It may be helpful to get a PCAP file (e.g. from tcpdump) to give us some more insight into what's going on between linkerd and your server.
Good question -- it "works" without responseClassifier, in the sense that 10% of requests fail with a 503 (as expected), and I don't see any ChannelClosedExceptions in the logs.
Okay, I've been able to reproduce this in a test... So we've got that going for us
Test Http/1.0 requests with and without retries.
The retry filter may issue multiple requests on the underlying service factory. This interaction needs to be managed by a FactoryToService module so that the underlying service (connection) is released to finagle between requests so that, for instance, it may be closed in the case of HTTP/1.0 requests. Fixes #409
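The interaction described in this commit message can be modelled with a small, self-contained sketch. This is a conceptual toy in Go, not Finagle's or linkerd's actual code: `Service`, `Factory`, `oneShotConn`, `factoryToService`, and `retry` are all hypothetical names, and the one-request-per-connection behaviour is an assumed stand-in for an HTTP/1.0 connection that is closed after its first response.

```go
package main

import (
	"errors"
	"fmt"
)

// Service models one connection-backed service (hypothetical interface).
type Service interface {
	Do(req string) (int, error)
	Close()
}

// Factory hands out a fresh Service on every call (hypothetical type).
type Factory func() Service

var errChannelClosed = errors.New("channel closed")

// oneShotConn models a non-persistent connection: it answers a single
// request and reports errChannelClosed for any request after that.
type oneShotConn struct {
	used   bool
	status int
}

func (c *oneShotConn) Do(req string) (int, error) {
	if c.used {
		return 0, errChannelClosed
	}
	c.used = true
	return c.status, nil
}

func (c *oneShotConn) Close() {}

// newFlakyFactory returns a Factory whose first connection answers 503
// and whose later connections answer 200.
func newFlakyFactory() Factory {
	n := 0
	return func() Service {
		n++
		if n == 1 {
			return &oneShotConn{status: 503}
		}
		return &oneShotConn{status: 200}
	}
}

// factoryToService acquires a fresh Service for each request and releases
// it afterwards, so no connection is reused across attempts.
func factoryToService(f Factory) func(string) (int, error) {
	return func(req string) (int, error) {
		svc := f()
		defer svc.Close()
		return svc.Do(req)
	}
}

// retry calls do until it returns a non-503 status without error, or
// until maxAttempts attempts have been made.
func retry(do func(string) (int, error), req string, maxAttempts int) (int, error) {
	var status int
	var err error
	for i := 0; i < maxAttempts; i++ {
		status, err = do(req)
		if err == nil && status != 503 {
			return status, nil
		}
	}
	return status, err
}

func main() {
	// Retrying on a single acquired service: every retry hits the closed connection.
	svc := newFlakyFactory()()
	fmt.Println(retry(svc.Do, "GET /", 5))

	// Retrying above the adapter: each attempt gets its own connection.
	fmt.Println(retry(factoryToService(newFlakyFactory()), "GET /", 5))
}
```

In this model, retrying directly on one acquired service uses up every attempt on the already-closed connection, while retrying above the adapter gives each attempt a fresh connection.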
* Introduce a FactoryToService module into the path stack

  The retry filter may issue multiple requests on the underlying service factory. This interaction needs to be managed by a FactoryToService module so that the underlying service (connection) is released to finagle between requests so that, for instance, it may be closed in the case of HTTP/1.0 requests.

  Fixes #409

* Move acquisition failure tracing below retries.

  Now that factory acquisition is pushed below retries, the client factory lookup always succeeds. In order to record acquisition failures, recording must be pushed below retries.

* closes aren't synchronous, so we can't rely on them to be timely in CI.

* Fixup commentary

* Move NoBrokersAvailable lifting down under the path stack

  With the prior change, NoBrokersAvailableExceptions would not be decorated with Dtab information because they are thrown at request time and not service acquisition time (due to factoryToService). This change moves NoBrokers decoration below the path stack.

* improve DstBindingFactory testing in face of factory-to-service
The build scripts assume they are executed from the root of this repo. This prevents running scripts from other locations, for example, `cd web && ../bin/go-run .`.

Modify the build scripts to work regardless of current directory.

Fixes linkerd#301

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
It appears that retries aren't working as expected for http clients that don't use persistent connections when talking to linkerd.
For example, I have the following router config:
Which forwards requests to a go http server running on port 9000. The go server is configured to respond to 10% of requests with a 503:
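The original server source is not reproduced here. As a rough stand-in, a minimal sketch of a Go server with the same observable behaviour might look like the following. It is not the reporter's code: the `shouldFail` helper is hypothetical, and failing exactly every tenth request (rather than a random 10%) is an assumption made to keep the behaviour deterministic.

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

// shouldFail reports whether the n-th request (1-based) should be failed.
// Failing every tenth request yields a 10% failure rate; the deterministic
// schedule is an assumption of this sketch.
func shouldFail(n uint64) bool {
	return n%10 == 0
}

func main() {
	var count uint64

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Count requests atomically, since handlers run concurrently.
		if shouldFail(atomic.AddUint64(&count, 1)) {
			http.Error(w, "service unavailable", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok\n"))
	})

	// Port 9000 matches the port mentioned above.
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```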
If I send 1000 http/1.1 requests to the router with curl, it uses persistent connections by default and all 1000 requests succeed.
From the router stats:
You can see that roughly 10% of client requests failed, but overall router success rate was 100%. This is confirmed by looking at requeue stats:
If instead I send http/1.0 requests that don't use persistent connections, the retry behavior does not work as expected, and I see 502 and 503 responses from the router.
The router stats show an overall router success rate of 89%:
And the requeue stats indicate that some requests were retried 4 times and still failed, which is unlikely:
The 3 502s that we received correspond to 3 ChannelClosedExceptions in the router's logs. For instance:
As far as I can tell, when a request to linkerd is made without a persistent http connection and the first response it receives from the downstream is a failure, it goes into a retry loop that fully exhausts the retry budget, and still fails the request.
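To put a number on how unlikely that is: if each attempt fails independently with probability 10% (an assumption about the server, which may not hold exactly), the chance that an initial attempt and 4 retries all fail can be estimated with a short sketch. The `probAllFail` helper is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// probAllFail returns the probability that `attempts` independent attempts
// all fail, given a per-attempt failure probability p.
func probAllFail(p float64, attempts int) float64 {
	return math.Pow(p, float64(attempts))
}

func main() {
	p := 0.10     // per-attempt failure rate of the downstream server
	attempts := 5 // 1 initial attempt + 4 retries
	requests := 1000.0

	perRequest := probAllFail(p, attempts)
	fmt.Printf("P(all %d attempts fail) = %g\n", attempts, perRequest)
	fmt.Printf("expected such requests out of %.0f = %g\n", requests, perRequest*requests)
}
```

Under that independence assumption, a request that fails after 4 retries should occur about once in 100,000 requests, so seeing any at all in a run of 1000 points to something other than bad luck.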