Proxy does not honor request cancellation #986

Closed
siggy opened this issue May 23, 2018 · 10 comments

siggy commented May 23, 2018

Using the lifecycle test environment (introduced in https://github.com/runconduit/conduit-examples/pull/41), we're observing a gradual throughput drop from 3000 qps at a 100% success rate down to 0 qps.

It's unclear whether this issue lies in the proxy, test environment, or some interaction with Kubernetes.

Setup

Test environment topology:

slow_cooker ->
  HTTP 1.1 ->
    bb point-to-point ->
      gRPC ->
        bb terminus

The bb terminus server exits after one minute's worth of requests, causing the gRPC connection to fail. Kubernetes then restarts the container in the existing pod.
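
For context, the terminus behaves roughly as in this sketch (an illustration only; bb itself is a separate Go binary, and the names here are hypothetical): it counts requests and shuts its gRPC server down once the terminate-after limit is reached, after which Kubernetes restarts the container in the same pod.

use std::sync::atomic::{AtomicU64, Ordering};

// Illustration of the terminus lifecycle, not bb's actual code: serve requests
// until a terminate-after limit is hit, then stop the listener so the process
// exits and Kubernetes restarts the container.
struct Terminus {
    handled: AtomicU64,
    terminate_after: u64, // randomized between 55 and 65 in this test
}

impl Terminus {
    // Returns the configured response text and whether the server should now shut down.
    fn handle(&self) -> (&'static str, bool) {
        let n = self.handled.fetch_add(1, Ordering::SeqCst) + 1;
        ("BANANA", n >= self.terminate_after)
    }
}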

Repro

These results were produced on Docker for Mac with Kubernetes.

Build Conduit off of master, currently at https://github.com/runconduit/conduit/tree/c5f0adafc8a831c0cc19b9abdf2fa1016e1942c5 (with #985 applied for a Docker build fix):

bin/docker-build

Deploy Conduit and the test environment:

bin/conduit install - | kubectl apply -f -
curl -s https://raw.githubusercontent.com/runconduit/conduit-examples/f9ad69bfadcc817e3042b9027cba37467d726cf4/lifecycle/lifecycle.yml | bin/conduit inject - | kubectl apply -f -

Logs

slow-cooker

$ kubectl -n lifecycle logs -f $(
  kubectl -n lifecycle get po --selector=job-name=slow-cooker -o jsonpath='{.items[*].metadata.name}'
) slow-cooker
# sending 100 GET req/s with concurrency=10 to http://bb-p2p.lifecycle.svc.cluster.local:8080 ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2018-05-23T06:34:57Z   2990/0/0 3000  99% 30s   7 [ 27  57  81  149 ]  149 +
2018-05-23T06:35:27Z   2893/0/0 3000  96% 30s   7 [ 26  62  92 1367 ] 1367
2018-05-23T06:35:57Z   2684/0/0 3000  89% 30s   6 [ 26  62 747 1958 ] 1958 +
2018-05-23T06:36:27Z   3000/0/0 3000 100% 30s   7 [ 27  53  80  152 ]  152
2018-05-23T06:36:57Z    816/18/0 3000  27% 30s   4 [ 30 116 10007 10039 ] 10036 +
2018-05-23T06:37:27Z   1475/19/0 3000  49% 30s   6 [ 38 110 10007 10023 ] 10018
2018-05-23T06:37:57Z   2988/0/0 3000  99% 30s   9 [ 32  96 171  278 ]  278 -
2018-05-23T06:38:27Z    626/17/0 3000  20% 30s   4 [ 17  66 10007 10039 ] 10033
2018-05-23T06:38:57Z     91/30/0 3000   3% 30s   5 [ 24 10007 10015 10023 ] 10016
...
2018-05-23T06:56:27Z      8/30/0 3000   0% 30s  13 [10007 10015 10031 10031 ] 10026
2018-05-23T06:56:57Z      7/30/0 3000   0% 30s   5 [10007 10023 10039 10039 ] 10037
...
2018-05-23T07:06:27Z      0/30/0 3000   0% 30s 10000 [10007 10015 10071 10071 ] 10069
2018-05-23T07:06:57Z      0/30/0 3000   0% 30s 10000 [10007 10023 10087 10087 ] 10080

proxy log in bb-p2p pod

$ kubectl -n lifecycle logs -f po/bb-p2p-578cf865cb-vgspz conduit-proxy
INFO conduit_proxy using controller at HostAndPort { host: DnsName(Name("proxy-api.conduit.svc.cluster.local")), port: 8086 }
INFO conduit_proxy routing on V4(127.0.0.1:4140)
INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
INFO conduit_proxy serving Prometheus metrics on V4(0.0.0.0:4191)
INFO conduit_proxy protocol detection disabled for inbound ports {25, 3306}
INFO conduit_proxy protocol detection disabled for outbound ports {25, 3306}
INFO conduit_proxy::transport::connect "controller-client", DNS resolved DnsName(Name("proxy-api.conduit.svc.cluster.local")) to 10.98.179.79
ERR! conduit_proxy turning operation timed out after 10s into 500
ERR! conduit_proxy turning operation timed out after 10s into 500
ERR! conduit_proxy turning operation timed out after 10s into 500
ERR! conduit_proxy turning operation timed out after 10s into 500

tail of bb-p2p proxy with --proxy-log-level debug,conduit_proxy=debug:
https://gist.github.com/siggy/874ea90f5f4c35ec7260b835bef82625

proxy log in bb terminus pod

$ kubectl -n lifecycle logs -f po/bb-terminus-748dddbb75-6hm4g conduit-proxy
INFO conduit_proxy using controller at HostAndPort { host: DnsName(Name("proxy-api.conduit.svc.cluster.local")), port: 8086 }
INFO conduit_proxy routing on V4(127.0.0.1:4140)
INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
INFO conduit_proxy serving Prometheus metrics on V4(0.0.0.0:4191)
INFO conduit_proxy protocol detection disabled for inbound ports {25, 3306}
INFO conduit_proxy protocol detection disabled for outbound ports {25, 3306}
INFO conduit_proxy::transport::connect "controller-client", DNS resolved DnsName(Name("proxy-api.conduit.svc.cluster.local")) to 10.98.179.79
WARN conduit_proxy::bind connect error to Endpoint { address: V4(10.1.15.228:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
WARN conduit_proxy::bind connect error to Endpoint { address: V4(10.1.15.228:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
WARN conduit_proxy::bind connect error to Endpoint { address: V4(10.1.15.228:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)

tail of bb terminus proxy with --proxy-log-level debug,conduit_proxy=debug:
https://gist.github.com/siggy/654a54be40175783f3d45c86933292ec

Baseline

Confirmed that without proxy injection, this issue does not arise.

cat lifecycle.yml | kubectl apply -f -
$  kubectl -n lifecycle logs -f $(
>   kubectl -n lifecycle get po --selector=job-name=slow-cooker -o jsonpath='{.items[*].metadata.name}'
> ) slow-cooker
# sending 100 GET req/s with concurrency=10 to http://bb-p2p.lifecycle.svc.cluster.local:8080 ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2018-05-23T22:21:14Z   2500/490/0 3000  83% 30s   0 [  7  18  34  102 ]  102 +
2018-05-23T22:21:44Z   2920/0/0 3000  97% 30s   1 [  7  18  44  987 ]  987
2018-05-23T22:22:14Z   2896/0/0 3000  96% 30s   1 [  9  66 133  201 ]  201
2018-05-23T22:22:44Z   2988/0/0 3000  99% 30s   1 [ 11  53 100  256 ]  256
2018-05-23T22:23:14Z   2992/0/0 3000  99% 30s   1 [  8  32  84  199 ]  199
2018-05-23T22:23:44Z   2999/0/0 3000  99% 30s   2 [  8  32  88  226 ]  226
2018-05-23T22:24:14Z   3004/0/0 3000 100% 30s   1 [  7  16  27   42 ]   42
2018-05-23T22:24:44Z   3004/0/0 3000 100% 30s   2 [  7  16  37   63 ]   63
2018-05-23T22:25:14Z   1590/1410/0 3000  53% 30s   0 [  5  14  25   50 ]   50
...
2018-05-23T22:30:14Z   1010/1990/0 3000  33% 30s   0 [  4  11  30   39 ]   39
2018-05-23T22:30:44Z    740/2260/0 3000  24% 30s   0 [  3  11  27   41 ]   41

Note that some degradation does occur, most likely due to https://github.com/runconduit/conduit-examples/issues/42.


olix0r commented May 24, 2018

I suspect that this is related to #952.

Everything goes fine in the bb-terminus proxy until a connect error is encountered:

TRCE conduit_proxy::inbound recognize local=10.1.1.61:4143 orig=Some(V4(10.1.1.61:9090))
TRCE conduit_proxy::inbound recognize key=Some((V4(10.1.1.61:9090), Http2))
WARN conduit_proxy::bind connect error to Endpoint { address: V4(10.1.1.61:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
DBUG conduit_proxy::bind bind_stack endpoint=Endpoint { address: V4(10.1.1.61:9090), dst_labels: None }, protocol=Http2
DBUG conduit_proxy::bind bind_stack endpoint=Endpoint { address: V4(10.1.1.61:9090), dst_labels: None }, protocol=Http2
DBUG conduit_proxy::bind bind_stack endpoint=Endpoint { address: V4(10.1.1.61:9090), dst_labels: None }, protocol=Http2
...

This log message prints repeatedly, and it appears that the endpoint is never properly re-bound.


olix0r commented May 24, 2018

Ah hah! It appears to me that the proxy is doing the right thing: there is nothing accepting connections on port 9090:

:; kubectl exec -n lifecycle bb-terminus-dbc9c8996-f4qlq -c bb-terminus -it bash 
root@bb-terminus-dbc9c8996-f4qlq:/# curl localhost:9090
curl: (7) Failed to connect to localhost port 9090: Connection refused

Checking the container's logs, it appears that nothing is logged after the container was supposed to have restarted:

time="2018-05-24T22:52:42Z" level=info msg="TerminateAfter limit hit (61), stopping [terminus-grpc:9090-h1:-1]"
time="2018-05-24T22:52:42Z" level=info msg="Stopping service [terminus-grpc:9090-h1:-1] due to handler"
time="2018-05-24T22:52:42Z" level=info msg="Shutting down [grpc-9090]"
time="2018-05-24T22:52:42Z" level=info msg="Received gRPC request [in:http-sid:point-to-point-channel-grpc:-1-h1:8080-442218966] [requestUID:\"in:http-sid:point-to-point-channel-grpc:-1-h1:8080-442218966\" ] Returning response [requestUID:\"in:http-sid:point-to-point-channel-grpc:-1-h1:8080-442218966\" payload:\"BANANA\" ]"

Though it does appear the container is thought to be running:

  bb-terminus:
    Container ID:  docker://0255acbba2efcc3989dea088defa790aa4064dd7cc6bc2c52ed5237afdd29718
    Image:         buoyantio/bb:v0.0.3
    Image ID:      docker-pullable://buoyantio/bb@sha256:c6ba300939a944ece25a005d736500551ed213d1a7a0169250bcbc069c3990e3
    Port:          9090/TCP
    Command:
      /bin/bash
    Args:
      -c
      exec \
/out/bb terminus \
--grpc-server-port=9090 \
--response-text=BANANA \
--terminate-after=$(shuf -i 55-65 -n1)

    State:          Running
      Started:      Thu, 24 May 2018 22:52:41 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 24 May 2018 22:52:23 +0000
      Finished:     Thu, 24 May 2018 22:52:25 +0000
    Ready:          True
    Restart Count:  2
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-z5glz (ro)


olix0r commented May 24, 2018

This configuration appears to reproduce the buggy behavior without the proxy in the loop: we run slow-cooker, bb-p2p, and bb-terminus all in a single pod, and slow-cooker's throughput drops similarly.

I'm going to close this Conduit issue, since this doesn't seem worth pursuing further here. Please reopen this if we find something that indicates a Conduit issue.

olix0r closed this as completed May 24, 2018

siggy commented May 25, 2018

A slight modification to the localtest config, with services added, recovers after the container restarts: https://gist.github.com/siggy/3e353bfd086f167895c723282dacf065

Adding services may have been a red herring: a successful run with either gist should result in a CrashLoopBackOff, indicating the proxy has received traffic across several container restarts.

siggy reopened this May 25, 2018

olix0r commented May 25, 2018

As siggy mentioned, the nature of the repro has made it hard to nail this down exactly.

When a container restarts, the proxy attempts to rebind the client to that endpoint. This appears to work more or less as intended: the terminus container completes its work and exits, and when Kubernetes restarts the process, the proxy reconnects properly. Sometimes, however, the proxy keeps attempting to reconnect to the local container and continually gets connection-refused errors.

I suspected that, in this situation, the terminus container is unable to start because the proxy is consuming CPU too aggressively while attempting to reconnect. To validate this theory, I created a branch that adds a delay to the reconnect logic so that the terminus container has a better shot at getting on its feet. This does not resolve the issue; the proxy can still get into situations where it continually receives an error:

DBUG conduit_proxy::bind connect error to Endpoint { address: V4(10.1.1.110:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
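
The branch itself isn't linked here, but the change is roughly of this shape (a standalone sketch assuming tokio, not the proxy's actual reconnect code): wait briefly between connect attempts so a restarting container has a chance to begin listening again.

use std::time::Duration;
use tokio::{net::TcpStream, time::sleep};

// Sketch only: retry the local endpoint with a short delay between attempts,
// instead of hammering it with back-to-back connects while it restarts.
async fn connect_with_delay(addr: &str, max_attempts: u32) -> std::io::Result<TcpStream> {
    let mut attempt = 0;
    loop {
        match TcpStream::connect(addr).await {
            Ok(stream) => return Ok(stream),
            // e.g. "Connection refused (os error 111)" while the container restarts
            Err(_e) if attempt + 1 < max_attempts => {
                attempt += 1;
                sleep(Duration::from_millis(100 * u64::from(attempt))).await;
            }
            Err(e) => return Err(e),
        }
    }
}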

Looking at the terminus container, its status is reported as:

    State:          Running
      Started:      Fri, 25 May 2018 14:46:55 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 25 May 2018 14:46:22 +0000
      Finished:     Fri, 25 May 2018 14:46:42 +0000

And in the logs I see:

time="2018-05-25T14:47:15Z" level=info msg="TerminateAfter limit hit (60), stopping [terminus-grpc:9090-h1:-1]"
time="2018-05-25T14:47:15Z" level=info msg="Stopping service [terminus-grpc:9090-h1:-1] due to handler"
time="2018-05-25T14:47:15Z" level=info msg="Shutting down [grpc-9090]"

This seems to indicate that the terminus application is still running but no longer accepting connections! The process started at 14:46:55 and logged that it was shutting down at 14:47:15, but it hasn't actually shut down!

Here's the configuration I'm using: https://gist.github.com/olix0r/c87538ca14ca93bb5f940780079edcfa


olix0r commented May 25, 2018

I cannot reproduce this with HTTP/1, but I can reproduce it with HTTP/2.

It appears that there is a pending request that prevents the server from terminating:

request_total{authority="server.bb.svc.cluster.local:9090",direction="inbound"} 181
response_total{authority="server.bb.svc.cluster.local:9090",direction="inbound",classification="success",status_code="200",grpc_status_code="0"} 180


olix0r commented May 25, 2018

The server sees only one pending request:

:; kubectl exec -n bb -c conduit-proxy server-855d64458b-28pnb -- curl -s localhost:4191/metrics |rg -v '^#' |rg -e request_total -e response_total      
request_total{authority="server.bb.svc.cluster.local:9090",direction="inbound"} 181
response_total{authority="server.bb.svc.cluster.local:9090",direction="inbound",classification="success",status_code="200",grpc_status_code="0"} 180

The middle tier sees 27 (and growing) pending requests:

:; kubectl exec -n bb -c conduit-proxy middle-7549b6d8b8-c5f2z -- curl -s localhost:4191/metrics |rg -v '^#' |rg -e request_total -e response_total
request_total{authority="middle.bb.svc.cluster.local:8080",direction="inbound"} 220
request_total{authority="server.bb.svc.cluster.local:9090",direction="outbound",dst_conduit_io_control_plane_ns="conduit",dst_deployment="server",dst_namespace="bb",dst_pod="server-855d64458b-28pnb",dst_pod_template_hash="4118200146",dst_service="server"} 207
response_total{authority="middle.bb.svc.cluster.local:8080",direction="inbound",classification="failure",status_code="500"} 39
response_total{authority="server.bb.svc.cluster.local:9090",direction="outbound",dst_conduit_io_control_plane_ns="conduit",dst_deployment="server",dst_namespace="bb",dst_pod="server-855d64458b-28pnb",dst_pod_template_hash="4118200146",dst_service="server",classification="success",status_code="200",grpc_status_code="0"} 180
response_total{authority="middle.bb.svc.cluster.local:8080",direction="inbound",classification="success",status_code="200"} 180


olix0r commented May 25, 2018

Removing the proxy from either the middle or server tiers seems to resolve the behavior. When both the middle and server tiers have the proxy, this bug is exhibited.

Fairly reliably, this occurs on the 3rd iteration of the server (i.e. after 2 restarts).

Shortly before the test breaks, we see the terminus server send GoAway frames to the proxy:

TRCE h2::proto::connection recv GOAWAY; frame=GoAway { last_stream_id: StreamId(2147483647), error_code: NO_ERROR }
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::connection recv PING; frame=Ping { ack: false, payload: [1, 6, 1, 8, 0, 3, 3, 9] }
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::streams::prioritize poll_complete
TRCE h2::proto::streams::prioritize schedule_pending_open
TRCE h2::proto::streams::prioritize pop_frame
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::connection recv GOAWAY; frame=GoAway { last_stream_id: StreamId(121), error_code: NO_ERROR }
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::streams::prioritize poll_complete
TRCE h2::proto::streams::prioritize schedule_pending_open
TRCE h2::proto::streams::prioritize pop_frame
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::connection recv HEADERS; frame=Headers { stream_id: StreamId(121), stream_dep: None, flags: HeadersFlag { end_stream: false, end_headers: true, padded: false, priority: false } }
TRCE h2::proto::streams::flow_control inc_window; sz=65535; old=0; new=65535
TRCE h2::proto::streams::flow_control inc_window; sz=65535; old=0; new=65535
TRCE h2::proto::streams::streams recv_headers; stream=StreamId(121); state=State { inner: Idle }
TRCE h2::proto::streams::recv opening stream; init_window=65535
TRCE h2::proto::streams::store Queue::push
TRCE h2::proto::streams::store  -> first entry
TRCE h2::proto::streams::counts transition_after; stream=StreamId(121); state=State { inner: Open { local: AwaitingHeaders, remote: Streaming } }; is_closed=false; pending_send_empty=true; buffered_send_data=0; num_recv=1; num_send=0
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::connection recv DATA; frame=Data { stream_id: StreamId(121), flags: DataFlags { end_stream: true }, pad_len: None }
TRCE h2::proto::streams::recv recv_data; size=67; connection=61515; stream=65535
TRCE h2::proto::streams::flow_control send_data; sz=67; window=61515; available=65535
TRCE h2::proto::streams::flow_control send_data; sz=67; window=65535; available=65535
TRCE h2::proto::streams::state recv_close: Open => HalfClosedRemote(AwaitingHeaders)
TRCE h2::proto::streams::counts transition_after; stream=StreamId(121); state=State { inner: HalfClosedRemote(AwaitingHeaders) }; is_closed=false; pending_send_empty=true; buffered_send_data=0; num_recv=1; num_send=0
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::streams::prioritize poll_complete
TRCE h2::proto::streams::prioritize schedule_pending_open
TRCE h2::proto::streams::prioritize pop_frame
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::streams::streams next_incoming; id=StreamId(121), state=State { inner: HalfClosedRemote(AwaitingHeaders) }
TRCE conduit_proxy::inbound recognize local=10.1.1.165:4143 orig=Some(V4(10.1.1.165:9090))
TRCE conduit_proxy::inbound recognize key=Some((V4(10.1.1.165:9090), Http2))
TRCE h2::proto::settings send_pending_ack; pending=None
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE h2::proto::streams::prioritize poll_complete
TRCE h2::proto::streams::prioritize schedule_pending_open
TRCE h2::proto::streams::prioritize pop_frame
TRCE h2::proto::streams::prioritize try reclaim frame
TRCE conduit_proxy::bind poll_ready: stack already bound
TRCE conduit_proxy::bind poll_ready: stack already bound
WARN conduit_proxy::bind connect error to Endpoint { address: V4(10.1.1.165:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)

olix0r changed the title from "investigate low throughput after gRPC connections disconnect" to "Proxy may not gracefully close all streams" May 25, 2018

olix0r commented May 25, 2018

tap helps to shed some light on this:

req id=0:19 src=10.1.1.169:46564 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:19 src=10.1.1.169:46564 dst=10.1.1.168:9090 :status=200 latency=17412µs
end id=0:19 src=10.1.1.169:46564 dst=10.1.1.168:9090 grpc-status=OK duration=939µs response-length=79B
req id=0:20 src=10.1.1.169:46564 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
req id=0:21 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:21 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=51902µs
end id=0:21 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=14018µs response-length=79B
req id=0:22 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:22 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=32939µs
end id=0:22 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=4336µs response-length=79B
req id=0:23 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:23 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=19897µs
end id=0:23 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=3229µs response-length=79B
req id=0:24 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:24 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=17146µs
end id=0:24 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=3585µs response-length=79B
req id=0:25 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:25 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=11327µs
end id=0:25 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=2187µs response-length=79B
req id=0:26 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:26 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=15840µs
end id=0:26 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=3922µs response-length=79B
req id=0:27 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:27 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=14329µs
end id=0:27 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=2384µs response-length=79B
req id=0:28 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:28 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=19303µs
end id=0:28 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=33275µs response-length=79B
req id=0:29 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:29 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=19966µs
end id=0:29 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=5574µs response-length=79B
req id=0:30 src=10.1.1.169:46814 dst=10.1.1.168:9090 :method=POST :authority=server.bb.svc.cluster.local:9090 :path=/buoyantio.bb.TheService/theFunction
rsp id=0:30 src=10.1.1.169:46814 dst=10.1.1.168:9090 :status=200 latency=25263µs
end id=0:30 src=10.1.1.169:46814 dst=10.1.1.168:9090 grpc-status=OK duration=8390µs response-length=79B

It looks like request id=0:20 never receives an end event. More investigation is needed to determine exactly when this request is routed through the server proxy.


olix0r commented May 28, 2018

Request 1

mid sends a request to srv, which connects to its application and sends the request.
The application responds and sends a GOAWAY, closing the connection. srv continues to
send the response to mid.

2018-05-28T19:14:51.052271043Z mid TRCE conduit_proxy::transparency::server server={proxy=inbound; peer=10.1.1.197:56536} transparency detected HTTP/1
2018-05-28T19:14:51.058250699Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:51.058881815Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:51.060108926Z mid TRCE conduit_proxy::control::destination server={proxy=outbound; peer=10.1.1.196:51912} resolve; authority=DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }
2018-05-28T19:14:51.093466316Z mid TRCE conduit_proxy::control::destination Out { dest: Hostname(DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }), proto: Http2 }, watch: Ok(Ready(Some(Insert(V4(10.1.1.195:9090), Metadata { metric_labels: Some(DstLabels { formatted: "dst_conduit_io_control_plane_ns=\"conduit\",dst_deployment=\"server\",dst_namespace=\"bb\",dst_pod=\"server-65d49dd9dc-b982b\",dst_pod_template_hash=\"2180588587\",dst_service=\"server\"", original: {"pod": "server-65d49dd9dc-b982b", "service": "server", "deployment": "server", "pod_template_hash": "2180588587", "namespace": "bb", "conduit_io_control_plane_ns": "conduit"} }) }))))
2018-05-28T19:14:51.098327394Z srv TRCE conduit_proxy::transparency::server server={proxy=inbound; peer=10.1.1.196:52140} transparency detected HTTP/2
2018-05-28T19:14:51.105399115Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:51.107034500Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:51.110369301Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52140} recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:51.110386780Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52140} recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:51.111280580Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52140} recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:51.115283609Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52140} recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:51.116944451Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:51.117206440Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } })
2018-05-28T19:14:51.117234632Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:51.117450564Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:51.119549784Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } })
2018-05-28T19:14:51.126360894Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:51.129624471Z srv TRCE h2::proto::connection client={proxy=inbound}, recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 66 }
2018-05-28T19:14:51.129817128Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }
2018-05-28T19:14:51.130244793Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:51.130398366Z srv TRCE h2::proto::connection client={proxy=inbound}, recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags }
2018-05-28T19:14:51.130982498Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:14:51.134926114Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52140} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:51.137176403Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52140} writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags })
2018-05-28T19:14:51.138137300Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52140} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:14:51.162318439Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:51.162774663Z srv TRCE h2::proto::connection client={proxy=inbound}, recv GOAWAY; frame=GoAway { last_stream_id: StreamId(2147483647), error_code: NO_ERROR }
2018-05-28T19:14:51.162783551Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [1, 6, 1, 8, 0, 3, 3, 9] }
2018-05-28T19:14:51.162798245Z mid TRCE h2::proto::connection client={proxy=outbound}, recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags }
2018-05-28T19:14:51.168062151Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:14:51.180433821Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:51.181867031Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags })
2018-05-28T19:14:51.182854360Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:14:51.185756724Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 78 }
2018-05-28T19:14:51.185821714Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }

Request 2

mid sends another request to srv.

As mid routes the request, it processes service discovery updates reflecting the fact
that srv's application restarted. Because the node is removed and added back immediately,
mid establishes a new connection to srv:

2018-05-28T19:14:52.047130374Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv HEADERS; frame=Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:52.050584771Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv DATA; frame=Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:52.051841616Z mid TRCE conduit_proxy::control::destination Out { dest: Hostname(DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }), proto: Http2 }, watch: Ok(Ready(Some(Remove(V4(10.1.1.195:9090)))))
2018-05-28T19:14:52.470053615Z mid TRCE conduit_proxy::control::destination Out { dest: Hostname(DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }), proto: Http2 }, watch: Ok(Ready(Some(Insert(V4(10.1.1.195:9090), Metadata { metric_labels: Some(DstLabels { formatted: "dst_conduit_io_control_plane_ns=\"conduit\",dst_deployment=\"server\",dst_namespace=\"bb\",dst_pod=\"server-65d49dd9dc-b982b\",dst_pod_template_hash=\"2180588587\",dst_service=\"server\"", original: {"pod": "server-65d49dd9dc-b982b", "service": "server", "deployment": "server", "pod_template_hash": "2180588587", "namespace": "bb", "conduit_io_control_plane_ns": "conduit"} }) }))))
2018-05-28T19:14:52.481024954Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:52.490801480Z srv TRCE conduit_proxy::transparency::server server={proxy=inbound; peer=10.1.1.196:52162} transparency detected HTTP/2
2018-05-28T19:14:52.494736315Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:52.495417989Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } })

Again, srv connects to its application and sends the request. The application responds
and sends a GOAWAY, closing the connection. srv continues to send the response to mid.

2018-05-28T19:14:52.497711748Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:52.498858277Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:52.499958200Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:52.504755095Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:52.510992604Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:52.526728635Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:52.537449895Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } })
2018-05-28T19:14:52.552733742Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:14:52.563941837Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:14:52.564048148Z srv TRCE h2::proto::connection client={proxy=inbound}, recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 66 }
2018-05-28T19:14:52.571657606Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }
2018-05-28T19:14:52.579655443Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:52.581484835Z srv TRCE h2::proto::connection client={proxy=inbound}, recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags }
2018-05-28T19:14:52.585093113Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:14:52.585180634Z srv TRCE h2::proto::connection client={proxy=inbound}, recv GOAWAY; frame=GoAway { last_stream_id: StreamId(2147483647), error_code: NO_ERROR }
2018-05-28T19:14:52.585189910Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [1, 6, 1, 8, 0, 3, 3, 9] }
2018-05-28T19:14:52.596526701Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52162} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:52.607168897Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:52.607208425Z mid TRCE h2::proto::connection client={proxy=outbound}, recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags }
2018-05-28T19:14:52.607209737Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52162} writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags })
2018-05-28T19:14:52.607240388Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:14:52.607260469Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52162} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:14:52.624208705Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:52.625716796Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Data(Data { stream_id: StreamId(3), flags: DataFlags })
2018-05-28T19:14:52.626403215Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:14:52.630189717Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 78 }
2018-05-28T19:14:52.631078667Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }
2018-05-28T19:14:53.062622642Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:14:53.074354565Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } })

Request 3

mid sends another request to srv. This time, a destination removal event hasn't been
received, so the request goes on the same connection as last time. However, Kubernetes
puts the application in CrashLoopBackOff, so the proxy can't forward the request and it
eventually times out:

2018-05-28T19:14:53.044642518Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv HEADERS; frame=Headers { stream_id: StreamId(5), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:53.047798177Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv DATA; frame=Data { stream_id: StreamId(5), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:53.062282130Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv HEADERS; frame=Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:14:53.076237679Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv DATA; frame=Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } }
2018-05-28T19:14:53.071404760Z srv WARN conduit_proxy::bind   In { addr: V4(10.1.1.195:9090), proto: Http2 }, connect error to Endpoint { address: V4(10.1.1.195:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
2018-05-28T19:15:02.071820317Z srv DBUG conduit_proxy::bind   In { addr: V4(10.1.1.195:9090), proto: Http2 }, connect error to Endpoint { address: V4(10.1.1.195:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)
2018-05-28T19:15:03.033557940Z mid ERR! conduit_proxy serve={proxy=outbound; addr=127.0.0.1:4140}, turning operation timed out after 10s into 500
2018-05-28T19:15:03.040195266Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Reset(Reset { stream_id: StreamId(3), error_code: CANCEL })
2018-05-28T19:15:03.043722011Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52162} recv RST_STREAM; frame=Reset { stream_id: StreamId(3), error_code: CANCEL }
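
That is, the mid proxy races the outbound response against a 10s deadline; when the deadline fires it returns a 500 to its caller and drops the pending HTTP/2 request, which is what produces the RST_STREAM with CANCEL seen by srv. A rough standalone sketch of that behavior (not the proxy's actual middleware; the `http`/`tokio` usage here is illustrative):

use std::time::Duration;
use tokio::time::timeout;

// Sketch: race the upstream response against a deadline. On timeout, the
// caller gets a 500 and the in-flight request future is dropped, causing the
// HTTP/2 client to reset the stream with CANCEL.
async fn dispatch<F>(respond: F) -> http::Response<&'static str>
where
    F: std::future::Future<Output = http::Response<&'static str>>,
{
    match timeout(Duration::from_secs(10), respond).await {
        Ok(rsp) => rsp,
        Err(_elapsed) => http::Response::builder()
            .status(http::StatusCode::INTERNAL_SERVER_ERROR)
            .body("operation timed out after 10s")
            .expect("static response is valid"),
    }
}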

Request 4

mid sends another request to srv. This time, the only endpoint is removed from service
discovery as we attempt to route it.

2018-05-28T19:15:03.043738152Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(5), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:15:03.047542662Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv RST_STREAM; frame=Reset { stream_id: StreamId(5), error_code: CANCEL }
2018-05-28T19:15:03.061682240Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv HEADERS; frame=Headers { stream_id: StreamId(7), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:03.066603938Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv DATA; frame=Data { stream_id: StreamId(7), flags: DataFlags { end_stream: true } }
2018-05-28T19:15:03.075574493Z mid TRCE conduit_proxy::control::destination Out { dest: Hostname(DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }), proto: Http2 }, watch: Ok(Ready(Some(Remove(V4(10.1.1.195:9090)))))

But the endpoint is restored before the bind timeout is reached. As the endpoint is
restored, a new connection is established from mid to srv and the request is issued:

2018-05-28T19:15:07.730676665Z mid TRCE conduit_proxy::control::destination Out { dest: Hostname(DnsNameAndPort { host: Name("server.bb.svc.cluster.local"), port: 9090 }), proto: Http2 }, watch: Ok(Ready(Some(Insert(V4(10.1.1.195:9090), Metadata { metric_labels: Some(DstLabels { formatted: "dst_conduit_io_control_plane_ns=\"conduit\",dst_deployment=\"server\",dst_namespace=\"bb\",dst_pod=\"server-65d49dd9dc-b982b\",dst_pod_template_hash=\"2180588587\",dst_service=\"server\"", original: {"namespace": "bb", "service": "server", "pod_template_hash": "2180588587", "deployment": "server", "conduit_io_control_plane_ns": "conduit", "pod": "server-65d49dd9dc-b982b"} }) }))))
2018-05-28T19:15:07.738428186Z srv TRCE conduit_proxy::transparency::server server={proxy=inbound; peer=10.1.1.196:52314} transparency detected HTTP/2
2018-05-28T19:15:07.743387003Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:15:07.744950182Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:07.748623048Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:15:07.752403469Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } })
2018-05-28T19:15:07.753386678Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:15:07.756102494Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:07.760171124Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags { end_stream: true } }
2018-05-28T19:15:07.761401808Z mid TRCE h2::proto::connection client={proxy=outbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }

Here's where things get weird: somehow two requests are issued to the application.
However, only the request headers are sent for the first of them; the second is sent in
full. Because the first request is never fully sent, even though the application issues a
GoAway, it does not terminate, since it's waiting for that request to complete.

Perhaps Request 3, which was canceled, is somehow queued and partially sent?

2018-05-28T19:15:08.098993188Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:08.100790694Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:08.119429334Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(0), ... }
2018-05-28T19:15:08.120500366Z srv TRCE h2::proto::connection client={proxy=inbound}, recv SETTINGS; frame=Settings { flags: SettingsFlags(1), ... }
2018-05-28T19:15:08.123763008Z srv TRCE h2::proto::streams::prioritize client={proxy=inbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } })
2018-05-28T19:15:08.127286629Z srv TRCE h2::proto::connection client={proxy=inbound}, recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 66 }
2018-05-28T19:15:08.127964826Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }
2018-05-28T19:15:08.128894074Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:08.131167852Z srv TRCE h2::proto::connection client={proxy=inbound}, recv DATA; frame=Data { stream_id: StreamId(3), flags: DataFlags }
2018-05-28T19:15:08.133010468Z srv TRCE h2::proto::connection client={proxy=inbound}, recv HEADERS; frame=Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:15:08.134929940Z srv TRCE h2::proto::connection client={proxy=inbound}, recv GOAWAY; frame=GoAway { last_stream_id: StreamId(2147483647), error_code: NO_ERROR }
2018-05-28T19:15:08.135516526Z srv TRCE h2::proto::connection client={proxy=inbound}, recv PING; frame=Ping { ack: false, payload: [1, 6, 1, 8, 0, 3, 3, 9] }
2018-05-28T19:15:08.154145304Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52314} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:08.161287802Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52314} writing frame=Frame::Data(Data { stream_id: StreamId(1), flags: DataFlags })
2018-05-28T19:15:08.164750313Z srv TRCE h2::proto::streams::prioritize server={proxy=inbound; peer=10.1.1.196:52314} writing frame=Frame::Headers(Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:15:08.164774155Z srv TRCE h2::proto::connection client={proxy=inbound}, recv GOAWAY; frame=GoAway { last_stream_id: StreamId(3), error_code: NO_ERROR }
2018-05-28T19:15:08.164781915Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:08.164797485Z mid TRCE h2::proto::connection client={proxy=outbound}, recv DATA; frame=Data { stream_id: StreamId(1), flags: DataFlags }
2018-05-28T19:15:08.164821074Z mid TRCE h2::proto::connection client={proxy=outbound}, recv HEADERS; frame=Headers { stream_id: StreamId(1), flags: HeadersFlag { end_stream: true } }
2018-05-28T19:15:08.176461219Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(7), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:08.176547355Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Data(Data { stream_id: StreamId(7), flags: DataFlags })
2018-05-28T19:15:08.176589472Z mid TRCE h2::proto::streams::prioritize server={proxy=outbound; peer=10.1.1.196:51912} writing frame=Frame::Headers(Headers { stream_id: StreamId(7), flags: HeadersFlag { end_stream: true } })
2018-05-28T19:15:08.182875364Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(0), size_increment: 78 }
2018-05-28T19:15:08.184368051Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv PING; frame=Ping { ack: false, payload: [2, 4, 16, 16, 9, 14, 7, 7] }
2018-05-28T19:15:08.197935769Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv HEADERS; frame=Headers { stream_id: StreamId(9), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:08.203245306Z mid TRCE h2::proto::connection server={proxy=outbound; peer=10.1.1.196:51912} recv DATA; frame=Data { stream_id: StreamId(9), flags: DataFlags { end_stream: true } }
2018-05-28T19:15:08.212146861Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Headers(Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } })
2018-05-28T19:15:08.217834309Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv HEADERS; frame=Headers { stream_id: StreamId(3), flags: HeadersFlag { end_stream: false } }
2018-05-28T19:15:08.237250796Z mid TRCE h2::proto::streams::prioritize client={proxy=outbound; addr=10.1.1.195:9090; proto=Http2}, writing frame=Frame::Data(Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } })
2018-05-28T19:15:08.249470148Z srv TRCE h2::proto::connection server={proxy=inbound; peer=10.1.1.196:52314} recv DATA; frame=Data { stream_id: StreamId(3), flags: DataFlags { end_stream: true } }
2018-05-28T19:15:08.254502064Z srv WARN conduit_proxy::bind   In { addr: V4(10.1.1.195:9090), proto: Http2 }, connect error to Endpoint { address: V4(10.1.1.195:9090), dst_labels: None }: Error attempting to establish underlying session layer: Connection refused (os error 111)


olix0r commented May 29, 2018

After some discussion with @seanmonstar, we think that tower-h2's server likely needs to be updated to handle cancellation. tower-rs/tower-h2#29
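
A minimal standalone sketch of that idea (not tower-h2's actual code; the reset-notification plumbing here is assumed): drive the application's response future, but abandon it as soon as the client resets the stream, so the request no longer counts as in-flight.

use tokio::{select, sync::oneshot};

// Sketch of the cancellation handling described above: race the handler's
// response future against a signal that fires when the client sends
// RST_STREAM, and drop the pending response if the client cancels first.
async fn serve_one<F, T, E>(respond: F, reset: oneshot::Receiver<()>) -> Option<Result<T, E>>
where
    F: std::future::Future<Output = Result<T, E>>,
{
    select! {
        res = respond => Some(res), // normal completion
        _ = reset => None,          // RST_STREAM received; abandon the response
    }
}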

olix0r changed the title from "Proxy may not gracefully close all streams" to "Proxy does not honor request cancellation" May 29, 2018
seanmonstar added a commit that referenced this issue Jun 1, 2018
This includes the changes that should detect when a client sends a `RST_STREAM`, and cancels our pending response future or streaming body.

Closes #986
khappucino pushed a commit to Nordstrom/linkerd2 that referenced this issue Mar 5, 2019
This includes the changes that should detect when a client sends a `RST_STREAM`, and cancels our pending response future or streaming body.

Closes linkerd#986
github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021