Sidecar sends unexpected RST packet to ingress gateway, resulting in intermittent 503s on the client side during file uploads. #36489

Closed
3 of 15 tasks
flyingyang opened this issue Dec 13, 2021 · 1 comment

Comments

@flyingyang
Contributor

Bug Description

We have a storage pod running in our cluster with Istio injection enabled, and clients see intermittent 503 UC errors while uploading 300M files.

Our topology is shown below. An ingress gateway manages all traffic entering the mesh.
[image: topology diagram]

First, we checked the Envoy access logs on the gateway and the sidecar. The gateway logged the 503 UC responses, but the sidecar logged no 503s, so we suspect the problem occurs on the connection between the ingress gateway and the sidecar.
Next, we set the Envoy log level to debug and found the following: the ingress gateway's upstream connection is terminated from the sidecar side, while the sidecar itself logs "Connection reset by peer".

  • ingress gateway envoy log
2021-12-07T08:37:28.358739Z debug   envoy http2 [C488678497] updating connection-level initial window size to 268435456
2021-12-07T08:37:28.358906Z debug   envoy http  [C488678497] new stream
2021-12-07T08:37:28.358940Z debug   envoy http  [C488678497][S10432259314075048153] request headers complete (end_stream=false):
':method', 'PUT'
':path', '/random300M_1'
':scheme', 'https'
':authority', 'test.example.com:8443'
'user-agent', 'curl/7.58.0'
'accept', '*/*'
'content-length', '314572800'
'content-type', 'application/x-www-form-urlencoded'

2021-12-07T08:37:28.359015Z debug   envoy router    [C488678497][S10432259314075048153] cluster 'outbound|443||storage-internal-svc.test.svc.kubernetes.io' match for URL '/random300M_1'
2021-12-07T08:37:28.359053Z debug   envoy router    [C488678497][S10432259314075048153] router decoding headers:
':method', 'PUT'
':path', '/random300M_1'
':scheme', 'https'
':authority', 'test.example.com:8443'
'user-agent', 'curl/7.58.0'
'accept', '*/*'
'content-length', '314572800'
'content-type', 'application/x-www-form-urlencoded'
'x-forwarded-proto', 'https'
'x-envoy-internal', 'true'
'x-request-id', 'd0784d05-2b65-4121-b309-a28fe69c91b0'
2021-12-07T08:37:28.367909Z debug   envoy router    [C488678497][S10432259314075048153] pool ready
2021-12-07T08:37:28.367956Z debug   envoy http  [C488678497][S10432259314075048153] Read-disabling downstream stream due to filter callbacks.
2021-12-07T08:37:28.430711Z debug   envoy http2 [C488678497] Stream 1 disabled, unconsumed_bytes 0 read_disable_count 0
...
2021-12-07T08:37:29.800316Z debug   envoy http  [C488678497][S10432259314075048153] Read-enabling downstream stream due to filter callbacks.
2021-12-07T08:37:29.800325Z debug   envoy http2 [C488678497] Stream 1 enabled, unconsumed_bytes 239943680 read_disable_count 1
2021-12-07T08:37:29.800409Z debug   envoy router    [C488678497][S10432259314075048153] upstream reset: reset reason: connection termination, transport failure reason:
2021-12-07T08:37:29.800486Z debug   envoy http  [C488678497][S10432259314075048153] Sending local reply with details upstream_reset_before_response_started{connection termination}
2021-12-07T08:37:29.800523Z debug   envoy http  [C488678497][S10432259314075048153] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Tue, 07 Dec 2021 08:37:29 GMT'
2021-12-07T08:37:29.800557Z debug   envoy http  [C488678497][S10432259314075048153] doEndStream() resetting stream
2021-12-07T08:37:29.800592Z debug   envoy http  [C488678497][S10432259314075048153] stream reset
2021-12-07T08:37:29.810355Z debug   envoy http2 [C488678497] sent reset code=0
2021-12-07T08:37:29.810375Z debug   envoy http2 [C488678497] stream closed: 0
2021-12-07T08:37:29.828192Z debug   envoy connection    [C488678497] remote close
2021-12-07T08:37:29.828199Z debug   envoy connection    [C488678497] closing socket: 0
2021-12-07T08:37:29.828212Z debug   envoy connection    [C488678497] SSL shutdown: rc=1
2021-12-07T08:37:29.828237Z debug  envoy conn_handler  [C488678497] adding to cleanup list
  • sidecar envoy log
2021-12-08T09:56:53.295405Z     debug   envoy http      [C3211] new stream
2021-12-08T09:56:53.295478Z     debug   envoy http      [C3211][S10234019614774064697] request headers complete (end_stream=false):
2021-12-08T09:56:53.295613Z     debug   envoy filter    [C3211] validateX509 mode PERMISSIVE: ssl=true, has_user=true
2021-12-08T09:56:53.295616Z     debug   envoy filter    [C3211] trust domain validation skipped
2021-12-08T09:56:53.295676Z     debug   envoy router    [C3211][S10234019614774064697] cluster 'inbound|9000||' match for URL '/random300M_1'
2021-12-08T09:56:53.295711Z     debug   envoy router    [C3211][S10234019614774064697] router decoding headers:
2021-12-08T09:56:53.295729Z     debug   envoy router    [C3211][S10234019614774064697] pool ready
2021-12-08T09:56:53.300136Z     debug   envoy http      [C3211][S10234019614774064697] Read-disabling downstream stream due to filter callbacks.
...
2021-12-08T09:56:53.860992Z     debug   envoy http      [C3211][S10234019614774064697] Read-enabling downstream stream due to filter callbacks.
2021-12-08T09:56:53.862350Z     debug   envoy http      [C3211][S10234019614774064697] Read-disabling downstream stream due to filter callbacks.
2021-12-08T09:56:53.864464Z     debug   envoy http      [C3211][S10234019614774064697] Read-enabling downstream stream due to filter callbacks.
**2021-12-08T09:56:53.865059Z     debug   envoy misc  Unknown error code 104 details Connection reset by peer**
2021-12-08T09:56:53.865079Z     debug   envoy connection        [C3211] remote close
2021-12-08T09:56:53.865081Z     debug   envoy connection        [C3211] closing socket: 0
 2021-12-08T09:56:53.865089Z    debug   envoy misc  Unknown error code 32 details Broken pipe
**2021-12-08T09:56:53.865091Z     debug   envoy connection        [C3211] SSL shutdown: rc=-1**
2021-12-08T09:56:53.865109Z     debug   envoy http      [C3211][S10234019614774064697] stream reset
2021-12-08T09:56:53.865405Z     debug   envoy router    [C3211][S10234019614774064697] resetting pool request
2021-12-08T09:56:53.865633Z     debug   envoy conn_handler      [C3211] adding to cleanup list
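
When comparing the gateway and sidecar logs, it can help to pull out only the reset and close events per connection. The following is a minimal sketch, assuming the Envoy debug log format shown in the excerpts above; the function name `reset_events` and the marker list are illustrative choices, not part of Envoy or Istio.

```python
import re
from collections import defaultdict

# Connection tag as printed in the Envoy debug logs above, e.g. "[C3211]".
CONN_RE = re.compile(r"\[C(\d+)\]")
# Substrings taken from the log excerpts that mark a reset or close.
# "upstream reset" is listed first because it contains "stream reset".
MARKERS = ("upstream reset", "remote close", "stream reset", "SSL shutdown")


def reset_events(lines):
    """Group reset/close log lines by Envoy connection ID.

    Returns {connection_id: [(timestamp, marker), ...]} in log order.
    """
    events = defaultdict(list)
    for line in lines:
        conn = CONN_RE.search(line)
        marker = next((k for k in MARKERS if k in line), None)
        if conn is None or marker is None:
            continue
        # Drop the "**" emphasis some lines carry, then take the timestamp.
        timestamp = line.strip().lstrip("*").split()[0]
        events[conn.group(1)].append((timestamp, marker))
    return dict(events)
```

Lines logged by `envoy misc`, such as the "Connection reset by peer" message, carry no connection tag, so this sketch skips them; they still have to be matched to a connection by timestamp.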

10.0.0.1 is the gateway's pod IP and 10.0.0.2 is the application's pod IP; the application listens on port 9000.
Next, we ran tcpdump on the application's node and found something strange: the first RST packet is sent by the sidecar to the ingress gateway, yet the sidecar's Envoy is not aware of it.
The sidecar believes the ingress gateway closed the connection and then sends FIN/RST to the application.

05:34:34.814254 IP 10.0.0.1.9000 > 10.0.0.2.53304: Flags [R], seq 3142962686, win 0, length 0
05:34:34.814989 IP 10.0.0.1.9000 > 10.0.0.2.53304: Flags [R], seq 3142962686, win 0, length 0
05:34:34.815045 IP 10.0.0.2.53304 > 10.0.0.1.9000: Flags [R], seq 1351926695, win 0, length 0
05:34:34.815536 IP 127.0.0.1.57088 > 127.0.0.1.9000: Flags [F.], seq 1128675839, ack 912453698, win 256, options [nop,nop,TS val 397913316 ecr 397912839], length 0
05:34:34.826703 IP 127.0.0.1.57088 > 127.0.0.1.9000: Flags [R], seq 1128675840, win 0, length 0
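
To see which endpoint sent the first RST on each flow, the capture can be scanned programmatically. This is a rough sketch that assumes tcpdump's default one-line IPv4 text output, as in the excerpt above; `first_rst_per_flow` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# 05:34:34.814254 IP 10.0.0.1.9000 > 10.0.0.2.53304: Flags [R], seq ...
PKT_RE = re.compile(
    r"^(\S+) IP (\S+)\.(\d+) > (\S+)\.(\d+): Flags \[([^\]]+)\]"
)


def first_rst_per_flow(lines):
    """Return {flow: (timestamp, sender)} for the first RST seen per flow.

    A flow is the sorted pair of "ip:port" endpoints, so both directions
    of one connection map to the same key.
    """
    first = {}
    for line in lines:
        m = PKT_RE.match(line.strip())
        if m is None:
            continue
        ts, src, sport, dst, dport, flags = m.groups()
        if "R" not in flags:  # keep only packets with the RST flag set
            continue
        sender = f"{src}:{sport}"
        receiver = f"{dst}:{dport}"
        flow = tuple(sorted((sender, receiver)))
        first.setdefault(flow, (ts, sender))  # remember only the first RST
    return first
```

Run against the five packets above, this reports that the first RST on the pod-to-pod flow comes from the port-9000 endpoint, before any RST appears on the loopback flow between the sidecar and the application.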

The failure rate is 2%. We tried reducing the sidecar worker count to 1 and disabling mTLS, but the issue still exists.
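
A failure rate like this can be measured by repeating the upload and tallying the status codes. The sketch below is one possible way to do that with the Python standard library; `put_file`, `tally`, and `failure_rate` are hypothetical helpers, and the URL and file path in the usage note are placeholders rather than values from our setup.

```python
import os
import urllib.error
import urllib.request
from collections import Counter


def put_file(url, path, timeout=300):
    """PUT one file and return the HTTP status (0 on transport errors)."""
    with open(path, "rb") as body:
        req = urllib.request.Request(url, data=body, method="PUT")
        # Send a fixed Content-Length instead of chunked encoding.
        req.add_header("Content-Length", str(os.path.getsize(path)))
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx, e.g. the 503
            return err.code
        except urllib.error.URLError:  # connection-level failure
            return 0


def tally(upload_once, attempts):
    """Call upload_once() `attempts` times and count the status codes."""
    return Counter(upload_once() for _ in range(attempts))


def failure_rate(counts, status=503):
    """Fraction of attempts that ended with the given status code."""
    total = sum(counts.values())
    return counts[status] / total if total else 0.0
```

For example, `tally(lambda: put_file("https://gateway.example/random300M_1", "random300M_1"), 100)` would issue 100 uploads against a placeholder URL, and `failure_rate` on the result gives the share of 503 responses.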

Has this issue been seen before?

Version

istio version: 1.10
kubectl version: 1.18.8

Additional Information

No response

Affected product area

  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Upgrade
  • Multi Cluster
  • Virtual Machine
  • Control Plane Revisions

Is this the right place to submit this?

  • This is not a security vulnerability
  • This is not a question about how to use Istio
@flyingyang
Contributor Author

The root cause is that packets in an invalid state are sent to the application. Let's move the discussion to the PR: #36536
