
Envoy proxy not working for HTTP/1.1 requests which are multi-part form uploads #13240

Closed
tariq1890 opened this issue Sep 23, 2020 · 12 comments
Labels: area/http, investigate (Potential bug that needs verification), stale (stalebot believes this issue/PR has not been touched recently)

Comments

@tariq1890

Description:

We are facing issues when two services (each running an Envoy sidecar) communicate over HTTP and attempt multi-part POST requests. Please note that other POST and GET requests work fine with no issues.

We get the generic Envoy 503 error: upstream connect error or disconnect/reset before headers. This error alone does not tell us what exactly went wrong.

These Envoy proxies are managed through Istio, and several issues have been filed there describing the same problem:

istio/istio#27423
istio/istio#27373
istio/istio#25734

Any insights on this would be appreciated. What should we look for, and what could most likely have gone wrong in the Envoy proxy configuration?
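
For reference, one thing we plan to check is Envoy's access log, which can record the response flags and upstream failure reason for the failing request. A minimal sketch of a file access logger using standard Envoy v3 command operators; the output path and where it is attached are illustrative, not our actual config:

```yaml
# Hypothetical access-log config for the sidecar's HttpConnectionManager.
# %RESPONSE_FLAGS% (e.g. UC, UF, URX) and %RESPONSE_CODE_DETAILS% usually
# explain a generic 503 far better than the response body does.
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /dev/stdout
    log_format:
      text_format_source:
        inline_string: "[%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% %UPSTREAM_TRANSPORT_FAILURE_REASON% %DURATION% %UPSTREAM_HOST%\n"
```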

@tariq1890 tariq1890 added bug triage Issue requires triage labels Sep 23, 2020
@tariq1890 (Author)

Please refer to this link for debug logs: istio/istio#27423 (comment)

@yanavlasov (Contributor)

I have looked at the log, and it seems the upstream server closed the connection about 500ms after it was established and the request was sent. You may want to look at the logs on the upstream service to see if there was a reason for closing the connection.

10:09:03  2020-09-18T17:08:34.099062Z	debug	envoy pool	queueing request due to no available connections
10:09:03  2020-09-18T17:08:34.099066Z	debug	envoy pool	creating a new connection
10:09:03  2020-09-18T17:08:34.099126Z	debug	envoy client	[C28] connecting
10:09:03  2020-09-18T17:08:34.099133Z	debug	envoy connection	[C28] connecting to 100.96.104.65:3000
10:09:03  2020-09-18T17:08:34.102893Z	debug	envoy connection	[C28] connection in progress
10:09:03  2020-09-18T17:08:34.103020Z	debug	envoy http	[C27][S2136150783590076220] request end stream
10:09:03  2020-09-18T17:08:34.103041Z	debug	envoy connection	[C28] connected
10:09:03  2020-09-18T17:08:34.103080Z	debug	envoy connection	[C28] handshake expecting read
10:09:03  2020-09-18T17:08:34.106345Z	debug	envoy connection	[C28] handshake expecting read
10:09:03  2020-09-18T17:08:34.106359Z	debug	envoy connection	[C28] handshake expecting read
10:09:03  2020-09-18T17:08:34.106676Z	debug	envoy connection	[C28] handshake complete
10:09:03  2020-09-18T17:08:34.106709Z	debug	envoy client	[C28] connected
10:09:03  2020-09-18T17:08:34.106716Z	debug	envoy pool	[C28] attaching to next request
10:09:03  2020-09-18T17:08:34.106722Z	debug	envoy pool	[C28] creating stream
10:09:03  2020-09-18T17:08:34.106734Z	debug	envoy router	[C27][S2136150783590076220] pool ready
...
10:09:03  2020-09-18T17:08:50.683095Z	debug	envoy connection	[C28] 
10:09:03  2020-09-18T17:08:50.683120Z	debug	envoy connection	[C28] remote close
10:09:03  2020-09-18T17:08:50.683124Z	debug	envoy connection	[C28] closing socket: 0
10:09:03  2020-09-18T17:08:50.683182Z	debug	envoy connection	[C28] SSL shutdown: rc=0
10:09:03  2020-09-18T17:08:50.683194Z	debug	envoy connection	[C28] 
10:09:03  2020-09-18T17:08:50.683216Z	debug	envoy client	[C28] disconnect. resetting 1 pending request
10:09:03  2020-09-18T17:08:50.683228Z	debug	envoy client	[C28] request reset
10:09:03  2020-09-18T17:08:50.683240Z	debug	envoy router	[C27][S2136150783590076220] upstream reset: reset reason connection termination

@yanavlasov yanavlasov added investigate Potential bug that needs verification area/http and removed bug triage Issue requires triage labels Sep 24, 2020
@tariq1890 (Author)

Does Envoy enforce a 500ms timeout by default? The upstream service is known to take about 5000ms since this is a data upload.

When I look at the application logs, it does look like the request is received, but the service then gets flooded with repeats of the same request because of Envoy's retry policy, and it eventually returns a 503.
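
For reference, my understanding is that Istio exposes both the route timeout and the retry policy on the VirtualService for the destination, so a slow upload route can be given a longer timeout and fewer retries. A rough sketch with a hypothetical `upload-service` host and the port 3000 seen in the logs above; the values are illustrative, not our actual config:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: upload-service
spec:
  hosts:
  - upload-service.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: upload-service.default.svc.cluster.local
        port:
          number: 3000
    timeout: 30s            # give slow multi-part uploads time to finish
    retries:
      attempts: 1           # avoid flooding the upstream with retried uploads
      perTryTimeout: 30s
      retryOn: connect-failure,refused-stream
```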

@tariq1890 (Author)

Would trace logs give us more info here? I usually shy away from trace logs because they end up emitting way too much output.

@yanavlasov (Contributor)

It does not look like Envoy's timeout, since it is the remote end (from Envoy's perspective) that closed the connection. Are you suggesting enabling trace logging on Envoy? I doubt it would yield more information. I think the key piece is why the remote end closed the connection.

@tariq1890 (Author)

@yanavlasov We figured out the issue. It was caused by the Envoy proxy OOMing on the server's end. It looks like the server's Envoy needed more memory to process the multipart request.

I don't claim to be an Envoy expert, but what I don't get is why the client-side Envoy faces no memory issues while sending this multi-part request. Isn't the volume of data received the same as the volume sent?
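
In case it helps others hitting the same OOM: Istio supports raising the injected proxy's memory request and limit on the receiving workload via pod-template annotations. A fragment of the Deployment's pod template, with placeholder values rather than recommendations:

```yaml
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyMemory: "256Mi"      # memory request for the injected sidecar
        sidecar.istio.io/proxyMemoryLimit: "1Gi"   # memory limit for the injected sidecar
```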

@yanavlasov (Contributor)

yanavlasov commented Sep 30, 2020

Are both Envoys configured the same, especially buffer sizes? If the buffers are too large, it is possible that one Envoy pushes too much data into its peer too quickly, causing the peer to run out of memory.
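
For context, the amount Envoy will buffer on a single connection is capped by `per_connection_buffer_limit_bytes` (the default is 1MiB on both listeners and clusters); lowering it makes Envoy apply backpressure sooner instead of holding a large chunk of the multipart body in memory. A raw-Envoy sketch of the cluster-side knob with placeholder names, addresses, and sizes (under Istio this would have to be applied through an EnvoyFilter):

```yaml
clusters:
- name: upload_service                            # placeholder cluster name
  type: STRICT_DNS
  connect_timeout: 5s
  per_connection_buffer_limit_bytes: 32768        # 32 KiB instead of the 1MiB default
  load_assignment:
    cluster_name: upload_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: upload-service.default.svc.cluster.local
              port_value: 3000
```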

@github-actions

github-actions bot commented Dec 9, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 9, 2020
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@sp0cket

sp0cket commented Mar 21, 2021

Can we please reopen this? I am hitting a similar issue.

I tested with curl using a tiny 204-byte file.

Test details:

  • http2 json post/get/delete... -> istio ingress -> istio sidecar -> grpc-json-transcoder -> grpc service success
  • http2 post form data -> istio ingress -> istio sidecar -> grpc-json-transcoder -> grpc service error with EOF
  • http1.1 json post/get/delete... -> istio ingress -> istio sidecar -> grpc-json-transcoder -> grpc service success
  • http1.1 post form data -> istio ingress -> istio sidecar -> grpc-json-transcoder -> grpc service success
  • http2 post form data -> istio ingress -> istio sidecar -> http-bin success

@shortwavedave

I'm hitting this issue as well; does anyone know of a workaround?

@pranavelric

When attempting to upload a file, I receive response_code_details: 'downstream_remote_disconnect' and response_code: 0. Does anyone know of any potential workarounds for this issue?
