

Elevated server closed the stream without sending trailers after enabling oghttp2 #32371

Closed
milton0825 opened this issue Feb 13, 2024 · 14 comments
Labels: area/http, bug, stale

Comments

@milton0825
Contributor

milton0825 commented Feb 13, 2024

Title: Elevated server closed the stream without sending trailers after enabling oghttp2

Description:
We are seeing an elevated rate of "server closed the stream without sending trailers" errors in grpc-go after enabling oghttp2 via
envoy.reloadable_features.http2_use_oghttp2.

Repro steps:

  1. Enable oghttp2 by setting envoy.reloadable_features.http2_use_oghttp2 to true.
  2. Run a gRPC service behind Envoy at commit 20c7368afa9d686a109f9601ae1b9b6028b74b0a.

Admin and Stats Output:

Include the admin output for the following endpoints: /stats,
/clusters, /routes, /server_info. For more information, refer to the
admin endpoint documentation.

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Config:

Include the config used to configure Envoy.

Logs:

Include the access logs and the Envoy logs.

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Call Stack:

If the Envoy binary is crashing, a call stack is required.
Please refer to the Bazel Stack trace documentation.

milton0825 added the bug and triage labels Feb 13, 2024
@milton0825
Contributor Author

cc: @birenroy

kyessenov added the area/http label and removed the triage label Feb 14, 2024
@birenroy
Contributor

Do you have any log output from a grpc-go client? That would help us with debugging.

@milton0825
Contributor Author

milton0825 commented Feb 14, 2024

Captured the following logs from the grpc-go client:
service-gogrpc-debug.csv

@milton0825
Contributor Author

milton0825 commented Feb 15, 2024

It seems the gRPC client is not expecting a DATA frame with the END_STREAM flag set. I am wondering where in Envoy (or oghttp2) that flag could be getting set.
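The hypothesis above can be illustrated with a minimal Go sketch. In gRPC over HTTP/2, a response is expected to end with a HEADERS frame carrying the trailers (grpc-status, grpc-message) and the END_STREAM flag; if END_STREAM arrives on a DATA frame instead, grpc-go treats the RPC as failed and reports "server closed the stream without sending trailers". The `Frame` and `FrameType` types and the `classify` function are hypothetical, a simplified model of the frames seen on one stream, not the golang.org/x/net/http2 or grpc-go API.

```go
package main

import "fmt"

// FrameType is a hypothetical, simplified model of the HTTP/2 frame types
// relevant to a gRPC response (not the golang.org/x/net/http2 API).
type FrameType int

const (
	Headers FrameType = iota
	Data
)

// Frame is a hypothetical record of one frame received on a single stream.
type Frame struct {
	Type      FrameType
	EndStream bool
}

// classify reports how a gRPC response stream ended. It assumes the first
// HEADERS frame carries response headers and a later HEADERS frame carries
// the trailers; a lone HEADERS frame with END_STREAM is a Trailers-Only response.
func classify(frames []Frame) string {
	sawHeaders := false
	for _, f := range frames {
		if !f.EndStream {
			if f.Type == Headers {
				sawHeaders = true
			}
			continue
		}
		switch {
		case f.Type == Headers && !sawHeaders:
			return "trailers-only response"
		case f.Type == Headers:
			return "ok: stream ended with trailers"
		default:
			// END_STREAM on a DATA frame: no trailers will follow.
			return "error: stream ended on DATA without trailers"
		}
	}
	return "stream still open"
}

func main() {
	normal := []Frame{{Headers, false}, {Data, false}, {Headers, true}}
	broken := []Frame{{Headers, false}, {Data, true}}
	fmt.Println(classify(normal))
	fmt.Println(classify(broken))
}
```

The second sequence models the pattern described in the logs: the server side sets END_STREAM on the last DATA frame, so the client never receives a grpc-status trailer.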

@birenroy
Copy link
Contributor

/assign @birenroy

copybara-service bot pushed a commit to google/quiche that referenced this issue Feb 17, 2024
This option is not set, and makes the code more confusing.

My hope is that the simplification helps us debug envoyproxy/envoy#32371.

PiperOrigin-RevId: 607833789
@birenroy
Copy link
Contributor

I wasn't able to find any "server closed the stream without sending trailers" errors in the logs you attached. Are you still able to reproduce the problem?

It would also be helpful to know the failure rate before and after enabling oghttp2. Are we talking 0.1%? 1%? 10%?

@milton0825
Contributor Author

Those are the logs from the grpc-go client after I enabled:

GRPC_GO_LOG_VERBOSITY_LEVEL=99
GRPC_GO_LOG_SEVERITY_LEVEL=info

I can still see the "server closed the stream without sending trailers" error message returned from grpc-go in the app.

The failure rate was ~0.001% before enabling oghttp2 and increased to ~0.1% after.

@milton0825
Contributor Author

@birenroy any other information you need from me?

@birenroy
Copy link
Contributor

birenroy commented Mar 5, 2024

Sorry for the delay. An increase from 0.001% to 0.1% might be difficult to reproduce, and therefore difficult to track down, especially if the reproduction involves running a complete gRPC service through an Envoy.

I am working on some cleanup to the HTTP/2 codec in #32378; I'd like to revisit this after that PR.

@birenroy
Contributor

birenroy commented Apr 1, 2024

@milton0825 Now that #32378 has been merged, would you be able to try enabling the oghttp2 feature again?

@milton0825
Contributor Author

It seems like #32378 is just code cleanup. Is it going to fix the issue?

@birenroy
Contributor

birenroy commented Apr 1, 2024

It simplifies the flow of HTTP/2 protocol events through the Envoy codec layer. At the very least, it will make the issue easier to investigate if we can reproduce the bug.


github-actions bot commented May 1, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label May 1, 2024

github-actions bot commented May 8, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions bot closed this as not planned May 8, 2024