
EOF behaviour in bi-di streams #2954

@tjungblu


Hey!

I'm coming from etcd-io/etcd#13909, where I need some help understanding how bidirectional streaming is supposed to work. etcd uses the grpc-gateway in front of the grpc-proxy, which in turn runs against the etcd server. etcd has a feature called "watches" that gives you a stream of all changes to the database.

For some reason, the gateway was implemented with a stream of WatchRequests, allowing you to create and cancel watches from the client stream. The server then forwards the requests and just proxies all results back.

This feature is fairly old (>6 years), and it seems to have worked with curl at some point, as documented here:
https://etcd.io/docs/v3.5/dev-guide/api_grpc_gateway/#watch-keys

Since it's a bit complicated to set etcd up with the proxy and such, I replicated the issue as minimally as possible below using an integration test, which luckily shows the same error:

master...tjungblu:grpc-gateway:repro

TL;DR: the bidi stream results in EOF errors:

    --- FAIL: TestBidiStream/bidi_stream_case (0.00s)
        integration_test.go:1814: resp.StatusCode = 500; want 200
        integration_test.go:1818: response = [{"error":{"code":2,"message":"EOF","details":[]}}]; want [{"result":{"data":"some-text"}} {"result":{"data":"some-text"}}]

🐛 Bug Report

The sequence of events is the following:

  1. A request comes in, containing the initial request message.
  2. The message gets decoded without error and is passed along.
  3. The body is now empty, so the decoder returns EOF and the loop stops.
  4. The loop then returns either "context canceled" (due to the close) or the EOF error itself. Which of the two you get is a race, which is a problem of its own.

To Reproduce

Check out the branch in my fork:
master...tjungblu:grpc-gateway:repro

Then just run make test to observe the test failure.

Expected behavior

That test should pass, and I should receive a 200 with the stream of elements until I close the connection of the POST request. That kinda requires a solution for #2434 as well...

Actual Behavior

Test fails :) Please let me know if we're using it wrong here, I was just picking this up from an issue report.

Your Environment

Fedora 36, go version go1.18.2 linux/amd64
