Half-closed without a request #8866

Closed
tommyulfsparre opened this issue Jan 25, 2022 · 16 comments · Fixed by #9362

@tommyulfsparre

tommyulfsparre commented Jan 25, 2022

gRPC version
1.40.0

Environment
A client and server communicating over localhost.

I have a simple client and server where the server does nothing besides respond to an incoming RPC:

  @Override
  public void concat(PingRequest request, StreamObserver<PingResponse> responseObserver) {
    responseObserver.onNext(PingResponse.newBuilder().build());
    responseObserver.onCompleted();
  }

The client sends 100 RPCs with a 100ms deadline in parallel over a single channel that is in a READY state, and shuts down after all RPCs have been sent.
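
A minimal sketch of that client setup, assuming the protoc-generated PingServiceGrpc, PingRequest, and ping identifiers (inferred from the logged :path; the actual reproduction may differ):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

public final class ReproClient {
  public static void main(String[] args) throws InterruptedException {
    // Single channel to the local server from the logs.
    // PingServiceGrpc / PingRequest are assumed generated classes on the classpath.
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("127.0.0.1", 5993).usePlaintext().build();
    PingServiceGrpc.PingServiceFutureStub stub = PingServiceGrpc.newFutureStub(channel);

    for (int i = 0; i < 100; i++) {
      // Each RPC carries its own 100ms deadline; if the server is slow to
      // request() the message (e.g. during JVM warm-up), the client cancels
      // the stream, which shows up as RST_STREAM errorCode=8 (CANCEL).
      stub.withDeadlineAfter(100, TimeUnit.MILLISECONDS)
          .ping(PingRequest.newBuilder().build());
    }

    channel.shutdown();
    channel.awaitTermination(5, TimeUnit.SECONDS);
  }
}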

Server side:
13:49:11.197  INFO [ault-executor-5] com.example.Main             : trace-seq: 97 - ServerCall.Listener.close with status code: INTERNAL, desc: Half-closed without a request

Client side:
13:49:10.881 [grpc-nio-worker-ELG-1-2] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x3141a0f1, L:/127.0.0.1:50244 - R:/127.0.0.1:5993] OUTBOUND HEADERS: streamId=145 headers=GrpcHttp2OutboundHeaders[:authority: 127.0.0.1:5993, :path: /example.test.v1.PingService/Ping, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.40.0, trace-seq: 97, grpc-accept-encoding: gzip, grpc-trace-bin: AAB33fg6kgpaWIFFOW9S6A41Af/EVN8t8akZAgA, grpc-timeout: 27946551n] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false

13:49:10.979 [grpc-nio-worker-ELG-1-2] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x3141a0f1, L:/127.0.0.1:50244 - R:/127.0.0.1:5993] OUTBOUND DATA: streamId=145 padding=0 endStream=true length=16 bytes=000000000b0a04666f6f6f1203626172

13:49:11.004 [grpc-nio-worker-ELG-1-2] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x3141a0f1, L:/127.0.0.1:50244 - R:/127.0.0.1:5993] OUTBOUND RST_STREAM: streamId=145 errorCode=8

(trace-seq: 97 is metadata that gets sent along with each RPC)

Right after the JVM has started we get a bunch of RPCs with status INTERNAL, desc: Half-closed without a request on the server side, while the client reports the expected DEADLINE_EXCEEDED. So this happens before any JIT compilation or lazy class loading has warmed things up.

Re-running the client (keeping the server running) does not produce those errors, and we can see that only onCancel gets called when the client cancels an RPC, instead of the close(Status.INTERNAL) -> onHalfClose -> onCancel sequence we saw before.

I guess gRPC can't do much about the slow JVM startup, but is the INTERNAL, desc: Half-closed without a request status expected in those cases?

@ejona86
Member

ejona86 commented Jan 28, 2022

Strange.

"Half-closed without a request" should mean that the client didn't send (all of) a request. But those logs show the request DATA frame and the 000000000b shows an uncompressed message (first byte) of size 11 (next 4 bytes). 5 byte header + 11 byte message is 16 bytes, which agrees with the data frame. Nothing looks strange here; I'd have expected it to work.

Am I right in assuming that 1.40.0 is the version for both the client and server? I don't recognize this as a recent bug. And since this is over localhost, there's almost certainly not a proxy involved.

Do you have any interceptors that could be causing this problem? If an interceptor is wrapping the ServerCall.Listener and not immediately forwarding onMessage(), then it'd be suspect. This sort of bug would be easy in an interceptor that does some work asynchronously. If an interceptor delays calling onMessage(), then it needs to also delay onHalfClose().
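
To make that failure mode concrete, here is a hypothetical sketch (interceptor and executor names are invented) of the pattern being warned about: onMessage() is handed off to another thread while onHalfClose() is forwarded inline, so the handler can observe half-close before the delayed request arrives.

import io.grpc.ForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import java.util.concurrent.Executor;

final class AsyncWorkInterceptor implements ServerInterceptor {
  private final Executor workExecutor;

  AsyncWorkInterceptor(Executor workExecutor) {
    this.workExecutor = workExecutor;
  }

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(
        next.startCall(call, headers)) {
      @Override
      public void onMessage(ReqT message) {
        // BUG: the request message is delivered asynchronously...
        workExecutor.execute(() -> delegate().onMessage(message));
      }
      // ...but onHalfClose() is inherited and forwarded synchronously, so it can
      // reach the ServerCallHandler before the delayed onMessage(), producing
      // "Half-closed without a request". A correct version would also defer
      // onHalfClose() onto the same executor to preserve ordering.
    };
  }
}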

@tommyulfsparre
Author

tommyulfsparre commented Jan 28, 2022

Am I right in assuming that 1.40.0 is the version for both the client and server? I don't recognize this as a recent bug. And since this is over localhost, there's almost certainly not a proxy involved.

Correct, both client and server are using 1.40.0. I will try and see if I can reproduce this with a newer version.

Do you have any interceptors that could be causing this problem? If an interceptor is wrapping the ServerCall.Listener and not immediately forwarding onMessage(), then it'd be suspect. This sort of bug would be easy in an interceptor that does some work asynchronously. If an interceptor delays calling onMessage(), then it needs to also delay onHalfClose().

I added a logging interceptor just for debugging purposes:

  private ServerInterceptor logging() {
    var seq = new AtomicInteger(0);

    return new ServerInterceptor() {
      @Override
      public <ReqT, RespT> Listener<ReqT> interceptCall(
          ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
        var remoteSeq = headers.get(TRACE_SEQ_KEY);
        var seqNr = seq.incrementAndGet();

        return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(
            next.startCall(
                new ForwardingServerCall.SimpleForwardingServerCall<ReqT, RespT>(call) {
                  @Override
                  public void sendMessage(final RespT message) {
                    super.sendMessage(message);
                    logger.info("{} - sendMessage", seq);
                  }

                  @Override
                  public void close(final Status status, final Metadata trailers) {
                    super.close(status, trailers);
                    logger.info(
                        "{} trace-seq: {} - closed with status code: {}, desc: {}",
                        seqNr,
                        remoteSeq,
                        status.getCode(),
                        status.getDescription());
                  }

                  @Override
                  public void sendHeaders(final Metadata headers) {
                    super.sendHeaders(headers);
                    logger.info("{} - sendHeader", seqNr);
                  }
                },
                headers)) {

          @Override
          public void onHalfClose() {
            super.onHalfClose();
            logger.info("{} - onHalfClose", seqNr);
          }

          @Override
          public void onMessage(final ReqT message) {
            super.onMessage(message);
            logger.info("{} - onMessage", seqNr);
          }

          @Override
          public void onCancel() {
            super.onCancel();
            logger.info("{} - onCancel", seqNr);
          }

          @Override
          public void onComplete() {
            super.onComplete();
            logger.info("{} - onComplete", seqNr);
          }
        };
      }
    };
  }

Could this cause the observed behaviour?

@ejona86
Member

ejona86 commented Jan 31, 2022

I expect this issue will be present in newer versions too, but confirming is still worthwhile.

That interceptor doesn't look to be a problem. I will note that the logs may be in a misleading order because within super.onHalfClose() there may be a call to call.close(). So it may then appear as if the close() happens before the onHalfClose() when in truth the onHalfClose() triggered the close(). But that's minor.
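
One way to avoid that confusion, sketched below with the same grpc-java APIs as the interceptor above (logger and class names are illustrative), is to log before delegating, so an onHalfClose log line can never appear after a close() that it itself triggered:

import io.grpc.ForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class LogBeforeDelegateInterceptor implements ServerInterceptor {
  private static final Logger logger =
      LoggerFactory.getLogger(LogBeforeDelegateInterceptor.class);

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(
        next.startCall(call, headers)) {
      @Override
      public void onHalfClose() {
        // Log first: the delegated call below may synchronously invoke close()
        // on the wrapped ServerCall, and that close() would otherwise be logged
        // ahead of this onHalfClose entry.
        logger.info("onHalfClose");
        super.onHalfClose();
      }
    };
  }
}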

Is that the only interceptor in use?

@tommyulfsparre
Author

tommyulfsparre commented Jan 31, 2022

Is that the only interceptor in use?

Here is a minimal reproducible (at least on my machine) example: https://github.com/tommyulfsparre/grpc-half-close

@dapengzhang0
Member

Here is a minimal reproducible (at least on my machine) example: https://github.com/tommyulfsparre/grpc-half-close

I can reproduce the "Half-closed without a request" error with the given reproducer.

@dapengzhang0
Member

"Half-closed without a request" should mean that the client didn't send (all of) a request. But those logs show the request DATA frame and the 000000000b shows an uncompressed message (first byte) of size 11 (next 4 bytes). 5 byte header + 11 byte message is 16 bytes, which agrees with the data frame. Nothing looks strange here; I'd have expected it to work.

@ejona86

It seems that when RST_STREAM is received,

listener.halfClosed()
https://github.com/grpc/grpc-java/blob/v1.44.0/core/src/main/java/io/grpc/internal/AbstractServerStream.java#L231

can race with
listener.onMessage()
https://github.com/grpc/grpc-java/blob/v1.44.0/core/src/main/java/io/grpc/internal/ServerImpl.java#L834
https://github.com/grpc/grpc-java/blob/v1.44.0/core/src/main/java/io/grpc/internal/ServerCallImpl.java#L318

@irock

irock commented Feb 3, 2022

Tangential question to whether this is a race condition or not: Is INTERNAL the most accurate code to describe this error condition? Would DATA_LOSS or CANCELLED be more accurate status codes?

As far as I understand, it is possible for this condition to occur under normal circumstances if the client closes a stream prematurely. Is this a server error or a client error? The current implementation is categorizing it as a server error. Do we want to change this?

@ejona86
Member

ejona86 commented Feb 3, 2022

The current implementation is categorizing it as a server error.

@irock, INTERNAL is not a "server error." It's a "things are fundamentally broken." The INTERNAL usage on that line is appropriate, because the client believes the RPC is client-streaming instead of unary, so the client and server appear not to agree on the schema.

That line being executed at all is a bug, though. But the "wrong status" part of it doesn't matter too much because the client already cancelled and so won't see it.

@ejona86 ejona86 added bug and removed question labels Feb 3, 2022
@ejona86
Member

ejona86 commented Feb 3, 2022

There's no compression in the reproduction. So I think this might be it:

  1. Transport receives full request
  2. In the successful cases the application request()s a message, so delivery to the application begins. In the failure case, this is too slow and doesn't happen.
  3. Transport receives RST_STREAM, which closes the deframer with an option to throw away any messages.
  4. The deframer cleans up, but it hadn't yet started processing a message, so nextFrame == null and therefore hasPartialMessage == false:
    boolean hasPartialMessage = nextFrame != null && nextFrame.readableBytes() > 0;
  5. That makes AbstractServerStream think this is a graceful end, so it inserts halfClosed(), even though it didn't deliver the message.

Could the fix be as easy as avoiding the entire endOfStream case if immediateCloseRequested == true? Looks like it. I haven't run the reproduction yet to check.

@tommyulfsparre, the reproduction looks to have been very helpful here.

@ejona86
Member

ejona86 commented Feb 3, 2022

Maybe we need something more; I'd have to look at complete() more.

@irock

irock commented Feb 24, 2022

Hi @ejona86! Do you have any news on a potential fix for this issue?

@YifeiZhuang
Contributor

On the server side, the transport first notifies that the stream was created, then the server call is started and request()s messages on the stream. Meanwhile, the stream receives halfClose() and notifies the call. However, it seems that no matter how long request() is delayed, notifying halfClose() on the call is always scheduled after onMessage(), which guarantees the request is delivered before half-close. @ejona86 do you know why this order happens?

@YifeiZhuang
Contributor

Never mind, that situation was the successful case.

When the deadline is small enough (in my local test, request delay = 100ms, deadline = 400ms), we see, 100% of the time, that due to the deadline and the slow request() the server call's halfClose() is notified before onMessage(), so INTERNAL "Half-closed without a request" is always triggered. However, the client always sees the DEADLINE_EXCEEDED status code instead of INTERNAL. Does this mean it might be a client-side issue?

@ejona86
Member

ejona86 commented Jul 1, 2022

Cancellation drains the MessageDeframer, but our code handling the deframer closure is noticing we've received halfClose on-the-wire and reporting that accidentally. The problem here isn't the ordering between onMessage() and halfClose(). The problem is that halfClose() shouldn't be reported at all. The cancellation code threw away the request message, so it won't ever be delivered.

The call is already cancelled when this INTERNAL error occurs, so the server doesn't send the status to the client. The client processes the deadline locally to see the DEADLINE_EXCEEDED.

@YifeiZhuang
Contributor

Oh, I looked at the original issue again: even in the undesired situation the client never receives the INTERNAL error, only the server sees it. The client always sees DEADLINE_EXCEEDED. Then it looks like the reproduction is OK.

Cancellation drains the MessageDeframer, but our code handling the deframer closure is noticing we've received halfClose on-the-wire and reporting that accidentally. The problem here isn't the ordering between onMessage() and halfClose(). The problem is that halfClose() shouldn't be reported at all. The cancellation code threw away the request message, so it won't ever be delivered.

The received halfClose from the client was caused by client cancellation, right? Then it makes sense: the deadline was long enough for the server to receive the request but not process it, and small enough for the client to cancel before everything completes, which causes the INTERNAL error.

@ejona86
Member

ejona86 commented Jul 6, 2022

The received halfClose from the client was caused by client cancellation, right?

No, I expect that was part of the original request. Something like:

Client sends request headers, message, halfClose
... delay
Client sends cancellation

Commonly that delay is large enough that the server would have request()ed before receiving the cancellation and so we don't normally see the issue.

But the rest of what you said sounds fine.
