Not receiving async done tag for failed incoming call requests #10136

jeady · 2017-03-13T20:03:40Z

The comment on "AsyncNotifyWhenDone" states "Has to be called before the rpc starts" but it seems that if the request tag is returned with ok=false (i.e. because the CQ is shutting down) then the async done tag is never received. Instead, I expect the async done tag to be received regardless of whether or not an incoming call request was successfully received.

It is also worth noting that I have seen cases where I received both the call request with ok=false and the done tag when there was some error (such as deadline exceeded), but not when the CQ is shutting down.

ctiller · 2017-03-13T20:21:05Z

Hey @lyuxuan - can you take a first peek at this?

I think we need a test case that:

1. creates a ServerContext and calls AsyncNotifyWhenDone
2. requests a call (using the async server API) using that ServerContext
3. shuts down the server

And I think the expectation is that right now we'll see the tag for (2) come back but not for (1).
We should also see (1).
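
The three steps and the two outcomes can be sketched without gRPC. This is a toy model only: `ToyServer`, `ToyEvent`, `Saw` and `RunScenario` are hypothetical names, not gRPC types, and the `ok` value attached to the done tag is illustrative. The flag switches between the behavior reported in this issue (done tag dropped) and the behavior the test should expect (done tag delivered).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a completion queue event: (tag, ok). Not a gRPC type.
struct ToyEvent {
  std::string tag;
  bool ok;
};

// Hypothetical model of a server with at most one pending call request.
// deliver_done_on_failure == false models the behavior reported here;
// deliver_done_on_failure == true models the expected behavior.
class ToyServer {
 public:
  explicit ToyServer(bool deliver_done_on_failure)
      : deliver_done_on_failure_(deliver_done_on_failure) {}

  // Step 1: register the done tag before the RPC starts.
  void NotifyWhenDone(std::string tag) { done_tag_ = std::move(tag); }

  // Step 2: request an incoming call.
  void RequestCall(std::string tag) { request_tag_ = std::move(tag); }

  // Step 3: shut down before any call was matched; returns drained events.
  std::vector<ToyEvent> Shutdown() const {
    std::vector<ToyEvent> events;
    if (!request_tag_.empty()) events.push_back({request_tag_, false});
    if (deliver_done_on_failure_ && !done_tag_.empty()) {
      events.push_back({done_tag_, true});  // ok value is illustrative
    }
    return events;
  }

 private:
  bool deliver_done_on_failure_;
  std::string done_tag_;
  std::string request_tag_;
};

// True if `tag` appears among the drained events.
bool Saw(const std::vector<ToyEvent>& events, const std::string& tag) {
  for (const ToyEvent& e : events) {
    if (e.tag == tag) return true;
  }
  return false;
}

// Runs steps (1) to (3) and returns what came out of the toy queue.
std::vector<ToyEvent> RunScenario(bool deliver_done_on_failure) {
  ToyServer server(deliver_done_on_failure);
  server.NotifyWhenDone("done");  // (1)
  server.RequestCall("request");  // (2)
  return server.Shutdown();       // (3)
}
```

A real test would drive an actual server and completion queue; the model only pins down which tags the assertion should look for.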

lyuxuan · 2017-03-14T01:13:58Z

I added a test case to async_end2end_test, and indeed (1) does not come back.
#10143

lyuxuan · 2017-03-23T18:59:10Z

This is blocking an internal user.

jeady · 2017-03-28T19:29:31Z

As a side note, the reason we're calling AsyncNotifyWhenDone is because we want to be able to call ServerContext::IsCancelled, which has this comment:

// When using async API, it is only safe to call IsCancelled after
// the AsyncNotifyWhenDone tag has been delivered

We would probably be able to infer the "done-ness" of the connection without this tag otherwise.
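
One way to keep that rule from being violated by accident is to route the cancellation state through a small gate that stays empty until the done tag has been seen. A minimal sketch, assuming C++17; `CancellationGate` is a hypothetical helper, not part of gRPC, and the caller is assumed to read `ServerContext::IsCancelled()` only when handling the done tag.

```cpp
#include <cassert>
#include <optional>

// Hypothetical wrapper (not a gRPC type): remembers whether the done tag
// has been delivered and only then exposes the cancellation state.
class CancellationGate {
 public:
  // Call when the AsyncNotifyWhenDone tag comes out of the completion
  // queue, passing the value read from ServerContext::IsCancelled().
  void OnDoneTag(bool cancelled) { cancelled_ = cancelled; }

  // True once the done tag has been recorded.
  bool DoneDelivered() const { return cancelled_.has_value(); }

  // Empty until the done tag has been delivered.
  std::optional<bool> IsCancelled() const { return cancelled_; }

 private:
  std::optional<bool> cancelled_;
};
```

If the done tag never arrives, as in the shutdown case described in this issue, the gate simply stays empty, which makes the missing notification visible to the caller instead of hiding it.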

jeady · 2017-05-08T19:45:44Z

Any status update here?

lyuxuan · 2017-05-08T20:11:17Z

We have a fix plan, and should be able to update it by the end of this week. Thanks!

abaldeva · 2017-08-18T19:01:57Z

This issue is still present in 1.4.2. Any updates?

lyuxuan · 2017-08-19T00:32:12Z

Fix is in progress. #11955

vjpai · 2017-10-11T06:06:30Z

Is this issue still alive or is it closed?

ctiller · 2017-10-11T06:12:13Z

It's alive: @sreecha was going to look at it in the next week or so.


muxi · 2017-12-11T20:29:14Z

ping?

abaldeva · 2018-08-01T19:15:39Z

Are there any plans to fix this issue? For us, it results in a lot of memory leaks at shutdown, which screws up our memory leak validation toolchain.

Thanks.

sreecha · 2018-08-01T19:42:51Z

Hi @abaldeva, I looked at it a few months ago and realized that the fix is very invasive and requires rewiring a lot of the code. It is currently on the back burner. I am going through all our open issues this week; I'll give you a more concrete answer mid to late week.

AspirinSJL · 2018-08-08T17:47:00Z

I also saw this issue when working on #15853. I will add a comment to that API so that users are warned about this issue before using it.

mehrdada · 2019-05-31T02:51:33Z

Past the two year mark and still an open issue. Pretty annoying as it seems to be the only exception to "whatever you feed the CQ, you'll get back" invariant.
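
When that invariant cannot be relied on, it can at least be checked. Below is a minimal sketch of debug-only bookkeeping that records every tag handed to the completion queue and every tag that comes back, so that anything left over after the queue has drained can be listed. `TagLedger` is a hypothetical helper, not a gRPC facility.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Debug-only bookkeeping for completion-queue tags (hypothetical helper,
// not part of gRPC). Call Issued() whenever a tag is handed to the library
// and Returned() whenever the completion queue yields it.
class TagLedger {
 public:
  void Issued(void* tag) { outstanding_.insert(tag); }

  // Returns false if the tag was never issued or was already returned.
  bool Returned(void* tag) { return outstanding_.erase(tag) == 1; }

  // Tags that were handed out but have not come back (unspecified order).
  std::vector<void*> Outstanding() const {
    return std::vector<void*>(outstanding_.begin(), outstanding_.end());
  }

  std::size_t OutstandingCount() const { return outstanding_.size(); }

 private:
  std::unordered_set<void*> outstanding_;
};
```

In the scenario from this issue, the done tag would be the one still listed by `Outstanding()` after shutdown.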

stale · 2019-11-27T03:09:56Z

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions!

mehrdada · 2020-02-21T20:14:23Z

Issue definitely persists -- reopening

stale · 2020-05-06T05:22:44Z

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 day. Thank you for your contributions!


xiedeacc · 2022-04-10T04:11:08Z

Does this issue still exist?

adrianimboden · 2022-04-23T00:31:26Z

Yes, it is still an issue. I also have to disable leak checking for most tests because of this now.

Sadly, as I understand it, from a user's point of view it is unavoidable: you have to call context.AsyncNotifyWhenDone before the request starts. If the request never starts, it is also not safe to delete the tag myself, as it sometimes seems to get delivered anyway?

Disabling leak checking in those tests seems to be easiest.

I assume the reason for not fixing this is that most applications don't start and stop their servers all the time in production.

yoonseok-kim · 2023-01-13T05:25:04Z

I came up with a workaround for this problem at the application level. Whether the owning call data frees the (dynamically allocated) tag passed to AsyncNotifyWhenDone depends on whether the RPC was actually initiated.

Is this approach safe?

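#include <memory>
#include <string>

#include <grpc/support/log.h>  // GPR_ASSERT
#include <grpcpp/grpcpp.h>

#include "helloworld.grpc.pb.h"  // generated from the helloworld example proto

using grpc::ServerAsyncResponseWriter;
using grpc::ServerCompletionQueue;
using grpc::ServerContext;
using grpc::Status;
using helloworld::Greeter;
using helloworld::HelloReply;
using helloworld::HelloRequest;

// Common interface for every tag placed on the completion queue.
class CallDataInterface {
 public:
  virtual ~CallDataInterface() = default;
  virtual void Proceed() = 0;
};

// Tag passed to AsyncNotifyWhenDone. It deletes itself when delivered.
class DoneCallData : public CallDataInterface {
 public:
  void Proceed() override { delete this; }
};

class SayHelloCallData : public CallDataInterface {
 public:
  SayHelloCallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
      : initiate_(false),
        done_(std::make_unique<DoneCallData>()),
        service_(service),
        cq_(cq),
        responder_(&ctx_),
        status_(CREATE) {
    Proceed();
  }

  ~SayHelloCallData() override {
    // Once the RPC has been initiated, the done tag is expected to be
    // delivered and DoneCallData deletes itself, so give up ownership here.
    // Otherwise done_ still owns the tag and frees it.
    if (initiate_) {
      (void)done_.release();
    }
  }

  // Note: this snippet does not look at the `ok` flag returned by the
  // completion queue; the PROCESS branch assumes the request tag succeeded.
  void Proceed() override {
    if (status_ == CREATE) {
      status_ = PROCESS;
      // Must be registered before the RPC starts.
      ctx_.AsyncNotifyWhenDone(done_.get());
      service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_,
                                this);
    } else if (status_ == PROCESS) {
      // The request tag came back, so the RPC has been initiated.
      initiate_ = true;
      // Spawn a new instance to serve the next client.
      new SayHelloCallData(service_, cq_);

      std::string prefix("Hello ");
      reply_.set_message(prefix + request_.name());

      status_ = FINISH;
      responder_.Finish(reply_, Status::OK, this);
    } else {
      GPR_ASSERT(status_ == FINISH);
      delete this;
    }
  }

 private:
  bool initiate_;                       // true once the request tag arrived
  std::unique_ptr<DoneCallData> done_;  // owned until the RPC is initiated
  Greeter::AsyncService* service_;
  ServerCompletionQueue* cq_;
  ServerContext ctx_;

  HelloRequest request_;
  HelloReply reply_;

  ServerAsyncResponseWriter<HelloReply> responder_;

  enum CallStatus { CREATE, PROCESS, FINISH };
  CallStatus status_;
};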

(Based on the gRPC helloworld async server example.)

As the issue[1] notes: "The comment on `AsyncNotifyWhenDone` states "Has to be called before the rpc starts" but it seems that if the request tag is returned with ok=false (i.e. because the CQ is shutting down) then the async done tag is never received. Instead, I expect the async done tag to be received regardless of whether or not an incoming call request was successfully received." The issue is marked closed as stale, and it seems unlikely that it will be resolved without breaking existing users whose code is written under the assumption that the tag is not seen if the call never starts. It may therefore be time to document this idiosyncratic corner case and make it the expected behavior. [1]: grpc#10136


brunexgeek · 2024-03-21T01:46:55Z

> Sadly, as I understand it, from a user's point of view it is unavoidable: you have to call context.AsyncNotifyWhenDone before the request starts. If the request never starts, it is also not safe to delete the tag myself, as it sometimes seems to get delivered anyway?

In my implementation, I used a memory pool to manage call data objects. Rather than allocating memory for each individual object, I take a preallocated buffer from the memory pool and instantiate the object with placement new (e.g. new (buffer) CallData()). If the notification never arrives, the buffer remains reserved in the pool, although "wasted". However, I have the flexibility to deallocate these buffers either when stopping the server or dynamically during execution (e.g. using a timeout mechanism).
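
A minimal sketch of such a pool, not the implementation described above: `SlotPool` and the counting `CallData` stand-in are hypothetical names, the pool is single-threaded, and the timeout-based reclamation is left out. Objects are built with placement new in preallocated slots; `Destroy` is the normal path when the notification arrives, and `ReclaimAll` is the shutdown path for objects whose notification never came.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call data object; counts destructor runs.
struct CallData {
  explicit CallData(int* destroyed_counter) : destroyed(destroyed_counter) {}
  ~CallData() { ++*destroyed; }
  int* destroyed;
};

// Fixed-capacity pool: storage is allocated once, objects are constructed
// in place, and slots are reused after destruction.
template <typename T>
class SlotPool {
 public:
  explicit SlotPool(std::size_t capacity)
      : slots_(capacity), objects_(capacity) {}
  SlotPool(const SlotPool&) = delete;
  SlotPool& operator=(const SlotPool&) = delete;
  ~SlotPool() { ReclaimAll(); }

  // Constructs a T in a free slot; returns nullptr if the pool is full.
  template <typename... Args>
  T* Create(Args&&... args) {
    for (std::size_t i = 0; i < objects_.size(); ++i) {
      if (objects_[i] == nullptr) {
        objects_[i] = new (slots_[i].bytes) T(std::forward<Args>(args)...);
        return objects_[i];
      }
    }
    return nullptr;
  }

  // Normal path: the notification arrived, so destroy and free the slot.
  bool Destroy(T* obj) {
    if (obj == nullptr) return false;
    for (T*& slot : objects_) {
      if (slot == obj) {
        obj->~T();
        slot = nullptr;
        return true;
      }
    }
    return false;
  }

  // Shutdown path: destroy every object still alive; returns how many.
  std::size_t ReclaimAll() {
    std::size_t reclaimed = 0;
    for (T*& slot : objects_) {
      if (slot != nullptr) {
        slot->~T();
        slot = nullptr;
        ++reclaimed;
      }
    }
    return reclaimed;
  }

  std::size_t LiveCount() const {
    std::size_t live = 0;
    for (T* slot : objects_) {
      if (slot != nullptr) ++live;
    }
    return live;
  }

 private:
  struct Slot {
    alignas(T) unsigned char bytes[sizeof(T)];
  };
  std::vector<Slot> slots_;  // never resized, so addresses stay stable
  std::vector<T*> objects_;  // nullptr marks a free slot
};
```

`ReclaimAll` should only be called once no more tags can be delivered, for example after the completion queue has been shut down and fully drained; otherwise a late notification would refer to a destroyed object.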

markdroth · 2024-03-21T15:22:18Z

This is a known deficiency of the CQ-based async API. At this point, we would recommend users migrate to the newer callback-style API instead.

ctiller assigned lyuxuan Mar 13, 2017

lyuxuan assigned sreecha Mar 29, 2017

abaldeva mentioned this issue Sep 20, 2017: Recv close api change #11955 (Closed)

lyuxuan removed their assignment Apr 12, 2018

vjpai assigned vjpai and unassigned sreecha Apr 12, 2018

vjpai added priority/P1 kind/bug lang/c++ priority/P2 and removed priority/P1 labels May 15, 2018

stale bot added the disposition/stale label Nov 27, 2019

stale bot closed this as completed Dec 4, 2019

mehrdada reopened this Feb 21, 2020

stale bot removed the disposition/stale label Feb 21, 2020

stale bot added the disposition/stale label May 6, 2020

stale bot closed this as completed May 13, 2020

Rantanen mentioned this issue May 13, 2020: Client-side terminating bidi streams in async server yields tags in random order resulting in crashes #22021 (Closed)

yashykt mentioned this issue Jan 26, 2021: Asynchronous server crashing with "pure virtual method called" #25073 (Closed)

Tradias mentioned this issue Oct 22, 2022: How to get notified when client close Tradias/asio-grpc#51 (Closed)

mehrdada mentioned this issue May 22, 2023: [API] Document gotcha in AsyncNotifyWhenDone behavior #33208 (Merged)
