Not receiving async done tag for failed incoming call requests #10136
Hey @lyuxuan, can you take a first peek at this? I think we need a test case that:
And I think the expectation is that right now we'll see the tag for (2) come back but not for (1).
I added a test case to async_end2end_test, and indeed (1) does not come back.
It's blocking an internal user.
As a side note, the reason we're calling AsyncNotifyWhenDone is that we want to be able to call ServerContext::IsCancelled, which has this comment: `// When using async API, it is only safe to call IsCancelled after …` Otherwise, we would probably be able to infer the "done-ness" of the connection without this tag.
Any status update here?
We have a fix plan and should be able to update it by the end of this week. Thanks!
This issue is still present in 1.4.2. Any updates?
A fix is in progress: #11955.
Is this issue still alive or is it closed?
It's alive: @sreecha was going to look at it in the next week or so.
> On Tue, Oct 10, 2017 at 11:07 PM Vijay Pai wrote:
> Is this issue still alive or is it closed?
Ping?
Are there any plans to fix this issue? For us, this results in a lot of memory leaks at shutdown, which messes up our memory-leak validation toolchain. Thanks.
Hi @abaldeva, I looked at it a few months ago and realized that the fix is very invasive and requires a lot of rewiring of the code. It is currently on the back burner. I am going through all our open issues this week; I'll give you a more concrete answer mid to late week.
I also saw this issue when working on #15853. I will add a comment to that API so that users are warned about this issue before using it.
Past the two-year mark and still open. Pretty annoying, as it seems to be the only exception to the "whatever you feed the CQ, you'll get back" invariant.
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions!
The issue definitely persists; reopening.
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!
Does this issue still exist?
Yes, it is still an issue. Because of it, I now also have to disable leak checking for most tests. Sadly, as I understand it, this is unavoidable from a user's point of view: you have to call context.AsyncNotifyWhenDone before the request starts, and if the request never starts, it is also not safe to delete the tag myself, since it sometimes seems to get delivered anyway. Disabling leak checking in those tests seems to be the easiest option. I assume the reason this hasn't been fixed is that most production applications don't start and stop servers all the time.
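One way to sidestep the uncertainty above (not knowing whether a done tag will ever come back) is to never delete a done tag until the completion queue has been fully drained. The following is a minimal sketch under assumptions, not a gRPC facility: `DoneTag` and `DoneTagRegistry` are hypothetical names, and it assumes the usual shutdown order (`Server::Shutdown()`, then `ServerCompletionQueue::Shutdown()`, then calling `Next()` until it returns `false`), after which no further tags are delivered. It is single-threaded; a server polling the queue from several threads would need a mutex around the map.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the object passed to AsyncNotifyWhenDone().
// Not a gRPC type.
struct DoneTag {
  int call_id;
};

// Owns every done tag until the completion queue delivers it, or until the
// queue has been fully drained (an assumption about shutdown order).
class DoneTagRegistry {
 public:
  // Allocates a tag; the raw pointer is what gets handed to the CQ-based API.
  DoneTag* Create(int call_id) {
    auto tag = std::make_unique<DoneTag>(DoneTag{call_id});
    DoneTag* raw = tag.get();
    tags_.emplace(raw, std::move(tag));
    return raw;
  }

  // Call when Next() returns `tag`. Frees the tag and returns true if it was a
  // registered done tag; otherwise frees nothing and returns false, so the
  // same tag can never be freed twice.
  bool OnDelivered(const void* tag) { return tags_.erase(tag) > 0; }

  // Call only after the queue is shut down and Next() has returned false:
  // frees every tag that was never delivered and returns how many there were.
  std::size_t ReleaseUndelivered() {
    const std::size_t count = tags_.size();
    tags_.clear();
    return count;
  }

  std::size_t outstanding() const { return tags_.size(); }

 private:
  std::unordered_map<const void*, std::unique_ptr<DoneTag>> tags_;
};
```

In a `Next()` loop, a `true` from `OnDelivered()` identifies a done tag (anything needed from it must be read before that call, since the tag is freed inside it); once the drain loop ends, `ReleaseUndelivered()` frees the tags of calls that never started. Whether this is enough for every case reported in this thread depends on the drained-queue assumption holding.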
I came up with a workaround that solves this problem at the application level: depending on whether the RPC has been initiated, it decides whether to free the (dynamically allocated) done tag. I wonder if this method is safe?

    class CallDataInterface {
     public:
      virtual ~CallDataInterface() = default;
      virtual void Proceed() = 0;
    };

    // Tag passed to AsyncNotifyWhenDone(); deletes itself when the CQ delivers it.
    class DoneCallData : public CallDataInterface {
     public:
      DoneCallData() = default;
      ~DoneCallData() = default;
      void Proceed() override { delete this; }
    };

    class SayHelloCallData : public CallDataInterface {
     public:
      SayHelloCallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
          : service_(service), cq_(cq), responder_(&ctx_), status_(CREATE) {
        initiate_ = false;
        done_ = std::make_unique<DoneCallData>();
        Proceed();
      }

      ~SayHelloCallData() {
        if (initiate_) {
          // The RPC started, so the done tag is expected to come back through
          // the CQ: give up ownership and let DoneCallData::Proceed() delete it.
          // Otherwise done_ still owns the tag and deletes it here.
          done_.release();
        }
      }

      void Proceed() override {
        if (status_ == CREATE) {
          status_ = PROCESS;
          ctx_.AsyncNotifyWhenDone(done_.get());
          service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_,
                                    this);
        } else if (status_ == PROCESS) {
          // The request tag came back, so the RPC has been initiated.
          initiate_ = true;
          new SayHelloCallData(service_, cq_);
          std::string prefix("Hello ");
          reply_.set_message(prefix + request_.name());
          status_ = FINISH;
          responder_.Finish(reply_, Status::OK, this);
        } else {
          GPR_ASSERT(status_ == FINISH);
          delete this;
        }
      }

     private:
      bool initiate_;
      std::unique_ptr<DoneCallData> done_;
      Greeter::AsyncService* service_;
      ServerCompletionQueue* cq_;
      ServerContext ctx_;
      HelloRequest request_;
      HelloReply reply_;
      ServerAsyncResponseWriter<HelloReply> responder_;
      enum CallStatus { CREATE, PROCESS, FINISH };
      CallStatus status_;
    };

(Written based on the gRPC helloworld async server example.)
As issue [1] documents: "The comment on `AsyncNotifyWhenDone` states 'Has to be called before the rpc starts' but it seems that if the request tag is returned with ok=false (i.e. because the CQ is shutting down) then the async done tag is never received. Instead, I expect the async done tag to be received regardless of whether or not an incoming call request was successfully received." The TODO item has been closed as stale, and it seems unlikely it will be resolved without breaking existing users whose code assumes that the tag is not seen if the call never starts, so it may be time to document this idiosyncratic corner case and make it the expected behavior.
[1]: grpc#10136
In my implementation, I used a memory pool for managing call data objects. Rather than allocating memory for each object individually, I took a preallocated buffer from the memory pool and constructed each object in it with placement new (e.g. …)
This is a known deficiency of the CQ-based async API. At this point, we would recommend that users migrate to the newer callback-style API instead.
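The memory-pool approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual code: `FixedPool` is a hypothetical name, the capacity is fixed at compile time, and the pool is single-threaded. Objects are constructed in preallocated slots with placement new, and because placement new has no matching `delete`, each object is destroyed with an explicit destructor call. One consequence that may matter for the shutdown leaks discussed in this thread: `DestroyAll()` (also run by the pool's destructor) reclaims call data whose tags never came back, but that is presumably only safe once the completion queue has been fully drained.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity pool (not a gRPC facility). Objects live in
// preallocated storage and are built there with placement new.
template <typename T, std::size_t N>
class FixedPool {
 public:
  FixedPool() = default;
  FixedPool(const FixedPool&) = delete;
  FixedPool& operator=(const FixedPool&) = delete;
  ~FixedPool() { DestroyAll(); }

  // Constructs a T in the first free slot; returns nullptr when the pool is full.
  template <typename... Args>
  T* Create(Args&&... args) {
    for (std::size_t i = 0; i < N; ++i) {
      if (objects_[i] == nullptr) {
        objects_[i] = new (static_cast<void*>(storage_[i]))
            T(std::forward<Args>(args)...);
        return objects_[i];
      }
    }
    return nullptr;
  }

  // Placement new has no matching delete: run the destructor explicitly and
  // mark the slot free. Pointers not owned by the pool are ignored.
  void Destroy(T* obj) {
    for (std::size_t i = 0; i < N; ++i) {
      if (obj != nullptr && objects_[i] == obj) {
        obj->~T();
        objects_[i] = nullptr;
        return;
      }
    }
  }

  // Destroys every live object, including ones whose completions never arrived.
  void DestroyAll() {
    for (std::size_t i = 0; i < N; ++i) {
      if (objects_[i] != nullptr) {
        objects_[i]->~T();
        objects_[i] = nullptr;
      }
    }
  }

  std::size_t live() const {
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i) {
      if (objects_[i] != nullptr) ++count;
    }
    return count;
  }

 private:
  alignas(T) unsigned char storage_[N][sizeof(T)];  // raw, suitably aligned slots
  T* objects_[N] = {};                              // nullptr marks a free slot
};
```

Keeping the pointer returned by placement new (rather than re-casting the raw storage) avoids needing `std::launder` when the objects are later destroyed.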
The comment on `AsyncNotifyWhenDone` states "Has to be called before the rpc starts", but it seems that if the request tag is returned with ok=false (i.e. because the CQ is shutting down), the async done tag is never received. Instead, I expect the async done tag to be received regardless of whether or not an incoming call request was successfully received.
It is worth noting that I have also seen cases where I received both the call request with ok=false and the done tag, when there was some error (such as deadline exceeded) rather than when the CQ was shutting down.