On client's read errors quick subsequent requests may wait for response indefinitely #415
Comments
Thanks for raising this issue! I'll try to take a closer look this week, but also, feel free to send a PR for review.
Could I see your code where the problem occurs? I'm curious how the RequestDispatch is being polled. When it hits an error, is it dropped?
BTW, I was able to refactor an existing test to consistently trigger this race: master...request_dispatch_race
Can you see if this branch fixes the problem for you? https://github.com/tikue/tarpc/tree/request_dispatch_race
Thanks a lot for implementing it @tikue! I tried it out and, unfortunately, this still results in the same problem. But your code LGTM in general. So I suspect that this may be due to the use of [...]
Hm, but the [...] The problem with polling in a loop even when pending is returned is that it blocks in a nonblocking function. For example, a sender may have reserved a permit and then gone to sleep for an hour. We don't want to block a Tokio thread for an hour waiting for the client to wake up, as there could be other async tasks that need to run in the meantime.
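To make the point about not blocking inside a poll function concrete, here is a minimal sketch using only the Rust standard library. It is not tarpc's or Tokio's code: `Shared`, `PermitReady`, `CountingWaker`, and `poll_once_then_wake` are hypothetical names made up for illustration. The future returns `Poll::Pending` right away and stores the waker, so the executor thread is free to run other tasks until the "sender" wakes it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical state shared between a waiting task and a slow sender.
struct Shared {
    ready: bool,
    waker: Option<Waker>,
}

// Hypothetical future that completes once the sender marks the state ready.
struct PermitReady(Arc<Mutex<Shared>>);

impl Future for PermitReady {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.0.lock().unwrap();
        if shared.ready {
            Poll::Ready(())
        } else {
            // Do not spin here: remember who to wake and return immediately.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

// Stand-in for an executor: it only counts how often it was asked to re-poll.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns (first poll was pending, number of wake-ups, second poll was ready).
fn poll_once_then_wake() -> (bool, usize, bool) {
    let shared = Arc::new(Mutex::new(Shared { ready: false, waker: None }));
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = PermitReady(shared.clone());

    // The sender is "asleep": the poll returns at once instead of blocking.
    let first_pending = Pin::new(&mut fut).poll(&mut cx).is_pending();

    // Later, the sender wakes up, flips the flag, and notifies the waiter.
    let stored = {
        let mut guard = shared.lock().unwrap();
        guard.ready = true;
        guard.waker.take()
    };
    if let Some(w) = stored {
        w.wake();
    }

    let wakes = counter.0.load(Ordering::SeqCst);
    let second_ready = Pin::new(&mut fut).poll(&mut cx).is_ready();
    (first_pending, wakes, second_ready)
}
```

Calling `poll_once_then_wake()` polls the future once while the sender is idle, then simulates the sender waking up and polls again. A loop around the first poll would instead hold the thread for as long as the sender stays asleep.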
Oh, true, you're right - the code should work just fine like that! Also, I just noticed something strange in my tests. Although you changed the propagated error a little, my tests worked. But now when I try to reproduce my tests, I need to adjust the expected error message. Maybe I or cargo did something wrong, let me test it again.
So the test has been running repeatedly for more than 20min now and no errors so far. Sorry for causing confusion earlier and thanks again for implementing the fix @tikue !
That's great news, thanks for confirming! Yeah, I changed the client errors a little since more types of channel errors are propagated now. I might still revisit them a bit. (benefits of being perpetually pre-1.0...)
chore(crypto): CRP-2380 bump `tarpc` version to `0.34` To test a fix for one of our tests (see MR !17088 and the [github issue](google/tarpc#415) for `tarpc`), the CSP code had to be adapted to the newer `tarpc` version. Since this work was already done, we can also just bump the version already now and it's likely that `0.34` will get a minor update with the fix that will work without adjusting our code. See merge request dfinity-lab/public/ic!17478
If I understand correctly, the dispatch task is terminated on a receiving error in transport (from `pump_read()`) and subsequent requests return in [...]

But for me it does sometimes happen that if the request is filed immediately after the error, then [...] does not error and waits indefinitely in `response_guard.response().await`. Not sure why this is happening, since I would expect that all `DispatchRequest`s would be dropped and also the waiters would notice that and return an error. But it seems that there is a race condition?

Unfortunately, I couldn't produce a minimal working example. A small example "just works".

If I add `self.pending_requests_mut().close();` to `pump_read()`, I can't reproduce the issue anymore. Not sure if that is a solution or it just masks the issue.
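
The expected behavior described in this report (dropped requests wake their waiters with an error, and later requests are rejected) can be sketched with standard-library channels. This is only an analogy under stated assumptions, not tarpc's implementation and not a reproduction of the race: `Request`, `reply_to`, and `simulate_dispatch_failure` are hypothetical names, a thread stands in for the dispatch task, and `std::sync::mpsc` stands in for the async channels.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, RecvError, Sender};
use std::thread;

// Hypothetical request: an id plus the channel on which the caller awaits the reply.
struct Request {
    id: u64,
    reply_to: Sender<String>,
}

// Returns (result seen by the in-flight waiter, whether a later request was rejected).
fn simulate_dispatch_failure() -> (Result<String, RecvError>, bool) {
    let (req_tx, req_rx) = channel::<Request>();

    // Stand-in for the dispatch task.
    let dispatch = thread::spawn(move || {
        let mut in_flight: HashMap<u64, Sender<String>> = HashMap::new();
        if let Ok(req) = req_rx.recv() {
            in_flight.insert(req.id, req.reply_to);
        }
        // Simulated transport read error: return without replying.
        // Both `in_flight` and `req_rx` are dropped here.
    });

    // A request that is in flight when the error happens.
    let (reply_tx, reply_rx) = channel();
    req_tx
        .send(Request { id: 1, reply_to: reply_tx })
        .expect("dispatch should still be running");

    // The reply sender was dropped with `in_flight`, so the waiter gets an
    // error instead of waiting forever.
    let first = reply_rx.recv();
    dispatch.join().expect("dispatch thread panicked");

    // A request filed after the dispatch task is gone is rejected on send.
    let (late_tx, _late_rx) = channel();
    let second_rejected = req_tx
        .send(Request { id: 2, reply_to: late_tx })
        .is_err();

    (first, second_rejected)
}
```

In this sketch every path ends in an error for the caller. The hang reported above would correspond to a reply sender that is still held somewhere after the dispatch side has stopped processing, which is the kind of window that explicitly closing the pending-requests channel is meant to shut.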
, I can't reproduce the issue anymore. Not sure if that is a solution or it just masks the issue.The text was updated successfully, but these errors were encountered: