New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpclb re-resolution #13671
grpclb re-resolution #13671
Conversation
f15a55a
to
3644b1a
Compare
|
1 similar comment
|
|
1 similar comment
|
|
1 similar comment
|
This looks really good! In particular, I think you've done a good job checking for all of the various edge cases involved in handling the timeouts and the race conditions for the timer callback. My comments are mostly nits or structural improvements that are tangentially related to the new functionality. However, we will need to add a test for the new functionality before we can merge. Please let me know if you have any questions about any of this. Thanks! Reviewed 3 of 3 files at r1. include/grpc/impl/codegen/grpc_types.h, line 299 at r1 (raw file):
Please also document the default if this argument is not specified. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 447 at r1 (raw file):
I suspect that this struct is not needed anymore. @dgquintas can confirm this, but I think the original reason for allocating a new struct whenever we instantiate a new RR policy is that we used to have a lot of cases where we unreffed and then recreated the RR policy, which meant that there could be connectivity state callbacks for multiple RR policies in flight at the same time, and each one needed its own state. But between David's change to add update functionality to LB policies and your change to allow LB policies to request reresolution instead of shutting down, I don't think we have any more cases where we'll unref and recreate the RR policy. So I think the elements from this struct can be moved directly into src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 874 at r1 (raw file):
Suggest calling this something like src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1052 at r1 (raw file):
Nit: Please remove blank lines within functions, both here and below. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1671 at r1 (raw file):
Let's move this new code down to line 1704, since that's where we've actually set the new serverlist. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1869 at r1 (raw file):
Can combine both of these conditionals into one. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1933 at r1 (raw file):
I think this function is now essentially the same as in the other two LB policies, so we can probably remove this method from the LB policy vtable and implement it just once in the base class. src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc, line 364 at r1 (raw file):
I don't think it's necessary to change the RR policy in this PR. Instead, we can just have the grpclb code check for either TRANSIENT_FAILURE or IDLE. Comments from Reviewable |
Thanks for reviewing! I'm still adding the test in grpclb_end2end_test. Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. include/grpc/impl/codegen/grpc_types.h, line 299 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. (Fallback counterpart is added.) src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 447 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. Also, should we actually cancel the notification closure somehow? The previous code didn't cancel it. I tried to cancel it in src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 874 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1052 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Looks like this file has been using many blanks in functions. Also the blank line here is preexisting. Should I remove all the blank lines in this function? src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1671 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1869 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. (Fallback counterpart is added.) src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1933 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Yes, except that we are still using shutting_down from each derived policy. I think checking that we are not shutting down is useful, I proposed to move it into the base policy class. Actually I think some other members (e.g., started_picking) are also common, maybe we should move them too. But then it's a little more verbose to check src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc, line 364 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. I though this will cause unnecessary re-resolution when the LB connection fails before it starts picking. But after a second thought, it seems impossible because we make LB call after we start picking. Comments from Reviewable |
|
|
|
Reviewed 7 of 7 files at r2. src/core/ext/filters/client_channel/lb_policy.h, line 209 at r2 (raw file):
This should still have the src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 447 at r1 (raw file): Previously, AspirinSJL (Juanli Shen) wrote…
No, we shouldn't need to cancel it. When we unref the RR policy, that will cause it to go into state SHUTDOWN, and the pending notification callback will be returned telling us that that's happened. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1052 at r1 (raw file): Previously, AspirinSJL (Juanli Shen) wrote…
Let's do this opportunistically. When you're working on a function anyway, if it has them, go ahead and remove them -- especially if you might be tempted to add more with your new code. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1933 at r1 (raw file): Previously, AspirinSJL (Juanli Shen) wrote…
I'd prefer not to move First, the shutdown code is too deeply embedded in the logic of each implementation. I think it's generally a bad idea to have too much interaction between superclasses and subclasses, because it splits up the logic between them and makes it hard to follow. In fact, the C++ style guide discourages implementation inheritance for exactly this reason. Second, although all of our implementations currently use a simple bool to indicate shutting down, I could imagine other implementations that do something different. I don't think it makes sense to tie ourselves to this one particular implementation. So, here are the alternatives I see, in decreasing order of preference:
src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 883 at r2 (raw file):
Could just call Comments from Reviewable |
|
I've spent 2 days on this timer bug but I haven't understood why it happens. To reproduce it, run
|
@AspirinSJL - yes, will take a look once you resolve the merge conflicts and update the PR. |
|
|
Review status: 5 of 9 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. src/core/ext/filters/client_channel/lb_policy.h, line 209 at r2 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1052 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1933 at r1 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. I choose solution 2. I think we should keep the assertion, even though we have some duplicate code. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 883 at r2 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. Comments from Reviewable |
|
|
|
Reviewed 6 of 9 files at r5. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 899 at r5 (raw file):
Suggest moving this down to an src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 904 at r5 (raw file):
I suspect this is the cause of the timer bug you're trying to track down: We should not be setting I think this line should be removed. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1046 at r5 (raw file):
Same here. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1679 at r5 (raw file):
Same here. src/core/lib/iomgr/timer_generic.cc, line 168 at r5 (raw file):
I assume this was added just for debugging purposes. Please make sure to revert the changes to this file before this PR gets merged. test/cpp/end2end/grpclb_end2end_test.cc, line 1141 at r5 (raw file):
The goal of this test is to check the new functionality being added in this PR, which is that the grpclb policy will request reresolution. However, if we're explicitly telling the fake resolver to return new data here, then this test doesn't actually cover that new functionality, because it would return the new data even if the grpclb policy did not request reresolution. I think that in order to test this, we need a way to set the fake resolver's test/cpp/end2end/grpclb_end2end_test.cc, line 1171 at r5 (raw file):
This isn't necessary; it's already covered in various tests above. Comments from Reviewable |
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 899 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 904 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Right, this is the cause. Thanks! I will also delete the unnecessary src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1046 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 1679 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/lib/iomgr/timer_generic.cc, line 168 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Yes, it's in the temp-logging commit. I will revert it before merging. test/cpp/end2end/grpclb_end2end_test.cc, line 1141 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. Also, I changed test/cpp/end2end/grpclb_end2end_test.cc, line 1171 at r5 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. So only keeping one (Vanilla) is enough? I guess so because we will use grpclb as long as there is any balancer address, which is true for every test here. I am removing all of those except the Vanilla one. Comments from Reviewable |
|
7ad6fed
to
09baca8
Compare
|
|
Reviewed 2 of 9 files at r27, 7 of 7 files at r28. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 854 at r28 (raw file):
It just occurred to me that now that we're going to re-resolve whenever our connection to a single backend fails (which will be handled by Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 854 at r28 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. Good catch! I moved the checking of a good balancer connection to Comments from Reviewable |
|
|
|
Reviewed 1 of 1 files at r29. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 757 at r29 (raw file):
Please add a comment here explaining that we only want to do a re-resolution if we're not talking to a balancer, because otherwise we expect to get updated addresses from the balancer. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 855 at r29 (raw file):
These assertions don't really buy us anything, because none of the code below requires these constraints, so let's just remove them. Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 757 at r29 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 855 at r29 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Done. Do we add assertions only when the code below require them? I think it's also helpful (e.g., for debugging) to assert some expectations at some points even when the code below don't rely on them. Comments from Reviewable |
|
|
Reviewed 1 of 1 files at r30. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 855 at r29 (raw file): Previously, AspirinSJL (Juanli Shen) wrote…
Whenever you add an assertion, you are essentially adding a dependency on some external input whose value you are constraining. This dependency comes with a cost, because it means that if we ever change the external input such that it can be set to a value that the assertion is preventing, we need to remember to change the assertion -- and we might not even know where the assertion is when we change the external code. As a result, you should only add assertions if the benefit you get from the assertion outweighs the overhead of adding that dependency. If you are writing code that is actually using an input but you are not expecting to ever receive certain values and it would unnecessarily complicate things to handle those values, then it makes sense to add an assertion, so that you blow up with a useful message before you get to code that was not written to handle the value (thus resulting in undefined behavior). In that situation, the assertion adds value (avoiding undefined behavior) and doesn't really add any cost, because if the calling code ever changed such that the assertion would fail, the code protected by the assertion would have needed to change anyway. In other words, in this case, the code already depends on the external code that provides the value, so the fact that the assertion is there doesn't add a new dependency. In contrast, if you are writing code that does not actually care about the value that you are asserting on, the assertion does add a dependency without actually providing any significant value, so it's best to avoid it. Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc, line 855 at r29 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Thanks for explaining! If we don't have the assertions here, is there any other way to check that the RR never changes to Comments from Reviewable |
ff318c2
to
33cdd57
Compare
|
|
|
Anyway, it's so green, let's ship it! |
I don't think this is something we need to check for. In general, we should have tests that cover the behavior of RR, but it's hard to write tests for the absence of a behavior, unless there are specific conditions under which the code is likely to exhibit that behavior (and I don't think there are in this case). |
Currently, grpclb will hand off the re-resolution decision to its embedded RR policy. This PR lets grpclb to be in charge of its re-resolution. The connectivity of the balancers will also be taken into consideration so that grpclb will re-resolve only when both balancers and backends have been unreachable for some timeout.
This change is