Move name resolution retry from managed channel to name resolver. #9758

Merged
merged 4 commits into from
Dec 16, 2022

temawi
Contributor

@temawi temawi commented Dec 15, 2022

This change has these main aspects to it:

  1. Removal of any name resolution responsibility from ManagedChannelImpl
  2. Creation of a new RetryScheduler to own generic retry logic
    • Can also be used outside the name resolution context
  3. Creation of a new RetryingNameResolver that can be used to wrap any polling name resolver to add retry capability
  4. A new facility in NameResolver to allow implementations to notify listeners on the success of name resolution attempts
    • RetryingNameResolver relies on this
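A minimal sketch of the kind of generic retry scheduler item 2 describes may help show why it can live outside name resolution. It is not the code from this PR: `BackoffPolicy`, `ExponentialBackoff`, and `SimpleRetryScheduler` are hypothetical names, the backoff is deterministic (no jitter), and a plain `ScheduledExecutorService` stands in for whatever executor gRPC actually uses. The one rule it borrows from the review is that no new retry is scheduled while one is still pending, which is signalled by returning `-1`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical backoff abstraction: each call returns the next delay to wait. */
interface BackoffPolicy {
  long nextBackoffNanos();
}

/** Hypothetical deterministic exponential backoff (no jitter), capped at maxNanos. */
final class ExponentialBackoff implements BackoffPolicy {
  private final double multiplier;
  private final long maxNanos;
  private long nextNanos;

  ExponentialBackoff(long initialNanos, double multiplier, long maxNanos) {
    this.nextNanos = initialNanos;
    this.multiplier = multiplier;
    this.maxNanos = maxNanos;
  }

  @Override
  public long nextBackoffNanos() {
    long current = nextNanos;
    nextNanos = Math.min((long) (nextNanos * multiplier), maxNanos);
    return current;
  }
}

/** Hypothetical scheduler that owns only retry timing and knows nothing about name resolution. */
final class SimpleRetryScheduler {
  private final ScheduledExecutorService executor;
  private final Supplier<BackoffPolicy> policyProvider;
  private BackoffPolicy policy;       // null until the first retry after a reset
  private ScheduledFuture<?> pending; // the currently scheduled retry, if any

  SimpleRetryScheduler(ScheduledExecutorService executor, Supplier<BackoffPolicy> policyProvider) {
    this.executor = executor;
    this.policyProvider = policyProvider;
  }

  /** Schedules {@code retry} after the next backoff delay; returns -1 if a retry is already pending. */
  synchronized long schedule(Runnable retry) {
    if (pending != null && !pending.isDone()) {
      return -1;
    }
    if (policy == null) {
      policy = policyProvider.get();
    }
    long delayNanos = policy.nextBackoffNanos();
    pending = executor.schedule(retry, delayNanos, TimeUnit.NANOSECONDS);
    return delayNanos;
  }

  /** Called on success: cancels any pending retry and restarts the backoff sequence. */
  synchronized void reset() {
    if (pending != null) {
      pending.cancel(false);
      pending = null;
    }
    policy = null;
  }
}
```

A resolver wrapper could call `schedule(...)` with a re-resolve task after a failed attempt and `reset()` after a successful one; that success signal is what item 4 adds to NameResolver.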

@temawi temawi changed the title Name resolution from ManagedChannel to DnsNameResolver Name resolution retry from ManagedChannel to DnsNameResolver Dec 15, 2022
@temawi temawi changed the title Name resolution retry from ManagedChannel to DnsNameResolver Name resolution retry from managed channel to name resolver. Dec 15, 2022
@larry-safran larry-safran changed the title Name resolution retry from managed channel to name resolver. Name resolution retry move from managed channel to name resolver. Dec 15, 2022
@temawi temawi changed the title Name resolution retry move from managed channel to name resolver. Move name resolution retry from managed channel to name resolver. Dec 15, 2022
Contributor

@larry-safran larry-safran left a comment

Checkpointing review will continue tomorrow.

api/src/main/java/io/grpc/NameResolver.java
* @since 1.21.0
*/
- public abstract void onResult(ResolutionResult resolutionResult);
+ public abstract boolean onResult(ResolutionResult resolutionResult);
Contributor

We should make sure this change is mentioned in the release notes, since it breaks source compatibility.
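Why this is source-incompatible: any existing subclass that overrides `onResult` with a `void` return type stops compiling once the abstract method returns `boolean`. The usual alternative, which the eventual revert message calls stair-stepping, is to add the new behavior next to the old method instead of changing it in place. A minimal sketch, with hypothetical names (`Result`, `Listener`, `onResultAccepted`) that are not gRPC's API:

```java
import java.util.List;

/** Hypothetical stand-in for a resolution result. */
final class Result {
  final List<String> addresses;

  Result(List<String> addresses) {
    this.addresses = addresses;
  }
}

/** Hypothetical listener API evolved by stair-stepping rather than by changing onResult's return type. */
abstract class Listener {
  /** The original method is left untouched, so existing void overrides keep compiling. */
  public abstract void onResult(Result result);

  /**
   * A new method added next to the old one. Its default delegates to the old method and reports
   * the result as accepted, so legacy subclasses behave as before. Implementations that can give a
   * real answer override it; a later release could deprecate and eventually remove onResult.
   */
  public boolean onResultAccepted(Result result) {
    onResult(result);
    return true;
  }
}
```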

Contributor Author

For sure, this will be highlighted.

Contributor

@ejona86 are we now not trying to provide a smoother transition?

Member

The strategy I had discussed with Terry shouldn't have required changing this to return a boolean. In fact, the boolean as written is broken under both the current threading model and the planned future one. Other parts of this change aren't quite right either. We should roll back.

Member

Reverting in #9767

core/src/main/java/io/grpc/internal/RetryScheduler.java
if (scheduledHandle != null && scheduledHandle.isPending()) {
return -1;
}
long delayNanos = policy.nextBackoffNanos();
Contributor

You might want a check to verify that it returns a positive number.

Contributor Author

I guess I could, but if the task can't be scheduled for some reason we would get a RejectedExecutionException, which is pretty good at conveying what went wrong. Since the existing code to schedule retries didn't make this upfront check either, I think it's OK the way it is. Let me know if you have a stronger opinion on it.
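One detail relevant to this exchange: the `java.util.concurrent.ScheduledExecutorService` Javadoc says zero and negative delays are allowed in the `schedule` methods and are treated as requests for immediate execution, while `RejectedExecutionException` is documented for tasks that cannot be scheduled at all (for example, after shutdown). With a standard JDK executor, a non-positive backoff would therefore most likely trigger an immediate retry rather than an exception. A minimal sketch of the upfront check the reviewer suggested, using a hypothetical helper (`BackoffChecks.scheduleRetry`):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not part of gRPC. */
final class BackoffChecks {
  private BackoffChecks() {}

  /**
   * Schedules {@code task} after {@code delayNanos}, failing fast on a non-positive delay instead of
   * letting the executor silently treat it as "run now".
   */
  static ScheduledFuture<?> scheduleRetry(
      ScheduledExecutorService executor, Runnable task, long delayNanos) {
    if (delayNanos <= 0) {
      throw new IllegalStateException("backoff policy returned non-positive delay: " + delayNanos);
    }
    return executor.schedule(task, delayNanos, TimeUnit.NANOSECONDS);
  }
}
```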

- newResolver(name, 81, GrpcUtil.NOOP_PROXY_DETECTOR, Stopwatch.createUnstarted(fakeTicker));
+ DnsNameResolver resolver = (DnsNameResolver) newResolver(
+ name, 81, GrpcUtil.NOOP_PROXY_DETECTOR,
+ Stopwatch.createUnstarted(fakeTicker)).getRetriedNameResolver();
Contributor

Why aren't you using the RetryingNameResolver for the shutdown? Seems like getting the delegate makes mistakes much easier.

Contributor Author

Good point, that would be way more precise. I updated all the tests to only use DnsNameResolver when they specifically need its API.

@temawi temawi merged commit 43bc578 into grpc:master Dec 16, 2022
@temawi temawi deleted the dns-backoff branch December 16, 2022 23:31
ejona86 added a commit to ejona86/grpc-java that referenced this pull request Dec 20, 2022
…ver. (grpc#9758)"

This reverts commit 43bc578. It breaks
API without stair-stepping and needs to be tweaked architecturally.
@ejona86
Member

ejona86 commented Dec 20, 2022

The boolean onResult() return value will be either wrong/delayed or thread-unsafe. There are two possibilities:

  • The NR is calling onResult from the sync context. In this case lastAddressesAccepted won't be updated in-line in onResult(), so the return value will be the previous call's value. Long-term, we probably need onResult to be called from the sync context.
  • The NR is not calling onResult from the sync context. In this case lastAddressesAccepted may still be outdated, as the runnable can run on other threads. But even if the runnable has already run, lastAddressesAccepted is read outside of the sync context and isn't volatile.
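The first case can be reproduced with a simplified, single-threaded model. `MiniSyncContext` and `StaleListener` are hypothetical stand-ins, not gRPC's `SynchronizationContext` or `ManagedChannelImpl`: the context runs tasks one at a time and, when `execute` is called from inside a running task, queues the new task until the current one finishes. Because `onResult` defers the update of `lastAddressesAccepted` and then reads the field immediately, each call returns the value from the previous call. The second, multi-threaded case is not modeled here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical, single-threaded stand-in for a serializing "sync context": tasks run one at a time
 * in submission order, and a task submitted from inside a running task waits until that task ends.
 */
final class MiniSyncContext {
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private boolean draining;

  void execute(Runnable task) {
    queue.add(task);
    if (draining) {
      return; // Already inside the context: the outer drain loop will run this task later.
    }
    draining = true;
    try {
      Runnable next;
      while ((next = queue.poll()) != null) {
        next.run();
      }
    } finally {
      draining = false;
    }
  }
}

/** Hypothetical listener that defers its update but returns the field right away. */
final class StaleListener {
  private final MiniSyncContext syncContext;
  private boolean lastAddressesAccepted;

  StaleListener(MiniSyncContext syncContext) {
    this.syncContext = syncContext;
  }

  boolean onResult(boolean addressesAccepted) {
    // The update is handed to the sync context...
    syncContext.execute(() -> lastAddressesAccepted = addressesAccepted);
    // ...so when this method is itself running inside the context, the read below happens
    // before the queued update and returns the previous call's value.
    return lastAddressesAccepted;
  }
}
```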

I had seen that problem earlier and had suggested to Terry to avoid it by doing the callback. It seems the approach wasn't entirely clear. What I had imagined had no new public API, but would have some private hackery between RetryingNameResolver and ManagedChannelImpl until the point we can fix the threading issues.

Approach:

  1. ManagedChannelImpl can't return a boolean, so use a callback instead. To avoid adding temporary API that we'd delete once the return value is fixed, pass the callback as an Attribute in ResolutionResult. ManagedChannelImpl and RetryingNameResolver would both have visibility of the Attributes.Key.
  2. RetryingNameResolver should implement start(Listener2) and wrap the Listener2 with its own Listener. Within onResult(), it would modify the ResolutionResult to add the Attributes.Key for the callback before calling the original onResult(). (Note that there's some annoyance here as start(Listener) will need an implementation copied from the base class to delegate to start(Listener2) since it is overridden in ForwardingNameResolver.)
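A minimal sketch of this callback-through-attributes approach, using simplified stand-ins rather than the real grpc-java types: `Attrs`, `Resolution`, and `ResolutionListener` play the roles of `Attributes`, `ResolutionResult`, and `Listener2`, and `RetryKeys.ACCEPTANCE_CALLBACK`, `RetryingListener`, and `ChannelSideListener` are hypothetical names. The wrapper adds the callback to each result before forwarding it; the channel side reports acceptance through the callback instead of a return value, so it could invoke the callback later, once the update has actually been applied.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Simplified stand-in for Attributes: an immutable map with typed, identity-compared keys. */
final class Attrs {
  static final class Key<T> {
    final String debugName;
    Key(String debugName) { this.debugName = debugName; }
  }

  static final Attrs EMPTY = new Attrs(Collections.<Key<?>, Object>emptyMap());

  private final Map<Key<?>, Object> data;

  private Attrs(Map<Key<?>, Object> data) {
    this.data = Collections.unmodifiableMap(new HashMap<>(data));
  }

  @SuppressWarnings("unchecked")
  <T> T get(Key<T> key) {
    return (T) data.get(key);
  }

  /** Returns a copy with one extra entry; this instance is not modified. */
  <T> Attrs with(Key<T> key, T value) {
    Map<Key<?>, Object> copy = new HashMap<>(data);
    copy.put(key, value);
    return new Attrs(copy);
  }
}

/** Simplified stand-in for ResolutionResult. */
final class Resolution {
  final List<String> addresses;
  final Attrs attributes;

  Resolution(List<String> addresses, Attrs attributes) {
    this.addresses = addresses;
    this.attributes = attributes;
  }

  <T> Resolution withAttribute(Attrs.Key<T> key, T value) {
    return new Resolution(addresses, attributes.with(key, value));
  }
}

/** Simplified stand-in for Listener2. */
interface ResolutionListener {
  void onResult(Resolution result);
}

/** Hypothetical key that only the retrying resolver and the channel would be able to see. */
final class RetryKeys {
  static final Attrs.Key<Consumer<Boolean>> ACCEPTANCE_CALLBACK = new Attrs.Key<>("acceptanceCallback");
}

/** What a RetryingNameResolver-like wrapper could install around the channel's listener in start(). */
final class RetryingListener implements ResolutionListener {
  private final ResolutionListener delegate;
  private final Consumer<Boolean> onAcceptance; // e.g. reset backoff on true, schedule a retry on false

  RetryingListener(ResolutionListener delegate, Consumer<Boolean> onAcceptance) {
    this.delegate = delegate;
    this.onAcceptance = onAcceptance;
  }

  @Override
  public void onResult(Resolution result) {
    // Attach the callback, then forward the modified result to the original listener.
    delegate.onResult(result.withAttribute(RetryKeys.ACCEPTANCE_CALLBACK, onAcceptance));
  }
}

/** Channel side: reports acceptance through the callback instead of a return value. */
final class ChannelSideListener implements ResolutionListener {
  @Override
  public void onResult(Resolution result) {
    // Hypothetical acceptance rule; a real channel would decide inside its sync context.
    boolean accepted = !result.addresses.isEmpty();
    Consumer<Boolean> callback = result.attributes.get(RetryKeys.ACCEPTANCE_CALLBACK);
    if (callback != null) {
      callback.accept(accepted);
    }
  }
}
```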

ejona86 added a commit that referenced this pull request Dec 20, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2023