Fix #311. DiscoveryClientNameResolver leak and performance issue. #313
Conversation
(force-pushed from 4f984c0 to ae51cdd)
Do you want a preliminary review/feedback, or only after you're done?
The patch has been deployed to our staging environment for testing and is ready for review. @ST-DDT
Looks good.
All in all a very good improvement; there are just a few things I would like to address.
Review comments (now outdated) were left on:
- ...ure/src/main/java/net/devh/boot/grpc/client/nameresolver/DiscoveryClientResolverFactory.java
- ...figure/src/main/java/net/devh/boot/grpc/client/nameresolver/DiscoveryClientNameResolver.java
There might be a rare case, where
I created a patch with some fixes/changes/ideas. Unfortunately I forgot to commit them one by one, so the commit looks somewhat messy.
Is there any reason why a listener would not receive its first (second) update? If there is such a case, the current implementation has the same issue. A new listener is notified when registering, or put into the list which will be passed to
I have reviewed your patch, and it does improve a lot. But I have one concern about your patch: you notify all listeners in the
The issue is that cache updates aren't distributed to newly registered listeners (race condition). I will try to explain the issue using code:

```java
public final synchronized void registerListener(String name, Listener listener) { // name = "service", listener = listener2
    listenerMap.computeIfAbsent(name, n -> Sets.newHashSet()).add(listener); // listener2 gets added to [listener1]
    List<ServiceInstance> instances = serviceInstanceMap.get(name); // instances = ["10.0.0.1"]
    if (instances != null) { // true
        List<EquivalentAddressGroup> targets = convert(name, instances); // targets = ["10.0.0.1"]
        if (targets.isEmpty()) { // false
            // serviceInstanceMap.remove(name);
        } else {
            listener.onAddresses(targets, Attributes.EMPTY); // Addresses set to ["10.0.0.1"]
        }
    }
    refresh(name);
}

public final synchronized void refresh(String name) { // name = "service"
    Future<List<ServiceInstance>> future = discoverClientTasks.get(name); // future = future12345 (running)
    // no resolver is running with this service name.
    if (CollectionUtils.isEmpty(listenerMap.get(name))) { // false
        // return;
    }
    // exit when resolving but not a force refresh
    if (resolving(future) && !forceRefresh(name)) { // resolving && no force refresh => true
        return; // Exit
    }
    [...]
}
```

Next refresh interval:

```java
public final synchronized void refresh(String name) { // name = "service"
    Future<List<ServiceInstance>> future = discoverClientTasks.get(name); // future = future12345 (done)
    // no resolver is running with this service name.
    if (CollectionUtils.isEmpty(listenerMap.get(name))) { // false
        // return;
    }
    // exit when resolving but not a force refresh
    if (resolving(future) && !forceRefresh(name)) { // false && true => false
        // return;
    }
    // update cached instances when not a force refresh.
    if (!forceRefresh(name)) { // true
        List<ServiceInstance> instances = getResolveResult(future); // instances = ["10.0.0.2"]
        if (instances != KEEP_PREVIOUS) {
            serviceInstanceMap.put(name, instances); // ["10.0.0.2"] replaces ["10.0.0.1"] and listener2 is not updated
        }
    }
    discoverClientTasks.put(name,
            executor.submit(new Resolve(name,
                    Sets.newHashSet(listenerMap.get(name)), // [listener1, listener2]
                    Lists.newArrayList(serviceInstanceMap.get(name)) // ["10.0.0.2"]
            )) // future67890
    ); // future67890 replaces future12345
}

// future67890
// name = "service"
private List<ServiceInstance> resolveInternal() {
    final List<ServiceInstance> newInstanceList = client.getInstances(name); // newInstanceList = ["10.0.0.2"]
    if (CollectionUtils.isEmpty(newInstanceList)) { // false
        [...]
    }
    if (!needsToUpdateConnections(newInstanceList)) { // still ["10.0.0.2"] => no need to update => true
        return KEEP_PREVIOUS; // Exit => listener2 will never be updated with ["10.0.0.2"]
    }
    [...]
}
```

I hope this explains the issue.

EDIT: On further thought, the issue happens if a listener is registered after the first resolution and the last resolution returned a different result than the cached one.

While writing the above explanation I actually found a more serious problem, but that is easy to fix. If for some reason the connection to the discovery service fails, then

Fix:

```java
private void notifyStatus(Status status) {
    serviceInstanceMap.put(name, ImmutableList.of()); // <--- Reset cache (needs synchronization)
    for (Listener listener : savedListenerList) {
        listener.onError(status);
    }
}
```
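The stale-listener scenario above can be boiled down to a small, self-contained toy model. This is only an illustration of the sequence of events described in the walkthrough; the class and field names (`StaleListenerDemo`, `cache`, `finishedTaskResult`) are invented for this sketch and are not the project's actual classes:

```java
import java.util.List;
import java.util.function.Consumer;

// Toy model of the race: listener2 registers between two resolutions and is
// seeded from a stale cache; the "nothing changed" check then suppresses the
// update that would have corrected it.
public class StaleListenerDemo {
    // serviceInstanceMap entry for "service"
    static List<String> cache = List.of("10.0.0.1");
    // result of the already-finished resolve task, not yet folded into the cache
    static final List<String> finishedTaskResult = List.of("10.0.0.2");

    static List<String> lastSeen1; // addresses listener1 currently believes
    static List<String> lastSeen2; // addresses listener2 currently believes

    public static void main(String[] args) {
        Consumer<List<String>> listener1 = a -> lastSeen1 = a;
        Consumer<List<String>> listener2 = a -> lastSeen2 = a;

        // listener1 was already notified directly by the finished resolve task:
        listener1.accept(finishedTaskResult); // listener1 -> ["10.0.0.2"]

        // listener2 registers now and is only seeded from the stale cache:
        listener2.accept(cache); // listener2 -> ["10.0.0.1"]

        // The next refresh folds the task result into the cache WITHOUT notifying:
        cache = finishedTaskResult;

        // The new resolve task then sees "no change" and keeps the previous
        // result, so nobody gets notified and listener2 stays stale:
        List<String> fresh = List.of("10.0.0.2");
        if (!fresh.equals(cache)) { // false => notification skipped
            listener1.accept(fresh);
            listener2.accept(fresh);
        }

        System.out.println("listener1=" + lastSeen1 + " listener2=" + lastSeen2);
        // prints: listener1=[10.0.0.2] listener2=[10.0.0.1]
    }
}
```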
According to the javadocs:

```java
/**
 * Receives address updates.
 *
 * <p>All methods are expected to return quickly. // <---
 *
 * @since 1.0.0
 */
@ExperimentalApi("https://github.com/grpc/grpc-java/issues/1770")
@ThreadSafe
public interface Listener {
```

So it shouldn't take "that" long.

If we create a copy, then we might run into the above cache issue again. We could try to avoid that by invoking a force refresh when registering new listeners, but this could cause performance issues as well. I also considered removing the "nothing has changed" check, but I'm not sure about the performance implications here (maybe it forces all connections to be reopened, which would be mind-blowingly expensive). (See also: grpc/grpc-java#6524)
I just got a reply from: grpc/grpc-java#6523 (comment)
So we should probably update our code here as well. |
@wangzw Do you have time to fix this in the next few days?
I will do it this week.
I updated my branch to also include a unit test |
The memory leak does indeed exist, but the performance issue seems to be related to
Each channel has a resolver, and each resolver sends requests to Eureka independently. So if there are many channels, Eureka will receive many duplicate requests from different resolvers. So moving
@ST-DDT I have merged your patch and made some small modifications. I moved the code that notifies listeners out of the synchronized block.
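The "notify outside the synchronized block" change can be sketched as a generic snapshot-and-notify pattern. This is only an illustration of the technique, with placeholder names, not the actual project code:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Snapshot the listener set while holding the lock, then invoke the
// listeners without holding it, so slow listeners cannot block
// registration and refresh, and concurrent (de)registration cannot
// throw ConcurrentModificationException mid-iteration.
public class NotifyOutsideLock {
    private final Object lock = new Object();
    private final Set<Consumer<List<String>>> listeners = new HashSet<>();
    private List<String> addresses = List.of();

    public void register(Consumer<List<String>> listener) {
        List<String> current;
        synchronized (lock) {
            listeners.add(listener);
            current = addresses;
        }
        listener.accept(current); // seed the new listener with the current state
    }

    public void update(List<String> newAddresses) {
        List<Consumer<List<String>>> snapshot;
        synchronized (lock) {
            addresses = newAddresses;
            snapshot = new ArrayList<>(listeners); // copy under the lock
        }
        for (Consumer<List<String>> listener : snapshot) {
            listener.accept(newAddresses); // runs without holding the lock
        }
    }
}
```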
If a service has multiple providers, the client still has only one channel and one NameResolver for that service, so there is no need to group the NameResolvers by service name and refresh them together. This whole block of code looks like quite a large change.

Each channel has a resolver, and each resolver sends requests to Eureka independently. So if there are many channels, Eureka will receive many duplicate requests from different resolvers. So moving DiscoveryClient::getInstances from the resolver to DiscoveryClientResolverFactory or any other singleton object will reduce the request quantity significantly.
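The deduplication idea can be sketched as a factory-level cache shared by all resolvers for the same service name, so one backend lookup serves every channel per refresh cycle. This is a simplified sketch under stated assumptions: `SharedLookupFactory` and its fields are invented names, and the `Function` stands in for the real `DiscoveryClient::getInstances` call:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// One cached lookup per service name, shared by all resolvers,
// instead of one independent discovery request per channel.
public class SharedLookupFactory {
    static final AtomicInteger backendCalls = new AtomicInteger(); // for demonstration

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> backend;

    SharedLookupFactory(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    // All resolvers asking for the same name within one refresh cycle
    // share a single backend call.
    List<String> instances(String name) {
        return cache.computeIfAbsent(name, backend);
    }

    // Called once per refresh interval to force a fresh lookup next time.
    void invalidate(String name) {
        cache.remove(name);
    }
}
```

With N channels resolving the same service, the discovery backend now sees one request per refresh interval instead of N.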
Do you have any metrics/data that show how long each call takes? I looked at the sources, and it looks like it uses a local cache (inside Eureka's discovery client, so it does not make any HTTP calls at all) that gets asynchronously updated every 30 seconds and can thus return almost instantly.
```java
for (final Listener2 listener : listeners) {
    listener.onResult(result);
}
```
You need to change this to preserve the `if (serviceList != KEEP_PREVIOUS)` logic, e.g. by adding an else statement above that returns early.
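The suggested early return could look roughly like this. A minimal sketch with illustrative names: `KEEP_PREVIOUS` is modeled as an identity sentinel (compared with `==`, as in the surrounding discussion), and `publish` stands in for the notification step:

```java
import java.util.List;

// Preserve the KEEP_PREVIOUS semantics by returning before the
// notification loop when nothing has changed.
public class KeepPreviousSketch {
    // Identity sentinel: a dedicated instance, never equal (==) to a real result.
    static final List<String> KEEP_PREVIOUS = new java.util.ArrayList<>();

    static int notifications = 0; // for demonstration

    static void publish(List<String> serviceList, List<Runnable> listeners) {
        if (serviceList == KEEP_PREVIOUS) {
            return; // nothing changed since the last resolution: skip notifying
        }
        for (Runnable listener : listeners) {
            listener.run();
            notifications++;
        }
    }
}
```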
If `serviceList == KEEP_PREVIOUS` is true, `listeners` should be empty and the loop completes quickly.
The service list query is served from the local cache.
I'm seeing intensive Eureka request messages in the log; I'm not sure whether the requests are cached or not.
I will wait for a response on grpc/grpc-java#6524 (comment) before possibly merging this PR.
The bug itself has been fixed with #320. Thanks for bringing the issue to our attention as well as contributing a lot to solving it.
…DiscoveryClientResolverFactory to reduce the workload of Eureka.
…DiscoveryClientNameResolver to null to avoid a leak of Channel.
…DiscoveryClientResolverFactory
@ST-DDT
Fixes #311