Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eureka seems to do not recalculate numberOfRenewsPerMinThreshold during evicting expired leases #2407

Closed
bakomchik opened this issue Oct 30, 2017 · 12 comments

Comments

@bakomchik
Copy link

bakomchik commented Oct 30, 2017

Hi
in my test environment services are restarts frequently by mesos-marathon without unregistration(SIGKILL)
so i noticed that eureka instance is in Self-preservation-mode frequently.
eureka has 65 registered services ,but numberOfRenewsPerMinThreshold is 368
that seems to be incorrect.
i found that numberOfRenewsPerMinThreshold decreased only via REST-API

com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel

but when lease expired

com.netflix.eureka.registry.AbstractInstanceRegistry#evict(long)

numberOfRenewsPerMinThreshold does not decreases,
so numberOfRenewsPerMinThreshold does not correlate with real instance count and expectedNumberOfRenewsPerMin

some greps:

REGISTERD

grep "Registered instance" eureka.log| wc -l
----
286

EXPIRED

grep "Registry: expired lease for" eureka.log| wc -l
----
213

RENEWAL

grep "Current renewal" eureka.log
----
929790926 2017-10-30 10:08:11,991 INFO  [ ReplicaAwareInstanceRegistry - RenewalThresholdUpdater ] c.n.e.r.PeerAwareInstanceRegistryImpl                   | Current renewal threshold is : 368
@p-zalejko
Copy link

Hi,
I also ran into the same problem and have been investigating it for a while. Here is what I found:

The "Renews threshold" is decreased if a service is shutdown gracefully. Then, before going down it calls com.netflix.eureka.resources.InstanceResource#cancelLease endpoint. It executes PeerAwareInstanceRegistryImpl#cancel method that updates "Renews threshold" value.

If, for instance, a service was killed by SIGKILL then the "cancelLease" is not called. At some point the "evict" method gets triggered and eventually it executes
org.springframework.cloud.netflix.eureka.server.InstanceRegistry#internalCancel method. And there is an issue: it cancels the instance but does not call the com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel method (which updates "Renews threshold" value).

In my case, I tested this issue in the following way:

  1. For testing purposes, I updated my configuration to get services unregistered quickly:
eureka:
  client:
    registry-fetch-interval-seconds: 15
  server:
    evictionIntervalTimerInMs: 10000
    renewalThresholdUpdateIntervalMs: 15000
  instance:
    lease-renewal-interval-in-seconds: 15
    lease-expiration-duration-in-seconds: 35
    registry-fetch-interval-seconds: 15
  1. Launched an eureka server
  2. Launched 3 services and registered them within the eureka server.
  3. Started killing one of them and restarting again.

After repeating the step 3 many times I got a big "Renews threshold" value, even greater than Renews (last min) . If the self-preservation mode is enabled it can lead to problems because if the "Renews threshold" is never decreased then even after launching back all services, all the renews sent by them might not be enough to escape from the activated "self-preservation" mode (if it has been already activated).

@holy12345
Copy link
Contributor

@p-zalejko I've been following this discussion, is this problem solved?

@p-zalejko
Copy link

Hi, @holy12345 . I think it isn't. I tested a milestone version (2.x) and it behaves the same.

@holy12345
Copy link
Contributor

holy12345 commented Nov 29, 2017

@p-zalejko Hi,
This will not happen if Eureka Server self-preservation is turned off. The value of Renews threshold is correct.

PeerAwareInstanceRegistryImpl#updateRenewalThreshold()

    private void updateRenewalThreshold() {
        try {
            Applications apps = eurekaClient.getApplications();
            int count = 0;
            for (Application app : apps.getRegisteredApplications()) {
                for (InstanceInfo instance : app.getInstances()) {
                    if (this.isRegisterable(instance)) {
                        ++count;
                    }
                }
            }
            synchronized (lock) {
                // Update threshold only if the threshold is greater than the
                // current expected threshold of if the self preservation is disabled.
                if ((count * 2) > (serverConfig.getRenewalPercentThreshold() * numberOfRenewsPerMinThreshold)
                        || (!this.isSelfPreservationModeEnabled())) {
                    this.expectedNumberOfRenewsPerMin = count * 2;
                    this.numberOfRenewsPerMinThreshold = (int) ((count * 2) * serverConfig.getRenewalPercentThreshold());
                }
            }
            logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
        } catch (Throwable e) {
            logger.error("Cannot update renewal threshold", e);
        }
    }

First of all, this scheduled task defaults to every 15 minutes. if self-preservation is off, the Renews threshold value is recalculated. if self-preservation is turned on, there is no problem as long as the first if statement is true,but i do not understand how to make this condition .apps.getRegisteredApplications() is to get all registered client information?

Do I understand it correctly? Best wishes.

@p-zalejko
Copy link

p-zalejko commented Nov 29, 2017

Hi @holy12345, I think you are right:

If the self-preservation is turned off (server.enableSelfPreservation: false) then we do not have a problem because we do not have self-preservation ;)

But if it is enabled I would expect that thePeerAwareInstanceRegistryImpl#updateRenewalThreshold()updated the 'expectedNumberOfRenewsPerMin' periodically.

Unfortunately, in terms of the PeerAwareInstanceRegistryImpl#updateRenewalThreshold() I do not know why the Applications apps = eurekaClient.getApplications() returns an empty list of apps in my case. As a result, count is 0 so it never gets into the if statement and update the expectedNumberOfRenewsPerMin. It might be a different issue, that could also solve the problem we are discussing on.

@holy12345
Copy link
Contributor

@p-zalejko First of all thank you for your reply :) You're right, count is 0 too in my test environment.What i am trying to do next is to understand why the count is zero, maybe solve this problem, and the problem you ask is solved.If there is progress, I will inform you first, best wishes!

@holy12345
Copy link
Contributor

@p-zalejko Hello, is your eureka server one? I think it should not be a cluster. If it is not a cluster, the value actually obtained is an empty list. best wishes

@yolenw
Copy link

yolenw commented Apr 26, 2018

i have to follow up on this one, does this mean we are not expected to turn on "self-preservation mode" for eureka server in a none-cluster mode? thanks. @holy12345

@spencergibb
Copy link
Member

Closing this due to inactivity. Please re-open if there's more to discuss.

@Jeffrey-Hassan
Copy link

Jeffrey-Hassan commented Aug 7, 2019

@spencergibb This issue seems to persist as of latest version (1.9.12).

I came across this as well and did some digging on the source, it seems to go to the difference between a graceful shutdown and a non-graceful shutdown (e.g. the EvictTask is removing the instances, rather than a call to cancel a lease).

Ultimately, the AbstractInstanceRegistry.evict method makes a call to "internalCancel" which does not modify the "expectedNumberOfClientsSendingRenews" count. Now we have the "count" of total instances from the update task, which is calculated in PeerAwareInstanceRegistryImpl.updateRenewalThreshold by cycling through all apps and then instances of each app and keeping a tally, with one less for the evicted instance, but the "expectedNumberOfClientsSendingRenews" does not account for that evicted instance.

For a graceful shutdown of a service, PeerAwareInstanceRegistryImpl.cancel seems to be called, which decrements "expectedNumberOfClientsSendingRenews" properly.

I would've expected the variable to be decremented regardless of whether it's an eviction from the timer task or a graceful shutdown. The best place seems to be before internalCancel is called in AbstractInstanceRegistry.evict. Thoughts?

@troshko111
Copy link

@Jeffrey-Hassan this has been fixed in v1.9.17.

@spencergibb
Copy link
Member

See #3743

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants