Documentation: changing Eureka renewal frequency WILL break the self-preservation feature of the server #373

brenuart · 2015-06-02T18:09:09Z

In the section "Why is it slow to register a Service?" of the documentation, it is said that one can speed up the client registration process by changing the heartbeat interval to a higher frequency (the default is 30 seconds).
The documentation also says this might not be a good idea in production, without explaining much about the consequences.

One should be aware the self-preservation feature of the Eureka server makes the assumption clients are sending their heartbeat every 30 seconds - and this is not configurable. Using a different value will therefore break that functionality. It is definitely not a good idea to play with that parameter...

dsyer · 2015-06-08T05:42:20Z

It's fine for demos and development work (where you often want a faster turnaround and don't care about self preservation mode). If you'd like to explain what's happening it would be great to see a pull request for the documentation (it's right there in github next to the source code).

brenuart · 2015-06-08T08:41:29Z

I understand, but before asking for changes in the documentation I think my analysis needs to be reviewed by someone with a better understanding of Eureka's internals, to make sure the behaviour and consequences I describe are correct.

The point is that the Eureka server makes the implicit assumption that clients send their heartbeats at a fixed rate of one every 30 seconds. If two instances are registered in the registry, the server expects to receive 2 instances * 2 heartbeats/minute * threshold % every minute. With a threshold set at 85%, that is 3.4, so it expects 3 heartbeats in the last minute. If the rate drops below this value, the self protection mode is activated. If you lose one of the two instances, the server receives at most two heartbeats per minute and activates the self protection mode.

Now, if clients send their heartbeats twice as fast (every 15s), the server receives 8 heartbeats per minute and keeps receiving 4 per minute if you lose one of the two instances. Since 4 is still above the expected 3, the self protection mode is not activated...

This example shows the consequence of using a heartbeat frequency other than 30s: it breaks the self protection mode mechanism.
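The arithmetic above can be reproduced with a small toy model. This is only a sketch of the reasoning in this comment, not Eureka's actual code: every function and constant name is made up, and rounding the threshold down to a whole number is an assumption chosen to match the "3 heartbeats" figure.

```python
# Toy model of the self-preservation arithmetic described above.
# All names are hypothetical; this is not Eureka's implementation.

ASSUMED_INTERVAL_S = 30      # interval the server assumes for every client
RENEWAL_THRESHOLD = 0.85     # threshold percentage used in the example


def expected_renews_per_minute(registered_instances):
    """Minimum heartbeats per minute the server expects (rounded down)."""
    per_instance = 60 // ASSUMED_INTERVAL_S          # 2 heartbeats per minute
    return int(registered_instances * per_instance * RENEWAL_THRESHOLD)


def actual_renews_per_minute(live_instances, client_interval_s):
    """Heartbeats per minute actually received from the live instances."""
    return live_instances * (60 // client_interval_s)


def self_preservation_active(registered, live, client_interval_s):
    """True when the received rate drops below the expected rate."""
    received = actual_renews_per_minute(live, client_interval_s)
    return received < expected_renews_per_minute(registered)


# Two registered instances, one of them lost:
at_30s = self_preservation_active(registered=2, live=1, client_interval_s=30)
at_15s = self_preservation_active(registered=2, live=1, client_interval_s=15)
print(at_30s, at_15s)  # True False
```

With the default 30s interval the lost instance is noticed (2 received, 3 expected); with 15s the surviving instance alone sends 4 heartbeats per minute and hides the loss.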

The initial registration is actually triggered by the first heartbeat: the client tries to send the first heartbeat and receives a "not found" answer from the server, which means the server doesn't know the instance. The client then immediately attempts to register the instance. This process only happens 30s (eureka.instance.leaseRenewalIntervalInSeconds) after startup, hence the extra delay before the instance shows up in the registry.

You can always speed up the initial registration process by lowering the value of eureka.client.initialInstanceInfoReplicationIntervalSeconds. This parameter controls the initial delay before the client transmits changes made to the InstanceInfo (like the UP/DOWN status). Because of how it is initialized, the InstanceInfo is always dirty after startup, so the client will always try to replicate it at least once. Since this replication is implemented by re-registering the instance against the server, using a very low value for the initial delay will trigger the instance registration quicker... without changing the heartbeat interval and therefore keeping the self-protection mode safe.

During demo/development, if you want to detect "dead" instances quicker, I would suggest playing with the eureka.instance.leaseExpirationDurationInSeconds parameter instead. The value is set to 90s by default, which means a lease expires after 3 consecutive missed heartbeats. This is of course pretty long during demo/dev, but you can always lower it to 30s, and this won't affect the self-protection feature.
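To make the relation between the two settings concrete, here is a minimal sketch of the "missed heartbeats" arithmetic. The function name is hypothetical and the calculation only restates the figures given above.

```python
# How many consecutive heartbeats can be missed before a lease expires.
# Hypothetical helper, only restating the arithmetic from the text.

def missed_heartbeats_before_expiry(lease_expiration_s, renewal_interval_s=30):
    return lease_expiration_s // renewal_interval_s


default_case = missed_heartbeats_before_expiry(90)   # default 90s lease
lowered_case = missed_heartbeats_before_expiry(30)   # lease lowered to 30s
print(default_case, lowered_case)  # 3 1
```

Lowering the expiration to 30s while keeping the 30s heartbeat means a single missed heartbeat is enough for the lease to expire.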

Hope all of this makes sense.

dsyer · 2015-06-08T09:02:24Z

Totally makes sense, but I'm not sure there is anyone with a better understanding of Eureka internals at this point (at least not a regular visitor to this project). Even the people I know at Netflix probably won't want to go into any more detail than that (most people just use it after all). We need some of your analysis in the documentation really, plus some sensible guidelines, and defaults that don't stop people from making progress quickly when they are getting started.

brenuart · 2015-06-08T09:06:11Z

Does that mean you feel as lonely as me?
;-)

brenuart · 2015-06-09T12:01:43Z

Here is a first attempt to answer the question "Why does it take so long to register an instance with Eureka?"
Tell me if you think this kind of information should go into the documentation. If so, I'll try to find some time to add more details and examples, together with "recommendations" on how to speed things up where possible. However, it would be nice if other people could challenge the content... it is only what I understood after all. My English isn't good enough either, so please rephrase where needed.

(1) Client Registration

When using the default configuration, registration happens at the first heartbeat sent to the server. Since the client just started, the server doesn't know anything about it and replies with a 404 forcing the client to register. The client then immediately issues a second call with all the registration information. The client is now registered.

The first heartbeat happens 30 seconds after startup (eureka.instance.leaseRenewalIntervalInSeconds) - so your instance won't appear in the Eureka registry before this interval.
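The heartbeat-then-register sequence can be pictured with a toy in-memory registry. This is a sketch only: ToyRegistry and first_heartbeat are made-up names, the 200 and 204 status codes are assumptions for illustration, and only the 404 reply comes from the description above.

```python
# Toy illustration of "first heartbeat gets a 404, then the client registers".
# Hypothetical names; status codes other than 404 are assumed.

class ToyRegistry:
    def __init__(self):
        self.leases = set()

    def renew(self, instance_id):
        # A heartbeat for an unknown instance is answered with "not found".
        return 200 if instance_id in self.leases else 404

    def register(self, instance_id):
        self.leases.add(instance_id)
        return 204


def first_heartbeat(registry, instance_id):
    """Send one heartbeat; register immediately if the server replies 404."""
    calls = [("renew", registry.renew(instance_id))]
    if calls[0][1] == 404:
        calls.append(("register", registry.register(instance_id)))
    return calls


registry = ToyRegistry()
startup_calls = first_heartbeat(registry, "service-a:1111")
later_calls = first_heartbeat(registry, "service-a:1111")
print(startup_calls)
print(later_calls)
```

The first call produces a renew followed by a register; every later heartbeat is a plain renew because the lease now exists.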

(2) Server ResponseCache

The server maintains a response cache that is updated every 30s by default (eureka.server.response-cache-update-interval-ms). So even if your instance has just been registered, it may not appear yet in the result of a call to the /eureka/apps REST endpoint.

However, your instance may appear in the UI web interface just after registration. This is because the web front-end bypasses the response cache used by the REST API...

If you know the instanceId, you can still get some details about it from Eureka by calling /eureka/apps/<appName>/<instanceId>. This endpoint doesn't make use of the response cache. But since it requires knowing the instanceId, it is of no help in the discovery process...

So, it may take up to another 30s for other clients to discover your newly registered instance.

(3) Client cache refresh

The Eureka client maintains a cache of the registry information. This cache is refreshed every 30 seconds by default (eureka.client.registryFetchIntervalSeconds). So again, it may take another 30s before a client decides to refresh its local cache and discover newly registered instances.

(4) LoadBalancer refresh

The load balancer used by Ribbon gets its information from the local Eureka client. It also maintains a local cache to avoid calling the discovery client for every request. This cache is refreshed every 30s (ribbon.serverListRefreshInterval). So again, it may take another 30s before your Ribbon client can make use of the newly registered instance.

Note: this local cache is apparently required only to reduce the cost of obtaining server information from the ServerList in use. That cost is not an issue with any of the server lists provided by default: DiscoveryEnabledNIWSServerList with Eureka, ConfigurationBasedServerList without.

In the end, if you are not lucky, it may take up to 2 minutes before your newly registered instance starts receiving traffic from other clients.
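The 2-minute figure is simply the sum of the four default intervals listed above, as this small sketch shows. The dictionary and its labels are illustrative only; the values are the 30s defaults mentioned in steps (1) to (4).

```python
# Worst-case delay before a new instance receives traffic, using the
# default intervals from steps (1) to (4). Labels are illustrative only.

WORST_CASE_DELAYS_S = {
    "client registration (first heartbeat)": 30,
    "server response cache update": 30,
    "client registry fetch": 30,
    "Ribbon server list refresh": 30,
}

total_s = sum(WORST_CASE_DELAYS_S.values())
print(f"worst case: {total_s}s ({total_s / 60:.0f} min)")  # worst case: 120s (2 min)
```

Each stage waits on its own timer, so in the unlucky case the delays add up rather than overlap.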

andrewserff · 2015-11-24T05:08:42Z

I'm trying to figure out how to get the delay between server start and registration in Zuul as low as possible in development only. With the defaults, we are stuck twiddling our thumbs for like 2 min every time we restart a service and then want to hit it through our zuul proxy. The information from @brenuart has helped a little, but I still can't seem to get it right. I would love some help on getting a configuration for spring.profiles: dev that has the whole registration process down to the seconds. This is what I've tried so far:
Zuul Config:

spring:
    profiles: dev

eureka:
    instance:
        registryFetchIntervalSeconds: 1
        leaseRenewalIntervalInSeconds: 2
        leaseExpirationDurationInSeconds: 5
    client:
        initialInstanceInfoReplicationIntervalSeconds: 5
ribbon:
    ServerListRefreshInterval: 1000

Service Config:

spring.profiles: dev
eureka:
    instance:
        registryFetchIntervalSeconds: 1
        leaseRenewalIntervalInSeconds: 2
    client:
        initialInstanceInfoReplicationIntervalSeconds: 5

Right now, Zuul isn't fully noticing that the service is down (I just get a blank page and not a forwarding error like usual). Maybe once we figure this out, it could be added to the documentation that @brenuart wrote? If I should post this over on SO instead, let me know.

Also, @brenuart, I think your documentation is much better than what's there, but it would be great to add all the options (like eureka.client.initialInstanceInfoReplicationIntervalSeconds) to the documentation.

brenuart · 2015-11-24T09:46:39Z

I don't have much time at the moment, but I'll try to extend the coverage of "my" doc. Maybe I should coordinate with @spencergibb to incorporate those few lines into the official doc and have them reviewed.

spencergibb · 2015-11-25T19:22:35Z

Another stack overflow question. http://stackoverflow.com/questions/33921557/understanding-spring-cloud-eureka-server-parameters-and-configuration

I'm going to take a shot next week.

spencergibb · 2016-05-23T23:52:54Z

Related docs #203

mransonwang · 2016-09-19T07:49:02Z

I just ran a test on my local machine with the following extreme configuration:

Eureka server:

server:
    port: 8761

eureka:
    instance:
        hostname: localhost
    server:
        response-cache-update-interval-ms: 500
        eviction-interval-timer-in-ms: 500
    client:
        register-with-eureka: false
        fetch-registry: false
        service-url:
            default_zone: http://${eureka.instance.hostname}:${server.port}/eureka/

Two service instances:

server:
    port: 1111

spring:
    application:
        name: Compute-Service

eureka:
    instance:
        hostname: localhost
        prefer-ip-address: true
        lease-renewal-interval-in-seconds: 1
        lease-expiration-duration-in-seconds: 2
    client:
        initial-instance-info-replication-interval-seconds: 0
        instance-info-replication-interval-seconds: 1
        registry-fetch-interval-seconds: 1
        service-url:
            default-zone: http://localhost:8761/eureka/

server:
    port: 2222

spring:
    application:
        name: Compute-Service

eureka:
    instance:
        hostname: localhost
        prefer-ip-address: true
        lease-renewal-interval-in-seconds: 1
        lease-expiration-duration-in-seconds: 2
    client:
        initial-instance-info-replication-interval-seconds: 0
        instance-info-replication-interval-seconds: 1
        registry-fetch-interval-seconds: 1
        service-url:
            default-zone: http://localhost:8761/eureka/

What I observed is that if one of the services went down, the other service got the updated instance list from the Eureka server quickly, in no more than 5 seconds in my environment. I think it's suitable for dev/test purposes.

But I wonder if there's a simple way to reset the Ribbon serverListRefreshInterval value in a Spring Boot application? @brenuart

asarkar · 2017-01-17T22:46:48Z

@brenuart I've a related question for which I created a ticket in the Eureka repo. What does "self-preservation" mean actually and does it work the same way for Eureka peers vs. clients?

Netflix/eureka#890

Also see, #1627

brenuart · 2017-01-18T10:43:43Z

Self-preservation is a mechanism by which the Eureka registry stops expiring entries when it detects that a significant number of services didn't renew their lease in time. This should protect the registry from clearing all entries when a (partial) network failure occurs.

asarkar · 2017-01-21T04:32:34Z

I've created a blog post with the details of Eureka here, that fills in some missing detail from Spring doc or Netflix blog. It is the result of several days of debugging and digging through source code.

spencergibb · 2017-01-25T18:56:03Z

@asarkar would you mind if we (or you via PR) integrate your information into our documentation?

asarkar · 2017-01-25T18:59:59Z

@spencergibb I can PR it but there are some areas that I'd like to see elaborated, especially regions and zones. My post also has links to 2 open tickets that I believe should be answered/closed first because my post references them.

sunnykaka · 2017-09-15T14:51:56Z

@brenuart Your configuration is not correct; the right one is ribbon.ServerListRefreshInterval.

nitindhomse · 2022-01-12T13:43:06Z

I have a similar issue and have posted a question here, could you please help me - https://stackoverflow.com/questions/70648380/what-would-be-the-best-self-preservation-time-parameter-configuration-for-eureka

BrokenWingsIcarus · 2022-06-08T02:17:27Z

The application can call Eureka's API to force a state refresh at startup or shutdown.

dsyer added the documentation label Jun 8, 2015

spencergibb added Backlog and removed Backlog labels Nov 17, 2015

spencergibb added the ready label Dec 2, 2015

sfat mentioned this issue Jun 13, 2016

How long is the registration information retained on the client side after eureka server closed #1095

Closed

asarkar mentioned this issue Jan 18, 2017

Relationship among various Eureka properties and how those affect Eureka clustering and failover #1627

Closed

asarkar mentioned this issue Feb 3, 2017

Eureka peer-peer communication doc is ambiguous Netflix/eureka#890

Closed

jonashackt added a commit to jonashackt/cxf-spring-cloud-netflix-docker that referenced this issue Jun 24, 2017

Configuring the hell out of Zuul & Eureka - but it doesn´t really wor…

f83f8c5

…k - see spring-cloud/spring-cloud-netflix#373

fahimfarookme mentioned this issue Nov 26, 2017

Eureka Client Instance Health Check is failing. #2458

Closed

dansali mentioned this issue Jun 15, 2018

Add GZIP to api calls CognizantGrads-TeamALT/grizzly-backend#92

Merged

mgtriffid mentioned this issue Jul 2, 2018

Add possibility to configure expected interval between clients' renew… Netflix/eureka#1093

Merged

brenuart mentioned this issue Nov 2, 2018

eureka registration time lag with ribbon or springclound? #3263

Closed

stefanocke mentioned this issue Jun 16, 2020

Clarify whether to use cache for Spring Cloud LoadBalancer with Eureka Discovery Client spring-cloud/spring-cloud-commons#775

Closed
