EIP publicip association not correctly updated on fresh instance #1321

Open
nick-pww opened this Issue Sep 6, 2016 · 22 comments

@nick-pww
Contributor

nick-pww commented Sep 6, 2016

I've been directed over here by the Eureka folks, as they believe this should just 'work'. I have the following issue running spring-cloud-netflix:1.1.4.RELEASE. The issue I opened over there is Netflix/eureka#840.

There seems to be a problem with public EIP address association not being correctly updated when a new AWS server starts and has a new Eureka server starting with it. When the server starts up, it correctly registers itself:

2016-09-06 15:55:29.040  WARN 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : The selected EIP 54.67.102.122 is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 15:55:29.628  INFO 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        :


Associated i-25f11391 running in zone: us-west-1c to elastic IP: X.X.X.X

But, every minute after that we get the following log entry:

2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Got 1 instances from neighboring DS node
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : No peers needed to prime.
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Changing status to UP
2016-09-06 16:24:55.713  WARN 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : The selected EIP X.X.X.X is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 16:24:55.804  INFO 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : My instance i-25f11391 seems to be already associated with the EIP X.X.X.X

Debugging this, the call to isEIPBound() is always failing, and this is because the following is always null:

String myPublicIP = ((AmazonInfo) myInfo.getDataCenterInfo()).get(MetaDataKey.publicIpv4);

It looks like the datacenter info is stale and never gets refreshed (from what I can tell), and there are no settings available to have it refreshed automatically.

The odd side effect of this, which we noticed, is that the registry continually gets wiped and reset, causing obvious potential issues downstream for our clients.

I have been trying to find where this datacenter info might be refreshed, but am unable to find anything that might actually do that.
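
To make the failure mode described above concrete, here is a minimal, self-contained Java sketch. It is not the Eureka source: `StaleMetadataDemo`, `snapshot` and `isEipBound` are hypothetical stand-ins, and the addresses are taken from the 203.0.113.0/24 documentation range. It only illustrates how a metadata snapshot captured before the EIP is associated keeps returning null for the public IP, so a bound check against that snapshot can never succeed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the check described above; not the Eureka implementation.
final class StaleMetadataDemo {

    // Stand-in for instance metadata captured once; publicIpv4 is null before an EIP is bound.
    static Map<String, String> snapshot(String publicIpv4) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("local-ipv4", "10.0.0.5");
        if (publicIpv4 != null) {
            metadata.put("public-ipv4", publicIpv4);
        }
        return Collections.unmodifiableMap(metadata);
    }

    // "Is my public IP one of the candidate EIPs?" A missing public IP always yields false.
    static boolean isEipBound(Map<String, String> metadata, List<String> candidateEips) {
        String myPublicIp = metadata.get("public-ipv4");
        return myPublicIp != null && candidateEips.contains(myPublicIp);
    }

    public static void main(String[] args) {
        List<String> eips = Arrays.asList("203.0.113.10", "203.0.113.11");
        Map<String, String> atBoot = snapshot(null);              // taken before association
        Map<String, String> refreshed = snapshot("203.0.113.10"); // re-read after association
        System.out.println("stale snapshot bound?     " + isEipBound(atBoot, eips));
        System.out.println("refreshed snapshot bound? " + isEipBound(refreshed, eips));
    }
}
```

Running `main` reports `false` for the boot-time snapshot and `true` for the refreshed one, which is the gap a periodic refresh of the datacenter info would be expected to close.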

The deployed app only has a single main class in it:

@SpringBootApplication
@EnableEurekaServer
@EnableAutoConfiguration
public class EurekaServer {

    @Value("${server.port}")
    private Integer nonSecurePort;
    @Autowired
    private InetUtils utils;

    public static void main(String[] args) {
        new SpringApplicationBuilder(EurekaServer.class).web(true).run(args);
    }

    @Bean
    @Profile("aws")
    public EurekaInstanceConfigBean awsEurekaConfig() {
        EurekaInstanceConfigBean b = new EurekaInstanceConfigBean(utils);
        b.setNonSecurePort(nonSecurePort);
        b.setSecurePortEnabled(false);
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        b.setDataCenterInfo(info);
        return b;
    }

}
@spencergibb

Member

spencergibb commented Sep 6, 2016

Interesting. I assume this is running on AWS? What is the configuration?

@nick-pww

Contributor

nick-pww commented Sep 6, 2016

Yes, running on AWS. Here are the relevant configs (coming from spring-cloud config server):
Global config for all apps:

eureka.instance.leaseRenewalIntervalInSeconds=30
eureka.client.healthcheck.enabled=true
eureka.datacenter=cloud

Config for just the server apps:

eureka:
    client:
        registerWithEureka: false
        fetchRegistry: false

And servers have:

eureka.client.serviceUrl.defaultZone=....

set up as well, with the relevant EIPs assigned.

@qiangdavidliu

Contributor

qiangdavidliu commented Sep 6, 2016

@nick-pww I just noticed your config. The thread that DiscoveryClient uses to refresh the local instanceInfo (and hence the datacenterInfo) is only started if registerWithEureka is true (it tries to save the extra CPU resources if registration is not configured). Is there a reason you are configured with registerWithEureka = false?
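
The behaviour described in this comment can be modelled roughly as follows. This is an illustrative sketch only: `ConditionalRefreshDemo`, its constructor flag and the refresh counter are hypothetical and stand in for the DiscoveryClient internals, which are not reproduced here. It shows the general pattern of scheduling a refresh task only when registration is enabled, so that with the flag off nothing ever re-reads the instance info.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: the refresh task exists only when registration is enabled.
final class ConditionalRefreshDemo {

    // Counts how many times the (simulated) instance info refresh has run.
    final AtomicInteger refreshCount = new AtomicInteger();
    private final ScheduledExecutorService scheduler;

    ConditionalRefreshDemo(boolean registerWithEureka, long periodMillis) {
        if (registerWithEureka) {
            scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(
                    refreshCount::incrementAndGet, 0L, periodMillis, TimeUnit.MILLISECONDS);
        } else {
            scheduler = null; // no thread is started, so nothing is ever refreshed
        }
    }

    boolean refresherStarted() {
        return scheduler != null;
    }

    void shutdown() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConditionalRefreshDemo off = new ConditionalRefreshDemo(false, 50);
        ConditionalRefreshDemo on = new ConditionalRefreshDemo(true, 50);
        Thread.sleep(300);
        on.shutdown();
        System.out.println("flag off, refreshes: " + off.refreshCount.get());
        System.out.println("flag on, refreshed at least once: " + (on.refreshCount.get() > 0));
    }
}
```

Under this model, a server configured with registration disabled keeps whatever instance info it captured at startup, which matches the stale-metadata symptom reported at the top of the issue.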

@nick-pww

Contributor

nick-pww commented Sep 6, 2016

@qiangdavidliu Going off several examples and docs. One of which is here:
https://spring.io/guides/gs/service-registration-and-discovery/

I can change that, but one problem I had before, with that option and 'fetchRegistry' both on, was that the servers were essentially always 'registering' applications even if they were no longer up, because they were getting info from the other Eureka servers. Basically, applications would never unregister, and if they did, they had a good chance of coming back when the servers synced again.

Also, I've read in other places that having the server register with itself can make the 'renew' threshold act oddly in some cases.

Will try to re-enable just that option and see what happens.

@spencergibb

Member

spencergibb commented Sep 6, 2016

Also from Netflix/eureka#840 (comment) (typo fixed)

Note that the Amazon based datacenter info refreshes in ApplicationInfoManager only occurs if the config is of CloudInstanceConfig.

Our config isn't a CloudInstanceConfig

@spencergibb

Member

spencergibb commented Sep 6, 2016

@nick-pww those guides are for single-instance Eureka setups; production should be a peered cluster, see #1251.

@nick-pww

Contributor

nick-pww commented Sep 6, 2016

@spencergibb It's not really clear that those are 'development' only options that should be set. Would recommend that a large note or something goes in there stating such.

@qiangdavidliu + @spencergibb I've changed the config but still have the same issue with new instances. I'm still getting the:

2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..

messages, and it's still resetting every minute. Both servers are registering with each other and show up in the list of applications, but the one where I cleared the EIP and restarted is exhibiting this still, while the one that I didn't seems to be working as expected.

(new config edit)

eureka:
    client:
        registerWithEureka: true
        fetchRegistry: false
@florind

florind commented Sep 7, 2016

I am actually struggling with the exact same issue.
Explicitly setting the hostname and IP address in the EurekaInstanceConfigBean @Bean is also not working:

        eurekaInstanceConfig.setIpAddress(info.get(AmazonInfo.MetaDataKey.publicIpv4));
        eurekaInstanceConfig.setHostname(info.get(AmazonInfo.MetaDataKey.publicHostname));

as this bean seems to be initialized before EIPManager binds an EIP address, and so both values are null.
The lame hack so far is that I listen to EurekaRegistryAvailableEvent and restart the application if EurekaInstanceConfigBean.getHostname() is null, as the second time around the EIP is already bound to the AWS instance and it all works...
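
The ordering problem described here (values copied into the bean before the EIP is bound) can be illustrated with a small sketch. Everything in it is hypothetical: `LazyHostnameDemo`, the `HostnameConfig` interface and the map standing in for instance metadata are assumptions for illustration and are not Spring Cloud or Eureka types. It contrasts copying the hostname once at creation time with looking it up on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of eager versus lazy hostname resolution.
final class LazyHostnameDemo {

    /** Minimal stand-in for a config object that exposes a hostname. */
    interface HostnameConfig {
        String getHostname();
    }

    // Eager: the value is copied when the config is created (null if not yet available).
    static HostnameConfig eager(Map<String, String> metadata) {
        final String captured = metadata.get("public-hostname");
        return () -> captured;
    }

    // Lazy: the value is read from the metadata each time it is requested.
    static HostnameConfig lazy(Map<String, String> metadata) {
        return () -> metadata.get("public-hostname");
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new ConcurrentHashMap<>();
        HostnameConfig eagerConfig = eager(metadata);
        HostnameConfig lazyConfig = lazy(metadata);

        // Simulates the EIP being bound after the config objects already exist.
        metadata.put("public-hostname", "ec2-203-0-113-10.example.invalid");

        System.out.println("eager: " + eagerConfig.getHostname());
        System.out.println("lazy:  " + lazyConfig.getHostname());
    }
}
```

The eager variant keeps reporting null, while the lazy one picks up the hostname once it exists. Note that a lazy lookup only helps if the underlying metadata itself is re-read after the EIP is bound.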

@qiangdavidliu

Contributor

qiangdavidliu commented Sep 7, 2016

@spencergibb at Netflix we use CloudInstanceConfig, which has the ability to refresh the underlying AmazonInfo. Do the Spring Cloud configs do something similar?

@spencergibb

Member

spencergibb commented Sep 7, 2016

@qiangdavidliu no it doesn't :-(

@spencergibb

Member

spencergibb commented Sep 7, 2016

It extends PropertiesInstanceConfig and we use Boot @ConfigurationProperties to load properties, so we needed a different class; since it implemented the EurekaInstanceConfig interface, that was OK when we started. I wonder if we could break the business logic out into a separate class that gets injected so we could reuse it? We can always copy/paste.

@qiangdavidliu

Contributor

qiangdavidliu commented Sep 7, 2016

Let me see what I can do on that.

@spencergibb

Member

spencergibb commented Sep 7, 2016

thanks!

@herder

Contributor

herder commented Sep 9, 2016

This works for us:

@Configuration
@Slf4j
@ConditionalOnAwsCloudEnvironment
@EnableContextInstanceData
@Import(UtilAutoConfiguration.class)
@AutoConfigureAfter(UtilAutoConfiguration.class)
public class AwsInstanceConfig {

    @Value("${server.port:${SERVER_PORT:${PORT:8080}}}")
    int nonSecurePort;

    @Value("${management.port:${MANAGEMENT_PORT:${server.port:${SERVER_PORT:${PORT:8080}}}}}")
    int managementPort;

    @Value("${eureka.instance.hostname:${EUREKA_INSTANCE_HOSTNAME:}}")
    String hostname;

    @Autowired
    ConfigurableEnvironment env;


    @Bean
    public EurekaInstanceConfigBean eurekaInstanceConfigBean(InetUtils utils) {
        log.info("Setting AmazonInfo on EurekaInstanceConfigBean");
        final EurekaInstanceConfigBean instance = new EurekaInstanceConfigBean(utils) {

            @Scheduled(initialDelay = 30000L, fixedRate = 30000L)
            public void refreshInfo() {
                log.debug("Checking datacenter info changes");
                AmazonInfo newInfo = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
                if (!this.getDataCenterInfo().equals(newInfo)) {
                    log.info("Updating datacenterInfo to {}", newInfo);
                    ((AmazonInfo) this.getDataCenterInfo()).setMetadata(newInfo.getMetadata());
                }
            }

            private AmazonInfo getAmazonInfo() {
                return (AmazonInfo) getDataCenterInfo();
            }

            @Override
            public String getHostname() {
                AmazonInfo info = getAmazonInfo();
                final String publicHostname = info.get(AmazonInfo.MetaDataKey.publicHostname);
                return this.isPreferIpAddress() ?
                    info.get(AmazonInfo.MetaDataKey.localIpv4) :
                    publicHostname == null ?
                        info.get(AmazonInfo.MetaDataKey.localHostname) : publicHostname;
            }

            @Override
            public String getHostName(final boolean refresh) {
                return getHostname();
            }

            @Override
            public String getHomePageUrl() {
                return super.getHomePageUrl();
            }

            @Override
            public String getStatusPageUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getStatusPageUrlPath();
            }

            @Override
            public String getHealthCheckUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getHealthCheckUrlPath();
            }
        };
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        log.info("Info: {}", info);
        instance.setDataCenterInfo(info);
        instance.setNonSecurePort(this.nonSecurePort);
        instance.setInstanceId(getDefaultInstanceId(this.env));
        if (this.managementPort != this.nonSecurePort && this.managementPort != 0) {
            if (StringUtils.hasText(this.hostname)) {
                instance.setHostname(this.hostname);
            }
        }

        return instance;
    }

}

I.e. we do a scheduled check on whether the datacenterinfo has been updated, and reset it in that case.
I'm sure there's room for cleanup here, but maybe it's a start?
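
The core of this workaround, "re-read the metadata and replace the stored copy only when it differs", can be isolated from Spring and Eureka for clarity. The sketch below is an assumption-laden illustration, not the code above: `MetadataRefresher`, `refreshOnce` and `snapshot` are hypothetical names, and a plain map stands in for AmazonInfo. It also swaps the whole map atomically instead of mutating the existing object in place, which is a design choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: keeps the latest metadata and replaces it only on change.
final class MetadataRefresher {

    private final Supplier<Map<String, String>> source;
    private final AtomicReference<Map<String, String>> current;

    MetadataRefresher(Supplier<Map<String, String>> source) {
        this.source = source;
        this.current = new AtomicReference<>(source.get());
    }

    /** Re-reads the metadata; returns true when the stored copy was replaced. */
    boolean refreshOnce() {
        Map<String, String> latest = source.get();
        if (current.get().equals(latest)) {
            return false; // nothing changed, keep the existing copy
        }
        current.set(latest);
        return true;
    }

    Map<String, String> snapshot() {
        return current.get();
    }

    public static void main(String[] args) {
        Map<String, String> backing = new HashMap<>();
        backing.put("local-ipv4", "10.0.0.5");
        // Each read returns a fresh copy, like re-querying an instance metadata service.
        MetadataRefresher refresher = new MetadataRefresher(() -> new HashMap<>(backing));

        System.out.println("changed before bind: " + refresher.refreshOnce());
        backing.put("public-ipv4", "203.0.113.10"); // simulates the EIP being bound
        System.out.println("changed after bind:  " + refresher.refreshOnce());
        System.out.println("public-ipv4 now:     " + refresher.snapshot().get("public-ipv4"));
    }
}
```

In an application, `refreshOnce()` would be driven by whatever scheduler is already in use, for example a 30-second fixed-rate task like the one in the comment above.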

@spencergibb

Member

spencergibb commented Sep 9, 2016

@herder Netflix devs have moved the functionality to a shared class that we will be able to leverage. Netflix/eureka#843

@spencergibb

Member

spencergibb commented Oct 5, 2016

This depends on #1345

@elnur

elnur commented Oct 23, 2016

Can't wait to get this released.

@DickChesterwood

DickChesterwood commented Feb 9, 2017

Many thanks to @herder for the suggested auto-refresh hack; working great for me.

I can't quite work out when the Eureka 1.6 upgrade will appear. Will it be in the Dalston release train?

It's far too long to read but I've documented my experiments here - let me know if I've made any blunders

Edit to add that the OP noticed that not doing this refresh causes the registry to be wiped; I had the opposite experience that instances never get expired (it's not self preservation!). I can't think how that could be the case, so I'd be interested if anyone has any insight.

@spencergibb

Member

spencergibb commented Feb 9, 2017

thanks @DickChesterwood. 1.6 is part of Dalston. See spring-cloud-release/milestones

@DickChesterwood

DickChesterwood commented Feb 9, 2017

Lovely thanks Spencer!


@gadamsciv

gadamsciv commented Apr 10, 2018

@spencergibb Is this still an issue? I'm experiencing the same issue using Edgware.RELEASE. Is the scheduled task workaround still necessary?


@spencergibb

Member

spencergibb commented Apr 10, 2018

@gadamsciv it is still open, so yes.
