EIP publicip association not correctly updated on fresh instance #1321

Open
nick-pww opened this issue Sep 6, 2016 · 23 comments

Comments

@nick-pww
Contributor

nick-pww commented Sep 6, 2016

I've been directed here by the Eureka folks, who believe this should just 'work'. I'm seeing the following issue with spring-cloud-netflix:1.1.4.RELEASE. The issue I opened over there is Netflix/eureka#840.

There seems to be a problem with public EIP address association not being correctly updated when a new AWS server starts and has a new Eureka server starting with it. When the server starts up, it correctly registers itself:

2016-09-06 15:55:29.040  WARN 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : The selected EIP 54.67.102.122 is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 15:55:29.628  INFO 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        :


Associated i-25f11391 running in zone: us-west-1c to elastic IP: X.X.X.X

But, every minute after that we get the following log entry:

2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Got 1 instances from neighboring DS node
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : No peers needed to prime.
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Changing status to UP
2016-09-06 16:24:55.713  WARN 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : The selected EIP X.X.X.X is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 16:24:55.804  INFO 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : My instance i-25f11391 seems to be already associated with the EIP X.X.X.X

Debugging this, the call to isEIPBound() is always failing, and this is because the following is always null:

String myPublicIP = ((AmazonInfo) myInfo.getDataCenterInfo()).get(MetaDataKey.publicIpv4);

It looks like the datacenter info is stale and never gets refreshed (from what I can tell), and there are no settings available to have it refreshed automatically.
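To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It is not Eureka code: `StaleMetadataSketch`, `snapshotAtStartup`, `isEipBound`, the metadata key strings, and the 203.0.113.10 address are all made up for illustration. It assumes the metadata is captured once at startup, before the EIP is associated, which is what the debugging above suggests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Eureka classes.
class StaleMetadataSketch {

    // Stands in for instance metadata captured once at startup.
    // The copy means later changes to the live metadata are not seen.
    static Map<String, String> snapshotAtStartup(Map<String, String> liveMetadata) {
        return new HashMap<>(liveMetadata);
    }

    // Mirrors the idea of the bound check: is my public IP one of the candidate EIPs?
    // A null public IP can never match, so the check keeps reporting "not bound".
    static boolean isEipBound(Map<String, String> info, List<String> candidateEips) {
        String myPublicIp = info.get("public-ipv4");
        return myPublicIp != null && candidateEips.contains(myPublicIp);
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("instance-id", "i-25f11391");            // fresh instance, no public IP yet
        Map<String, String> cached = snapshotAtStartup(live);

        live.put("public-ipv4", "203.0.113.10");          // the EIP is associated afterwards
        List<String> eips = Arrays.asList("203.0.113.10");

        System.out.println("cached view bound? " + isEipBound(cached, eips));
        System.out.println("fresh view bound?  " + isEipBound(snapshotAtStartup(live), eips));
    }
}
```

Under these assumptions the cached view never sees the public IP, so the check cannot succeed until the metadata is rebuilt.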

The odd side effect of this, as we noticed, is that the registry continually gets wiped and reset, causing obvious potential issues downstream for our clients.

I have been trying to find where this datacenter info might be refreshed, but am unable to find anything that might actually do that.

The deployed app only has a single main class in it:

@SpringBootApplication
@EnableEurekaServer
@EnableAutoConfiguration
public class EurekaServer {

    @Value("${server.port}")
    private Integer nonSecurePort;
    @Autowired
    private InetUtils utils;

    public static void main(String[] args) {
        new SpringApplicationBuilder(EurekaServer.class).web(true).run(args);
    }

    @Bean
    @Profile("aws")
    public EurekaInstanceConfigBean awsEurekaConfig() {
        EurekaInstanceConfigBean b = new EurekaInstanceConfigBean(utils);
        b.setNonSecurePort(nonSecurePort);
        b.setSecurePortEnabled(false);
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        b.setDataCenterInfo(info);
        return b;
    }

}

@spencergibb
Member

Interesting. I assume this is running on AWS? What is the configuration?

@nick-pww
Contributor Author

nick-pww commented Sep 6, 2016

Yes, running on AWS. Here are the relevant configs (coming from spring-cloud config server):
Global config for all apps:

eureka.instance.leaseRenewalIntervalInSeconds=30
eureka.client.healthcheck.enabled=true
eureka.datacenter=cloud

Config for just the server apps:

eureka:
    client:
        registerWithEureka: false
        fetchRegistry: false

And servers have:

eureka.client.serviceUrl.defaultZone=....

setup as well with the relevant EIPs assigned.

@qiangdavidliu
Contributor

qiangdavidliu commented Sep 6, 2016

@nick-pww I just noticed your config. The thread that DiscoveryClient uses to refresh local instanceInfo (and hence datacenterInfo) is only started if registerWithEureka is true (it tries to save the extra cpu resource if registration is not configured). Is there a reason you are configured with register = false?
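As a rough sketch of the behaviour described in this comment, the following plain-Java example starts a periodic refresh task only when a registration flag is set. It is not DiscoveryClient code: `ConditionalRefreshSketch`, its constructor parameters, and the thread name are hypothetical, and the real client's scheduling details may differ.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the class and method names are not Eureka's.
class ConditionalRefreshSketch {

    private final ScheduledExecutorService scheduler; // null when refresh is disabled

    ConditionalRefreshSketch(boolean registerWithEureka, Runnable refreshInstanceInfo,
                             long periodMillis) {
        if (registerWithEureka) {
            // Daemon thread so the sketch never keeps the JVM alive on its own.
            scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "info-refresh");
                t.setDaemon(true);
                return t;
            });
            scheduler.scheduleAtFixedRate(refreshInstanceInfo, periodMillis, periodMillis,
                    TimeUnit.MILLISECONDS);
        } else {
            scheduler = null; // no thread is started, so the info is never refreshed
        }
    }

    boolean isRefreshing() {
        return scheduler != null;
    }

    void shutdown() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger refreshes = new AtomicInteger();

        ConditionalRefreshSketch disabled =
                new ConditionalRefreshSketch(false, refreshes::incrementAndGet, 50);
        Thread.sleep(200);
        System.out.println("disabled: refreshes so far = " + refreshes.get());
        disabled.shutdown();

        ConditionalRefreshSketch enabled =
                new ConditionalRefreshSketch(true, refreshes::incrementAndGet, 50);
        Thread.sleep(200);
        enabled.shutdown();
        System.out.println("enabled: refreshes so far = " + refreshes.get());
    }
}
```

With the flag off, nothing ever calls the refresh callback, which matches the symptom of datacenter info that stays as it was at startup.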

@nick-pww
Contributor Author

nick-pww commented Sep 6, 2016

@qiangdavidliu Going off several examples and docs. One of which is here:
https://spring.io/guides/gs/service-registration-and-discovery/

I can turn that off, but one problem I had before, with that and 'fetchRegistry' on, was that the servers were essentially always 'registering' applications even if they were no longer up, because they were getting info from the other Eureka servers. Basically, applications would never unregister, and if they did, they had a good chance of coming back when the servers synced again.

Also, I've read in other places that having the server register with itself can make the 'renew' threshold act oddly in some cases.

Will try to re-enable just that option and see what happens.

@spencergibb
Member

Also from Netflix/eureka#840 (comment) (typo fixed)

Note that the Amazon-based datacenter info refresh in ApplicationInfoManager only occurs if the config is a CloudInstanceConfig.

Our config isn't a CloudInstanceConfig

@spencergibb
Member

@nick-pww those guides are for single-instance Eureka servers; production should be a peered cluster, see #1251.

@nick-pww
Contributor Author

nick-pww commented Sep 6, 2016

@spencergibb It's not really clear that those are 'development' only options that should be set. Would recommend that a large note or something goes in there stating such.

@qiangdavidliu + @spencergibb I've changed the config but still have the same issue with new instances. I'm still getting the:

2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..

messages, and it's still resetting every minute. Both servers are registering with each other and show up in the list of applications, but the one where I cleared the EIP and restarted is exhibiting this still, while the one that I didn't seems to be working as expected.

(new config edit)

eureka:
    client:
        registerWithEureka: true
        fetchRegistry: false

@florind

florind commented Sep 7, 2016

I am actually struggling with the exact same issue.
Explicitly setting hostname and IP address in the EurekaInstanceConfigBean @Bean is also not working:

        eurekaInstanceConfig.setIpAddress(info.get(AmazonInfo.MetaDataKey.publicIpv4));
        eurekaInstanceConfig.setHostname(info.get(AmazonInfo.MetaDataKey.publicHostname));

as this bean seems to be initialized before EIPManager binds an EIP address and so both values are null.
The lame hack so far is that I listen to EurekaRegistryAvailableEvent and restart the application if EurekaInstanceConfigBean.getHostname() is null, as the second time around the EIP is already bound to the AWS instance and it all works...

@qiangdavidliu
Contributor

@spencergibb at Netflix we use the CloudInstanceConfig, which has the ability to refresh the underlying AmazonInfo. Do the Spring Cloud configs do something similar?

@spencergibb
Member

@qiangdavidliu no it doesn't :-(

@spencergibb
Member

CloudInstanceConfig extends PropertiesInstanceConfig, and we use Boot's @ConfigurationProperties to load properties, so we needed a different class; since it implemented the EurekaInstanceConfig interface when we started, that was OK. I wonder if we could break the business logic out into a separate class that gets injected so we could reuse it? We can always copy/paste.
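One way to picture the "separate class that gets injected" idea is the plain-Java sketch below. It is only an illustration of composition over inheritance: `MetadataRefresher`, `PropertiesBackedConfig`, and the `public-ipv4` key are hypothetical names, not Eureka or Spring Cloud types, and the metadata lookup is assumed to be a simple supplier of a map.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical holder for the refresh logic, independent of any config class hierarchy.
class MetadataRefresher {
    private final Supplier<Map<String, String>> source; // e.g. a metadata-service lookup
    private volatile Map<String, String> current;

    MetadataRefresher(Supplier<Map<String, String>> source) {
        this.source = Objects.requireNonNull(source);
        this.current = source.get();
    }

    Map<String, String> get() {
        return current;
    }

    // Re-reads the source; returns true when the metadata changed and was replaced.
    synchronized boolean refresh() {
        Map<String, String> latest = source.get();
        if (latest.equals(current)) {
            return false;
        }
        current = latest;
        return true;
    }
}

// A config class with its own base type can reuse the refresher by having it injected.
class PropertiesBackedConfig {
    private final MetadataRefresher refresher;

    PropertiesBackedConfig(MetadataRefresher refresher) {
        this.refresher = refresher;
    }

    String publicIp() {
        return refresher.get().get("public-ipv4");
    }
}
```

Because the config only holds a reference to the refresher, any caller (a scheduler, an event listener) can trigger `refresh()` and every config sharing that instance sees the new values.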

@qiangdavidliu
Contributor

Let me see what I can do on that.

@spencergibb
Member

thanks!

@herder
Contributor

herder commented Sep 9, 2016

This works for us:

@Configuration
@Slf4j
@ConditionalOnAwsCloudEnvironment
@EnableContextInstanceData
@Import(UtilAutoConfiguration.class)
@AutoConfigureAfter(UtilAutoConfiguration.class)
public class AwsInstanceConfig {

    @Value("${server.port:${SERVER_PORT:${PORT:8080}}}")
    int nonSecurePort;

    @Value("${management.port:${MANAGEMENT_PORT:${server.port:${SERVER_PORT:${PORT:8080}}}}}")
    int managementPort;

    @Value("${eureka.instance.hostname:${EUREKA_INSTANCE_HOSTNAME:}}")
    String hostname;

    @Autowired
    ConfigurableEnvironment env;


    @Bean
    public EurekaInstanceConfigBean eurekaInstanceConfigBean(InetUtils utils) {
        log.info("Setting AmazonInfo on EurekaInstanceConfigBean");
        final EurekaInstanceConfigBean instance = new EurekaInstanceConfigBean(utils) {

            @Scheduled(initialDelay = 30000L, fixedRate = 30000L)
            public void refreshInfo() {
                log.debug("Checking datacenter info changes");
                AmazonInfo newInfo = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
                if (!this.getDataCenterInfo().equals(newInfo)) {
                    log.info("Updating datacenterInfo to {}", newInfo);
                    ((AmazonInfo) this.getDataCenterInfo()).setMetadata(newInfo.getMetadata());
                }
            }

            private AmazonInfo getAmazonInfo() {
                return (AmazonInfo) getDataCenterInfo();
            }

            @Override
            public String getHostname() {
                AmazonInfo info = getAmazonInfo();
                final String publicHostname = info.get(AmazonInfo.MetaDataKey.publicHostname);
                return this.isPreferIpAddress() ?
                    info.get(AmazonInfo.MetaDataKey.localIpv4) :
                    publicHostname == null ?
                        info.get(AmazonInfo.MetaDataKey.localHostname) : publicHostname;
            }

            @Override
            public String getHostName(final boolean refresh) {
                return getHostname();
            }

            @Override
            public String getHomePageUrl() {
                return super.getHomePageUrl();
            }

            @Override
            public String getStatusPageUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getStatusPageUrlPath();
            }

            @Override
            public String getHealthCheckUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getHealthCheckUrlPath();
            }
        };
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        log.info("Info: {}", info);
        instance.setDataCenterInfo(info);
        instance.setNonSecurePort(this.nonSecurePort);
        instance.setInstanceId(getDefaultInstanceId(this.env));
        if (this.managementPort != this.nonSecurePort && this.managementPort != 0) {
            if (StringUtils.hasText(this.hostname)) {
                instance.setHostname(this.hostname);
            }
        }

        return instance;
    }

}

I.e. we do a scheduled check on whether the datacenterinfo has been updated, and reset it in that case.
I'm sure there's room for cleanup here, but maybe it's a start?

@spencergibb
Member

@herder Netflix devs have moved the functionality to a shared class that we will be able to leverage. Netflix/eureka#843

@spencergibb
Member

This depends on #1345

@elnur

elnur commented Oct 23, 2016

Can't wait to get this released.

@DickChesterwood

DickChesterwood commented Feb 9, 2017

Many thanks to @herder for the suggested auto-refresh hack; working great for me.

I can't quite work out when the Eureka 1.6 upgrade will appear. Will it be in the Dalston release train?

It's far too long to read but I've documented my experiments here - let me know if I've made any blunders

Edit to add: the OP noticed that not doing this refresh causes the registry to be wiped; I had the opposite experience, in that instances never get expired (and it's not self-preservation!). I can't think how that could be the case, so I'd be interested if anyone has any insight.

@spencergibb
Member

thanks @DickChesterwood. 1.6 is part of Dalston. See spring-cloud-release/milestones

@DickChesterwood

Lovely thanks Spencer!

@gadamsciv

@spencergibb Is this still an issue? I'm experiencing the same issue using Edgware.RELEASE. Is the scheduled task workaround still necessary?

@spencergibb
Member

@gadamsciv it is still open, so yes.

@harmoney-ryanli

FYI, I came across this problem as well and tried adding the scheduled task to refresh the instance info, but the task didn't start. In the end I found that if the scheduled task is in a configuration class, you need to add the @EnableScheduling annotation for the task to run.
