Q: Fabio routing table not updated after service failure #216

Closed
gagan2u2002 opened this issue Jan 17, 2017 · 21 comments
@gagan2u2002

Use-case:
When I run multiple instances of my service and forcefully stop one instance, Fabio is not able to balance the load properly. Because of this, my service calls fail after some time and return a 404 error. As I understand it, the Fabio routing table is not updated when a service fails over.

Service error message:
(screenshot: error)

Fabio routing UI screen after I stop the service running on port 8080:
(screenshot: fabio_routing_table)

I have attached the Fabio and Consul logs as well. Please refer to them.
consul_log.txt
fabio_log.txt

@gagan2u2002 gagan2u2002 changed the title Fabio routing table not updated in case of service failure and because of this my service get failed after some time even few instance is still up Fabio routing table not updated in case of service failure and because of this my service get failed after some time even few service instance is still up Jan 17, 2017
@magiconair
Contributor

The fabio routing table is updated automatically every time the consul state changes. This can be triggered either by a new instance of your service appearing, an existing instance disappearing or the health status of an existing instance changing.
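To make the mechanism concrete, here is a simplified model of how a routing table can be derived from Consul state: every healthy instance's `urlprefix-` tags become `route add` lines in the same shape as the fabio log entries quoted below. This is a sketch under assumptions, not fabio's source; the `Instance` class and `build_routes` function are hypothetical, and the line ordering may differ from fabio's.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instance:
    """A healthy service instance as reported by Consul (simplified, hypothetical)."""
    service: str            # Consul service name, e.g. "slpconsulDemo"
    address: str            # host the instance registered with
    port: int               # port the instance registered with
    tags: Tuple[str, ...]   # Consul tags, e.g. ("urlprefix-/world",)


def build_routes(instances: List[Instance]) -> List[str]:
    """Derive 'route add' lines from the urlprefix- tags of healthy instances.

    In this model the whole table is rebuilt from scratch every time the
    Consul health state changes, so a vanished instance simply drops out.
    """
    routes = []
    for inst in instances:
        tag_list = ",".join(inst.tags)
        for tag in inst.tags:
            if not tag.startswith("urlprefix-"):
                continue  # tags without the prefix produce no route
            prefix = tag[len("urlprefix-"):]
            routes.append(
                f"route add {inst.service} {prefix} "
                f'http://{inst.address}:{inst.port}/ tags "{tag_list}"'
            )
    return sorted(routes)  # deterministic order; fabio's own ordering may differ
```

Calling `build_routes` again after an instance has left the healthy set yields a table without that instance's lines, which is the add/remove pattern visible in the logs further down.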

@gagan2u2002
Author

gagan2u2002 commented Jan 17, 2017

But in my case, if you look at the fabio screen, no service was removed from the UI.
The service registered on port 8080 is not up on my box, yet it still shows as present.

Test case: I ran 3 instances of my service on my local box on ports 8080, 8081 and 8082. When I stopped the service on port 8080, after some time the response from http://localhost:9999/ (in my case the service name is world/myworld) was a 404, because fabio diverted me to the port that is not up on my box.

@magiconair
Contributor

What I can see from the fabio logs is:

  1. add port 8081
  2. add port 8080
  3. add port 8082
  4. remove port 8081
  5. remove port 8080
  6. remove port 8082

Finally, the routing table is empty and all requests fail with 404 Not Found, which is exactly what should happen. Also note that a 404 response can come from both fabio and your service. Just because fabio routes the request to the service does not mean the service has a handler to serve it. Check whether http://C40-BF91.india.rsystems.com:<port>/myworld provides a valid response for all three ports.
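One way to run that check is to hit each backend directly, bypassing fabio. The sketch below is an assumption-laden helper, not part of fabio: `http_status` and `probe_backends` are hypothetical names, and the `fetch` parameter exists only so the probe can be swapped out for testing.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status of a GET on url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the backend itself answered, e.g. with its own 404
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout or DNS failure


def probe_backends(
    host: str,
    ports: Iterable[int],
    path: str,
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[int, Optional[int]]:
    """Request path on every backend port directly and collect the results."""
    return {port: fetch(f"http://{host}:{port}{path}") for port in ports}
```

For example, `probe_backends("C40-BF91.india.rsystems.com", [8080, 8081, 8082], "/myworld")` would show a 404 produced by the service itself as `404`, and a stopped instance as `None`.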

2017/01/17 12:16:02 [INFO] consul: Health changed to #3818
2017/01/17 12:16:03 [INFO] consul: Health changed to #3819
2017/01/17 12:16:03 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/17 12:16:10 [INFO] consul: Health changed to #3820
2017/01/17 12:16:10 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/17 12:16:12 [INFO] consul: Health changed to #3821
2017/01/17 12:16:12 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/17 12:16:35 [INFO] consul: Health changed to #3824
2017/01/17 12:16:35 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/17 12:17:28 [INFO] consul: Health changed to #3829
2017/01/17 12:17:28 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/17 12:17:33 [INFO] consul: Health changed to #3833
2017/01/17 12:17:33 [INFO] Updated config to
2017/01/17 12:17:33 [WARN] No route for localhost:9999/myworld
2017/01/17 12:17:33 [WARN] No route for localhost:9999/myworld
2017/01/17 12:17:34 [WARN] No route for localhost:9999/myworld
2017/01/17 12:18:03 [INFO] consul: Health changed to #3835
2017/01/17 12:18:10 [WARN] No route for localhost:9999/myworld
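The add/remove sequence above was read off the log by hand. A small parser can do the same by collecting the backend ports of each `Updated config to` block and diffing consecutive blocks. This is a hypothetical helper that assumes the log format shown here (timestamped lines, one `route add` per line); it is not a fabio tool.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

TIMESTAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")
ROUTE = re.compile(r"^route add \S+ \S+ https?://[^/:]+:(\d+)/")

Snapshot = Tuple[str, Set[int]]


def snapshots(lines: Iterable[str]) -> List[Snapshot]:
    """Collect the backend ports of every 'Updated config to' block."""
    result: List[Snapshot] = []
    current: Optional[Snapshot] = None
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts:
            # Any timestamped line closes the route block being collected.
            if current is not None:
                result.append(current)
                current = None
            if line.rstrip().endswith("Updated config to"):
                current = (ts.group(1), set())
            continue
        route = ROUTE.match(line.strip())
        if route and current is not None:
            current[1].add(int(route.group(1)))
    if current is not None:
        result.append(current)
    return result


def changes(snaps: List[Snapshot]) -> List[Tuple[str, List[int], List[int]]]:
    """Turn consecutive snapshots into (timestamp, added ports, removed ports)."""
    out = []
    previous: Set[int] = set()
    for ts, ports in snaps:
        out.append((ts, sorted(ports - previous), sorted(previous - ports)))
        previous = ports
    return out
```

An empty block, like the one at 12:17:33, shows up as a snapshot with no ports, which corresponds to the empty routing table and the `No route` warnings that follow it.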

@magiconair magiconair changed the title Fabio routing table not updated in case of service failure and because of this my service get failed after some time even few service instance is still up Q: Fabio routing table not updated after service failure Jan 17, 2017
@gagan2u2002
Author

Hi,
I am going to share fresh log files with you.
Steps to reproduce:

STEP 1: I ran 3 instances of my service on ports 8080, 8081 and 8082. [My service code is already on GitHub; you can use it (https://github.com/gagan2u2002/springboot-consul-Fabio-Integration-example)]

STEP 2: When I close my service instance on 8080, fabio stops responding, even though my service still responds on ports 8081 and 8082. Please refer to the screenshot below:
(screenshot: error_screen_18_01_2017)

See today's (01/18/2017) consul and fabio logs:
consul_log_18_01_2017.txt
Fabio_log_18_01_2017.txt

If you want, I can have a Skype session with you to demo the error, but I think you can reproduce it on your own by following the steps I mentioned.

@gagan2u2002
Author

I think I have answered your query. Can we change the issue label to Bug, if you are okay with it?

@magiconair
Contributor

@gagan2u2002 I am not sure I understand. If you have resolved the issue then it isn't a bug. If it is a bug can you point me to what you think the issue is?

@magiconair
Contributor

Also, fabio just pulls information about running services from consul. If the consul service registration isn't updated properly, fabio will have an inconsistent routing table, so please check that first. Looking at the fabio log, it doesn't look like you are shutting down the instance on port 8080 but the one on port 8082.

2017/01/18 11:30:12 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8082/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/18 11:30:12 [INFO] consul: Health changed to #129
2017/01/18 11:32:40 [WARN] No route for localhost:9999/route
2017/01/18 11:33:04 [INFO] consul: Health changed to #140
2017/01/18 11:35:59 [INFO] consul: Health changed to #152
2017/01/18 11:36:27 [INFO] consul: Health changed to #154
2017/01/18 11:36:27 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8081/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/18 11:36:29 [INFO] consul: Health changed to #155
2017/01/18 11:36:29 [INFO] Updated config to
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
2017/01/18 11:36:29 [INFO] consul: Health changed to #156

Are you sure your instances are announcing the correct ports to consul? Is the service running on port 8080 actually registering to consul with port 8080?

@gagan2u2002
Author

@magiconair
If you look at the consul log, it says the 8080 service is stopped:
2017/01/18 11:44:02 [WARN] agent: http request failed 'http://127.0.0.1:8080/health': Get http://127.0.0.1:8080/health: dial tcp 127.0.0.1:8080: connectex: No connection could be made because the target machine actively refused it.
but if you look at the fabio log, it says the instance is still there:
route add slpconsulDemo /world http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"
route add slpconsulDemo /myworld http://C40-BF91.india.rsystems.com:8080/ tags "urlprefix-/myworld,urlprefix-/world"

My test case is very simple: I ran 3 instances as I already described, and I stopped 1 instance. If I keep hitting my fabio URL, after some time it throws an error, even though my service is still up and responding on my box, and I can see the same in consul. But the fabio routing table is not updated in its UI, and this might be the root cause of the problem.

@gagan2u2002
Author

And yes, this issue is still open on my end, so it would be good for me if you can resolve it.
If any assistance is needed, please let me know.

@magiconair
Contributor

This looks like a consul issue. How do you register the health checks in consul?

@gagan2u2002
Author

@magiconair
spring.cloud.consul.discovery.health-check-url=http://127.0.0.1:8080/health
This setting is how I register the health check in consul.

But I don't think this could be a consul issue, because the fabio routing table needs to be updated by fabio itself. Consul is doing its job as expected.

@ak66982

ak66982 commented Jan 25, 2017

@gagan2u2002
Sorry to interfere, but I had exactly the same issue. The cause was the health check configuration in consul. Make sure that each service and each health check that you defined has a unique ID.

@magiconair
Contributor

@gagan2u2002 My guess is that you don't wait long enough after you've killed the service. Spring cloud by default checks every 10 seconds and has by default no timeout for the service check. You might want to set healthCheckInterval and healthCheckTimeout to 1s to see if this changes the behavior. These are the settings the demo/server uses and with this I was not able to reproduce your problem. I've started consul agent -server -dev, consul monitor -log-level=debug, server -prefix /foo and fabio before running kill -9 on the server process. Both consul and fabio picked up the change within a second.
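Spring Cloud Consul registers the check on your behalf, but it can help to see what a fast check looks like at the Consul level. The sketch below builds a JSON body for Consul's `PUT /v1/agent/service/register` endpoint with a 1s interval and 1s timeout, mirroring the healthCheckInterval/healthCheckTimeout suggestion. It is an assumption that this matches what Spring sends; the `/health` path is taken from the property quoted earlier in the thread, and `registration_payload` is a hypothetical helper.

```python
from typing import Any, Dict, List


def registration_payload(
    service_id: str,
    name: str,
    address: str,
    port: int,
    tags: List[str],
    interval: str = "1s",
    timeout: str = "1s",
) -> Dict[str, Any]:
    """Body for Consul's PUT /v1/agent/service/register with a fast HTTP check."""
    return {
        "ID": service_id,
        "Name": name,
        "Tags": list(tags),
        "Address": address,
        "Port": port,
        "Check": {
            # The check targets this instance's own port (assumed /health path).
            "HTTP": f"http://{address}:{port}/health",
            "Interval": interval,  # how often Consul runs the check
            "Timeout": timeout,    # how long one check may take before it fails
        },
    }
```

Serialized with `json.dumps`, the dictionary can be sent to a local agent (for example with `curl -X PUT --data @payload.json localhost:8500/v1/agent/service/register`). With a 1s interval, a killed instance should turn critical within roughly a second or two.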

https://github.com/spring-cloud/spring-cloud-consul/blob/master/spring-cloud-consul-discovery/src/main/java/org/springframework/cloud/consul/discovery/ConsulDiscoveryProperties.java

@gagan2u2002
Author

@magiconair I waited even longer than 10 seconds after killing the first instance of the service on 8080, but fabio is not able to route to the services on ports 8081 and 8082. I also observed that fabio is not updating its routing table, even though my services return results in my dev environment on ports 8081 and 8082.
I have attached the fabio and consul logs for your reference. Please refer to them.

Consul UI screen after killing the 8080 instance:
(screenshot: consul_screen)

consul_log_27_01.txt
Fabio_log_27_01.txt

@gagan2u2002
Author

@magiconair I think I have answered your query. Is there any update on this, or can we mark this issue as a bug once you validate it on your end?
If this is not clear, I am okay to have a Skype call with you, as I already mentioned.

@magiconair
Contributor

@gagan2u2002 In the screenshot, the instances running on ports 8081 and 8082 are marked as critical. That's why fabio does not include them in the routing table. The Serf Health Check is ignored by fabio, since it only states whether the consul agent where the service is registered is healthy. That is not a good indication of whether the service itself is healthy.

If this is the case then this isn't a bug but how fabio is designed to work. You need to check why the instances on port 8081 and 8082 produce a critical health check if one of the other instances went down.
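The filtering rule described here can be modeled as: drop the `serfHealth` check, then keep an instance only if every remaining check is `passing`. The sketch below applies that rule to entries shaped like the output of Consul's `/v1/health/service/<name>` endpoint. It is a simplified model, not fabio's code; in particular, how an instance with no service-level checks is treated is an assumption here (it counts as healthy).

```python
from typing import Any, Dict, List

SERF_CHECK_ID = "serfHealth"  # Consul's built-in agent liveness check


def healthy_instances(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the Service of entries whose checks, ignoring serfHealth, all pass.

    Each entry carries a "Service" object and a "Checks" list, as returned
    by /v1/health/service/<name>.
    """
    healthy = []
    for entry in entries:
        service_checks = [c for c in entry["Checks"] if c["CheckID"] != SERF_CHECK_ID]
        # A single critical (or warning) service check excludes the instance.
        if all(c["Status"] == "passing" for c in service_checks):
            healthy.append(entry["Service"])
    return healthy
```

A passing `serfHealth` check therefore cannot keep an instance in the table if its own service check is critical, which is the situation in the screenshot.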

@gagan2u2002
Author

@magiconair When I take down the first instance of the service, the other instances of that service become critical, but when I check the health of each instance, they return response code 200.
But according to your statement, fabio does not even pick up those services because they are critical. That seems dangerous from a production perspective: in practice the first instance of a service can go down, and if fabio does not pick up the other instances, my whole system will break down. So in my view this is a clear BUG and it should be resolved. What do you suggest?

@magiconair
Contributor

If you shut down one instance and the other instances become critical in consul then there is a problem with your service registration. My guess is that you use the same value for spring.cloud.consul.discovery.health-check-url for all instances.
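This guess is easy to test mechanically. If all three instances register with `spring.cloud.consul.discovery.health-check-url=http://127.0.0.1:8080/health` (the value quoted earlier in the thread), then stopping 8080 fails the checks of 8081 and 8082 as well. The hypothetical helper below flags every instance whose health check URL targets a different port than the instance itself.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def mismatched_health_checks(instances: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Return (service port, checked port) for every instance whose health
    check URL points at a different port than the instance itself."""
    mismatches = []
    for port, check_url in instances:
        parts = urlsplit(check_url)
        # Fall back to the scheme's default port when the URL has none.
        checked = parts.port or (443 if parts.scheme == "https" else 80)
        if checked != port:
            mismatches.append((port, checked))
    return mismatches
```

An empty result means each instance checks itself; any entry points at a check that will fail when a different instance goes down.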

Can you post the output of curl localhost:8500/v1/health/service/slpconsulDemo?pretty or whatever your service name is, please?

@magiconair magiconair added this to the Unplanned milestone Oct 10, 2017
@alvaroaleman
Contributor

So, I just found this after testing Fabio and finding out that it doesn't remove routes for unhealthy services, which was caused by non-unique service IDs. To quote the fabio docs on that:

Make sure that each instance registers with a unique ServiceID and a service name without spaces.

This directly contradicts the Consul docs which state:

ID (string: "") - Specifies a unique ID for this service. This must be unique per agent

@magiconair Can you explain why fabio requires the service ID to be unique per consul cluster?
Also I think it would be smart to have Fabio at least emit a warning in case multiple services instances with the same id exist, because that is actually the way it is supposed to be done in Consul.
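A warning of the kind suggested here could be driven by Consul's catalog: group the entries returned by `/v1/catalog/service/<name>` by `ServiceID` and report any ID that appears on more than one node. The sketch below is a hypothetical check, not something fabio is known to implement.

```python
import logging
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def duplicate_service_ids(entries: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each ServiceID registered on more than one node to those nodes.

    Each entry carries "Node" and "ServiceID", as in the output of
    /v1/catalog/service/<name>.
    """
    nodes_by_id = defaultdict(set)
    for entry in entries:
        nodes_by_id[entry["ServiceID"]].add(entry["Node"])
    return {sid: sorted(nodes) for sid, nodes in nodes_by_id.items() if len(nodes) > 1}


def warn_on_duplicates(entries: Iterable[Dict[str, Any]]) -> None:
    """Log one warning per ServiceID that is not unique across the cluster."""
    for sid, nodes in duplicate_service_ids(entries).items():
        logging.warning("service ID %r is registered on several nodes: %s", sid, ", ".join(nodes))
```

Such IDs are valid in Consul, since uniqueness is only required per agent, which is exactly why a warning rather than an error seems the appropriate reaction.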

@magiconair
Contributor

@alvaroaleman the reason it is like this was that I've built it like that in 2015 since that was my understanding on how this worked. So far this hasn't been a big issue. We've constructed our service ids as servicename-host:port which should make them unique cluster wide with little effort. #383 is probably along the same lines.
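The `servicename-host:port` scheme can be expressed as a one-line helper. The sketch also rejects spaces in the service name, following the fabio docs quoted above; the function name `service_id` is hypothetical.

```python
def service_id(name: str, host: str, port: int) -> str:
    """Build a cluster-wide unique service ID of the form servicename-host:port."""
    if " " in name:
        raise ValueError("service names must not contain spaces")
    return f"{name}-{host}:{port}"
```

For the three instances in this issue, `service_id("slpconsulDemo", "C40-BF91.india.rsystems.com", 8081)` returns `slpconsulDemo-C40-BF91.india.rsystems.com:8081`, and the 8080 and 8082 instances get distinct IDs as well, since only the port differs on a single host.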

Can you elaborate why cluster wide unique service ids like I've described are not achievable? I'm curious for the motivation.

I'll have a look at this again, since I'm only interested in this working, not in this specific implementation.

@ygersie

ygersie commented Dec 21, 2017

@magiconair this can be closed as well, fixed in #414

@magiconair magiconair modified the milestones: Unplanned, 1.5.5 Dec 21, 2017
@magiconair magiconair added this to the 1.5.6 milestone Dec 21, 2017