Best practices for using Eureka for 0-downtime Blue/Green deployments #1290

Closed
william-tran opened this Issue Aug 23, 2016 · 30 comments

william-tran commented Aug 23, 2016

This issue is more of a discussion that I want to open up to the community so we can figure this out. The goal is to use Eureka to facilitate blue-green deployments with 0 downtime.

Example 1: From application code

Say you have 2 instances of an app deployed using two distinct URIs:

spring.application.name: api

---
spring.profiles: blue
eureka.instance.hostname: api-blue.example.com

---
spring.profiles: green
eureka.instance.hostname: api-green.example.com

Let's say both are up and healthy. You have a client app my-app that needs to make requests to api:

@RestController
@SpringBootApplication
@EnableDiscoveryClient
public class MyApp {

    @Bean
    @LoadBalanced
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }

    @RequestMapping("/hit-api")
    public Object hitApi() {
        return restTemplate().getForObject("https://api/something", Object.class);
    }
}

When api-green.example.com goes down for an upgrade, it gracefully shuts down and deletes its registration from Eureka. my-app won't know this happened until it fetches the registry again, which happens every 30 seconds by default. So for up to 30 seconds, the @LoadBalanced RestTemplate will send requests to api-green.example.com even though it's down.

There are a few things we can do:

Use Ribbon's support for retry

This doesn't work out of the box because the @LoadBalanced RestTemplate doesn't use Netflix's RestClient and related classes, which provide the retry functionality. One of the reasons for not using it is that it's deprecated and doesn't support PATCH.

Use Spring Retry

By adding the following to my project:

    compile("org.springframework.boot:spring-boot-starter-aop")
    compile("org.springframework.retry:spring-retry")
// ...
@EnableRetry
public class MyApp {
// ...
    @Retryable
    @RequestMapping("/hit-api")
    public Object hitApi() {
// ...

A request made to api-green.example.com will error out, the RestTemplate will throw an exception, that exception breaks out of the @Retryable method, and the method is executed again. Because the @LoadBalanced RestTemplate round-robins, the next method execution tries a different server, api-blue.example.com, which is still up. This works great.
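To make the mechanics concrete, here is a minimal plain-Java sketch of why a retry combined with round-robin selection fails over to the healthy instance. It uses no Spring or Ribbon classes: RoundRobinRetryDemo, choose and callWithRetry are hypothetical names standing in for the @LoadBalanced RestTemplate and the @Retryable method, and the hard-coded list plays the role of the stale registry copy.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Simulates a client holding a stale server list in which one server is already down.
final class RoundRobinRetryDemo {

    private final List<String> servers;                  // stale copy of the registry
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinRetryDemo(List<String> servers) {
        this.servers = servers;
    }

    // Picks servers in rotation, like a round-robin load balancer rule.
    String choose() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Runs the request against the next server, making at most maxAttempts attempts in total.
    <T> T callWithRetry(int maxAttempts, Function<String, T> request) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String server = choose();
            try {
                return request.apply(server);
            } catch (RuntimeException e) {
                last = e; // remember the failure; the next attempt picks the next server
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        Set<String> down = Set.of("api-green.example.com");
        RoundRobinRetryDemo client = new RoundRobinRetryDemo(
                List.of("api-green.example.com", "api-blue.example.com"));

        String result = client.callWithRetry(3, server -> {
            if (down.contains(server)) {
                throw new IllegalStateException(server + " is down");
            }
            return "response from " + server;
        });
        System.out.println(result); // prints "response from api-blue.example.com"
    }
}
```

The failover only works because the chooser advances on every attempt; a retry that kept hitting the same server would fail until the registry was refreshed.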

The default is to retry on any exception, but you may need to narrow down which exceptions should be retried. In Cloud Foundry, you get a 404 when the app is down, so to zero in on such an HttpClientErrorException, it looks like you'll need to write a RetryPolicy, which I haven't attempted.
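The shape of such a policy can be sketched without committing to the Spring Retry API. The following is a plain-Java illustration only: StatusException is a hypothetical stand-in for HttpClientErrorException, and retryOn is a made-up helper, not a Spring Retry class. The idea is that the retry decision becomes a predicate over the thrown exception, so a 404 from the platform router is retried while other client errors propagate immediately.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class SelectiveRetry {

    // Hypothetical stand-in for an HTTP client exception that carries a status code.
    static final class StatusException extends RuntimeException {
        final int status;

        StatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    // Retry only when the answer was a 404, i.e. the instance behind the route is gone.
    static final Predicate<RuntimeException> ONLY_404 =
            e -> e instanceof StatusException && ((StatusException) e).status == 404;

    // Calls the action up to maxAttempts times; non-retryable failures are rethrown at once.
    static <T> T retryOn(Predicate<RuntimeException> retryable, int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String body = retryOn(ONLY_404, 3, () -> {
            if (calls[0]++ == 0) {
                throw new StatusException(404); // the first instance is down
            }
            return "ok";
        });
        System.out.println(body + " after " + calls[0] + " calls"); // prints "ok after 2 calls"
    }
}
```

In a real application the same predicate logic would live inside whatever policy type the retry library expects.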

Example 2: From Zuul

Let's say we've configured Zuul to proxy to apps registered with Eureka, and are using its retry support provided by Netflix classes:

@SpringBootApplication
@EnableDiscoveryClient
@EnableZuulProxy
public class ApiGatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayApplication.class, args);
    }
}

zuul:
  routes:
    api:
      path: /api/**
      serviceId: api
      retryable: true

Here's where I got stuck, at least in Cloud Foundry, because the 404 doesn't get handled by the retry logic in RequestSpecificRetryHandler and I don't see a way of customizing this either.

Does anyone else have experiences to share in doing blue/green with Spring Cloud Netflix projects?

ryanjbaxter Aug 23, 2016

Contributor

Thanks for starting this discussion @william-tran. It would be great if we could gather feedback here that we can either use to incorporate into a best practices article and/or use to make enhancements to Spring Cloud to make blue/green deployments easier.

You mention we don't use Netflix's RestClient in our LoadBalanced RestTemplate, so you can't take advantage of the retry logic, yet when proxying through Zuul we can use the retry logic. Zuul is using Ribbon under the covers, so have you found that we are using Netflix's RestClient with Zuul then? I would imagine we would want to be consistent.

I also think in real world blue/green deployments there are probably a couple of other "phases" you go through as you do the deployment. In my mind you have 3 distinct steps in rolling out a new instance of a service.

  1. Deploy the new version of the service. At this point the service is running in a production environment but is not reachable by any clients. As the service owner you will want to verify the service is working as you expect before allowing any clients access to it, so you might run various tests against the new version. If the new version is not working as expected, then you can kill it.
  2. Assuming step 1 goes OK, you may then want to allow clients access to the service. In all likelihood you are going to want to do this slowly, only allowing a portion of the requests to hit the new service version. You will want to monitor the performance of the new service version to make sure it is behaving as expected. If everything seems to be working as expected you will most likely increase traffic. If not, then you will want to take the service instance offline and address any issues (this reflects the scenario you describe above).
  3. At some point once you are satisfied the new service version is working fine you will want to take the old version offline and direct all traffic to the new version (and perhaps scale up the number of instances). Again this is the scenario you describe above.

If there are other important steps in the process I am missing please let me know.

william-tran Aug 23, 2016

Contributor

On the client being used in RestTemplate vs Zuul, here's a conversation from the #spring-cloud Pivotal Slack channel on Aug 12:

William Tran
I’m trying to figure out how to get Ribbon retry to work with a @LoadBalanced RestTemplate
But I can’t seem to connect the dots to an instance of com.netflix.client.RetryHandler

Spencer Gibb [1:18 PM]
by default we don’t use the ribbon client (it doesn’t support patch for example), therefore its retry doesn’t work by default.

William Tran [1:19 PM]
ahh ok thanks. Is retry supported on SC-Zuul?

Spencer Gibb [1:21 PM]
in brixton restclient is still the default I think, so yes. In camden it is no longer the default. It should still retry though

About the phases of blue-green, being able to manage all those phases would be ideal, but I'm really looking for support of the most basic slice, where an app instance goes down (planned or unplanned) and you want to fail over to other instances without service interruption. I guess this failover feature that I'm asking for isn't only useful for blue-green deployments.

ryanjbaxter Aug 23, 2016

Contributor

Great, thanks for the reminder! We certainly need a reliable failover routine for when some service instances are down while others are available. I am in the middle of trying various combinations of scenarios to see how SC behaves. We can use this issue to track the results and the enhancements/bug fixes we need to make.

ryanjbaxter Aug 24, 2016

Contributor

Did a lot of investigation into retry logic in FeignClients, Ribbon, and Zuul today. Here is the basic breakdown of how retry logic works in both Brixton and Camden. (Most of it is the same as what @william-tran stated above.)

In Brixton:

  • Feign has its own retry logic
  • Ribbon has no retry logic
  • Zuul uses the Ribbon HTTP client retry logic

In Camden:

  • Feign has its own retry logic
  • Ribbon has no retry logic
  • Zuul has no retry logic because it is not using the Ribbon HTTP client

When it comes to customizing how retry logic works it depends on how your client is making the request.

Ribbon

Since we do not use the Ribbon HTTP Client, there is no retry logic by default. You can customize it using your own HTTP Client, or by using Spring Retry.

Feign

Since Feign has its own retry logic, most of the ribbon.* properties regarding timeouts and retries have little effect on FeignClients. There are some special cases where some properties might affect your FeignClients.

  1. If your FeignClient is making a GET request, you can use ribbon.OkToRetryOnAllOperations to disable retries on operations other than connection exceptions. See the (crazy) logic here https://github.com/spring-cloud/spring-cloud-netflix/blob/1.1.x/spring-cloud-netflix-core/src/main/java/org/springframework/cloud/netflix/feign/ribbon/FeignLoadBalancer.java#L83.
  2. ribbon.ConnectTimeout, ribbon.ReadTimeout, and ribbon.FollowRedirects can also be used to change the behavior of FeignClients.

The default FeignClient retry logic can be found in Retryer.Default. This logic will attempt the request up to 5 times. There is also a NEVER_RETRY instance (Retryer.NEVER_RETRY) that you can use to never retry requests.

Note: By default Hystrix is involved in the picture as well, since all FeignClient requests are wrapped in circuit breakers. Keep that in mind when dealing with timeouts: a timeout can occur at the Hystrix level as well as at the client level.

Zuul

Brixton

In Brixton we use the Ribbon HTTP Client when proxying requests through Zuul. This means you can use all ribbon.* properties to configure retry and connection logic with Zuul.

Camden

In Camden we no longer use the Ribbon HTTP Client so none of the ribbon.* properties will help you control the retry logic.

See #1295 for tracking standardization of retry logic.

ryanjbaxter Sep 9, 2016

Contributor

@william-tran regarding your Zuul use case....

I think you can override RequestSpecificRetryHandler by creating your own RestClient bean and overriding getRequestSpecificRetryHandler. I proved this out locally in my own dev environment.

However, I don't think doing that is actually going to help much, because isRetriableException will only be called when the RestClient encounters an exception making the request. To be more specific, it will only be called if RestClient.execute throws an exception. That happens today for things like timeouts. However, if a request is made and any type of response is returned (good or bad), that is considered a "success" from the client's perspective and no retry logic is run. Now of course you can override RestClient.execute in your RestClient bean so that if anything other than a 2xx is returned you throw an exception, and then you can begin handling those to force retry logic to occur. I have yet to try this, but in theory it seems like it would work.
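The "throw on anything other than a 2xx" idea can be illustrated independently of the Ribbon classes. In the sketch below, Response, BadStatusException and failOnNon2xx are hypothetical names and not part of RestClient. The point is only that a response with an error status is converted into an exception, which gives exception-driven retry logic something to react to.

```java
import java.util.function.Supplier;

final class StatusToException {

    // Hypothetical minimal response type: just a status code and a body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Thrown when the server answered, but not with a 2xx status.
    static final class BadStatusException extends RuntimeException {
        final int status;

        BadStatusException(int status) {
            super("unexpected HTTP status " + status);
            this.status = status;
        }
    }

    // Wraps a call so that non-2xx responses surface as exceptions instead of "successes".
    static Supplier<Response> failOnNon2xx(Supplier<Response> call) {
        return () -> {
            Response response = call.get();
            if (response.status < 200 || response.status > 299) {
                throw new BadStatusException(response.status);
            }
            return response;
        };
    }

    public static void main(String[] args) {
        Supplier<Response> routerSays404 = () -> new Response(404, "no route");
        try {
            failOnNon2xx(routerSays404).get();
        } catch (BadStatusException e) {
            System.out.println("retry candidate: " + e.getMessage());
        }
    }
}
```

Whether every non-2xx status should really trigger a retry is a separate decision; a 404 from a router and a 400 from the application probably deserve different treatment.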

Another option might be to wrap RestClient.execute with Spring Retry logic, like you did with just plain Ribbon. Again, I have not tried that; it's just a thought.

I will continue to experiment with some solutions.

I also have a number of notes I need to organize around the logic of retrying requests in Zuul that I will post here as well.

ryanjbaxter Sep 13, 2016

Contributor

These are some notes I have gathered as it relates to using Ribbon to retry requests in Spring Cloud Brixton.

Ribbon Rest Client Load Balancer

  • By default it uses round-robin load balancing
  • If an instance is marked as down, we continue to use that instance until the server list is refreshed on the client
  • In this case no retry logic is used, so we continue the round-robin approach, and every time we hit the no-longer-functioning instance we get an error

You can use Spring Retry in your app to retry requests using Ribbon. Here is a sample app which demonstrates how to do this.
https://github.com/ryanjbaxter/ribbon-retryer

Zuul Retry Logic

In Brixton, we use the Ribbon RestClient to proxy requests. By default the Ribbon RestClient will only retry an operation if both of the following hold:

  • An exception is thrown from RestClient.execute
  • The exception that is thrown is deemed retryable. An exception is retryable if any of the following is true:
    • okToRetryOnAllErrors is set to true in RequestSpecificRetryHandler
    • The exception is a ClientException AND the status code is a 429 (any other ClientException with a different status code will not be retryable) and we are not trying to retry on the same server (if we are trying to retry on the same server then we will not retry the request, since the server is telling us to throttle back the requests)
    • okToRetryOnConnectErrors is true in RequestSpecificRetryHandler and the exception is a SocketException

RequestSpecificRetryHandler is created in RestClient.getRequestSpecificRetryHandler. Variables like okToRetryOnAllErrors and okToRetryOnConnectErrors are set in this method depending on how Zuul is configured and on the HTTP request method. With all default configuration in place:

  • GET requests will be retried on any type of error (okToRetryOnAllErrors is set to true in RequestSpecificRetryHandler)
  • Everything else besides a GET will only retry on a SocketException
  • If you set ribbon.OkToRetryOnAllOperations to true then all HTTP methods will be retried on any error (both okToRetryOnAllErrors and okToRetryOnConnectErrors are set to true in RequestSpecificRetryHandler)
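The rules listed above can be written down as a small decision function. This is a simplified model of the description in this comment, not a copy of the Netflix implementation: RetryRulesModel and StatusException are hypothetical, the field names are the sketch's own, and the route-level retryable switch is not modelled.

```java
import java.net.SocketException;

// Simplified model of the retry rules listed above; not the Netflix implementation.
final class RetryRulesModel {

    // Hypothetical stand-in for a client exception that carries a status code.
    static final class StatusException extends Exception {
        final int status;

        StatusException(int status) {
            super("status " + status);
            this.status = status;
        }
    }

    final boolean okToRetryOnConnectErrors;
    final boolean okToRetryOnAllErrors;

    RetryRulesModel(boolean okToRetryOnConnectErrors, boolean okToRetryOnAllErrors) {
        this.okToRetryOnConnectErrors = okToRetryOnConnectErrors;
        this.okToRetryOnAllErrors = okToRetryOnAllErrors;
    }

    // Defaults as described: GET retries on any error, other methods only on
    // connection errors, unless ribbon.OkToRetryOnAllOperations is true.
    static RetryRulesModel forRequest(String httpMethod, boolean okToRetryOnAllOperations) {
        boolean allErrors = okToRetryOnAllOperations || "GET".equalsIgnoreCase(httpMethod);
        return new RetryRulesModel(true, allErrors);
    }

    // Decides whether a thrown error qualifies for a retry.
    boolean isRetriable(Throwable error, boolean sameServer) {
        if (okToRetryOnAllErrors) {
            return true;
        }
        if (error instanceof StatusException) {
            // Only a throttling response (429) is retried, and only on a different server.
            return ((StatusException) error).status == 429 && !sameServer;
        }
        return okToRetryOnConnectErrors && error instanceof SocketException;
    }

    public static void main(String[] args) {
        RetryRulesModel get = forRequest("GET", false);
        RetryRulesModel post = forRequest("POST", false);
        System.out.println(get.isRetriable(new StatusException(500), false));      // true
        System.out.println(post.isRetriable(new StatusException(500), false));     // false
        System.out.println(post.isRetriable(new SocketException("reset"), false)); // true
        System.out.println(post.isRetriable(new StatusException(429), true));      // false
    }
}
```

Note that nothing in this model fires unless an exception was thrown in the first place, which is why a plain 404 response never reaches it.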

To disable all retry logic from happening you can set zuul.retryable to false.

If you want to disable retry logic for a specific route you can set zuul.routes.serviceId.retryable to false. You want to set zuul.routes.serviceId.retryable to true for non GET requests that you want to retry on connection errors.

Customizing Your Retry Logic

If you want to customize the retry logic used by Ribbon you will need to provide your own RestClient implementation and RetryHandler. To do this you need to provide a customized Ribbon Client Configuration.

You can provide a default configuration for all clients by specifying the following annotation on your application class.

@RibbonClients(defaultConfiguration = MyRibbonConfiguration.class)

You can also specify configuration for specific clients by adding the @RibbonClient annotation on the configuration class itself

@RibbonClient(name = "foo", configuration = MyRibbonConfiguration.class)

In addition, you want to make sure that Ribbon configuration classes are not included in the main application context. That means if you are using @ComponentScan or @SpringBootApplication you need to make sure these configuration classes are excluded. For example, if you add the below @ComponentScan annotation to your application class:

@ComponentScan(basePackages = "com.ryanjbaxter.spring.cloud.ocr.web",
        excludeFilters =@ComponentScan.Filter(
                type = FilterType.REGEX,
                pattern = "com.ryanjbaxter.spring.cloud.ocr.web.exclude.*"))

You would want to place any Ribbon configuration classes in the package com.ryanjbaxter.spring.cloud.ocr.web.exclude.

Here is a sample project that demonstrates custom retry logic
https://github.com/ryanjbaxter/zuul-retryer

william-tran Sep 13, 2016

Contributor

Thanks for digging into this @ryanjbaxter. I took a look at the sample project and that would seem to provide the hooks needed to catch the Go Router's 404, if for example we were on a Cloud Foundry environment and wanted to retry the request on a different server because that server was down.

The customization does seem a bit invasive though, and relies on an API that will change in Camden. What about a Zuul route filter that uses Spring Retry? If you're already using Spring Retry to wrap calls to a @LoadBalanced RestTemplate then you might be able to reuse the RetryPolicy in this filter.

tkvangorder Sep 21, 2016

I wanted to point you both at a pull request I submitted a while back. We had quite a discussion on getting the RestTemplate to use the Netflix RestClient.

#945

Eventually, Spencer did merge this request, but you must OPT IN to tell Spring Cloud to use the legacy Netflix RestClient.

See this: #961

Bottom line (in Brixton) to get the RestTemplate to use a load balanced Netflix RestClient:

  1. you need to set the property "ribbon.http.client.enabled" to "true"
  2. you need to annotate your rest template with @LoadBalanced

As for the retry handler, you can configure a custom retry handler as part of your ribbon client configuration:

@Bean public RetryHandler buildRetryHandler() { return new BuildRetryHandler(); }

It looks like for Camden all of this will be standardized to use Spring Retry.

ryanjbaxter Sep 21, 2016

Contributor

@tkvangorder thanks for pointing this out. It is certainly an option as well.

william-tran Sep 30, 2016

Contributor

@ryanjbaxter This just dawned on me, using the status override endpoint: https://github.com/Netflix/eureka/wiki/Understanding-eureka-client-server-communication#about-instance-statuses

I still need to test this out. I'd imagine one would PUT /eureka/v2/apps/appID/instanceID/status?value=OUT_OF_SERVICE, wait for that to propagate to all the apps that use that service, and then take it down.

https://github.com/Netflix/eureka/wiki/Eureka-REST-operations
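As a rough sketch of what that call could look like from Java 11+, the snippet below builds the PUT request with the JDK's java.net.http client. The path shape is taken from the comment above; the base URL, app ID and instance ID are made-up values, EurekaStatusOverride is a hypothetical helper, and the URL prefix depends on how your Eureka server is mounted. IDs are assumed to be URL-safe, so no encoding is applied.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class EurekaStatusOverride {

    // Builds the status-override request; baseUrl is whatever prefix your Eureka server uses.
    static HttpRequest overrideRequest(String baseUrl, String appId, String instanceId, String status) {
        URI uri = URI.create(baseUrl + "/apps/" + appId + "/" + instanceId + "/status?value=" + status);
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed values, for illustration only.
        HttpRequest request = overrideRequest(
                "http://localhost:8761/eureka/v2", "API", "api-green.example.com", "OUT_OF_SERVICE");
        System.out.println(request.method() + " " + request.uri());

        // Only talk to a real server when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("status: " + response.statusCode());
        }
    }
}
```

After the override is accepted you would still wait for at least one registry fetch interval on the clients before stopping the instance.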

ryanjbaxter Sep 30, 2016

Contributor

@william-tran Yes, for sure. We also integrate with the /health endpoint with Actuator. So, for example, when a new version of the app is deployed you can leave it in an UNKNOWN state and only change its status to UP once you are ready for it to be used by clients. Something to play around with for sure.

Contributor

ryanjbaxter commented Sep 30, 2016

@william-tran yes for sure. We also integrate with the /health endpoint with Actuator. So for example when a new version of the app is deployed you can leave it in an Unknown state and only change its status to UP once you are ready for it to be used by clients. Something to play around with for sure.

Member

spencergibb commented Sep 30, 2016

@william-tran @ryanjbaxter I've created an actuator endpoint to do exactly that. It will be part of Dalston.

Contributor

william-tran commented Sep 30, 2016

I've tested this and it works OOTB in Brixton SR6.

Member

spencergibb commented Oct 3, 2016

@william-tran Of note, you can use eureka.instance.initialStatus=OUT_OF_SERVICE to have the instances come up initially that way, then use the endpoint mentioned above to change the status to UP.
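Combined with the status override discussed earlier in the thread, the ordering of a blue/green switch might look like the sketch below. It only plans the calls and does not send them. The class `BlueGreenCutover`, the instance IDs, and the idea of flipping to UP through the same `status?value=` REST path are assumptions for illustration; the thread confirms only the OUT_OF_SERVICE override and the `initialStatus` property.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: lists status changes in the order a blue/green switch would apply them. */
final class BlueGreenCutover {

    /** Path and query for a status override, relative to the Eureka base URL. */
    static String statusPath(String appId, String instanceId, String status) {
        return "/apps/" + appId + "/" + instanceId + "/status?value=" + status;
    }

    /**
     * New instances are assumed to have registered with
     * eureka.instance.initialStatus=OUT_OF_SERVICE, so no traffic reaches them yet.
     */
    static List<String> plan(String appId, List<String> newInstances, List<String> oldInstances) {
        List<String> steps = new ArrayList<>();
        // 1. Bring the new instances into rotation first, so capacity never drops to zero.
        for (String id : newInstances) {
            steps.add("PUT " + statusPath(appId, id, "UP"));
        }
        // 2. Only then take the old instances out of rotation.
        for (String id : oldInstances) {
            steps.add("PUT " + statusPath(appId, id, "OUT_OF_SERVICE"));
        }
        // 3. (Not modeled here) wait for clients to refresh, then stop the old instances.
        return steps;
    }
}
```

Marking the new instances UP before marking the old ones OUT_OF_SERVICE means clients always see at least one routable instance, whichever registry snapshot they happen to hold.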

Contributor

william-tran commented Oct 3, 2016

Thanks @spencergibb. I thought eureka.instance.instanceEnabledOnit would work at first, but the value of eureka.instance.initialStatus will always be used.

pradeepkusingh commented Oct 20, 2016

In Camden, how do I ask Ribbon to retry? Above I read that Ribbon doesn't retry and that we need to start using Spring Retry.

Contributor

ryanjbaxter commented Oct 21, 2016

@pradeepkusingh correct, I have a couple of pull requests that will address this.
See #1375 and spring-cloud/spring-cloud-commons#137

pradeepsingh1234 commented Oct 21, 2016

Any idea when the above pull requests will be available to use in Camden?

Contributor

ryanjbaxter commented Oct 21, 2016

Waiting for my teammates to finish their reviews.

Sunrry commented Nov 13, 2016

@ryanjbaxter I'm confused by the comment made by pradeepkusingh, because I ran some experiments on the retry logic of Zuul and Ribbon.
I ran two user services and made the thread in one of them sleep for 5 seconds to force a timeout. Below is the configuration of my API gateway (Zuul):
zuul:
  routes:
    user:
      path: /user/**
      serviceId: user-service

ribbon:
  ReadTimeout: 5000

hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 100000

I have two questions:

  1. When Zuul is used with Ribbon, is the retry made by Ribbon? Does Zuul have retry functionality of its own? If so, what is its retry logic?
  2. I found that Zuul (or Ribbon?) would retry the request to the user service when the Ribbon timeout occurred, no matter which Spring Cloud version I used (Brixton SR4 or Camden SR2). So I was confused by the comment above: "In Camden, how do I ask Ribbon to retry? Above I read that Ribbon doesn't retry and that we need to start using Spring Retry."

So will Zuul (or Ribbon) retry when using Camden SR2? And when will it retry?
Thanks a lot!

pradeepkusingh commented Nov 14, 2016

@ryanjbaxter: is this issue (#892) fixed in Camden SR2?

Contributor

ryanjbaxter commented Nov 14, 2016

@Sunrry see this comment about Zuul in Camden
#1290 (comment)

You cannot control the retry logic of Zuul using the Ribbon properties. However it will still retry failed requests. Zuul will retry on any type of SocketException (for example a timeout).
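To illustrate the general idea of retrying on socket-level failures, here is a minimal sketch in plain Java. It is not Zuul's or Ribbon's actual implementation; the class `FailoverCall` and the `Call` interface are hypothetical names. Note that in the JDK, `SocketTimeoutException` does not extend `SocketException`, so the sketch catches both explicitly.

```java
import java.io.IOException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.List;

/** Hypothetical helper: tries servers in order, failing over on socket-level errors only. */
final class FailoverCall {

    /** A request against one server; throws on I/O failure. */
    interface Call<T> {
        T execute(String server) throws IOException;
    }

    /** Returns the first successful result, or rethrows the last socket-level failure. */
    static <T> T firstSuccessful(List<String> servers, Call<T> call) throws IOException {
        IOException last = null;
        for (String server : servers) {
            try {
                return call.execute(server);
            } catch (SocketException | SocketTimeoutException e) {
                // Connection refused, reset, or timed out: remember it and try the next server.
                last = e;
            }
            // Any other IOException propagates immediately without a retry.
        }
        if (last == null) {
            throw new IOException("no servers available");
        }
        throw last;
    }
}
```

Restricting the retry to socket-level failures keeps application-level errors visible to the caller, while a server that has just shut down for an upgrade is skipped.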

Contributor

ryanjbaxter commented Nov 14, 2016

@pradeepkusingh if it was it would have been closed.

pradeepsingh1234 commented Nov 14, 2016

Thank you. I will test. Another question: how do I use Spring Retry in Camden SR2 with Zuul/Ribbon? Is any specific configuration needed?

Contributor

ryanjbaxter commented Nov 14, 2016

There are no changes in Camden.SR2 related to Zuul/Ribbon.

pradeepsingh1234 commented Nov 14, 2016

I thought all the retry logic had changed. Which release are we targeting?

Contributor

ryanjbaxter commented Nov 14, 2016

The only thing that changed related to retrying requests in Camden.SR2 had to do with load balanced RestTemplates using Ribbon.

Contributor

ryanjbaxter commented Feb 23, 2017

I am going to close this issue. Load Balanced RestTemplates, Feign, and Zuul now allow you to use Spring Retry in both Camden.BUILD-SNAPSHOT and Dalston.BUILD-SNAPSHOT. The next release of both streams will contain all the necessary functionality. This should facilitate a consistent Blue/Green deployment with 0 downtime.

hodeh commented Aug 7, 2017

@ryanjbaxter thanks a lot for the analysis.
I am using Dalston.BUILD-SNAPSHOT along with Spring Boot 1.5.3.RELEASE,
and I have set the following properties:

ribbon:
  MaxAutoRetries: 5
  MaxAutoRetriesNextServer: 5
  OkToRetryOnAllOperations: true
  OkToRetryOnAllErrors: true
zuul:
  retryable: true

The above properties are reflected in LoadBalancerCommand.retryHandler, but they are not applied in RequestSpecificRetryHandler. I am not trying to customize Zuul's retry logic, at least for now; I only want the retry mechanism to kick in on connect failures. While debugging, however, the RequestSpecificRetryHandler variables (okToRetryOnAllErrors, okToRetryOnConnectionErrors) are always set to false, and the following call in RibbonLoadBalancingHttpClient is always the one invoked:
return new RequestSpecificRetryHandler(false, false, RetryHandler.DEFAULT, null);
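For a sense of scale, the retry settings above can be turned into a worst-case attempt count and time budget. The sketch below uses the commonly documented relationship that each server is tried `MaxAutoRetries + 1` times and up to `MaxAutoRetriesNextServer + 1` servers are tried; treat that as a simplified model rather than a guarantee of what any given release does. The class name `RibbonRetryBudget` is hypothetical, and the 1000 ms connect timeout in the usage note is an assumed value; the 5000 ms read timeout echoes the `ribbon.ReadTimeout` shown earlier in the thread.

```java
/** Hypothetical helper: estimates the worst-case cost of Ribbon-style retry settings. */
final class RibbonRetryBudget {

    /** Attempts per server times the number of servers tried. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /** Upper bound on elapsed time if every attempt runs into both timeouts. */
    static long worstCaseMillis(int attempts, long connectTimeoutMs, long readTimeoutMs) {
        return attempts * (connectTimeoutMs + readTimeoutMs);
    }
}
```

With `MaxAutoRetries: 5` and `MaxAutoRetriesNextServer: 5`, `maxAttempts(5, 5)` gives 36 attempts, and `worstCaseMillis(36, 1000, 5000)` gives 216000 ms. Under this model, any enclosing Hystrix timeout shorter than that would cut the retries off before they are exhausted.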

Contributor

ryanjbaxter commented Aug 7, 2017

@hodeh please open a separate issue
