Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/ribbon load balanced retry policy update server stats #2334

Conversation

pway99
Copy link
Contributor

@pway99 pway99 commented Sep 29, 2017

This pr is associated with #1878 where the RibbonLoadBalancedRetryPolicy does not update the LoadBalancer ServerStats when registering CircuitTrippingExceptions.

note the RetryableRibbonLoadBalancingHttpClient has its own RetryPolicy which extends the FeignRetryPolicy where the ServiceInstance does not have a reference the Server. For this reason obtaining a reference to LoadBalancer ServerStats requires looping through the load balancers server list.

@spencergibb
Copy link
Member

Not sure I like having to loop through all servers. That could be expensive with large lists.

@codecov-io
Copy link

codecov-io commented Sep 29, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@df7f310). Click here to learn what that means.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #2334   +/-   ##
=========================================
  Coverage          ?   68.42%           
=========================================
  Files             ?      231           
  Lines             ?     8165           
  Branches          ?     1015           
=========================================
  Hits              ?     5587           
  Misses            ?     2196           
  Partials          ?      382
Impacted Files Coverage Δ
...bon/okhttp/RetryableOkHttpLoadBalancingClient.java 80.55% <ø> (ø)
...apache/RetryableRibbonLoadBalancingHttpClient.java 87.93% <ø> (ø)
...l/filters/route/support/AbstractRibbonCommand.java 81.63% <100%> (ø)
.../netflix/ribbon/RibbonLoadBalancedRetryPolicy.java 85.71% <92.3%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df7f310...3682572. Read the comment docs.

@pway99
Copy link
Contributor Author

pway99 commented Sep 29, 2017

@spencergibb I agree with your concern on looping through all of the servers. If there is interest in accepting this PR I would not mind fixing that. Thanks!

@ryanjbaxter
Copy link
Contributor

We are certainly interested in the PR, so go ahead and make the changes.

+ " CirtuitBreakerTripped:" + serverStats.isCircuitBreakerTripped());
break;
}
Server lbServer = ribbonServerListChangeListener.getServer(serviceInstance.getHost() + ":" + serviceInstance.getPort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see this code encapsulated in its own method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I will make that change

@spencergibb
Copy link
Member

I'm still super hesitant about this. You're doing all this work because of an issue with Feign. Maybe it would be beneficial to fix the issue with feign rather than introducing all of this complexity into ribbon.

@pway99
Copy link
Contributor Author

pway99 commented Oct 4, 2017

Thanks for your concern and attention to this issue. I will spend some time looking for a solution to the feign issue.

@spencergibb
Copy link
Member

Can you more precisely describe the issue with feign? Maybe you can avoid some unneeded work.

@spencergibb
Copy link
Member

I can't find the connection between RetryableRibbonLoadBalancingHttpClient and feign.

@pway99
Copy link
Contributor Author

pway99 commented Oct 4, 2017

there is quite a bit going in with this issue following is a description of why ribbon is was chosen for solving this problem.

zuul creates a retryable ribbon load balancing client which chooses a server and sets the requests host and port. Then the request is the captured by the RetryTemplate where it is executed by its doWithRetry method. Since the RetryTemplates RestClient is executing the request and capturing the exceptions and then retrying it is responsible for notifying the LoadBalancer of circuit tripping exceptions.

In this case the original server was chosen by the RibbonCommand which updates request's host and port. Since the RetryTemplate does not have a reference to the original Server Instance it is a challenge for it to update LoadBalancers ServerStats.

to the best of my knowledge a solution not involving the RibbonLoadBalancerRetryPolicy would require a change to the LoadBalancerCommand so that it allows the RetryTemplate to choose the server.

Feing is only involved to the extent that the RetryableRibbonLoadBalancingHttpClient has its own RetryPolicy which extends the FeignRetryPolicy. I did look into a solution where the RetryableRibbonLoadBalancingHttpClient utilized a different InterceptorRetryPolicy and would set the server instance, unfortunately it does not have a reference to the server.

thanks again for your help with this issue

@spencergibb
Copy link
Member

We need to find an alternative way to pass the server through. I don't want to maintain another list of servers outside of what ribbon already does.

@pway99
Copy link
Contributor Author

pway99 commented Oct 4, 2017

I agree on not maintaining another list of servers and will look for a better solution. thanks again we really appreciate your attention to this issue

@pway99
Copy link
Contributor Author

pway99 commented Oct 5, 2017

@spencergibb your hesitation for the previous solution was good since it was not fixing the root cause. After a more thorough review I found that modifying the AbstractRibbonCommand to use the execute method vs executeWithLoadBalancer when the request is retriable allows the RetryTemplate to choose the server and solves the problem.

@pway99
Copy link
Contributor Author

pway99 commented Oct 5, 2017

@spencergibb and @ryanjbaxter I was able to resolve this issue by updating the AbstractRibbonCommand so it uses the request.isRetryable() attribute for deciding between the spring-retry and netflix-loadbalancer libraries for retry and loadbalancing concerns. I am curious if you have any concerns regarding this approach. Thanks again for you help and attention to this issue and PR.

Copy link
Member

@spencergibb spencergibb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better. I asked a few clarifying questions. If I can get some satisfactory answers, I'll merge.

@@ -36,6 +36,8 @@
import org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.RibbonServer;

import java.io.IOException;
import java.net.SocketException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a unit test was lost during the merge I will re-add that test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add this test back soon, I can merge this.

@@ -247,7 +247,7 @@ public void testRetrySameServerOnly() throws Exception {
doReturn(uriRequest).when(request).toRequest(any(RequestConfig.class));
RibbonApacheHttpResponse returnedResponse = client.execute(request, null);
verify(delegate, times(2)).execute(any(HttpUriRequest.class));
verify(lb, times(0)).chooseServer(eq(serviceName));
verify(lb, times(1)).chooseServer(eq(serviceName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned that the number of times chooseServer() is called has increased. We've seen issues where, when the number of servers is small (2), one is always skipped since choose was called multiple times. Can you explain why this changed (and for the 4 below as well)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the number of servers chosen by the LoadBalancer is the same. Prior to this change the first server instance was chosen by the LoadBalancerCommand. Now that the AbstractLoadBalancerAwareClient is invoking the execute method for retriable requests the first server is chosen by the InterceptorRetryPolicy.

This can be verified by resetting the RetryPolicy for both the RetryableRibbonLoadBalancingHttpClient and RetryableOkHttpLoadBalancingClient back to the FeignRetryPolicy. In that scenario the first request will fail because a server has not been chosen and the host and port on the request are undefined.


RS response;
// Note: using isRetriable to determine if spring-retry or netflix is are managing loadbalancing concerns.
if (request != null && request.isRetriable()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in a bit more detail about why isRetriable() says it's spring retry vs not? As far as I know, it's if you are using ribbon RestClient or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt see any clarification on this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@pway99 pway99 Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencergibb and @ryanjbaxter this is a good question. There are 3 configurations I have found that affect the restclient configuration.

1 Having a RetryTemplate defined determines if a RetryableRibbonLoadBalancingHttpClient is used otherwise its a RibbonLoadBalancingHttpClient.
2 The zuul.retryable property is used as a flag in several locations for defining a request as retryable or not.
3 The ribbon.httpclient.enabled property enables the HttpClientRibbonConfiguration

looking at these three configurations I would say that the ribbon.httpclient.enabled property most likely defines the RestClient as ribbon or not. Is this correct?

overall its a challenging configuration, I will give some more thought to inline comments.

Thanks again

@pway99
Copy link
Contributor Author

pway99 commented Oct 23, 2017 via email

@spencergibb
Copy link
Member

@pway99 looks like some stray commits made it in via a rebase, can you fix them? I was about to merge.

@pway99
Copy link
Contributor Author

pway99 commented Oct 23, 2017

@spencergibb yes I will look for the stray commits and fix them now. Thanks again

@spencergibb
Copy link
Member

NP, looking forward to getting this merged.

@spencergibb
Copy link
Member

any update @pway99?

@pway99
Copy link
Contributor Author

pway99 commented Oct 23, 2017

@spencergibb still working on it. sorry its taking a bit of time doing my best

@pway99 pway99 force-pushed the bugfix/RibbonLoadBalancedRetryPolicy-Update-ServerStats branch from 3e6eac2 to 03fea29 Compare October 23, 2017 22:51
Dave Syer and others added 25 commits October 23, 2017 16:56
Port information was missing from data sent to eureka.

fixes spring-cloudgh-2372
Extract separate class TurbineClustersProvider to be able to provide custom solution for retrieving configured clusters (spring-cloud#2219)
* Add new Zuul filters endpoint and refactor routes endpoint
* Refactor FiltersEndpoint
- Extend from AbstractEndpoint and simplify code
- Move tests classes into package endpoint
- Remove usage of Lombok
* Conditionally enable Zuul routes and filters endpoints
* Provide FilterRegistry to FiltersEndpoint via constructor
…nabled

- The eureka instance auto registration also changes the secure port when the server port is random (0)

Fixes spring-cloudgh-2303
@pway99 pway99 force-pushed the bugfix/RibbonLoadBalancedRetryPolicy-Update-ServerStats branch from 12998ac to 3682572 Compare October 24, 2017 00:24
@pway99 pway99 closed this Oct 24, 2017
@pway99
Copy link
Contributor Author

pway99 commented Oct 24, 2017

having issues with git merge and rebase. Going to create a new clean PR

@spencergibb
Copy link
Member

Sounds good

@pway99
Copy link
Contributor Author

pway99 commented Oct 24, 2017

@spencergibb and @ryanjbaxter In an effort to get past the merge issues and stray commits I have closed this pr and opened a new PR (#2390). I am currently looking into some unit test failures on new pr.

Please let me know if there any action is required on either of these PRs.

Thanks again for your assistance,
Pat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants