New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/ribbon load balanced retry policy update server stats #2334
Bugfix/ribbon load balanced retry policy update server stats #2334
Conversation
Not sure I like having to loop through all servers. That could be expensive with large lists. |
Codecov Report
@@ Coverage Diff @@
## master #2334 +/- ##
=========================================
Coverage ? 68.42%
=========================================
Files ? 231
Lines ? 8165
Branches ? 1015
=========================================
Hits ? 5587
Misses ? 2196
Partials ? 382
Continue to review full report at Codecov.
|
@spencergibb I agree with your concern on looping through all of the servers. If there is interest in accepting this PR I would not mind fixing that. Thanks! |
We are certainly interested in the PR, so go ahead and make the changes. |
+ " CirtuitBreakerTripped:" + serverStats.isCircuitBreakerTripped()); | ||
break; | ||
} | ||
Server lbServer = ribbonServerListChangeListener.getServer(serviceInstance.getHost() + ":" + serviceInstance.getPort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see this code encapsulated in its own method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I will make that change
I'm still super hesitant about this. You're doing all this work because of an issue with Feign. Maybe it would be beneficial to fix the issue with feign rather than introducing all of this complexity into ribbon. |
Thanks for your concern and attention to this issue. I will spend some time looking for a solution to the feign issue. |
Can you more precisely describe the issue with feign? Maybe you can avoid some unneeded work. |
I can't find the connection between |
there is quite a bit going in with this issue following is a description of why ribbon is was chosen for solving this problem. zuul creates a retryable ribbon load balancing client which chooses a server and sets the requests host and port. Then the request is the captured by the RetryTemplate where it is executed by its doWithRetry method. Since the RetryTemplates RestClient is executing the request and capturing the exceptions and then retrying it is responsible for notifying the LoadBalancer of circuit tripping exceptions. In this case the original server was chosen by the RibbonCommand which updates request's host and port. Since the RetryTemplate does not have a reference to the original Server Instance it is a challenge for it to update LoadBalancers ServerStats. to the best of my knowledge a solution not involving the RibbonLoadBalancerRetryPolicy would require a change to the LoadBalancerCommand so that it allows the RetryTemplate to choose the server. Feing is only involved to the extent that the RetryableRibbonLoadBalancingHttpClient has its own RetryPolicy which extends the FeignRetryPolicy. I did look into a solution where the RetryableRibbonLoadBalancingHttpClient utilized a different InterceptorRetryPolicy and would set the server instance, unfortunately it does not have a reference to the server. thanks again for your help with this issue |
We need to find an alternative way to pass the server through. I don't want to maintain another list of servers outside of what ribbon already does. |
I agree on not maintaining another list of servers and will look for a better solution. thanks again we really appreciate your attention to this issue |
@spencergibb your hesitation for the previous solution was good since it was not fixing the root cause. After a more thorough review I found that modifying the AbstractRibbonCommand to use the execute method vs executeWithLoadBalancer when the request is retriable allows the RetryTemplate to choose the server and solves the problem. |
@spencergibb and @ryanjbaxter I was able to resolve this issue by updating the AbstractRibbonCommand so it uses the request.isRetryable() attribute for deciding between the spring-retry and netflix-loadbalancer libraries for retry and loadbalancing concerns. I am curious if you have any concerns regarding this approach. Thanks again for you help and attention to this issue and PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better. I asked a few clarifying questions. If I can get some satisfactory answers, I'll merge.
@@ -36,6 +36,8 @@ | |||
import org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.RibbonServer; | |||
|
|||
import java.io.IOException; | |||
import java.net.SocketException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a unit test was lost during the merge I will re-add that test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add this test back soon, I can merge this.
@@ -247,7 +247,7 @@ public void testRetrySameServerOnly() throws Exception { | |||
doReturn(uriRequest).when(request).toRequest(any(RequestConfig.class)); | |||
RibbonApacheHttpResponse returnedResponse = client.execute(request, null); | |||
verify(delegate, times(2)).execute(any(HttpUriRequest.class)); | |||
verify(lb, times(0)).chooseServer(eq(serviceName)); | |||
verify(lb, times(1)).chooseServer(eq(serviceName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned that the number of times chooseServer()
is called has increased. We've seen issues where, when the number of servers is small (2), one is always skipped since choose was called multiple times. Can you explain why this changed (and for the 4 below as well)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the number of servers chosen by the LoadBalancer is the same. Prior to this change the first server instance was chosen by the LoadBalancerCommand. Now that the AbstractLoadBalancerAwareClient is invoking the execute method for retriable requests the first server is chosen by the InterceptorRetryPolicy.
This can be verified by resetting the RetryPolicy for both the RetryableRibbonLoadBalancingHttpClient and RetryableOkHttpLoadBalancingClient back to the FeignRetryPolicy. In that scenario the first request will fail because a server has not been chosen and the host and port on the request are undefined.
|
||
RS response; | ||
// Note: using isRetriable to determine if spring-retry or netflix is are managing loadbalancing concerns. | ||
if (request != null && request.isRetriable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain in a bit more detail about why isRetriable()
says it's spring retry vs not? As far as I know, it's if you are using ribbon RestClient
or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt see any clarification on this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ryanjbaxter. @pway99?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencergibb and @ryanjbaxter this is a good question. There are 3 configurations I have found that affect the restclient configuration.
1 Having a RetryTemplate defined determines if a RetryableRibbonLoadBalancingHttpClient is used otherwise its a RibbonLoadBalancingHttpClient.
2 The zuul.retryable property is used as a flag in several locations for defining a request as retryable or not.
3 The ribbon.httpclient.enabled property enables the HttpClientRibbonConfiguration
looking at these three configurations I would say that the ribbon.httpclient.enabled property most likely defines the RestClient as ribbon or not. Is this correct?
overall its a challenging configuration, I will give some more thought to inline comments.
Thanks again
thanks spencer I will do that right now
…On Mon, Oct 23, 2017 at 11:09 AM, Spencer Gibb ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In spring-cloud-netflix-core/src/test/java/org/springframework/
cloud/netflix/ribbon/RibbonLoadBalancedRetryPolicyFactoryTests.java
<#2334 (comment)>
:
> @@ -36,6 +36,8 @@
import org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.RibbonServer;
import java.io.IOException;
+import java.net.SocketException;
If you add this test back soon, I can merge this.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2334 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AVSuzEnGRSA58DoiO3IVhJKvvXFIij-Sks5svNZNgaJpZM4PpTc1>
.
--
*Patrick Way*
Senior Java Developer
Build.com <http://www.build.com>
*P: *800.375.3403 x | *F:* 530.566.1893
patrick.way@build.com
<http://facebook.com/builddotcom> <http://twitter.com/buildcom>
<http://instagram.com/buildcom> <http://youtube.com/build>
<http://linkedin.com/company/build-com>
Your app for every home improvement project.
<https://link.build.com/OVmy/E3jtmhSMnH>
<https://link.build.com/OVmy/E3jtmhSMnH?source=em~eml~35~network-General~10_17_2017_AppBadgeSig>
<https://link.build.com/OVmy/E3jtmhSMnH>
<https://link.build.com/OVmy/E3jtmhSMnH?source=em~eml~35~network-General~10_17_2017_AppBadgeSig>
|
@pway99 looks like some stray commits made it in via a rebase, can you fix them? I was about to merge. |
@spencergibb yes I will look for the stray commits and fix them now. Thanks again |
NP, looking forward to getting this merged. |
any update @pway99? |
@spencergibb still working on it. sorry its taking a bit of time doing my best |
3e6eac2
to
03fea29
Compare
* Allow init params to be set on ProxyStreamServlet. Fixes spring-cloudgh-2301 * Added unit test for #spring-cloudgh-2301
The message has to be sent after the endpoint starts up, which only happens after the first request is accepted, so we put that logic in the response extractor.
Port information was missing from data sent to eureka. fixes spring-cloudgh-2372
Extract separate class TurbineClustersProvider to be able to provide custom solution for retrieving configured clusters (spring-cloud#2219)
* Add new Zuul filters endpoint and refactor routes endpoint * Refactor FiltersEndpoint - Extend from AbstractEndpoint and simplify code - Move tests classes into package endpoint - Remove usage of Lombok * Conditionally enable Zuul routes and filters endpoints * Provide FilterRegistry to FiltersEndpoint via constructor
Update documentation to reflect spring-cloud#1048
…nabled - The eureka instance auto registration also changes the secure port when the server port is random (0) Fixes spring-cloudgh-2303
…uit tripping exceptions
12998ac
to
3682572
Compare
having issues with git merge and rebase. Going to create a new clean PR |
Sounds good |
@spencergibb and @ryanjbaxter In an effort to get past the merge issues and stray commits I have closed this pr and opened a new PR (#2390). I am currently looking into some unit test failures on new pr. Please let me know if there any action is required on either of these PRs. Thanks again for your assistance, |
This pr is associated with #1878 where the RibbonLoadBalancedRetryPolicy does not update the LoadBalancer ServerStats when registering CircuitTrippingExceptions.
note the RetryableRibbonLoadBalancingHttpClient has its own RetryPolicy which extends the FeignRetryPolicy where the ServiceInstance does not have a reference the Server. For this reason obtaining a reference to LoadBalancer ServerStats requires looping through the load balancers server list.