Zone with lots of records errors out #368

Closed
shahman opened this issue Jul 1, 2015 · 8 comments

@shahman

shahman commented Jul 1, 2015

I am using the latest denominator version. A zone with lots of records errors out. Is there some other way for me to uniformly fetch records? Thanks.

For example, the zone "testOcp.io", with only 6 records, succeeds:

Iterator<ResourceRecordSet<?>> rsetIterator = dnsApiManager.api().recordSetsInZone("testOcp.io").iterator();

but replacing it with a zone that has hundreds of records gives the error below:

feign.RetryableException: [OPERATION_FAILED: token: This session already has a job running]
    at denominator.dynect.DynECTErrorDecoder.decode(DynECTErrorDecoder.java:78)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:121)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:71)
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:94)
    at com.sun.proxy.$Proxy147.rrsets(Unknown Source)
    at denominator.dynect.DynECTResourceRecordSetApi.iterator(DynECTResourceRecordSetApi.java:45)
    at denominator.ResourceRecordSetApi$iterator.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at

@shahman
Author

shahman commented Jul 1, 2015

This is probably related to the fact that when a request takes longer than 5 seconds, you end up with a redirect and a job id, which is then not followed up on to retrieve the underlying records. I'm not sure how this worked in denominator version 1.x.

@codefromthecrypt
Contributor

This is unfortunately common with this particular provider, and not something easy to troubleshoot. Usually the issue clears eventually. If this is consistently in error, I'd contact Dyn, as the API should have no problem querying hundreds of records.

@shahman
Author

shahman commented Jul 1, 2015

I believe the problem is that the wrapper around the DynECT APIs is not handling this properly. For zones with many records, Dyn issues a redirect giving the URL with the job id in it. That redirected URL actually holds all the records that are needed.

Thanks for your response, Adrian.

@shahman
Author

shahman commented Jul 1, 2015

curl -i -H 'Auth-Token: xyz' -H 'Content-Type: application/json' "https://api2.dynect.net/REST/AllRecord/isp.nflxvideo.net?detail=Y"
HTTP/1.1 200 OK
Server: nginx/1.2.6
Date: Wed, 01 Jul 2015 19:19:54 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive

{"status": "incomplete", "data": null, "job_id": 1730280267}

Then we take the job id and query the job, as below:
curl -i -H 'Auth-Token: xyz' -H 'Content-Type: application/json' "https://api2.dynect.net/REST/Job/1730280267"
HTTP/1.1 200 OK
Server: nginx/1.2.6
Date: Wed, 01 Jul 2015 19:21:00 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
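
The two curl calls above can be sketched in Java as a small decision step: inspect the first response, and if its status is "incomplete", build the Job URL to fetch next. This is only an illustrative sketch, not denominator's actual code. The class and method names (DynJobSketch, pendingJobId, jobUrl) are hypothetical, the base URL is copied from the curl commands, and the regex parsing is a shortcut to keep the example dependency-free; real code would use a JSON parser and an HTTP client.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of denominator) for the "incomplete" job pattern. */
final class DynJobSketch {
    // Assumption: base URL taken from the curl commands in this thread.
    static final String BASE = "https://api2.dynect.net/REST";

    // Regex shortcuts for the two fields we care about; a JSON parser is the robust choice.
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern JOB_ID = Pattern.compile("\"job_id\"\\s*:\\s*(\\d+)");

    /** Returns the job id when the body reports status "incomplete", otherwise empty. */
    static OptionalLong pendingJobId(String body) {
        Matcher status = STATUS.matcher(body);
        if (!status.find() || !"incomplete".equals(status.group(1))) {
            return OptionalLong.empty();
        }
        Matcher jobId = JOB_ID.matcher(body);
        return jobId.find()
                ? OptionalLong.of(Long.parseLong(jobId.group(1)))
                : OptionalLong.empty();
    }

    /** Builds the follow-up URL used in the second curl call. */
    static String jobUrl(long jobId) {
        return BASE + "/Job/" + jobId;
    }
}
```

A caller would issue the AllRecord request, pass the body to pendingJobId, and, if a job id comes back, request jobUrl(id) with the same Auth-Token header.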

@codefromthecrypt
Contributor

Denominator has numerous workarounds for Dyn problems, including both patterns you pasted. These are also tested. You can also use the CLI with verbose on to watch denominator retry.
https://github.com/Netflix/denominator/blob/master/dynect/src/test/java/denominator/dynect/DynECTTest.java#L85
https://github.com/Netflix/denominator/blob/master/dynect/src/test/java/denominator/dynect/DynECTTest.java#L74

What happened was that the server kept returning errors beyond the retry count (5 tries, ~5 seconds).
https://github.com/Netflix/feign/blob/master/core/src/main/java/feign/Retryer.java#L42

I found that the code used in denominator 1.0 had an insanely high redirect count of 100. I actually remember doing this, as even 25 redirects sometimes failed. The service improved for a while, and maybe it is back to terrible.
https://github.com/jclouds/jclouds/blob/master/providers/dynect/src/main/java/org/jclouds/dynect/v3/DynECTProviderMetadata.java#L56

My strong opinion is that you should escalate to DynECT, as really high retries feign a solution (no pun intended). Note you can reproduce this using denominator's CLI, as it will output the HTTP calls used.

If you really want to make this have a high retry count, then feel free to raise a pull request. Note that it will break tests that show that we retry 5 times. Basically, you will add a .retryer here:

https://github.com/Netflix/denominator/blob/master/dynect/src/main/java/denominator/dynect/DynECTProvider.java#L175
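
To make the trade-off concrete, here is a minimal, self-contained sketch of a bounded retry loop. It is not Feign's Retryer and not denominator's code; the class name BoundedRetrySketch, the method callWithRetry, and its parameters are all hypothetical. It only illustrates how raising the attempt limit (for example from 5 to 100) changes when the caller finally sees the error.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of a bounded retry; not Feign's Retryer. */
final class BoundedRetrySketch {

    /**
     * Calls task until it returns or maxAttempts calls have failed,
     * sleeping delayMillis between attempts. Rethrows the last failure.
     */
    static <T> T callWithRetry(Supplier<T> task, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and maybe try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw last;
                }
            }
        }
        throw last;
    }
}
```

With maxAttempts set to 5, a server that answers "This session already has a job running" on every call surfaces the error after the fifth try; a larger limit only postpones that point, which is why escalating to the provider is suggested above.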

Hope this helps

@codefromthecrypt
Contributor

Actually, tell you what. If you are OK with the 100 retry option, I can put that in and push a release out tomorrow. It would be pretty easy for me to change. I'd only ask that if it fails after 100 retries, you ping Dyn, as I don't use the service anymore and so don't have a support context. Deal?

@shahman
Author

shahman commented Jul 1, 2015

I am not sure that would fix it. It always worked for me when I issued a request and then used the job id to access the records.

We are going to try and delete a bunch of our old records and that should hopefully bring it down under 5 seconds so that we don't have to follow the job id route.

If that doesn't work, we can try the 100 retry option. Thank you for volunteering!

@codefromthecrypt
Contributor

np
