Zone with lots of records errors out #368

shahman · 2015-07-01T01:26:09Z

I am using the latest denominator version. A zone with lots of records errors out. Is there some other way for me to fetch records uniformly? Thanks.

For example, the zone "testOcp.io", with only 6 records, works fine:

Iterator<ResourceRecordSet<?>> rsetIterator = dnsApiManager.api().recordSetsInZone("testOcp.io").iterator();

But replacing it with a zone that has hundreds of records gives the error below:

feign.RetryableException: [OPERATION_FAILED: token: This session already has a job running]
    at denominator.dynect.DynECTErrorDecoder.decode(DynECTErrorDecoder.java:78)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:121)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:71)
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:94)
    at com.sun.proxy.$Proxy147.rrsets(Unknown Source)
    at denominator.dynect.DynECTResourceRecordSetApi.iterator(DynECTResourceRecordSetApi.java:45)
    at denominator.ResourceRecordSetApi$iterator.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at

shahman · 2015-07-01T05:15:36Z

This is probably because, when the request takes longer than 5 seconds, Dyn responds with a redirect and a job ID, and that job ID is not then followed up on to retrieve the underlying records. I'm not sure how this worked in denominator 1.x.

codefromthecrypt · 2015-07-01T19:13:40Z

This is unfortunately common with this particular provider, and not something easy to troubleshoot. Usually the issue clears eventually. If this consistently errors, I'd contact Dyn, as the API should have no problem querying hundreds of records.

shahman · 2015-07-01T19:18:05Z

I believe the problem is that the wrapper around the DynECT APIs is not handling this properly. For large records, Dyn issues a redirect giving the URL with the job ID in it. That redirected URL actually holds all the records that are needed.

Thanks for your response Adrian.

shahman · 2015-07-01T19:21:50Z

curl -i -H 'Auth-Token: xyz' -H 'Content-Type: application/json' "https://api2.dynect.net/REST/AllRecord/isp.nflxvideo.net?detail=Y"
HTTP/1.1 200 OK
Server: nginx/1.2.6
Date: Wed, 01 Jul 2015 19:19:54 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive

{"status": "incomplete", "data": null, "job_id": 1730280267}

Then we take the job ID and query it as below:
curl -i -H 'Auth-Token: xyz' -H 'Content-Type: application/json' "https://api2.dynect.net/REST/Job/1730280267"
HTTP/1.1 200 OK
Server: nginx/1.2.6
Date: Wed, 01 Jul 2015 19:21:00 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
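
The two-step flow above (an "incomplete" response carrying a job_id, followed by a request to /REST/Job/<job_id>) could be sketched roughly as below. This is only an illustration of the pattern shown in the curl output, not denominator's actual handling: the class name DynJobFollowUp and both method names are hypothetical, and the regex-based parsing is a simplification that assumes a flat response body like the one pasted above.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of denominator or the DynECT API.
public class DynJobFollowUp {
    // Matches the "status" and "job_id" fields in a flat response body.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern JOB_ID =
            Pattern.compile("\"job_id\"\\s*:\\s*(\\d+)");

    // Returns the job id only when the body reports status "incomplete".
    static OptionalLong pendingJobId(String body) {
        Matcher status = STATUS.matcher(body);
        if (!status.find() || !"incomplete".equals(status.group(1))) {
            return OptionalLong.empty();
        }
        Matcher job = JOB_ID.matcher(body);
        return job.find()
                ? OptionalLong.of(Long.parseLong(job.group(1)))
                : OptionalLong.empty();
    }

    // Builds the follow-up URL in the same shape as the second curl call.
    static String jobUrl(String baseUrl, long jobId) {
        return baseUrl + "/REST/Job/" + jobId;
    }
}
```

A caller would fetch the AllRecord URL, pass the body to pendingJobId, and, if a job id is present, request jobUrl("https://api2.dynect.net", id) with the same Auth-Token header.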

codefromthecrypt · 2015-07-01T19:52:17Z

Denominator has numerous workarounds to Dyn problems including both patterns you pasted. These are also tested. You can also use the CLI with verbose on to watch denominator retry.
https://github.com/Netflix/denominator/blob/master/dynect/src/test/java/denominator/dynect/DynECTTest.java#L85
https://github.com/Netflix/denominator/blob/master/dynect/src/test/java/denominator/dynect/DynECTTest.java#L74

What happened was that the server kept returning errors beyond the retry count (5 tries, ~5 seconds).
https://github.com/Netflix/feign/blob/master/core/src/main/java/feign/Retryer.java#L42

I found the code that was used in denominator 1.0 had an insanely high redirect count of 100. I actually remember doing this, as even 25 redirects sometimes failed. The service improved for a while, and maybe it is back to terrible.
https://github.com/jclouds/jclouds/blob/master/providers/dynect/src/main/java/org/jclouds/dynect/v3/DynECTProviderMetadata.java#L56

My strong opinion is that you escalate to DynECT, as really high retries feign a solution (no pun intended). Note you can reproduce this using denominator's CLI, as it will output the http calls used.

If you really want to make this have a high retry count, then feel free to raise a pull request. Note that it will break tests that show that we retry 5 times. Basically, you will add a .retryer here

https://github.com/Netflix/denominator/blob/master/dynect/src/main/java/denominator/dynect/DynECTProvider.java#L175
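
To illustrate what a higher retry count changes, here is a rough, self-contained sketch of a bounded retry loop. It is plain Java and does not use Feign's Retryer or denominator's code; the names BoundedRetry, RetryableException (a local stand-in, not feign.RetryableException), and call are hypothetical, and the fixed sleep between attempts is an assumption rather than the backoff the library uses.

```java
import java.util.function.Supplier;

// Hypothetical illustration of "retry up to N times, then give up".
public class BoundedRetry {
    // Local stand-in for a retryable failure such as
    // "This session already has a job running".
    static class RetryableException extends RuntimeException {
        RetryableException(String message) {
            super(message);
        }
    }

    // Runs the action until it succeeds or maxAttempts is exhausted,
    // sleeping sleepMillis between failed attempts.
    static <T> T call(Supplier<T> action, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RetryableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RetryableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                }
            }
        }
        // Every attempt failed with a retryable error.
        throw last;
    }
}
```

With this shape, an operation that fails more than 5 times in a row is reported as an error when maxAttempts is 5, but can succeed when maxAttempts is 100, at the cost of hiding how unhealthy the server is.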

Hope this helps

codefromthecrypt · 2015-07-01T20:28:52Z

Actually, tell you what. If you are OK with the 100 retry option, I can put that in and push a release out tomorrow. It would be pretty easy for me to change. I'd only ask that if it fails after 100 retries, you ping Dyn, as I don't use the service anymore so don't have a support context. Deal?

shahman · 2015-07-01T20:43:02Z

I am not sure that would fix it. It always worked for me when I issued a request and then used the job id to access the records.

We are going to try and delete a bunch of our old records and that should hopefully bring it down under 5 seconds so that we don't have to follow the job id route.

If that doesn't work, we can try the 100 retry option. Thank you for volunteering!

codefromthecrypt · 2015-07-01T20:51:16Z

np

codefromthecrypt closed this as completed Jul 1, 2015
