Retry calling the API when a retry could succeed #12

Closed
ebdrup opened this issue Mar 2, 2016 · 10 comments

ebdrup commented Mar 2, 2016

We're seeing a few internal_server_error errors in production.

Could you make figo retry (maybe 3 times) on retryable errors? That means any 5XX status code or any of these errors:

const RETRIABLE_ERRORS = [
    'ECONNRESET',
    'ENOTFOUND',
    'ESOCKETTIMEDOUT',
    'ETIMEDOUT',
    'ECONNREFUSED',
    'EHOSTUNREACH',
    'EPIPE',
    'EAI_AGAIN'
];
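
A minimal sketch of how the rule requested above (retry on any 5XX status code or on one of the listed error codes) could be expressed as a predicate. The function name `isRetriable` and its parameters are hypothetical and are not part of the figo SDK; `err.code` is the standard place where Node.js puts system error codes.

```javascript
// Error codes copied from the list above.
const RETRIABLE_ERRORS = [
    'ECONNRESET',
    'ENOTFOUND',
    'ESOCKETTIMEDOUT',
    'ETIMEDOUT',
    'ECONNREFUSED',
    'EHOSTUNREACH',
    'EPIPE',
    'EAI_AGAIN'
];

// Hypothetical helper (not a figo API): returns true when a failed call is
// worth retrying. `err` is a Node.js-style error or null; `statusCode` is the
// HTTP status code of the response, if one was received.
function isRetriable(err, statusCode) {
    if (err && RETRIABLE_ERRORS.includes(err.code)) {
        return true;
    }
    // Any 5XX response counts as retryable.
    return Number.isInteger(statusCode) && statusCode >= 500 && statusCode <= 599;
}
```

Under these assumptions, a 502 such as the one reported later in this thread would be classified as retryable, while a 4XX response would not.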
Contributor

mfilenko commented Mar 3, 2016

Hey @ebdrup,

Could you please provide more information on how you encountered those errors so we can investigate?

Thanks!

Author

ebdrup commented Mar 3, 2016

We see these kinds of network errors periodically on (probably) all web requests when the volume is high.

If it were a genuine 500 returned by figo, you should see it in your own logs, which you hopefully monitor. I don't think it's a 500, but a network failure.

That's why we built the module request-retry-stream, which we use almost everywhere we call web services. The retries made all these errors go away. And that's nice, since we are trying to implement a zero-tolerance policy for failing web requests.

Unfortunately we can't use request-retry-stream on your web requests as they are embedded in your own raw implementation inside your module.

Allan Ebdrup, CTO @ Debitoor


Author

ebdrup commented Mar 8, 2016

@mfilenko Any news on this issue? Today we got some 502s from you. These would probably also succeed if retried. Here is the response we got from you:

<html> <head><title>502 Bad Gateway</title></head> <body bgcolor="white"> <center><h1>502 Bad Gateway</h1></center> <hr><center>nginx</center> </body> </html>

Contributor

mfilenko commented Mar 9, 2016

Hey @ebdrup,

Is it possible to use your request-retry-stream library for that, or are methods other than GET still truly experimental ;-)?

@mfilenko
Contributor

Hey @ebdrup,

Thank you again for pointing out this issue.

We are working hard to provide our partners and users with the best experience (you can check our Pingdom uptime history). We also have zero tolerance for failing requests.

We will do our best to eliminate the root cause of this issue at the infrastructure level, so there will be no need for a workaround such as automatically retrying failed requests in the SDK.

Author

ebdrup commented Mar 15, 2016

When you are providing an SDK that makes network connections to your API, a situation where retries are never needed simply does not exist. You HAVE to build in retries to make things work reliably. There are no two ways about it, you simply have to. It is NOT a workaround. It's the only real option for distributed computing.

Give this Wikipedia article a good read:
https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

Please reconsider. We need retries in your SDK, as any seasoned developer with experience with distributed computing will be able to tell you.
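
To make the request concrete, here is a rough sketch of what a built-in retry could look like: an async wrapper that re-runs an operation up to 3 times with exponential backoff. This is only an illustration, not how the figo SDK or request-retry-stream is implemented; `withRetry`, `maxAttempts`, `baseDelayMs`, and `shouldRetry` are all hypothetical names. It also assumes the wrapped request is safe to repeat (idempotent).

```javascript
// Hypothetical wrapper (not a figo or request-retry-stream API): runs
// `operation` and retries it when it rejects, up to `maxAttempts` attempts.
async function withRetry(operation, options = {}) {
    const maxAttempts = options.maxAttempts || 3;
    const baseDelayMs = options.baseDelayMs === undefined ? 100 : options.baseDelayMs;
    // By default every error is retried; pass `shouldRetry` to be stricter.
    const shouldRetry = options.shouldRetry || (() => true);

    for (let attempt = 1; ; attempt++) {
        try {
            return await operation(attempt);
        } catch (err) {
            // Give up on the last attempt or on a non-retryable error.
            if (attempt >= maxAttempts || !shouldRetry(err)) {
                throw err;
            }
            // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
            const delayMs = baseDelayMs * 2 ** (attempt - 1);
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
}
```

A caller would pass a function that performs one request and rejects on failure, plus a `shouldRetry` predicate that accepts only network-level error codes and 5XX responses, so that client errors such as a 4XX are not retried.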

Author

ebdrup commented Mar 15, 2016

@mfilenko Sorry, I didn't see your question about request-retry-stream. No, those are not experimental. We are actually using them in production. I'll update the readme. :-)

Author

ebdrup commented Mar 15, 2016

@mfilenko I updated the readme, including information about the errors returned. We often use err.statusCode for error handling. As I mentioned, we are using it in production on Debitoor. It's handling hundreds of thousands of requests every day. Since we added it, all our randomly failing network requests have disappeared.

We would have added it to our requests to figo, but that was impossible since the requests are embedded in your SDK.

We are making quite a lot of HTTP requests with it, because our application is built from a lot of microservices.

@mfilenko
Contributor

@ebdrup, great, thanks! We will include this in the next release.

@JeremyCraigMartinez
Contributor

Issue resolved with PR #25
