Add API retries for octokit requests#1420
Conversation
|
Thanks for taking a look at this! The default behaviour is that |
aeisenberg
left a comment
There was a problem hiding this comment.
I think this is generally fine, but there's a possibility that the request will be sent and processed correctly, but the response will fail to be received. Since the run remote queries requests are not idempotent, this could cause multiple runs to be started.
I don't think this is a likely scenario, but it's something to consider.
That default list of statuses to not retry looks sensible to me. Those should all be user errors where retrying won't make a difference, as opposed to 5xx errors indicating something went wrong out of the user's control.
In terms of a 404, that would only mean the controller repository doesn't exist yet and I don't think we want to retry and wait for that. For repos not having databases those are filtered out and the request either succeeds with those repos filtered out or fails with a 422 if no repos have databases. Again here I don't think we want to attempt to wait for them. The time frame of how long the user probably wants to wait vs how long the thing will take to be available isn't really compatible. Does that make sense and is that what you meant? |
Hmm, that's an interesting point there are more ways that a HTTP request can fail. I think also if it fails at DNS or a similar early stage it might not have got far enough to produce a status code and thus will likely fail with some other kind of exception that we currently don't handle. I think regarding other ways a request can fail, maybe we could handle more of them, but let's try adding this simple retry mechanism first. It's still a win, and then we can see how frequently people hit other kinds of errors that aren't handled. Is that sensible? Regarding the request succeeding on the server side but being seem as a failure by the client and getting retried, that would be a waste of resources, but then also the client doesn't have anything after the first "failure" so there's not much else it can do except retry. If we didn't do it automatically the user would probably do it manually anyway and just be more annoyed about it. There is a case I've seen that could be a little dangerous where the request "times out" and returns a 5xx error, but the processing on the server side actually continues a little bit longer and completes successfully. Some middleware returns the timeout HTTP response before killing the code handling the request. If this happens we might retry the request and it'll fake-timeout again and again, leading to wasted compute resources. We could discuss not retrying timeouts, though I'm not sure how we would configure that, or just accept that it'll only retry 3 times and this is an ok cost. |
aeisenberg
left a comment
There was a problem hiding this comment.
This sounds reasonable.
The aim is to add automatic retrying for HTTP requests made with octokit. We've seen a few cases of spurious failure causing problems and this will go a long way to avoiding them. We may also want to add the ability to fully retry some operations like downloading MRVA results, but that's a separate change to this.
Is this all that's needed to use the
@octokit/plugin-retrymodule and get automatic retrying? Reading https://github.com/octokit/plugin-retry.js this appears to be it. Just not sure how to tell if it's working or not. 🤔Maybe someone could work with me to write a test.
Checklist
ready-for-doc-reviewlabel there.