
Template sync triggers GitHub abuse detection mechanism #911

Closed
KevinMenden opened this issue Mar 18, 2021 · 5 comments · Fixed by #949
Labels: bug (Something isn't working), high-priority

KevinMenden (Contributor) commented Mar 18, 2021

Syncing is failing for many pipelines with this message from GitHub:

ERROR    GitHub API returned code 403:                           __main__.py:666
{                                                                      
  "message": "You have triggered an abuse detection mechanism
              and have been temporarily blocked from content creation.
              Please retry your request again later.",                                                               
  "documentation_url": "https://docs.github.com/overview/resources-in-the-rest-api#abuse-rate-limits"                       
}  

Attached is the log for the chipseq sync.
logs_5699.zip

KevinMenden added the bug (Something isn't working) label Mar 18, 2021
KevinMenden (Contributor, Author) commented:
GitHub API rate limits are 1,000 calls per hour when using a GitHub token:

https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
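As a rough illustration of how the remaining budget could be checked from the workflow, GitHub exposes a /rate_limit endpoint that reports the current quota without counting against it. A minimal sketch, assuming requests is available and GITHUB_TOKEN is set in the environment as it is in the Actions workflow:

```python
import os

import requests

# Query the /rate_limit endpoint; this call does not itself count against the limit.
# Assumes GITHUB_TOKEN is exported in the environment (as in the Actions workflow).
r = requests.get(
    "https://api.github.com/rate_limit",
    headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
)
core = r.json()["resources"]["core"]
print(f"{core['remaining']} of {core['limit']} calls remaining, resets at {core['reset']}")
```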

ewels (Member) commented Mar 18, 2021

The docs URL in the error message is incorrect, it should be https://docs.github.com/en/rest/overview/resources-in-the-rest-api#abuse-rate-limits

For example, using the API to rapidly create content, poll aggressively instead of using webhooks, make multiple concurrent requests, or repeatedly request data that is computationally expensive may result in abuse rate limiting.

I wonder if this "abuse rate limiting" is different to the general API rate limit?

After manually going through the failed pipelines, it seemed that all the branches had been created; it was just that the PRs had not been opened.

There are a few things we can do here I think:

  • Look into using OAuth instead of a token to up the rate limit from 1000 to 5000
  • Look into running the matrix jobs sequentially instead of in parallel (I don't really want to do this)
  • See if we can make the nf-core sync command tolerant of things like the branch already existing with no additional changes, so that we can re-run the Actions workflow on failure and it would just submit the PRs again where they are missing (see the sketch after this list).
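One way the third option could look is to check whether an open sync PR already exists before trying to create a new one, so a re-run only fills in the missing PRs. A minimal sketch, assuming the PRs come from a TEMPLATE head branch and using a hypothetical helper (names, repo, and credentials here are illustrative, not the actual nf-core/tools code):

```python
import requests


def sync_pr_exists(owner, repo, head_branch, auth):
    """Return True if an open PR from `head_branch` already exists.
    Hypothetical helper, not part of nf-core/tools."""
    r = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        params={"head": f"{owner}:{head_branch}", "state": "open"},
        auth=auth,
    )
    r.raise_for_status()
    return len(r.json()) > 0


# Re-running the workflow would then skip pipelines that already have a sync PR.
if not sync_pr_exists("nf-core", "chipseq", "TEMPLATE", ("nf-core-bot", "<token>")):
    pass  # open the pull request as before
```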

ewels changed the title from "sync failures with new release" to "Template sync triggers GitHub abuse detection mechanism" Mar 18, 2021
ewels (Member) commented Mar 18, 2021

Ah hang on, I managed to find the relatively well-hidden docs about this here: https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits

Abuse rate limits are another way we ensure the API's availability. To avoid hitting this limit, you should ensure your application follows the guidelines below.

  • Make authenticated requests, or use your application's client ID and secret. Unauthenticated requests are subject to more aggressive abuse rate limiting.
  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
  • If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.
  • When you have been limited, use the Retry-After response header to slow down. The value of the Retry-After header will always be an integer, representing the number of seconds you should wait before making requests again. For example, Retry-After: 30 means you should wait 30 seconds before sending more requests.
  • Requests that create content which triggers notifications, such as issues, comments and pull requests, may be further limited and will not include a Retry-After header in the response. Please create this content at a reasonable pace to avoid further limiting.

We reserve the right to change these guidelines as needed to ensure availability.

This fits with the debugging messages we get in the Actions workflow log. @KevinMenden - the logs above are just the GitHub Actions logs - we want the uploaded artefact, which has the verbose log. You get these under the Artefacts heading on this page: https://github.com/nf-core/tools/actions/runs/664439127

sync_log_chipseq.zip

However it seems that sadly the verbose log here doesn't give us anything more 😞 It would be nice to add a log.debug message that returns all of the API call headers here; that's what I was hoping for.

raise PullRequestException(f"GitHub API returned code {r.status_code}: \n{returned_data_prettyprint}")

Anyway, I digress. I think that we should be able to catch the 403 error code in combination with the Retry-After HTTP header. Then we can sleep and try again a bunch of times. Hopefully this will solve the issue.

We're using requests, which handles response headers nicely: https://requests.readthedocs.io/en/master/user/quickstart/#response-headers
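A minimal sketch of what that retry handling could look like, assuming requests; the function name, retry count, and fallback wait are made up here, not the actual implementation:

```python
import time

import requests

MAX_RETRIES = 5


def post_with_retry(url, auth, payload):
    """POST to the GitHub API, sleeping and retrying when a 403 is returned.
    Sketch of the proposed handling only."""
    for _ in range(MAX_RETRIES):
        r = requests.post(url, auth=auth, json=payload)
        if r.status_code != 403:
            return r
        # Honour Retry-After if GitHub sent it; content-creation endpoints may
        # omit it, so fall back to a conservative wait before trying again.
        time.sleep(int(r.headers.get("Retry-After", 60)))
    return r
```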

KevinMenden (Contributor, Author) commented:
> the logs above are just the GitHub Actions logs - we want the uploaded artefact which has the verbose log.

Ah, sorry. Thought it gets all dumped in there :/

Okay, there's enough to try out with the syncing there (yaaaay ....)
I had a quick look yesterday for tools that allow counting API calls, to get a grasp of how much we are actually using the API, but didn't find anything useful straight away.

Maybe it's even enough to just wait 30 seconds before trying to open the PR.

ewels (Member) commented Mar 19, 2021

If we start sticking the API headers in the verbose log we can check: every GitHub API call gets headers back saying how many calls are remaining before we hit the limit (from which you can figure out how many we're using).

If all tool syncs wait 30 seconds then we will still have the same problem 😉 I was thinking of picking a random number between 0 and 120 and waiting that long. But I think that handling the error and waiting the prescribed amount of time is much better.
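For the header logging idea, a minimal sketch using the standard logging module and the rate-limit headers GitHub returns on every response (the helper name is hypothetical, not existing sync code):

```python
import logging

log = logging.getLogger(__name__)


def log_rate_limit(response):
    """Write the GitHub rate-limit headers of a requests.Response to the verbose log.
    Hypothetical helper for the sync code."""
    log.debug(
        "GitHub API rate limit: %s of %s calls remaining (resets at %s)",
        response.headers.get("X-RateLimit-Remaining"),
        response.headers.get("X-RateLimit-Limit"),
        response.headers.get("X-RateLimit-Reset"),
    )
```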

ewels added this to the 1.14 milestone Mar 21, 2021
ewels self-assigned this Mar 21, 2021
ewels moved this from To do to In progress in hackathon-march-2021-framework Mar 22, 2021
ewels modified the milestones: 1.14 → 1.13.2 Mar 22, 2021
ewels removed their assignment Mar 23, 2021
ewels mentioned this issue Mar 23, 2021
ewels linked a pull request Mar 23, 2021 that will close this issue
ewels closed this as completed Mar 23, 2021
hackathon-march-2021-framework automation moved this from In progress to Done Mar 23, 2021