API rate limits #198209
-
Discussion Type: Question · Feature/Topic Area: API

Subject: Permitted use and rate limits for automated public-data access (recruiting tool)

Hi, I'm building a recruiting tool and want to make sure I'm using GitHub the right way before I scale anything up.

What it does: it reads public GitHub data (public profiles and public repository code) to assess engineers' code quality for technical recruiting. It uses public data only: no private repos, no authenticated-user data, and no scraping of anything behind a login.

How it accesses data: the authenticated API and/or shallow clones of public repos, run slowly and sequentially with deliberate pacing and backoff. I am specifically trying NOT to hammer your infrastructure; low, steady volume is fine for my use case.

Rough scale: on the order of a few hundred to a few thousand public profiles/repos, processed gradually over time rather than in bursts.

My questions:

I'd rather do this properly and with your blessing than guess. Happy to share more detail about the product if useful. Thanks,
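A minimal Python sketch of the "slow, sequential, with pacing and backoff" access pattern described in the question, using only the standard library. It assumes the `retry-after`, `x-ratelimit-remaining` and `x-ratelimit-reset` response headers that GitHub documents for rate-limited responses (returned with HTTP 403 or 429). The function names, the one-second pause, the 60-second base delay and the one-hour cap are illustrative choices, not GitHub requirements.

```python
import time
import urllib.error
import urllib.request
from typing import Dict, Iterable, Optional


def backoff_delay(headers, attempt: int, now: Optional[float] = None,
                  base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait before retrying a rate-limited request.

    `headers` is anything with an .items() method (a dict, or the
    HTTPMessage on urllib.error.HTTPError). Precedence is an assumption
    based on GitHub's published rate-limit guidance:
      1. `retry-after` present      -> wait exactly that many seconds
      2. `x-ratelimit-remaining: 0` -> wait until `x-ratelimit-reset` (epoch seconds)
      3. otherwise                  -> exponential backoff base * 2**attempt, capped
    """
    h = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    if "retry-after" in h:
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    return min(cap, base * 2 ** attempt)


def fetch_paced(urls: Iterable[str], token: str, pause: float = 1.0,
                max_attempts: int = 5) -> Dict[str, bytes]:
    """Fetch each URL one at a time, pausing between requests and backing off on 403/429."""
    results: Dict[str, bytes] = {}
    for url in urls:
        for attempt in range(max_attempts):
            req = urllib.request.Request(url, headers={
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {token}",
            })
            try:
                with urllib.request.urlopen(req) as resp:
                    results[url] = resp.read()
                break  # success: move on to the next URL
            except urllib.error.HTTPError as err:
                if err.code not in (403, 429):
                    raise  # not a rate-limit style response; surface it
                time.sleep(backoff_delay(err.headers, attempt))
        else:
            # Loop finished without a successful `break`.
            raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
        time.sleep(pause)  # deliberate pacing between successful requests
    return results
```

Usage would be something like `fetch_paced(["https://api.github.com/users/octocat"], token=...)`. Keeping the loop strictly sequential means there is never more than one request in flight, which is the simplest way to stay clear of GitHub's secondary (concurrency) limits.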
Replies: 3 comments 1 reply
-
At the scale Angel is describing (a few thousand profiles processed slowly), the standard authenticated REST API rate limit of 5,000 requests per hour is almost certainly enough and won't require any special arrangement. Building it as a GitHub App is the right call, since an app installation's rate limit can grow beyond a personal access token's and the design holds up better if you ever need to act on behalf of users later. For read-only public data, though, a fine-grained PAT with read access to public repositories works just as well and is simpler to set up.

On the ToS question, reading public data for recruiting analysis is a gray area. The Acceptable Use Policy prohibits scraping to build profiles for selling to third parties, but using the data internally to evaluate candidates is generally tolerated at low volume. The bigger thing to watch is that GitHub's ToS restricts using profile data in ways that could be considered aggregating personal information at scale, so keeping the scope narrow (code-quality assessment rather than broad profile harvesting) is the right instinct. If this grows into a commercial product, it's worth contacting GitHub's partnership or enterprise team directly, since they have a process for exactly this kind of use case.
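To make the "5,000 requests per hour is enough" point concrete, here is a back-of-the-envelope budget in Python. The 10 requests per profile and the 80% headroom are hypothetical placeholders, not measurements; plug in your own numbers. `core_quota` reads the `resources.core` object from the body of `GET /rate_limit`, an endpoint GitHub documents as not counting against the REST quota, so checking it between batches is cheap. All function names here are illustrative.

```python
import json
import math
import urllib.request
from typing import Tuple


def hours_needed(n_items: int, requests_per_item: int,
                 hourly_limit: int = 5000, headroom_pct: int = 80) -> int:
    """Whole hours to process n_items while spending only headroom_pct% of each hour's quota."""
    per_hour = hourly_limit * headroom_pct // 100  # e.g. 5000 * 80 // 100 = 4000
    return math.ceil(n_items * requests_per_item / per_hour)


def core_quota(rate_limit_body: str) -> Tuple[int, int, int]:
    """Return (limit, remaining, reset_epoch) for the REST 'core' bucket
    from the JSON body of GET /rate_limit."""
    core = json.loads(rate_limit_body)["resources"]["core"]
    return core["limit"], core["remaining"], core["reset"]


def fetch_rate_limit(token: str) -> str:
    """Fetch the raw GET /rate_limit body (needs network access and a token)."""
    req = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Accept": "application/vnd.github+json",
                 "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Hypothetical workload: 3,000 profiles at ~10 requests each.
print(hours_needed(3000, 10))  # 30,000 requests / 4,000 per hour -> 8
```

Under those assumed numbers, a few thousand profiles fit in roughly a working day of paced requests, without ever touching the hourly ceiling.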
-
Hi @AngelDIvanov, your approach sounds thoughtful, especially since you're using only public data and planning to respect rate limits with pacing and backoff. In general, the GitHub REST and GraphQL APIs are the supported way to access public information, and authenticated requests get higher rate limits than anonymous ones. A GitHub App can also be a good choice, depending on how your application evolves. For questions about Terms of Service compliance and the availability of higher or paid limits, I'd recommend waiting for guidance from GitHub staff, since they can give the most accurate answer for your specific use case. It's great that you're asking before scaling rather than trying to work around the platform limits.
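A small standard-library sketch of what "authenticated requests" means in practice for both APIs mentioned here. The endpoint paths (`/users/{username}`, `/graphql`), the `Accept` and `X-GitHub-Api-Version` headers and the GraphQL `rateLimit` field are GitHub's documented ones as far as I know; the function names and the query shape are illustrative only. For reference, anonymous REST calls are limited to 60 requests per hour per IP versus 5,000 per hour with a token, and the GraphQL API does not accept anonymous calls at all.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API = "https://api.github.com"


def rest_profile_request(username: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a REST request for a public profile; anonymous if token is None."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:  # authenticated: 5,000 req/h instead of 60 req/h per IP
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API}/users/{urllib.parse.quote(username)}", headers=headers)


# Illustrative query: a user's public repos plus the cost of the query itself.
QUERY = """
query($login: String!) {
  user(login: $login) {
    repositories(first: 10, privacy: PUBLIC) { nodes { nameWithOwner } }
  }
  rateLimit { cost remaining resetAt }
}
"""


def graphql_request(login: str, token: str) -> urllib.request.Request:
    """Build a GraphQL POST request; a token is always required."""
    body = json.dumps({"query": QUERY, "variables": {"login": login}}).encode("utf-8")
    return urllib.request.Request(
        f"{API}/graphql",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Either request can be sent with `urllib.request.urlopen(req)`. Asking for `rateLimit` inside the GraphQL query is a convenient way to see how many points each query costs as you go.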
-
curl -fsSL https://gh.io/copilot-install | bash