One of the frustrations we've long had at Google is the amount of functionality that GitHub does not expose through either the REST or GraphQL API. Most recently, this has come up around listing the OAuth and GitHub Apps that have been approved for a GitHub organization. There are no official APIs for this (and a lot of other) functionality, so you either do it manually through the GitHub UI or resort to screen scraping.
We've built a few single-purpose screen-scraping tools like this using GoogleChrome/puppeteer, but that's a bit more complicated (and heavyweight) to run in our production environment than a pure Go approach would be.
A long time ago, I had talked about having an experimental package in go-github that provided a clean API for some of this scraping functionality, but I never actually built it until now.
I now have a proof of concept that successfully authenticates to github.com, including performing two-factor auth, and can then scrape some of the additional data we need. The code is reasonably clean, but probably not particularly stable, given that it relies on specific class names and IDs of elements on the GitHub site.
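To make the fragility concrete, here's a minimal sketch of what selector-dependent scraping looks like. Everything here is an assumption for illustration: the HTML fragment and the `app-name` class are hypothetical stand-ins for GitHub's real markup, which differs and can change without notice — that's exactly why such a package would be forever unstable. A real implementation would also use a proper HTML parser (e.g. golang.org/x/net/html) rather than a regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleHTML mimics a *hypothetical* shape of an organization's
// OAuth-application settings page. The real markup is different.
const sampleHTML = `
<ul>
  <li><span class="app-name">Example CI</span></li>
  <li><span class="app-name">Example Deploy Bot</span></li>
</ul>`

// appNameRe matches the text inside elements carrying the assumed
// "app-name" class. If GitHub renames the class, this silently
// returns nothing — the core maintenance hazard of scraping.
var appNameRe = regexp.MustCompile(`<span class="app-name">([^<]+)</span>`)

// listApprovedApps extracts application names from a settings page body.
func listApprovedApps(body string) []string {
	var names []string
	for _, m := range appNameRe.FindAllStringSubmatch(body, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(listApprovedApps(sampleHTML))
	// Prints: [Example CI Example Deploy Bot]
}
```

The point of the sketch is the failure mode, not the parsing technique: the scraper works only as long as the page keeps the exact class names it was written against.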
So my question is, where should this scraping package live?
How do folks feel about having an experimental "scrape" package in go-github? It would be forever unstable and would probably need to be exempt from any versioning policy we have for the library as a whole. My main argument for keeping it in go-github (versus a standalone githubscrape repo, or whatever) is visibility, but also that it would be designed very specifically to fill the gaps in the REST API, so it actually tracks the main go-github library pretty closely.