One of the frustrations we've long had at Google is the amount of functionality that GitHub does not expose through either the REST or GraphQL API. Most recently, this has come up around listing the OAuth and GitHub Apps that have been approved for a GitHub organization. There are no official APIs for this (and a lot of other) functionality, so you either do it manually through the GitHub UI or resort to screen scraping.
We've built a few single-purpose screen-scraping tools like this using GoogleChrome/puppeteer, but that's a bit more complicated (and heavyweight) to run in our production environment than a pure Go approach would be.
A long time ago, I had talked about having an experimental package in go-github that provided a clean API for some of this scraping functionality, but I never actually built it until now.
I now have a proof of concept that successfully authenticates to github.com, including performing two-factor auth, and can then scrape some of the additional data we need. The code is reasonably clean, but probably not particularly stable, given that it relies on specific class names and IDs of elements on the GitHub site.
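To make the fragility concrete, here's a minimal sketch of what selector-dependent scraping looks like. Everything here is an assumption for illustration: the HTML fragment and the `app-name` class are hypothetical stand-ins for GitHub's real markup, which differs and can change without notice — that's exactly why such a package would be forever unstable. A real implementation would also use a proper HTML parser (e.g. golang.org/x/net/html) rather than a regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleHTML mimics a *hypothetical* shape of an organization's
// OAuth-application settings page. The real markup is different.
const sampleHTML = `
<ul>
  <li><span class="app-name">Example CI</span></li>
  <li><span class="app-name">Example Deploy Bot</span></li>
</ul>`

// appNameRe matches the text inside elements carrying the assumed
// "app-name" class. If GitHub renames the class, this silently
// returns nothing — the core maintenance hazard of scraping.
var appNameRe = regexp.MustCompile(`<span class="app-name">([^<]+)</span>`)

// listApprovedApps extracts application names from a settings page body.
func listApprovedApps(body string) []string {
	var names []string
	for _, m := range appNameRe.FindAllStringSubmatch(body, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(listApprovedApps(sampleHTML))
	// Prints: [Example CI Example Deploy Bot]
}
```

The point of the sketch is the failure mode, not the parsing technique: the scraper works only as long as the page keeps the exact class names it was written against.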
So my question is, where should this scraping package live?
How do folks feel about having an experimental "scrape" package in go-github? It would be forever unstable and would probably need to be exempt from any versioning policy we have for the library as a whole. My main argument for keeping it in go-github (versus a standalone githubscrape repo, or whatever) is visibility, but also that it would be designed very specifically to fill the gaps in the REST API, so it actually tracks the main go-github library pretty closely.