Use GraphQL to scrape GitHub for packages rather than maintaining an index #29

Closed
calebkleveter opened this issue Aug 7, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

@calebkleveter

Is there a reason you decided to do a hand-curated package list instead of querying GitHub for all its available SPM packages? You can use the GraphQL API to do it: https://github.com/vapor-community/PackageCatalogAPI/blob/master/Sources/App/Models/RepoQuery.swift#L26-L65

@daveverwer
Member

Partly as a quality thing: the fact that people can just add a Package.swift file to any repository seemed like a headache I could avoid. There's also the possibility that people will host elsewhere than GitHub, but honestly I think that's more of a theoretical advantage... 😂

But, I also didn't realise that GitHub could be queried like that with GraphQL. It's interesting. In theory the algorithm should make all the terrible packages that the query may find fade away. I'll have a play with it.

My big hope longer term is that the GitHub Package Registry will have an API, and that people will add their packages there, and I can use that as a definitive source for packages. 🤞

@helje5
Collaborator

helje5 commented Aug 7, 2019

I would assume that the Registry API is just the GH API.
You should contact @kiliankoe about that, but I think the GH API has severe limitations on how often and how much you can query; I don't remember the details.
(btw: the above is just extracted from Kilian's Swift catalog ...)
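
For reference, GitHub's GraphQL API reports its own quota through a `rateLimit` field that can be added at the top level of a query. The following is only a minimal Python sketch of reading it; the helper name `seconds_until_reset` and the sample values are made up for illustration and are not measurements from a real account.

```python
from datetime import datetime, timezone

# Fragment that can be added at the top level of a query to request quota info.
RATE_LIMIT_FRAGMENT = "rateLimit { limit cost remaining resetAt }"

def seconds_until_reset(rate_limit: dict, now: datetime) -> float:
    """Return how long to wait before querying again (0 if quota remains)."""
    if rate_limit["remaining"] > 0:
        return 0.0
    # resetAt is an ISO 8601 UTC timestamp such as "2019-08-07T13:00:00Z".
    reset_at = datetime.strptime(
        rate_limit["resetAt"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())

# Hypothetical `rateLimit` object from a response, with the quota used up.
sample = {"limit": 5000, "cost": 1, "remaining": 0, "resetAt": "2019-08-07T13:00:00Z"}
now = datetime(2019, 8, 7, 12, 30, tzinfo=timezone.utc)
print(seconds_until_reset(sample, now))  # 1800.0
```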

@kiliankoe
Contributor

kiliankoe commented Aug 7, 2019

Hi 👋

I used the following GraphQL query in apodidae to query for Swift packages on GitHub.

query ($query: String!) {
  search(query: $query, type: REPOSITORY, first: 100) {
    repositoryCount
    repositories: edges {
      node {
        ... on Repository {
          nameWithOwner
          description
          url
          isFork
          parent {
            nameWithOwner
          }
          isPrivate
          pushedAt
          license
          openIssues: issues(first: 0, states: OPEN) {
            totalCount
          }
          stargazers(first: 0) {
            totalCount
          }
          packageManifest: object(expression: "master:Package.swift") {
            ... on Blob {
              text
            }
          }
        }
      }
    }
  }
}

It definitely has a big drawback though. GitHub's GraphQL schema doesn't allow me to specify any filters based on the existence of file types, so the best I can do is query for the existence of a package manifest and filter client-side. This obviously only works (I believe the limit on results was 100 repos) when also specifying Swift as the main repository language in the search query (not shown here), but that assumption doesn't hold up for all SwiftPM packages. Packages in other languages or with lots of bundled documentation will be missing entirely :/
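
The client-side filtering described above might look roughly like the following Python sketch. It assumes the response has the shape produced by the query above (with `repositories` aliasing `edges`) and that the search string uses GitHub's `language:swift` qualifier. The function names and sample data are hypothetical and not taken from apodidae.

```python
import json

def build_payload(graphql_query: str, search_terms: str) -> str:
    """Build the JSON body for a POST to the GraphQL endpoint.

    Assumption: results are narrowed with the `language:swift` search qualifier.
    """
    variables = {"query": f"{search_terms} language:swift".strip()}
    return json.dumps({"query": graphql_query, "variables": variables})

def repos_with_manifest(response: dict) -> list:
    """Keep only repositories whose Package.swift lookup returned a blob."""
    kept = []
    for edge in response["data"]["search"]["repositories"]:
        repo = edge["node"]
        manifest = repo.get("packageManifest")
        # `object(expression:)` is null when the file is absent on that branch.
        if manifest and manifest.get("text"):
            kept.append(repo)
    return kept

# Hypothetical response: one repository with a manifest, one without.
sample = {"data": {"search": {"repositoryCount": 2, "repositories": [
    {"node": {"nameWithOwner": "example/with-manifest",
              "packageManifest": {"text": "// swift-tools-version:5.0"}}},
    {"node": {"nameWithOwner": "example/no-manifest", "packageManifest": None}},
]}}}
print([repo["nameWithOwner"] for repo in repos_with_manifest(sample)])
```

Because `first: 100` caps a single page, fetching more results would also need cursor-based pagination (`pageInfo { endCursor hasNextPage }` plus an `after:` argument), which this sketch leaves out.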

@daveverwer
Member

> I would assume that the Registry API is just the GH API.

I think that's possible, but as the GPR is quite a big push for GitHub, and it's so ripe for tools built around and on top of it, I'm hoping it gets specific API methods of its own.

@daveverwer
Member

👋 @kiliankoe!

> It definitely has a big drawback though. GitHub's GraphQL schema doesn't allow me to specify any filters based on the existence of file types, so the best I can do is query for the existence of a package manifest and filter client-side. This obviously only works (I believe the limit on results was 100 repos) when also specifying Swift as the main repository language in the search query (not shown here), but that assumption doesn't hold up for all SwiftPM packages. Packages in other languages or with lots of bundled documentation will be missing entirely :/

This is such great information, thank you.

I think what I'd like to do is continue with the current JSON approach for now, at least until the Xcode 11 release in September and see what Apple/GitHub put out alongside that.

I have a hundred things I'd like to do with the actual metadata that will continue to make the search results better, and the JSON approach will work for now. I'm going to put this on ice for a bit until it's clearer what the GPR will be.

@daveverwer changed the title from "Using GraphQL" to "Use GraphQL to scrape GitHub for packages rather than maintaining an index" on Aug 12, 2019
@daveverwer daveverwer added the enhancement New feature or request label Aug 12, 2019