⚡ GitHub GraphQL API #33
Great, it looks wonderful. The only problem I faced was related to the scope of the token. It's necessary to add [...]. I also saw in the GraphQL query a limit on the number of mentionableUsers(first: 100), organizations(first: 100), and repositoryTopics(first: 10). Maybe this should be a parameter or mentioned in the README? Is there a similar limit in the current gimie version? The speed achieved is impressive. Great work.
I've been doing some research on
Yes, the REST API is paginated too, so the current version of gimie would also have this issue. We could work around this by adding pagination logic (multiple queries with limit + offset), but this would be a story for another day ;) Adding it + the required token scope to the README is a good idea!
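For reference, a minimal sketch of what such pagination logic could look like. Note that GitHub's GraphQL connections paginate with cursors (`first` / `after` plus `pageInfo`) rather than numeric offsets. The helper `iter_nodes`, the `run_query` callable and the query text are hypothetical illustrations, not gimie code; `run_query` is assumed to send the query and return the `data` part of the response.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical query: pages through a repository's mentionable users.
USERS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    mentionableUsers(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { login }
    }
  }
}
"""


def iter_nodes(
    run_query: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    query: str,
    variables: Dict[str, Any],
    path: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield every node of a paginated connection, one page per request.

    `path` lists the keys leading from the response data to the connection,
    e.g. ["repository", "mentionableUsers"].
    """
    cursor = None  # None requests the first page
    while True:
        data = run_query(query, {**variables, "cursor": cursor})
        connection = data
        for key in path:
            connection = connection[key]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]
```

Each extra page costs one more request, so this trades speed for completeness on repositories with more than 100 users.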
@caviri OK it looks like we're now getting the actual contributors. 🎉
What do you think?
Using the paginated list of commits becomes slow when there are many commits... So we run into the original issue: The execution takes a long time due to waiting on many requests (Now querying Renku takes 19s vs 38s for the original REST version). I see two solutions:
Maybe we can just ignore it for now and keep performance optimization for another PR. What do you think @caviri?
🚀 Managed to restore full speed using a combination of GraphQL and REST:
So in two queries we get deep metadata about all contributors! It takes 2.5s for Renku (vs 38s with REST).
This PR replaces queries to GitHub's REST API with a single GraphQL query.
The REST API required additional queries for nested attributes. In particular, we needed one query per contributor to extract user information. This resulted in extremely slow performance for large open-source projects.
Improvements:
When using the GraphQL endpoint, we can specify the desired schema of the response with a single query. This provides a major speedup proportional to the number of contributors in the target repository (e.g., query time for Renku: 1.04s instead of 38.7s with REST).
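To illustrate the idea, here is a rough sketch of a single query that requests the nested user and topic fields in one round trip. It is not the query used in this PR: the selected fields, the names `REPO_QUERY`, `build_payload` and `fetch_repo_metadata`, and the use of `urllib` are assumptions made for the example. Only the endpoint URL and the thresholds (100 users, 100 organizations, 10 topics) come from GitHub and from the discussion here.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query: nested user and topic data are fetched together,
# instead of one REST request per contributor.
REPO_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    name
    description
    mentionableUsers(first: 100) {
      nodes {
        login
        name
        organizations(first: 100) { nodes { name } }
      }
    }
    repositoryTopics(first: 10) { nodes { topic { name } } }
  }
}
"""


def build_payload(owner: str, name: str) -> bytes:
    """Serialize the query and its variables as a JSON request body."""
    body = {"query": REPO_QUERY, "variables": {"owner": owner, "name": name}}
    return json.dumps(body).encode("utf-8")


def fetch_repo_metadata(owner: str, name: str, token: str) -> dict:
    """Send the query once and return the `data` part of the response."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=build_payload(owner, name),  # a request body makes this a POST
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]
```

The token must carry whatever scopes the selected fields require; an insufficient scope shows up in the `errors` entry of the response.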
Caveats (so far):
[...] mentionableUsers instead. In cases where the repo belongs to an organization, this will include all organization members.
mentionableUsers, user organizations (affiliations) and repositoryTopics are paginated (note this is also the case in the REST API). This is currently set to arbitrary thresholds (100 for contributors and affiliations, 10 for topics).
Example pre-PR output
Example post-PR output