Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RateLimitTransport incompatible with v4 api #849

Open
lijok opened this issue Jul 5, 2021 · 20 comments
Open

RateLimitTransport incompatible with v4 api #849

lijok opened this issue Jul 5, 2021 · 20 comments
Labels
Provider Status: Needs info Full requirements are not yet known, so implementation should not be started Type: Bug Something isn't working as documented

Comments

@lijok
Copy link

lijok commented Jul 5, 2021

v4, graphql api, uses POST for all operations, including read
RateLimitTransport sleeps for 1s between each POST/PATCH/PUT/DELETE operation
Say you have 100 repos with 3 branch protection rules each, the provider will spend 5 minutes sleeping

@jcudit jcudit added Type: Bug Something isn't working as documented Provider labels Jul 21, 2021
@ayk33
Copy link

ayk33 commented Jul 27, 2021

Anything that can be done about this?

@lijok
Copy link
Author

lijok commented Jul 28, 2021

the provider needs to either not use v4 api, which is probably not wise long-term
or, rethink the RateLimitTransport

@majormoses
Copy link
Contributor

majormoses commented Sep 28, 2021

I think its important to keep the RateLimitTransport for any v3 api interactions for larger organizations where we need to rely on this to get through plans in our largest states. In the graphql api I agree we need to think about rate limits very differently.

Here is the docs from Github: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

What I think we want to do is create a generic rate limit transport and then let each rest and gql apis use the appropriate transport based on the calls.

@jcudit
Copy link
Contributor

jcudit commented Oct 5, 2021

The latest release contains a possible fix for your use case. Can you take a look at the new write_delay_ms configuration parameter and let us know if this can be closed?

@lijok
Copy link
Author

lijok commented Oct 5, 2021

@jcudit thanks for that
set write_delay_ms to 200, which has on average sped up our plan times 2x
image
Lowering this value further yields marginal gains (100ms = ~6m30s)

However, while the PR that introduces write_delay_ms option helps this problem, I think long term it's not the correct solution for it. I think the provider should have separate transports for v4 and v3 api's, keeping the rate limit logic as is on the v3 api, but modifying the v4 api transport to only invoke this write delay on mutations, rather than queries

Any thoughts?

@shoddyguard
Copy link

I would have to agree that while #907 certainly helps (and is very much appreciated) it's not the solution.

As @lijok mentions It would be good to differentiate between the API's and have appropriate logic between them, this is only going to become more of a problem as more things move to the newer API.

@majormoses
Copy link
Contributor

I think its important to keep the RateLimitTransport for any v3 api interactions for larger organizations where we need to rely on this to get through plans in our largest states. In the graphql api I agree we need to think about rate limits very differently.

Here is the docs from Github: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

What I think we want to do is create a generic rate limit transport and then let each rest and gql apis use the appropriate transport based on the calls.

If we go this route we can start with adding a no-op rate limit transport for GQL apis and use the existing rate limiter for REST.

@github-actions
Copy link

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the Status: Pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!

@github-actions github-actions bot added the Status: Stale Used by stalebot to clean house label Nov 30, 2022
@shoddyguard
Copy link

This definitely still needs addressing

@kfcampbell kfcampbell added Priority: Normal Status: Needs info Full requirements are not yet known, so implementation should not be started and removed Status: Stale Used by stalebot to clean house labels Dec 1, 2022
@kfcampbell
Copy link
Member

Perhaps we could use separate configuration for GraphQL reads and writes, like:

provider "github" {
  owner = "example-org"
  rest_write_delay_ms = 500 # sleep for 500ms in between requests to the GitHub REST API
  graphql_write_delay_ms = 50 # sleep for 50ms in between GraphQL requests
}

I don't love the idea of leaking implementation details about APIs into the configuration schema, but I'm not sure I have a better idea.

@v-yarotsky
Copy link

I wonder if it would be feasible to convert everything to v3 API, and provide something like an nginx configuration for a self-hosted caching proxy. The caching proxy can easily utilize conditional requests, which could significantly speed up the "refresh" stage, while not hitting the actual API too hard and too fast.
It is unfortunate that the GraphQL API does not support conditional requests.

@stevehipwell
Copy link
Contributor

@kfcampbell are there any plans to address this? The GH REST and GraphQL implementations are significantly different enough to warrant adding specific configuration; it would add complexity but also make the system simpler as the out-of-the box experience could just work with the correct GraphQL delays. The current implementation requires both the knowledge that this is a problem and the knowledge to understand which API you're using in each resource; at which point you'd still need to make a compromise as a single value controls two very different systems.

Also would setting write_delay_ms lower on a large implementation using many resources have an adverse effects where the v3 API is backing resources also being written? It'd be great if there were some docs covering the setting and the context behind it's use?

@kfcampbell
Copy link
Member

@stevehipwell unfortunately GitHub's SDK team doesn't have the bandwidth to pick up this item specifically, though we're always receptive to discussion and PRs that address this issue.

@stevehipwell
Copy link
Contributor

@kfcampbell this might be an aside, but could you explain to me why the GraphQL API is being used in preference to the REST API in this provider? From the outside it looks like a Terraform provider would be better architecturally served by a REST API due to the expected calling patterns.

@kfcampbell
Copy link
Member

The provider is a patchwork of contributions from over 300 people over almost 8 (!) years now, and there hasn't been a consistent owner during that time. When people implement a resource or data source, they pick a choice and it rarely changes after that. If you have opinions on specific resources that should use the REST API, we're very receptive to PRs to change it!

@o-sama
Copy link
Contributor

o-sama commented Jan 18, 2024

Another thing to note is that not every resource has a rest endpoint and so in some cases the GraphQL API is the only way to do something.

@stevehipwell
Copy link
Contributor

@kfcampbell thanks for the information on the history and lack of conscious decision to use GraphQL. @o-sama I can understand why GraphQL has been used in some cases; my issue is that as GraphQL doesn't work correctly it either should have been implemented correctly or it shouldn't have been used for GA level resources. I also think the GitHub naming convention of calling it the v4 API makes it look like it's replacing or an improvement on the v3 REST API, which in turn makes people think they should be using it.

@kfcampbell moving forward it'd be great if you (the provider owners / GitHub) could add a policy for new resources which could then be used as a justification for refactoring the provider for consistency.

@kfcampbell
Copy link
Member

Assuming implementation viability is equivalent (which is a big "if"; the API doesn't have a clear policy on how products are implemented), resource API client implementation decisions should be guided by performance. Generally I think that makes the REST API more desirable.

Please feel free to submit a PR to our contributing.md if you'd like to see it become more formal.

Copy link

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the Status: Pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!

@github-actions github-actions bot added the Status: Stale Used by stalebot to clean house label Oct 19, 2024
@stevehipwell
Copy link
Contributor

This isn't stale, at a bare minimum the provider should allow different configuration for the different GH API versions. Maybe something similar to #849 (comment) but focusing on the API version.

provider "github" {
  owner = "example-org"
  write_delay_ms    = 500 # default to 500ms
  v3_write_delay_ms = 500 # sleep for 500ms in between requests to the GitHub REST API
  v4_write_delay_ms = 50 # sleep for 50ms in between GraphQL requests
}

@github-actions github-actions bot removed the Status: Stale Used by stalebot to clean house label Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Provider Status: Needs info Full requirements are not yet known, so implementation should not be started Type: Bug Something isn't working as documented
Projects
None yet
Development

No branches or pull requests

10 participants