
ThrottlingException: Rate exceeded #1344

Open
Arisfx opened this issue Feb 7, 2022 · 22 comments
Labels
kind/bug Something isn't working

Comments

@Arisfx

Arisfx commented Feb 7, 2022

Hi team, do you know how we can avoid the rate exceeded error?

Scanned states (7)      
ThrottlingException: Rate exceeded
        status code: 400, request id: 0474b16c-faee-402a-bf01-1e2a7c005714
@sundowndev
Contributor

sundowndev commented Feb 7, 2022

Hi @Arisfx, the rate limit error can occur when your cloud account has a huge number of resources, even ones not managed by Terraform. This is one of the known limitations of driftctl. Are you running in deep mode? If so, could you consider running in non-deep mode instead? Note that driftctl will then no longer be able to show drifts in attributes. If you've identified which resource(s) are causing this, you can try ignoring that particular resource type using the driftignore file or the filter flag. If none of those solutions fit your needs, could you give further details on your use case?

Thanks 🙏🏻
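
For illustration, the filter flag takes a JMESPath expression that restricts what gets scanned (a sketch based on the driftctl docs; swap in the resource type that is causing the throttling):

driftctl scan --filter "Type=='aws_s3_bucket'"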

@Arisfx
Author

Arisfx commented Feb 10, 2022

Hi @sundowndev, thank you for your response. It seems that if I run it several times, it eventually does the job. May I ask if it's possible to combine several filters? For example, how can I filter for aws_route53_zone and aws_route53_record at the same time? Unfortunately, more than one filter flag cannot be combined.
Thanks!

@sundowndev
Contributor

I think what you're looking for is a driftignore file. You will find proper examples in the docs: https://docs.driftctl.com/0.20.0/usage/filtering/driftignore

How can i filter for aws_route53_zone and aws_route53_record at the same time?

# Search for drifts except for aws_route53_zone and aws_route53_record
aws_route53_zone.*
aws_route53_record.*

# Ignore all drifts except for aws_route53_zone and aws_route53_record
*
!aws_route53_zone.*
!aws_route53_record.*

Does that help?

@Arisfx
Author

Arisfx commented Feb 10, 2022

Thank you for your response. Perhaps I didn't make myself clear: I meant that we need to scan for only these 2 resources, not exclude them :)

@eliecharra
Contributor

Thank you for your response. Perhaps I didn't make myself clear: I meant that we need to scan for only these 2 resources, not exclude them :)

You can do this with the second snippet that @sundowndev posted above:

# Ignore all drifts except for aws_route53_zone and aws_route53_record
*
!aws_route53_zone
!aws_route53_record

The first wildcard makes sure we switch to an ignore-everything mode, except for what is prefixed with !.

This will save you a lot of API calls and can definitely help with rate limit issues.

@brunzefb

brunzefb commented Apr 25, 2022

@sundowndev Just ran into the throttling issue as well with driftctl. I created a support case with AWS to increase the allowed API rate. They told me there would be too many rates to increase, and to ask the authors instead to implement exponential backoff when making AWS calls that hit the throttling exception. While this may hurt the performance of the tool, maybe that does not matter so much, especially if you are running it as a cron job once a day.

From AWS Support:

We would generally suggest that API calls should be made with a retry and exponential backoff in order to gracefully handle throttling when it occurs [2]. When narrowing down to calls from your IAM user around the reported times, I see a very aggressive call rate which suggests to me that this tool is not implementing such a backoff and retry strategy, or if it is, it is not retrying enough, or is not backing off enough. This strategy should work well with supported providers.
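
For reference, a minimal sketch of the kind of backoff AWS Support describes, using the aws-sdk-go v1 client.DefaultRetryer, which already implements exponential backoff with jitter. This is only an illustration of the idea under those assumptions, not how driftctl currently builds its AWS sessions:

package main

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/client"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

// newThrottleTolerantSession returns a session whose clients retry throttled
// calls with exponential backoff instead of failing on the first
// ThrottlingException.
func newThrottleTolerantSession() (*session.Session, error) {
	return session.NewSession(&aws.Config{
		Retryer: client.DefaultRetryer{
			NumMaxRetries:    10,               // retry each throttled call up to 10 times
			MinThrottleDelay: 1 * time.Second,  // wait at least 1s after a throttle
			MaxThrottleDelay: 30 * time.Second, // cap the exponential backoff
		},
	})
}

func main() {
	sess, err := newThrottleTolerantSession()
	if err != nil {
		panic(err)
	}
	// Any service client built from this session inherits the retryer.
	_ = route53.New(sess)
}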

@aroes

aroes commented May 17, 2022

This is an issue for me as well; I agree that a retry and backoff strategy should be implemented, as @brunzefb's AWS support contact suggests. Getting a full overview of a large account is almost impossible, since the tool exits as soon as it runs into the error.

@gmaghera

gmaghera commented Jun 6, 2022

Neither ignore nor retries seem to address this issue directly.

Would it be possible to break what driftctl does into batches? And perhaps give the user control over batch sizes and the pause before moving to the next batch?

@eliecharra
Contributor

Retry will address this issue. When we encounter a rate limit error, we'll enter an exponential backoff retry loop, so requests will be postponed and the scan will take longer but will no longer be interrupted. @moadibfr is working on that, but we are also currently splitting the enumeration out of driftctl into a separate Go module for better separation of concerns, so it'll take some time for the retry-on-rate-limit mechanism to be implemented.

Would it be possible to break what driftctl does into batches?

That sounds complicated, because the goal of driftctl is to enumerate resources, so you cannot batch a list you do not have yet. We could think of another batching logic, for example by resource type; you can achieve this manually with the driftignore file, see my answer above and the sketch below.

We are aware that this is a very important pain point for many of you, and this rate limit issue is definitely on our plate 🙏🏻
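
To make the manual batching concrete, here is a sketch (the file names and resource types are just an illustration, and it assumes the --driftignore flag from the docs):

# batch-route53.driftignore: first pass, scan only Route53
*
!aws_route53_zone.*
!aws_route53_record.*

# batch-iam.driftignore: second pass, scan only IAM roles and policies
*
!aws_iam_role.*
!aws_iam_policy.*

driftctl scan --driftignore batch-route53.driftignore
driftctl scan --driftignore batch-iam.driftignore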

@brunzefb

I think how long the program runs is less important, so backoff/retry is a good thing. We are running driftctl on an EKS cluster with a Python wrapper, as a pod launched by a cron job. So if it takes an hour to run, it does not matter if you run it every 12h. The wrapper compares the driftctl JSON output to the expected output and emails us if there are diffs. We then have a stern talk with those AWS console users who did not use Terraform to make the changes. I mostly care about IAM and security group changes, and if you limit driftctl to those, it is generally not API rate limited.

Best,
F.

@gmaghera

Do you have an idea where solving this is on your roadmap?

@moadibfr
Contributor

Hey @gmaghera, we identified how we could improve that, but it is tied to the extraction and update of the enumeration in driftctl.
Unfortunately, I don't think we have information about when that could happen.
Maybe @sjourdan has more insight and could clarify this.

@gmaghera

gmaghera commented Oct 17, 2022

Thank you for the update @moadibfr.

BTW, we moved over to using CloudQuery's drift measurement because of the throttling issue. But they decided to stop supporting drift measurement, for reasons not known to me.

You have a special tool on your hands -- HashiCorp only recently announced drift detection support. With throttling handled, driftctl would be a sweet, sweet enterprise-level tool (it is already, albeit with some limitations).

@eliecharra
Contributor

Very valuable feedback @gmaghera, thanks 🙏🏻

We are very sorry that we could not share any status update on that 😟
We are currently in a complicated situation regarding driftctl: the company behind it (CloudSkiff) was acquired a year ago, and our focus is currently not on actively improving driftctl.
Unfortunately, we also made some changes that put driftctl in a state where it's rather complicated for newcomers to work on, so handing that issue over to the community does not sound like a decent option.

We'll keep you updated as soon as we can 🙏🏻

@b0bu

b0bu commented Oct 27, 2022

Similar issue here, but for scanned resources and only from "within" AWS. When running from my laptop I don't get the issue, but when running from EC2 or as part of a CodeBuild project it consistently fails for a single region with a relatively empty account (100 resources or so, maybe less). Again, only from within AWS. Any ideas?

Scanned states(3)
Scanned resources
    ThrottlingException: Rate exceeded
    status code: 400, request id: 259b7f44-6e33-431f-9435-dac2a30e2db6

@johnalotoski

johnalotoski commented Feb 3, 2023

Using cpulimit may help to slow the rate of API calls down by throttling the CPU usage of the app as a whole.

Example, limit CPU usage to 25%:

cpulimit -l 25 driftctl scan --from tfstate://*.tfstate

Using cgroups would work better than this, I think, at the cost of being a bit more involved to apply; one option is sketched below.

Just playing around with cpulimit a bit: limiting to about 5% on my machine doubles the scan time and stops the throttle errors. Going any lower (like all the way to 1%) causes some AWS authentication errors to start being thrown, presumably because the app doesn't respond quickly enough for some of the handshakes or API flows.

So it seems like there is a sweet spot with something as simple as cpulimit to help with this -- at least for my machine anyway.
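
For the cgroups route, one possibility on a systemd host is a transient scope with a CPU quota (a sketch only; I have not verified this with driftctl, so adjust to your environment):

systemd-run --user --scope -p CPUQuota=25% -- driftctl scan --from tfstate://*.tfstate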

@bshramin

Although the main issue is definitely not solved, there are a couple of helpful flags here that you can use to limit the scope you want to monitor in order to avoid the rate limit exception.

https://docs.driftctl.com/next/usage/cmd/scan-usage/

@drem-darios

I was able to get past this error by implementing exponential backoff in the repository that was triggering the throttle exception. In my case, it was API Gateway limits I was hitting. You can see here: https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html that API Gateway allows 5 requests every 2 seconds per account for GetResources, and I was hitting that limit pretty frequently. There is also a 10 requests per second limit across all API Gateway management operations.

To work around those limits, I added some code to the api_gateway_repository.go file that exponentially backs off requests when we receive a "TooManyRequestsException" error. I set the base delay at 2 seconds since that was the limit we were hitting. I also had to add this logic to every function making a request to API Gateway, since any of them could trigger the overall operation limit (e.g. GetRestApisPages reaches the limit, then a call to, say, GetAccount will trigger the throttle). Here is the logic for GetRestApisPages as an example.

  const MaxRetries = 5
  
  if err != nil {
	  retries := 0
	  retry := true
  
	  for retry && retries < MaxRetries {
		  sleepTime := time.Duration(math.Pow(2, float64(retries))) * 2 * time.Second
		  logrus.Warn("Error caught during GetRestApisPages! Attempt number ", retries+1, "/", MaxRetries, ". Retrying after sleeping for ", sleepTime, "...")
		  time.Sleep(sleepTime)
		  logrus.Debug("Awake! Attempting to make GetRestApisPages call again.")
		  err = r.client.GetRestApisPages(&input,
			  func(resp *apigateway.GetRestApisOutput, lastPage bool) bool {
				  restApis = append(restApis, resp.Items...)
				  return !lastPage
			  },
		  )
		  if err != nil && strings.Contains(err.Error(), "TooManyRequestsException") {
			  retry = true
		  } else {
			  retry = false
		  }
  
		  retries++
	  }
  }

To reduce duplicated code, I implemented a helper function:


func retryOnFailure(callback func() error, message string) error {
	retries := 0
	retry := true

	var err error
	for retry && retries < MaxRetries {
		sleepTime := time.Duration(math.Pow(2, float64(retries))) * 2 * time.Second
		logrus.Warn(message, "Attempt number ", retries+1, "/", MaxRetries, ". Retrying after sleeping for ", sleepTime, "...")
		time.Sleep(sleepTime)
		logrus.Debug("Awake! Attempting to make API call again.")

		err = callback()
		if err != nil && strings.Contains(err.Error(), "TooManyRequestsException") {
			retry = true
		} else {
			retry = false
		}

		retries++
	}
	return err
}

Now I can check for an error on the first call, then go into exponential backoff if there was one:

if err != nil {
		err = retryOnFailure(func() error {
			logrus.Debug("Making a call to get rest APIs not found in cache")
			err = r.client.GetRestApisPages(&input,
				func(resp *apigateway.GetRestApisOutput, lastPage bool) bool {
					restApis = append(restApis, resp.Items...)
					return !lastPage
				},
			)
			return err
		}, "Error caught during GetRestApisPages!")
	}

I'm happy to contribute this code to the project if everyone thinks it will be helpful. This logic should probably be implemented in other places/repositories too...

@herrsergio

Hi, I have a similar issue. I am executing driftctl in a subdirectory with its own Terraform backend state.

driftctl scan --only-managed

Using Terraform state tfstate+s3://XXXXX/XXXXX/XXXXX/terraform.tfstate found in terraform-backend.tf. Use the --from flag to specify another state file.
INFO[0001] Start reading IaC
Scanned states (1)
INFO[0003] Start scanning cloud provider


TooManyRequestsException: Too Many Requests
{
  RespMetadata: {
    StatusCode: 429,
    RequestID: "6ca8acef-5412-402d-8825-a72c10a15f77"
  },
  Message_: "Too Many Requests"
}

@nsballmann

This doesn't seem to be related to the number of resources within the Terraform state. Even with just 12 resources, I unfortunately run into this issue constantly in our CI, despite prefixing driftctl with cpulimit --limit=5 --include-children -- (on 16-core machines inside Alpine containers).

Furthermore, the Terraform state is not hosted on S3, so those connections don't even count towards the rate limit.

This happens so often that I already added allow_failure: true to the scan job, so the scan jobs don't block the GitLab MRs. 😬

@valdestron

Regarding this issue, could driftctl implement some caching mechanism and/or retry itself?
I think it would be nice if driftctl cached the scan state until it hits an error; then, if it is started again by an outer retry, it would pick up the cache and continue with the tfstate files that are left.

It would be very convenient when you run

driftctl scan --from tfstate+s3://fileone --from tfstate+s3://filetwo --from tfstate+s3://filethree ....

could have a flag

driftctl scan --cached --from tfstate+s3://fileone --from tfstate+s3://filetwo --from tfstate+s3://filethree ....

The cached flag would have defaults:

  • a 1-minute cache? Maybe something smarter, I don't know
  • a key hashing all the arguments combined

This way, when the scan fails on the --from tfstate+s3://filetwo resource scan, it could retry from where it left off, or, if there is no internal retry, the next run could pick up from where it left off.
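
A rough sketch of the proposed key (hypothetical: neither the --cached flag nor this helper exists in driftctl; it only shows hashing all the arguments combined):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheKey hashes all CLI arguments together so that a re-run with the same
// --from flags would map to the same cache entry.
func cacheKey(args []string) string {
	sum := sha256.Sum256([]byte(strings.Join(args, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	args := []string{"--from", "tfstate+s3://fileone", "--from", "tfstate+s3://filetwo"}
	fmt.Println(cacheKey(args))
}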

@nsballmann

@valdestron Shortly after my post I discovered this heading: https://github.com/snyk/driftctl?tab=readme-ov-file#this-project-is-now-in-maintenance-mode-we-cannot-promise-to-review-contributions-please-feel-free-to-fork-the-project-to-apply-any-changes-you-might-want-to-make which makes me think that driftctl development has stopped and it's time to replace it wherever we use it. Unfortunately, I haven't found a suitable successor for my use cases yet.
