
Terragrunt apply fails (could not find AWS credentials) #2730

Open
skkc2 opened this issue Sep 22, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@skkc2

skkc2 commented Sep 22, 2023

Hi All,

We use Terraform and Terragrunt to manage AWS infrastructure. When I run Terragrunt locally it works fine and there are no issues deploying infrastructure, but it errors out when deploying through Jenkins, saying no AWS credentials were found. It only happens in some of the folders; all the other services in other folders deploy successfully. It was working fine until a week ago, but all of a sudden there is an issue. Not sure what went wrong, any suggestions please?

Previously we used to save .terraform.lock.hcl in SCM along with terragrunt.hcl, but we removed it in some folders and there was an inconsistency, so we've reinitialised and saved .terraform.lock.hcl in those folders. Is that causing issues?

Exact Errors

time=2023-09-22T11:41:56Z level=error msg=Module /home/ec2-user/workspace/CI-CD Infrastructure/nft/service-discovery-services has finished with an error: Error finding AWS credentials (did you set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables?): NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors prefix=[/home/ec2-user/workspace/CI-CD Infrastructure/nft/service-discovery-services] 
time=2023-09-22T11:41:59Z level=error msg=Module /home/ec2-user/workspace/CI-CD Infrastructure/nft/rds-config-null-resource has finished with an error: Error finding AWS credentials (did you set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables?): NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors prefix=[/home/ec2-user/workspace/CI-CD Infrastructure/nft/rds-config-null-resource] 
time=2023-09-22T11:42:03Z level=error msg=Module /home/ec2-user/workspace/CI-CD Infrastructure/nft/rds-config-null-resource has finished with an error: Error finding AWS credentials (did you set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables?): NoCredentialProviders: no valid providers in chain. Deprecated.

locals {
  account_vars      = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars       = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  environment_vars  = read_terragrunt_config(find_in_parent_folders("environment.hcl"))
  account_name      = local.account_vars.locals.account_name
  account_name_abbr = local.account_vars.locals.account_name_abbr
  account_id        = local.account_vars.locals.aws_account_id
  aws_region        = local.region_vars.locals.aws_region
  environment_name  = local.environment_vars.locals.environment
  default_tags = {
    Name        = local.environment_name
    Environment = local.environment_name
    Terraform   = true
  }
}

# Generate an AWS provider block
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
  # version             = "= 3.30.0"
  # Only these AWS Account IDs may be operated on by this template
  allowed_account_ids = ["${local.account_id}"]

  # default_tags {
  #   tags = {
  #     Name        = "${local.environment_name}"
  #     Environment = "${local.environment_name}"
  #     Terraform   = true
  #   }
  # }
}
EOF
}

# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state {
  backend = "s3"
  config = {
    encrypt = true
    bucket  = "tfstate-apps-${local.account_id}-${local.aws_region}"
    key     = "${local.environment_name}/${path_relative_to_include()}/terraform.tfstate"
    region  = local.aws_region
    # dynamodb_table = "terraform-locks"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

inputs = merge(
  local.account_vars.locals,
  local.region_vars.locals,
  local.environment_vars.locals,
)

Versions

  • Terragrunt version: v0.38.7
  • Terraform version:
  • Environment details (Ubuntu 20.04, Windows 10, etc.):

Any suggestions please?

@skkc2 skkc2 added the bug Something isn't working label Sep 22, 2023
@denis256
Member

Hello,
I wanted to confirm: was the Terragrunt version updated, or is it the same as before?
I suspect that the AWS credentials were removed from the environment variables used in the Jenkins job.
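
A quick way to confirm what the Jenkins job actually sees is to run a check step before Terragrunt (this assumes the AWS CLI is available on the agent; the second command prints only variable names, not values):

# errors with a credential-chain message if nothing resolves
aws sts get-caller-identity
# list which AWS_* variables are set, without exposing secrets
env | grep -oE '^AWS_[A-Z_]+' || echo "no AWS_* environment variables set"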

@skkc2 skkc2 closed this as completed Sep 26, 2023
@skkc2 skkc2 reopened this Sep 26, 2023
@skkc2
Author

skkc2 commented Sep 26, 2023

Hi denis256,

Terragrunt and Terraform remained the same version on local machines and Jenkins.

I don't think the AWS credentials were removed; if they had been, it shouldn't execute any modules, but some modules are being executed.

@mimadrone

I've also been encountering this. The Jenkins job does a run-all init, validate, plan on many directories in parallel, and some of them (not the same ones, and not necessarily at the same point in the process) error out saying there are no credentials. I suspect AWS's behavior has changed (rate limiting, maybe?) because the Terragrunt version hasn't. Trying to see if auto-retry for this error helps now.

@mimadrone

Update: auto-retry tuning is dicey. I got it to work sometimes by also setting the number of retries to 5, but occasionally that wasn't enough, so I also increased the delay, and then it started failing the job after only one error. So I haven't been able to come up with a consistent method to avoid this.
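
For reference, a minimal sketch of that tuning in the root terragrunt.hcl; the attempt count and delay below are illustrative values, not known-good numbers:

retry_max_attempts       = 5
retry_sleep_interval_sec = 30

# Note: setting retryable_errors replaces Terragrunt's default list, so the
# default patterns have to be repeated alongside any custom ones.
retryable_errors = [
  "(?s).*NoCredentialProviders: no valid providers in chain.*",
]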

@skkc2
Author

skkc2 commented Dec 1, 2023

Update: auto-retry tuning is dicey. I got it to work sometimes by also setting the number of retries to 5, but occasionally that wasn't enough, so I also increased the delay, and then it started failing the job after only one error. So I haven't been able to come up with a consistent method to avoid this.

What version of Terraform and Terragrunt are you using?
A recent version of Terraform seems to acknowledge this issue and they've rolled out an update; I tried with the latest version as well, still the same.
The only way I could reduce the number of AWS credentials errors is by executing the shared directory first (about 10 services) and then the applications directory (which has 10+ folders, each with multiple services).
(screenshot attached)

@skkc2 skkc2 closed this as completed Dec 1, 2023
@skkc2 skkc2 reopened this Dec 1, 2023
@mimadrone

mimadrone commented Dec 1, 2023

Changing auto-retry doesn't seem to work, which is probably because the error Terragrunt surfaces is its own and not caught from elsewhere? I have:

retry_sleep_interval_sec = 10
retryable_errors = [ 
  # Default list
  "(?s).*Failed to load state.*tcp.*timeout.*",
  "(?s).*Failed to load backend.*TLS handshake timeout.*",
  "(?s).*Creating metric alarm failed.*request to update this alarm is in progress.*",
  "(?s).*Error installing provider.*TLS handshake timeout.*",
  "(?s).*Error configuring the backend.*TLS handshake timeout.*",
  "(?s).*Error installing provider.*tcp.*timeout.*",
  "(?s).*Error installing provider.*tcp.*connection reset by peer.*",
  "NoSuchBucket: The specified bucket does not exist",
  "(?s).*Error creating SSM parameter: TooManyUpdates:.*",
  "(?s).*app.terraform.io.*: 429 Too Many Requests.*",
  "(?s).*ssh_exchange_identification.*Connection closed by remote host.*",
  "(?s).*Client\\.Timeout exceeded while awaiting headers.*",
  "(?s).*Could not download module.*The requested URL returned error: 429.*",
  # Tests hit erroneous NoCredentialProviders errors because of some kind of rate limiting AWS-side
  "(?s).*NoCredentialProviders: no valid providers in chain.*",
]

but it doesn't retry at all.

@mimadrone

mimadrone commented Dec 4, 2023

Contacted AWS support, who told me that they don't publish the throttling/rate limiting numbers because "they're internal" (so, they don't publish the numbers because they don't publish the numbers?) and that Terragrunt should implement a retry with exponential backoff.

The AWS support person indicated that the limit might change at any point, which I suspect means they did recently change it. Experimentally: we've got about 150 modules and we hit a few denials each time; setting TERRAGRUNT_PARALLELISM to 100 seemed to prevent the failures, though I haven't got many runs to prove it. UPDATE: no, we still see failures at 100. I think the limit must be under 70.
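
For anyone trying the same thing, the cap can be set either through the environment or per invocation; 50 here is an illustrative value, not a known-safe limit:

# via the environment
export TERRAGRUNT_PARALLELISM=50
terragrunt run-all plan

# or per invocation
terragrunt run-all plan --terragrunt-parallelism 50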

@gitsstewart

@denis256 given the latest information, is there anything that should be looked at from this point?

@denis256
Member

I will do more tests, but so far I have been thinking about:

  • retries when AWS API errors happen
  • automatically adjusting TERRAGRUNT_PARALLELISM (if not configured) based on the number of modules

@denis256 denis256 self-assigned this Dec 18, 2023
@skkc2 skkc2 closed this as completed Dec 19, 2023
@skkc2 skkc2 reopened this Dec 19, 2023
@skkc2
Author

skkc2 commented Dec 19, 2023

I will do more tests, but so far I have been thinking about:

* retries when AWS API errors happen

* automatically adjusting `TERRAGRUNT_PARALLELISM` (if not configured) based on the number of modules

It would be a great help, denis; we've been facing this issue for a while.

@denis256
Member

Hi,
I wanted to check if the issue still appears after upgrading to https://github.com/gruntwork-io/terragrunt/releases/tag/v0.54.13

@skkc2
Author

skkc2 commented Jan 16, 2024

Hi @denis256
Unfortunately no, the issue still remains.
(screenshot attached)

@denis256
Member

It is still difficult on my side to reproduce this issue. I tried to set something up in https://github.com/denis256/terragrunt-tests/tree/master/aws-rate-limit but am still not getting the same error as reported.

It would be helpful if you could share an example repository where this error happens.

@skkc2
Author

skkc2 commented Jan 18, 2024

Hi denis,
Sorry, I don't have any samples to share due to restrictions. I've looked at your sample repo; I think having multiple modules like rate1, each with a similar or somewhat different main.tf, would generate this issue. I have ~145 modules.
Thanks
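
A rough way to stress-test that in the sample repo might be to stamp out many copies of the existing module before a run-all; the directory names and count below are illustrative assumptions, not something taken from the repo:

# clone the sample module ~145 times so run-all hits the AWS credential
# chain from many terraform processes at once
cd aws-rate-limit
for i in $(seq 2 145); do
  cp -r rate1 "rate$i"
done
terragrunt run-all plan --terragrunt-non-interactive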

@skkc2
Author

skkc2 commented Feb 6, 2024

@denis256

I got this working: my issue was resolved by updating Terragrunt and also increasing the RAM. It would be nice if it highlighted the memory error and also limited Terragrunt/Terraform memory usage.

Articles/blogs online about limiting RAM usage show that quite a few people experience this issue because of module and provider sizes. The problem is not with the module API calls but with the AWS provider processes, which are heavy because they support a lot of AWS services at once. In our case, environments like NFT and others have a lot of resources to deploy, which require a lot of provider versions, and doing all of that at once requires a good bit of RAM, so 8 GB would crash Terraform.
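
For what it's worth, a quick check on the Jenkins agent can confirm whether the kernel OOM killer was involved when a run dies with an unclear error (this assumes a Linux agent where dmesg is readable):

# show current memory headroom on the agent
free -h
# look for recent OOM-killer activity
sudo dmesg -T | grep -iE 'out of memory|oom-killer' | tail -n 20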

Any plans in the future to throttle it without breaking it?
