aws_route -> Error: error reading Route for Route Table (rtb-xxx) with destination (xx.x.x.x/xx) to become available: couldn't find resource (still) #19985

Closed
alewando opened this issue Jun 28, 2021 · 14 comments · Fixed by #21161
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.

@alewando
Contributor

alewando commented Jun 28, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform version 0.14.11
AWS provider version 3.47.0

Affected Resource(s)

  • aws_route

Terraform Configuration Files

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block = "10.10.0.0/16"
}

resource "aws_subnet" "sn" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.10.1.0/24"
}

resource "aws_network_interface" "test" {
  subnet_id   = aws_subnet.sn.id
  private_ips = ["10.10.1.100"]
}

resource "aws_route_table" "route_table" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route" "route" {
  route_table_id         = aws_route_table.route_table.id
  destination_cidr_block = "10.11.0.0/16"
  network_interface_id   = aws_network_interface.test.id

  timeouts {
    create = "5m"
    delete = "5m"
  }
}

Expected Behavior

Route is created successfully and terraform apply completes without issue.

Actual Behavior

Terraform failed with:

module.common_vpc_east.aws_route_table.rt-private-1: Creation complete after 0s [id=rtb-0344e534dd600116a]
module.common_vpc_peering.aws_route.route_east_to_west_peering[0]: Creating...
Error: error waiting for Route in Route Table (rtb-0344e534dd600116a) with destination (xx.xx.xx.xx/xx) to become available: couldn't find resource (21 retries)

  on ../modules/vpc-peering/main.tf line 60, in resource "aws_route" "route_east_1_to_west_1_peering":
  60: resource "aws_route" "route_east_1_to_west_1_peering" {

Steps to Reproduce

19985-recreate.zip

  1. Unzip the attached file
  2. Set AWS credentials
  3. Run recreate.sh - repeatedly creates and destroys resources until it encounters a failure.

Important Factoids

The create timeout value (which has been suggested as a mitigation to prior eventual consistency errors with aws_route) does not seem to be honored (at least not for these post-creation retrievals). I'm not sure if waiting longer would be beneficial or not, though.

References

A possible fix for this went in with #19426, but we are still seeing it with a version of AWS provider (3.47.0) that has the fix applied.

Similar issues:

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ec2 Issues and PRs that pertain to the ec2 service. labels Jun 28, 2021
@circa10a

We're getting a similar error with v3.47.0:

Error: error reading Route Table Association (rtbassoc-06b72ef7b55c8264b): Empty result

  on ./modules/aws/terraform-aws-vpc/main.tf line 985, in resource "aws_route_table_association" "public":
 985: resource "aws_route_table_association" "public" {

@matthewmann-RL

matthewmann-RL commented Jun 28, 2021

We are also getting a similar error with v3.47.0 when destroying a route table association:

Error: error waiting for Route Table Association (rtbassoc-0a5fee8e3d82348fb) delete: unexpected state 'associated', wanted target ''. last error: %!!(MISSING)s(<nil>)

but if you look in the console, the route table was successfully disassociated.

@alewando alewando changed the title aws_route -> Error: error reading Route for Route Table (rtb-xxx) with destination (xx.x.x.x/xx): couldn't find resource (still) aws_route -> Error: error reading Route for Route Table (rtb-xxx) with destination (xx.x.x.x/xx) to become available: couldn't find resource (still) Jun 29, 2021
@ewbankkit ewbankkit added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Jun 29, 2021
@gchristidis
Contributor

gchristidis commented Jun 30, 2021

We are using version 3.47 and getting this error during destroy:

Error: error waiting for Route Table Association (rtbassoc-0081aefcad37ea14a) delete: unexpected state 'associated', wanted target ''. last error: %!s(<nil>)

The resource is removed from AWS but not from the state file, and Terraform exits with the error. Running destroy again refreshes the route table associations, finds that they are gone, and continues destroying the rest of the VPC resources.

This has happened multiple times.

@alewando
Contributor Author

I updated the description above to include a zip file containing a TF file and shell script to reproduce the errors I am seeing. It just repeatedly creates and destroys the TF context until it has a problem.

It sounds like people are having issues with more than just aws_route, but it should be relatively simple to adjust as needed to reproduce their specific issue.

@wpbeckwith

We are also getting a similar error with v3.47.0 when destroying a route table association:

Error: error waiting for Route Table Association (rtbassoc-0a5fee8e3d82348fb) delete: unexpected state 'associated', wanted target ''. last error: %!!(MISSING)s(<nil>)

but if you look in the console, the route table was successfully disassociated.

This is becoming a real issue as it causes the destroy to bail, leaving a lot of VPC resources around to be cleaned up. When doing things manually, it can be fixed by rerunning terraform destroy, but in automated IaC testing the tests can't be rerun, as they are designed to create unique envs each time. Therefore, someone has to go and manually clean up the env left behind.

@rogerscuall

We are also getting a similar error with v3.47.0 when destroying a route table association:

Error: error waiting for Route Table Association (rtbassoc-0a5fee8e3d82348fb) delete: unexpected state 'associated', wanted target ''. last error: %!!(MISSING)s(<nil>)

but if you look in the console, the route table was successfully disassociated.

This is becoming a real issue as it causes the destroy to bail, leaving a lot of VPC resources around to be cleaned up. When doing things manually, it can be fixed by rerunning terraform destroy, but in automated IaC testing the tests can't be rerun, as they are designed to create unique envs each time. Therefore, someone has to go and manually clean up the env left behind.

That is exactly our problem too; we had to remove the testing because it was leaving behind a lot of stuff.

@wpbeckwith

wpbeckwith commented Jul 6, 2021

This is becoming a real issue as it causes the destroy to bail, leaving a lot of VPC resources around to be cleaned up. When doing things manually, it can be fixed by rerunning terraform destroy, but in automated IaC testing the tests can't be rerun, as they are designed to create unique envs each time. Therefore, someone has to go and manually clean up the env left behind.

That is exactly our problem too; we had to remove the testing because it was leaving behind a lot of stuff.

As a hack, we added a second terraform destroy step to our IaC tests. If the issue happens, the second destroy cleans up what's left; if it doesn't, the extra destroy adds about 5s to the tests while it figures out that it has nothing to do.

This fixes the cleanup, but the first destroy failing still marks the test as a failure, and there seems to be no way to undo that.
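
One way to keep a failed first destroy from failing the whole test might be to wrap both attempts in a single step that only fails when every attempt fails. The Go sketch below assumes that approach. `destroyWithRetry`, `runner`, and `execRunner` are hypothetical names, not part of Terraform or the provider; the only real command involved is `terraform destroy -auto-approve`. The `main` function uses a simulated runner so the sketch runs without Terraform installed.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// runner executes one command and returns nil on success.
// It is injectable so the retry logic can be exercised without Terraform.
type runner func(name string, args ...string) error

// destroyWithRetry runs "terraform destroy -auto-approve" up to attempts
// times and stops at the first success. It returns the number of attempts
// used and the last error (nil if any attempt succeeded).
func destroyWithRetry(run runner, attempts int) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = run("terraform", "destroy", "-auto-approve"); err == nil {
			return i, nil
		}
		fmt.Fprintf(os.Stderr, "destroy attempt %d failed: %v\n", i, err)
	}
	return attempts, err
}

// execRunner runs the command for real, streaming its output.
func execRunner(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Simulated runner (assumption for illustration): the first destroy
	// fails the way the thread describes, the second one succeeds.
	calls := 0
	flaky := func(name string, args ...string) error {
		calls++
		if calls == 1 {
			return errors.New("simulated: unexpected state 'associated'")
		}
		return nil
	}
	used, err := destroyWithRetry(flaky, 2)
	fmt.Println(used, err)
}
```

In a real test harness, `execRunner` would be passed in place of the simulated runner, and the step would report failure only when `destroyWithRetry` returns a non-nil error.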

@circa10a

circa10a commented Jul 22, 2021

@obourdon does your fix in #20265 also close this issue?

Edit: ah so it looks like the fix only fixes destroys, not creates

Cc @ewbankkit

@jpke

jpke commented Aug 5, 2021

Also sometimes seeing something similar when creating aws_route_table_association.

module.hub.aws_route_table_association.main["Gateway_1a"]: Still creating... [2m30s elapsed]
╷
│ Error: error waiting for Route Table Association (rtbassoc-0ef96a2816d891444) create: couldn't find resource (21 retries)
│ 
│   with module.hub.aws_route_table_association.main["Gateway_1a"],
│   on .terraform/modules/hub/vpc.tf line 84, in resource "aws_route_table_association" "main":
│   84: resource "aws_route_table_association" "main"  {

The association exists in the console and matches when queried via the cli:

            "Associations": [
                {
                    "Main": false,
                    "RouteTableAssociationId": "rtbassoc-0ef96a2816d891444",
                    "RouteTableId": "rtb-06b02a6183b44a729",
                    "SubnetId": "subnet-087bfe208f6d5eccc",
                    "AssociationState": {
                        "State": "associated"
                    }
                }

Strange thing is, other aws_route_table_association resources provision fine.

@jpke

jpke commented Sep 1, 2021

These issues seem to be transient. I can run the same Terraform configuration and it may succeed on the first try, require several retries to succeed, or never seem to succeed, and then complete successfully on the first try the next day. Maybe this points to a race condition?

I haven't dug into the AWS Terraform provider codebase much, so this is just a guess: could there be a bug in querying AWS during stateConf.WaitForState()? This method is called on all route table resources, which might explain why we see route tables, routes, and associations all affected. Maybe the first result is cached and not overwritten by subsequent calls?

outputRaw, err := stateConf.WaitForState()
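
For context, the `couldn't find resource (21 retries)` wording suggests a polling loop that gives up after a fixed number of consecutive "not found" reads rather than when the configured timeout expires. The Go sketch below is a simplified model of that kind of loop, not the provider's or SDK's actual code. `waitForState`, `refreshFunc`, and the limit of 20 not-found checks are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// refreshFunc reports the current state of a resource.
// found == false means the API did not return the resource at all.
type refreshFunc func() (state string, found bool, err error)

// errNotFound is returned when the resource stays invisible for too many reads.
var errNotFound = errors.New("couldn't find resource")

// errTimeout is returned when the target state is not reached in time.
var errTimeout = errors.New("timeout waiting for state")

// waitForState polls refresh until it reports target. It tolerates up to
// notFoundChecks consecutive "not found" results, then fails even if the
// timeout has not expired yet.
func waitForState(refresh refreshFunc, target string, notFoundChecks int, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	notFound := 0
	for {
		state, found, err := refresh()
		if err != nil {
			return err
		}
		switch {
		case !found:
			notFound++
			if notFound > notFoundChecks {
				return fmt.Errorf("%w (%d retries)", errNotFound, notFound)
			}
		case state == target:
			return nil
		default:
			notFound = 0 // resource is visible again; reset the counter
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated eventually consistent API: the resource is invisible
	// for the first three reads, then shows up as "active".
	calls := 0
	refresh := func() (string, bool, error) {
		calls++
		if calls <= 3 {
			return "", false, nil
		}
		return "active", true, nil
	}
	err := waitForState(refresh, "active", 20, 10*time.Millisecond, time.Second)
	fmt.Println(calls, err)
}
```

In a loop shaped like this, raising the timeout alone would not help when the not-found limit is hit first, which would be consistent with the observation that the `create` timeout does not seem to be honored.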

@stimmerman

We have the same problem, @jpke. I see these in the debug logging:

2021-09-10T13:56:52.852Z [INFO]  provider.terraform-provider-aws_v3.58.0_x5: 2021/09/10 13:56:52 [WARN] WaitForState timeout after 1m0s: timestamp=2021-09-10T13:56:52.851Z
2021-09-10T13:56:52.852Z [INFO]  provider.terraform-provider-aws_v3.58.0_x5: 2021/09/10 13:56:52 [WARN] WaitForState starting 30s refresh grace period: timestamp=2021-09-10T13:56:52.851Z
2021-09-10T13:57:22.858Z [INFO]  provider.terraform-provider-aws_v3.58.0_x5: 2021/09/10 13:57:22 [ERROR] WaitForState exceeded refresh grace period: timestamp=2021-09-10T13:57:22.858Z

Strange thing is that we only encounter this when running the plan on TF Agents inside the VPC. When we switch the workspace to 'remote', the plan finishes quickly.

Using the remote runner: 1m 18s
Using the agent in the VPC: 16m 49s (with these logs)

What did 'solve' it in our specific use case was enabling the EC2 interface endpoint in the VPC. Somehow it keeps the WaitForState timeout from being hit (maybe because of reduced latency?).

@circa10a

circa10a commented Oct 1, 2021

@ryancragun does your recent fix also resolve this issue?

@github-actions

github-actions bot commented Oct 8, 2021

This functionality has been released in v3.62.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

github-actions bot commented Jun 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2022