Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WaitForState() leaves Refresh() running in the background on timeout #530

Open
enool opened this issue Aug 11, 2020 · 1 comment
Open

WaitForState() leaves Refresh() running in the background on timeout #530

enool opened this issue Aug 11, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@enool
Copy link

enool commented Aug 11, 2020

SDK version

v1.13.1-1

Relevant provider source code

Taken from: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_iam_role.go#L152

var createResp *iam.CreateRoleOutput
err := resource.Retry(30*time.Second, func() *resource.RetryError {
    var err error
    createResp, err = iamconn.CreateRole(request)                <-- Has internally a retry loop, can block more then 30 seconds
    // IAM users (referenced in Principal field of assume policy)
    // can take ~30 seconds to propagate in AWS
    if isAWSErr(err, "MalformedPolicyDocument", "Invalid principal in policy") {
        return resource.RetryableError(err)
    }
    return resource.NonRetryableError(err)
})
if isResourceTimeoutError(err) {                                <-- Goroutine started in Retry (WaitForState) can still be running
    createResp, err = iamconn.CreateRole(request)               <-- Issues another blocking CreateRole
}

Debug Output

[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 0/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 1
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[WARN] WaitForState timeout after 30s
[WARN] WaitForState starting 30s refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 1/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 2
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[ERROR] WaitForState exceeded refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:

Expected Behavior

resource.Retry() blocks until callback finishes - even on timeout

Actual Behavior

Callback is still running when timeout happens. For example, AWS provider issues another, parallel, CreateRole(). In many cases this results double creation attempt, and eventually a failure in the plugin.

Error: Error creating IAM Role hello-world-ssm_role: EntityAlreadyExists: Role with name hello-world-ssm_role already exists.
status code: 409, request id: removed

on main.tf line 18, in resource "aws_iam_role" "ssm_role":
18: resource "aws_iam_role" "ssm_role"

References

We have been running a crude patch ( #529 ) in production for a few weeks with good results.

@kconley-sq
Copy link

I believe I'm seeing this issue as well in the context of the AWS provider (v3.75.2, which I understand uses terraform-plugin-sdk v2.10.1) creating an EC2 instance:

07:01:16.405
2022-07-22T14:00:44.338Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:00:44 [DEBUG] Waiting for state to become: [success]: timestamp=2022-07-22T14:00:44.337Z
07:01:16.522
module.ec2_dev.aws_instance.default[0]: Still creating... [10s elapsed]
07:01:16.582
module.ec2_dev.aws_instance.default[0]: Still creating... [20s elapsed]
07:01:16.644
module.ec2_dev.aws_instance.default[0]: Still creating... [30s elapsed]
07:01:23.998
module.ec2_dev.aws_instance.default[0]: Still creating... [40s elapsed]
07:01:33.999
module.ec2_dev.aws_instance.default[0]: Still creating... [50s elapsed]
07:01:44.000
module.ec2_dev.aws_instance.default[0]: Still creating... [1m0s elapsed]
07:01:54.000
module.ec2_dev.aws_instance.default[0]: Still creating... [1m10s elapsed]
07:02:04.001
module.ec2_dev.aws_instance.default[0]: Still creating... [1m20s elapsed]
07:02:14.002
module.ec2_dev.aws_instance.default[0]: Still creating... [1m30s elapsed]
07:02:24.003
module.ec2_dev.aws_instance.default[0]: Still creating... [1m40s elapsed]
07:02:34.004
module.ec2_dev.aws_instance.default[0]: Still creating... [1m50s elapsed]
07:02:44.004
module.ec2_dev.aws_instance.default[0]: Still creating... [2m0s elapsed]
07:02:44.339
2022-07-22T14:02:44.339Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:02:44 [WARN] WaitForState timeout after 2m0s: timestamp=2022-07-22T14:02:44.338Z
07:02:44.417
2022-07-22T14:02:44.339Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:02:44 [WARN] WaitForState starting 30s refresh grace period: timestamp=2022-07-22T14:02:44.339Z
07:02:54.006
module.ec2_dev.aws_instance.default[0]: Still creating... [2m10s elapsed]
07:03:04.006
module.ec2_dev.aws_instance.default[0]: Still creating... [2m20s elapsed]
07:03:14.008
module.ec2_dev.aws_instance.default[0]: Still creating... [2m30s elapsed]
07:03:14.339
2022-07-22T14:03:14.339Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:03:14 [ERROR] WaitForState exceeded refresh grace period: timestamp=2022-07-22T14:03:14.339Z
07:03:15.789
2022-07-22T14:03:15.789Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:03:15 [INFO] Instance ID: ELIDED: timestamp=2022-07-22T14:03:15.789Z
07:03:15.850
2022-07-22T14:03:15.789Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:03:15 [DEBUG] Waiting for instance (ELIDED) to become running: timestamp=2022-07-22T14:03:15.789Z
07:03:15.966
2022-07-22T14:03:15.789Z [INFO] provider.terraform-provider-aws_v3.75.2_x5: 2022/07/22 14:03:15 [DEBUG] Waiting for state to become: [running]: timestamp=2022-07-22T14:03:15.789Z
07:03:24.010
module.ec2_dev.aws_instance.default[0]: Still creating... [2m40s elapsed]
07:03:27.194
module.ec2_dev.aws_instance.default[0]: Creation complete after 2m45s [id=ELIDED]

From this Terraform run we saw that two EC2 instances were created in our AWS console. 😮 I think that indicates that an EC2 instance creation request was still running when the WaitForState timeout occurred and then a second EC2 instance creation request was submitted, and then both of those requests ultimately succeeded.

I'm hoping that #828 (which I understand copies #529 but fixes the unit tests) can be reviewed soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants