Openstack creating secgroup timeout #8819
Comments
I am assuming this is a timeout issue, since the Create function (https://github.com/hashicorp/terraform/blob/master/builtin/providers/openstack/resource_openstack_compute_secgroup_v2.go#L96) of the secgroup resource is missing a stateConf like the one in the secgroup Delete function (https://github.com/hashicorp/terraform/blob/master/builtin/providers/openstack/resource_openstack_compute_secgroup_v2.go#L225). I could not find where it might be pulling in a default timeout, and the error does not point to a timeout either. The OpenStack logs show the pipe breaking and nothing else telling.
It does sound like a timeout of some sort, but I agree that the error doesn't indicate one. I've run into these EOFs before; I'll have to check past issues to refresh my memory.
Can you elaborate on this? If you're creating five security groups, maybe the Nova API endpoint is hitting a bottleneck. Can you try creating 1 group, then 2, and so on, and see if there's a consistent number at which creation fails? Can you also try running
I have a complex Terraform file that I use to stand up a Mesos cluster. Once we hit the error, I created a simple Terraform file for testing so that we could narrow down the issue. Creating one security group succeeds, but creating five does not. I used the same security group settings shown above, renaming each one with an increasing number. Running the same test file from my debug output with parallelism was successful. Also note that when Terraform fails, OpenStack shows the security groups as created, but Terraform has no state for them and will not destroy them.
OK, so this definitely sounds like the Nova API endpoint is being overloaded. But...
Yes, this is caused by the resource not safely checking whether the security group was actually created. Definitely a bug. Thanks for reporting this. :)
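One way to "safely check" after a failed create is to look the group up by name before reporting an error, so a group that was created despite a dropped response can still be tracked in state. This sketch assumes a hypothetical `secGroupAPI` interface with `Create` and `ListByName` methods; it is not the gophercloud or provider API, and `flakyAPI` is a fake used only to simulate the behavior described in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// SecGroup and secGroupAPI are hypothetical types for illustration only.
type SecGroup struct {
	ID   string
	Name string
}

type secGroupAPI interface {
	Create(name string) (*SecGroup, error)
	ListByName(name string) ([]SecGroup, error)
}

// createChecked creates a group; if Create fails, it looks the group up by
// name and adopts it only when exactly one group with that name exists.
func createChecked(api secGroupAPI, name string) (*SecGroup, error) {
	sg, createErr := api.Create(name)
	if createErr == nil {
		return sg, nil
	}
	found, err := api.ListByName(name)
	if err != nil {
		return nil, fmt.Errorf("create failed (%v) and lookup failed: %w", createErr, err)
	}
	if len(found) == 1 {
		return &found[0], nil // created despite the error
	}
	return nil, fmt.Errorf("creating security group %q: %w", name, createErr)
}

// flakyAPI simulates an endpoint that creates the group but drops the
// connection before returning a response.
type flakyAPI struct{ groups []SecGroup }

func (f *flakyAPI) Create(name string) (*SecGroup, error) {
	f.groups = append(f.groups, SecGroup{ID: "id-1", Name: name})
	return nil, errors.New("EOF")
}

func (f *flakyAPI) ListByName(name string) ([]SecGroup, error) {
	var out []SecGroup
	for _, g := range f.groups {
		if g.Name == name {
			out = append(out, g)
		}
	}
	return out, nil
}

func main() {
	sg, err := createChecked(&flakyAPI{}, "secgroup_1")
	fmt.Println(sg.ID, err) // id-1 <nil>
}
```

Security group names may not be unique, so the sketch refuses to adopt a match when more than one group shares the name; a real fix would need a more reliable way to tell a new group from a pre-existing one.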
Ah, I found where I've run into the EOF / secgroup error before: So it sounds like in this case, even though the EOF is happening, the security group is still being created? I wonder if it's safe enough to pass on EOF in this case... Are you able to alter the security group resource to pass on the EOF, the way it's done in the instance resource, then build from source and test? It would definitely help, since you have an environment that can easily trigger this. No big deal if not.
I'll give it my best attempt, but I make no promises, as I am not proficient in Go, though I have been wanting to learn it. Trial by fire, perhaps?
Sounds like a plan :) But do let me know if you aren't able to get the patch in place. I can make one up for you to test in the next day or so, and I'd even be happy to compile a Linux binary so you can skip that part, too.
The error appears to be raised here (https://github.com/hashicorp/terraform/blob/master/builtin/providers/openstack/resource_openstack_compute_secgroup_v2.go#L116). Changing that if statement to:
Still results in the same error. I have tried adding more from the block you provided, but I am still not getting through.
That's interesting... the crash/error output still points to that same line? Try adding this:

    fmt.Printf("[DEBUG] foobar error output: %#v", err)

Then search for "foobar" for easy grepping and see what the err looks like in its entirety.
That was a great idea:
We think we have discovered that the 30s timeout is coming from Nova, as you had suggested. We are making a change tonight to see if that gets us through, and I will post a status update tomorrow.
After our change, we are still seeing the same issue. We had found that Nova has a url_timeout=30 setting, which matched the 30s timeout we saw on the Terraform side (https://access.redhat.com/solutions/2150241). Even after raising that timeout to 60s, Terraform still failed at 30s while creating the security groups. After the change I no longer see the 403 error, only the other error mentioned:
Darn. Is there a helpful error message if you do:

    log.Printf("%s", err.Error())
No extra info in that error message:
To make sure I am coding this the way you intend, here is what I have at line 116:

    if err != nil && err.Error() != "EOF" {
        log.Printf("[DEBUG] foobar error output: %s", err.Error())
        return fmt.Errorf("Error creating OpenStack security group: %s", err)
    }
That looks correct to me. :)
Oh, hold on. Is it correct to say the entire error message is:
And not simply
? If so, try:

    eof := strings.Contains("EOF", err.Error())
    if err != nil && !eof {
I was able to use your code to get around the EOF error; I just had to swap the arguments in the Contains call:
However, Terraform then panicked. Before posting, I did one more build and added another log line after the if statement we were breaking on, to see whether we got any further, and we do appear to.
Oops, right. The crash log reports that it's making it to line 120. It's possible that the … I think the correct fix here might be to add a StateChangeConf to Create. I can take a look at doing that. Another possible workaround is defining the security groups via the Neutron API; maybe that'll help sort things out?
Actually, scratch that. The error is happening in Create, before one can even check on the status of the create request. This still sounds like something funny is going on with the Nova API, especially if running the requests one at a time works. Definitely check the API logs, and also see if switching to the … Keep me in the loop, though; hopefully we can get this one resolved. :)
I agree; we actually found the bug that is causing our slowness. Because OpenStack is slow, we then hit timeouts throughout the stack (HAProxy instances, APIs), all at their default of 30s. That is my current working theory. I thought Terraform was timing out because I could not track down where it got its timeouts from, and just assumed a 30s default was set somewhere. Unless you want to keep this open to track safely checking the create, I am fine with closing this out. I can also open a separate, more specific issue for the safe check.
Let's keep this one open. Thank you for all of your help with this!
No, no, thank YOU for all your help :)
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
When creating multiple security groups with the OpenStack provider, I ran into an issue where Terraform would fail after 30 seconds.
Affected Resource(s)
Terraform Configuration Files
My test file creates 5 security groups; I just incremented the number in each name.
Debug Output
https://gist.github.com/ChiefAlexander/87875d431c5eaeedee699c8340ba47cc
Expected Behavior
Terraform should finish creating the security groups without exiting.
Actual Behavior
Terraform quits after 30s of trying to create the security groups
Steps to Reproduce
terraform apply
Important Factoids
Our OpenStack deployment has a known issue of being slow to create security groups.