The provider is slow when creating/destroying resources #929

zhenrong-wang · 2023-11-27T11:18:17Z

Hi OpenTofu developers,

I am switching my workload from Terraform to OpenTofu. With OpenTofu 1.6.0-alpha5 and the aliyun cloud provider 1.213.0, the provider is really slow to orchestrate resources.

I enabled DEBUG logging, and the following messages stream out continuously. It usually takes ~100 seconds before resource creation or destruction starts.

Is there anything wrong with the provider or with OpenTofu? Thanks a lot!

2023-11-27T19:09:42.665+0800 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 pid=30472
2023-11-27T19:09:42.665+0800 [DEBUG] provider: plugin exited
2023-11-27T19:09:42.665+0800 [DEBUG] created provider logger: level=debug
2023-11-27T19:09:42.665+0800 [INFO]  provider: configuring client automatic mTLS
2023-11-27T19:09:42.675+0800 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 args=[".terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0"]
2023-11-27T19:09:42.676+0800 [DEBUG] provider: plugin started: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 pid=30479
2023-11-27T19:09:42.676+0800 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0
2023-11-27T19:09:42.762+0800 [INFO]  provider.terraform-provider-alicloud_v1.213.0: configuring server automatic mTLS: timestamp="2023-11-27T19:09:42.761+0800"
2023-11-27T19:09:42.793+0800 [DEBUG] provider.terraform-provider-alicloud_v1.213.0: plugin address: address=/tmp/plugin3192086639 network=unix timestamp="2023-11-27T19:09:42.793+0800"
2023-11-27T19:09:42.793+0800 [DEBUG] provider: using plugin: version=5
2023-11-27T19:09:42.925+0800 [DEBUG] No provider meta schema returned
2023-11-27T19:09:43.056+0800 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-11-27T19:09:43.059+0800 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 pid=30479
2023-11-27T19:09:43.059+0800 [DEBUG] provider: plugin exited
2023-11-27T19:09:43.059+0800 [DEBUG] created provider logger: level=debug
2023-11-27T19:09:43.060+0800 [INFO]  provider: configuring client automatic mTLS
2023-11-27T19:09:43.068+0800 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 args=[".terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0"]
2023-11-27T19:09:43.069+0800 [DEBUG] provider: plugin started: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0 pid=30486
2023-11-27T19:09:43.069+0800 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.opentofu.org/aliyun/alicloud/1.213.0/linux_amd64/terraform-provider-alicloud_v1.213.0
2023-11-27T19:09:43.158+0800 [INFO]  provider.terraform-provider-alicloud_v1.213.0: configuring server automatic mTLS: timestamp="2023-11-27T19:09:43.158+0800"
2023-11-27T19:09:43.188+0800 [DEBUG] provider: using plugin: version=5
2023-11-27T19:09:43.189+0800 [DEBUG] provider.terraform-provider-alicloud_v1.213.0: plugin address: address=/tmp/plugin1095469596 network=unix timestamp="2023-11-27T19:09:43.188+0800"
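The repeating start/exit cycle in the excerpt above can be quantified with a small script. What follows is a minimal Python sketch, not part of any tool mentioned in this thread. It assumes the DEBUG output has been saved to a file and that every provider launch emits a "provider: starting plugin" line in the format shown above. The function name summarize_plugin_starts is hypothetical.

```python
import re
from datetime import datetime

# Matches launch lines in the format shown in the excerpt above, e.g.
# 2023-11-27T19:09:42.675+0800 [DEBUG] provider: starting plugin: path=... args=[...]
START_RE = re.compile(r"^(\S+) \[DEBUG\] provider: starting plugin: path=(\S+)")


def summarize_plugin_starts(lines):
    """Count provider launches per binary path and measure the time they span.

    `lines` is any iterable of log lines (an open file works).
    Returns (counts_by_path, seconds_between_first_and_last_launch).
    """
    counts = {}
    stamps = []
    for line in lines:
        match = START_RE.match(line)
        if match is None:
            continue  # not a launch line
        stamp, path = match.groups()
        counts[path] = counts.get(path, 0) + 1
        stamps.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, span
```

Passing an open log file to summarize_plugin_starts gives the number of launches per provider binary. A count far larger than the number of configured providers would suggest that the provider is being restarted over and over rather than reused.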


kislerdm · 2023-11-27T11:59:31Z

@zhenrong-wang Hey Zhenrong! Thanks for raising the issue.

> It usually takes ~100 seconds before the start of creating/destroying resources.

My guess is that the refresh step accounts for most of that duration. That is expected for large state files, because many API calls must be made over the network to detect drift between the state file and the real infrastructure.

Could you please share the following:

1. What was the typical or expected duration of this operation for this specific infrastructure configuration before? If you previously used Terraform, which Terraform version and which provider version did you use?
2. How does the refresh time change if you rerun the plan operation?
3. Could you please share the output of the following commands:
- TF_LOG=TRACE tofu init
- TF_LOG=TRACE tofu plan
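
When the commands are launched from a wrapper rather than a shell, the same TRACE output can be captured by setting the logging environment variables on the child process. This is a minimal Python sketch under stated assumptions: the helper names trace_env and run_with_trace are hypothetical, the log file names are placeholders, and it relies only on the TF_LOG and TF_LOG_PATH environment variables.

```python
import os
import subprocess


def trace_env(log_path, base_env=None):
    """Return a copy of the environment with TRACE logging written to log_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_LOG"] = "TRACE"
    env["TF_LOG_PATH"] = log_path  # log output is appended to this file
    return env


def run_with_trace(binary, subcommand, log_path):
    """Run e.g. `tofu plan` with its TRACE log captured in a dedicated file."""
    return subprocess.run([binary, subcommand], env=trace_env(log_path), check=False)
```

For example, run_with_trace("tofu", "init", "init-trace.log") followed by run_with_trace("tofu", "plan", "plan-trace.log") would give one log file per command, which makes the two runs easier to compare.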

Thanks!

zhenrong-wang · 2023-11-27T13:02:56Z

Thanks @kislerdm for your reply.

Here is my project HPC-NOW. It depends on Terraform or openTofu to orchestrate cloud resources.

1. Versions:

openTofu - 1.6.0-alpha5, with the provider terraform-provider-alicloud_1.213.0_linux_amd64.zip
Terraform - 1.6.2, with the provider terraform-provider-alicloud_1.203.0_linux_amd64.zip

2. How I ran the tests:

Instead of running the terraform/tofu commands directly, the HPC-NOW project runs them through a "wrapper" in the hpcopr CLI. I therefore built 2 versions of the hpcopr CLI: one uses OpenTofu, the other uses Terraform. The provider stays the same.

I didn't wrap the plan command of tofu/terraform into the hpcopr CLI, only init, apply and destroy.

Then I ran hpcopr init -b and hpcopr destroy -b with both versions of the CLI. Both should create and destroy the same resource stack in the cloud (aliyun).

hpcopr init -b covers both the terraform/tofu init and the terraform/tofu apply steps.
hpcopr destroy -b is equivalent to terraform/tofu destroy.

3. The Results:

a. hpcopr with openTofu:

Creating a stack took 440 seconds.

Destroying the stack took 380 seconds.

b. hpcopr with Terraform:

Creating a stack took 161 seconds.

Destroying the stack took 108 seconds.

4. The DEBUG logs

The logs of the creating and destroying processes are saved in a single file. Sorry, I forgot to rename the first log after creating the stack, so the destroy run appended its logs to the same file.

hpcopr with openTofu

tofu.log

hpcopr with Terraform
terraform.log

5. Summary:

Creating a stack: openTofu - 440 seconds, Terraform - 168 seconds
Destroying a stack: openTofu - 380 seconds, Terraform - 108 seconds
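
For a rough sense of scale, the summary figures can be turned into slowdown ratios. This is a minimal Python sketch using only the numbers from the two summary lines above. Note that the Terraform creation time appears as 161 seconds in the results section and as 168 seconds in the summary, and the sketch uses the summary value. The function name slowdown is hypothetical.

```python
def slowdown(tofu_seconds, terraform_seconds):
    """Return how many times longer the OpenTofu run took than the Terraform run."""
    return tofu_seconds / terraform_seconds


# (OpenTofu seconds, Terraform seconds), taken from the summary lines above.
timings = {"create": (440, 168), "destroy": (380, 108)}

for operation, (tofu_s, terraform_s) in timings.items():
    print(f"{operation}: {slowdown(tofu_s, terraform_s):.2f}x slower with OpenTofu")
```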

My local network remained unchanged during the 2 tests. From the DEBUG log of openTofu, it seems the provider doesn't work smoothly as expected.

Looking forward to your support, thanks so much!

Zhenrong

kislerdm · 2023-11-27T13:14:12Z

Hey Zhenrong! Thanks for your prompt reply!

Do I understand correctly that the provider version used with OpenTofu differs from the one used with Terraform? If so, could you please rerun your flow using identical provider versions, e.g. "terraform-provider-alicloud_1.203.0_linux_amd64.zip" in both cases? Thanks!

zhenrong-wang · 2023-11-27T13:17:53Z

Hi @kislerdm,

I upgraded the provider from 1.203.0 to the latest version (1.213.0) for OpenTofu because the same behavior (low speed and a ~100-second wait before starting) also occurs with version 1.203.0.

Sure, I can rerun the test, but the result will probably be the same. Please give me a few minutes.

zhenrong-wang · 2023-11-27T14:04:50Z

Hi @kislerdm
Here are the results with provider version 1.203.0:

Creating a stack took 446 seconds.
Destroying the stack took 338 seconds.
Logs are here.

tofu-1.203.0-provider.log

Thanks a lot!

kislerdm · 2023-11-27T14:20:33Z

@zhenrong-wang Hey! Thanks a lot for sharing the details!

Could you please share the logs at the TRACE verbosity level (as requested in the first comment) to help us identify the root cause, i.e. the output of TF_LOG=trace tofu apply? Thanks!

zhenrong-wang · 2023-11-27T15:05:46Z

Sure, let me modify the wrapper to generate a new set of logs.

To keep the results comparable, I will keep using provider version 1.203.0.

Please give me a few minutes.

zhenrong-wang · 2023-11-27T15:45:20Z

Hi @kislerdm

I just ran the test again. Creating the stack took 408 seconds, and destroying it took 405 seconds.

Here are the logs. I generated 2 files. Please take a look.

tofu-trace-creating-1.203.0-provider.log

tofu-trace-destroying-1.203.0-provider.log

Hope it helps. Thanks!

kislerdm · 2023-11-27T16:40:20Z

Hi @zhenrong-wang! Thanks a lot for your collaboration!

Could you please confirm that your setup was identical except for the binary you executed, i.e. tofu vs. terraform? In other words, did you run the tofu/terraform apply/destroy commands on the same machine in the same VPC/availability zone/region?

Also, would it be possible to share the logs after running the TF_LOG=trace terraform apply command, so we could compare tofu vs. terraform side-by-side? Thanks!

For context, we have a couple of guesses that we would like to verify, but reproducing the issue would take us extra time because we have no experience with the hpcopr "wrapper" you used, nor do we have much experience with Alibaba Cloud.

Thank you very much for your support! 🙏🏻

zhenrong-wang · 2023-11-27T16:52:28Z

Sure, I will run the terraform version and post the log here.

One thing is for sure: only the binary (executable) changed. Everything else remained unchanged.

As for the "wrapper", it is just the way the hpcopr CLI runs the terraform/tofu commands. It is no different from running terraform/tofu directly.

Please wait for my Terraform log.

Thanks!

zhenrong-wang · 2023-11-27T17:38:22Z

terraform-trace-creating-1.203.0-provider.log
terraform-trace-destroying-1.203.0-provider.log

@kislerdm Please check these out. This time with Terraform, destroying the stack took 240 seconds, a bit longer than before, while creating it took 137 seconds.

Here is how hpcopr runs terraform/tofu. I used -parallelism=1000 to ensure high concurrency.

kislerdm · 2023-11-27T18:12:49Z

@zhenrong-wang Thank you very much for supporting us with additional details - we appreciate your contribution a lot! We will continue digging into the issue on our side tomorrow, and will keep you posted. Thanks!

zhenrong-wang · 2023-12-04T10:46:28Z

Hi @kislerdm

I just tested 1.6.0-beta1 with the same version of the alicloud provider. The problem still seems to be there: it took 410 seconds to create the stack.

Yantrio · 2023-12-04T12:31:35Z

Hi @zhenrong-wang, the proposed fix for this (#954) did not make it into beta1. Beta2 is coming really soon (hopefully within the next hour) which should resolve this issue for you.

Thanks for your patience on this one.

zhenrong-wang · 2023-12-04T12:35:13Z

Thanks for your reply. Looking forward to the next beta version.

cube2222 · 2023-12-04T13:36:07Z

This should be fixed @zhenrong-wang. I'll be closing this issue, but feel free to reopen if you hit any issues please.

zhenrong-wang · 2023-12-05T02:17:03Z

Thanks, OpenTofu team! With the beta2 version, the problem reported in this issue is resolved.

Creating an HPC stack in Alicloud is now as smooth as with Terraform.

Fantastic job! We are one step closer to switching to OpenTofu.

@cube2222

kislerdm added the "question" label (Further information is requested) Nov 27, 2023

Yantrio assigned kislerdm Nov 27, 2023

zhenrong-wang changed the title ~~The provider is slow to creating/destroying resources~~ The provider is slow when creating/destroying resources Nov 27, 2023

zhenrong-wang mentioned this issue Nov 28, 2023

[FAILED] Failed to run aws-provider-4.64.0 with openTofu #930

Closed

cube2222 added this to the First Stable Release (1.6.0) milestone Nov 29, 2023

This was referenced Nov 30, 2023

The execution of tofu plan and tofu apply becomes stuck #944

Closed

Fix global schema caching #954

Merged

cube2222 closed this as completed Dec 4, 2023
