Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

Comments

@danjamesmay
Copy link

danjamesmay commented Oct 26, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.3.3
on linux_amd64

Affected Resource(s)

google_monitoring_monitored_project

Terraform Configuration Files

locals {
    projects = jsondecode(file("projects.json"))
}

resource "google_monitoring_monitored_project" "project" {
    for_each =  toset(local.projects)

    metrics_scope = "my-scoping-project"
    name          = each.value
}

projects.json is a simple json array with project ids

[
  "my-scoped-project"
]

Debug Output

Snippet of output:
https://gist.github.com/danjamesmay/ce3e049831e6f6bb323964717d9f02b6

Panic Output

n/a

Expected Behavior

It appears the Google API supports full CRUD methods for interacting with monitored projects, via project ID or project number. The resource code seems to be doing a lot more than just interacting with this API.

API:

Actual Behavior

Thousands upon thousands of calls to cloudresourcemanager resulting in rate limiting and very slow time to update state when reading and writing changes to the google_monitoring_monitored_project resource.

Steps to Reproduce

When a project has 150+ projects listed, the time it takes for Terraform to check the state of each monitored project becomes slower and slower as rate limiting and exponential backoff limits Terraform's ability to work properly.

  1. terraform plan

Important Factoids

Nothing I can think of out of the ordinary.

References

b/275113183

@danjamesmay danjamesmay changed the title google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api Nov 2, 2022
@melinath melinath removed their assignment Nov 2, 2022
@danjamesmay
Copy link
Author

I found a potentially useful line in the debug output:

2022-11-14T13:26:15.539Z [WARN]  Provider "registry.terraform.io/hashicorp/google" produced an invalid plan for google_monitoring_monitored_project.project["<redacted>"], but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .timeouts: planned for existence but config wants absence

@danjamesmay
Copy link
Author

danjamesmay commented Nov 14, 2022

I set parrelelism=1 in the plan options to serialise the projects as Terraform tries to refresh the state of each monitored project.

Looks like it's doing a call to GET monitoring.googleapis.com/v1/locations/global/metricsScopes/<scoping_project>?alt=json HTTP/1.1 first, then iterating through every single item in the response to do a lookup to GET cloudresourcemanager.googleapis.com/v1/projects/<for_each_key>?alt=json HTTP/1.1 for each key in the resource: google_monitoring_monitored_project.project[<project_id>].

It then seems to call /v1/projects over and over again while it performs the lookup below on the results from /v1/locations/global/metricsScope. When it finds a match it continues to the next key.

2022-11-14T14:04:04.759Z [INFO]  provider.terraform-provider-google_v4.41.0_x5: 2022/11/14 14:04:04 [DEBUG][DCL INFO] Attempting to match <id from v1/projects call> with <id from /v1/locations/global/metricsSceopes>.: timestamp=2022-11-14T14:04:04.759Z

It seems to be a simple change to just store the results of each /v1/projects call in memory while it does the internal lookup, which would save all the thousands of calls and thus rate limiting.

The "time complexity" then increases for the amount of monitored projects as it has to loop through all of them in ascending order to match the individual project that Terraform is trying to refresh the state for. That then makes this really unwieldy when you have hundreds of projects.

For example say you have 100 monitored projects in state then to do a refresh where Terraform checks the state of each one that's 100*100 possible calls to /v1/projects, which then causes rate limiting. I could just set parallelism=1 but it then would take an extreme amount of time to work with this resource.

@etruong42
Copy link

etruong42 commented Aug 15, 2023

I found the issue. It appears to have been added June 2022 or before. However, the trail is extremely difficult to follow because of all the ways the code gets autogenerated. So it may have been added before June 2022, but I am not absolutely sure at this time because the autogenerated code is very hard to follow. For example, this 21 line diff pretty obviously didn't actually cause this 41k line diff

In this diff we can see "GetMonitoredProject" gets called, and within that definition, it can call CloudResourceManager for every monitored project.

I can't find the implementation of GetMonitoredProject, but we can see the same behavior in the current implementation

modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Sep 13, 2023
* Fix issue hashicorp#12883

Make one CloudResourceManager call per monitored_project terraform
resource rather than per (monitored_project * "sibling" projects)

* Add comment

* Correct build errors

* Fix build errors

* Add debug logs

* Fix printf

* Fix match between TF resource and API response

* Fix from non-matching to matching condition

* Correct documentation

* Correct documentation

* Fix go build issues

* Convert projectNumber to string

* Use correct string conversion

* Move comment

* Add debug logs to monitored project encoder

* correct delete_url

Signed-off-by: Modular Magician <magic-modules@google.com>
modular-magician added a commit that referenced this issue Sep 13, 2023
* Fix issue #12883

Make one CloudResourceManager call per monitored_project terraform
resource rather than per (monitored_project * "sibling" projects)

* Add comment

* Correct build errors

* Fix build errors

* Add debug logs

* Fix printf

* Fix match between TF resource and API response

* Fix from non-matching to matching condition

* Correct documentation

* Correct documentation

* Fix go build issues

* Convert projectNumber to string

* Use correct string conversion

* Move comment

* Add debug logs to monitored project encoder

* correct delete_url

Signed-off-by: Modular Magician <magic-modules@google.com>
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.