`google_monitoring_monitored_project` exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

danjamesmay · 2022-10-26T16:04:07Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.3.3
on linux_amd64

Affected Resource(s)

google_monitoring_monitored_project

Terraform Configuration Files

locals {
    projects = jsondecode(file("projects.json"))
}

resource "google_monitoring_monitored_project" "project" {
    for_each =  toset(local.projects)

    metrics_scope = "my-scoping-project"
    name          = each.value
}

projects.json is a simple json array with project ids

[
  "my-scoped-project"
]

Debug Output

Snippet of output:
https://gist.github.com/danjamesmay/ce3e049831e6f6bb323964717d9f02b6

Panic Output

n/a

Expected Behavior

It appears the Google API supports full CRUD methods for interacting with monitored projects, via project ID or project number. The resource code seems to be doing a lot more than just interacting with this API.

API:

Actual Behavior

Thousands upon thousands of calls to cloudresourcemanager resulting in rate limiting and very slow time to update state when reading and writing changes to the google_monitoring_monitored_project resource.

Steps to Reproduce

When a project has 150+ projects listed, the time it takes for Terraform to check the state of each monitored project becomes slower and slower as rate limiting and exponential backoff limits Terraform's ability to work properly.

terraform plan

Important Factoids

Nothing I can think of out of the ordinary.

References

b/275113183

The text was updated successfully, but these errors were encountered:

danjamesmay · 2022-11-14T13:29:01Z

I found a potentially useful line in the debug output:

2022-11-14T13:26:15.539Z [WARN]  Provider "registry.terraform.io/hashicorp/google" produced an invalid plan for google_monitoring_monitored_project.project["<redacted>"], but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .timeouts: planned for existence but config wants absence

danjamesmay · 2022-11-14T14:19:56Z

I set parrelelism=1 in the plan options to serialise the projects as Terraform tries to refresh the state of each monitored project.

Looks like it's doing a call to GET monitoring.googleapis.com/v1/locations/global/metricsScopes/<scoping_project>?alt=json HTTP/1.1 first, then iterating through every single item in the response to do a lookup to GET cloudresourcemanager.googleapis.com/v1/projects/<for_each_key>?alt=json HTTP/1.1 for each key in the resource: google_monitoring_monitored_project.project[<project_id>].

It then seems to call /v1/projects over and over again while it performs the lookup below on the results from /v1/locations/global/metricsScope. When it finds a match it continues to the next key.

2022-11-14T14:04:04.759Z [INFO]  provider.terraform-provider-google_v4.41.0_x5: 2022/11/14 14:04:04 [DEBUG][DCL INFO] Attempting to match <id from v1/projects call> with <id from /v1/locations/global/metricsSceopes>.: timestamp=2022-11-14T14:04:04.759Z

It seems to be a simple change to just store the results of each /v1/projects call in memory while it does the internal lookup, which would save all the thousands of calls and thus rate limiting.

The "time complexity" then increases for the amount of monitored projects as it has to loop through all of them in ascending order to match the individual project that Terraform is trying to refresh the state for. That then makes this really unwieldy when you have hundreds of projects.

For example say you have 100 monitored projects in state then to do a refresh where Terraform checks the state of each one that's 100*100 possible calls to /v1/projects, which then causes rate limiting. I could just set parallelism=1 but it then would take an extreme amount of time to work with this resource.

etruong42 · 2023-08-15T15:36:38Z

I found the issue. It appears to have been added June 2022 or before. However, the trail is extremely difficult to follow because of all the ways the code gets autogenerated. So it may have been added before June 2022, but I am not absolutely sure at this time because the autogenerated code is very hard to follow. For example, this 21 line diff pretty obviously didn't actually cause this 41k line diff

In this diff we can see "GetMonitoredProject" gets called, and within that definition, it can call CloudResourceManager for every monitored project.

I can't find the implementation of GetMonitoredProject, but we can see the same behavior in the current implementation

* Fix issue hashicorp#12883 Make one CloudResourceManager call per monitored_project terraform resource rather than per (monitored_project * "sibling" projects) * Add comment * Correct build errors * Fix build errors * Add debug logs * Fix printf * Fix match between TF resource and API response * Fix from non-matching to matching condition * Correct documentation * Correct documentation * Fix go build issues * Convert projectNumber to string * Use correct string conversion * Move comment * Add debug logs to monitored project encoder * correct delete_url Signed-off-by: Modular Magician <magic-modules@google.com>

* Fix issue #12883 Make one CloudResourceManager call per monitored_project terraform resource rather than per (monitored_project * "sibling" projects) * Add comment * Correct build errors * Fix build errors * Add debug logs * Fix printf * Fix match between TF resource and API response * Fix from non-matching to matching condition * Correct documentation * Correct documentation * Fix go build issues * Convert projectNumber to string * Use correct string conversion * Move comment * Add debug logs to monitored project encoder * correct delete_url Signed-off-by: Modular Magician <magic-modules@google.com>

github-actions · 2023-10-14T02:00:21Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

danjamesmay added the bug label Oct 26, 2022

edwardmedia assigned melinath Oct 26, 2022

danjamesmay changed the title ~~google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api~~ google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api Nov 2, 2022

melinath removed their assignment Nov 2, 2022

danjamesmay mentioned this issue Dec 1, 2022

feature request: parallelism parameter for resources with count hashicorp/terraform#14258

Open

melinath added forward/linked service/monitoring-services labels Mar 24, 2023

melinath mentioned this issue Sep 6, 2023

Fix issue #12883 GoogleCloudPlatform/magic-modules#8738

Merged

5 tasks

hao-nan-li closed this as completed in GoogleCloudPlatform/magic-modules#8738 Sep 13, 2023

This was referenced Sep 13, 2023

Fix issue #12883 hashicorp/terraform-provider-google-beta#6259

Merged

Fix issue #12883 #15828

Merged

modular-magician mentioned this issue Sep 13, 2023

Fix issue #12883 GoogleCloudPlatform/terraform-google-conversion#1425

Merged

5 tasks

github-actions bot locked as resolved and limited conversation to collaborators Oct 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`google_monitoring_monitored_project` exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

`google_monitoring_monitored_project` exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

danjamesmay commented Oct 26, 2022 •

edited by melinath

Loading

danjamesmay commented Nov 14, 2022

danjamesmay commented Nov 14, 2022 •

edited

Loading

etruong42 commented Aug 15, 2023 •

edited

Loading

github-actions bot commented Oct 14, 2023

google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

google_monitoring_monitored_project exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

Comments

danjamesmay commented Oct 26, 2022 • edited by melinath Loading

Community Note

Terraform Version

Affected Resource(s)

Terraform Configuration Files

Debug Output

Panic Output

Expected Behavior

Actual Behavior

Steps to Reproduce

Important Factoids

References

danjamesmay commented Nov 14, 2022

danjamesmay commented Nov 14, 2022 • edited Loading

etruong42 commented Aug 15, 2023 • edited Loading

github-actions bot commented Oct 14, 2023

`google_monitoring_monitored_project` exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

`google_monitoring_monitored_project` exponentially slow and spawns thousands of calls to cloudresourcemanager api #12883

danjamesmay commented Oct 26, 2022 •

edited by melinath

Loading

danjamesmay commented Nov 14, 2022 •

edited

Loading

etruong42 commented Aug 15, 2023 •

edited

Loading