data.google_project.project.project_id randomly becoming null #9509

Closed
atrauzzi opened this issue Jul 5, 2021 · 24 comments

@atrauzzi

atrauzzi commented Jul 5, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

1.0.1

Affected Resource(s)

  • google_project

Expected Behavior

I expect this value to never change unless I switch projects.

Actual Behavior

I'm not sure what triggers it, but I get changes like this:

  # module.non-global-regions["us"].module.regional-worker.google_cloud_run_service.api will be updated in-place
  ~ resource "google_cloud_run_service" "api" {
        id                         = "locations/us-central1/namespaces/my-project-name-here/services/worker-us"
        name                       = "worker-us"
        # (4 unchanged attributes hidden)

      ~ template {

          ~ spec {
                # (4 unchanged attributes hidden)

              ~ containers {
                    # (4 unchanged attributes hidden)

                  ~ env {
                        name  = "CLOUD_TASKS_PROJECT"
                      - value = "my-project-name-here" -> null
                    }

                    # (24 unchanged blocks hidden)
                }
            }
            # (1 unchanged block hidden)
        }

        # (2 unchanged blocks hidden)
    }

Important Factoids

I've added some secrets & values and added a storage hmac resource, but that's it. Obviously nothing should ever cause the project values to become inconsistent between deploys within the same project. Yet for some reason it's becoming null.

@atrauzzi atrauzzi added the bug label Jul 5, 2021
@edwardmedia edwardmedia self-assigned this Jul 5, 2021
@edwardmedia
Contributor

@atrauzzi can you have a config sample with resources to repro the issue?

@atrauzzi
Author

atrauzzi commented Jul 5, 2021

Absolutely not, as I have no clue what the trigger condition is for this. It's totally unpredictable what's causing it.

In this situation, I think the best thing I can do is offer to meet with you on a screen share to go through things. This is going to take some intuition.

@atrauzzi
Author

atrauzzi commented Jul 5, 2021

I'm also getting this if I apply the change and then do another run:

│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.non-global-regions["us"].module.api.google_cloud_run_service.api to include new values learned so far during apply,
│ provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .template[0].spec[0].containers[0].env[11].value: was null, but now
│ cty.StringVal("my-project-name-here").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.non-global-regions["us"].module.regional-worker.google_cloud_run_service.api to include new values learned so far
│ during apply, provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .template[0].spec[0].containers[0].env[11].value: was
│ null, but now cty.StringVal("my-project-name-here").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Something is definitely not right here. But again, I get different results between different runs, so it's impossible to pin down.

@edwardmedia
Contributor

edwardmedia commented Jul 5, 2021

@atrauzzi the data source uses the code below to retrieve the project; this code is shared with many other resources and data sources. Could it be a timing issue? Do you own the module? Can you add depends_on and sleep logic, etc.?

func getProject(d TerraformResourceData, config *Config) (string, error) {
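
One way to try the depends_on and sleep suggestion is sketched below. This is only an illustration: `time_sleep` comes from the separate hashicorp/time provider, and the resource name `wait_for_project` and the 30s duration are placeholders, not anything taken from the reporter's configuration.

```hcl
# Hypothetical delay that must complete before the data source is read.
resource "time_sleep" "wait_for_project" {
  create_duration = "30s"
}

data "google_project" "project" {
  # Read the project only after the delay resource has been created.
  depends_on = [time_sleep.wait_for_project]
}
```

Keep in mind that when a data source has a depends_on entry with pending changes, Terraform may defer reading it until apply, so its attributes can appear as unknown during plan.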

@atrauzzi
Author

atrauzzi commented Jul 6, 2021

Gave that a shot, no dice. It's still generating the changes I noted above as well as the message about the provider providing an inconsistent plan.

@edwardmedia
Contributor

@atrauzzi it's hard for me to say without the complete config available. From the plan you provided above, it seems the new value can't be detected. Did you say it is provided by a data source? Can you try hard-coding it to see what happens?

                  ~ env {
                        name  = "CLOUD_TASKS_PROJECT"
                      - value = "my-project-name-here" -> null
                    }
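
The hard-coding test could look roughly like the fragment below. It assumes the value normally comes from `data.google_project.project.project_id` (the reference named in the issue title) and swaps it for the literal project ID shown in the plan output; the surrounding container block is omitted.

```hcl
# Fragment of the containers block in google_cloud_run_service.api.
env {
  name = "CLOUD_TASKS_PROJECT"

  # Assumed original form:
  # value = data.google_project.project.project_id

  # Hard-coded form used for the test:
  value = "my-project-name-here"
}
```

If the diff disappears with the literal value, the unstable input is the data source reference rather than the Cloud Run resource itself.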

@atrauzzi
Author

atrauzzi commented Jul 7, 2021

Hardcoding gets around the issue it seems. It's weird, the problem doesn't come up 100% of the time, sometimes I'll run again and it doesn't happen.

I still feel like, in some way, the google provider may be having some internal state issues with that data module.

@atrauzzi
Author

atrauzzi commented Jul 7, 2021

Also interesting: I reverted the change from hard-coding, back to using the data module's value, and I get no pending changes.

Again, something is not right here, and even Terraform is saying it's an issue in the provider.

I'll offer again, if you want to do a screenshare to go over my templates, more than happy!

@edwardmedia
Contributor

@atrauzzi I believe the issue is in the module as your hard-coding test proves that is the case. Let me know if you are able to repro it with isolated resources

@atrauzzi
Author

atrauzzi commented Jul 9, 2021

What do you mean by isolated resources?

@edwardmedia
Contributor

@atrauzzi I meant using resources only (not modules). In that way we can isolate the issues within the provider.
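
A resources-only reproduction might be as small as the sketch below, placed directly in the root module. The service name, location, and env variable name are taken from the plan output earlier in the thread; the container image is a placeholder, since the real image was never posted.

```hcl
# Reads the project configured on the provider; no arguments are set.
data "google_project" "project" {}

resource "google_cloud_run_service" "api" {
  name     = "worker-us"
  location = "us-central1"

  template {
    spec {
      containers {
        # Placeholder image; the reporter's image is not shown in the thread.
        image = "us-docker.pkg.dev/cloudrun/container/hello"

        env {
          name  = "CLOUD_TASKS_PROJECT"
          value = data.google_project.project.project_id
        }
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}
```

Running plan and apply repeatedly against this config would show whether the null diff appears without any modules involved.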

@atrauzzi
Author

atrauzzi commented Jul 13, 2021

I'm not sure I understand what you are saying right now. This bug could very well be caused by the fact that the data provider is being used in multiple modules.

Let's do a quick screen share and I'll be happy to let you poke around. You can use your familiarity with the provider to zero in on what's going on. I cannot post the entirety of it publicly.

I have no way to intuitively tell what is causing this bug, and I think you need to be realistic in how much you expect me to do to isolate the issue without you looking into things a little yourself.

(I'm trying to report a bug.)

@edwardmedia
Contributor

@megan07 what do you think about this issue?

@megan07
Contributor

megan07 commented Jul 19, 2021

@edwardmedia - thanks for taking a look!

Hi @atrauzzi ! I'm sorry you're experiencing this issue. Thank you for reaching out! In general, we prefer to see at least a snippet of the configuration so we can see exactly which datasources/resources we're dealing with. Any debug logs would definitely be beneficial as well. We don't need your entire configuration, but I think the important thing would be to know what exactly you're interpolating into .template[0].spec[0].containers[0].env[11].value and how that is derived (is it an isolated datasource, or the output of a module? and if it's an output of a module, how is that output derived?).

Thanks!
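
For illustration, the "output of a module" case asked about above might be wired like the sketch below, where the data source is read once in the root module and its value is passed down as a variable. The module path and the variable name `project_id` are hypothetical; the reporter's actual layout was never shared.

```hcl
# Root module: the data source is declared once and its value passed down.
data "google_project" "project" {}

module "regional-worker" {
  source     = "./modules/regional-worker" # hypothetical path
  project_id = data.google_project.project.project_id
}

# modules/regional-worker/variables.tf (hypothetical file)
variable "project_id" {
  type = string
}

# Inside the module, the env block would then use:
#   value = var.project_id
```

Knowing which of these shapes is in use (a data source read inside each module, or one value passed in from the root) is the kind of detail a configuration snippet would answer.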

@atrauzzi
Author

@megan07 - I'm not sure how to produce that value or what it even is really. None of those names look familiar to me. If there's something specific that I can give you the output from - ideally anonymized - feel free to send me commands to run or files to cat here. 🙂

@megan07
Contributor

megan07 commented Jul 20, 2021

@atrauzzi oh! sorry! I can help with that :)
And yes, anonymized is great!

So, for the debug logs, you can take a look here and this should help us a lot!
Documentation: https://www.terraform.io/docs/internals/debugging.html
Tutorial: https://learn.hashicorp.com/tutorials/terraform/troubleshooting-workflow#enable-terraform-logging

As for sending us a snippet of your configuration, you'll have to go through your terraform files and find where that particular resource is set. I would do a search over your files for "google_cloud_run_service" "api" and see if you can find it. Does that help?

@atrauzzi
Author

atrauzzi commented Jul 21, 2021

Well I know what I'm putting in there. It's the value from the data-module. My initial problem description has the answers to your questions.

That's the problem that I'm trying to report here. The value from the google project data-module is behaving unpredictably. Sometimes it will generate a change, and other times it won't.

This data-module is so basic and has no inputs, so I have no influence over it. I'm simply using it to read the current project from the environment.

Even Terraform's error is pointing out that the project data-module is behaving inconsistently. That ought to be plenty to go off of here.

Just going to ask this once, but is there any chance we can move past the basic "try to blame it on the user" triage? At the very least, give me something specific besides general community links, which are not specific steps. Asking me for the terraform snippet in this situation is disingenuous as I'm literally defining the data-module and then just trying to consume it.

If I knew how this was happening, I'd almost invariably stumble upon the solution. I think a screen share would work best for this one, so I'll offer it again.

@megan07
Contributor

megan07 commented Jul 23, 2021

Hi @atrauzzi, I'm sorry for this misunderstanding. The reason we ask for you to share your configuration is because the first step in our process is for us to reproduce the issue that you're seeing. Without your configuration, we can't see everything that is at play.

Debug logs are equally as helpful, if not more. Would you please send us some debug logs so we can see what value is being returned from the API and what might be happening behind the scenes? Thanks!

@atrauzzi
Author

@megan07 - Is there anywhere I can post them privately?

@megan07
Contributor

megan07 commented Aug 2, 2021

Hi @atrauzzi - Yes! You can use the PGP key published on our Security page or Keybase to encrypt your logs, then attach them here. Thanks!

@atrauzzi
Author

atrauzzi commented Aug 3, 2021

Oy... this is getting ridiculously convoluted. I don't use any of that.

Is there any reason why we can't just do a screen share and I can show this problem happening? Or maybe I can just email one of you my .tf files? It would be a lot quicker and less hassle.

@megan07
Contributor

megan07 commented Aug 3, 2021

I'm sorry it seems convoluted. Unfortunately, those are your options for sharing your debug logs with us per our workflow. We like to keep all of our communications public for the benefit of other users. This is an open source project and support is offered on a best effort basis. If you're a Terraform Cloud or Enterprise customer, you may open a ticket with HashiCorp support and they will help you collect the information needed to troubleshoot your issue.

@atrauzzi atrauzzi closed this as completed Aug 3, 2021
@atrauzzi
Author

atrauzzi commented Aug 3, 2021

Nothing says we can't summarize the call in an update here.

I think this ticket will serve better at this point as an example of how this revolving-door triage process can create blind spots. I can assure you something is wrong here with such a simple data provider.

But I have no clue how to make you guys aware of it.

@github-actions

github-actions bot commented Sep 3, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 3, 2021