Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terraform 1.3.6 crashes when using optional types for list(map(any)) #32396

Closed
rajatvig opened this issue Dec 15, 2022 · 9 comments · Fixed by #32454
Closed

Terraform 1.3.6 crashes when using optional types for list(map(any)) #32396

rajatvig opened this issue Dec 15, 2022 · 9 comments · Fixed by #32454
Assignees
Labels
bug config confirmed a Terraform Core team member has reproduced this issue cty Use in conjunction with "upstream" when cty is the relevant upstream explained a Terraform Core team member has described the root cause of this issue in code

Comments

@rajatvig
Copy link

Terraform Version

1.3.6

Terraform Configuration Files

Using a module with a nested type like

taints                      = optional(list(map(any)), [])

Debug Output

Runs in TF Cloud

Expected Behavior

Works in TF 1.3.1

Actual Behavior

Terraform Crashed

Steps to Reproduce

Plan crashes

Additional Context

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

inconsistent list element types (cty.Object(map[string]cty.Type{"auto_repair":cty.Bool, "auto_upgrade":cty.Bool, "disk_size_gb":cty.Number, "disk_type":cty.String, "enable_integrity_monitoring":cty.Bool, "enable_secure_boot":cty.Bool, "image_type":cty.String, "initial_count":cty.Number, "labels":cty.Map(cty.String), "machine_type":cty.String, "max_count":cty.Number, "max_pods_per_node":cty.Number, "max_surge":cty.Number, "max_unavailable":cty.Number, "metadata":cty.Map(cty.String), "min_count":cty.Number, "min_cpu_platform":cty.String, "name":cty.String, "node_locations":cty.List(cty.String), "node_metadata":cty.String, "preemptible":cty.Bool, "service_account":cty.String, "tags":cty.List(cty.String), "taints":cty.List(cty.Map(cty.String)), "version":cty.String}) then cty.Object(map[string]cty.Type{"auto_repair":cty.Bool, "auto_upgrade":cty.Bool, "disk_size_gb":cty.Number, "disk_type":cty.String, "enable_integrity_monitoring":cty.Bool, "enable_secure_boot":cty.Bool, "image_type":cty.String, "initial_count":cty.Number, "labels":cty.Map(cty.String), "machine_type":cty.String, "max_count":cty.Number, "max_pods_per_node":cty.Number, "max_surge":cty.Number, "max_unavailable":cty.Number, "metadata":cty.Map(cty.String), "min_count":cty.Number, "min_cpu_platform":cty.String, "name":cty.String, "node_locations":cty.List(cty.String), "node_metadata":cty.String, "preemptible":cty.Bool, "service_account":cty.String, "tags":cty.List(cty.String), "taints":cty.List(cty.Map(cty.DynamicPseudoType)), "version":cty.String}))
goroutine 58 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
/home/circleci/project/project/internal/logging/panic.go:55 +0x153
panic({0x21e4380, 0xc000d420a0})
/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/zclconf/go-cty/cty.ListVal({0xc002030400, 0x8, 0x1?})
/home/circleci/go/pkg/mod/github.com/zclconf/go-cty@v1.12.1/cty/value_init.go:166 +0x42e
github.com/zclconf/go-cty/cty.transform({0x0?, 0x0, 0x0}, {{{0x2b6dc50?, 0xc0011934f0?}}, {0x20f31c0?, 0xc0010e45e8?}}, {0x2b55020, 0xc00012f1a0})
/home/circleci/go/pkg/mod/github.com/zclconf/go-cty@v1.12.1/cty/walk.go:175 +0xf4a
github.com/zclconf/go-cty/cty.TransformWithTransformer(...)
/home/circleci/go/pkg/mod/github.com/zclconf/go-cty@v1.12.1/cty/walk.go:130
github.com/hashicorp/hcl/v2/ext/typeexpr.(*Defaults).Apply(0xc000d0fa40?, {{{0x2b6dc50?, 0xc0011934f0?}}, {0x20f31c0?, 0xc0010e45e8?}})
/home/circleci/go/pkg/mod/github.com/hashicorp/hcl/v2@v2.15.0/ext/typeexpr/defaults.go:38 +0xa7
github.com/hashicorp/terraform/internal/terraform.prepareFinalInputVariableValue({{0xc000c3e200, 0x1, 0x1}, {{}, {0xc000c13bd0, 0xa}}}, 0xc001e17a58, 0xc000f5bba0)
/home/circleci/project/project/internal/terraform/eval_variable.go:140 +0xaed
github.com/hashicorp/terraform/internal/terraform.(*nodeModuleVariable).evalModuleVariable(0xc000982720?, {0x2b84138?, 0xc00029d420?}, 0x8?)
/home/circleci/project/project/internal/terraform/node_module_variable.go:239 +0x545
github.com/hashicorp/terraform/internal/terraform.(*nodeModuleVariable).Execute(0xc000982720, {0x2b84138, 0xc00029d420}, 0x4)
/home/circleci/project/project/internal/terraform/node_module_variable.go:152 +0xd0
github.com/hashicorp/terraform/internal/terraform.(*ContextGraphWalker).Execute(0xc000ebd950, {0x2b84138, 0xc00029d420}, {0x7fd3ff8b1000, 0xc000982720})
/home/circleci/project/project/internal/terraform/graph_walk_context.go:136 +0xc2
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x23b2e20, 0xc000982720})
/home/circleci/project/project/internal/terraform/graph.go:74 +0x2f0
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc000982780, {0x23b2e20, 0xc000982720}, 0xc00173ef40)
/home/circleci/project/project/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
/home/circleci/project/project/internal/dag/walk.go:304 +0xf65
Operation failed: failed running terraform plan (exit 11)

References

No response

@rajatvig rajatvig added bug new new issue not yet triaged labels Dec 15, 2022
@liamcervante
Copy link
Member

Hi @rajatvig, thanks for the report! I'm having difficulty reproducing this with only the provided attribute definition.

Could you share a more complete reproduction example? I think I mainly need to see exactly what you are trying to pass into this variable within the module.

I do suspect that what you're trying to do is likely not going to be possible, but Terraform should do a better job of telling you why instead of simply crashing. I can provide more detail about a potential workaround when I see the values you are providing. Thanks!

@liamcervante liamcervante added waiting for reproduction unable to reproduce issue without further information and removed new new issue not yet triaged labels Dec 15, 2022
@apparentlymart
Copy link
Member

Looking at the two types mentioned in the message I see that the key difference here is in the "taints" attribute, which has a different attribute type in each case:

  • cty.List(cty.Map(cty.String))
  • cty.List(cty.Map(cty.DynamicPseudoType))

However, I think the more important observation here is that neither of these list element types is a map and neither can possibly convert to a map, because the attributes of these object types have mutually-incompatible attribute types. Therefore I think the expected behavior here would have been for Terraform to report that there's no possible type to substitute for any in this type constraint when converting these object types into map types.

I suspect it's crashing rather than returning that error because Terraform is trying to apply the default values in a bottom-up fashion before doing the final type conversion, and so whereas normally the type conversion functions would've caught this early and complained about the incomplete map type it's instead not reaching that point and is instead failing inside the default value insertion process.

I think the following would be the correct type constraint for the result that this example seems to be aiming to achieve:

  type = list(object({
    auto_repair                 = bool
    auto_upgrade                = bool
    disk_size_gb                = number
    disk_type                   = string
    enable_integrity_monitoring = bool
    enable_secure_boot          = bool
    image_type                  = string
    initial_count               = number
    labels                      = map(string)
    machine_type                = string
    max_count                   = number
    max_pods_per_node           = number
    max_surge                   = number
    max_unavailable             = number
    metadata                    = map(string)
    min_count                   = number
    min_cpu_platform            = string
    name                        = string
    node_locations              = list(string)
    node_metadata               = string
    preemptible                 = bool
    service_account             = string
    tags                        = list(string)
    taints                      = map(list(string))
    version                     = string
  }))

The solution to this bug will be to make Terraform correctly report that the given value is invalid for the given type constraint, so it'll still be necessary to use the above corrected type constraint to get an actually working configuration even after this issue is closed. You may wish to also mark some or all of those attributes as optional if your goal is to allow callers to omit some of them.

@rajatvig
Copy link
Author

@apparentlymart The type for the module is correctly set but it is a map(list(any)) with a default. It is not a map(list(string)) which it seems is what TF is trying to interpret.

taints = optional(list(map(any)), [])

That we pass a value of

taints = [
  {
    key    = "etsy.com/nodepool"
    value  = "envoy"
    effect = "NO_SCHEDULE"
  },
]

@liamcervante Please note that this works on 1.3.1 and we have it working on that release. It does not work however on 1.3.6 or 1.3.5.

The module works on TF 1.3.6 when we do not change the default but crashes when we pass a value.

@apparentlymart
Copy link
Member

apparentlymart commented Dec 15, 2022

Hi @rajatvig,

I'm sorry I misunderstood your report and thought that list(map(any)) was the type of the overall variable rather than just the taints attribute.

The value you've shown in your comment is a map(list(string)), but I suspect you must have another object in this list which omits the taints attribute altogether and so Terraform is trying to insert it as taints = [] to match your default, and that's failing here because there aren't any elements for Terraform to use to guess what element type you meant. (The default value [] would convert as list(map(<unknown-type>)) because there aren't any elements to use to infer the type.)

Now that I've re-read the report and understood the problem better, I agree that this ought to work and that Terraform should be able to notice that an empty list of maps of an unknown type can unify with a list of maps of strings. I think the root cause is still in the vicinity of what I previously guessed: the default-inserting and the type conversion are not happening in an order which allows Terraform to see that it needs to unify those two types together, and so it is instead crashing while trying to construct a non-unified type.

With all of that said though: if you use an exact type constraint (no any placeholders) then you should be able to avoid this problem today. We'll see about fixing it so that Terraform can correctly guess that any is intended to be string, but if you just write string directly then Terraform won't need to guess what you meant and so you can work around this problem with Terraform v1.3.6.

It would help if you could share the entire declaration of this variable and the entire value you're assigning to it so that we can make sure we're reproducing exactly the problem you're seeing, and then we can add a test case to Terraform that's equivalent to your configuration and change Terraform so that it doesn't crash anymore.

@rajatvig
Copy link
Author

Here's the full declaration

variable "node_pools" {
  type = list(
    object({
      name                        = optional(string, "default")
      initial_count               = optional(number, 2)
      min_count                   = optional(number, 0)
      max_count                   = optional(number, 10)
      node_locations              = optional(list(string), [])
      version                     = optional(string, "cluster-default")
      max_pods_per_node           = optional(number, 64)
      auto_repair                 = optional(bool, true)
      auto_upgrade                = optional(bool, false)
      max_surge                   = optional(number, 1)
      max_unavailable             = optional(number, 0)
      image_type                  = optional(string, "cos")
      machine_type                = optional(string, "n1-standard-8")
      min_cpu_platform            = optional(string, "Intel Skylake")
      disk_size_gb                = optional(number, 100)
      disk_type                   = optional(string, "pd-ssd")
      preemptible                 = optional(bool, false)
      service_account             = optional(string, "cluster-default")
      node_metadata               = optional(string, "cluster-default")
      enable_secure_boot          = optional(bool, false)
      enable_integrity_monitoring = optional(bool, true)
      labels                      = optional(map(string), {})
      metadata                    = optional(map(string), {})
      taints                      = optional(list(map(any)), [])
      tags                        = optional(list(string), [])
    })
  )
  description = "list of maps containing node pools"
}

The different ways we instantiate it

  node_pools = [
    {
      name              = "node-pool-32"
      machine_type      = "n2-standard-32"
      min_cpu_platform  = "Intel Ice Lake"
      initial_count     = 0
      min_count         = 0
      max_pods_per_node = 64
      image_type        = "cos_containerd"
      labels = {
        "domain.com/nodepool" = "default"
      }
    },
    {
      name              = "node-envoy-32"
      machine_type      = "n2-highcpu-32"
      min_cpu_platform  = "Intel Ice Lake"
      initial_count     = 0
      min_count         = 0
      max_pods_per_node = 64
      image_type        = "cos_containerd"
      labels = {
        "domain.com/nodepool" = "envoy"
      }
      taints = [
        {
          key    = "etsy.com/nodepool"
          value  = "envoy"
          effect = "NO_SCHEDULE"
        },
      ]
    },
  ]

The taints is an abstraction over https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#nested_taint which I'll try and define the type out a little more vs using any.

@apparentlymart
Copy link
Member

Ahh, I see... so map(list(any)) being inferred as map(list(string)) was working okay for now because all of the arguments in that taint nested block type happen to be strings. That assumption would not hold if there were any arguments in that block that need to be of other types.

Based on your closing sentence I think you've already figured out what I'm about to suggest 😀 but just for completeness I'll write it out anyway:

  taints = optional(list(object({
    key    = string
    value  = string
    effect = string
  })), [])

The above should still accept the values you showed in your example and should still be compatible with the code inside your module that's making use of this taints attribute to populate the real taint block.

@apparentlymart
Copy link
Member

apparentlymart commented Dec 16, 2022

I reproduced this using essentially exactly the configuration shown in the latest comments above.

Root module:

module "child" {
  source = "./mod"

  node_pools = [
    {
      name              = "node-pool-32"
      machine_type      = "n2-standard-32"
      min_cpu_platform  = "Intel Ice Lake"
      initial_count     = 0
      min_count         = 0
      max_pods_per_node = 64
      image_type        = "cos_containerd"
      labels = {
        "domain.com/nodepool" = "default"
      }
    },
    {
      name              = "node-envoy-32"
      machine_type      = "n2-highcpu-32"
      min_cpu_platform  = "Intel Ice Lake"
      initial_count     = 0
      min_count         = 0
      max_pods_per_node = 64
      image_type        = "cos_containerd"
      labels = {
        "domain.com/nodepool" = "envoy"
      }
      taints = [
        {
          key    = "etsy.com/nodepool"
          value  = "envoy"
          effect = "NO_SCHEDULE"
        },
      ]
    },
  ]
}

...and then in the ./mod directory:

variable "node_pools" {
  type = list(
    object({
      name                        = optional(string, "default")
      initial_count               = optional(number, 2)
      min_count                   = optional(number, 0)
      max_count                   = optional(number, 10)
      node_locations              = optional(list(string), [])
      version                     = optional(string, "cluster-default")
      max_pods_per_node           = optional(number, 64)
      auto_repair                 = optional(bool, true)
      auto_upgrade                = optional(bool, false)
      max_surge                   = optional(number, 1)
      max_unavailable             = optional(number, 0)
      image_type                  = optional(string, "cos")
      machine_type                = optional(string, "n1-standard-8")
      min_cpu_platform            = optional(string, "Intel Skylake")
      disk_size_gb                = optional(number, 100)
      disk_type                   = optional(string, "pd-ssd")
      preemptible                 = optional(bool, false)
      service_account             = optional(string, "cluster-default")
      node_metadata               = optional(string, "cluster-default")
      enable_secure_boot          = optional(bool, false)
      enable_integrity_monitoring = optional(bool, true)
      labels                      = optional(map(string), {})
      metadata                    = optional(map(string), {})
      taints                      = optional(list(map(any)), [])
      tags                        = optional(list(string), [])
    })
  )
}

After running terraform apply, I got exactly the same panic message at the same source location as shown in the original issue.

As I mentioned in an earlier comment, my sense is that Terraform should've been able to see that the taints value that defaults to [] in the first node pool can unify with the explicitly-specified one in the second element, and determine then that the any in that type constraint should resolve to string (making taints final type be list(map(string))) and then the empty list in the first element should be able to convert to that type successfully.

I suspect this problem is arising because Terraform is trying to insert all of the default values before trying to resolve the type constraint, but I've not yet confirmed that so I will not yet mark this as "explained". I've run out of time to spend on this for now so I'll leave this context here in the hope it's useful to someone else who might dig into this further.

@apparentlymart apparentlymart added confirmed a Terraform Core team member has reproduced this issue config and removed waiting for reproduction unable to reproduce issue without further information labels Dec 16, 2022
@liamcervante liamcervante self-assigned this Dec 16, 2022
@liamcervante
Copy link
Member

@apparentlymart is correct, the defaults package is trying to replace a null list(map(string)) with an empty list(map(dynamic)), and then when the object is merged back into the collection with the other unchanged object we see the crash because the types no longer match.

This was introduced in v1.3.4 by this PR: #32027. Unfortunately that PR did fix multiple other crashes so the fix here isn't as simple as just reverting it.

I will look into making the HCL defaults package resolve this, as it should be clever enough to convert an empty list with a nested dynamic type into an empty list with nested string types. This should be doable with more liberal application of the Convert and Unify functions from the go-cty convert package within HCL.

@github-actions
Copy link

github-actions bot commented Mar 3, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug config confirmed a Terraform Core team member has reproduced this issue cty Use in conjunction with "upstream" when cty is the relevant upstream explained a Terraform Core team member has described the root cause of this issue in code
Projects
None yet
3 participants