
Terraform nomad_job throwing "job stanza not found" error during terraform plan when we have made no code change #92

Closed
wfeng-fsde opened this issue Jan 31, 2020 · 9 comments · Fixed by #105

Hi there,


Terraform Version


$ terraform -v
Terraform v0.12.20
+ provider.archive v1.3.0
+ provider.aws v2.47.0
+ provider.consul v2.6.1
+ provider.local v1.4.0
+ provider.nomad v1.4.2
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.template v2.1.2
+ provider.tls v2.1.1
+ provider.vault v2.7.1

Nomad Version


$ nomad server members
Name                                   Address      Port  Status  Leader  Protocol  Build   Datacenter              Region
nomad-shared-ip-10-27-0-143.us-west-2  10.27.0.143  4648  alive   true    2         0.10.2  apiq-sre-us1-r1-shared  us-west-2
nomad-shared-ip-10-27-1-224.us-west-2  10.27.1.224  4648  alive   false   2         0.10.2  apiq-sre-us1-r1-shared  us-west-2
nomad-shared-ip-10-27-2-209.us-west-2  10.27.2.209  4648  alive   false   2         0.10.2  apiq-sre-us1-r1-shared  us-west-2

Provider Configuration


provider "nomad" {
  ...
  version   = "~> 1.4"
}

Environment Variables

Do you have any Nomad-specific environment variables set on the machine running Terraform?

env | grep "NOMAD_"

Nothing.

Affected Resource(s)


  • nomad_job


Terraform Configuration Files

resource "nomad_job" "zookeeper_server" {
  jobspec = templatefile("${path.module}/zookeeper_server_nomad.tpl", {
    instance_count   = var.instance_count
    nomad_region     = var.nomad_region
    nomad_datacenter = var.nomad_datacenter
    ...
  })
}

and the template is:

      type          = "service"
    
      region        = "${nomad_region}"
      datacenters   = ["${nomad_datacenter}"]
    
      constraint {
        operator  = "distinct_hosts"
        value     = "true"
      }
    
      meta {
        S3BUCKET = "${meta_s3_bucket}"
      }
    
      group "main" {
        count = ${instance_count}
    
        constraint {
          attribute = "$${meta.ResourceId}"
          operator  = "=="
          value     = "${resource_id}"
        }
    
        meta {
          restartflag = "1"
        }
    
        update {
          max_parallel        = 1
          health_check        = "checks"
          ...
          canary              = 0
        }
    
        restart {
          attempts  = 2
          ...
        }
    
        reschedule {
          unlimited      = false
          ...
          max_delay      = "30m"
        }
    
        ephemeral_disk {
          size = 2048 #MB
        }
    
        task "exhibitor-prestep" {
          driver = "raw_exec"
          ...
          }
    
        }
    
        task "exhibitor" {
          driver = "raw_exec"
    
          artifact {
            source = "..."
          }
        }
      }
    
      migrate {
        max_parallel        = 1
        ...
      }
    }

Debug Output


During terraform plan, I get the following error:


  on .terraform/.../modules/zookeeper_nomad_job/main.tf line 1, in resource "nomad_job" "zookeeper_server":
   1: resource "nomad_job" "zookeeper_server" {


Expected Behavior


We have not changed this code in a long time, and our infrastructure has been up to date with this resource, so I expect terraform plan to pass with no changes to this resource and no error output.

Actual Behavior

We came across this just recently: while making changes to some other, totally unrelated resources, terraform plan started giving us this error. So I suspect this is a provider bug.

Steps to Reproduce


  1. terraform apply

This happened during terraform plan, and we have not made any code changes to this resource.

Important Factoids

Is there anything atypical about your accounts that we should know about? For example: do you have ACLs enabled? A multi-region deployment?

No

cgbaker (Contributor) commented Jan 31, 2020

Hi @wfeng-fsde, thanks for the report. Was there an upgrade (of Terraform or of the Nomad provider) that could have caused this?

rmlsun commented Feb 4, 2020

@cgbaker Teammate of @wfeng-fsde here.

We first ran into this issue with Terraform 0.12.18, I believe. We then tried the latest Terraform, 0.12.20, but hit the same issue with both versions.

On the Nomad provider side, since the provider version constraint is "~> 1.4", I'm not sure whether we were using the same version earlier, but we are currently pulling v1.4.2 of the Nomad Terraform provider.

cgbaker (Contributor) commented Feb 4, 2020

I'm not getting the same error as you, but I'm getting something similar. Using the following versions:

$ terraform -v
Terraform v0.12.20
+ provider.nomad v1.4.2

and a shortened template based on the one included above, I get the following error:

Error: error parsing jobspec: 4 errors occurred:
	* invalid key: type
	* invalid key: region
	* invalid key: datacenters
	* invalid key: group

The reason is that the jobspec string for the nomad_job resource must be a valid Nomad jobspec; specifically, it must include a job stanza. The error is resolved by modifying my template file to have the following shape:

job "jobname" {
  ...
}

There are a few things I don't understand: why you're getting a different error message and why this worked before. Is that the entire template pasted above (perhaps the copy above is missing the first line)? If not, can you provide the full template?
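
For illustration, a minimal sketch of how your template might look once wrapped in a job stanza (the job name and the placeholder config here are just examples, not taken from your actual file):

job "zookeeper" {
  type        = "service"
  region      = "${nomad_region}"
  datacenters = ["${nomad_datacenter}"]

  group "main" {
    count = ${instance_count}

    task "exhibitor" {
      driver = "raw_exec"

      config {
        # placeholder command for the sketch, replace with the real invocation
        command = "/bin/true"
      }
    }
  }
}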

mahsoud commented Apr 15, 2020

I also ran into this bug with Terraform v0.12.24 + provider.nomad v1.4.5.

We are able to reproduce it 100% of the time. Here is my job template:

job "docs" {
  datacenters = ["test"]
  group "example" {
    meta {
      date_time = "${deploy_timestamp}"
    }
    task "server" {

      driver = "raw_exec"
      template {
        destination   = "local/sample.conf"
        data = "sample"
      }
    }
  }
}

and this is main.tf:

locals {
  template_vars = {
    deploy_timestamp = formatdate("DD-MM-YY hh-mm ZZZ", timestamp())
  }
}

resource "nomad_job" "test" {
  jobspec                 = templatefile("${path.module}/job.hcl.tpl", local.template_vars)
  deregister_on_destroy   = true
  deregister_on_id_change = true
}

It works fine on the first apply, but after the nomad_job is added to the state file, refreshing the state fails to parse the jobspec with the error "'job' stanza not found".

mahsoud commented Apr 15, 2020

Tried replacing timestamp with random_id (via random provider) and got the same result.

Moved deploy_timestamp from meta, into template and also got the same result.

It seems that any dynamic value that changes between apply commands causes the bug. At the same time, if deploy_timestamp is set via a Terraform variable that we change between applies, it works fine.
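
For reference, a minimal sketch of that variable-based variant (the variable name deploy_timestamp comes from the template above; the -var value shown afterwards is just an example):

variable "deploy_timestamp" {
  type = string
}

resource "nomad_job" "test" {
  jobspec                 = templatefile("${path.module}/job.hcl.tpl", {
    deploy_timestamp = var.deploy_timestamp
  })
  deregister_on_destroy   = true
  deregister_on_id_change = true
}

Running it with something like terraform apply -var='deploy_timestamp=15-04-20 10-00 UTC' works for us, presumably because the value is already known at plan time.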

cgbaker (Contributor) commented Apr 15, 2020 via email

cyrilgdn commented Apr 29, 2020

It seems that any dynamic value that changes between apply commands causes the bug

Even if the generated value doesn't change 🤔
I used:

  [...]
  jobspec = templatefile("${path.module}/test.nomad", {
    date = formatdate("YYYY-MM-DD", timestamp())
  })
}

So the generated value today was 2020-04-29, both applies were made today, and I got this error on the second one. Very strange 😮

cyrilgdn commented Apr 29, 2020

@cgbaker So I quickly checked and here:

https://github.com/terraform-providers/terraform-provider-nomad/blob/f8b584cd73bc9dce1e64c8758fa62806aba77486/nomad/resource_job.go#L449-L459

As soon as there is a timestamp() call, the field is marked as computed (since Terraform can't know the value before the actual apply), d.GetChange returns an empty value for newSpecRaw, and the parsing fails.

I tried to add

	// skip the jobspec diff logic when the new value isn't known yet
	if !d.NewValueKnown("jobspec") {
		return nil
	}

just before; NewValueKnown returns false in this case, so the function returns early and I get this diff:

[...]
      ~ jobspec                 = <<~EOT
            job "docs" {
              datacenters = ["dc1"]
              group "example" {
                meta {
                  date = "2020-04-29 48"
                }
                task "server" {
                  driver = "docker"
                  config {
                    image = "nginx"
                  }
                }
              }
            }
        EOT -> (known after apply)
[...]

Which indeed says known after apply.

The problem is that in this case, there will be a diff on every apply. But I think there's no choice actually...

mahsoud commented May 12, 2020

The problem is that in this case, there will be a diff on every apply. But I think there's no choice actually...

I believe this is fair, and in my case this was the expected and desired behaviour.
