
nomad 1.3 to 1.4 all jobs are failing - not even prompting for confirmation #66

Closed
nhi-vanye opened this issue May 27, 2019 · 1 comment · Fixed by #74

nhi-vanye commented May 27, 2019

I have a terraform environment that creates a VM, adds nomad to the VM, and then deploys a number of nomad jobs to the environment. This has been working for about 18 months.

A recent change in the terraform-nomad-provider (1.4 is broken, 1.3 works) means that running terraform plan or terraform apply now fails while building the "environment". No state file is built and there is no prompt asking whether to make changes.

A typical error looks like

...
data.vsphere_resource_pool.pool: Refreshing state...
data.vsphere_network.vswitch: Refreshing state...
data.vsphere_datastore.datastore: Refreshing state...
data.vsphere_network.vswitch: Refreshing state...
data.vsphere_datastore.datastore: Refreshing state...
data.vsphere_virtual_machine.backup_detective_templ: Refreshing state...
data.vsphere_resource_pool.pool: Refreshing state...


Error: Error running plan: 20 error(s) occurred:

* nomad_job.prometheus: 1 error(s) occurred:

* nomad_job.prometheus: error from 'nomad plan': Put https://192.168.19.100:4646/v1/job/prometheus/plan?region=local: dial tcp 192.168.19.100:4646: i/o timeout

The 192.168.19.100 address belongs to a vsphere_virtual_machine resource that has not yet been created (it should be created as part of the apply).

Terraform Version

Terraform v0.11.8
+ provider.archive v1.2.2
+ provider.consul v2.3.0
+ provider.http v1.1.1
+ provider.ignition v1.0.1
+ provider.nomad v1.4.0
+ provider.null v2.1.2
+ provider.template v2.1.2
+ provider.vsphere v1.3.0

I have narrowed it down to a nomad provider 1.4 issue - pinning the plugin to version ~> 1.3.0 worked.
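
For reference, the workaround is just a version constraint in the provider block (Terraform 0.11 syntax; the address and region shown here are simply the ones from the error output above):

provider "nomad" {
    # stay on the 1.3.x series of the provider until this is resolved
    version = "~> 1.3.0"

    # Nomad API on the management VM (same endpoint as in the error output)
    address = "https://192.168.19.100:4646"
    region  = "local"
}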

I looked through the changelog for 1.4 and nothing seemed to jump out (I did try setting detach to false - no change in behavior).
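
For completeness, the detach experiment was just adding that attribute to each nomad_job resource, along these lines (the jobspec path here is only a placeholder for illustration - the real resources use an inline heredoc, as shown below):

resource "nomad_job" "prometheus" {
    # wait for the registration call to complete instead of detaching;
    # this made no difference to the i/o timeout
    detach = false

    # placeholder path; the actual jobspec is inline, see the job file below
    jobspec = "${file("jobs/prometheus.nomad")}"
}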

A typical job file looks like

resource "nomad_job" "prometheus" {

    # runs on the management servers
    #
    depends_on = [ "vsphere_virtual_machine.mgmtsvr",
                   "null_resource.mgmtsvr"
                 ]

    jobspec = <<EOF

job "prometheus" {

    region = "local"
    datacenters = ["ca"]

    type = "batch"

    parameterized {
        payload = "forbidden",
        meta_optional = [ "doit" ]
    }

    constraint {
        attribute = "$${node.class}"
        value     = "fpsvr"
    }

    group "servers" {

        task "prometheus" {

            driver = "docker"

            env {

                CONSOL_HOST="${element(var.mgmtsvr_nodes,0)}"
                CONSUL_TOKEN="${base64encode(substr(var.consul_token,0,16))}"
            }


            config {

                image = "prom/prometheus"

                network_mode = "weave"

                dns_search_domains = [ "weave.fred.local" ]

                hostname = "prometheus.weave.fred.local"

                force_pull = true

                volumes = [
                    "/etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro",
                    "/data/prometheus/:/data/"
                ]

                args = [
                    "--config.file=/etc/prometheus/prometheus.yml",
                    "--storage.tsdb.path=/data/data/"
                ]
            }


            resources {
                memory = 128
            }

            service {

                name = "prometheus"

            }
        }
    }
}
EOF

}

nhi-vanye changed the title from "nomad 0.3 to 0.4 all jobs are failing - not even prompting for confirmation" to "nomad 1.3 to 1.4 all jobs are failing - not even prompting for confirmation" on May 27, 2019

cgbaker (Contributor) commented Jun 26, 2019

@nhi-vanye, it's not immediately clear what the problem would be. I will reach out to the Terraform team and see if they have any advice for this.
