
nomad 1.3 to 1.4 all jobs are failing - not even prompting for confirmation #66

Closed
nhi-vanye opened this issue May 27, 2019 · 1 comment · Fixed by #74

nhi-vanye commented May 27, 2019

I have a terraform environment that creates a VM, adds nomad to the VM, and then deploys a number of nomad jobs to the environment. This has been working for about 18 months.

A recent change in the terraform-nomad-provider (1.4 is broken, 1.3 works) means that running terraform plan or terraform apply now fails while building the "environment". No state file is built and there is no prompt asking whether to make changes.

A typical error looks like

...
data.vsphere_resource_pool.pool: Refreshing state...
data.vsphere_network.vswitch: Refreshing state...
data.vsphere_datastore.datastore: Refreshing state...
data.vsphere_network.vswitch: Refreshing state...
data.vsphere_datastore.datastore: Refreshing state...
data.vsphere_virtual_machine.backup_detective_templ: Refreshing state...
data.vsphere_resource_pool.pool: Refreshing state...


Error: Error running plan: 20 error(s) occurred:

* nomad_job.prometheus: 1 error(s) occurred:

* nomad_job.prometheus: error from 'nomad plan': Put https://192.168.19.100:4646/v1/job/prometheus/plan?region=local: dial tcp 192.168.19.100:4646: i/o timeout

The 192.168.19.100 address belongs to a vsphere_virtual_machine resource that has not yet been created (it should be created as part of the apply).

Terraform Version

Terraform v0.11.8
+ provider.archive v1.2.2
+ provider.consul v2.3.0
+ provider.http v1.1.1
+ provider.ignition v1.0.1
+ provider.nomad v1.4.0
+ provider.null v2.1.2
+ provider.template v2.1.2
+ provider.vsphere v1.3.0

I have narrowed it down to a nomad provider 1.4 issue - pinning the plugin to version ~> 1.3.0 worked.
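
For reference, the workaround is just a version constraint in the provider block (Terraform 0.11 syntax; the address and region shown here are simply the ones from the error output above):

provider "nomad" {
    # stay on the 1.3.x series of the provider until this is resolved
    version = "~> 1.3.0"

    # Nomad API on the management VM (same endpoint as in the error output)
    address = "https://192.168.19.100:4646"
    region  = "local"
}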

I looked through the changelog for 1.4 and nothing seemed to jump out (I did try setting detach to false - no change in behavior).
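
For completeness, the detach experiment was just adding that attribute to each nomad_job resource, along these lines (the jobspec path here is only a placeholder for illustration - the real resources use an inline heredoc, as shown below):

resource "nomad_job" "prometheus" {
    # wait for the registration call to complete instead of detaching;
    # this made no difference to the i/o timeout
    detach = false

    # placeholder path; the actual jobspec is inline, see the job file below
    jobspec = "${file("jobs/prometheus.nomad")}"
}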

A typical job file looks like

resource "nomad_job" "prometheus" {

    # runs on the management servers
    #
    depends_on = [ "vsphere_virtual_machine.mgmtsvr",
                   "null_resource.mgmtsvr"
                 ]

    jobspec = <<EOF

job "prometheus" {

    region = "local"
    datacenters = ["ca"]

    type = "batch"

    parameterized {
        payload = "forbidden",
        meta_optional = [ "doit" ]
    }

    constraint {
        attribute = "$${node.class}"
        value     = "fpsvr"
    }

    group "servers" {

        task "prometheus" {

            driver = "docker"

            env {

                CONSOL_HOST="${element(var.mgmtsvr_nodes,0)}"
                CONSUL_TOKEN="${base64encode(substr(var.consul_token,0,16))}"
            }


            config {

                image = "prom/prometheus"

                network_mode = "weave"

                dns_search_domains = [ "weave.fred.local" ]

                hostname = "prometheus.weave.fred.local"

                force_pull = true

                volumes = [
                    "/etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro",
                    "/data/prometheus/:/data/"
                ]

                args = [
                    "--config.file=/etc/prometheus/prometheus.yml",
                    "--storage.tsdb.path=/data/data/"
                ]
            }


            resources {
                memory = 128
            }

            service {

                name = "prometheus"

            }
        }
    }
}
EOF

}

nhi-vanye changed the title from "nomad 0.3 to 0.4 all jobs are failing - not even prompting for confirmation" to "nomad 1.3 to 1.4 all jobs are failing - not even prompting for confirmation" on May 27, 2019

cgbaker (Contributor) commented Jun 26, 2019

@nhi-vanye, it's not immediately clear what the problem would be. I will reach out to the Terraform team and see if they have any advice for this.
