Incorrect exit code report by nomad job run #12784

Closed
mr-karan opened this issue Apr 26, 2022 · 3 comments · Fixed by #19876
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/cli type/bug

Comments

@mr-karan
Contributor

Nomad version

Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)

Operating system and Environment details

vagrant@localhashi:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

vagrant@localhashi:~$ uname -a
Linux localhashi 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Issue

When a cluster has no eligible node, the deployment fails, as it should.


However, if the node becomes ready while the deployment is in progress, the command still exits with code 2, even though the deployment eventually succeeds.

Reproduction steps

  1. Run a single-node agent.
$ nomad node status
ID        DC          Name        Class   Drain  Eligibility  Status
29f1643a  localhashi  localhashi  <none>  false  eligible     ready
  2. Mark the node as ineligible.
nomad node eligibility -disable -self
Node "29f1643a-2dca-53a4-076d-29c79aaff1f9" scheduling eligibility set: ineligible for scheduling
  3. Run a deployment (the job file is below).
nomad run redis.nomad
==> 2022-04-26T17:27:52+05:30: Monitoring evaluation "c737ef0a"
    2022-04-26T17:27:52+05:30: Evaluation triggered by job "redis"
==> 2022-04-26T17:27:53+05:30: Monitoring evaluation "c737ef0a"
    2022-04-26T17:27:53+05:30: Evaluation within deployment: "18674959"
    2022-04-26T17:27:53+05:30: Evaluation status changed: "pending" -> "complete"
==> 2022-04-26T17:27:53+05:30: Evaluation "c737ef0a" finished with status "complete" but failed to place all allocations:
    2022-04-26T17:27:53+05:30: Task Group "cache" (failed to place 1 allocation):
      * No nodes were eligible for evaluation
      * No nodes are available in datacenter "localhashi"
    2022-04-26T17:27:53+05:30: Evaluation "3514787b" waiting for additional capacity to place remainder
==> 2022-04-26T17:27:53+05:30: Monitoring deployment "18674959"
  4. Observe that the deployment is pending.
nomad deployment status 186

ID          = 18674959
Job ID      = redis
Job Version = 0
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        0       0        0          N/A

nomad job status redis     
ID            = redis
Name          = redis
Submit Date   = 2022-04-26T17:27:51+05:30
Type          = service
Priority      = 50
Datacenters   = localhashi
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       1       0         0        0       0         0

Placement Failure
Task Group "cache":
  * No nodes were eligible for evaluation
  * No nodes are available in datacenter "localhashi"

Latest Deployment
ID          = 18674959
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        0       0        0          N/A

Allocations
No allocations placed


  5. Mark the node as eligible.
nomad node eligibility -enable -self 
Node "29f1643a-2dca-53a4-076d-29c79aaff1f9" scheduling eligibility set: eligible for scheduling
  6. Wait a few seconds. The deployment completes, and the command from step 3, which was showing the progress indicator, returns. Check its exit code:
echo $?
2

Expected Result

A zero exit code, since the deployment eventually succeeded.

0

Actual Result

echo $?
2

An exit code of 2, which the nomad job run documentation reserves for job placement issues: https://www.nomadproject.io/docs/commands/job/run
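The gap between the expected and actual results can be expressed as a small decision rule. The sketch below is purely illustrative: expected_exit_code is a hypothetical helper, not part of Nomad, and it assumes the CLI knows only two facts when monitoring ends: whether the first evaluation reported placement failures, and the final deployment status string (assumed here to be values like "successful" or "failed"). The mapping of other failures to 1 is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): the exit code `nomad job run`
# would ideally return once deployment monitoring ends.
#   $1: "true" if the first evaluation reported placement failures, else "false"
#   $2: final deployment status, e.g. "successful" or "failed" (assumed strings)
expected_exit_code() {
  placement_failed=$1
  deployment_status=$2

  if [ "$deployment_status" = "successful" ]; then
    # A deployment that eventually succeeded should take precedence over
    # placement failures that were later resolved.
    echo 0
  elif [ "$placement_failed" = "true" ]; then
    # Placement problems were never resolved: the documented exit code 2.
    echo 2
  else
    # Any other failure (assumed mapping).
    echo 1
  fi
}

# Scenario from this issue: placement failed at first, the node then became
# eligible, and the deployment succeeded. Prints 0.
expected_exit_code true successful
```

The key design point is the order of the checks: the final deployment status is consulted before any earlier placement failure, which is exactly what the report expects and what the observed behaviour does not do.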

Job file (if appropriate)

job "redis" {
  datacenters = ["localhashi"]

  type = "service"

  update {
    max_parallel      = 1
    min_healthy_time  = "10s"
    healthy_deadline  = "3m"
    progress_deadline = "10m"
    auto_revert       = false
    canary            = 0
  }

  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cache" {
    count = 1

    network {
      port "db" {
        to = 6379
      }
    }

    service {
      name = "redis-cache"
      tags = ["global", "cache"]
      port = "db"

    }

    restart {
      attempts = 2
      interval = "1m"
      delay    = "15s"
      mode     = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:6"

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
lgfa29 added the theme/cli and stage/accepted labels on Apr 27, 2022
@lgfa29
Contributor

lgfa29 commented Apr 27, 2022

Thanks for the detailed report @mr-karan.

I was able to verify the issue and have marked it for further triage.

@JoaoPPinto

Hello,
We've recently been affected by this issue on version 1.4.5.
Is this on the roadmap for a future release?

lgfa29 added this to Needs Triage in Nomad - Community Issues Triage via automation on Jun 16, 2023
lgfa29 moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage on Jun 16, 2023
@lgfa29
Contributor

lgfa29 commented Jun 16, 2023

Hi @JoaoPPinto 👋

This is still in the backlog and has not been roadmapped yet. It got dropped from one of our boards, but I've placed it back for future roadmapping.
