We received an internal report in which a job that had been stopped with the `-purge` flag and then re-run would not be placed until after a `nomad system gc` command was run. On further observation, it appears that while the job isn't placed immediately, it does eventually get placed.

I suspect that the `system gc` is a bit of a red herring and the behavior is timing-dependent. There's already a GC evaluation for the job in flight because of the `-purge` flag, but the old allocation still exists in the state store because the client hasn't GC'd it yet. The old allocation's reference to the job is orphaned, but becomes valid again once the new job is registered.

The reporter of the issue has a workaround of running `nomad system gc` (or they could simply not re-run the job immediately after purging it), so this isn't particularly urgent.
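The suspected sequence can be illustrated with a minimal, self-contained sketch. This is not Nomad's actual code — the `StateStore`, `Alloc`, and `liveAllocs` names are invented for illustration — but it shows how a stale allocation whose job reference is orphaned by `-purge` can become "valid" again when a job with the same ID is re-registered, so the scheduler counts it as an existing placement:

```go
// Sketch of the suspected race (not Nomad's actual implementation):
// after `-purge`, the job is deleted from the state store but the old
// allocation survives until it is garbage-collected. Re-registering a
// job with the same ID makes the stale allocation's JobID reference
// resolve again, so it is counted as an existing placement.
package main

import "fmt"

type Alloc struct {
	ID    string
	JobID string
}

type StateStore struct {
	jobs   map[string]bool // registered job IDs
	allocs []Alloc         // allocations, keyed to jobs by JobID
}

// liveAllocs counts allocations whose referenced job still exists —
// the placements a scheduler would count against the desired total.
func (s *StateStore) liveAllocs(jobID string) int {
	n := 0
	for _, a := range s.allocs {
		if a.JobID == jobID && s.jobs[jobID] {
			n++
		}
	}
	return n
}

func main() {
	s := &StateStore{
		jobs:   map[string]bool{"example": true},
		allocs: []Alloc{{ID: "ef54f207", JobID: "example"}},
	}

	// `nomad job stop -purge example`: the job is deleted,
	// but the alloc lingers because it hasn't been GC'd yet.
	delete(s.jobs, "example")
	fmt.Println("after purge:", s.liveAllocs("example")) // orphaned, counts 0

	// Re-run before GC: the stale alloc's reference resolves again,
	// so the scheduler sees an existing placement and makes no new one.
	s.jobs["example"] = true
	fmt.Println("after re-run:", s.liveAllocs("example")) // stale alloc counts

	// `nomad system gc` removes the stale alloc; a fresh eval then
	// has nothing counted and places a new allocation normally.
	s.allocs = nil
	fmt.Println("after gc:", s.liveAllocs("example"))
}
```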
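The reporter's jobspec is collapsed in the original report. Purely for illustration, a minimal job along these lines would exercise the same purge-then-rerun path — the task name, driver, and failing command here are assumptions, not the reporter's actual jobspec:

```hcl
# Hypothetical stand-in for the reporter's example.nomad
# (the real jobspec is collapsed in the original issue).
job "example" {
  datacenters = ["dc1"]

  group "group" {
    task "task" {
      driver = "raw_exec"

      # Exits non-zero so the allocation fails, matching the
      # "run failed" status seen in the transcript below.
      config {
        command = "/bin/false"
      }
    }
  }
}
```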
To reproduce, run the failing job:

```
$ nomad job run ./example.nomad
==> Monitoring evaluation "1a2b9089"
    Evaluation triggered by job "example"
==> Monitoring evaluation "1a2b9089"
    Evaluation within deployment: "e21d83a7"
    Allocation "ef54f207" created: node "7df6e0cf", group "group"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1a2b9089" finished with status "complete"
```
```
vagrant@linux$ nomad job stop -purge ^C
vagrant@linux$ nomad job status ex
ID            = example
Name          = example
Submit Date   = 2021-05-04T13:34:46Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
group       0       0         0        1       0         0

Future Rescheduling Attempts
Task Group  Eval ID   Eval Time
group       ab6d7799  24s from now

Latest Deployment
ID          = e21d83a7
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
group       1        1       0        1          2021-05-04T13:44:46Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status  Created  Modified
ef54f207  7df6e0cf  group       0        run      failed  7s ago   2s ago
```
Stop and purge the job, and note that no deployment is visible:
```
$ nomad job stop -purge example
==> Monitoring evaluation "b98aa4ca"
    Evaluation triggered by job "example"
==> Monitoring evaluation "b98aa4ca"
    Evaluation within deployment: "e21d83a7"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b98aa4ca" finished with status "complete"
vagrant@linux$ nomad job run ./example.nomad
==> Monitoring evaluation "1b42cb0e"
    Evaluation triggered by job "example"
==> Monitoring evaluation "1b42cb0e"
    Evaluation within deployment: "e21d83a7"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1b42cb0e" finished with status "complete"
$ nomad job status ex
ID            = example
Name          = example
Submit Date   = 2021-05-04T13:35:03Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
group       0       0         0        0       0         0

Allocations
No allocations placed
```
Wait a little bit and see that the deployment has been created and failed as expected:
```
$ nomad job status example
...
Latest Deployment
ID          = 9e69d457
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
group       1        1       0        1          2021-05-04T13:45:18Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status  Created  Modified
09ca1a69  7df6e0cf  group       0        run      failed  16s ago  11s ago
```