resource allocation for non-sidecar prestart tasks #9725
@tgross - to add to this: I ran some tests on Nomad 1.0.1 and see the following. We have a Zookeeper deployment that consists of the following (in the same group):

- 1x prestart task, NO sidecar (64 CPU)

The Nomad UI reports reserved CPU: 740. So both line 2 and line 4 seem to be incorrect in the "Allocated" behavior, as I would expect to see 676 (740 - 64 = 676; the completed prestart task's 64 CPU appears to still be counted). Here's a screenshot of the allocation, and also the topology visualization.
Thanks for those details @idrennanvmware. I did some further digging, and although it looks like the scheduler is doing what I'd expect according to the design, the reported resource allocation is incorrect. I ran the current HEAD on a machine that Nomad fingerprints with 1.9GiB of memory:
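(The exact commands aren't shown in the transcript; a minimal sketch, assuming a single-node dev agent started from a locally built binary, whose path here is hypothetical:)

```sh
# Start a single-node dev agent from a locally built binary
$ ./bin/nomad agent -dev

# In another terminal, inspect what the agent fingerprinted for the local node
$ nomad node status -self
```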
The web UI agrees with what we see here, as we'd expect. I have a job with a prestart task and a redis task, each of which takes 1000MB of memory. If I run the prestart task with `sidecar = true`, the scheduler can't place the allocation, because the two tasks' summed 2000MB exceeds the node's 1.9GiB.
So instead I run the prestart task with `sidecar = false`. Here's the jobspec (no sidecar):

```hcl
job "example" {
  datacenters = ["dc1"]

  group "group" {
    task "init" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      driver = "docker"

      config {
        image   = "busybox:1"
        command = "/bin/sh"
        args    = ["-c", "echo ok; sleep 5"]
      }

      resources {
        cpu    = 500
        memory = 1000
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
      }

      resources {
        cpu    = 500
        memory = 1000
      }
    }
  }
}
```

I run it:
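(The run command isn't reproduced in the transcript; assuming the jobspec above is saved as `example.nomad`, it would be:)

```sh
$ nomad job run example.nomad
```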
Wait a few seconds for the prestart task to finish and the main task to start, then check the allocation status:
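(The status output isn't reproduced either; the check would look something like this, with `<alloc-id>` taken from the `nomad job status example` output:)

```sh
# Find the allocation ID for the job, then inspect that allocation
$ nomad job status example
$ nomad alloc status <alloc-id>
```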
Now if we look at the node status, we see we're reporting more memory allocated than is available on the host!
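(A sketch of the command for that check; the `-self` flag queries the node the local agent is running on:)

```sh
$ nomad node status -self
```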
If we look at the allocation in the web UI, we see a similar (incorrect) value. But it turns out the scheduler still seems to have the same view of the world: if we run a job that takes 500MB of memory, I'd expect it to work, since the completed prestart task's 1000MB should have been released, leaving roughly 900MB free (1900MB fingerprinted minus 1000MB for redis). Here's the small jobspec:

```hcl
job "example2" {
  datacenters = ["dc1"]

  group "group" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
      }

      resources {
        cpu    = 500
        memory = 500
      }
    }
  }
}
```

Run that job and the scheduler accepts it:
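(Again assuming the jobspec is saved locally, this time as `example2.nomad`:)

```sh
$ nomad job run example2.nomad
```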
And now the node reports even more memory allocated, exceeding what's available on the host:
So this is not just a documentation issue but a bug in the way that resources are being reported for tasks with a non-sidecar prestart. At this point I'm going to tag my colleagues @jazzyfresh and @notnoop, who worked on this feature, and see if they have any insights.
"Now if we look at the node status, we see we're reporting more memory allocated than is available on the host!" I think this explains a strange topology total count (resources) we were seeing but hadn't tracked down! Thanks for the really detailed breakdown. |
Following up on this, are there any new developments here?
I'll echo the thanks for the breakdown above; it has been really instructive in understanding how Nomad is mis-reporting the actual usage. I was expecting we'd have to rework all our prestart tasks so the prestart work runs inside the same container before the actual task, just to save on the RAM. (In our case the UI shows double the RAM allocation I expect the job to use!) We're seeing the same over-provisioning in the UI for memory: nodes reporting 8.7GB of memory used out of 7.7GB available. We don't have memory overprovisioning enabled (yet), which had me confused as to how we could get into that state.
As @tgross mentioned, I tried to reproduce the same thing. Here is the node status before I ran any jobs; it has 25 GB of free memory.
Now, I ran the example job mentioned above, consisting of the pre-start task and the redis task.
It deployed successfully, and here is the allocation status:
I ran node status again after the pre-start task was dead:
It's showing 22 GB of allocated memory out of 25 GB. As the pre-start task was allocated 21 GB of memory, let's presume that 21 GB is released once it completes; then 24 GB of memory would be free according to the scheduler. So I tried to run the example2 job mentioned above with 5 GB of memory, but it failed to allocate due to memory exhaustion.
I am not able to understand why the scheduler couldn't allocate the 5 GB of memory.
Over on Discuss we noted that we're missing documentation around how the scheduler allocates resources when there are `prestart` tasks in play. We should have documentation in the `lifecycle` docs about how resources are scheduled, probably cross-linked with the `resources` docs.