Allocation resources usage showing zero for rootless raw_exec driver in v1.7.6 #20285
Comments
FYI, this post seems to report exactly the same issue as mine.
Hi @simon1990zcs and thanks for raising this issue. The Nomad client process needs to run as root, so this could be causing the issue you have seen. Recent cgroup changes and other isolation changes to support features like NUMA and cgroup v2 are likely involved here. This issue has some useful additional information. Allocation resource usage is collected by the Nomad clients, so the logs from those agents may contain the information and errors needed to diagnose this problem.
Hi @jrasell, thanks for looking into it.
Let me know if you need any more information.
In case you need the full logs from the Nomad clients, including the logs from running
Now I am more certain this is a bug, since it works fine on Windows and macOS Nomad clients but not on Linux, or at least not on CentOS 7.6.
@jrasell hey James, I found the root cause after debugging the codebase.
One proposed fix is to add a condition here that allows fallback logic when the cgroup is missing, as below.
Nomad version
Nomad v1.6.9 (expected behavior) compared with Nomad v1.7.6 (incorrect behavior)
Operating system and Environment details
Issue
In v1.7.6, allocation resource usage is reported incorrectly (as zero) for the raw_exec driver when Nomad runs in a rootless environment (not as root). In contrast, it is displayed correctly in v1.6.9.
FYI, I don't know how Nomad behaves when running as root or with other drivers (Java, Docker, Exec, ...), since we can't run as root.
Reproduction steps
I ran exactly the steps below on both Nomad v1.6.9 and v1.7.6. Nomad v1.6.9 output the expected values, while Nomad v1.7.6 output wrong (zero) values.
1. nomad agent -dev -log-include-location -log-level=debug -bind 0.0.0.0
2. nomad job run python-service.hcl (note: python-service.hcl is shown below)
3. nomad alloc status <correspondingAllocId>

FYI, this misbehavior is not limited to the nomad alloc status command; it happens across the board, including the usage graph on the UI dashboard and the /v1/metrics API output.
Expected Result
CPU and memory usage should be displayed correctly (non-zero), as in Nomad v1.6.9.
Actual Result
As shown in the output below, Nomad v1.7.6 reports 0 for both CPU and memory usage.
Job file (if appropriate)
Nomad Server logs (if appropriate)
Below are the startup logs from Nomad v1.6.9 and Nomad v1.7.6.
I noticed that no cgroup is detected in v1.7.6. I don't think it's related to the CPU/memory usage info, but it's worth mentioning.
Startup log of Nomad v1.6.9
Startup log of Nomad v1.7.6
Nomad Client logs (if appropriate)