
Use cpu_total_compute configuration for CPU usage too #17628

Merged
merged 1 commit into from Jul 19, 2023

Conversation

@TrueBrain (Contributor) commented Jun 20, 2023

Fixes #17577.

Currently, cpu_total_compute is only used for fingerprinting / capacity and doesn't influence TotalTicksAvailable. But TotalTicksAvailable is used to calculate CPU usage ticks. If, for example, no CPU MHz can be detected on ARM64, TotalTicksAvailable remains zero, and as a result all CPU usage remains zero too.

This PR sets out to address that problem by having cpu_total_compute override TotalTicksAvailable too.
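The dependency described above can be sketched as follows (a minimal illustration, not Nomad's actual code; the function name and signature are hypothetical):

```go
package main

import "fmt"

// usageMHz illustrates why a zero TotalTicksAvailable pins task CPU usage at
// zero: the reported MHz value is a percentage of the total ticks, and any
// percentage of zero is still zero.
func usageMHz(cpuPercent, totalTicksAvailable float64) float64 {
	return cpuPercent / 100.0 * totalTicksAvailable
}

func main() {
	// Detection failed (total ticks 0): even a fully busy task reports 0 MHz.
	fmt.Println(usageMHz(199.91, 0)) // 0
	// Detected or configured total of 20000 MHz: usage shows up as expected.
	fmt.Println(usageMHz(50, 20000)) // 10000
}
```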

A good way to test whether this works is to set cpu_total_compute to 100. All CPU metrics in the UI etc. should then show the percentage your CPU is loaded.
Next, run a job that uses 100% of a single core; you should see 1 / CPU-cores of load. The easiest way I could find is a task that runs dd if=/dev/zero of=/dev/null, which burns a single core for as long as you keep it running.

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Jun 20, 2023
@TrueBrain TrueBrain changed the title Use cpu_total_compute setting also for CPU usage information Use cpu_total_compute setting for CPU usage information too Jun 20, 2023
@TrueBrain TrueBrain changed the title Use cpu_total_compute setting for CPU usage information too Use cpu_total_compute configuration for CPU usage information too Jun 20, 2023
@TrueBrain TrueBrain changed the title Use cpu_total_compute configuration for CPU usage information too Use cpu_total_compute configuration for CPU usage too Jun 20, 2023
@tgross (Member) left a comment


Hi @TrueBrain, I see where you're going with this and how you've threaded it down into the executor. Architecturally, though, this seems like we're having to expose fingerprinted data to the plugin, when normally I'd expect that data to flow in the other direction.

I'm going to tag in @shoenig, as I know he's looked at this area of the code a bunch recently for #16672, and I want to get his thoughts here.

Review threads on drivers/shared/executor/utils.go (outdated, resolved) and client/fingerprint/cpu.go (resolved).
@tgross tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Jun 21, 2023
@shoenig (Member) commented Jun 22, 2023

Architecturally this seems like we're having to expose fingerprinted data to the plugin, when normally I'd expect that data to flow the other direction.

The total CPU compute available is used by the scheduler though (minus reserved), and is a shared resource among all task drivers. I think @TrueBrain is on the right track, but I'd go even further and attempt to purge any reference to the client/stats package from driver packages. The new NomadDriverConfig.CpuCompute RPC field would then represent the final value the Nomad client will use, whether the value originated from fingerprinting or from client config. There's an interesting side effect here where TaskResourceUsage utilization metrics would become dependent on cpu_total_compute being set correctly (if set), but I think that's probably fine.
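The precedence described here could look roughly like this (a hypothetical sketch with illustrative names, not Nomad's actual types):

```go
package main

import "fmt"

// finalCpuCompute sketches the idea above: the client settles on one
// total-compute value, preferring the cpu_total_compute config override and
// falling back to the fingerprinted value; drivers would then only ever see
// this final value (e.g. via a field like NomadDriverConfig.CpuCompute).
func finalCpuCompute(configuredMHz, fingerprintedMHz uint64) uint64 {
	if configuredMHz > 0 {
		return configuredMHz
	}
	return fingerprintedMHz
}

func main() {
	fmt.Println(finalCpuCompute(0, 20000))   // no override: fingerprinted 20000
	fmt.Println(finalCpuCompute(100, 20000)) // cpu_total_compute = 100 wins
}
```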

@TrueBrain let us know if you want to keep working on this, you're definitely poking at some old and surprisingly twisty code with this one 🙂

@TrueBrain (Contributor, Author) commented

After fiddling with this for a while, I noticed that client/stats/cpu.go contains code used by both the client and the drivers, while helper/cpu.go contains code used only by the client. So in the end I decided to mostly swap those two files around, with some exceptions.

Because of this, the driver modules should no longer need client/stats; they all use helper/stats now.

I also settled on the name cpuTotalTicks; this value was named several different ways, which was a bit confusing (what does totalAvailable mean?). The only thing I did not change is the name of the configuration entry: that is still cpu_total_compute.
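Under that naming, the helper would compute something like the following (hypothetical signature, a sketch rather than the actual refactored code):

```go
package main

import "fmt"

// cpuTotalTicks sketches what the renamed helper computes: the node's total
// compute in MHz, i.e. per-core frequency times core count. For example, an
// 8-core node fingerprinted at 2500 MHz per core totals 20000.
func cpuTotalTicks(frequencyMHz, numCores uint64) uint64 {
	return frequencyMHz * numCores
}

func main() {
	fmt.Println(cpuTotalTicks(2500, 8)) // 20000
}
```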

Curious what you think of this work!

PS: I can't seem to get make test to work locally, so I am just hoping nothing is broken. If anything is, I will address it as I spot it :)

@tgross (Member) commented Jul 11, 2023

Thanks @TrueBrain. We're in the middle of shipping the release candidate for 1.6.0 but will circle back to re-review once that work is done.

@shoenig (Member) commented Jul 17, 2023

This looks good @TrueBrain! The refactoring definitely makes sense. I think we can focus on this as-is and backport it (going into 1.6.1, 1.5.x, 1.4.x). Just FYI in Nomad 1.7 we're planning some significant work around NUMA node detection, where we might go back to plumbing the cpu_total_compute through the RPC layer (along with the rest of the cpu topology). The trick here of passing it through directly to the NewExecutor constructor doesn't help external plugins.

I tested these changes on EC2, and it seems task utilization is still broken:

ubuntu@ip-172-31-80-227:~$ ./nomad.tb node status -self -verbose | grep -E '(cpu\.)|(instance-type)'
cpu.arch                                 = arm64
cpu.frequency                            = 2500
cpu.modelname                            = Neoverse-N1
cpu.numcores                             = 8
cpu.reservablecores                      = 8
cpu.totalcompute                         = 20000
platform.aws.instance-type               = t4g.2xlarge
job "example" {
  group "group" {
    task "task" {
      driver = "exec"
      config {
        command = "stress"
        args    = ["-c", "2"] # burn 2 cores
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

and CPU utilization for the task is still 0, though the system CPU stats are calculated correctly:

ubuntu@ip-172-31-80-227:~$ ./nomad.tb alloc status -stats a8
Task "task" is "running"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  480 KiB/256 MiB  300 MiB

Memory Stats
Cache  Swap     Usage
0 B    480 KiB  480 KiB

CPU Stats
Percent  System Mode  Throttled Periods  Throttled Time  User Mode
199.91%  0.00%        0                  0               199.91%

Where the task shows 0/500 MHz, I'd expect to see 5000/500 MHz (2 cores × 2500 MHz each).

Likewise, the node status utilization metric is zero:

ubuntu@ip-172-31-80-227:~$ ./nomad.tb node status -self -stats
Allocated Resources
CPU            Memory          Disk
500/20000 MHz  256 MiB/31 GiB  300 MiB/29 GiB

Allocation Resource Utilization
CPU          Memory
0/20000 MHz  480 KiB/31 GiB

Host Resource Utilization
CPU          Memory          Disk
0/20000 MHz  353 MiB/31 GiB  (/dev/root)

(for comparison, on my amd64 machine)

➜ nomad alloc status -stats 39
Task "task" is "running"
Task Resources:
CPU           Memory           Disk     Addresses
6000/500 MHz  592 KiB/256 MiB  300 MiB

Memory Stats
Cache  Swap     Usage
0 B    592 KiB  592 KiB

CPU Stats
Percent  System Mode  Throttled Periods  Throttled Time  User Mode
200.01%  0.00%        0                  0               200.01%
➜ nomad node status -self -stats
Node Events
Time                  Subsystem  Message
2023-07-17T14:31:16Z  Cluster    Node registered

Allocated Resources
CPU            Memory           Disk
500/96000 MHz  256 MiB/126 GiB  300 MiB/202 GiB

Allocation Resource Utilization
CPU             Memory
5998/96000 MHz  592 KiB/126 GiB

Host Resource Utilization
CPU             Memory           Disk
6119/96000 MHz  685 MiB/126 GiB  19 GiB/233 GiB

@TrueBrain (Contributor, Author) commented Jul 18, 2023

Ah, yes, I only tested with the cpu_total_compute configuration set manually. On AWS there is of course the auto-detect part, and it seems that is not routed through correctly with my PR. Let's see where that information gets lost :)

Edit: meh; a bit of a pain to fix properly. The AWS fingerprint runs after the CPU fingerprint and can change the cpu.totalcompute attribute (and resources.CPU). The more I work on this, the more it feels like it needs a bigger refactor :D But that is outside my scope of expertise at the moment, so I am going to fix this in the best way I can :)

Edit 2: fixed and tested on a t4g on AWS.

Before this commit, cpu_total_compute was only used for fingerprinting, not
for CPU stats on nodes or tasks. This meant that if auto-detection failed,
setting cpu_total_compute didn't resolve the issue.

This issue was most noticeable on ARM64, where auto-detection always failed.
@shoenig (Member) commented Jul 19, 2023

I think this PR is looking good now, thanks @TrueBrain!

The more I work on this, the more it feels it needs a bit bigger refactor

Indeed! That refactor (in combination with refactoring cgroups) is a prerequisite for the work I'm doing for 1.7.

@tgross (Member) left a comment


👍 @shoenig the original ticket is classified as an enhancement, but this really feels more like a bugfix, and as you've noted we'll need it for upcoming work. Should we just go ahead and backport this to 1.5.x and 1.4.x too?

@shoenig added the backport/1.4.x, backport/1.5.x, and backport/1.6.x labels Jul 19, 2023
@shoenig shoenig merged commit e190eae into hashicorp:main Jul 19, 2023
27 of 29 checks passed
Nomad - Community Issues Triage automation moved this from Triaging to Done Jul 19, 2023
shoenig pushed a commit that referenced this pull request Jul 19, 2023
(manual backport of e190eae)

shoenig pushed a commit that referenced this pull request Jul 20, 2023
… (#17992)

(manual backport of e190eae)


Co-authored-by: Patric Stout <github@truebrain.nl>
jrasell pushed a commit that referenced this pull request Jul 26, 2023
shoenig added a commit that referenced this pull request Aug 3, 2023
@TrueBrain TrueBrain deleted the fix-arm64-cpu branch August 15, 2023 12:55
Closes: Fix ARM64 CPU monitoring of jobs