Add resource usage monitoring for build steps #3860

Merged: 6 commits, Jun 28, 2023

@tonistiigi tonistiigi commented May 11, 2023

This adds the ability to capture the resource usage of build steps (CPU, memory, I/O, network) so that it can be used for performance analysis or for resource controls (#2108) in the future.

The usage data is available in the provenance attestation. To opt in, the user needs to set capture-usage=true. For provenance available via the history API, usage is captured automatically.
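As a rough illustration of the opt-in, here is a minimal sketch of how a `capture-usage` attribute could be read from a string map. The function name `captureUsage` and the choice to treat an unparsable value as an error are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// captureUsage reports whether the (hypothetical) attrs map opts in to usage
// capture. A missing key means "off"; an unparsable value is an error.
func captureUsage(attrs map[string]string) (bool, error) {
	v, ok := attrs["capture-usage"]
	if !ok {
		return false, nil
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false, fmt.Errorf("invalid capture-usage value %q: %w", v, err)
	}
	return b, nil
}

func main() {
	on, err := captureUsage(map[string]string{"capture-usage": "true"})
	fmt.Println(on, err)
}
```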

Some examples (search for resourceUsage and sysUsage):

https://explore.ggcr.dev/?blob=tonistiigi/buildkit@sha256:c771192c6f500ab46dc75c8aeced2d45cd564ca18dc230458aa225fbdd718936&mt=application%2Fvnd.in-toto%2Bjson&size=101624
https://explore.ggcr.dev/?blob=tonistiigi/buildkit@sha256:55904ca0b7a18fe76c10438c2a2c756eb73302628f6027138cd430b37de13d24&mt=application%2Fvnd.in-toto%2Bjson&size=109356
https://explore.ggcr.dev/?blob=docker.io/tonistiigi/buildkit@sha256:84fc0b9059458cce035df1d49402e30b441ff2a1ebe79f1d7aad6afa3e80754d&mt=application%2Fvnd.in-toto%2Bjson&size=61853

This feature requires cgroup v2. Pressure fields require a kernel with CONFIG_PSI enabled, and the memory.peak file requires kernel 5.19+. Related fields are empty if the requirements are not met or if some cgroup controllers are not enabled. I don't think it makes sense to add any fallbacks for cgroup v1 or non-PSI kernels; these will be requirements for #2108 in the future anyway.
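For readers unfamiliar with the PSI files, the sketch below parses the documented `some`/`full` line format of a cgroup v2 pressure file (for example `cpu.pressure`). The type and function names are hypothetical and the input is passed as a string, so this is not the parser from the PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PressureLine holds one line ("some" or "full") of a PSI file.
type PressureLine struct {
	Avg10, Avg60, Avg300 float64
	Total                uint64 // total stall time in microseconds
}

// parsePressure parses PSI content such as:
//   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
func parsePressure(content string) (map[string]PressureLine, error) {
	out := map[string]PressureLine{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		var pl PressureLine
		for _, f := range fields[1:] {
			k, v, ok := strings.Cut(f, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", f)
			}
			var err error
			switch k {
			case "avg10":
				pl.Avg10, err = strconv.ParseFloat(v, 64)
			case "avg60":
				pl.Avg60, err = strconv.ParseFloat(v, 64)
			case "avg300":
				pl.Avg300, err = strconv.ParseFloat(v, 64)
			case "total":
				pl.Total, err = strconv.ParseUint(v, 10, 64)
			}
			if err != nil {
				return nil, err
			}
		}
		out[fields[0]] = pl
	}
	return out, sc.Err()
}

func main() {
	p, err := parsePressure("some avg10=1.50 avg60=0.25 avg300=0.00 total=1234\n")
	fmt.Println(p["some"], err)
}
```

On a kernel without CONFIG_PSI the file is simply absent, which matches the description above of the related fields staying empty.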

Network monitoring only works with CNI provider.

Samples are taken at the end of the step and also during execution if the step takes a long time. The minimum sample interval is 2 seconds, and the maximum number of samples is 10 per build step.
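One way to honor both a minimum interval and a cap on the number of samples is to thin the buffer and double the interval whenever the cap is exceeded. The sketch below shows that idea only; the `sampler` type, its fields, and the thinning strategy are assumptions for illustration and may differ from what the PR implements.

```go
package main

import (
	"fmt"
	"time"
)

// sampler is a hypothetical bounded sampler, not BuildKit's actual type.
type sampler struct {
	interval time.Duration // current minimum gap between samples
	max      int           // maximum number of samples to retain
	last     time.Time     // time of the last accepted sample
	samples  []time.Time   // timestamps of retained samples
}

// offer records a sample at now if the current interval has elapsed.
// When the cap is exceeded it keeps every second sample (including the
// first and the newest) and doubles the interval.
func (s *sampler) offer(now time.Time) bool {
	if !s.last.IsZero() && now.Sub(s.last) < s.interval {
		return false
	}
	s.last = now
	s.samples = append(s.samples, now)
	if len(s.samples) > s.max {
		kept := s.samples[:0]
		for i, t := range s.samples {
			if i%2 == 0 {
				kept = append(kept, t)
			}
		}
		s.samples = kept
		s.interval *= 2
	}
	return true
}

// simulate offers one tick per second for the given number of seconds,
// using the 2-second interval and 10-sample cap from the description.
func simulate(seconds int) *sampler {
	s := &sampler{interval: 2 * time.Second, max: 10}
	start := time.Unix(1000, 0)
	for i := 0; i <= seconds; i++ {
		s.offer(start.Add(time.Duration(i) * time.Second))
	}
	return s
}

func main() {
	s := simulate(60)
	fmt.Println(len(s.samples), s.interval)
}
```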

Please check if I'm missing any fields that could be useful, or if you think some of the fields are useless. I didn't add all fields, but I think it is better to add more than miss out on something that could become useful in the future.

System samples for CPU and memory are added as well, so that they can be compared against step information to understand what share of system resources a step used. The maximum number of system samples for the whole build is 20 (with the same 2-second minimum interval).

@tonistiigi force-pushed the step-usage-monitoring branch 3 times, most recently from 9b68967 to 853b11e, May 12, 2023 00:46
@thaJeztah
Member

Is this something we could do with containerd's cgroups package? Does this mean we now have 3 separate implementations (runc, containerd, buildkit)?

@tonistiigi
Member Author

I don't see what could be reused from there.

@thaJeztah
Member

Isn't the "stats" code in there that collects metrics of the container's cgroup?

@AkihiroSuda
Member

> I don't see what could be reused from there.

Would be nice if the structs can be shared
https://pkg.go.dev/github.com/containerd/cgroups/v3@v3.0.1/cgroup2/stats

@tonistiigi
Member Author

> Would be nice if the structs can be shared

But they are not the same, for example:

  • This is designed around PSI so we can track what aspects are blocking the build. I don't see any PSI/pressure types there.
  • The sample for a build step is taken after the container has exited, so the memory values are zeros in that case, except for memory.peak, which is not in that package. As for current memory fields, there are many others that I didn't add; are you making the case that these are useful and should all be included?
  • There are fields that control limits, e.g. pids, memory, swap. This PR is not about setting or detecting limits. If a BuildKit limit is set in LLB, it is already part of the definition.
  • The I/O types are completely different: the containerd types track devices, while this PR captures throughput.

We could take this one type https://pkg.go.dev/github.com/containerd/cgroups/v3@v3.0.1/cgroup2/stats#CPUStat and embed it into our type that has the pressure support. But I don't think it makes sense to include a big package for just one struct with 6 fields.

@AkihiroSuda
Member

needs rebase

Member

Can we add comments linking to the specifications for the underlying file formats in https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?

executor/resources/cpu.go
*cpuStat.UserNanos = value * 1000
case cpuSystemUsec:
cpuStat.SystemNanos = new(uint64)
*cpuStat.SystemNanos = value * 1000
Member

We can use uint64Ptr (from io.go) to simplify this.
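A minimal sketch of what that simplification could look like when parsing `cpu.stat`. The exact signature of `uint64Ptr` in io.go is assumed here, and `cpuStat` and `parseCPUStat` are simplified stand-ins rather than the PR's types.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuStat is a simplified stand-in; nil fields mean "not reported".
type cpuStat struct {
	UserNanos   *uint64
	SystemNanos *uint64
}

// uint64Ptr returns a pointer to a copy of v (assumed shape of the helper).
func uint64Ptr(v uint64) *uint64 { return &v }

// parseCPUStat reads "key value" lines as found in a cgroup v2 cpu.stat file
// and converts the microsecond counters to nanoseconds.
func parseCPUStat(content string) (*cpuStat, error) {
	st := &cpuStat{}
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // ignore lines that are not "key value"
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		switch fields[0] {
		case "user_usec":
			st.UserNanos = uint64Ptr(value * 1000)
		case "system_usec":
			st.SystemNanos = uint64Ptr(value * 1000)
		}
	}
	return st, nil
}

func main() {
	st, err := parseCPUStat("usage_usec 300\nuser_usec 200\nsystem_usec 100\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(*st.UserNanos, *st.SystemNanos)
}
```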

executor/resources/memory.go
util/network/cniprovider/cni_linux.go
}
return nil, releaseContainer(context.TODO())

return rec, rec.CloseAsync(releaseContainer)
Member

Instead of doing this, could we spawn a goroutine here, so that we can have a standard Close function?

go func() {
  // errors are explicitly discarded in this example
  _ = rec.Close()  
  releaseContainer()
}()
return rec, nil

Member Author

I already did a new implementation before I realized why I did it this way.

If the daemon is closed before releaseContainer can be called, it can leak resources. The monitor takes care of this and blocks the daemon from shutting down until all the release calls have been invoked. So if releaseContainer has not been called synchronously, we still need to register it so that it is guaranteed to be called before shutdown. In a new goroutine this guarantee would not exist. 451e18c

This would be cleaner if Executor itself had a Shutdown function. Then it could block here until release has been called.
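To make the shutdown guarantee concrete, here is a small sketch of the pattern described above: release callbacks are registered with a monitor before they run asynchronously, and the monitor's Close blocks until every registered callback has finished. The `monitor` type and its `closeAsync` method are illustrative assumptions and do not reproduce the PR's Monitor or CloseAsync.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// monitor is a hypothetical tracker for pending release callbacks.
type monitor struct {
	mu     sync.Mutex
	closed bool
	wg     sync.WaitGroup
}

// closeAsync registers release and then runs it in a goroutine. Because the
// registration happens synchronously, Close cannot return before release has run.
func (m *monitor) closeAsync(release func()) error {
	m.mu.Lock()
	if m.closed {
		m.mu.Unlock()
		return errors.New("monitor is already closed")
	}
	m.wg.Add(1)
	m.mu.Unlock()

	go func() {
		defer m.wg.Done()
		release()
	}()
	return nil
}

// Close rejects new registrations and waits for all pending releases.
func (m *monitor) Close() {
	m.mu.Lock()
	m.closed = true
	m.mu.Unlock()
	m.wg.Wait()
}

func main() {
	m := &monitor{}
	released := false
	_ = m.closeAsync(func() { released = true })
	m.Close() // simulates daemon shutdown
	fmt.Println("released before shutdown:", released)
}
```

A bare `go func() { ... }()` has no such registration step, so nothing would stop the process from exiting before the release runs.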

Member

We should add this context in a comment; I think it's too easy to accidentally remove in a refactor or similar.

executor/containerdexecutor/executor.go (outdated)
@jedevc
Member

jedevc commented May 17, 2023

I'll be honest, I'm not sure I have a good grasp on why we need both per-step resource usage, and sysUsage - what are they both tracking and why are they different?

Is sysUsage capturing from Buildkit, while the per-step is from each ExecOp? Ideally we could have some comments inline that could elaborate 😄

@tonistiigi
Member Author

@jedevc sysUsage captures the total system usage while the build is happening. This is important for putting the step usage into relative context: how much of the whole system was used and what else was happening on the system while the step/build was running. E.g. you can look at the usage of the same steps in two different runs, and while their user/sys times match, their real time might be completely different if the system was constrained by some other process at the same time. This makes it possible to understand these cases and, for example, to suggest that a build step was unexpectedly slow not because of the step itself but because of other processes on the system. There is also a per-step sysCPUStat, which is the delta of system CPU usage between the start and end of the step, for comparison with the CPU info captured from the specific cgroup.
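A small worked sketch of that comparison, assuming cumulative system-wide CPU counters in nanoseconds sampled at the start and end of a step. The `cpuTimes` type, its fields, and the "share of busy time" formula are illustrative assumptions, not the fields or calculation used in the PR.

```go
package main

import "fmt"

// cpuTimes is a hypothetical snapshot of cumulative system-wide CPU time
// in nanoseconds.
type cpuTimes struct {
	User, System, Idle uint64
}

// sub returns the delta between two snapshots (end minus start).
func (end cpuTimes) sub(start cpuTimes) cpuTimes {
	return cpuTimes{
		User:   end.User - start.User,
		System: end.System - start.System,
		Idle:   end.Idle - start.Idle,
	}
}

// relativeUsage returns the fraction of system-wide busy CPU time during the
// step that is attributable to the step's own cgroup.
func relativeUsage(stepNanos uint64, sysDelta cpuTimes) float64 {
	busy := sysDelta.User + sysDelta.System
	if busy == 0 {
		return 0
	}
	return float64(stepNanos) / float64(busy)
}

func main() {
	start := cpuTimes{User: 100, System: 50, Idle: 1000}
	end := cpuTimes{User: 500, System: 250, Idle: 1200}
	// The step's cgroup reported 300ns of CPU; the system was busy for 600ns.
	fmt.Println(relativeUsage(300, end.sub(start)))
}
```

A low share combined with a long wall-clock time would point at contention from other processes rather than at the step itself.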

@tonistiigi tonistiigi added this to the v0.12.0 milestone May 22, 2023
@AkihiroSuda
Member

Needs rebase again

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This can be used to convert step usage to relative units.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@AkihiroSuda AkihiroSuda requested a review from jedevc June 20, 2023 10:46
@@ -396,6 +405,12 @@ func NewProvenanceCreator(ctx context.Context, cp *provenance.Capture, res solve
}
}

withUsage := false
if v, ok := attrs["capture-usage"]; ok {
Member

Needs docs update

Member

@jedevc jedevc left a comment

A couple of unresolved comments on the PR (would be nice to leave some comments/links inline to make it easier to read later, but shouldn't be a blocker).

@tonistiigi tonistiigi merged commit a2d1c24 into moby:master Jun 28, 2023
55 checks passed
@gabriel-samfira
Collaborator

gabriel-samfira commented Jul 3, 2023

This ends up crashing on Windows due to the fact that there is no procfs. We either disable it by returning a stub sampler or we skip allocating a sampler altogether and check for nil pointer before we call Record(). How would you like to proceed?
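The second option (no sampler, nil check before Record) can also be folded into the method itself, since Go allows calling a method on a nil pointer receiver. The sketch below uses a hypothetical `sampler` type and a `procfsAvailable` flag purely for illustration; a real fix would more likely select the implementation with build constraints per platform.

```go
package main

import "fmt"

// sampler is a hypothetical stand-in for the system usage sampler.
type sampler struct {
	count int
}

// newSampler returns nil when the platform cannot provide samples,
// for example when there is no procfs.
func newSampler(procfsAvailable bool) *sampler {
	if !procfsAvailable {
		return nil
	}
	return &sampler{}
}

// Record is safe to call on a nil *sampler; it then reports that no
// sample was taken instead of panicking.
func (s *sampler) Record() (int, bool) {
	if s == nil {
		return 0, false
	}
	s.count++
	return s.count, true
}

func main() {
	for _, available := range []bool{true, false} {
		s := newSampler(available)
		n, ok := s.Record()
		fmt.Println(available, n, ok)
	}
}
```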
