-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support collecting FsUsageMetrics for containerd #2872
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
Hi @yyrdl. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
@googlebot I fixed it. |
/ok-to-test |
Thanks for PR! Can you clarify what cAdvisor prometheus metrics this PR aims to add to for containerd? Is it just |
@@ -37,15 +36,18 @@ type FsUsage struct { | |||
InodeUsage uint64 | |||
} | |||
|
|||
type FsUsageProvider interface { | |||
Usage() (*FsUsage, error) | |||
Targets() []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add godoc on what "Targets" is referring to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added .
this method is used to get where the fs usage is collected.
container/common/fsHandler.go
Outdated
} | ||
|
||
// Wait to handle errors until after all operartions are run. | ||
// Wait to handle errors until after all operations are run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment still relevant here? I think it needs to be moved to Usage()
in container/common/fsHandler.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -125,6 +130,21 @@ func (c *client) Version(ctx context.Context) (string, error) { | |||
return response.Version, nil | |||
} | |||
|
|||
func (c *client) ContainerFsUsage(ctx context.Context, snapshotter, snapshotkey string) (*common.FsUsage, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ContainerFsUsage
need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes , this client is delivered as interface, if this method is not exported, we can't get the fs usage.
func newContainerdContainerHandler(
client ContainerdClient,// Here
name string,
machineInfoFactory info.MachineInfoFactory,
fsInfo fs.FsInfo,
cgroupSubsystems *containerlibcontainer.CgroupSubsystems,
inHostNamespace bool,
metadataEnvs []string,
includedMetrics container.MetricSet,
) (container.ContainerHandler, error)
@@ -125,6 +130,21 @@ func (c *client) Version(ctx context.Context) (string, error) { | |||
return response.Version, nil | |||
} | |||
|
|||
func (c *client) ContainerFsUsage(ctx context.Context, snapshotter, snapshotkey string) (*common.FsUsage, error) { | |||
usage, err := c.snapshotsService.Usage(ctx, &snaptshotapi.UsageRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how expensive is this?
will containerd snapshotter have to recaculate the usage every time this is called or does it cache the usage internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a cache in it's CRI implemention . And Usage
method will recaculate the usage every time, I have change it to ContainerStats
,use CRI instead.
} | ||
return &common.FsUsage{ | ||
BaseUsageBytes: uint64(usage.Size_), | ||
TotalUsageBytes: uint64(usage.Size_), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it correct that these are both set to usage.Size
?
BaseUsageBytes
is "Number of bytes consumed by a container through its root filesystem."
vs TotalusageBytes
is "Total Number of bytes consumed by container."
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ,for containerd ,there is no log file(used to store logs from container's stdout or stderr) ( At least ,I didn't find it (T_T) ) . But for docker and crio , there are log files, so when collect fs usage, the usage of log file is counted.
By the way, kubelet storge container's logs in another place (/var/log/pods
), and it is counted by kubelet when it caculate the usage. see makeContainerStats
in kubelet/stats/cri_stats_provider.go
// NOTE: This doesn't support the old pod log path, `/var/log/pods/UID`. For containers
// using old log path, empty log stats are returned. This is fine, because we don't
// officially support in-place upgrade anyway.
var (
containerLogPath = kuberuntime.BuildContainerLogsDirectory(meta.GetNamespace(),
meta.GetName(), types.UID(meta.GetUid()), container.GetMetadata().GetName())
err error
)
result.Logs, err = p.getPathFsStats(containerLogPath, rootFsInfo)
if err != nil {
klog.Errorf("Unable to fetch container log stats for path %s: %v ", containerLogPath, err)
}
return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't cadvisor take the log size of /var/log/pods
in account when counting TotalusageBytes
?
/cc @crosbymichael There was earlier slack thread about of FS Usage containerd metrics on cadvisor xref |
There are many metrics defined in fs.FsStats,at current ,we only get For prometheus metrics defined in container.DiskUsageMetrics ,we only lost |
And thanks for your reply :) |
/test pull-cadvisor-e2e |
Can you please rebase? There appears to be some go.mod changes which is causing |
container/containerd/client.go
Outdated
"google.golang.org/grpc" | ||
"google.golang.org/grpc/backoff" | ||
runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we can depend on k8s repos since cadvisor itself is imported into k8s which can result in an import loop.
See #2726 and #2726 (comment) for example where this broke us before.
Do we need to import criapi here? Can we just talk to containerd directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok , this package is just pb files.
And there are two way to get the diskusage metrics :
- call snapshots.Usage()
- call cri.ContainerStats()
They are both grpc interface implemented by containerd , the difference is that snapshots.Usage
didn‘t cache any thing, while containerd cri server will sync the disk usage metrics periodically, see snapshotsSyncer.sync ,in order to avoid collecting disk usage metric repeatedly , we should use the cri method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the background. Unfortunately, I don't think the fact that this package is just pb files matters; I think it is general issue with dependency management in k8s =>
"third party software can use staging modules IFF the third party software is not vendored back into k/k :)"
"k8s.io/cri-api/"
is staging module and cadvisor itself is vendored back into k/k.
@dims, please keep me honest here, but I don't think anything has changed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be able to switch to snapshots.Usage()
instead? Can we maybe cache the results internally / only collect them during some periodic house keeping period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just consider this feature in cadvisor independently , we can use snapshots.Usage
, but for the sake of performance ,we have to use cri.ContainerStats
instead. there is a picture which shows the relationship between cadvisor ,kubelet and containerd in this situation :
If we use snapshots.Usage
in cadvisor, it will collect the metrics from filesystem repeatedly,which will cause unnecessary performance loss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yyrdl!
Would you be able to draft a PR of the kubelet changes required prior to merging this PR?
I would just like to ensure we know exactly what changes are required to kubelet first, prior to merging this PR. That way it's clear if we need to make any other cAdvisor/kubelet changes to fully integrate these changes since most important consumer of cAdvisor will be kubelet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes time to redesign the StatsProvider in kubelet , I am a little busy , but I will do it as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yyrdl FYI, I work with @bobbypage and we are solving another cadvisor-containerd issue which needs CRI as well. To work around the circular dependency issue, current plan is integrating cadvisor with CRI in a separate branch. #2923
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiutongs Good job! So I should submit this PR to this branch too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yyrdl I sent you a message on the slack and we can discuss more.
0217503
to
d593fc1
Compare
fa7fb15
to
2988f63
Compare
extraDir string | ||
fsInfo fs.FsInfo | ||
lastUpdate time.Time | ||
usage FsUsage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is thisusage
the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
ping... |
Do you mean sanbox container(pause:3.2)? |
@vsxen No. I was cloned cadvisor code, ,change to tag v0.43.0, then docker build the cadvisor image. I deployed that image as daemonset on our K8s to check if the container_fs_use_bytes metrics works with containerd. |
what is you containerd version? |
it's containerd://1.4.4 |
Does your prometheus pull metrics from the casvisor you deployed or still from kubelet ?@qingguee |
Hi @yyrdl , I have both. My v0.43.0 cadvisor was deployed as daemonset, and I use endpoints to collect those metrics. I use {job="cadvisor"} to filter out the kubelet advisor data. |
Hi @yyrdl
I deployed cadvisor as daemonset, my Prometheus collect metrics from those endpoint/POD.
So, is this issue fixed in 0.43.0? Or my configuration has issue? |
@qingguee cadvisor is integrated in kubelet ,at current ,the fix of this PR is not merged for some package dependency problem , you can copy the code changed in this pr , and build a custom version of kubelet or cadvisor to solve your problem. |
@yyrdl Thank you so much for the info. Could you help to provide the link for package dependency problem, I'd like to follow up. When that solved, we will do upgrade accordingly. |
What's added?
Collect container's disk usage metrics for containerd .
Related Issue: Disk usage metrics for containerd #2785
Note:
This pr only report the usage and inode information powered by containerd's standard API. Some disk usage information is missing until there is way to get the id of the snapshot or containerd enhances it's API.But for now, we can report what we can get .
We urgently need the disk usage of the container,these information is ok for our scenario in Tencent ,although it's not complete.
For k8s users, cadvisor is integrated in kubelet , you need to apply these changes to your kubelet (modify cadvisor's code in vendor ), and check the
New
method inkubernetes/pkg/kubelet/cadvisor/cadvisor_linux.go
, if there is an conditional statement like below:please add an new condition here,Related discussion: