
Support more volume types in stats/summary endpoint including configmap, secret, projected #70172

Merged

Conversation

WanLinghao
Contributor

Support more volume types in stats/summary endpoint including configmap, secret, projected

/kind bug

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2018
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2018
@msau42
Member

msau42 commented Oct 25, 2018

Running du on atomic volume types is interesting because we keep "snapshots" of older versions around, so the reported usage will always grow.

What should the reported metric be? Total usage including all the snapshots, or just the size of the current version?

@kubernetes/sig-storage-pr-reviews
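
For concreteness, here is a minimal sketch (an illustration, not kubelet code) of the two candidate metrics, assuming the layout the kubelet's atomic writer produces — timestamped payload directories plus a "..data" symlink selecting the current one; the volume path is a placeholder:

```go
// Compare "total usage including all snapshots" against "size of the
// current version only" for an atomically-updated volume. Sketch only;
// the path and the "..data" layout assumption are not from this PR.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// treeBytes sums the sizes of regular files beneath root.
func treeBytes(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if info, err := d.Info(); err == nil && info.Mode().IsRegular() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	volRoot := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~configmap/cfg" // placeholder

	// Candidate 1: total usage, including any retained older versions.
	total, _ := treeBytes(volRoot)

	// Candidate 2: only the version the ..data symlink currently points to.
	current, _ := os.Readlink(filepath.Join(volRoot, "..data"))
	live, _ := treeBytes(filepath.Join(volRoot, current))

	fmt.Printf("total=%dB current=%dB\n", total, live)
}
```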

@msau42
Member

msau42 commented Oct 25, 2018

We should probably run some stress tests to make sure that adding these new sources of du won't exacerbate #62917. I believe the configmap max size is 1MB? 1MB really shouldn't be a problem, but a pod could have many configmaps. Do we clean up old snapshots periodically, or only when a pod is deleted?

@WanLinghao
Contributor Author

WanLinghao commented Oct 25, 2018

@msau42 I am not sure which snapshot feature you mean here. Is it https://github.com/kubernetes-incubator/external-storage/tree/master/snapshot?

> Running du on atomic volume types is interesting because we keep "snapshots" of older versions around, so the reported usage will always grow.
>
> What should the reported metric be? Total usage including all the snapshots, or just the size of the current version?
>
> @kubernetes/sig-storage-pr-reviews

@WanLinghao
Contributor Author

WanLinghao commented Oct 25, 2018

I think one possible reason is that the "find" command is executed with no nice level set:

```go
findCmd := exec.Command("find", path, "-xdev", "-printf", ".")
```

By contrast, a similar "du" invocation in k8s runs at nice level 19:

```go
out, err := exec.Command("nice", "-n", "19", "du", "-s", "-B", "1", path).CombinedOutput()
```

cadvisor also sets the nice level to 19:
https://github.com/google/cadvisor/blob/c5510abcd7bf38e1db6b9bfe3bf6728139d8c3e1/fs/fs.go#L619
https://github.com/google/cadvisor/blob/c5510abcd7bf38e1db6b9bfe3bf6728139d8c3e1/fs/fs.go#L572

I will run a test to verify whether my guess is correct.
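
To make the guess concrete, here is a minimal sketch (not the actual kubelet change, just an illustration) of the same find walk run under nice level 19, mirroring the quoted du invocation so the scan yields CPU to other workloads:

```go
// Run the find-based inode count wrapped in "nice -n 19". The path in
// main is a placeholder.
package main

import (
	"fmt"
	"os/exec"
)

// inodeCount prints one "." per inode under path, like the quoted
// findCmd, but at nice level 19.
func inodeCount(path string) (int, error) {
	out, err := exec.Command("nice", "-n", "19",
		"find", path, "-xdev", "-printf", ".").CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("find failed: %v (output: %q)", err, out)
	}
	return len(out), nil
}

func main() {
	fmt.Println(inodeCount("/var/lib/kubelet")) // placeholder path
}
```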

> We should probably run some stress tests to make sure that adding these new sources of du won't exacerbate #62917. I believe the configmap max size is 1MB? 1MB really shouldn't be a problem, but a pod could have many configmaps. Do we clean up old snapshots periodically, or only when a pod is deleted?

@msau42
Member

msau42 commented Oct 25, 2018

Ah sorry, I don't mean the volume snapshot feature. What I meant is that when a configmap gets updated, for the volume mount we make a new timestamped directory with all the contents and then just flip the symlink to point to the new directory. But I don't think we clean up the older directories until the pod is deleted. So over time, the actual configmap volume directory will keep growing and never shrink.
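
One quick way to observe this on a node is to inspect a configmap volume directory directly. A sketch (the pod UID and volume name in the path are placeholders):

```go
// List the timestamped payload directories under a configmap volume and
// mark the one the "..data" symlink currently selects.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	volRoot := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~configmap/<volume-name>"

	current, err := os.Readlink(filepath.Join(volRoot, "..data"))
	if err != nil {
		fmt.Println("readlink ..data:", err)
		return
	}

	entries, err := os.ReadDir(volRoot)
	if err != nil {
		fmt.Println("readdir:", err)
		return
	}
	for _, e := range entries {
		if e.IsDir() { // timestamped payload directories
			marker := ""
			if e.Name() == current {
				marker = "  <- current (target of ..data)"
			}
			fmt.Println(e.Name() + marker)
		}
	}
}
```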

@WanLinghao
Contributor Author

WanLinghao commented Oct 29, 2018

Hello, I tried many ways to observe the snapshot behavior but could not. Did I miss something, or do I need any feature gates enabled?

> Ah sorry, I don't mean the volume snapshot feature. What I meant is that when a configmap gets updated, for the volume mount we make a new timestamped directory with all the contents and then just flip the symlink to point to the new directory. But I don't think we clean up the older directories until the pod is deleted. So over time, the actual configmap volume directory will keep growing and never shrink.

@WanLinghao
Contributor Author

WanLinghao commented Oct 30, 2018

@msau42 I tested this as described below:

  1. Deploy a test cluster with two nodes.
  2. Run 100 pods on each node, with each pod mounting 10 configmap volumes.
  3. Give each configmap about 1MB of unique data.
  4. Run a shell script that simulates heavy I/O on one node while leaving the other node untouched.
  5. Request the /stats/summary endpoint on each node many times.
  6. Observe the difference in behavior between the two nodes.

The conclusion: everything went well on both nodes, with no panics and no errors. It seems the configmap volume metrics calculation does not trigger the panic you mentioned above. (A sketch of the polling step follows below.)
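
A minimal sketch of step 5 (assumptions: the kubelet read-only port 10255 is enabled and reachable; adjust the host and port for a secured setup):

```go
// Poll the kubelet summary endpoint repeatedly and report failures.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	url := "http://127.0.0.1:10255/stats/summary" // placeholder node address

	for i := 0; i < 100; i++ {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Printf("poll %d failed: %v\n", i, err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			fmt.Printf("poll %d: status=%s bytes=%d\n", i, resp.Status, len(body))
		}
		time.Sleep(time.Second)
	}
}
```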

> We should probably run some stress tests to make sure that adding these new sources of du won't exacerbate #62917. I believe the configmap max size is 1MB? 1MB really shouldn't be a problem, but a pod could have many configmaps. Do we clean up old snapshots periodically, or only when a pod is deleted?

@msau42
Member

msau42 commented Oct 30, 2018

Hi @WanLinghao thanks for the thorough testing! Could you also try the following?

  • Fill the configmap with a very large number of small files
  • After the pod has started, update the configmap frequently

@WanLinghao
Contributor Author

@msau42 Hello, I have tested them as follows:

  • For small configmaps, I tested a cluster with 100 pods running on each node, each pod bound to 100 configmap volumes, and each configmap volume injected with 50 bytes of data. No error or panic happened.
  • For frequent configmap updates, I wrote a shell script that updates the configmaps one by one in an endless loop (see the client-go sketch below for an equivalent). I again saw no error or panic when accessing the stats/summary endpoint.

In conclusion, my tests indicate that the configmap volume metrics calculation does not trigger errors or panics in most circumstances.
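
A rough client-go equivalent of that shell update loop (a sketch; the kubeconfig path, namespace, and configmap name are assumptions, and this uses the current client-go API rather than the 2018-era one):

```go
// Update a configmap in an endless loop to exercise the atomic-writer
// path on the node.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // placeholder kubeconfig
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	cms := client.CoreV1().ConfigMaps("default") // placeholder namespace
	for i := 0; ; i++ {
		cm, err := cms.Get(context.TODO(), "test-cm", metav1.GetOptions{}) // placeholder name
		if err != nil {
			panic(err)
		}
		cm.Data = map[string]string{"key": fmt.Sprintf("value-%d", i)} // new payload each round
		if _, err := cms.Update(context.TODO(), cm, metav1.UpdateOptions{}); err != nil {
			fmt.Println("update failed:", err)
		}
		time.Sleep(500 * time.Millisecond)
	}
}
```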

@msau42
Member

msau42 commented Nov 1, 2018

Thanks!
/lgtm

/assign @gnufied
cc @dashpole

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2018
@WanLinghao
Contributor Author

@gnufied PTAL thanks

@WanLinghao
Contributor Author

@gnufied friendly ping

@WanLinghao
Contributor Author

@gnufied PTAL thanks

@gnufied
Member

gnufied commented Nov 20, 2018

I am a bit wary of emitting ephemeral volume stats as metrics. If I understand correctly, this will cause an explosion of time-series data in a large enough cluster, because most things use ephemeral storage. Also, some of those ephemeral volumes are auto-generated and have the same lifecycle as the pod.

@gnufied
Member

gnufied commented Nov 20, 2018

Not to mention: how will a user use these metrics? Say I am running out of disk space in configmap "foo"; what is the user supposed to do? You can't resize a configmap. You can probably just reduce the amount of data you put in. But since a configmap is limited to 1MB and you get a validation error when creating or updating one, we should be fine.

Also, IIRC most of these ephemeral volume types are mounted read-only now, so they can't grow on the node.

@WanLinghao
Contributor Author

@gnufied So should we remove the metrics calculation for the secret volume?

```go
volume.NewCachedMetrics(volume.NewMetricsDu(getPath(pod.UID, spec.Name(), plugin.host))),
```
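
For context, a minimal sketch of what that line wires up, as I read the volume package (the path below is a placeholder; in the plugin it comes from getPath): NewMetricsDu measures the volume directory with du/find-style scans, and NewCachedMetrics memoizes the result between calls.

```go
// Stand-alone illustration of the secret plugin's metrics wiring.
package main

import (
	"fmt"

	"k8s.io/kubernetes/pkg/volume"
)

func main() {
	volPath := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~secret/mysecret" // placeholder

	// MetricsDu scans volPath; CachedMetrics caches the result between
	// GetMetrics calls so repeated stat requests stay cheap.
	m := volume.NewCachedMetrics(volume.NewMetricsDu(volPath))
	metrics, err := m.GetMetrics()
	fmt.Println(metrics, err)
}
```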

@msau42
Member

msau42 commented Nov 20, 2018

The strange thing is that since ephemeral volumes are on the root disk, their used capacity will be minuscule compared to the boot disk.

Even though the volume is mounted read-only, I think it will grow. If you update the configmap, we create a new directory, update the symlink, and remount. I also think we don't delete the old directories, so the size will constantly grow if you update the objects frequently.

@gnufied
Member

gnufied commented Nov 20, 2018

@WanLinghao hmm, I think one thing I got wrong: these metrics do not automatically show up in the Prometheus endpoint. I will verify tomorrow, but I am thinking we should be fine.

@msau42 @WanLinghao also, yeah, I think the symlink flipping may cause the volume to grow on disk (as long as the volume is on the same node). I will take a look again, thanks.

@WanLinghao
Contributor Author

@gnufied what's your view now?

@WanLinghao
Contributor Author

@gnufied PTAL thanks

@gnufied
Member

gnufied commented Dec 6, 2018

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, WanLinghao

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2018
@WanLinghao
Contributor Author

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Dec 10, 2018
@k8s-ci-robot k8s-ci-robot merged commit 13985d2 into kubernetes:master Dec 10, 2018