resize pv failed #72393

Closed
z8772083 opened this issue Dec 28, 2018 · 29 comments · Fixed by #72431
Labels: kind/bug, lifecycle/rotten, sig/storage

Comments

@z8772083

What happened:
kubectl create -f ceph-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-expand-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-storage

and the result is OK:
pvc-expand-test Bound pvc-962a674c-0a73-11e9-b2d8-0050569bfc0f 1Gi RWO ceph-storage 1h

Now I want to resize the PV from 1Gi to 2Gi, so I edit the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-expand-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-storage

kubectl describe pvc pvc-expand-test

Name:          pvc-expand-test
Namespace:     default
StorageClass:  ceph-storage
Status:        Bound
Volume:        pvc-962a674c-0a73-11e9-b2d8-0050569bfc0f
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=ceph.com/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Conditions:
  Type       Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----       ------  -----------------                 ------------------                ------  -------
  Resizing   True    Mon, 01 Jan 0001 00:00:00 +0000   Fri, 28 Dec 2018 15:40:04 +0800           
Events:
  Type     Reason              Age                From           Message
  ----     ------              ----               ----           -------
  Warning  VolumeResizeFailed  24s (x21 over 1h)  volume_expand  Error expanding volume "default/pvc-expand-test" of plugin kubernetes.io/rbd : rbd info failed, error: can not get image size info kubernetes-dynamic-pvc-962c05a3-0a73-11e9-951a-0a580af404fa: rbd image 'kubernetes-dynamic-pvc-962c05a3-0a73-11e9-951a-0a580af404fa':
           size 1 GiB in 256 objects
           order 22 (4 MiB objects)
           id: 374526b8b4567
           block_name_prefix: rbd_data.374526b8b4567
           format: 2
           features: 
           op_features: 
           flags: 
           create_timestamp: Fri Dec 28 07:38:36 2018

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
ceph storageclass yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ceph-storage
provisioner: ceph.com/rbd
parameters:
  monitors: 192.168.20.195:6789,192.168.20.196:6789,192.168.20.197:6789
  adminId: admin
  adminSecretNamespace: default
  adminSecretName: ceph-secret
  pool: k8s-rbd
  userId: admin
  userSecretName: ceph-secret
  fsType: ext4
  imageFormat: "2"
allowVolumeExpansion: true

ceph secret:


apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"
data:
  key: QVFCaVhBWmNJcdfa1QTJmZXJRRmNRLzBtSnlYZ1BEdmlMakE9PQ==

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:41:50Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):
    CentOS Linux release 7.4.1708 (Core)

  • Kernel (e.g. uname -a):
    Linux m01 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
    kubeadm

  • Others:
    ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

/kind bug

k8s-ci-robot added the kind/bug and needs-sig labels on Dec 28, 2018
@z8772083
Author

@kubernetes/sig-storage-bugs

k8s-ci-robot added the sig/storage label and removed the needs-sig label on Dec 28, 2018
@k8s-ci-robot
Contributor

@z8772083: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-bugs


@mlmhl
Contributor

mlmhl commented Dec 29, 2018

What's the version of your ceph cluster? k8s uses the rbd info command to get the rbd image's size, and it expects the size unit in the output to be MB. However, in your environment the unit is GiB:

size 1 GiB in 256 objects
order 22 (4 MiB objects)
...
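
For context, here is a minimal sketch of the kind of fixed-unit text parsing that breaks on this output (illustrative only, not the actual Kubernetes code; the regex and function names are made up):

package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sizeMBRegex only understands a size reported in MB, e.g.
// "size 1024 MB in 256 objects" as printed by older Ceph releases.
var sizeMBRegex = regexp.MustCompile(`size (\d+) MB in`)

// parseSizeMB returns the image size in MB, or an error if the expected
// "size <n> MB in" fragment is missing from the rbd info output.
func parseSizeMB(rbdInfo string) (int, error) {
	m := sizeMBRegex.FindStringSubmatch(rbdInfo)
	if m == nil {
		return 0, fmt.Errorf("can not get image size info from %q", rbdInfo)
	}
	return strconv.Atoi(m[1])
}

func main() {
	// Ceph mimic (13.x) prints "1 GiB" instead of "1024 MB", so the pattern
	// never matches and the resize controller reports a failure.
	_, err := parseSizeMB("rbd image 'foo': size 1 GiB in 256 objects")
	fmt.Println(err)
}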

@z8772083
Author

@mlmhl ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

@mlmhl
Contributor

mlmhl commented Dec 29, 2018

It seems the output format differs between Ceph versions. I will try to fix it.

@mlmhl
Contributor

mlmhl commented Jan 4, 2019

/assign

@polym

polym commented Jan 8, 2019

@mlmhl your PR has been merged into the master branch. When will this fix land in Kubernetes v1.11?

@mlmhl
Contributor

mlmhl commented Jan 9, 2019

Sorry for the delay, I've sent cherry-pick PRs: #72718 #72719 #72720

@lucaim

lucaim commented Jan 29, 2019

@mlmhl It seems this change is not working in some cases:

  Type     Reason              Age                    From           Message
  ----     ------              ----                   ----           -------
  Warning  VolumeResizeFailed  100s (x53 over 3h29m)  volume_expand  (combined from similar events): Error expanding volume "namespace/prometheus1-server" of plugin kubernetes.io/rbd : rbd info failed, error: parse rbd info output failed: 2019-01-28 16:47:24.878565 7f7eef611100 -1 did not load config file, using default settings.
{"name":"kubernetes-dynamic-pvc-155ee065-4ede-11e8-b665-02420a141559","size":536870912000,"objects":128000,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.259f7374b0dc51","format":2,"features":[],"flags":[]}, invalid character '-' after top-level value

I think this is due to two concurrent conditions:
1. The ceph client still complains on stderr about not being able to load a config file, even though there is no need for one since a key is supplied (hence the stderr output 2019-01-28 16:47:24.878565 7f7eef611100 -1 did not load config file, using default settings.). This happens not only in hyperkube.
2. It seems we capture both stdout and stderr from the execution of the rbd info command (note the output in the error from exec.Run), and the JSON unmarshalling then fails. To be clear, the useless warning is on stderr, while the JSON output is on stdout.

I worked around it by creating empty files for /etc/ceph/ceph.conf and /etc/ceph/ceph.keyring so that the ceph client no longer emits the warning, but I feel it would be much better to parse only stdout (see the sketch below).
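
For illustration, here is a minimal sketch (not the actual Kubernetes implementation; the function name, argument list, and exact rbd flags are assumptions based on this thread) of keeping stdout and stderr separate so the warning never reaches the JSON decoder:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// rbdImageInfo holds the fields of `rbd info --format json` output that
// matter for resizing.
type rbdImageInfo struct {
	Name string `json:"name"`
	Size int64  `json:"size"`
}

// rbdImageSize runs rbd info with JSON output and decodes only stdout.
// Warnings such as "did not load config file, using default settings" go to
// stderr and therefore never reach the JSON decoder.
func rbdImageSize(pool, image, id, key, mon string) (int64, error) {
	cmd := exec.Command("rbd", "info", pool+"/"+image,
		"--id", id, "--key="+key, "-m", mon, "--format", "json")

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout // the JSON document
	cmd.Stderr = &stderr // warnings only; kept around for error reporting

	if err := cmd.Run(); err != nil {
		return 0, fmt.Errorf("rbd info failed: %v, stderr: %s", err, stderr.String())
	}

	var info rbdImageInfo
	if err := json.Unmarshal(stdout.Bytes(), &info); err != nil {
		return 0, fmt.Errorf("parse rbd info output failed: %v", err)
	}
	return info.Size, nil
}

func main() {
	size, err := rbdImageSize("k8s-rbd", "some-image", "admin", "<key>", "192.168.20.195:6789")
	fmt.Println(size, err)
}

If the two streams are combined into one buffer instead, the decoder sees the stderr timestamp first, parses 2019 as a bare number, and then chokes on the '-' that follows, which would match the "invalid character '-' after top-level value" error above.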

For reference, the relevant source change in #72431: https://github.com/kubernetes/kubernetes/pull/72431/files#diff-36f18e327f36d95eb1333c4a18781184R704

Thanks

@hasonhai

hasonhai commented Feb 5, 2019

@lucaim could you tell us where you put the empty ceph.conf and ceph.keyring files? On the node running kubelet?

@lucaim

lucaim commented Feb 5, 2019

@hasonhai Inside the kube-controller-manager pod, since the rbd resize is executed by that component.

@xiyangxdy

@lucaim I had the same problem; I added ceph.conf and ceph.keyring and that solved it, but I think k8s falls back to Ceph's default user before reading ceph.conf and ceph.keyring. The following code supports my conjecture: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/rbd/rbd_util.go#L672
Ultimately, I would like k8s to locate those two files through configurable Ceph parameters.
What do you think?

@lucaim

lucaim commented Mar 6, 2019

@xiyangxdy as far as I know you do not need to configure those two files if you use a key. That is still a hypothesis, as this switch seems to be undocumented in the Ceph version we are using: http://docs.ceph.com/docs/luminous/man/8/rbd/
Anyway, the warning can safely be ignored, and I do not see why we should parse stderr when there is no error.

@xiyangxdy

@lucaim I added the two files with empty content before and got no error; later I replaced them with symlinks named ceph.conf and ceph.keyring (pointing to the configuration file and keyring generated by my cluster), and ceph still did not report an error. I think ceph only needs the two files to exist and does not care about their contents. Fortunately, the current workaround solves the problem.

@lucaim

lucaim commented Mar 7, 2019

@xiyangxdy this matches my experience (see above, I created the empty files too and got no error), but another option is simply not to parse stderr, which I find "cleaner": there is no error in this case, not even in the exit code, and what we get is just a warning.

@cschramm

cschramm commented Jul 1, 2019

Same issue as @lucaim here. The warnings about the missing etc files break the JSON output. Adding empty files in the kube-controller-manager container fixes resizing.

@Bluewind

Shouldn't this be reopened until the bug is fixed?

Creating the two files also "solves" the issue for us.

@msau42
Member

msau42 commented Jul 17, 2019

/reopen
cc @gnufied @humblec

@k8s-ci-robot
Contributor

@msau42: Reopened this issue.


k8s-ci-robot reopened this on Jul 17, 2019
@humblec
Contributor

humblec commented Jul 18, 2019

@lucaim @Bluewind which version of Ceph is your cluster running?

@Bluewind

We are running a ceph cluster with ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable). kube-controller-manager is running rancher/hyperkube:v1.14.3-rancher1 which internally uses ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e). Also just to prevent any confusion: I've used kubectl edit to resize the PVC, not rancher.

@Bluewind

This also seems related to #66757.

@lucaim

lucaim commented Jul 18, 2019

@humblec We are running ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)

@adampl

adampl commented Oct 11, 2019

Are there plans to resolve this issue?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 9, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 8, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.


@adampl

adampl commented Dec 9, 2020

Answering myself - apparently it has been fixed in k8s 1.20: #92027
