
Expanding a PVC with the in-tree RBD plugin fails #88643

Closed
huchengze opened this issue Feb 28, 2020 · 11 comments · Fixed by #92027

@huchengze (Contributor)

What happened:

When I use the in-tree RBD plugin to expand a PVC from 20Gi to 30Gi, it stays in the Resizing state for a long time.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
  creationTimestamp: "2020-02-28T02:36:26Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: intree-rbd-pvc
  namespace: default
  resourceVersion: "113553071"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/intree-rbd-pvc
  uid: 1e3887cd-59d3-11ea-b7ad-eeeeeeeeeeee
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: intree-rbd-sc
  volumeMode: Filesystem
  volumeName: pvc-1e3887cd-59d3-11ea-b7ad-eeeeeeeeeeee
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-02-28T02:47:59Z"
    status: "True"
    type: Resizing
  phase: Bound

And kube-controller-manager reported this error:

E0228 02:48:02.557189       1 nestedpendingoperations.go:267] Operation for "\"1e3887cd-59d3-11ea-b7ad-eeeeeeeeeeee\"" failed. No retries permitted until 2020-02-28 02:48:03.05705946 +0000 UTC m=+327.117830161 (durationBeforeRetry 500ms). Error: "error expanding volume \"default/intree-rbd-pvc\" of plugin \"kubernetes.io/rbd\": rbd info failed, error: parse rbd info output failed: 2020-02-28 02:47:59.305808 7f769ab000 -1 did not load config file, using default settings.\n2020-02-28 02:47:59.320080 7f769ab000 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n2020-02-28 02:47:59.321712 7f747ee850  0 -- :/243049777 >> 172.16.2.105:6789/0 pipe(0x1213c9fa0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x1213c7e00).fault\n{\"name\":\"kubernetes-dynamic-pvc-11614948-59d4-11ea-a0dd-eeeeeeeeeeee\",\"size\":21474836480,\"objects\":5120,\"order\":22,\"object_size\":4194304,\"block_name_prefix\":\"rbd_data.b372176b8b4567\",\"format\":2,\"features\":[\"layering\"],\"flags\":[]}, invalid character '-' after top-level value"
I0228 02:48:02.557190       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"intree-rbd-pvc", UID:"1e3887cd-59d3-11ea-b7ad-eeeeeeeeeeee", APIVersion:"v1", ResourceVersion:"113553071", FieldPath:""}): type: 'Warning' reason: 'VolumeResizeFailed' error expanding volume "default/intree-rbd-pvc" of plugin "kubernetes.io/rbd": rbd info failed, error: parse rbd info output failed: 2020-02-28 02:47:59.305808 7f769ab000 -1 did not load config file, using default settings.
2020-02-28 02:47:59.320080 7f769ab000 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2020-02-28 02:47:59.321712 7f747ee850  0 -- :/243049777 >> 172.16.2.105:6789/0 pipe(0x1213c9fa0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x1213c7e00).fault
{"name":"kubernetes-dynamic-pvc-11614948-59d4-11ea-a0dd-eeeeeeeeeeee","size":21474836480,"objects":5120,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.b372176b8b4567","format":2,"features":["layering"],"flags":[]}, invalid character '-' after top-level value

The most important message is:

rbd info failed, error: parse rbd info output failed

I found the error here:

return 0, fmt.Errorf("parse rbd info output failed: %s, %v", string(output), err)

After reading this part of the code, I realized that the output of the "rbd info" command is not clean JSON that can be unmarshalled into a struct: the stderr log lines are mixed in with it.
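
(A minimal standalone sketch of the failure mode, using only the Go standard library rather than the plugin's actual code: json.Unmarshal first validates the whole input, accepts the leading "2020" of the stderr timestamp as a complete JSON number, and then rejects the '-' that follows it, which is exactly the "invalid character '-' after top-level value" error shown above.)

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// stderr noise from rbd precedes the JSON document in the combined output
	combined := `2020-02-28 02:47:59.305808 7f769ab000 -1 did not load config file, using default settings.
{"name":"kubernetes-dynamic-pvc-11614948","size":21474836480}`

	var info struct {
		Name string `json:"name"`
		Size int64  `json:"size"`
	}
	if err := json.Unmarshal([]byte(combined), &info); err != nil {
		fmt.Println(err) // invalid character '-' after top-level value
	}
}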

What you expected to happen:

The PVC expansion to succeed.

@huchengze added the kind/bug label Feb 28, 2020
@k8s-ci-robot added the needs-sig label Feb 28, 2020
@huchengze (Contributor, Author)

/sig storage

@k8s-ci-robot added the sig/storage label and removed the needs-sig label Feb 28, 2020
@tedyu (Contributor) commented Feb 28, 2020

invalid character '-' after top-level value

@humblec (Contributor) commented Mar 24, 2020

/assign @humblec

@humblec (Contributor) commented Mar 24, 2020

@huchengze can you please let me know what Ceph cluster version is being used in your setup? I am working on the fix.

@huchengze (Contributor, Author)

@humblec I totally forgot this...

ceph> version
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)

@humblec (Contributor) commented Mar 24, 2020

@huchengze by any chance do you have a Nautilus cluster, or any plans to migrate? If that's the case, I can check directly on Nautilus.

@huchengze (Contributor, Author)

@humblec I don't have a Nautilus cluster, but I think you can check on Nautilus.
By the way, I fixed this in #88753, but I forgot to add test cases in rbd_test.go.
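
(A sketch of the kind of test case meant here, assuming a hypothetical parseRBDImageSize helper that stands in for the plugin's actual parsing of rbd info --format=json output; it checks both clean stdout JSON and the combined-output failure from this issue.)

package rbd

import (
	"encoding/json"
	"fmt"
	"testing"
)

// parseRBDImageSize is a hypothetical helper mirroring the error string
// from rbd_util.go quoted above.
func parseRBDImageSize(output []byte) (int64, error) {
	var info struct {
		Size int64 `json:"size"`
	}
	if err := json.Unmarshal(output, &info); err != nil {
		return 0, fmt.Errorf("parse rbd info output failed: %s, %v", string(output), err)
	}
	return info.Size, nil
}

func TestParseRBDImageSize(t *testing.T) {
	// clean JSON, as rbd writes it to stdout
	good := []byte(`{"name":"kubernetes-dynamic-pvc","size":21474836480}`)
	size, err := parseRBDImageSize(good)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if size != 21474836480 {
		t.Errorf("got size %d, want 21474836480", size)
	}

	// combined stdout+stderr, as CombinedOutput returns it
	bad := []byte("2020-02-28 02:47:59.305808 7f769ab000 -1 did not load config file, using default settings.\n" + `{"size":21474836480}`)
	if _, err := parseRBDImageSize(bad); err == nil {
		t.Error("expected a parse error for output containing stderr noise")
	}
}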

@juliantaylor (Contributor)

Isn't the problem simply the use of CombinedOutput in https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/rbd/rbd_util.go#L711?

That includes the stderr lines of rbd info (which start with -1), and those are not JSON; the JSON is only on stdout.

@tedyu (Contributor) commented Mar 25, 2020

How about the following change?

diff --git a/pkg/volume/rbd/rbd_util.go b/pkg/volume/rbd/rbd_util.go
index 94b409aa88a..6228d34399b 100644
--- a/pkg/volume/rbd/rbd_util.go
+++ b/pkg/volume/rbd/rbd_util.go
@@ -708,7 +708,7 @@ func (util *rbdUtil) rbdInfo(b *rbdMounter) (int, error) {
        //
        klog.V(4).Infof("rbd: info %s using mon %s, pool %s id %s key %s", b.Image, mon, b.Pool, id, secret)
        output, err = b.exec.Command("rbd",
-               "info", b.Image, "--pool", b.Pool, "-m", mon, "--id", id, "--key="+secret, "-k=/dev/null", "--format=json").CombinedOutput()
+               "info", b.Image, "--pool", b.Pool, "-m", mon, "--id", id, "--key="+secret, "-k=/dev/null", "--format=json").Output()

        if err, ok := err.(*exec.Error); ok {
                if err.Err == exec.ErrNotFound {
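
(For illustration, a minimal standalone sketch of the difference that one-word change makes, using the standard os/exec package rather than the k8s.io/utils/exec wrapper the plugin actually uses: CombinedOutput interleaves stderr with stdout, while Output returns stdout alone, which is where rbd writes its JSON.)

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Simulate a tool that logs noise to stderr and prints JSON to stdout,
	// the way rbd info does when it cannot find a ceph.conf.
	script := `echo 'warning: no config' >&2; echo '{"size":21474836480}'`

	combined, _ := exec.Command("sh", "-c", script).CombinedOutput()
	fmt.Printf("CombinedOutput: %q\n", combined) // stderr + stdout mixed, not valid JSON

	stdout, _ := exec.Command("sh", "-c", script).Output()
	fmt.Printf("Output:         %q\n", stdout) // stdout only, clean JSON
}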

@juliantaylor (Contributor)

If the only stderr line produced on your cluster is "-1 did not load config file, using default settings.", you can work around this by providing the controller-manager an empty /etc/ceph/ceph.conf.

@huchengze (Contributor, Author)

@juliantaylor @tedyu
If you have time to make a PR for this bug, please just do it.
I'm too busy these days, and my network is so poor that I can't even clone the code.
Thanks a lot.

juliantaylor added a commit to juliantaylor/kubernetes that referenced this issue Jun 11, 2020
Ignore stderr of rbd info --format=json as without a ceph.conf it will
print messages about no configuration onto stderr which break the
json parsing.

The actual json information the function wants is always on stdout.

Closes: kubernetes#88643

Signed-off-by: Julian Taylor <juliantaylor108@gmail.com>