
Cannot create CephObjectStore with external ceph cluster #13827

Open

achernya opened this issue Feb 27, 2024 · 11 comments

achernya commented Feb 27, 2024

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

I have an external ceph cluster that I imported using the instructions at https://rook.io/docs/rook/v1.13/Getting-Started/intro/. The cluster has rbd and cephfs services installed and exposed, and those were imported successfully. However, this ceph cluster does not have an existing rgw running.

I then went and followed the instructions on https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-crd/ to create a CephObjectStore, placing the resource in the rook-ceph-external namespace.
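
For reference, my CephObjectStore was modeled on object.yaml and looked roughly like the sketch below (pool sizes and gateway settings here are illustrative, not my exact values):

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph-external
spec:
  # rgw metadata pools, replicated across hosts
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  # rgw data pool
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  # rook should provision these gateway pods in-cluster
  gateway:
    port: 80
    instances: 1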

This resulted in the operator having the following logs:

2024-02-27 19:12:08.437400 I | ceph-spec: detecting the ceph image version for image quay.io/ceph/ceph:v18.2.1...
2024-02-27 19:12:12.865292 I | ceph-spec: detected ceph image version: "18.2.1-0 reef"
2024-02-27 19:12:15.686430 I | ceph-object-controller: reconciling object store deployments
2024-02-27 19:12:15.854261 I | ceph-object-controller: ceph object store gateway service running at 10.254.252.62
2024-02-27 19:12:15.854313 I | ceph-object-controller: reconciling object store pools
2024-02-27 19:12:16.762490 E | ceph-object-controller: failed to reconcile CephObjectStore "rook-ceph-external/my-store". failed to create object store deployments: failed to create object pools: failed to create metadata pools: failed to create pool "my-store.rgw.control": failed to create replicated crush rule "my-store.rgw.control": failed to create crush rule my-store.rgw.control: exit status 13

I wasn't sure what exit status 13 meant, so I enabled debug logs. That didn't help, as CephToolCommand.Run doesn't seem to log its output anywhere I can tell inside createReplicationCrushRule, so I ended up strace'ing outside the operator container to figure out what ceph command the operator was running, which turned out to be

/usr/bin/ceph status --connect-timeout=15 --cluster=rook-ceph-external --conf=/var/lib/rook/rook-ceph-external/rook-ceph-external.config --name=client.admin --keyring=/var/lib/rook/rook-ceph-external/client.admin.keyring --format json

If I run that command myself, I get

2024-02-27T18:58:37.787+0000 7ff5feaf2700 -1 auth: unable to find a keyring on /var/lib/rook/rook-ceph-external/client.admin.keyring: (2) No such file or directory
2024-02-27T18:58:37.787+0000 7ff5feaf2700 -1 AuthRegistry(0x7ff5f8064978) no keyring found at /var/lib/rook/rook-ceph-external/client.admin.keyring, disabling cephx
2024-02-27T18:58:37.791+0000 7ff5feaf2700 -1 auth: unable to find a keyring on /var/lib/rook/rook-ceph-external/client.admin.keyring: (2) No such file or directory
2024-02-27T18:58:37.791+0000 7ff5feaf2700 -1 AuthRegistry(0x7ff5f80680f0) no keyring found at /var/lib/rook/rook-ceph-external/client.admin.keyring, disabling cephx
2024-02-27T18:58:37.795+0000 7ff5feaf2700 -1 auth: unable to find a keyring on /var/lib/rook/rook-ceph-external/client.admin.keyring: (2) No such file or directory
2024-02-27T18:58:37.795+0000 7ff5feaf2700 -1 AuthRegistry(0x7ff5feaf0ea0) no keyring found at /var/lib/rook/rook-ceph-external/client.admin.keyring, disabling cephx
2024-02-27T18:58:37.795+0000 7ff5fc88e700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2024-02-27T18:58:37.795+0000 7ff5fd08f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2024-02-27T18:58:37.795+0000 7ff5effff700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2024-02-27T18:58:37.795+0000 7ff5feaf2700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
[errno 13] RADOS permission denied (error connecting to the cluster)

Which makes sense: the external cluster import only created client.healthchecker, not client.admin.
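
(For completeness: the limited caps are visible on the external cluster itself, though the exact caps depend on the script version:

ceph auth get client.healthchecker

which shows scoped-down mon/mgr/osd caps rather than the allow * that client.admin has.)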

None of the documentation makes it clear whether this is a supported configuration, and if it is not, the error reporting leaves a bit to be desired. It is also not clear to me whether I should just change the envvars I pass to import-external-cluster.sh to set ROOK_EXTERNAL_ADMIN_SECRET, or what the downsides of doing that would be.

Expected behavior:
CephObjectStore is created successfully.

How to reproduce it (minimal and precise):

  1. Create an external ceph cluster (in my case, it was created by Proxmox automatically, see https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster)
  2. Import the ceph cluster into rook-ceph
  3. Follow the instructions on https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-crd/ to create a CephObjectStore

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:

  • Operator's logs, if necessary

  • Crashing pod(s) logs, if necessary

    To get logs, use kubectl -n <namespace> logs <pod name>
    When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI.
    Read GitHub documentation if you need help.

Cluster Status to submit:

  • Output of kubectl commands, if necessary

    To get the health of the cluster, use kubectl rook-ceph health
    To get the status of the cluster, use kubectl rook-ceph ceph status
    For more details, see the Rook kubectl Plugin

Environment:

  • OS (e.g. from /etc/os-release): Debian GNU/Linux 12 (bookworm)
  • Kernel (e.g. uname -a): 6.1.0-18-cloud-amd64
  • Cloud provider or hardware configuration: VM hosted on Proxmox
  • Rook version (use rook version inside of a Rook Pod): rook: v1.13.3
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.7 (2dd3854d5b35a35486e86e2616727168e244f470) quincy (stable)
  • Kubernetes version (use kubectl version): v1.29.2
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm init'd
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
achernya added the bug label Feb 27, 2024

travisn commented Feb 27, 2024

@achernya Did you create the object store with the object-external.yaml example? I suspect you created the object store with object.yaml, which is not for external cluster configuration. The failure looks like it comes from attempting to create a CRUSH rule, which suggests Rook is trying to fully create the pools in the external cluster, which it doesn't have access to do.

See also the Connect to an external object store topic.
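
From memory, that example is essentially a CephObjectStore that points at an existing gateway instead of provisioning one, along these lines (the endpoint IP is a placeholder):

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: external-store
  namespace: rook-ceph-external
spec:
  gateway:
    port: 80
    # existing rgw daemons outside the cluster
    externalRgwEndpoints:
      - ip: 10.0.0.1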

@achernya (Author)

I used object.yaml, yes. My read of object-external.yaml is that it sets up a CRD pointing to an external rgw -- which I don't have. I actually do want rook-ceph to provision the radosgw frontends.

It sounds like this is potentially an unsupported configuration, and I should instead provision rgw externally and then use object-external.yaml?

@BlaineEXE (Member)

I believe (but am not certain) that the configuration you describe is possible. It looks like the current issue may be that the Rook cluster might not have an admin key, which is necessary to set things up for running against an external cluster.

It's also possible that there are some internal issues with Rook regarding radosgw-admin and admin API usage that makes Rook unable to fully realize the integration as desired.

When you ran this step, did you specify the Ceph admin key and keyring? https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/?h=key#1-create-all-users-and-keys

@parth-gr might have some additional thoughts about this as well.

@achernya (Author)

@BlaineEXE you are correct, the admin key is not present. Or rather, what's in kubectl get secret -n rook-ceph-external rook-ceph-mon -o json is "admin-secret": "YWRtaW4tc2VjcmV0", which base64-decodes to the placeholder string admin-secret. This matches the behavior I see in the import script: https://github.com/rook/rook/blob/master/deploy/examples/import-external-cluster.sh#L30
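
For anyone checking the same thing, the placeholder is visible directly with:

kubectl -n rook-ceph-external get secret rook-ceph-mon -o jsonpath='{.data.admin-secret}' | base64 -d

which prints the literal string admin-secret.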

I did not specify the admin key and keyring. I ran the export script with --rbd-data-pool, --cephfs-data-pool, and --format=bash. From my read of the documentation and create-external-cluster-resources.py, I thought "optional" meant the keyring would be auto-detected.


achernya commented Mar 1, 2024

@BlaineEXE I tried passing --keyring and --ceph-conf as you suggested,

python3 ./rook/create-external-cluster-resources.py --rbd-data-pool-name=ssdpool --cephfs-data-pool-name=cephfs_data_ec --format=bash --output=no_key.sh
python3 ./rook/create-external-cluster-resources.py --rbd-data-pool-name=ssdpool --cephfs-data-pool-name=cephfs_data_ec --format=bash --output=key.sh --keyring=/etc/pve/priv/ceph.client.admin.keyring --ceph-conf=/etc/pve/ceph.conf
diff -u no_key.sh key.sh

and there is no difference in the output.


parth-gr commented Mar 1, 2024

@achernya you need to pass --rgw-endpoint while running the python script.
See https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/#1-create-all-users-and-keys for more info.
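
For example (host and port are placeholders; this assumes an rgw is already listening there):

python3 create-external-cluster-resources.py --rbd-data-pool-name=ssdpool --cephfs-data-pool-name=cephfs_data_ec --rgw-endpoint=<host>:<port> --format=bash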


achernya commented Mar 1, 2024

@parth-gr as I mentioned in my initial comment, I do not have an existing radosgw configuration for this external ceph cluster, and my goal is to get rook to provision the radosgw inside the k8s cluster.


parth-gr commented Mar 4, 2024

@achernya first of all, I would like to ask why you want this type of configuration. If it's external ceph, it won't be Rook's responsibility to manage its daemons, and I believe there are checks in the code along the lines of: if it is external, then skip its management.

And still, if you want to test something out of the box, I would say this is something we don't support.

But if you are interested in knowing how the creation could be made possible: you need to update all the caps to * for the health checker here: https://github.com/rook/rook/blob/master/deploy/examples/create-external-cluster-resources.py#L1007

"mon": "allow *", ... like this.

Then the ROOK_EXTERNAL_USER_SECRET created would have admin privileges, and I think the rgw pool creation will succeed.
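
An untested alternative to editing the script would be to widen the caps of the already-imported user directly on the external cluster (the key itself does not change, so no re-import should be needed):

ceph auth caps client.healthchecker mon 'allow *' mgr 'allow *' osd 'allow *'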

parth-gr added a commit to parth-gr/rook that referenced this issue Mar 4, 2024
currently the script requires both the v2 and v1 ports to be
present to enable the v2 port, but that is not a necessary
condition, so remove the check and enable v2 when only the v2
port is present, to successfully configure with v2 only

part-of: rook#13827

Signed-off-by: parth-gr <partharora1010@gmail.com>
parth-gr added a commit to parth-gr/rook that referenced this issue Mar 4, 2024
sometimes users want to use admin power to create some resources
in the external ceph cluster, so add a way to use the admin
privilege

part-of: rook#13827

Signed-off-by: parth-gr <partharora1010@gmail.com>
mergify bot pushed a commit that referenced this issue Mar 4, 2024
currently the script requires both the v2 and v1 ports to be
present to enable the v2 port, but that is not a necessary
condition, so remove the check and enable v2 when only the v2
port is present, to successfully configure with v2 only

part-of: #13827

Signed-off-by: parth-gr <partharora1010@gmail.com>
(cherry picked from commit 117bc76)

achernya commented Mar 4, 2024

@parth-gr

I would like to ask why you want this type of configuration

In my environment, I have a hyper-converged setup where the hypervisors have VMs with ceph-rbd storage, and I want the same ceph cluster to be used by the k8s environment. My underlying hypervisors (Proxmox) don't set up rgw, as it would want to take advantage of loadbalancers. I was hoping to run the rgw portions of the system in k8s, where my loadbalancers already exist or can easily be set up.

then I think the rgw pool creation will succeed

From my strace in my initial report, the creation command was explicitly looking for the client.admin keyring. That leads me to believe that simply granting client.healthchecker these privileges is necessary, but not sufficient, to make this work.

parth-gr added a commit to parth-gr/rook that referenced this issue Mar 5, 2024

parth-gr commented Mar 6, 2024

@achernya can you share the logs again? What is it complaining about now?

Also, restart the rook operator pod after these privilege changes; sometimes it requires a reboot of the node where the operator pod is running.

parth-gr added a commit to parth-gr/rook that referenced this issue Mar 6, 2024
parth-gr added a commit to parth-gr/rook that referenced this issue Mar 7, 2024
parth-gr added a commit to parth-gr/rook that referenced this issue Mar 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
