
Cannot make use of RADOS namespace for external cluster with EC block pool #13633

Closed
bauerjs1 opened this issue Jan 26, 2024 · 27 comments · Fixed by #13769

bauerjs1 commented Jan 26, 2024

When using an erasure-coded data pool for RBDs, an additional replicated metadata pool must exist. Since Ceph does not support RADOS namespaces in EC pools, I created the namespace only in the metadata pool. When I pass the flags

--rbd-data-pool-name ceph-block
--rbd-metadata-ec-pool-name ceph-block-metadata # AFAIK the metadata pool can't be EC, so the flag name is a bit misleading
--rados-namespace staging

to create-external-cluster-resources.py, the script looks for namespace staging in the EC data pool ceph-block (where it cannot exist) instead of the metadata pool.

How can I use tenant isolation with RADOS namespaces for external clusters when the data pool for RBDs is erasure-coded?

Thanks in advance!

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
The create-external-cluster-resources.py script fails with

The provided rados Namespace, 'staging', is not found in the pool 'ceph-block'

Expected behavior:
To my understanding, the script should look for the provided namespace in the metadata block pool if the data pool is erasure-coded (please correct me if I am wrong here)

How to reproduce it:
Apply below resources in source cluster and try to extract information for a Rook external consumer cluster with

python3 create-external-cluster-resources.py --rbd-data-pool-name ceph-block --rbd-metadata-ec-pool-name ceph-block-metadata --rados-namespace staging

File(s) to submit:
Resources created in the source cluster:

# metadata pool (for use in consumer StorageClass.parameters.pool)
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-block-metadata
spec:
  replicated:
    size: 3
---
# data pool (for use in consumer StorageClass.parameters.dataPool)
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-block
spec:
  erasureCoded:
    codingChunks: 1
    dataChunks: 2
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPoolRadosNamespace
metadata:
  name: staging
spec:
  blockPoolName: ceph-block-metadata
  name: staging

Environment:
Source cluster:

  • OS: Ubuntu 20.04
  • Kernel: 5.15.0
  • Rook version: 1.13.3
  • Storage backend version: 18.2.1
  • Kubernetes version: 1.25
  • Kubernetes cluster type: bare metal
bauerjs1 (Author) commented Feb 9, 2024

Any ideas? Otherwise, I would probably go with a replicated pool for block storage.

subhamkrai (Contributor) commented:

@bauerjs1 Something seems to be wrong; please wait for @parth-gr to respond, he will be back next week. Thanks

parth-gr (Member) commented Feb 12, 2024

@bauerjs1 We currently do not support RADOS namespaces with EC block pools; I think even Ceph does not.

To my understanding, the script should look for the provided namespace in the metadata block pool if the data pool is erasure-coded (please correct me if I am wrong here)

That can be done, but that is just a check in the Python script; we would need that support from the Ceph backend.

If you have any recommendations on how to use it, please let us know.

cc @travisn

bauerjs1 (Author) commented:

Yes, AFAIK Ceph does not support namespaces in EC block pools.

To my understanding, the script should look for the provided namespace in the metadata block pool if the data pool is erasure-coded (please correct me if I am wrong here)

That can be done, but that is just a check in the Python script; we would need that support from the Ceph backend.

If you have any recommendations on how to use it, please let us know.

I am not sure what you mean here.
Shouldn't it be sufficient that, if --rbd-metadata-ec-pool-name is provided, the script looks for the --rados-namespace there (instead of in the data pool)? This should be supported by Ceph; I was able to do that from the CLI.

However, I am unsure whether the construct of a namespaced metadata pool with a non-namespaced data pool makes sense at all. I am far from a Ceph expert; that approach just felt most intuitive to me, to be honest.
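
For reference, creating and checking the namespace in the replicated metadata pool from the CLI looked roughly like this (a sketch using the pool and namespace names from my setup):

# create the namespace in the replicated metadata pool (EC pools cannot hold RBD namespaces)
rbd namespace create --pool ceph-block-metadata --namespace staging

# verify that it exists
rbd namespace ls --pool ceph-block-metadata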

parth-gr (Member) commented:

I tried creating a RADOS namespace in an EC pool and it was successful. I then created an EC StorageClass (updating the clusterID through the RADOS namespace status) and created a PVC.

But I got the warning below on the PVC, and it is still in Pending state; not sure how it worked for you @bauerjs1

Type     Reason                Age                  From                                                                                                                Message
  ----     ------                ----                 ----                                                                                                                -------
  Normal   Provisioning          8s (x9 over 2m16s)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-57488675f8-mvhh6_e21e5f28-22bf-46eb-9469-3fe1119588ee  External provisioner is provisioning volume for claim "openshift-storage/rbd-pvc-test"
  Warning  ProvisioningFailed    8s (x9 over 2m16s)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-57488675f8-mvhh6_e21e5f28-22bf-46eb-9469-3fe1119588ee  failed to provision volume with StorageClass "rook-ceph-block-ec": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (cb04b59bcd4b2535c5cb1b1ebec4c59b): missing configuration for cluster ID "cb04b59bcd4b2535c5cb1b1ebec4c59b"
  Normal   ExternalProvisioning  1s (x11 over 2m16s)  persistentvolume-controller                                                                                         Waiting for a volume to be created either by the external provisioner 'openshift-storage.rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
[rider@localhost rbd]$
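
For context, the EC StorageClass I created was roughly along these lines (illustrative sketch only; the pool and secret names are assumptions, and the clusterID is taken from the CephBlockPoolRadosNamespace status):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-ec
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: cb04b59bcd4b2535c5cb1b1ebec4c59b  # from the CephBlockPoolRadosNamespace status
  pool: replicated-metadata-pool               # replicated metadata pool (name assumed)
  dataPool: ec-pool                            # erasure-coded data pool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner        # secret names assumed
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
reclaimPolicy: Delete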

cc @Madhu-1 does this need a fix in CSI?

parth-gr (Member) commented:

[rider@localhost rbd]$ oc get CephBlockPoolRadosNamespace -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephBlockPoolRadosNamespace
  metadata:
    creationTimestamp: "2024-02-13T15:09:40Z"
    finalizers:
    - cephblockpoolradosnamespace.ceph.rook.io
    generation: 1
    managedFields:
    - apiVersion: ceph.rook.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:blockPoolName: {}
      manager: kubectl-create
      operation: Update
      time: "2024-02-13T15:09:40Z"
    - apiVersion: ceph.rook.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"cephblockpoolradosnamespace.ceph.rook.io": {}
      manager: rook
      operation: Update
      time: "2024-02-13T15:09:40Z"
    - apiVersion: ceph.rook.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:info:
            .: {}
            f:clusterID: {}
          f:phase: {}
      manager: rook
      operation: Update
      subresource: status
      time: "2024-02-13T15:10:58Z"
    name: namespace-a
    namespace: openshift-storage
    resourceVersion: "3688541"
    uid: 8fde04f9-1a40-4098-b552-a0978e6f4146
  spec:
    blockPoolName: ec-pool
  status:
    info:
      clusterID: cb04b59bcd4b2535c5cb1b1ebec4c59b
    phase: Failure
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

parth-gr (Member) commented:

Oh, I see now that even the RADOS namespace is in Failure state, so we can't create a RADOS namespace on an EC pool.

bauerjs1 (Author) commented:

Yes, AFAIK Ceph does not support namespaces in EC block pools.

Yes, unfortunately Ceph doesn't support this. So the idea was to create the namespaces only in the metadata pool (which is always replicated and not erasure-coded). I can confirm that Rook's RADOS namespace objects are successfully created on the metadata pool:

apiVersion: ceph.rook.io/v1
kind: CephBlockPoolRadosNamespace
metadata:
  name: staging
  namespace: storage
spec:
  blockPoolName: ceph-block-metadata
  name: staging
status:
  info:
    clusterID: 429b...
  phase: Ready

However, there is currently no way to pass this to the create-external-cluster-resources.py script.
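
To make the request concrete, the check I would expect the script to perform is roughly the following (an illustrative sketch only, not the actual create-external-cluster-resources.py code; it shells out to the rbd CLI):

import subprocess
import sys

def namespace_exists(pool, namespace):
    # 'rbd namespace ls' lists the RADOS namespaces that exist in the given pool
    out = subprocess.run(["rbd", "namespace", "ls", "--pool", pool],
                         capture_output=True, text=True, check=True).stdout
    return namespace in out.split()

def validate_rados_namespace(data_pool, metadata_ec_pool, namespace):
    # EC data pools cannot hold RADOS namespaces, so check the replicated
    # metadata pool whenever one is provided instead of the EC data pool
    pool_to_check = metadata_ec_pool or data_pool
    if not namespace_exists(pool_to_check, namespace):
        sys.exit(f"The provided rados Namespace, '{namespace}', "
                 f"is not found in the pool '{pool_to_check}'")

validate_rados_namespace("ceph-block", "ceph-block-metadata", "staging")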

parth-gr added a commit to parth-gr/rook that referenced this issue Feb 14, 2024
currently addded the support for rados namesapce
for rbd ec pools upstream

closes: rook#13633

Signed-off-by: parth-gr <partharora1010@gmail.com>
parth-gr (Member) commented Feb 14, 2024

@bauerjs1 Is this what you are looking for: #13769?
Can you try it now?

parth-gr (Member) commented:

@bauerjs1 Can you try and test the new script changes in #13769 and let me know if that is the fix you wanted, as the requirements were quite confusing?

bauerjs1 (Author) commented:

Sorry for the delay @parth-gr, I've been quite busy with other things over the past few weeks, but I'm on it again now. I will let you know as soon as I have tested it.

bauerjs1 (Author) commented Feb 22, 2024

I suppose it's a different problem, but I am now getting the error

Execution Failed: Only 'v2' address type is enabled, user should also enable 'v1' type as well

so I am not yet able to tell if this works now.

PR #8083 seems to be the origin of the error message, but I have no clue why this fails or how to resolve it. Since I enabled encryption, I passed the flag --v2-port-enable as explained in the docs.

Networking config:

  network:
    connections:
      compression:
        enabled: false
      encryption:
        enabled: true
      requireMsgr2: false
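
For completeness, the script invocation I used with encryption enabled looked roughly like this:

python3 create-external-cluster-resources.py \
  --rbd-data-pool-name ceph-block \
  --rbd-metadata-ec-pool-name ceph-block-metadata \
  --rados-namespace staging \
  --v2-port-enable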

parth-gr (Member) commented:

@bauerjs1 So were you able to test the original change for EC pools?

bauerjs1 (Author) commented Feb 22, 2024

Yes, I tried to test it, but I'm afraid I can't tell you whether the original issue is solved, because the script still fails as mentioned above.

parth-gr (Member) commented:

@bauerjs1 can you show the output of
ceph quorum_status

bauerjs1 (Author) commented:

This is the output:

{
  "election_epoch": 12,
  "quorum": [
    0,
    1,
    2
  ],
  "quorum_names": [
    "a",
    "b",
    "c"
  ],
  "quorum_leader_name": "a",
  "quorum_age": 505050,
  "features": {
    "quorum_con": "4540138322906710015",
    "quorum_mon": [
      "kraken",
      "luminous",
      "mimic",
      "osdmap-prune",
      "nautilus",
      "octopus",
      "pacific",
      "elector-pinging",
      "quincy",
      "reef"
    ]
  },
  "monmap": {
    "epoch": 3,
    "fsid": "49b5d4a6-a0bb-4c8f-9736-1f57ba3a5425",
    "modified": "2024-02-22T12:29:01.352672Z",
    "created": "2024-02-22T12:28:22.909053Z",
    "min_mon_release": 18,
    "min_mon_release_name": "reef",
    "election_strategy": 1,
    "disallowed_leaders: ": "",
    "stretch_mode": false,
    "tiebreaker_mon": "",
    "removed_ranks: ": "",
    "features": {
      "persistent": [
        "kraken",
        "luminous",
        "mimic",
        "osdmap-prune",
        "nautilus",
        "octopus",
        "pacific",
        "elector-pinging",
        "quincy",
        "reef"
      ],
      "optional": []
    },
    "mons": [
      {
        "rank": 0,
        "name": "a",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.233.5.247:3300",
              "nonce": 0
            }
          ]
        },
        "addr": "10.233.5.247:3300/0",
        "public_addr": "10.233.5.247:3300/0",
        "priority": 0,
        "weight": 0,
        "crush_location": "{}"
      },
      {
        "rank": 1,
        "name": "b",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.233.44.101:3300",
              "nonce": 0
            }
          ]
        },
        "addr": "10.233.44.101:3300/0",
        "public_addr": "10.233.44.101:3300/0",
        "priority": 0,
        "weight": 0,
        "crush_location": "{}"
      },
      {
        "rank": 2,
        "name": "c",
        "public_addrs": {
          "addrvec": [
            {
              "type": "v2",
              "addr": "10.233.24.101:3300",
              "nonce": 0
            }
          ]
        },
        "addr": "10.233.24.101:3300/0",
        "public_addr": "10.233.24.101:3300/0",
        "priority": 0,
        "weight": 0,
        "crush_location": "{}"
      }
    ]
  }
}

I see that the mons have only v2 ports enabled, but I did not find a way to change that. I thought requireMsgr2: false would do it; I bootstrapped a fresh cluster with this setting, but it didn't help.

parth-gr (Member) commented:

This is really a Ceph configuration issue, not related to the external script.

Currently the external script checks either that v1 exists or that both v1 and v2 exist.

@travisn should we support the case where only v2 exists?

bauerjs1 (Author) commented:

Since this cluster is set up by Rook, is there a way to enable the v1 ports, e.g. in CephCluster.spec?

travisn (Member) commented Feb 28, 2024

With the network encryption enabled, the v2 ports are enabled. If you change it to false, the v1 ports should be available.

    encryption:
        enabled: true
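
For example, a CephCluster network section that should keep the v1 ports available would look roughly like this (a sketch, mirroring the config shown above):

  network:
    connections:
      encryption:
        enabled: false  # with encryption disabled, the mons also listen on the v1 port (6789)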

parth-gr (Member) commented:

So this is for external mode, let's discuss.

bauerjs1 (Author) commented Feb 29, 2024

With the network encryption enabled, the v2 ports are enabled. If you change it to false, the v1 ports should be available.

    encryption:
        enabled: true

Hm, I still don't understand why v1 ports are required. Doesn't this render encryption unusable for external clusters?

According to the docs, I'd just have to use the --v2-port-enable flag when using encryption. I did that, but it does not seem to remove the dependency on v1 ports. Is the documentation wrong here?

parth-gr (Member) commented Mar 4, 2024

@bauerjs1 Yes, those are really the internal-mode settings.

With the fix in #13856 you can use just the v2 port.

parth-gr (Member) commented Mar 5, 2024

@bauerjs1 So #13856 is fixed; can you try out #13769?

bauerjs1 (Author) commented:

Sorry for the delay, I'll hopefully get to this in the next couple of days. Thanks in advance for the effort!

bauerjs1 (Author) commented Mar 21, 2024

Sorry for the late feedback. I've commented on both PRs.

I'm still having several issues and errors with the script. Since my source cluster is also created by Rook, I wonder if it is possible to do everything in a declarative way, which would be totally awesome! AFAIK, the script mainly creates new users and keys. Could this be done with Rook CRs like CephObjectStoreUser and CephClient, syncing the generated secrets into the consumer cluster? If that is possible, I'd also be happy to contribute to the ceph-cluster Helm chart to create the required resources for external access 🙂
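
To illustrate what I have in mind (a purely hypothetical sketch; I don't know which caps an external consumer actually needs), something like:

apiVersion: ceph.rook.io/v1
kind: CephClient
metadata:
  name: external-consumer
  namespace: storage
spec:
  caps:
    mon: 'profile rbd'
    osd: 'profile rbd pool=ceph-block-metadata, profile rbd pool=ceph-block'

with the generated client secret then synced into the consumer cluster.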

parth-gr (Member) commented:

No, we don't have admin privileges with the Rook client in external mode, so it can't create anything.

bauerjs1 (Author) commented Mar 22, 2024

Yeah, but the source cluster isn't running in external mode, only the consumer cluster is. Of course, I'd need to create the required CRs in the source cluster. Wouldn't that be possible? Is there any documentation on what the "consumer" Rook needs in the source cluster?

I can also open a new issue for that, if you want, so we don't mix up too many topics here.

mergify bot pushed a commit that referenced this issue Apr 22, 2024
currently addded the support for rados namesapce
for rbd ec pools upstream

closes: #13633

Signed-off-by: parth-gr <partharora1010@gmail.com>
(cherry picked from commit adf40b1)