
Cannot start rook-ceph using multiple namespaces #3360

Closed · phlogistonjohn opened this issue Jun 26, 2019 · 16 comments

@phlogistonjohn (Contributor)

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

From the provided documentation and example files it is unclear how to start a working Ceph cluster via Rook in a separate namespace. I attempted to follow the recommendation of editing common.yaml below the line containing the text "Beginning of cluster-specific resources.", changing the previous namespace value of "rook-ceph" to "rook1".
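For illustration, the kind of edit attempted looks roughly like this (a minimal sketch; the ServiceAccount shown is one of several cluster-specific resources in common.yaml):

```yaml
# Below the "Beginning of cluster-specific resources" line in common.yaml,
# each namespace value was changed from "rook-ceph" to "rook1", e.g.:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-osd
  namespace: rook1   # previously: rook-ceph
```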

Everything seems to start up OK at first, but then the following appears in the operator logs:

```
I0626 18:22:48.166453       6 controller.go:818] Started provisioner controller ceph.rook.io/block_rook-ceph-operator-566967f57-7fbc9_6608d036-983f-11e9-958c-c686a7df9e39!
I0626 18:22:48.557850       6 controller.go:818] Started provisioner controller rook.io/block_rook-ceph-operator-566967f57-7fbc9_660922c8-983f-11e9-958c-c686a7df9e39!
2019-06-26 18:22:49.403566 I | operator: successfully started Ceph csi drivers
2019-06-26 18:22:49.404563 I | op-cluster: starting cluster in namespace rook1
2019-06-26 18:22:55.406255 W | op-k8sutil: OwnerReferences will not be set on resources created by rook. failed to test that it can be set. configmaps is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot create resource "configmaps" in API group "" in the namespace "rook1"
2019-06-26 18:22:55.422701 I | op-k8sutil: waiting for job rook-ceph-detect-version to complete...
2019-06-26 18:24:35.455210 E | op-cluster: unknown ceph major version. failed to get version job log to detect version. failed to read from stream. pods "rook-ceph-detect-version-7pr96" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "pods/log" in API group "" in the namespace "rook1"
2019-06-26 18:24:37.412338 I | op-k8sutil: Removing previous job rook-ceph-detect-version to start a new one
2019-06-26 18:24:37.440975 I | op-k8sutil: batch job rook-ceph-detect-version still exists
2019-06-26 18:24:39.446598 I | op-k8sutil: batch job rook-ceph-detect-version deleted
2019-06-26 18:24:39.461157 I | op-k8sutil: waiting for job rook-ceph-detect-version to complete...
```

As someone who is not an expert in RBAC, it's very unclear to me what else I am supposed to change.
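(A quick way to see which permissions are still missing is to impersonate the operator's service account with kubectl; the namespaces below match this report:)

```sh
# Can the operator's service account act in the new cluster namespace?
kubectl auth can-i create configmaps -n rook1 \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system
# The detect-version failure above also needs the pods/log subresource:
kubectl auth can-i get pods --subresource=log -n rook1 \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system
```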

Expected behavior:

More context and guidance in the documentation, or comments in the YAML, would be greatly appreciated.

How to reproduce it (minimal and precise):

A new namespace, rook1, was created. Namespace values in the "cluster-specific resources" section were edited to "rook1". Rook was started with ROOK_CURRENT_NAMESPACE_ONLY set to "false", and a new cluster CR was created in the "rook1" namespace. Rook then fails to read the status of the version-discovery pod.

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): v1.0.0-154.g004f795
  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-12-06T18:30:39Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@martin31821

I built a helm chart to automate this: https://github.com/deinstapel/k8s-rook-ceph

@leseb (Member) commented Jun 28, 2019

@phlogistonjohn please look at this commit: e755695

@phlogistonjohn (Contributor, Author)

@leseb it turns out I got things working late yesterday, but I hadn't gotten back to updating this issue yet. In my case I studied the templates in the tests and their differences from the old templates, and found there were a few namespace values in the latter half of "common.yaml" that I should not have changed to match my 2nd namespace.
However, your changes there do seem like they'd do the trick of providing a documented way to deploy into different namespaces. More explicit breadcrumbs in the docs might be nice too, though.
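For anyone following along, here is a sketch of the pattern I mean (assuming the operator stays in rook-ceph; the exact resource names in common.yaml may differ). The cluster-specific RoleBindings move to the new namespace, but the subject that refers to the operator's service account must keep the operator's namespace:

```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rook-ceph-cluster-mgmt
  namespace: rook1              # the new cluster namespace: change this
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-ceph-cluster-mgmt
subjects:
- kind: ServiceAccount
  name: rook-ceph-system
  namespace: rook-ceph          # the operator's namespace: do NOT change this
```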
Thanks to @martin31821 as well but I am not using helm at this time.

leseb mentioned this issue Jun 28, 2019
@leseb (Member) commented Jun 28, 2019

@phlogistonjohn Thanks, we will close this once my PR gets merged.

stale bot commented Sep 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Sep 26, 2019
@galexrt (Member) commented Sep 26, 2019

@leseb Are you talking about PR #3346? If so, please close it, thanks! 🙂

stale bot removed the wontfix label Sep 26, 2019
@leseb (Member) commented Sep 26, 2019

@galexrt yes! Thanks!

leseb closed this as completed Sep 26, 2019
@vsoch commented Feb 4, 2023

This same thing is burning me today - the docs do not sufficiently help you create this across namespaces. There is too much guessing and trying to interpret what something means. It has real costs in terms of the time it takes me to bring up these clusters and try for the N-th time.

@travisn (Member) commented Feb 5, 2023

@vsoch There are two ways that should help run multiple clusters in different namespaces:

  • The cluster helm chart should handle creating everything properly in the desired namespace (see the sketch after this list)
  • If you're not installing with helm, see this topic for an example script to modify the namespace.
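A minimal sketch of the helm route (the chart and repo are as published by Rook; the release name and target namespace here are illustrative):

```yaml
# values.yaml for the rook-ceph-cluster chart; install with e.g.:
#   helm repo add rook-release https://charts.rook.io/release
#   helm install --create-namespace -n rook1 my-cluster \
#     rook-release/rook-ceph-cluster -f values.yaml
operatorNamespace: rook-ceph   # namespace where the Rook operator already runs
```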

@vsoch commented Feb 5, 2023

@travisn so if I understand correctly: if I have an operator (e.g., in the flux-operator namespace), wouldn't I want to start this in its own namespace and then have the rook-ceph namespace storage available to it? I played around with starting the storage in the same namespace as our operator, and it might work with some tweaks, but I'd rather have separation of management. I can try with helm, but it looks like that article just details how to change the namespace, not how to access storage across namespaces. Unless I'm misunderstanding and I'm supposed to be deploying two operators to the same namespace?

@travisn (Member) commented Feb 6, 2023

Correct, Rook is usually deployed in its own namespace, separate from the consumers. The storage class defines where Rook is running to provide the storage, and it can be consumed from any namespace. I realize now I misunderstood your initial question; there is no need to change the default rook namespace.
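For example, a consumer in any namespace just references the Rook storage class in its PVC. A sketch, assuming the default block storage class from the Rook examples and the flux-operator namespace you mentioned:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: flux-operator           # the consumer's namespace, not Rook's
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block  # created by the Rook examples/chart
```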

@vsoch commented Feb 6, 2023

Gotcha! I think I figured this out through trial and error - at least, trying to deploy my operator in the same namespace turned into a disaster very quickly! I think I was able to narrow this down to some issue with the OSD (object storage daemon) not starting (see #11617 (comment)), and I suspect that relates to how I created my cluster nodes. I'm wondering if an example command (and/or config) using gcloud/eksctl could be provided for these tutorials? If it captured some of these details, it would likely save new users a lot of time in running into these issues!

@travisn (Member) commented Feb 6, 2023

@vsoch If you're running in the cloud, consider the cluster-on-pvc example (a trimmed sketch follows below). Also, have you joined the Rook Slack?
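For reference, a sketch of the shape of that example (most fields omitted; see cluster-on-pvc.yaml in the repo for the full version). On EKS, gp2 is the EBS-backed storage class that exists by default:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3
    volumeClaimTemplate:
      spec:
        storageClassName: gp2        # EKS's default EBS-backed class
        resources:
          requests:
            storage: 10Gi
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 3
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: gp2
          volumeMode: Block
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
```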

@vsoch commented Feb 6, 2023

I did try the cluster-on-pvc.yaml with AWS and got some errors - is there a full example that starts with the eksctl config and walks through the exact steps? As an example:

> provisioned from the AWS gp2 storage class.

I don't know how that is set up or created, and I'm not sure whether any of these config examples correspond to the one shipped by the repository that I tried. I could join the Slack, but I think the memory of our conversation (and others finding it) would be easier to find in GitHub issues, if that's OK?

@travisn (Member) commented Feb 6, 2023

How about opening a new discussion instead of discussing across so many different old GitHub issues? Thanks!

@vsoch commented Feb 6, 2023

I was posting on the GitHub issues that matched the errors I was hitting - sorry if that was wrong, I couldn't have known. Usually when someone is Google-searching they will come upon the issue that matches their problem. I have a main issue here that I would consider akin to opening a discussion, but it's less a discussion and more an issue, because there is probably some follow-up work needed on the docs. #11617 (comment)
