
Cannot start rook-ceph using multiple namespaces #3360

Closed · phlogistonjohn opened this issue Jun 26, 2019 · 16 comments

@phlogistonjohn (Contributor)

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

From the provided documentation and example files it is unclear how to start a working Ceph cluster via Rook in a separate namespace. I attempted to follow the recommendation of editing common.yaml below the line containing the text "Beginning of cluster-specific resources.", changing the previous namespace value of "rook-ceph" to "rook1".
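For illustration, the kind of edit attempted looks roughly like this (a minimal sketch; the ServiceAccount shown is one of several cluster-specific resources in common.yaml):

```yaml
# Below the "Beginning of cluster-specific resources" line in common.yaml,
# each namespace value was changed from "rook-ceph" to "rook1", e.g.:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-osd
  namespace: rook1   # previously: rook-ceph
```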

Everything seems to start up OK at first, but then the following appears in the operator logs:

```
I0626 18:22:48.166453       6 controller.go:818] Started provisioner controller ceph.rook.io/block_rook-ceph-operator-566967f57-7fbc9_6608d036-983f-11e9-958c-c686a7df9e39!
I0626 18:22:48.557850       6 controller.go:818] Started provisioner controller rook.io/block_rook-ceph-operator-566967f57-7fbc9_660922c8-983f-11e9-958c-c686a7df9e39!
2019-06-26 18:22:49.403566 I | operator: successfully started Ceph csi drivers
2019-06-26 18:22:49.404563 I | op-cluster: starting cluster in namespace rook1
2019-06-26 18:22:55.406255 W | op-k8sutil: OwnerReferences will not be set on resources created by rook. failed to test that it can be set. configmaps is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot create resource "configmaps" in API group "" in the namespace "rook1"
2019-06-26 18:22:55.422701 I | op-k8sutil: waiting for job rook-ceph-detect-version to complete...
2019-06-26 18:24:35.455210 E | op-cluster: unknown ceph major version. failed to get version job log to detect version. failed to read from stream. pods "rook-ceph-detect-version-7pr96" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "pods/log" in API group "" in the namespace "rook1"
2019-06-26 18:24:37.412338 I | op-k8sutil: Removing previous job rook-ceph-detect-version to start a new one
2019-06-26 18:24:37.440975 I | op-k8sutil: batch job rook-ceph-detect-version still exists
2019-06-26 18:24:39.446598 I | op-k8sutil: batch job rook-ceph-detect-version deleted
2019-06-26 18:24:39.461157 I | op-k8sutil: waiting for job rook-ceph-detect-version to complete...
```

As someone who is not an expert in RBAC, it's very unclear to me what else I am supposed to change.
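(A quick way to see which permissions are still missing is to impersonate the operator's service account with kubectl; the namespaces below match this report:)

```sh
# Can the operator's service account act in the new cluster namespace?
kubectl auth can-i create configmaps -n rook1 \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system
# The detect-version failure above also needs the pods/log subresource:
kubectl auth can-i get pods --subresource=log -n rook1 \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system
```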

Expected behavior:

More context and guidance in the documentation, or comments in the YAML, would be greatly appreciated.

How to reproduce it (minimal and precise):

A new namespace, rook1, was created. Namespace values in the "cluster-specific resources" section were edited to "rook1". Rook was started with ROOK_CURRENT_NAMESPACE_ONLY set to "false", and a new cluster CR was created in the "rook1" namespace. Rook then fails to read the status of the version-discovery pod.

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): v1.0.0-154.g004f795
  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-12-06T18:30:39Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@martin31821

I built a helm chart to automate this: https://github.com/deinstapel/k8s-rook-ceph

@leseb (Member) commented Jun 28, 2019

@phlogistonjohn please look at this commit: e755695

@phlogistonjohn (Contributor, Author)

@leseb it turns out I got things working late yesterday, but I hadn't gotten back to updating this issue yet. In my case I studied the templates in the tests and their differences from the old templates, and found there were a few namespace values in the latter half of "common.yaml" that I should not have changed to match my 2nd namespace.
However, your changes there do seem like they'd do the trick of providing a documented way to deploy into different namespaces. More explicit breadcrumbs in the docs might be nice too, though.
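For anyone following along, here is a sketch of the pattern I mean (assuming the operator stays in rook-ceph; the exact resource names in common.yaml may differ). The cluster-specific RoleBindings move to the new namespace, but the subject that refers to the operator's service account must keep the operator's namespace:

```yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rook-ceph-cluster-mgmt
  namespace: rook1              # the new cluster namespace: change this
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-ceph-cluster-mgmt
subjects:
- kind: ServiceAccount
  name: rook-ceph-system
  namespace: rook-ceph          # the operator's namespace: do NOT change this
```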
Thanks to @martin31821 as well but I am not using helm at this time.

leseb mentioned this issue Jun 28, 2019
@leseb (Member) commented Jun 28, 2019

@phlogistonjohn Thanks, we will close this once my PR gets merged.

stale bot commented Sep 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Sep 26, 2019
@galexrt (Member) commented Sep 26, 2019

@leseb Are you talking about PR #3346? If so, please close it, thanks! 🙂

stale bot removed the wontfix label Sep 26, 2019
@leseb (Member) commented Sep 26, 2019

@galexrt yes! Thanks!

leseb closed this as completed Sep 26, 2019
@vsoch commented Feb 4, 2023

This same thing is burning me today - the docs do not sufficiently help you create this across namespaces. There is too much guessing and trying to interpret what something means. It has real costs in terms of the time it takes me to bring up these clusters and try for the N-th time.

@travisn (Member) commented Feb 5, 2023

@vsoch There are two ways that should help run multiple clusters in different namespaces:

  • The cluster helm chart should handle creating everything properly in the desired namespace (see the sketch after this list)
  • If you're not installing with helm, see this topic for an example script to modify the namespace.
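A minimal sketch of the helm route (the chart and repo are as published by Rook; the release name and target namespace here are illustrative):

```yaml
# values.yaml for the rook-ceph-cluster chart; install with e.g.:
#   helm repo add rook-release https://charts.rook.io/release
#   helm install --create-namespace -n rook1 my-cluster \
#     rook-release/rook-ceph-cluster -f values.yaml
operatorNamespace: rook-ceph   # namespace where the Rook operator already runs
```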

@vsoch commented Feb 5, 2023

@travisn so if I understand correctly: if I have an operator (e.g., in the flux-operator namespace), wouldn't I want to start this in its own namespace and then have the rook-ceph namespace storage available to it? I played around with starting the storage in the same namespace as our operator, and it might work with some tweaks, but I'd rather have separation of management. I can try with helm, but it looks like that article just details how to change the namespace, not how to access storage across namespaces. Unless I'm misunderstanding and I'm supposed to be deploying two operators to the same namespace?

@travisn (Member) commented Feb 6, 2023

Correct, Rook is usually deployed in its own namespace, separate from the consumers. The storage class defines where Rook is running to provide the storage, and it can be consumed from any namespace. I realize now I misunderstood your initial question; there is no need to change the default rook namespace.
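For example, a consumer in any namespace just references the Rook storage class in its PVC. A sketch, assuming the default block storage class from the Rook examples and the flux-operator namespace you mentioned:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: flux-operator           # the consumer's namespace, not Rook's
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block  # created by the Rook examples/chart
```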

@vsoch commented Feb 6, 2023

Gotcha! I think I figured this out through trial and error - at least, trying to deploy my operator in the same namespace turned into a disaster very quickly! I think I was able to narrow this down to some issue with the OSD (object storage daemon) not starting (see #11617 (comment)), and I suspect that relates to how I created my cluster nodes. I'm wondering if an example command (and/or config) using gcloud/eksctl could be provided for these tutorials? If it captured some of these details, it would likely save new users a lot of time in running into these issues!

@travisn (Member) commented Feb 6, 2023

@vsoch If you're running in the cloud, consider the cluster-on-pvc example (a trimmed sketch follows below). Also, have you joined the Rook Slack?
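For reference, a sketch of the shape of that example (most fields omitted; see cluster-on-pvc.yaml in the repo for the full version). On EKS, gp2 is the EBS-backed storage class that exists by default:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3
    volumeClaimTemplate:
      spec:
        storageClassName: gp2        # EKS's default EBS-backed class
        resources:
          requests:
            storage: 10Gi
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 3
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: gp2
          volumeMode: Block
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
```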

@vsoch commented Feb 6, 2023

I did try the cluster-on-pvc.yaml with AWS and got some errors - is there a full example that starts with the eksctl config and walks through the exact steps? As an example:

> provisioned from the AWS gp2 storage class.

I don't know how that is set up or created, and I'm not sure whether any of these config examples correspond to the one shipped by the repository that I tried. I could join the Slack, but I think the memory of our conversation (and others finding it) would be easier to find in GitHub issues, if that's OK?

@travisn (Member) commented Feb 6, 2023

How about opening a new discussion instead of discussing across so many different old GitHub issues? Thanks!

@vsoch commented Feb 6, 2023

I was posting on the GitHub issues that matched the errors I was hitting - sorry if that was wrong, I couldn't have known. Usually when someone is Google-searching they will come upon the issue that matches their problem. I have a main issue here that I would consider akin to opening a discussion, but it's less a discussion and more an issue, because there is probably some follow-up work needed on the docs. #11617 (comment)
