
Allow specifying custom Ceph user and secret name for mounting #2216

Merged: 1 commit into rook:master on Dec 4, 2018

Conversation

@galexrt (Member) commented Oct 14, 2018

Description of your changes:

Introduce mount security mode for basic multi tenancy

This adds three new parameters/options to the StorageClass/flexvolume entry (an illustrative sketch follows the list):

  • mountUser
  • mountSecret
  • mountSecretNamespace
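
For illustration, a minimal sketch of how these options could look on a Rook block StorageClass. The class name, pool, and secret name below are placeholders, and everything other than the three new options is assumed from the Rook v0.9 block storage class example, so verify the exact keys against cluster/examples/kubernetes/ceph/storageclass.yaml:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block                   # placeholder name
provisioner: ceph.rook.io/block           # assumed Rook v0.9 flex provisioner
parameters:
  blockPool: replicapool                  # placeholder pool
  clusterNamespace: rook-ceph
  fstype: xfs
  # New options from this PR:
  mountUser: user1                        # existing Ceph user to mount with
  mountSecret: rook-ceph-mount-secret     # Kubernetes Secret holding that user's Ceph key
  mountSecretNamespace: rook-ceph         # added later in this PR; per the discussion below it defaults to the pod's namespace if unset
```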

Will test in a minute or two, but I'm currently "rebuilding" my vendor directory, which takes quite some time.

Which issue is resolved by this Pull Request:
Resolves #2164.

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

/cc @dimm0 It took me a bit longer than I said in the community meeting, but here it is.

@galexrt (Member, Author) commented Oct 14, 2018

One thing to add here: this change affects not only filesystem mounting but also block storage. To put it differently, I thought, why stop at the filesystem?

There seems to be one more quirk I have to fix in the code to get the tests green. Will look on Monday, but in general this is ready for review.

@galexrt requested a review from travisn on October 15, 2018 08:54
@travisn (Member) left a comment

How much simpler would this be if it was only supported for CephFS? If there isn't a request for block to support this, it seems like we should keep it simple. The doc example also only discusses how to use it for the file system, right?

Since mounting CephFS requires using flex directly instead of a storage class, it seems like this would simplify a number of places.

  • The username and secret would be set on the flexvolume entry directly (see the sketch after this list)
  • If they are not set, the admin account would be used, which is the behavior today and would be the new default
  • No need for an env var in the operator.yaml that controls the policy
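
For context, a rough sketch of what specifying the user and secret directly on a CephFS flexvolume could look like. This is illustrative only: the driver name and the options other than mountUser/mountSecret are assumptions based on the Rook v0.9 filesystem examples, and the actual keys are defined by the docs and flexvolume code changed in this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo                       # placeholder pod
spec:
  containers:
  - name: app
    image: busybox                        # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: cephfs
      mountPath: /data
  volumes:
  - name: cephfs
    flexVolume:
      driver: ceph.rook.io/rook           # assumed Rook flex driver name (v0.9)
      fsType: ceph
      options:
        fsName: myfs                      # placeholder filesystem name
        clusterNamespace: rook-ceph
        mountUser: user1                  # the new options from this PR
        mountSecret: rook-ceph-mount-secret
```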

@@ -155,14 +174,30 @@ func (vm *VolumeManager) Detach(image, pool, clusterNamespace string, force bool
return nil
}

if id == "" && key == "" {
	return fmt.Errorf("no id nor keyring given, can't unmount without credentials")
Member:

unmapping requires the keyring?

@galexrt (Member, Author):

I need to verify; the unmap command has the keyring flag set on it, so I added it to make sure the credentials are set the same way as in the map command.

@travisn (Member) left a comment

Per discussion in huddle, if nobody needs the feature for block, let's keep it simple and only support it for cephfs.

Documentation/advanced-configuration.md:
# (Optional) Specify an existing Ceph user that will be used for mounting storage with this StorageClass.
#mountUser: user1
# (Optional) Specify an existing Kubernetes Secret name containing just one key holding the Ceph user secret.
# The secret must exist in each namespace(s) where the storage will be consumed!
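
A sketch of what such a secret could look like. The secret name, key name, and type below are illustrative; the only requirement quoted above is that the secret contains a single key holding the Ceph user's key, and that it exists in the consuming namespace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rook-ceph-mount-secret            # must match mountSecret in the StorageClass/flexvolume options
  namespace: app-namespace                # a namespace where the storage is consumed
type: kubernetes.io/rook                  # assumed type; check the docs added in this PR
data:
  # exactly one key; the value is the base64-encoded Ceph key for the mountUser
  # (e.g. the output of `ceph auth get-key client.user1 | base64`)
  key: QVFEcGxhY2Vob2xkZXIK               # placeholder value
```
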
@bassam (Member) commented Oct 17, 2018

should there also be a secret namespace?

@galexrt (Member, Author):

@bassam No, the namespace of the Pod mounting the storage will be used to look up the secret. This is in "conformance" with some existing in-tree plugins, which look up the secret in the namespace of the pod using the storage.

Member:

All storage classes that have a secret also specify the namespace. Why is this different? https://kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd

@galexrt (Member, Author):

That seems to be new for the user-provided mount secret:
https://v1-11.docs.kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd
Sure, I'll add that too, though I would default it to the pod namespace.

@galexrt force-pushed the fix_2164 branch 3 times, most recently from 548d0c2 to 5688c1d on October 18, 2018 19:28
@galexrt (Member, Author) commented Oct 18, 2018

@bassam @travisn I updated the PR with the doc changes and the mountSecretNamespace parameter/option. PTAL

@galexrt (Member, Author) commented Oct 24, 2018

@bassam I just realized a security problem with the mountSecretNamespace option.
Example:

  • Namespace ABC has a Secret my-ceph-secret.
  • User A has only access to namespace XYZ.
  • Because the agent needs to have access to "all" namespaces in which a mount secret will be, the agent has access to namespace ABC and XYZ.
  • If User A wants to use the secret from namespace ABC, the user just needs to set mountSecretNamespace: ABC and would be able to use the secret from that namespace.

I would be for removing the mountSecretNamespace option as it is too much of a security issue.

@travisn (Member) commented Oct 26, 2018

@galexrt what would it take to update a filesystem integration test to generate and mount with the user creds? It seems like it wouldn't be too big a work item.

@galexrt force-pushed the fix_2164 branch 5 times, most recently from e374797 to 76a074c on October 29, 2018 05:37
@dimm0 (Contributor) commented Oct 31, 2018

Came here to whine about authentication...

@travisn (Member) commented Nov 6, 2018

@galexrt Looking at the integration tests, here are a few suggestions. It's not clear yet what the underlying issue is with deleting the mds pod. I am not able to repro the issue locally, similar to your findings; only Jenkins seems to hit the issue whenever the mds pods are deleted. It doesn't seem to be an interaction between the integration tests, because each test uses labels to look only for the mds pods that it created.

To troubleshoot further, I would suggest printing the output of pod describe for the mds pod(s). For example, add the following line at the end of the method where we see the message about timing out waiting for the pod to be deleted, around line 518 in k8s_helper.go. Another idea is to print the logs for an mds pod.

	k8sh.PrintPodDescribe(namespace, "-l", label)

Perhaps on an unrelated note, the operator log shows we are calling an obsolete command to deactivate the mds.

2018-11-06 23:09:06.859733 I | exec: Running command: ceph mds deactivate smoke-test-fs:1 --cluster=smoke-ns --conf=/var/lib/rook/smoke-ns/smoke-ns.config --keyring=/var/lib/rook/smoke-ns/client.admin.keyring --format json --out-file /tmp/994480105
2018-11-06 23:09:07.158089 I | exec: Error ENOTSUP: command is obsolete; please check usage and/or man page

@galexrt force-pushed the fix_2164 branch 2 times, most recently from 2081085 to d2b2700 on November 10, 2018 22:18
@jjgraham commented:

Any chance we will have mount security for basic multi tenancy soon?

@galexrt (Member, Author) commented Dec 2, 2018

@travisn I just had three green runs; restarted the CI:

https://jenkins.rook.io/blue/organizations/jenkins/rook%2Frook/detail/PR-2216/60/pipeline/

If it fails another time on 1.8 and 1.9, I'll take a closer look at why it failed.
To document it here already:

@galexrt (Member, Author) commented Dec 2, 2018

Oh well, now I have 4 of 5 green.

aws 1.9 failed with `Expected nil, but got: &errors.errorString{s:"rgw did not start via crd. Giving up waiting for pod with label rook_object_store=default in namespace helm-ns to be running"}`; I'll look into whether I can get some clues from the logs about what causes this error right now.

@@ -154,7 +154,7 @@ def RunIntegrationTest(k, v) {
export PATH="/tmp/rook-tests-scripts-helm/linux-amd64:$PATH" \
KUBECONFIG=$HOME/admin.conf
kubectl config view
-_output/tests/linux_amd64/integration -test.v -test.timeout 2400s --host_type '''+"${k}"+''' --helm /tmp/rook-tests-scripts-helm/linux-amd64/helm 2>&1 | tee _output/tests/integrationTests.log'''
+_output/tests/linux_amd64/integration -test.v -test.timeout 7200s --host_type '''+"${k}"+''' --helm /tmp/rook-tests-scripts-helm/linux-amd64/helm 2>&1 | tee _output/tests/integrationTests.log'''
Member:

a timeout of 2h seems too long. how about 60 minutes for now?

@galexrt (Member, Author) commented Dec 3, 2018:

I think I had problems with a timeout of "just" 60 minutes in the CI. As written below, I just "cranked" it up.

Member:

The typical time for the integration tests is currently 35 minutes. If it goes over 60 minutes, it seems like Jenkins should kill it, because the tests are surely failing. The longer we wait, the longer the Jenkins backlog can get. Hopefully we can fix the resource issue with Jenkins soon...

-@CGO_ENABLED=0 $(GOHOST) test -v -i $(GO_STATIC_FLAGS) $(GO_INTEGRATION_TEST_PACKAGES)
-@CGO_ENABLED=0 $(GOHOST) test -v $(GO_TEST_FLAGS) $(GO_STATIC_FLAGS) $(GO_INTEGRATION_TEST_PACKAGES) $(TEST_FILTER_PARAM) 2>&1 | tee $(GO_TEST_OUTPUT)/integration-tests.log
+CGO_ENABLED=0 $(GOHOST) test -v -i $(GO_STATIC_FLAGS) $(GO_INTEGRATION_TEST_PACKAGES)
+CGO_ENABLED=0 $(GOHOST) test -v -timeout 7200s $(GO_TEST_FLAGS) $(GO_STATIC_FLAGS) $(GO_INTEGRATION_TEST_PACKAGES) $(TEST_FILTER_PARAM) 2>&1 | tee $(GO_TEST_OUTPUT)/integration-tests.log
Member:

why do we need this timeout? if someone is running the integration tests locally, they could just cancel them.

@galexrt (Member, Author):

@travisn Well.. during local testing when I ran the tests I got test timeouts, so I just "cranked" it up.

Introduce mount security mode for basic multi tenancy

Fixes rook#2164.

This adds three new parameters/options to StorageClass/flexvolume entry:
* `mountUser`
* `mountSecret`
* `mountSecretNamespace`

Signed-off-by: Alexander Trost <galexrt@googlemail.com>
@galexrt (Member, Author) commented Dec 4, 2018

@travisn Please merge the PR when the CI is green, as I'm heading to bed now and want to finally get the PR merged.

@dimm0 (Contributor) commented Dec 4, 2018

Crossed all the fingers I have

@galexrt (Member, Author) commented Dec 4, 2018

@travisn it's green! Finally.

@galexrt merged commit 18b2da5 into rook:master on Dec 4, 2018
@galexrt deleted the fix_2164 branch on December 4, 2018 22:54
@baseyou commented Dec 13, 2018

So, if the node reboots, will the security mode env be lost?

@galexrt (Member, Author) commented Dec 13, 2018

@planeo1105 No. You are probably running a master image, which is a moving tag and can (and at some point will) cause issues. If you switch to a release-tagged image, e.g. v0.9.0, the env var should be there and understood by all Rook Ceph Pods.

Please create a new issue with your problem next time.

Labels: ceph (main ceph tag), operator
Development: Successfully merging this pull request may close these issues: Shared FS client path restriction (#2164)
6 participants