New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Ceph rados block device (rbd) volume plugin #6578

Merged
merged 1 commit into from May 21, 2015

Conversation

Projects
None yet
9 participants
@rootfs
Copy link
Member

rootfs commented Apr 8, 2015

@thockin @vishh

Updated. Ready for your review. Thank you!

@googlebot googlebot added the cla: yes label Apr 8, 2015

@vishh vishh self-assigned this Apr 8, 2015

@rootfs rootfs force-pushed the rootfs:wip-rbd branch 7 times, most recently from ba9c29f to 792a9af Apr 8, 2015

// RBDVolumeSource represents a Rados Block Device Mount that lasts the lifetime of a pod
type RBDVolumeSource struct {
// Required: CephMonitor is the ip:port of the ceph cluster monitor
CephMonitor string `json:"monitor" description:"ip:port of the ceph cluster monitor"`

This comment has been minimized.

@liewegas

liewegas Apr 8, 2015

There are normally N monitors; is this a comma separated list, or a json list? Alternatively ceph can also take a dns name and it'll form a list from the A or AAAA records, but most users don't have DNS set up that way for their clusters.

This comment has been minimized.

@rootfs

rootfs Apr 8, 2015

Member

this is the center of another topic. In gluster, we create an endpoints (i.e. array of hosts) to represent the cluster. But the endpoint concept is being debated now.

Add @thockin @smarterclayton


If you don't have a Ceph cluster, you can set up a [containerized Ceph cluster](https://github.com/rootfs/docker-ceph)

Then get the keyring from the Ceph cluster and copy it to */etc/ceph/keyring*.

This comment has been minimized.

@liewegas

liewegas Apr 8, 2015

The key can also be passed to the 'rbd map ...' command like: 'rbd -m --key map ...'

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented Apr 9, 2015

@liewegas

Sage: The Ceph monitors is now represented by endpoints. Passing secret is also allowed. If both keyring and secret are passed, secret overrides keyring.

// RBDVolumeSource represents a Rados Block Device Mount that lasts the lifetime of a pod
type RBDVolumeSource struct {
// Required: EndpointsName is the endpoint name that represents a collection of Ceph monitors
EndpointsName string `json:"endpoints" description:"a collection of Ceph monitors"`

This comment has been minimized.

@vishh

vishh Apr 9, 2015

Member

@thockin: Given that we are re-considering using endpoints for representing external entities, should we switch to using headless service here instead of endpoints?

This comment has been minimized.

@thockin

thockin Apr 13, 2015

Member

In general I think a headless service is preferred to raw Endpoints, even if just to prevent collisions. But the discussion never concluded. My last comment was essentially "As written, the endpoints must be in the same namespace as the pod. I don't think this is what you want to do." We need to rethink how we want to describe connecting to these sorts of services, and I don't think we should commit this (and frankly neither gluster) until we have an asnwer. Gluster will now have to be backwards compatible or else break compat.

This comment has been minimized.

@smarterclayton

smarterclayton Apr 13, 2015

Contributor

So at a minimum so far it sounds like we should have used an object reference. If we did that, and stated that we will for a period of time support endpoints as the Kind while we solve this, does that reduce the concern about API stability?

On Apr 13, 2015, at 2:46 PM, Tim Hockin notifications@github.com wrote:

In pkg/api/v1beta1/types.go:

@@ -1522,3 +1526,27 @@ type GlusterfsVolumeSource struct {
// the Glusterfs volume to be mounted with read-only permissions
ReadOnly bool json:"readOnly,omitempty" description:"Glusterfs volume to be mounted with read-only permissions"
}
+
+// RBDVolumeSource represents a Rados Block Device Mount that lasts the lifetime of a pod
+type RBDVolumeSource struct {

  • // Required: EndpointsName is the endpoint name that represents a collection of Ceph monitors
  • EndpointsName string json:"endpoints" description:"a collection of Ceph monitors"
    In general I think a headless service is preferred to raw Endpoints, even if just to prevent collisions. But the discussion never concluded. My last comment was essentially "As written, the endpoints must be in the same namespace as the pod. I don't think this is what you want to do." We need to rethink how we want to describe connecting to these sorts of services, and I don't think we should commit this (and frankly neither gluster) until we have an asnwer. Gluster will now have to be backwards compatible or else break compat.


Reply to this email directly or view it on GitHub.

This comment has been minimized.

@thockin

thockin Apr 14, 2015

Member

I'm not super concerned about API compat with gluster - we could PROBABLY take the hit and just break compat. Or make a new field that is an ObjectRef, deprecate the old field, but use it preferentially if specified. That's not too hard, just need to pick a name that you don't hate, and be consistent across all of these scenarios.

This comment has been minimized.

@pmorie

pmorie Apr 14, 2015

Member

@smarterclayton @thockin Why not use an object reference to a headless service? That would give you ns flexibility.

type RBDVolumeSource struct {
// Required: EndpointsName is the endpoint name that represents a collection of Ceph monitors
EndpointsName string `json:"endpoints" description:"a collection of Ceph monitors"`
// Optioanl: RBDPool is the rados pool name,default is rbd

This comment has been minimized.

@vishh

vishh Apr 9, 2015

Member

nit: typo: Optional

// Ex. "ext4", "xfs", "ntfs"
// TODO: how do we prevent errors in the filesystem from compromising the machine
FSType string `json:"fsType,omitempty" description:"file system type to mount, such as ext4, xfs, ntfs"`
// Optional: Defaults to false (read/write). ReadOnly here will force

This comment has been minimized.

@vishh

vishh Apr 9, 2015

Member

Suggestion: How about listing all the required fields first before the optional ones?

// Required: RBDImage is the rados image name
RBDImage string `json:"image" description:"rados image name"`
// Optional: RBDUser is the rados user name, default is admin
RadosUser string `json:"user" description:"rados user name"`

This comment has been minimized.

@vishh

vishh Apr 9, 2015

Member

@thockin: Do we want to emphasize the 'Optional' part along with the default value in the json description? If we expect swagger to be the primary way for accessing API information, then I think the json description should indicate if a field is optional or not.

"github.com/GoogleCloudPlatform/kubernetes/pkg/volume"
"github.com/golang/glog"
)

This comment has been minimized.

@vishh

vishh Apr 9, 2015

Member

nit: Can you add a file level comment that describes the logic in this file? The same would be useful for the disk_manager.go as well.

This comment has been minimized.

@rootfs

rootfs Apr 13, 2015

Member

fixed

@vishh

This comment has been minimized.

Copy link
Member

vishh commented Apr 9, 2015

Curious: Is it possible to run a Ceph cluster inside a kube cluster?

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Apr 9, 2015

Should be. Same for gluster.

----- Original Message -----

Curious: Is it possible to run a Ceph cluster inside a kube cluster?


Reply to this email directly or view it on GitHub:
#6578 (comment)

@vishh

This comment has been minimized.

Copy link
Member

vishh commented Apr 9, 2015

Being able to setup a ceph/gluster cluster in kube will help setup e2e testing for these plugins.

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from cb15ede to 626be82 Apr 13, 2015

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented Apr 13, 2015

@vishh i am working on containerizing ceph cluster

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from 0f0e9b4 to 5d0cc80 Apr 13, 2015

// RBDVolumeSource represents a Rados Block Device Mount that lasts the lifetime of a pod
type RBDVolumeSource struct {
// Required: EndpointsName is the endpoint name that represents a collection of Ceph monitors
EndpointsName string `json:"endpoints"`

This comment has been minimized.

@thockin

thockin Apr 13, 2015

Member

This i subject to the same discussion as gluster - current implication is that it can only access in-namespace Endpoints. Is that really sufficient?

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring"`
// Optional: Secret is the authentication secret for RBDUser, default is empty.
Secret string `json:"secret" description:"secret is authentication secret. If provided, it overrides keyring"`

This comment has been minimized.

@thockin

thockin Apr 13, 2015

Member

Putting secrets into API objects other than the Secret object seems like a bad idea. Could we instead integrate with Secret and SecretVolumeSource?

This comment has been minimized.

@thockin

This comment has been minimized.

@pmorie

pmorie Apr 13, 2015

Member

We use SecretName in the secret volume plugin to address a secret; I believe @markturansky uses XName some something similar.

This comment has been minimized.

@thockin

thockin Apr 14, 2015

Member

I meant that maybe this should be a reference to a Secret object, rather than a secret value

This comment has been minimized.

@pmorie

pmorie Apr 14, 2015

Member

@thockin @rootfs I may have misread this; re-reading.

This comment has been minimized.

@markturansky

markturansky Apr 21, 2015

Member

@pmorie you're right it needs to be on builder. I'm wading through some of these implications right now.

This comment has been minimized.

@pmorie

pmorie Apr 21, 2015

Member

@markturansky cool, i will rebase the kubelet-in-container PR onto yours. I
guess one thing we might conceivably get into with this is if you had a
plugin which wrapped another and they both need different secrets, but I
don't think this exists today - we can cross that bridge when we come to
it.
On Tue, Apr 21, 2015 at 4:10 PM Mark Turansky notifications@github.com
wrote:

In pkg/api/types.go
#6578 (comment)
:

  • EndpointsName string json:"endpoints"
  • // Required: RBDImage is the rados image name
  • RBDImage string json:"image"
  • // Required: Filesystem type to mount.
  • // Must be a filesystem type supported by the host operating system.
  • // Ex. "ext4", "xfs", "ntfs"
  • // TODO: how do we prevent errors in the filesystem from compromising the machine
  • FSType string json:"fsType,omitempty"
  • // Optional: RBDPool is the rados pool name,default is rbd
  • RBDPool string json:"pool"
  • // Optional: RadosUser is the rados user name, default is admin
  • RadosUser string json:"user"
  • // Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
  • Keyring string json:"keyring"
  • // Optional: Secret is the authentication secret for RBDUser, default is empty.
  • Secret string json:"secret" description:"secret is authentication secret. If provided, it overrides keyring"

@pmorie https://github.com/pmorie you're right it needs to be on
builder. I'm wading through some of these implications right now.


Reply to this email directly or view it on GitHub
https://github.com/GoogleCloudPlatform/kubernetes/pull/6578/files#r28816969
.

This comment has been minimized.

@smarterclayton

smarterclayton Apr 21, 2015

Contributor

Do we really need to extract secrets right now out of volume source? It's nice, but I don't think it gates this pull request, and has other API change implications. Not everything needs a secret - what does it mean to apply a secret to a type that doesn't need it?

I would prefer we focus on the minimal change here.

On Apr 21, 2015, at 4:13 PM, Paul Morie notifications@github.com wrote:

In pkg/api/types.go:

  • EndpointsName string json:"endpoints"
  • // Required: RBDImage is the rados image name
  • RBDImage string json:"image"
  • // Required: Filesystem type to mount.
  • // Must be a filesystem type supported by the host operating system.
  • // Ex. "ext4", "xfs", "ntfs"
  • // TODO: how do we prevent errors in the filesystem from compromising the machine
  • FSType string json:"fsType,omitempty"
  • // Optional: RBDPool is the rados pool name,default is rbd
  • RBDPool string json:"pool"
  • // Optional: RadosUser is the rados user name, default is admin
  • RadosUser string json:"user"
  • // Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
  • Keyring string json:"keyring"
  • // Optional: Secret is the authentication secret for RBDUser, default is empty.
  • Secret string json:"secret" description:"secret is authentication secret. If provided, it overrides keyring"
    @markturansky cool, i will rebase the kubelet-in-container PR onto yours. I guess one thing we might conceivably get into with this is if you had a plugin which wrapped another and they both need different secrets, but I don't think this exists today - we can cross that bridge when we come to it.


    Reply to this email directly or view it on GitHub.

This comment has been minimized.

@markturansky

This comment has been minimized.

@markturansky

markturansky Apr 22, 2015

Member

Added the test. See #7150.

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented Apr 14, 2015

@thockin please correct me if my understanding goes wrong.

  • For this PR, convert Secret from plain text string to a Secret object reference
  • convert EndpointsName from a string to a EndpointObjectReference , for both ceph and gluster (gluster change will be in a separate PR).
@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Apr 14, 2015

Ugh, cross namespace secrets. @pmorie, we need to decide whether we should force every consumer of a cluster volume to have their own copy of the secrets (proving they have access) or whether it's better to embed the secret directly in the volume definition, or if we want to allow cross namespace secrets. I really don't think we can do the latter, because in order to secure that we need to have a way to say repo X has access to the secret. However, if we take the first option, then when you want to use a persistent volume defined by a cluster admin (which presumably has secrets attached to it) the assignment process also has to copy the secrets into your namespace (which is possible, but somewhat ugly). The middle solution - embedding the secret directly - is the least impactful now, but leaves secrets coupled to volumes.

----- Original Message -----

@thockin please correct me if my understanding goes wrong.

  • For this PR, convert Secret from plain text string to an Secret
    object reference
  • convert EndpointsName from a string to a EndpointObjectReference , for both ceph and gluster (gluster change will
    be in a separate PR).

Reply to this email directly or view it on GitHub:
#6578 (comment)

@pmorie

This comment has been minimized.

Copy link
Member

pmorie commented Apr 14, 2015

@smarterclayton @erictune @thockin @rootfs

I think this is a new use-case for secrets from all of the ones we called out in the proposal. It's close to the 'secret used on behalf of the pod by the kubelet' in the sense that it doesn't touch the container; ie, the container never knows about it. However, it's different from the cases I had personally imagined for secrets used by the kubelet on behalf of the container in a couple ways:

  1. The use-cases we had in mind for the kubelet performing actions on behalf of the pod were focused on service accounts and things like private docker registry pull, LOAS daemon, etc.
  2. This is a separate volume type which needs to use a secret to initialize the volume, but which does not actually expose the secret data in said volume.

So, that said, should we update the proposal to include this variant of use-case?

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Apr 14, 2015

Before we force Huamin and Sage to go redesign everything, yes. This is definitely a secret the infrastructure may have access to, which the user doesn't really need in the container. Should the rules be different than for regular uses of secrets? Probably not. It may just mean that the persistent volume controller, upon binding a claim into the namespace, also needs to bind and update a secret.

On Apr 14, 2015, at 11:26 AM, Paul Morie notifications@github.com wrote:

@smarterclayton @erictune @thockin @rootfs

I think this is a new use-case for secrets from all of the ones we called out in the proposal. It's close to the 'secret used on behalf of the pod by the kubelet' in the sense that it doesn't touch the container; ie, the container never knows about it. However, it's different from the cases I had personally imagined for secrets used by the kubelet on behalf of the container in a couple ways:

The use-cases we had in mind for the kubelet performing actions on behalf of the pod were focused on service accounts and things like private docker registry pull, LOAS daemon, etc.
This is a separate volume type which needs to use a secret to initialize the volume, but which does not actually expose the secret data in said volume.
So, that said, should we update the proposal to include this variant of use-case?


Reply to this email directly or view it on GitHub.

@pmorie

This comment has been minimized.

Copy link
Member

pmorie commented Apr 14, 2015

@smarterclayton Yep, getting to your comments, three ways were offered:

  1. Force consumers to have their own copy of secrets
  2. Allow cross-namespace secrets
  3. Allow embedding the secret where you need it

I don't like (2) for the same reason you don't: you would have to define policies that express that user X has access to secret Y in a different ns.

I don't think (3) is great because one of the points of secrets is to keep the secret data orthogonal to where it's used.

I think we should go with (1) and facilitate copying secrets into an NS on things like PV volume claim.

Any thoughts of the above, @markturansky?

@rootfs rootfs force-pushed the rootfs:wip-rbd branch 4 times, most recently from 5ee14a9 to 4899463 May 20, 2015

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 20, 2015

@pmorie @vishh @smarterclayton

converted secret to LocalObjectReference as suggest by @bgrant0607

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring"`
// Optional: SecretRef is name of the authentication secret for RBDUser, default is empty.
SecretRef LocalObjectReference `json:"secretName"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

This should probably be a pointer if it is optional. If it is not, then the godoc comment should be updated.

// Optional: RadosPool is the rados pool name,default is rbd
RBDPool string `json:"pool" description:"rados pool name"`
// Optional: RBDUser is the rados user name, default is admin
RadosUser string `json:"user" description:"rados user name"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

Include the default values in the description text

var err error
devicePath := strings.Join([]string{"/dev/rbd", rbd.pool, rbd.image}, "/")
exist := waitForPathToExist(devicePath, 1)
if exist == false {

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

!exist

// modprobe
_, err = rbd.plugin.execCommand("modprobe", []string{"rbd"})
if err != nil {
glog.Errorf("rbd: failed to modprobe rbd error:%v", err)

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

Generally if you return an error there is no need to log it - higher level functions should be responsible for that.

}
// mount it
globalPDPath := rbd.manager.MakeGlobalPDName(rbd)
mountpoint, err := rbd.mounter.IsMountPoint(globalPDPath)

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

This error should be checked and logged, or returned. It is not checked elsewhere later in the code. Why would the mount check fail, and is it ignorable? If it's ignorable, make the variable '_' to indicate it is ignored and add a comment

This comment has been minimized.

@rootfs

rootfs May 21, 2015

Member

error check added

return err
}

err = rbd.mounter.Mount(devicePath, globalPDPath, rbd.fsType, nil)

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

This should probably be a single block error (if err := ...) and in the block you should transform or return the error instead of logging (if you want the info to be included, wrap the error, otherwise return straight for this function.

}
cnt--
// if device is no longer used, see if can unmap
if cnt == 0 {

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

It would be more idiomatic go to have "if cnt <= 1" here than a decrement then comparison to zero.

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from 4899463 to 7a2257e May 21, 2015

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented May 21, 2015

It looks like some of my comments weren't addressed - specifically the logging is unnecessary if you return errors (and if you want the errors to be better, wrap them with fmt.Errorf("specific message: %v", err).

Also, remove the cnt--

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from f059884 to 71ad6f2 May 21, 2015

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 21, 2015

@smarterclayton i see what you mean now, please see the latest. thanks.

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring" description:"Keyring is the path to key ring for rados user. default is /etc/ceph/keyring"`
// Optional: SecretRef is name of the authentication secret for RBDUser, default is empty.
SecretRef *LocalObjectReference `json:"secretName" description:"The name of secret. secret is authentication secret. If provided, it overrides keyring"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

'secretRef' not 'secretName'

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring" description:"Keyring is the path to key ring for rados user. default is /etc/ceph/keyring"`
// Optional: SecretRef is name of the authentication secret for RBDUser, default is empty.
SecretRef *LocalObjectReference `json:"secretName" description:"The name of secret. secret is authentication secret. If provided, it overrides keyring"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

secretRef

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring" description:"Keyring is the path to key ring for rados user. default is /etc/ceph/keyring"`
// Optional: SecretRef is name of the authentication secret for RBDUser, default is empty.
SecretRef *LocalObjectReference `json:"secretName" description:"The name of secret. secret is authentication secret. If provided, it overrides keyring"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

secretRef, also, for all of these can you update the description to follow the convention "starts with lowercase, separate default and other behavioral comments with semicolons" that we use on descriptions?

I.e. maybe description:"name of a secret to authenticate the RBD user; if provided overrides keyring; optional"

// Optional: Keyring is the path to key ring for RBDUser, default is /etc/ceph/keyring
Keyring string `json:"keyring" description:"Keyring is the path to key ring for rados user. default is /etc/ceph/keyring"`
// Optional: SecretRef is name of the authentication secret for RBDUser, default is empty.
SecretRef *LocalObjectReference `json:"secretName" description:"The name of secret. secret is authentication secret. If provided, it overrides keyring"`

This comment has been minimized.

@smarterclayton

smarterclayton May 21, 2015

Contributor

secretRef and description

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented May 21, 2015

One last set of comments. After that this is LGTM, @vishh do you have any additional feedback?

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from eb698c6 to e104bc5 May 21, 2015

add rados block device(rbd) volume plugin
Signed-off-by: Huamin Chen <hchen@redhat.com>

@rootfs rootfs force-pushed the rootfs:wip-rbd branch from e104bc5 to 4a800fd May 21, 2015

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented May 21, 2015

LGTM

@smarterclayton smarterclayton added the lgtm label May 21, 2015

smarterclayton added a commit that referenced this pull request May 21, 2015

Merge pull request #6578 from rootfs/wip-rbd
add Ceph rados block device (rbd) volume plugin

@smarterclayton smarterclayton merged commit f3dd404 into kubernetes:master May 21, 2015

2 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
coverage/coveralls Coverage pending from Coveralls.io
Details
Shippable Shippable builds completed
Details
cla/google All necessary CLAs are signed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment