
Design Proposal: ObjectBucketClaim CRD and Controller #2483

Closed · wants to merge 9 commits

Conversation

@copejon (Contributor) commented Jan 11, 2019

**Description of your changes:**

Design documentation for the addition of an ObjectBucketClaim CRD and controller for the provisioning of S3 buckets. This design should be agnostic to the underlying provider.

Which issue is resolved by this Pull Request:
Resolves #2480

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

[skip ci]

add design doc for new CephObjectBucket CRD and controller

Signed-off-by: Jon Cope <jcope@redhat.com>

This design closes the loop for Rook-Ceph object stores with a CephObjectBucket controller and CRD. Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket. One ConfigMap will be created per CephObjectBucket. ConfigMap names will be deterministic and based on the name of the CephObjectBucket. This gives users the opportunity to define their Pod spec at the same time as the CephObjectBucket. Additionally, the CephObjectBucket and Pod can safely be deployed at the same time thanks to built in synchronization in Kubernetes.
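
For illustration only, the ConfigMap handed back to the user might look like the sketch below; `BUCKET_HOST` is taken from an excerpt later in this thread, while the other key and the naming pattern are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Name derived deterministically from the CephObjectBucket name (exact pattern TBD).
  name: my-bucket-1
  namespace: dev-user
data:
  BUCKET_HOST: http://my-store-url-or-ip  # endpoint of the object store
  BUCKET_NAME: my-bucket-1                # illustrative; additional keys are assumptions
```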

Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up generated resources. Deleting a CephObjectBucket will only cause deletion of the associated ConfigMap. The controller will not delete the bucket from the object store.
Member:

Avoiding deletion is because it is considered an admin action, correct? I saw that elsewhere, but you might mention it here as well.

Contributor Author:

That's correct. The consensus in early drafts was that it was preferable to suspend the bucket on API resource deletion. Ceph seems to supply this functionality. The reason being that it would be far too easy to accidentally destroy your buckets.

1. The user deletes the CephObjectBucket. This triggers an automated cleanup sequence:
   a. The CephObjectBucket is marked for deletion first and left in the foreground. Deletion is blocked by a finalizer on the CephObjectBucket, pending step (b).
   b. The respective ConfigMap is deleted.
   c. The finalizer in the CephObjectBucket is removed and the object is garbage collected.
Member:

Since we're not deleting the underlying bucket, do we still need the finalizer? If we're only cleaning up K8s resources, we shouldn't need a finalizer.

Contributor Author:

Good point, I'll reword this. The intent is to set an ownerRef on the CM as a child of the CephObjectBucket and blockOwnerDeletion: true. It seems like a fair safeguard against orphaning COBs or their CMs.
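
A minimal sketch of what that ownerReference could look like on the generated ConfigMap; the apiVersion is assumed (the API group is still being discussed below) and the UID is the placeholder used elsewhere in this thread:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-bucket-1
  namespace: dev-user
  ownerReferences:
  - apiVersion: ceph.rook.io/v1      # assumed API group/version
    kind: CephObjectBucket
    name: my-bucket-1
    uid: 1234-qwer-4321-rewq         # placeholder UID
    # With foreground deletion, the owning CephObjectBucket is not fully removed
    # until this dependent ConfigMap has been garbage collected.
    blockOwnerDeletion: true
```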

### CephObjectBucket

```yaml
apiVersion: object.k8s.io/v1alpha
```
Member:

Currently we're adding all Ceph CRDs to the apiVersion ceph.rook.io/v1, since all CRDs in a single API group must have the same version. But separately I'd like to find a way to allow new CRDs to bake until we really think they're stable.

Contributor Author:

I'd also appreciate that. Adding this as a v1 when additions / changes are going to be made would be misleading. I'll change it to v1 for the time being since that's the standard practice.

```yaml
finalizers:
- rook-ceph.io/provisioner/my-bucket-1
data:
  BUCKET_HOST: http://my-store-url-or-ip
```
Member:

For naming consistency it would be nice if these vars weren't all caps. Is this necessary to make it easier to mount the configmap as env vars with the expected name?

Contributor Author:

This was done to adhere to bash conventions for env var names when mounting CMs with `envFrom:`, where the entire CM is exported into the environment. This is also done in the k8s docs.
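
A minimal sketch of that `envFrom:` usage, assuming a generated ConfigMap named `my-bucket-1`; the image name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-consumer
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example.com/s3-client:latest   # placeholder image
    envFrom:
    # Every key in the ConfigMap (e.g. BUCKET_HOST) becomes an env var of the same
    # name, hence the shell-style ALL_CAPS key names.
    - configMapRef:
        name: my-bucket-1
```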

@copejon (Contributor Author), Jan 14, 2019:

As for the field names, if these are a little too generic, I'm fine with renaming them to something more explicit.

Member:

These names look good. My only thought is that we might add the ROOK_ prefix to them if we think there might be a conflict with other env vars, but not sure it's worth worrying about.

Contributor Author:

I think that's probably worth the small effort to change it. Will do.

```yaml
provisioner: kubernetes.io/rook-ceph-operator
parameters:
  # namespace/objectStoreService
  objectStore: rook-ceph.io/objectStoreName
```
Member:

Instead of namespace/objectStoreName, why not have two separate properties for namespace and objectStore?

Member:

was rook-ceph.io intended for the namespace or should this be rook-ceph?

@copejon (Contributor Author), Jan 14, 2019:

> Instead of namespace/objectStoreName, why not have two separate properties for namespace and objectStore?

I'm fine with that, will change.

> was rook-ceph.io intended for the namespace or should this be rook-ceph?

nope! that's a typo, will fix

…params into ns and name

Signed-off-by: Jon Cope <jcope@redhat.com>
```yaml
uid: 1234-qwer-4321-rewq
apiVersion: object.k8s.io/v1
kind: CephObjectBucket
finalizers:
```
Contributor Author:

@travisn We've tried to make use of finalizers to make the system more robust against user error. Do you think this is a better option than either:

  1. let it be deleted, do nothing
  2. let it be deleted but regenerate it

Is there a common practice for this in rook?

Member:

In Rook we assume the user is an admin and they will have full access to the resources. If they delete resources rook relies on, it could kill the cluster. I don't recall hearing feedback from users that they think this is a problem.

However, in this case, it's not the admin who would be affected. Perhaps end users might be more prone to accidental operations? In my experience, finalizers should be used very sparingly. You have to rely on the rook operator to automate removal of the finalizer. If the operator is stopped before a finalizer is removed, the resource will live forever. As an example, we do have a finalizer on the CephCluster CRD and because users might stop the operator before removing the cluster, we end up with special cleanup documentation.

Contributor:

@travisn For now I'm ok with this level of manual cleanup in cases where the operator is unavailable. At least the finalizer can be removed and the protected object deleted, but normally the operator would handle deletion of the finalizer.

### Assumptions

- The `storageClass` field has been added to the CephObjectStoreUser API
- Rook-Ceph Object has been deployed in a Kubernetes Cluster ([_see:_ quick start](https://rook.github.io/docs/rook/master/object.html))
Member:

this link is broken for me, did you mean https://rook.io/docs/rook/master/ceph-object.html?


### Assumptions

- The `storageClass` field has been added to the CephObjectStoreUser API
Member:

is this strictly necessary to implement this new bucket CRD and controller? it looks like this proposal could still be implemented without the StorageClass being added to the CephObjectStoreUser CRD.

@jeffvance (Contributor), Jan 16, 2019:

@jbw976 Adding storage class to the user CRD is not necessary for the bucket design. The bucket CRD just needs to know the object store service and secret. As long as the name of the key-pair secret is deterministic, the CephObjectStoreUser is not required at all.

Update: per some discussions with @copejon, it is not that important that the secret name be deterministic. Although that's possible with a minor (breaking) change to the rook object user, the secret name may not be deterministic for other object stores. Also, considering the brown-field case where the user wants to wrap an ObjectBucketClaim around an existing bucket and creds, the secret name may typically be non-deterministic. It's not much of a burden on the app pod developer to list the target secret before creating her pod.


- The `storageClass` field has been added to the CephObjectStoreUser API
- Rook-Ceph Object has been deployed in a Kubernetes Cluster ([_see:_ quick start](https://rook.github.io/docs/rook/master/object.html))
- The `CephObjectStoreUser` has been created in the developer's namespace and the associated credentials are stored in a `Secret`
Member:

what do you think of the possibility of generating a user on-demand for the bucket if there is no user specified? or do we want to be very explicit and keep that as a separate and required pre-requisite step?

Contributor:

@copejon and I like that idea and it has been mentioned before. What is your opinion on phasing this to a later pr, or would this be the more common use case?

Contributor Author:

@jbw976 @jeffvance Given that we're changing direction to a generalized provisioner rather than a ceph specific one, this would be a step back for us. The reason being that we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.

@jeffvance (Contributor), Jan 19, 2019:

> ...we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.

+1


[1] `bucketName` and `generateBucketName` are mutually exclusive. It is expected that `generateBucketName` will be the common usage. This field produces a random bucket name beginning with the value of this stanza. If `bucketName` is defined, then that value alone is used as the name of the bucket. Within an object store, bucket names must be unique, so if `bucketName` is used there may be a higher chance of name collision. The API will continue retrying if the bucket cannot be created.
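
For illustration, the two mutually exclusive spellings might look like the sketch below; the apiVersion and surrounding field names follow the claim examples that appear later in this thread and are not final:

```yaml
# Common case: the operator generates a unique bucket name from the given prefix.
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
  name: my-bucket-1
  namespace: dev-user
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix            # operator appends "-<random characters>"
---
# Alternative: request an exact bucket name and accept the higher collision risk.
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
  name: my-bucket-2
  namespace: dev-user
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  bucketName: my-exact-bucket
```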

### CephObjectBucketConfigMap
Member:

what do you think about capturing all connection information, including both location information and credentials information, into a single secret? that may be an easier model for apps/pods to consume everything they need to connect from a single k8s object. thoughts?

Contributor:

We've considered adding the endpoint to the secret and eliminating the configMap. A couple of thoughts:

  1. the endpoint value doesn't need to be encrypted/hidden so it better "fits" the configMap use case.
  2. if I have one user (and thus 1 secret) and I want to create two buckets for the same user, then the secret would need a list of endpoints. But if the real bucket name is randomized then the secret needs to associate the endpoint to the bucket CRD name.

@copejon (Contributor Author), Jan 16, 2019:

@jbw976 Do you mean copying the access keys into a new secret? It would definitely simplify things, but it does mean that changes to the secret (changing credentials) wouldn't propagate to the bucket-key secrets.

It also means that we assume the app should always have ownership rights of the bucket.

Here's a contrived case for using the configMap where I have 2 secrets for 2 different key pairs:

  1. owner: full perms
  2. web frontend: readonly

I would use the owner key pair secret to provision the claim.
Then manually bind the web frontend keypair to the bucket with a readOnly policy. (We don't have plans yet for doing this in k8s)
Finally, I would mount the readonly key secret and the config map.

So storing the access keys and bucket data together reduces our flexibility.

@copejon copejon changed the title Design Proposal: CephObjectBucket CRD and Controller Design Proposal: ObjectBucketClaim CRD and Controller Jan 16, 2019
Signed-off-by: Jon Cope <jcope@redhat.com>

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks to in-built synchronization in Kubernetes.

Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up API objects. Deletion of the bucket within the object store is left as an administrator responsibility. This is to prevent the accidental destruction of data by users.
Contributor:

typo: "deletion is in the early..."

@copejon (Contributor Author), Jan 16, 2019:

not actually a typo :)

![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
1. The operator uses `objectBucketClaim.spec.accessKeySecretName` to get the S3 access keys secret.
Contributor:

is accessKey a necessary part of this property name? I'm thinking that secretName is clear enough. Also, there are both access keys and secret keys so this name could imply there's another secret containing the secret key?

Contributor Author:

Maybe. It feels like it makes it more clear at a glance without hurting readability.

The keys are "access key id" and "secret access key", so it's not too far off. Maybe "accessKeyPairSecret"?

```yaml
namespace: dev-user
labels:
  rook.io/bucket-provisioner:
  rook.io/object-bucket-claim: my-bucket-1 <sup>1</sup>
```
@jeffvance (Contributor), Jan 16, 2019:

<sup> pair visible

```yaml
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix<sup>2</sup>
```
Contributor:

ditto re sup

```yaml
accessKeySecretName: my-s3-access-keys
generateBucketName: prefix<sup>2</sup>
status:
  phase: ["creating", "available", "error", "unknown"]
```
@jeffvance (Contributor), Jan 16, 2019:

consider this k8s issue: Eliminate Phase and simplify Conditions

Contributor Author:

There's some good points and counter points in there. Rethinking it, maybe we should drop status for the v1 implementation to focus on core mechanics.

```yaml
labels:
  rook.io/object-bucket-claim-controller:
  rook.io/object-bucket-claim: my-bucket-1
ownerReferences: <sup>1</sup>
```
Contributor:

sup visible

Signed-off-by: Jon Cope <jcope@redhat.com>
@jeffvance (Contributor):

LGTM. Waiting on comments from others...


Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.
Member:

typo: thanks "to" in-built...


**Use Case: Provision a Bucket**

_As a Kubernetes user, I want leverage the Kubernetes API to create S3 buckets. I expect to get back the bucket connection information in a ConfigMap._
Member:

typo: I want "to" leverage...

1. The operator uses `objectBucketClaim.spec.storageClassName` to get the `Service` endpoint of the object store.
1. The operator uses the object store endpoint and access keys for an S3 "make bucket" call.
1. The operator creates a ConfigMap in the namespace of the `ObjectBucketClaim` with relevant connection data of the bucket.
1. An app Pod may then mount the Secret and the ConfigMap to begin accessing the bucket.
Member:

If the bucket claim is created, there will be a small delay (normally less than a few seconds, but possibly longer if the operator is busy) before the secret is generated and the pod can start. What will be the behavior if the app pod and bucket claim are created at the same time? Will the pod retry starting or crash loop until the secret exists? Or is there something more graceful we can do?

Contributor Author:

I did some testing this morning to double-check: when a pod is created that references a configMap or a key within a configMap (i.e. pod.spec.containers[0] uses envFrom.configMapRef or env[0].valueFrom.configMapKeyRef) before the configMap exists, the pod's status will be CreateContainerConfigError.

Once the configMap is created, the kubelet will start the pod as usual.

This eventual consistency means that users can define their pod spec with references to the credential secret and bucket configmap (with deterministic names) in a single yaml.
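
A minimal sketch of the single-key reference style described above; the image and env var name are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bucket-consumer
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example.com/s3-client:latest   # placeholder image
    env:
    - name: BUCKET_HOST
      valueFrom:
        configMapKeyRef:
          name: my-bucket-1               # ConfigMap generated for the claim
          key: BUCKET_HOST
# Until my-bucket-1 exists the pod reports CreateContainerConfigError;
# once the ConfigMap is created the kubelet starts the container as usual.
```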

Contributor Author:

Just to add - when mounting as volume, the pod will sit in containerCreating status until the CM exists.

Contributor:

@copejon and I assume the behavior you tested above for config maps also applies to secrets that aren't yet available to a pod?

_As a Kubernetes user, I want to delete `ObjectBucketClaim` instances and cleanup generated API resources._

1. The user deletes the `ObjectBucketClaim` via `kubectl delete ...`.
1. The `ObjectBucketClaim` is marked for deletion and left in the foreground.
Member:

I'm still not sure there is a need for a finalizer here. Regenerating a lost secret would be simpler than managing a finalizer.

Contributor Author:

In this case, we won't be setting the finalizer ourselves. Instead, we'll set an ownerReference with blockOwnerDeletion: true in the configMap to make it a child of the bucket claim. The kubelet will be responsible for finalizers. This keeps configMaps from being orphaned when the claim is deleted.

blockOwnerDeletion is more of a cherry on top since the configMap isn't blocked from deletion when referenced by a pod (as env var or volume).

```yaml
generateBucketName: prefix [2]
```

1. Added by the rook operator.
Member:

Not sure I would expect the operator to add a label. What about adding the generated bucket name to the crd status section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.

Member:

Actually, doesn't the corresponding configmap already contain the bucket name?

@copejon (Contributor Author), Jan 18, 2019:

> What about adding the generated bucket name to the crd status section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.

Fair point. status: edits by users are ignored by controllers. metadata is a grey area I think. I'm fairly certain that k8s core controllers add annotations to user defined objects. I'm not as sure about labels.

The intent was to provide a clean way for users/admins to get all API objects related to the claim operator or a specific bucket: kubectl get cm,objectbucketclaims --all-namespaces -l 'rook.io/object-bucket-claim-controller:' and similarly in code.

@jeffvance (Contributor), Jan 19, 2019:

I know the CNV's Container Data Importer (CDI) controller adds a label. Adding annotations is a common practice from what I've seen.


1. Added by the rook operator.
1. As with `metadata.generateName`, the operator will append a hyphen followed by random characters to the string given here.
Member:

  • Is generateBucketName optional? Or we will always generate the bucket name?
  • In general, nouns are preferred over verbs in the crd. What about calling this bucketNamePrefix?

@copejon (Contributor Author), Jan 18, 2019:

> Is generateBucketName optional? Or we will always generate the bucket name?

We will always generate the name to prevent collisions in the object store. The name itself will be stored in key:values or env vars so it will be abstracted from the user somewhat.

> In general, nouns are preferred over verbs in the crd. What about calling this bucketNamePrefix?

I'm okay with that, will change.

```yaml
kind: Secret
metadata:
  name: my-s3-key-pair
  namespace: dev-user
```
Member:

Shouldn't the secret also add the ownerReference?

Contributor Author:

I don't think so. It wouldn't be a child of the operator, claim, or configmap. Since this is a generalized design, the secret may be user or operator defined, so its dependencies aren't deterministic from our point of view.


## Design

### Work Flow
Member:

Could you add a section in the workflow for the admin creating the storage class that exposes the object store for bucket creation? This will help illustrate that until the admin 1) creates the storage class and 2) starts the operator that responds to the storage class, no buckets can be provisioned. This will also help explain why it is a general solution and not specific to a provider such as ceph.

Contributor Author:

+1 will do

Signed-off-by: Jon Cope <jcope@redhat.com>

## Proposed Feature

**A generalized method of S3 bucket provisioning through the implementation of an `ObjectBucketClaim` CRD and controller.**
Member:

will this proposal work for Minio and for EdgeFS buckets as well?

Contributor Author:

In theory, yes :) The controller will import the aws go sdk so stores that present an S3 interface should be compatible.


![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
Member:

which operator is watching for new ObjectBucketClaim instances to be created? Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store? or will ceph, minio, edgeFS all have their own running and watching?

@copejon (Contributor Author), Jan 18, 2019:

> which operator is watching for new ObjectBucketClaim instances to be created?

The claim controller will run inside its own operator, the same as the other rook operators (ceph, nfs, edgefs, etc).

> Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store?

Yes. The operator should be agnostic to the store provider; the only requirement is that the provider expose an S3 interface.

Member:

Hmm, wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)? If we want to provision minio or s3 buckets we'd set the StorageClass provisioner set to some other provisioner (rook minio, or something that does s3, or whatever).

Contributor Author:

> wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)?

As far as the object store goes, yes, since each store provider has its own requirements for provisioning (CLI vs API calls, etc). However, bucket creation is abstracted through the S3 interface. A single bucket operator can remain unaware of the underlying store type so long as the store exposes an S3 endpoint.

```yaml
name: some-object-store
provisioner: rook.io/object-bucket-claim-controller
parameters:
  objectStoreService: my-store
```
Member:

I'm not sure how this StorageClass could be used by a general bucket provisioner to determine the s3 service endpoint in a general way, these values look specific to the RGW object store config information. How would this work for Minio or EdgeFS?

Contributor:

In EdgeFS, serviceName "my-store" will be exposed as a generic Kubernetes service. Having the service name defined would help the EdgeFS operator (or maybe this could even be done in a generic way) determine the URL endpoint to call. The bucket can then be created via a general API call, which I hope can be part of the Rook framework itself. Essentially the API has to mimic `curl http://URL:PORT/bucketName -X PUT` semantics, and handle authorization and errors.

Contributor Author:

I think I'm misunderstanding the question. Looking at minio and edgefs: they both generate Service API objects. Referencing the Service in the StorageClass enables the operator to Get() the Service, from which it will get the IP and Port that the S3 app is listening on. In this way, admins can "publish" one or more object stores and the user is not required to know anything about them.
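
A sketch of that pairing, assuming the store's operator has already created a Service named `my-store`; the namespace parameter, selector, and port are illustrative:

```yaml
# StorageClass "published" by the admin (parameter names follow the excerpt above).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: some-object-store
provisioner: rook.io/object-bucket-claim-controller
parameters:
  objectStoreService: my-store
  objectStoreNamespace: object-store-ns   # illustrative
---
# Service created by the store's operator (RGW, Minio, EdgeFS, ...). The bucket
# operator Get()s it to resolve the S3 endpoint IP and port.
apiVersion: v1
kind: Service
metadata:
  name: my-store
  namespace: object-store-ns
spec:
  selector:
    app: my-store          # illustrative selector
  ports:
  - port: 80               # port the store's S3 endpoint listens on
```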


Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks to in-built synchronization in Kubernetes.
Member:

are the deterministic rules for determining the configMap name defined in this proposal?

Contributor Author:

Good point, will add pattern for naming

Contributor:

I have a feeling we're over-designing this a bit. Pretty much all object storage systems have to handle dynamic bucket creation via an AWS S3 API call. If they don't, then it means they're incompatible and we should not bother. The Rook framework would need to provide v2 and v4 authentication support, allow a user to select that, and initiate the HTTP(S) call using CRD-provided authorization credentials. Why do we need to introduce a storage class?

Member:

That's true that object storage systems should generally provide a standard API for bucket creation, but that's not the case for user creation (both radosgw and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

The purpose of the StorageClass is to describe which object store to provision against. It might be ceph cluster foo, AWS S3 region us-east, whatever.

@jeffvance (Contributor), Jan 19, 2019:

The storage class contains a provisioner property which can be used by the bucket controller to decide how to create the bucket. Error on my part: the provisioner field is essentially ignored. It could be checked by the controller to make sure it's the expected value, but its value is just a hard-coded string that implies bucket provisioning. The CRD operator is just watching for bucket CRD events, and any event on this type of CRD is s3-compatible-bucket related.

The parameters property names the object store service and namespace, which relieves the app developer from these concerns.

@copejon (Contributor Author), Jan 21, 2019:

> a standard API for bucket creation, but that's not the case for user creation (both radosgw and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

@liewegas Is the implication that bucket creation would be implemented redundantly in each rook object provisioner? It's clear why we need to do this for users; however, bucket creation could be handled by a single operator with an AWS S3 client.

Signed-off-by: Jon Cope <jcope@redhat.com>
Signed-off-by: Jon Cope <jcope@redhat.com>
@dyusupov (Contributor) commented Jan 18, 2019 via email

@jeffvance (Contributor):

> Isn't it better to isolate user provisioning from bucket creation to keep things a bit simpler?

This bucket provisioning proposal does not depend on a user. It just needs a secret (creds) and an object store service.

@copejon (Contributor Author) commented Jan 21, 2019

> I'd suggest we should stick to generic Rook provisioning CRD, which would support all the S3 compatible stores via AWS S3 API.

This is the core design point of this proposal. By leveraging the S3 API, we can generalize bucket provisioning in a single operator.

> And make this proposal be optional, or perhaps make it operator specific, to begin with.

This seems to reverse the above quote. Bucket creation via the S3 API is the core concept of this proposal. That said, I'm only familiar with ceph and minio and not edgeFS. Does it provide bucket provisioning operations outside of the S3 API?

Signed-off-by: Jon Cope <jcope@redhat.com>
@@ -45,6 +43,7 @@ The admin ...

- creates the ObjectBucketClaim Operator (a Deployment API object)
- creates a StorageClass with `parameters[serviceName]=objectstore-service` and `parameters[serviceNamespace]=some-namespace`
- `provisioner` field is left blank. Since only the ObjectBucketClaim operator is watching ObjectBucketClaim instances, defining the operator is unnecessary.
@jeffvance (Contributor), Jan 21, 2019:

"...defining the operator/provisioner is unnecessary, and If a value is provided it is silently ignored."

Agree?

```yaml
parameters:
  serviceName: my-store
  serviceNamespace: some-namespace
```

1. Since ObjectBucketClaim instances are only watched by the ObjectBucketClaim operator, defining the provisioner is unnecessary.
Contributor:

Like


**A generalized method of S3 bucket provisioning through the implementation of an ObjectBucketClaim CRD and controller.**

Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.
@jeffvance (Contributor), Jan 21, 2019:

nit: keep all singular or all plural. E.g. "Users will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket", or "Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for each bucket".


One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.

Additionally, the ObjectBucketClaim and Pod can be deployed at the same time thanks to in-built synchronization in Kubernetes. Pods mounting the configMap as either `envFrom:` or `env[0].valueFrom.ConfigMapKeyRef` will have a status of `CreateContainerConfigError` until the configMap is created. Pods mounting as a volume will have a status of `ContainerCreating` until the configMap is created. In both cases, the Pod will start once the configMap exists.
@jeffvance (Contributor), Jan 21, 2019:

nit: Pods mounting as a volume will have ... --> "Pods mounting a configMap as a volume will have"

#### Assumptions

- An object store has been provisioned with a reachable endpoint. ([e.g. the Rook-Ceph Object provisioner](https://rook.github.io/docs/rook/master/ceph-object.html)) This may also be an external endpoint such as AWS S3.
- A Service and/or Endpoint API object exists to route connections to the object store. Typically, object store operators generate these. In cases where the store is external to the cluster, they must be configured manually.
Contributor:

nit: "A Service and/or Endpoint API object exists": I'd be explicit and say "A Kubernetes Service and/or Endpoint API object exists".


**Use Case: Expose an Object Store Endpoint**

_As a Kubernetes admin, I want to expose an object store to cluster users for bucket provisioning via the Kubernetes API._
Contributor:

I don't see a difference between this use-case and the one above.

Contributor Author:

editing mistake, will fix

```yaml
  rook.io/bucket-provisioner:
  rook.io/object-bucket-claim: my-bucket-1 [1]
spec:
  storageClassName: some-object-store
```
@jeffvance (Contributor), Jan 22, 2019:

Do you see value in supporting both storageClass and store? See 2482 comment regarding providing both. A kick-the-tires use case might be better served if the object store service name can be supplied directly without reliance on a storage admin to create a storage class.

@copejon (Contributor Author), Jan 22, 2019:

It's a good point. Instead of store: though, it should either be a store url or a kubernetes service name. I lean towards URL.

```yaml
namespace: dev-user
labels:
  rook.io/bucket-provisioner:
  rook.io/object-bucket-claim: my-bucket-1 [1]
```
Contributor:

do you think the operator should also add an anno containing the configMap name? It can be easily derived so not essential...

Contributor Author:

It's a reasonable thought. Another way to get the associated CM would be by rook.io/object-bucket-claim: my-bucket-1 label selector. It's slightly more obtuse - what do you think?

```yaml
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
```
@jeffvance (Contributor), Jan 22, 2019:

should there be an ownerReferences value pointing to the ceph cluster here? IOW, when the cluster is deleted this CRD would also be deleted. Or is it better long term to not tie this CRD to the rook-ceph cluster?

Signed-off-by: Jon Cope <jcope@redhat.com>
@dyusupov (Contributor):

> That said, I'm only familiar with ceph and minio and not edgeFS. Does it provide bucket provisioning operations outside of the S3 API?

It does. This proposal makes sense indeed. How are we going to deal with region/namespace and tenantName? Will we pass it down to storage provider level?

@copejon (Contributor Author) commented Jan 24, 2019

> How are we going to deal with region/namespace ...

IMO region/namespace would fit best in the StorageClass as an optional parameter. This puts the onus on the cluster admin to know if the object provider they are using supports/expects regions.

For n regions, a cluster admin would define <=n storageClasses, each representing a different region in the region parameter field. A storage class naming convention should be encouraged in documentation. e.g. s3-object-store-us-east-1

Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.
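
A minimal sketch of that pattern, combining a per-region StorageClass with a (Cluster)Role scoped by resourceNames; all names are illustrative and the region parameter is an assumption:

```yaml
# One StorageClass per published region, following the naming convention above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-object-store-us-east-1
provisioner: rook.io/object-bucket-claim-controller
parameters:
  serviceName: my-store
  serviceNamespace: some-namespace
  region: us-east-1        # assumed optional parameter; only meaningful if the provider supports regions
---
# ClusterRole granting read access to just that class; how it is bound to a
# namespace's users is left to the cluster admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-s3-object-store-us-east-1
rules:
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  resourceNames: ["s3-object-store-us-east-1"]
  verbs: ["get"]
```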

> and tenantName

After some cursory reading, my understanding is that tenant names are optional in AWS S3, Minio, Ceph, and EdgeFS. If left empty, an empty string is applied as the tenant name. I think we have a few options for how we could address the feature:

1. Left to later versions of the operator. Bucket requests are submitted without a specified tenant.
   - In the alpha v1 version of this controller, this might be preferable to deliver an MVP with plans to support it in later versions.
2. Represented as a separate CRD and provisioned by another controller inside this proposal's operator.
   - Adds complexity to the early design but would start the project out with a common feature.
   - Abstracting tenants at the kube layer provides users kubectl CRUD ops on their tenants.
   - objectBucketClaims would reference the tenant API object by name (co-namespaced is assumed).
3. Defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, which then creates the bucket with that tenant.
   - Streamlined, doesn't increase user actions on bucket creation.
   - Does not provide kubectl CRUD ops for tenants.
   - Cleanup left to users/admins or automated in the controller.

I'm undecided on which would be the most reasonable choice (or if there are others I haven't thought of).

@dyusupov @liewegas @travisn @jeffvance thoughts?

@dyusupov (Contributor):

> How are we going to deal with region/namespace ...
>
> Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.

This works for EdgeFS. The difference, I guess, from other providers is that EdgeFS needs a namespace (btw, it can be anything and is not necessarily limited by AWS dictations) as well as a tenantName in order to build a universally unique object path, e.g. NAMESPACE/TENANT/BUCKET/OBJECT. It is called the Name Hash ID and is used for cross-site lookups, metadata-only transfers, data I/O mesh policy, etc.

> and tenantName
>
> 1. represented as a separate CRD and provisioned by another controller inside this proposal's operator
> 2. defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, then creates the bucket with that tenant.

I would do (1) and (2). We could defer (1) to a separate pull request while having (2) be predefined as a part of this pull request.

@copejon (Contributor Author) commented Jan 25, 2019

> How are we going to deal with region/namespace ...
>
> Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.
>
> This works for EdgeFS. The difference, I guess, from other providers is that EdgeFS needs a namespace (btw, it can be anything and is not necessarily limited by AWS dictations) as well as a tenantName in order to build a universally unique object path, e.g. NAMESPACE/TENANT/BUCKET/OBJECT. It is called the Name Hash ID and is used for cross-site lookups, metadata-only transfers, data I/O mesh policy, etc.
>
> and tenantName
>
> 1. represented as a separate CRD and provisioned by another controller inside this proposal's operator
> 2. defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, then creates the bucket with that tenant.
>
> I would do (1) and (2). We could defer (1) to a separate pull request while having (2) be predefined as a part of this pull request.

I think this seems reasonable. Implementing (2) early on would lay a framework for handling defined tenants. (1) would follow up with a change in how the information is delivered to that framework.

I'll add these changes to the doc

@jeffvance (Contributor):

Re tenants: I haven't thought this through, so just pitching it as a possibility. Perhaps options 0 and 2 can be combined. My thinking is that I don't want to slow down the dev of bucket provisioning and option 2 could be done later.
What I mean by "combining" 0 and 2 is that we don't yet support tenants but we do support a tenant field as an optional parameter. The api checks that if tenant is provided its value must be empty, otherwise an error is returned. This means that no one can be using tenants in phase 1 but the field exists and can be added in phase 2 without a major version roll. Thoughts?

@copejon (Contributor Author) commented Jan 25, 2019

The downside of 2 on its own is that users/admins would be expected to perform tenant cleanup manually, outside of k8s. I realize that we set this requirement for buckets; should it extend to tenants as well?

cc @liewegas @travisn


A cluster user will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket.

One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.


Why not create an ObjectBucket object, like we do with PVC and PV?

Contributor Author:

The K8s Pod API expects a mountable/attachable k8s primitive (configMap, secret) for data injection, so we have to use a core API resource instead of a PV analog CRD.

```yaml
namespace: dev-user
data:
  accessKeyId: <base64 encoded string>
  secretAccessKey: <base64 encoded string>
```


I'd advocate against the use of AWS secrets.
1st AWS is working on binding IAM to pods.
2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.

@copejon (Contributor Author), Jan 29, 2019:

> I'd advocate against the use of AWS secrets.
> 1st AWS is working on binding IAM to pods.

Will this work for object stores besides AWS S3 (ceph object, edgefs, minio, etc)?

> 2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.

Does this assume multiple cluster users would use the same instance profile? If user A and user B work within their own namespaces and require access to an object store, would both of their pods potentially use the same instance profile?

@jbw976 (Member) commented Feb 2, 2019

Thank you for meeting to discuss this further today. To summarize the discussion, we think that this support for generalized bucket provisioning is something that the greater Kubernetes community wants to see and has value. We also agree that being able to demonstrate something working, as opposed to just having a "grand idea", is more effective at getting community buy-in and rallying support.

We think this is going in the right direction, but the concern here is that this work would be better served in a different venue; it might not be a strong fit within the scope of the Rook project.

Bucket provisioning would be better suited at the workload level, supported by a scheduler, topology aware placement, dynamic/static resource matching and binding, separation of concerns (admin creds to create bucket, app creds to consume bucket, user management), etc. Essentially more machinery around the consuming workload. A design that has the abstraction closer to the app level and scheduling/provisioning is a stronger approach.

That would be similar to the PVC/PV model where the claim abstraction is closer to the workload and scheduler. Rook should be focusing on orchestrating and integrating concrete implementations of storage. Rook should be bringing new providers of storage into this ecosystem that implement these abstractions, but Rook shouldn't go beyond the storage (e.g., into scheduling and workload machinery).

This would likely be a better fit for the following venues:

We are very open to collaborating and incorporating this effort into the Crossplane project. You can read more in the Crossplane vision and architecture doc.

We need to figure out what to split out and what to keep from the design in this PR. I think the generalized bucket abstraction and a bucket controller make sense split out, while a concrete Ceph bucket CRD makes more sense here in Rook. We can discuss further; once again, thank you for your efforts and for driving this initiative.

@copejon (Contributor Author) commented Feb 6, 2019

@jbw976 @bassam @liewegas @travisn @eboyd @jeffvance

Thank you for sharing your time with us; we took a lot away from the meeting and it's given us a better sense of the direction this project should head. We firmly agree that a generalized implementation of a bucket provisioner would be better implemented somewhere other than Rook.

With that in mind, I'm going to go ahead and close this PR so that we can start a fresh dialogue within the scope of a concrete Ceph CRD design. The benefit being that the conversation won't contain the history of drastic changes from specific to generalized and back to specific design scope.

I'll reference this PR in the next one so that we can still keep track of the overall history of the proposal.

@copejon copejon closed this Feb 6, 2019
@copejon copejon mentioned this pull request Feb 6, 2019
@bassam (Member) commented Feb 7, 2019

Thanks @copejon and @eboyd! Looking forward to seeing how this generalized bucket approach develops!
