Proposal Doc: Composable integration in Operator-SDK #2340

vazirim · 2019-12-17T02:23:22Z

Description of the change:

Proposal doc to integrate Composable SDK into Operator-SDK.

Motivation for the change:

Composable SDK allows operator developers to add cross-resource references to their CRDs,
enabling dynamic configuration.

camilamacedo86 · 2020-01-08T18:21:20Z

doc/proposals/composable-integration-in sdk.md

+Using the Composable SDK means that the CRD needs to have permission to access objects that are being referenced.
+This means that the operator developer needs to consider what kinds of objects will be permitted for references and
+add appropriate permissions for the RBAC rules of the CRD.
+


Hi @vazirim,

Really thank you for share your ideas and contribute with, first at all 🥇.

I have a few questions/concerns which follow. Please let me know if I misunderstood something.

So, with this solution, in a CR of a CRD for example ( Memcached ), we can map many references for pre-exisent resources in the cluster and then, it will be used to find the data by the ResolveObject. Following code from the tutorial just to clarify my questions :-)

// Resolve the memcached instance resolved := &cachev1alpha1.MemcachedResolved{} comperr := sdk.ResolveObject(r.client, r.config, memcached, resolved, request.Namespace) if comperr != nil { if comperr.ShouldBeReturned { return reconcile.Result{}, comperr.Error } reqLogger.Info("Error during resolve, but not returned", comperr.Error) return reconcile.Result{}, nil }

How it will deal with error scenarios, for example, if 3 of 10 resources mapped for Memcached cannot be found? Will the Memcached resource be created or not? Has it a way to tell that some ref is optional?

If the object, for example, a ConfigMap be found but it has not the key mapped in the CR how will it work?

How would be the performance of it if the user adds too many resources refs to be "resolved"?

How to validate the data in the CR? Has it already something implemented for that?

Also, I think that would not be possible to call forever the reconcile in the case of something nothing works.

And, in real scenarios, I am afraid that would not possible to solve it by changing the options in the manager to re-trigger the reconcile every few seconds as in the demo. It would need to have a watch/observer impl for each resource referenced.

WDYT folks @estroz @joelanford @shawn-hurley ?

I would add one more, which is should there be some way to specify a default if it can not be resolved?

Hi @camilamacedo86,

Thanks for your feedback!

If 3 out of 10 resources cannot be found, then ResolveObject returns an error. It either resolves it all, or nothing. It also employs caching to get a consistent view. The cache is made fresh for each invocation of ResolveObject.

If the object is found, but the data in it doesn't exist, then ResolveObject also returns an error.

The performance hasn't been an issue so far because we have only seen CRs with a handful of references and no more. Resolving those references amounts to looking up in etcd. In any case, caching also alleviates performance.

We have a validation that happens inside the Composable operator controller (so that the underlying template is well-formed). For the Composable-SDK, since the developer is modifying the CRD definition, and the open-api spec of the CRD, we get validation through that.

The decision to retry is up to the consumer of the composable-SDK. But any operator will have a similar problem -- you need to decide for various error scenarios whether it makes sense to requeue another Reconcile cycle or not. So this is not specific to composable-SDK.

Sure, having watchers works as well. In the operators we have developed, we like to resync periodically to check for the health of what the operator is managing.

--Mandana

@shawn-hurley, yes we had thought of having defaults in the context of the Composable operator because it helped with validation (we could do a dry-run deployment). But we opted for not having defaults. The idea being that we want to have the ability of resolving data-dependencies dynamically. So if a resources has not been created yet, we retry until it's online. We have found this use case to be useful.

@shawn-hurley The latest version of the Composable SDK (v.0.1.4) has support for default values.

joelanford · 2020-01-10T15:43:53Z

doc/proposals/composable-integration-in sdk.md

+The Composable SDK offers the following function for resolving the value of cross-resource references.
+
+```golang
+func ResolveObject(r client.Client, config *rest.Config, object interface{}, resolved interface{}, composableNamespace string) *ComposableError 


What does ResolveObject() use the rest.Config parameter for? I'm curious if we can get by passing in just the client.Client?

The config is used to get a CachedDiscoveryInterface (k8s.io/client-go/discovery), which is used to lookup a resource kind given a kind and an apiversion. This in turn gives information that is used to lookup the reference object.

So, it means that it will not respect the operator scope. For example, if we pass just an ns for the manager here it would be looking for in all ns of the cluster. Am I right?

If yes, IHMO it is something that needs to be changed.

If the object reference have no namespace specified, then it will look only in the namespace of the manager. This gets passed as the composableNamespace parameter to ResolveObject(). That's the default namespace to use.

If the object reference has a namespace specified, it will look there instead.

The Config is used to get a CachedDiscoveryInterface and all we need to know is to see if the kind of the object reference is namespaced or not. This will inform how to create a NamespacedName for object lookup. If it is namespaced, then the convention above is used to determine the namespace.

Hi @vazirim,

Tks for the clarifications.

IHMO in order to the Compasable work aligned with it would need to use the manager/cache which is provided instead of a new cache. Otherwise, shows that It would allow for example, besides the operator be set up in the manager/cache with a List of namespaces looking for the resource out of this scope. Also, it can have more options setup by default which IHMO should be respected in all project. Is it make sense?

PS.: The manager can be set up with some options. Regards the ns it can be "" which will allow cache all cluster, or a list of namespaces or just 1 namespace. See: https://godoc.org/sigs.k8s.io/controller-runtime/pkg/manager#Options

Please, let me know if I misunderstood something as well.

The ResolveObject is indeed using the client provided by the controller:

func ResolveObject(r client.Client, config *rest.Config, object interface{}, resolved interface{}, composableNamespace string) *ComposableError {

The CachedDiscoveryInterface is obtained from the controller's client as well (the client passed into the ResolveObject)...

I am not sure I understand the concern (?)

Thanks for the explanation @vazirim! I see now why the rest.Config is important. But now I have a follow-up. :)

Have you considered making a ResolvesObject interface:

For example,

type ResolvesObject interface { // ResolveObject resolves object references. It uses a context // for cancellation. ResolveObject(ctx context.Context, in, out interface{}) error }

And a struct that implements it:

type KubernetesResourceResolver struct { defaultNamespace string client client.Client resourcesClient discovery.ServerResourcesInterface }

With this, the resolver could be setup and included as a field in the reconciler implementation, it would simplify the user-facing API, and it would make it more extensible for other possible future use cases (and writing unit tests).

@joelanford The latest version of the Composable SDK (v0.1.4) now has an interface like you have suggested. In addition to the ResolveObject interface, we have the struct that implements it:

type KubernetesResourceResolver struct { Client client.Client ResourcesClient discovery.ServerResourcesInterface }

We simplified and got rid of the default namespace here.
The resolveObject now has this interface:

func (k KubernetesResourceResolver) ResolveObject(ctx context.Context, object interface{}, resolved interface{}) error

The assumption is that the object passed has a namespace, and that serves as the default namespace. We also got rid of ComposableError and added functions to determine the nature of the error (see below).

The assumption is that the object passed has a namespace

Forgive me if this has already been answered or addressed. Does this mean that cluster-scoped resources are not supported? Or can I pass an object where namespace is empty and that means cluster-scoped?

Cluster-scoped resources are supported as well. We check to see if the reference is to a cluster-scoped resource or not. If not, and if the namespace is omitted from the GetValueFrom reference, then we use the default namespace, which is the namespace of the object itself (that contains the reference).

doc/proposals/composable-integration-in sdk.md

joelanford · 2020-01-10T16:12:47Z

doc/proposals/composable-integration-in sdk.md

+
+The `ResolveObject` function uses caching for looking up object kinds, as well as for the objects themselves, in order
+to ensure that a consistent view of the data is obtained. If any data is not available at the time of the lookup,
+it returns an error. So this function either resolves the entire object or it doesn't -- there are no partial results.


I think there's at least one interesting scenario that operator developer's using these object references would need to consider, and that is that the object resolves successfully and is created, and then at some point later, one of the referenced objects in a GetValueFrom field is deleted.

In that case, it looks like ResolveObject() will fail, but the caller may want to have some custom logic to handle the specific reason for the failure.

For example, if the error was a temporary connection issue, a caller would want to retry. But if the error is that a referenced object is not found, I probably still want to start retrying, but I might also want to delete the other objects that I created during a previous reconciliation that relied the resolved object that is now no longer resolving. Does that make sense?

Does the returned ComposableError.Error field have enough detail to allow this sort of decision making in the calling code?

Currently ComposableError does not have this level of detail.

joelanford · 2020-01-10T16:20:46Z

doc/proposals/composable-integration-in sdk.md

+}
+```
+
+The type `ComposableError` contains the error, if any, and a boolean `ShouldBeReturned` to indicate whether the calling reconciler should return this error or not. This is used to distinguish errors that are minor and could be fixed by immediately retrying from errors that may indicate a stronger failure, such as an ill-formed yaml (in which case there is no need to retry right away). If there are no errors, then the `ResolveObject` returns nil.


Given my previous comment, it seems like there could be a gray area for deciding how ShouldBeReturned should be set. I wonder if it would make sense to expose a separate helper function in the composable SDK to handle this such that ResolveObject() could return the error directly.

For example:

func IsUnresolvableError(err error) bool { ... }

err := composable.ResolveObject( ... ) if composable.IsUnresolvableError(err) { return reconcile.Result{}, err } else if apierrors.IsNotFound(err) { // do some extra logic } else { return reconcile.Result{}, err }

This way callers could use composable's helper function OR define their own custom error handling, and it would get error handling logic out of ResolveObject.

Yes it's a good idea to separate this concern, and add more detail to the kinds of error situations that can happen. Will open a work item.

@joelanford The latest version of Composable SDK (v0.1.4) got rid of ComposableError. We now have a series of functions as you suggested:

func IsIllFormedRef(err error) bool func IsKindNotFound(err error) bool func IsObjectNotFound(err error) bool func IsValueNotFound(err error) bool func IsRefNotFound(err error) bool

IsIllFormedRef means there is a problem with the way the cross-resource reference has been written in the yaml. IsKindNotFound means the kind for the reference does not exist. IsObjectNotFound means the object does not exist, and IsValueNotFound means the value does not exist within the object. The last one simply means either IsKindNotFound, or IsObjectNotFound, or IsValueNotFound is true.

camilamacedo86 · 2020-01-14T20:19:47Z

doc/proposals/composable-integration-in sdk.md

+Rather than hardwiring the same value everywhere, I can define cross-resource references to a unique configmap that contains 
+the data. This makes it easier to change the data if needed, and can allow all resources to be configured
+dynamically (i.e. changes in the configmap are absorbed and resource that point to it are updated).
+


The same need could be solved by just GET object in the controller and then, updating the CR resource with the values. Not necessary is required to add all values used in the CR. It can be obtained programmatically, dynamically and in realtime as well.

IHMO the user's stories needs to be improved because the idea here would not be allowed something that is not possible today. It is just to add a facility, I mean another way to get RefFrom without need to implement it.

The idea is that rather than hardwiring that in the controller, each CR of the CRD could have fields that point to different objects. So this gives a lot of flexibility.

camilamacedo86 · 2020-01-15T20:39:58Z

doc/proposals/composable-integration-in sdk.md

+
+To add support for cross-resource references in CRD definitions and controllers (Composable SDK) and make this
+functionality available through the Operator-SDK.
+


I think it is missing: How it would be added in the SDK?

We would add the types that are here:
https://github.com/IBM/composable/blob/master/sdk/composable-types.go

so that operator developers can use the ObjectRef to specify that fields have this type.

We also add the method ResolveObject (possibly with a modified interface so that the receiver is the client instead of the first argument) as one more way in which the developer can access objects in etcd (like Get, Update etc...)

I think would be nice to have your thoughts on how it should be added to the project too in the doc. Really tks for all.

ok will add that.
Thanks for all the feedback!

@camilamacedo86 I changed the proposal doc according to the changes above for the interface.

dmesser · 2020-01-20T13:32:10Z

doc/proposals/composable-integration-in sdk.md

+## Summary
+
+Kubernetes object specifications often require constant values for their fields. When deploying an entire application
+with many different resources, this limitation often results in the need for staged deployments, because some resources have to be deployed first in order to determine what data to provide for the specifications of dependent resources. This undermines the declarative nature of Kubernetes object specification and requires workflows, manual step-by-step instructions and/or brittle automated scripts for the deployment of applications as a whole.


What does it mean for the referencing object and its lifecycle when the referenced object does not exist yet? Will it be created?

@dmesser If the referenced object does not exist, then the referencing object's controller will keep retrying until that object is created. When that object is created, then the controller can resolve the referencing object and continue with whatever it needs to do.

Interesting! How long would it keep trying?

That depends on the consumer of the library. In the memcached tutorial, this is done indefinitely.

We can imagine having a message in the Status of the object saying that it is waiting for some object to be created. If there is a bug in the spec of the object reference, the user could catch that and fix it. In any case, when the error is fixed or the object is created, then the controller can resolve the referencing object and move forward.

Makes sense. @joelanford I think if we were to take this in, it would probably be a good default behavior to write out a status when waiting on referenced values.

Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>

vazirim · 2020-04-07T04:04:12Z

I wonder if there are any plans for merging this proposal?
TIA

estroz · 2020-04-23T19:21:25Z

@vazirim some thoughts on this contribution:

The SDK should definitely make composable visible to operator authors, as it solves advanced use cases. However I'm concerned about the maturity of the composable operator and the community commitment required to support a potentially immature project in the SDK itself. My suggestion is to add the helpers proposed here to an operator-framework library and/or discuss composable usage in documentation.

To further discuss how to proceed I'd like to nail down the specifics of contributing, with graduation criteria, in #2770. For now this PR should stay open while we figure out a contribution methodology.

vazirim · 2020-04-24T16:02:35Z

@estroz would it help if I did a code PR against operator-sdk?

estroz · 2020-04-24T19:18:33Z

@vazirim not quite. To put what I said above differently: until we figure out a general framework for contributing code from external projects to operator-framework ones, neither this proposal nor its code should be merged.

joelanford · 2020-04-29T21:11:03Z

@vazirim Apologies for being MIA on this proposal and leaving it hanging so far! And thanks for popping it to the top of the stack.

We haven't had a community proposal of this scale before so you're unfortunately the guinea pigs and we were unfortunately caught flat footed and unprepared when all of the questions were answered and the next step was to merge. 😳 🤣

In the coming weeks we will be putting together docs and guidelines for accepting community-contributed features of this scale and establish some governance and procedures.

vazirim · 2020-04-30T13:38:21Z

@joelanford, @estroz Thanks for the update!

openshift-bot · 2020-07-29T13:50:53Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

vazirim · 2020-07-29T14:13:19Z

Hi @joelanford, this PR is going stale!
Any news on the integration?

camilamacedo86 · 2020-07-29T14:34:32Z

/remove-lifecycle stale

joelanford · 2020-07-29T15:06:42Z

@vazirim I opened operator-framework/community#22. I should have linked that in this PR already!

It adds some governance around projects in the operator-framework org. Our current thinking is to bring this (and other similar features) in as "incubating" projects with a separate repository to let them mature and build a community within operator framework before "graduating".

I think we need to get the project governance agreed upon and merged, and then we can proceed with this as the first candidate for incubation. Sorry this is taking so long!

openshift-bot · 2020-10-31T19:35:55Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2020-11-30T21:32:12Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2020-12-30T23:21:34Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot · 2020-12-30T23:21:36Z

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Composable integration in Operator-SDK

da714e9

openshift-ci-robot requested review from camilamacedo86 and theishshah December 17, 2019 02:23

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 17, 2019

vazirim changed the title ~~Composable integration in Operator-SDK~~ Proposal: Composable integration in Operator-SDK Dec 17, 2019

vazirim changed the title ~~Proposal: Composable integration in Operator-SDK~~ Proposal Doc: Composable integration in Operator-SDK Dec 20, 2019

camilamacedo86 reviewed Jan 8, 2020

View reviewed changes

joelanford reviewed Jan 10, 2020

View reviewed changes

camilamacedo86 reviewed Jan 14, 2020

View reviewed changes

camilamacedo86 reviewed Jan 15, 2020

View reviewed changes

dmesser reviewed Jan 20, 2020

View reviewed changes

vazirim added 3 commits February 11, 2020 15:31

Changes to composable proposal doc

4d1b78d

Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>

Merge remote-tracking branch 'upstream/master'

9d36f71

fix to composable proposable doc

4917ce4

Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>

camilamacedo86 added the needs discussion label Feb 26, 2020

metacosm mentioned this pull request Apr 24, 2020

Investigate which features (if any) could be included to ease operator development #2770

Closed

1 task

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2020

openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2020

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2020

openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 30, 2020

openshift-ci-robot closed this Dec 30, 2020


		To add support for cross-resource references in CRD definitions and controllers (Composable SDK) and make this
		functionality available through the Operator-SDK.

Uh oh!

Proposal Doc: Composable integration in Operator-SDK #2340

Proposal Doc: Composable integration in Operator-SDK #2340

Uh oh!

Conversation

vazirim commented Dec 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

camilamacedo86 Jan 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vazirim Feb 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

joelanford Jan 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

camilamacedo86 Jan 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

vazirim commented Dec 17, 2019 •

edited

Loading

camilamacedo86 Jan 8, 2020 •

edited

Loading

vazirim Feb 11, 2020 •

edited

Loading

joelanford Jan 10, 2020 •

edited

Loading

camilamacedo86 Jan 14, 2020 •

edited

Loading