Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Updated Binding Credentials #66

Closed
jkbschmid opened this issue Aug 12, 2021 · 31 comments
Closed

Question: Updated Binding Credentials #66

jkbschmid opened this issue Aug 12, 2021 · 31 comments

Comments

@jkbschmid
Copy link
Contributor

jkbschmid commented Aug 12, 2021

Hello,

there are some services that require a re-bind after a certain time period to update the service credentials (e.g. for renewing certs).
Does the operator support a mechanism (e.g. via an annotation) to trigger such a re-bind?

In theory, it is possible to create a second binding with a different name, but this would require changing the deployment yaml on each update. That's why it would be much more convenient to have an annotation or similar.
Thank you!

@pavelmaliy
Copy link
Collaborator

Hi,
No there is no such mechanism.
I am not sure how it should be implemented, you mean that periodically BtpServiceOperator will delete bindings and create new ones with the same name according to annotations?
There is no update for binding

@jkbschmid
Copy link
Contributor Author

Hi there,

BtpServiceOperator will delete bindings and create new ones with the same name according to annotations?

Yes, that's it.
Our use case is this:
We bind to a service that supplies us with a certificate that is valid up to 1 month. So each month we are forced to "recreate" the binding credentials to obtain a valid certificate.
If we delete the binding and create it again, pods will not be able to start after the old secret's deletion until the new secret is created.
If we create a new service binding we have to modify the CF manifest to reference the new corresponding secret.

There is no update for binding

You are right, CF does not provide a mechanism to update a binding.

However, if the operator kept a mapping between a ServiceBinding in K8s and the actual binding in the service manage the operator could transparently update the binding, e.g. like so:

  • Update is triggered, e.g. via a "rebind"-annotation on the K8s ServiceBinding
  • Create a new binding "my-binding-green" in the service manager
  • Update the K8s secret with the new credentials
  • Delete the old binding in the service manager (that has the name "my-binding" or "my-binding-blue")
  • Delete rebind-annotation

@pavelmaliy
Copy link
Collaborator

Thanks for the feedback, we'll raise your request to our PO and try to find appropriate solution to cred rotation.

@loewenstein
Copy link

Wouldn't it anyway be safer to enforce a Deployment rollout @jkbschmid? If you mount the Secret as directory the container will see the binding credentials updated (which at least was not the case when mounting the Secret as a file), but the application also needs to watch the file system and read the new content.

In contrast, if you simply rollout the Deployment with the new Secret, the newly started application process will pick up the new certs automatically.

@jkbschmid
Copy link
Contributor Author

@loewenstein I think any kind of Pod restart to make the application re-read binding creds is necessary, no matter if rollout or blue/green.
IMO, the question is more: How do I update the actual binding secrets in K8s? I'd like to avoid creating a new binding, because then I'd have to adapt the deployment.yaml where the new binding must be referenced.

@loewenstein
Copy link

Ah, ok @jkbschmid. I think I understand the use case better now. I am not sure who could implement this and how. It is something on the edge between service instance/binding management and workload management. I.e. you must not delete the binding as long as any Pod is still using the credentials of the old binding. Unfortunatly, that is knowledge only the owner of the workloads has.

@MatthiasSchmalz
Copy link
Member

It is important to only delete old bindings after there is no app instance left which uses its credentials. For certificate based credentials this is not critical, but there can still be services with binding level secrets. Those would become invalid in the moment the binding is dropped and cause errors if it is still in use.
Therefore the correct sequence is:

  • Create new bindings
  • Restart all app instances and ensure the all are using the new credentials
  • Delete old bindings

In CF this problem is solved as a side effect of how blue green deployment is done by deploy service:

  • Create new apps (apps have suffix -blue or -green)
  • Bind them (new bindings)
  • Start the new apps
  • Shut down old apps
  • Unbind old apps and delete them

@loewenstein
Copy link

I am wondering if the Kubernetes Service Binding Spec could be part of a solution for this.

The btp-service-operator ServiceBinding would need to conform to a Provisioned Service, providing a reference to the Secret via status.binding.
The application workloads and any libraries used would need to find credentials considering the Workload projection definition. I.e. searching for mounted credentials in $SERVICE_BINDING_ROOT either by knowing the binding name (and use $SERVICE_BINDING_ROOT/<binding-name>) or by searching the directory for bindings with specific tags (like some libraries might already do searching the VCAP_SERVICES JSON structure).
A service binding controller (there's a reference implementation at https://github.com/servicebinding/service-binding-controller, but we might need to engage/invest into this), watches resources and provides the glue.

The btp-service-operator ServiceBinding could have a configuration for regular and/or on-demand credential rotation. When rotation happens it would create a new OSB binding delivering a new Secret to the cluster and update the status.binding.
The service-binding-controller would take the new Secret and mutate the volume and volume mounts in the workload, which would trigger a rollout.
The service-binding-controller would watch the rollout and update the Ready condition.
The btp-service-operator could watch the Ready condition and delete the old OSB binding, including the Secret.

I am still checking if all my assumptions are correct (see Kubernetes Slack)

@github-actions
Copy link

github-actions bot commented Nov 8, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 8, 2021
@MatthiasSchmalz
Copy link
Member

I assume this issue is still valid and should not be closed.

@pavelmaliy pavelmaliy removed the Stale label Nov 8, 2021
@github-actions
Copy link

github-actions bot commented Dec 9, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 9, 2021
@loewenstein
Copy link

I believe this is still relevant.

@github-actions github-actions bot removed the Stale label Dec 10, 2021
@github-actions
Copy link

github-actions bot commented Jan 9, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 9, 2022
@loewenstein
Copy link

I believe this is still relevant.

@github-actions github-actions bot removed the Stale label Jan 10, 2022
@pavelmaliy
Copy link
Collaborator

After consulting with our PO, we'll have BLI for 2A for cred rotation.

@loewenstein
Copy link

I wasn't really reviewing that PR, just being curious as to the solution and the constraints involved.

Having placed some thoughts and a little research into it, I am sure there is no easy, straightforward solution anyway.

If I understand your reply correctly, the constraint on workloads (or better on workload owners) is that they must not rotate credentials more than once without marking sure all instances of the workload were restarted and had a chance to pick up the new binding secret.

I suppose this is the best you can get without requiring coordination with the workloads. But it does indeed sound limited in a sense that fully automated rotation with a given schedule are dangerous and can easily cause outages.

@pavelmaliy
Copy link
Collaborator

pavelmaliy commented Jan 16, 2022

As I see it,
you can always delete and create the binding as much as you want,
If you prefer automated rotation you need to enable it and provide rotation interval, during that period you need to adjust
your workloads (if they're using volumes it will be updated automatically) because in next rotation the old binding will be removed and current binding will become old.

I expect that rotation interval will be months and not minutes or hours so it should be enough.

But it does indeed sound limited in a sense that fully automated rotation with a given schedule are dangerous and can easily cause outages

Isn't it all about? otherwise you can create your own secret, reference workloads to it and copy binding secrets into it using whatever strategy suits you.

@jkbschmid
Copy link
Contributor Author

Great to hear you are working on this issue. Thanks, @pavelmaliy !
Once you're ready, I'd also love to see a design document or some explanation of how you're planning to implement the change. This way, we can address upcoming questions and potential issues already in the design phase.

@pavelmaliy
Copy link
Collaborator

pavelmaliy commented Jan 17, 2022

Hi,
We'll start working on detailed design on 2A currently I am just playing around, basically there are 2 approaches:

Rotate the binding directly in SM and override the current binding secret

  • Pros: Simple approach need to "trick" BTP operator that there is no binding in SM and the operator will create it again.
  • Cons:
    1. There are actually two bindings but in cluster you'll see one binding and in cockpit 2
    2. Workloads need to adjust themselves during next rotation interval as the old binding will be removed
    3. You have no access to old binding it automatically deleted next rotation or on deletion of the CR.

Introduce new CRD RotatableServiceBinding which will maintain 2 or more bindings underneath and maintain his own secret which will be updated with latest binding (Resembles to user-provided-service in CF)

  • Pros: What you see is what you get cockpit and cluster are aligned
  • Cons:
    1. Another CRD which has almost duplicate structure as service binding with rotation config.
    2. Need to propagate the originating user to broker somehow because the real binding is created by the system.
    3. Some properties can not be set by user anymore like ExternalName (name you see in cockpit) as it must be unique
    so it will be generated now.

Thoughts?

@freegroup
Copy link

freegroup commented Feb 3, 2022

we have the very same problem for the binding rotation and build a small operator for the KLACKS Validation App which handles this for us. https://github.tools.sap/kernelservice-validation/core-binding-rotation ( or https://pages.github.tools.sap/kernelservice-validation/documentation/setup/svc.html#./content/090-ias-credential-rotation/README ) Maybe this is an option until the rotation is part of the operator.

BUT - if the operator rotates the binding, you need maybe another operator to do related stuff. e.g. push the cred to the SubscriptionManager...in case you are using the IAS as OSB.

@freegroup
Copy link

I see that different approaches are also being discussed here. With the components I mentioned, we follow the approach that we have two valid bindings at a certain time. This is important for us, because at the moment of the invalidation/update of the binding we may establish a connection to the server to which the binding belongs. If we invalidate the binding at this time, the service may reject the connection request. Which happens sometime in our setup (rare).

So there is always a short time when a binding is not valid. In our approach, the obsolete binding is deleted only after all consumers are working with the new binding. Thus here never a connection request runs into a HTTP-401

@jkbschmid
Copy link
Contributor Author

Thanks for sharing the drafts, @pavelmaliy !

Thoughts?

Regarding option 1):
I'm not sure if I understand the solution 100%.
Does "direct rotation" mean the "old" binding will be deleted once the new one is there?

For me the question is how the "rotation interval" is defined, i.e. when will the operator delete bindings.
I think it would make sense to allow the consumer to control this setting, since different bindings have different expiration dates.
For example, some certificates expire in one month, other credentials may expire sooner.

I also think that, in the long run, revocation should be supported. I.e., a consumer should be able to instantly trigger a binding rotation.

Adding these features to the existing CRD would make it too complex, that's why I'd prefer option 2 with a separate CRD.

@pavelmaliy
Copy link
Collaborator

pavelmaliy commented Feb 3, 2022

Hi,
Thanks for your valuable inputs, we had an internal discussion,
In general adding new CRD not so nice approach because it pretty much makes ServiceBinding redundant most of the consumers will want cred rotation (at least in SAP),

Also it's not so nice behavior if I create one CR and get in my namespace multiple ServiceBindings and Secrets that consumer not sure what they are.

Basically the idea is to add additional configuration to ServiceBinding something like:

"credRotation":{ "enabled":true, "every":"3M" "keepOld":"2W" }

During the transition there will be 2 bindings:
B1 and B1_OLD after "keepOld" time ended B1_OLD is deleted

Technically what allows this implementation be easy is the cluster recovery flow we have, which in case no bindingID on the status tries to lookup binding in SM and if it's not found, BTP Operator will create new binding in SM.
So rename the original binding in SM and removing bindingID from status will trigger the rotation without any code required.

I also think that, in the long run, revocation should be supported. I.e., a consumer should be able to instantly trigger a binding rotation.

It should be easy one, we can use some sort of annotation to trigger the rotation right now which will be removed after successful rotation by the operator.

Will it fit your requirements?

@patrickhuy
Copy link
Member

That sounds nice to me. There should be some way to learn about when the rotation has happened (maybe a field in the ServiceBinding status?) so that it can be watched/reacted on.

Would it be possible for the operator to also trigger an application restart/rollout (the way kubectl rollout restart does it - which is basically adding an annotation to the deployment/statefulset spec) or would that be completely out of scope? Effectively this is likely what most applications will need to do and just "hoping" that a regular deployment/restart happens in the "keepOld" period sounds not very sustainable.

@loewenstein
Copy link

In general adding new CRD not so nice approach because it pretty much makes ServiceBinding redundant most of the consumers will want cred rotation (at least in SAP),

Also it's not so nice behavior if I create one CR and get in my namespace multiple ServiceBindings and Secrets that consumer not sure what they are.

Just a thought on the redundant CR issue. Maybe we could have a ServiceBindingSet that has, among the rotation configuration, a ServiceBinding template - just like ReplicaSet has a Pod template.
I haven't really thought this through though and beside that I agree with @patrickhuy - that not integrating this with workload rollout could be risky - the credRotation proposal above with options for when rotation happens and when cleanup happens doen't seem completely off either.

@pavelmaliy
Copy link
Collaborator

@loewenstein actually ReplicaSet and Pod this is the exact example we discussed, the problem here is that user can't defines names of bindings in SM, and then in cloud cockpit or CLIs he'll see bunch of guids it's not acceptable by our PO,

Another problem is that Brokers need user info during binding but the binding created by "ServiceBindingSet" so the user needs to be propagated.

@patrickhuy

There should be some way to learn about when the rotation has happened (maybe a field in the ServiceBinding status?) so that it can be watched/reacted on.

of course this is the plan

regarding trigger an application restart/rollout I think not in the first phase, volumeMounts will be updated automatically though and its the best practice.

@patrickhuy
Copy link
Member

Just a thought on the redundant CR issue. Maybe we could have a ServiceBindingSet that has, among the rotation configuration, a ServiceBinding template - just like ReplicaSet has a Pod template.

that would mean that we would get one secret per ServiceBinding, right? Consequently it would mean that we need to change the secret referenced from the Deployment, that would not be great.

regarding trigger an application restart/rollout I think not in the first phase, volumeMounts will be updated automatically though and its the best practice.
How is volumeMounts best practice? Configuration via the environment is a 12 factor app capability. (https://12factor.net/config)
Even when using volumeMounts most applications are likely not going to monitor the file system for changes and be able to live reload it (do you see that as a requirement?)

I do agree that restarts/rollouts are not needed in the first phase. Nevertheless no matter whether the binding is exposed via volumeMounts or via the environments a restart of the workload is oftentimes the right thing after the change in bindings (for comparison: Cloud Foundry changes bindings during re-staging) - I think this is only feasible if at maximum the task is to trigger a rollout of a Deployment/StatefulSet, changing the used secrets is likely too complicated.

@pavelmaliy
Copy link
Collaborator

that would mean that we would get one secret per ServiceBinding, right? Consequently it would mean that we need to change the secret referenced from the Deployment, that would not be great.

No ServiceBindingSet will have his own secret which always updated with the latest ServiceBinding and deployment will use this Secret but it was just a thought I don't believe this option will be chosen

@patrickhuy I agree regarding the monitoring the file system besides I can't control how secrets are consumed and where.
For now (if at all) its not responsibility of BTP operator to find all consumers of the Secret (not necessarily in same namespace see reflector) and restart/rollout them.

@patrickhuy
Copy link
Member

patrickhuy commented Feb 7, 2022

@patrickhuy I agree regarding the monitoring the file system besides I can't control how secrets are consumed and where. For now (if at all) its not responsibility of BTP operator to find all consumers of the Secret (not necessarily in same namespace see reflector) and restart/rollout them.

I agree! I think if the operator would be able to restart Deployments we should list the Deployments for it to restart within the ServiceBinding (probably via a selector?) and it should be limited to the current namespace (because otherwise it makes security considerations harder)

@pavelmaliy
Copy link
Collaborator

Hi,
I've merged the cred rotation feature, please note it won't be available until SM reaches live (a week),
So it not released yet, the documentation was updated.

@pavelmaliy
Copy link
Collaborator

please take the latest version cred rotation available there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants