Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design: OADP non-admin use cases #1043

Open
10 tasks
weshayutin opened this issue May 31, 2023 · 8 comments · May be fixed by migtools/oadp-non-admin#18
Open
10 tasks

Design: OADP non-admin use cases #1043

weshayutin opened this issue May 31, 2023 · 8 comments · May be fixed by migtools/oadp-non-admin#18
Assignees
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Milestone

Comments

@weshayutin
Copy link
Contributor

weshayutin commented May 31, 2023

Context:

The work here should be build a CRD's and controllers that would intercept all backup, restore, bsl creation and check if the user has the correct permissions to do the action.

New CRD's

Controller: https://github.com/migtools/oadp-non-admin/

  • All non-admin cr's are routed to new controller, user info taken.
    • Checks permission for user A to do Action X
    • For example User A wants to backup namespace foo. If user A has view/edit access to namespace foo, create the velero backup CR and send to velero.
    • If the user does NOT have permission, send an error back on the creation of the OADP/Velero non-admin cr.

Scope:

  • non-admins will ONLY be able to backup in a single namespace. Any cluster objects like a PV wil be included by the "auto" selection in the velero backup cr.
  • non-admin is not supported cross cluster

Spec:

  • non-admin backup cr spec ( super set of velero backup cr w/ user, namespace, permissions )
    • annotate backup cr's with user information
  • non-admin restrore cr spec ( super set of velero restore cr w/ user, namespace, permissions )
  • Action: Create / Delete backups by non-admin
  • Action: Create / Delete restores by non-admin
  • Create BSL by non-admin
  • Provide Status ( but not logs )?
  • Upstream Velero Design or discussion post
  • client tool like velero to list backups and restores.
  • non-admin operator - 3 controllers in 1 pod. 1 backup, 1 restore, 1 bsl, possible garbage collection
  • [ ]

Related work:
https://issues.redhat.com/browse/OADP-203
https://github.com/weshayutin/oadp-operator/tree/tekton-non-admin/docs/tekton-oadp-nonadmin
https://github.com/shawn-hurley/prototype-non-admin

@shubham-pampattiwar
Copy link
Member

  • We should discuss more about what kind of user info do we need ?
  • How do we decide the particular user has appropriate access to a resource to be backed up, eg PVC etc ?
  • Should we make use of admission webhooks for backup/Restore CR creations ?
  • Will there be a different BSL per user or a shared one ? How will this scenario work ?
  • What if there is a change in during the Backup/Restore operation, how do we handle this scenario ?

@shawn-hurley
Copy link
Contributor

I would suggest that a potential to consider here is that a new CR that is specifically for this purpose is used. Something like NamespacedBackup, or something similar, that when a user has access to that CR they are able to backup the Namespace. this will allow you to create a NamespaceRestore that must reference the NamespacedBackup as a local reference.

I think that some things to be very concerned with, is that Velero is single thread AFAIK and I think it needs to be a prequisite to make it so that it can backup multiple namespaces at a time. If this is not fixed, then no amount of good secure implementation here is going to matter IMO because it will be unusable.

@shawn-hurley
Copy link
Contributor

shawn-hurley commented Jun 14, 2023

@shubham-pampattiwar Answes as I see them:

We should discuss more about what kind of user info we need ?

if you use the webhook, you can do the SAR requests to the objects that you want to make sure the user has access to. The user info that you get in the webhook is all you need to ask these questions of the RBAC system.

How do we decide the particular user has appropriate access to a resource to be backed up, eg PVC etc ?

I would think that we make it clear that a user who has access to this resource in this namespace gets access to all the resources in the namespace, similar to admin. One thing that we need to make sure of is if a role binding is bound to a cluster role, we need to either downgrade to a role OR we need to assume that something else can restore that. and that we don't back that up and that you can't restore it. From a user/non-admin restore operation.

Should we make use of admission webhooks for backup/Restore CR creations ?

Yeah, I agree; this needs to be there to at least do some permissions if you don't create new CRs and attempt to re-use the existing ones.

Will there be a different BSL per user or a shared one ? How will this scenario work ?

I think that a Namespaced Backup can only use a BSl that is defined in the namespace of the backup, OR we create a ClusterBackupStorageLocation, that an admin can define for everyone.

What if there is a change during the Backup/Restore operation? How do we handle this scenario ?

I need more context, but do you mean like the RBAC system? I think this is an interesting use case, I think that if they are removed they would be unable to use the objects anymore so I don't think there is a escalation concern but we should make sure.

I hope that this helps, I have thought about this in the past, and we are having similar concerns that we had with OLM, where some stuff is just ClusterScoped by nature and tried to make it non-admin.

@weshayutin
Copy link
Contributor Author

@shubham-pampattiwar Answes as I see them:

We should discuss more about what kind of user info we need ?

if you use the webhook, you can do the SAR requests to the objects that you want to make sure the user has access to. The user info that you get in the webhook is all you need to ask these questions of the RBAC system.

How do we decide the particular user has appropriate access to a resource to be backed up, eg PVC etc ?

I would think that we make it clear that a user who has access to this resource in this namespace gets access to all the resources in the namespace, similar to admin. One thing that we need to make sure of is if a role binding is bound to a cluster role, we need to either downgrade to a role OR we need to assume that something else can restore that. and that we don't back that up and that you can't restore it. From a user/non-admin restore operation.

I agree that access is namespace based and it's 100% access to the namespace or no access.

Should we make use of admission webhooks for backup/Restore CR creations ?

Yeah, I agree; this needs to be there to at least do some permissions if you don't create new CRs and attempt to re-use the existing ones.

I'm not sure if you are referring to dynamic admission control, I think you are. If the admission control is scoped to only check api requests for velero than we might be ok. A large customer already nacked using dynamic admission control via policy, the context here is a little different I think. Needs discussion.

Will there be a different BSL per user or a shared one ? How will this scenario work ?

I think that a Namespaced Backup can only use a BSl that is defined in the namespace of the backup, OR we create a ClusterBackupStorageLocation, that an admin can define for everyone.

Let's talk about this one, I think we want a non-admin to create their own bsl and only have access to bsl's they create. Not 100% sure, needs discussion.

What if there is a change during the Backup/Restore operation? How do we handle this scenario ?

I need more context, but do you mean like the RBAC system? I think this is an interesting use case, I think that if they are removed they would be unable to use the objects anymore so I don't think there is a escalation concern but we should make sure.

  • more context.. but supposing.
  • A backup is running and permission to the namespace is removed, it would be too late to stop the backup.
  • what other scenarios are you thinking about here?

I hope that this helps, I have thought about this in the past, and we are having similar concerns that we had with OLM, where some stuff is just ClusterScoped by nature and tried to make it non-admin.

@shawn-hurley
Copy link
Contributor

shawn-hurley commented Jun 14, 2023

A backup is running and permission to the namespace is removed, it would be too late to stop the backup.
what other scenarios are you thinking about here?

If that does happen things should be fine, because the user would be unable to use the backup in anyone once finished right? And they wouldn't be able to restore etc. I think this should be fine and there is nothing to do IMO.

If there is another scenario that someone can come up with though would be interested in hearing about it.

@shawn-hurley
Copy link
Contributor

Something to condiser, if a namespace say X is deleted, how do you resotre that namespace from new namespace X or even new namespace A. This probably should be accomplishable right?

@shawn-hurley
Copy link
Contributor

A large customer already nacked using dynamic admission control via policy, the context here is a little different I think. Needs discussion.

I don't understand what this means or what they nacked, maybe we sould have a seperate discussion on this.

@weshayutin
Copy link
Contributor Author

discussion has moved to https://hackmd.io/qbYw8wq4RZSWEe2KZpzLaw
Once we have solid design written out in the hackmd we'll copy it back to this issue.

@weshayutin weshayutin self-assigned this Sep 20, 2023
@kaovilai kaovilai added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Sep 27, 2023
@weshayutin weshayutin modified the milestones: 1.4.0, OADP-1.4 Dec 5, 2023
@kaovilai kaovilai linked a pull request Mar 21, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants