docs: add multi-tenancy support discussions #1175
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##             main    #1175   +/- ##
=======================================
  Coverage   61.43%   61.43%
=======================================
  Files          97       97
  Lines        6179     6179
=======================================
  Hits         3796     3796
  Misses       2051     2051
  Partials      332      332

☔ View full report in Codecov by Sentry.
5. As Ratify pods are isolated by namespace, each team can only access its own logs.

#### Cons
Another consideration is how we handle which ED provider is invoked in the constraint template. Will there be a new ED provider per namespace? Will there be a namespace-based naming convention for the providers that must be implemented in the constraint template to call the correct ED provider?
Or maybe the assumption is that each namespace will have its own dedicated ConstraintTemplate + Constraint.
For each Ratify deployment, we should have a Provider CR applied. I checked the GK code base, and it looks like for each admission review request it iterates through all providers to gather feedback. Also, the constraint template is a cluster-scoped resource, so we cannot have a namespaced one.
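For reference, a per-namespace external data Provider under this model might look roughly like the sketch below. The naming convention, service URL, and port are illustrative assumptions, not a finalized design:

```yaml
apiVersion: externaldata.gatekeeper.sh/v1beta1
kind: Provider
metadata:
  # hypothetical naming convention: ratify-provider-<namespace>
  name: ratify-provider-team-a
spec:
  # assumes a Ratify service running in the team-a namespace; the path and
  # port follow Ratify's default verification endpoint and may differ
  url: https://ratify.team-a:6001/ratify/gatekeeper/v1/verify
  timeout: 7
  caBundle: <base64-encoded CA bundle>  # TLS between GK and the provider is required
```

A namespaced Constraint (or a naming convention baked into the ConstraintTemplate) would then have to reference the matching provider name, which is exactly the open question above.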
### Deployment per Cluster

#### Pros
1. More resource efficient. Since there is only one Ratify deployment in the cluster, all tenants share the same Ratify pods and dependency resources. If traffic load increases, the admin can easily scale out the Ratify service accordingly.
2. GK will keep working the way it does today, sending one ED request to Ratify for each admission review.
How does mutation work? For example, one namespace may enable it while another may not. This would require scoping mutation Assign resources to the exact namespaces that have it enabled, and thus updating the Assign resources every time a new tenant is added (maybe this is OK, but it is definitely overhead).
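To make that overhead concrete, the per-tenant scoping would live in the `match.namespaces` list of the Assign resource. A rough sketch, loosely following the shape of Ratify's default image-mutation Assign; the resource name, namespace list, and provider name are assumptions for illustration:

```yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: ratify-mutation-opt-in   # hypothetical
spec:
  applyTo:
    - groups: [""]
      versions: ["v1"]
      kinds: ["Pod"]
  match:
    scope: Namespaced
    # every tenant that enables mutation must be listed here, so onboarding
    # a new tenant means editing (or templating) this resource
    namespaces: ["team-a", "team-b"]
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
  location: "spec.containers[name: *].image"
  parameters:
    assign:
      externalData:
        provider: ratify-mutation-provider   # assumed provider name
        dataSource: ValueAtLocation
        failurePolicy: UseDefault
        default: ""
```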
3. Users only need to deploy once when enabling the Ratify feature in the cluster. Afterwards, admins can configure CRs in their own namespaces.
4. Since GK and Ratify can be deployed to the same namespace, we can enable the `cert-controller` of GK to set up mTLS automatically.
#### Cons
Another con: how do we provide more granular control over configs defined at the Ratify service level, like cache type, size, and TTL, or log level? All of these are currently configured at Ratify startup and not as CRs.
Ratify's first stable version for production has been officially released, and it is seeing increased adoption as an external data provider among a growing number of customers. In production environments, many customers are opting for a multi-tenant architecture to improve resource efficiency within their clusters. To support this scenario, Ratify needs some essential refactoring. This document presents several proposed design approaches and their respective modifications for further scrutiny and discussion.
# Multi-tenancy category
Firstly, the concept of multi-tenancy within a Kubernetes cluster can be broadly categorized into two distinct classifications: multi-team and multi-customer.
Would it be better to add this link as a reference for concept details? https://kubernetes.io/docs/concepts/security/multi-tenancy/
3. No need to add a namespace to the external data request from GK to Ratify. We can keep the same request struct.
4. Users could easily set up ResourceQuota and LimitRange to assign appropriate resources (see the sketch after this list).
5. As Ratify pods are isolated by namespace, each team can only access its own logs.
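For item 4, standard per-namespace quota objects would be enough. A minimal sketch, where the tenant namespace and the limit values are placeholders rather than recommended settings:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ratify-quota
  namespace: team-a          # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "2"        # caps the total requests of the tenant's Ratify pods
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: ratify-limits
  namespace: team-a
spec:
  limits:
    - type: Container        # defaults applied to containers that omit requests/limits
      defaultRequest:
        cpu: 200m
        memory: 256Mi
      default:
        cpu: 500m
        memory: 512Mi
```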
I have many concerns about Deployment per Namespace. Users really care about resource usage and maintenance effort; that is to say, users prefer a light-weight, low-maintenance solution.
In addition, since Ratify is an admission controller, users may be concerned about single-point failure of a Ratify instance. Users would need to maintain multiple Ratify instances in different namespaces, which would be a pain point.
@binbin-li I have updated the required e2e tests based on the k8s bump and GK bump. Please merge main when you have a chance.
From the perspective of user experience, there is a great difference between the two deployment strategies.

### Deployment per Namespace
1. Cluster admin or team admin must deploy a Ratify service in each required namespace.
How do we ensure "must"? Will we have a Ratify controller to make sure each namespace has the Ratify deployment, or will it be a manual operation by the cluster admin?
If it is automatic, will it cause an unexpected node memory/CPU usage spike when a new namespace is created? The Ratify default CPU/memory request is not small.
It's a valid concern. A new namespace would require a new Ratify deployment with related resources, and it might get worse when multiple namespaces are created at the same time.
Note: a Ratify service includes all dependency services, such as Redis.
An overview of deployment per namespace:
Do we have a detailed design for the CRD behavior change? I guess most of them will become namespace-scoped?
this is under discussion, and I'll have a detailed design soon.
All CRs would be updated to be namespace-scoped. And we probably need some refactoring on the cosign verifier and certStore to avoid mounting keys.
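For illustration only, a namespace-scoped verifier might end up looking like the sketch below. This assumes the current Verifier CRD shape stays the same and only gains a `metadata.namespace`; the namespace name is a placeholder and none of this is a finalized design:

```yaml
apiVersion: config.ratify.deislabs.io/v1beta1
kind: Verifier
metadata:
  name: verifier-notation
  namespace: team-a   # new: scoped to the tenant's namespace instead of cluster-wide
spec:
  name: notation
  artifactTypes: application/vnd.cncf.notary.signature
```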
Thank you for all the info, Binbin. Left some suggestions to help finalize the design doc.
* One solution is adding a namespace field to all Ratify logs. Although this option cannot prevent cross-tenant access without disabling `kubectl logs`, it can be used to filter logs in Kibana.
* The other solution is to let Ratify partition logs by path. For example, logs from namespaceA are saved to `/var/log/namespaceA` and logs from namespaceB are saved to `/var/log/namespaceB`. This solution can prevent cross-tenant access without disabling `kubectl logs`. However, it requires Ratify to explicitly configure logrus output paths for each team and to dynamically manage paths for all namespaces, including creating and deleting directories.
#### File system/PVC |
We could add priority to the areas that require multi-tenancy support. The FileSystem is probably a higher-pri item and should be listed towards the top of the doc.
My initial plan was to have another doc for implementation details once we decide the model.
Here is a brief work breakdown: https://hackmd.io/@H7a8_rG4SuaKwzu4NLT-9Q/By-jVTBST
Description

What this PR does / why we need it:
It's not a final design for multi-tenancy but includes 2 options with a comparison for discussion.

Which issue(s) this PR fixes (optional, using `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when the PR gets merged):
Investigating: #743

Type of change
Please delete options that are not relevant.

How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration.

Checklist:

Post Merge Requirements

Helm Chart Change