OCPBUGS-17157: pkg/controller: label RBAC with content hash #3034

stevekuznetsov · 2023-09-15T15:25:42Z

When a CSV is processed, it is assumed that the InstallPlan has already run, or that a user who is creating a CSV as their entrypoint into the system has otherwise met all the preconditions for the CSV to exist.

As part of validating these preconditions, the CSV logic today uses cluster-scoped listers for all RBAC resources, sets up an in-memory authorizer for these, and queries the CSV install strategy's permissions against those.

We would like to restrict the amount of memory OLM uses, and part of that is not caching the world. For the above approach to work, all RBAC objects fulfilling CSV permission preconditions would need to be labelled. In the case that a user is creating a CSV manually, this will not be the case.

We can use the SubjectAccessReview API to check for the presence of permissions without caching the world, but since a PolicyRule has slices of verbs, resources, subjects, etc. and the SAR endpoint accepts only one of each, there will be (in the general case) a combinatorial explosion of calls to issue enough SARs to cover the full set of permissions.

Therefore, we need to limit the number of times we take that action. A simple optimization is to check for permissions created directly by OLM, as that's by far the most common entrypoint into the system (a user creates a Subscription, which triggers an InstallPlan, which creates the RBAC).

As OLM chose to name the RBAC objects with random strings of characters, it's not possible to look at a list of permissions in a CSV and know which resources OLM would have created. Therefore, this PR adds a label to all relevant RBAC resources with the hash of their content. We already have the name of the CSV, but since CSV content is ostensibly mutable, this is not enough.

openshift-ci-robot · 2023-09-15T15:25:47Z

@stevekuznetsov: This pull request references Jira Issue OCPBUGS-17157, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.15.0) matches configured target version for branch (4.15.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianzhangbjz

The bug has been updated to refer to the pull request using the external bug tracker.

stevekuznetsov · 2023-09-15T15:27:15Z

pkg/controller/operators/catalog/operator.go

@@ -108,7 +108,6 @@ type Operator struct {
 	client                   versioned.Interface
 	dynamicClient            dynamic.Interface
 	lister                   operatorlister.OperatorLister
-	k8sLabelQueueSets        map[schema.GroupVersionResource]workqueue.RateLimitingInterface


I added this in a previous commit but was in a copy-pasta mode and it's not needed.

stevekuznetsov · 2023-09-15T15:27:30Z

pkg/controller/operators/catalog/operator.go

 			Name: gvr.String(),
 		})
 		queueInformer, err := queueinformer.NewQueueInformer(
 			ctx,
+			queueinformer.WithQueue(queue),


I forgot this in the original PR to add the labeler functionality.

awgreene

The code looks fine, but I have a question regarding a possible edge case. If we're using the role/roleBinding's spec to create the label, is there an opportunity for a collision if two operators specify roles with the same spec?

awgreene · 2023-09-18T14:06:37Z

pkg/controller/registry/resolver/rbac.go

+func PolicyRuleHashLabelValue(rules []rbacv1.PolicyRule) (string, error) {
+	raw, err := json.Marshal(rules)
+	if err != nil {
+		return "", err
+	}
+	return toBase62(sha256.Sum224(raw)), nil
+}


It seems like this could cause an issue if two operators define the same rules for a clusterRole.

stevekuznetsov · 2023-09-18T14:09:58Z

> The code looks fine, but I have a question regarding a possible edge case. If we're using the role/roleBinding's spec to create the label, is there an opportunity for a collision if two operators specify roles with the same spec?

The code for creating the objects has not changed, so this PR should not have any effect on that problem. In any case, random values are used to name the objects, so it does not seem like there would be any collisions.

The RBAC objects are labelled with a) the CSV that they are created for and b) a hash of the spec. So yes, if someone duplicated the list of permissions they asked for in one CSV, an approach that looked at these labels would not be able to tell those apart - but neither would the previous approach of using the authorizer. Since the question we want to be able to answer is "is the permission satisfied," it does not seem important to be able to distinguish between two identical specs.
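The point about identical specs can be illustrated with a small in-memory sketch. The label keys and the `LabeledRole` type below are hypothetical placeholders, not the keys or types OLM uses; the sketch only shows that matching on the pair (owning CSV, content hash) answers "is the permission satisfied" even when two CSVs request identical rules.

```go
package main

import "fmt"

// Hypothetical label keys, chosen for illustration only.
const (
	ownerLabelKey = "example.com/owner-csv"
	hashLabelKey  = "example.com/content-hash"
)

// LabeledRole is a local stand-in for an RBAC object's metadata.
type LabeledRole struct {
	Name   string
	Labels map[string]string
}

// PermissionSatisfied (hypothetical helper) reports whether some object
// carries both the owning CSV's name and the expected content hash.
func PermissionSatisfied(roles []LabeledRole, csvName, hash string) bool {
	for _, r := range roles {
		if r.Labels[ownerLabelKey] == csvName && r.Labels[hashLabelKey] == hash {
			return true
		}
	}
	return false
}

func main() {
	// Two operators request identical rules: same hash, different owner,
	// and randomly generated (hence distinct) object names.
	roles := []LabeledRole{
		{Name: "operator-a-x7k2p", Labels: map[string]string{ownerLabelKey: "operator-a.v1.0.0", hashLabelKey: "abc123"}},
		{Name: "operator-b-q9m4z", Labels: map[string]string{ownerLabelKey: "operator-b.v2.1.0", hashLabelKey: "abc123"}},
	}
	fmt.Println(PermissionSatisfied(roles, "operator-a.v1.0.0", "abc123")) // true
	fmt.Println(PermissionSatisfied(roles, "operator-c.v0.1.0", "abc123")) // false
}
```

Against a real cluster the same check would presumably be a list call with a label selector on both keys rather than a loop over cached objects.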

awgreene · 2023-09-18T15:01:22Z

> The code for creating the objects has not changed, so this PR should not have any effect on that problem. In any case, random values are used to name the objects, so it does not seem like there would be any collisions.

Good point.

> The RBAC objects are labelled with a) the CSV that they are created for and b) a hash of the spec. So yes, if someone duplicated the list of permissions they asked for in one CSV, an approach that looked at these labels would not be able to tell those apart - but neither would the previous approach of using the authorizer. Since the question we want to be able to answer is "is the permission satisfied," it does not seem important to be able to distinguish between two identical specs.

Okay cool, if the RBAC has a unique name, a CSV label, and a hash label, there's a clear way to identify the owner and we can satisfy the "is the permission satisfied" asks.

/approve

openshift-ci · 2023-09-18T15:29:57Z

New changes are detected. LGTM label has been removed.

awgreene · 2023-09-18T16:11:44Z

/approve

openshift-ci · 2023-09-18T16:13:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [awgreene]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2023-09-18T16:19:00Z

@stevekuznetsov: Jira Issue OCPBUGS-17157: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

operator-framework/operator-marketplace#530 is open

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-17157 has not been moved to the MODIFIED state.

openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels Sep 15, 2023

openshift-ci bot requested review from gallettilance, perdasilva and jianzhangbjz September 15, 2023 15:25

stevekuznetsov commented Sep 15, 2023

stevekuznetsov force-pushed the skuznets/rbac-hash branch from 69bf96a to c2fc318 Compare September 15, 2023 16:05

awgreene requested changes Sep 18, 2023

awgreene reviewed Sep 18, 2023

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 18, 2023

stevekuznetsov added the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2023

stevekuznetsov force-pushed the skuznets/rbac-hash branch from c2fc318 to eab15c6 Compare September 18, 2023 15:29

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2023

stevekuznetsov force-pushed the skuznets/rbac-hash branch from eab15c6 to cb178b1 Compare September 18, 2023 15:30

stevekuznetsov added the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2023

awgreene self-requested a review September 18, 2023 16:11

awgreene approved these changes Sep 18, 2023

openshift-merge-robot merged commit 8eb4f3e into operator-framework:master Sep 18, 2023
16 checks passed
