ztp: CNF-3661 Reduce number of policies #836

Merged: 1 commit merged into openshift-kni:master on Dec 20, 2021

Contributor

@pixelsoccupied pixelsoccupied commented Dec 3, 2021

The current reference PolicyGenTemplates (common, group, and test-sno) result in ~20 policies being created for the cluster. Managing this number of policies per cluster results in increased CPU use on the spoke cluster and scaling issues in the hub.

The work of this story is to reduce the number of policies created by using a common policy name for all policies in each of the common-ranGen.yaml, group-du-sno-ranGen.yaml, and test-sno.yaml files.

Naming convention -->

subscriptions-policy -- Containing all the Namespace, OperatorGroup and Subscription CRs
config-policy -- Containing the rest (OperatorHub, ImageContentSourcePolicy, ReduceMonitoring, etc)
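
The naming convention above amounts to a mapping from CR kind to policyName. A minimal sketch in Python, assuming each source CR can be identified by its kind; the helper name policy_name_for_kind is hypothetical and is not part of the policy generator:

```python
# Kinds that install operators go into the subscriptions policy;
# every other kind goes into the config policy (per the convention above).
SUBSCRIPTION_KINDS = {"Namespace", "OperatorGroup", "Subscription"}


def policy_name_for_kind(kind: str) -> str:
    """Return the policyName a CR of the given kind would be assigned."""
    if kind in SUBSCRIPTION_KINDS:
        return "subscriptions-policy"
    return "config-policy"


if __name__ == "__main__":
    for kind in ("Subscription", "OperatorHub", "ImageContentSourcePolicy"):
        print(kind, "->", policy_name_for_kind(kind))
```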

Before -->

[root@jumphost1 ~]# kubectl get policy -n common
NAME                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
common-log-sub-policy       enforce              NonCompliant       2d22h
common-mon-offload-policy   enforce              Compliant          2d22h
common-pao-sub-policy       enforce              NonCompliant       2d22h
common-ptp-sub-policy       enforce              Compliant          2d22h
common-sriov-sub-policy     enforce              Compliant          2d22h

After -->

[root@jumphost1 ~]# kubectl get policy -n common
NAME                       REMEDIATION ACTION   COMPLIANCE STATE   AGE
common-config-policy       enforce              NonCompliant       2d22h

@pixelsoccupied
Contributor Author

/cc @imiller0

@openshift-ci openshift-ci bot requested a review from imiller0 December 3, 2021 19:58
@lack
Member

lack commented Dec 3, 2021

@pixelsoccupied You should run make install-commit-hooks locally, it sets up a git pre-commit hook that checks that your commit log message matches our repo requirements.

You'll need to edit your commit log so the first line starts with ztp: to pass the ci/prow/ci job, and the commit hooks help remind you early on :)

@pixelsoccupied pixelsoccupied changed the title from "CNF-3661: Reduce number of policies" to "ztp: CNF-3661 Reduce number of policies" Dec 3, 2021
@@ -16,14 +16,14 @@ spec:
 mcp: "master"
 sourceFiles:
 - fileName: SriovNetwork.yaml
-policyName: "sriov-nw-fh-policy"
+policyName: "config-policy"
Contributor

Each file (common, group, test-sno) will need a unique policy name; otherwise they will overwrite one another.

Contributor Author

Ah, wouldn't the policy names be prefixed with common or test-sno, and live in different namespaces?

Do you have any suggestions for the name? config-policy-common, config-policy-group, and config-policy-sno?

Contributor

Yes, apologies. The generated policy name will be <pgtName>-<policyName>. So common-config-policy, group-du-sno-config-policy and test-sno-config-policy.
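
The naming rule described here can be sketched in a few lines of Python. This is only an illustration of the <pgtName>-<policyName> concatenation; the function name generated_policy_name is hypothetical and this is not the generator's actual code:

```python
def generated_policy_name(pgt_name: str, policy_name: str) -> str:
    """Combine the PolicyGenTemplate name and the policyName field."""
    return f"{pgt_name}-{policy_name}"


# The three PolicyGenTemplates discussed in this thread, all using the same
# policyName, still produce three distinct policy names.
PGT_NAMES = ["common", "group-du-sno", "test-sno"]
GENERATED = [generated_policy_name(name, "config-policy") for name in PGT_NAMES]

if __name__ == "__main__":
    for name in GENERATED:
        print(name)
```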

@@ -18,50 +18,50 @@ spec:
 - fileName: validatorCRs/informDuValidator.yaml
 complianceType: musthave
 remediationAction: inform
-policyName: "du-validator-policy"
+policyName: "config-policy"
Contributor

@Missxiaoguo is there any reason this policy needs to be separate from the config policies or can it be combined with them?

Contributor

We shouldn't combine this policy with the other config policies: it is an inform policy while the others are enforce, and policyGen doesn't support different remediationAction types within one policy.

Even if we eventually make inform the default, we should keep it separate so that LO knows this policy shouldn't be copied for enforce.
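
The constraint described above (one remediationAction per policy) could be checked before generation. The following Python sketch assumes sourceFiles entries are available as dictionaries and that entries without an explicit remediationAction fall back to enforce; both the helper name mixed_remediation_policies and that default are assumptions made for illustration:

```python
from collections import defaultdict


def mixed_remediation_policies(source_files, default_action="enforce"):
    """Return {policyName: [actions]} for policies that mix remediation actions.

    default_action is an assumed fallback for entries that omit the field.
    """
    actions = defaultdict(set)
    for entry in source_files:
        action = entry.get("remediationAction", default_action)
        actions[entry["policyName"]].add(action)
    return {name: sorted(acts) for name, acts in actions.items() if len(acts) > 1}


if __name__ == "__main__":
    source_files = [
        {"fileName": "validatorCRs/informDuValidator.yaml",
         "remediationAction": "inform", "policyName": "config-policy"},
        {"fileName": "SriovOperatorConfig.yaml", "policyName": "config-policy"},
    ]
    print(mixed_remediation_policies(source_files))
```

With the validator CR folded into config-policy, the check reports that policy as mixing enforce and inform, which is why the validator keeps its own policy name.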

+1

@@ -72,7 +72,7 @@ spec:
 displayName: disconnected-redhat-operators
 image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9
 - fileName: DisconnectedICSP.yaml
-policyName: "registry-policy"
+policyName: "config-policy"
Contributor

If the user chooses to adapt an existing deployment to this new convention I believe it will result in duplicate/parallel policies. We don't have a good way to automatically "move" already existing policies to the new names. At a minimum we need to document for the user how to clean up the old policies, but we should think about ways to make this easier if we can.

Contributor Author

So basically we need to help users migrate, and since this change benefits everyone without any loss of functionality, it could be made the default?

To help migrate: we could introduce a special variable, migrateToMakeItBetter (maybe added using a patch), which basically deletes the old ones and recreates the policies with the same name?

To help make it default: the docs could say that policyName is "something you shouldn't be explicit about", and then we programmatically add config-policy when it's missing/empty.

Migration might be overkill (used by a small number of users and for a limited amount of time?), but the default behaviour should be implemented, I think. Seeing policyName: "config-policy" in so many places is bad UX IMO.

When the new image is instantiated, couldn't we just have a one-shot cleanup that deletes the redundant policies?

Contributor

Yes. Automating the move/cleanup/rename/etc is something we could (and should) do. I suspect that we will need (or want) traceability between source PolicyGenTemplate and created artifacts to make this easier.
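
As a rough illustration of the cleanup idea in this thread, the stale policies are those that exist on the hub but are no longer generated. This Python sketch only computes that set; it is not what the project implemented, the helper name stale_policies is hypothetical, and a real cleanup would also have to deal with any placement resources bound to the old policies:

```python
def stale_policies(existing, expected):
    """Return policy names present on the hub but no longer generated."""
    return sorted(set(existing) - set(expected))


if __name__ == "__main__":
    # Names taken from the "Before" and "After" listings in the PR description.
    existing = [
        "common-log-sub-policy",
        "common-mon-offload-policy",
        "common-pao-sub-policy",
        "common-ptp-sub-policy",
        "common-sriov-sub-policy",
        "common-config-policy",
    ]
    expected = ["common-config-policy"]
    for name in stale_policies(existing, expected):
        print("would delete:", name)
```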

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2021
@browsell browsell left a comment

I think we need to reconsider these groupings.

We are grouping s/w installation and operator config together. For upgrades, for instance, we want a single policy that encapsulates all the OLM subscriptions, but I don't think that should include other config, so maybe something like:

common-subscriptions
common-config

@@ -49,12 +49,12 @@ spec:
 type: "fluentd"
 fluentd: {}
 - fileName: MachineConfigSctp.yaml
-policyName: "mc-sctp-policy"
+policyName: "config-policy"

Why do we have a policy for this? It should be applied as part of the day 1 extra manifests; delete it.

Contributor

Agreed. Removing them from the templates is being done under PR #834.

@@ -17,10 +17,10 @@ spec:
 mcp: "master"
 sourceFiles:
 - fileName: ConsoleOperatorDisable.yaml
-policyName: "console-policy"
+policyName: "config-policy"

Why is this a group policy rather than a common policy?

Contributor

We talked about making it common but the point was raised that in a deployment with a mix of SNO and standard clusters they may want the console for standard(?). In that case a separate group could/would be made for group-du-std which doesn't contain the Console disable.

Fair point

@@ -64,14 +64,14 @@ spec:
 ptp4lOpts: "-2 -s --summary_interval -4"
 phc2sysOpts: "-a -r -n 24"
 - fileName: SriovOperatorConfig.yaml
-policyName: "sriov-conf-policy"
+policyName: "config-policy"

Why is this in the group? It is mandatory for all DU deployments.

Contributor

@imiller0 imiller0 Dec 6, 2021

I believe it is in the group because it contains the mapping to a machine config pool and the disableDrain flag. The thinking was that these might be different between SNO and standard clusters.

Right, although we do not expose the selector in our current reference example,

spec:
disableDrain: true
- fileName: MachineConfigAcceleratedStartup.yaml
policyName: "mc-accelerated-startup-policy"

Why is this a day 2 policy rather than an extra manifest?

Contributor

Agreed. Part of #834

@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 8, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2021
@pixelsoccupied
Contributor Author

/cc @imiller0

@openshift-ci openshift-ci bot requested a review from imiller0 December 14, 2021 17:09
@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 14, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2021
Contributor

@imiller0 imiller0 left a comment

Overall I think this looks good. When policies are built the name of the policy is the combined PolicyGenTemplate name and policyName field. In this PR each PGT uses "config-policy" for the policyName of all included CRs, so this will result in 3 policies created:
common-config-policy (one shared by all nodes)
group-du-sno-config-policy (one shared by all nodes in this group)
example-sno-config-policy (one per cluster, example-sno replaced by site name)

The common policy will contain all the CRs necessary to install operators along with configuration unrelated to the operators (disconnected registry access and Monitoring footprint).

I'll wait for others to comment but otherwise will approve tomorrow.

@serngawy
Contributor

The policies are the same, so I'm good with that. Could you just confirm that it has been tested? That is, an SNO was provisioned, those 3 policies were applied, and everything went fine.

@pixelsoccupied
Contributor Author

pixelsoccupied commented Dec 16, 2021

@serngawy Here's the output with and without the consolidation from the hub.

[root@jumphost1 ~]# kubectl get policy -A
I1216 09:24:38.808993 2923311 request.go:668] Waited for 1.182937751s due to client-side throttling, not priority and fairness, request: GET:https://api.mycluster.hub.ran.dfwt5g.lab:6443/apis/oauth.openshift.io/v1?timeout=32s
NAMESPACE          NAME                                                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
cnfdf01-policies   cnfdf01-perfprofile-policy                                  enforce                                 6d18h
cnfdf01-policies   cnfdf01-sriov-nnp-fh-policy                                 enforce                                 6d18h
cnfdf01-policies   cnfdf01-sriov-nnp-mh-policy                                 enforce                                 6d18h
cnfdf01-policies   cnfdf01-sriov-nw-fh-policy                                  enforce                                 6d18h
cnfdf01-policies   cnfdf01-tuned-perf-patch-policy                             enforce                                 6d18h
cnfdf01            cnfdf01-policies.cnfdf01-perfprofile-policy                 enforce                                 3s
cnfdf01            cnfdf01-policies.cnfdf01-sriov-nnp-fh-policy                enforce                                 3s
cnfdf01            cnfdf01-policies.cnfdf01-sriov-nnp-mh-policy                enforce                                 3s
cnfdf01            cnfdf01-policies.cnfdf01-sriov-nw-fh-policy                 enforce                                 3s
cnfdf01            cnfdf01-policies.cnfdf01-tuned-perf-patch-policy            enforce                                 3s
cnfdf01            common-cnfdf01.common-cnfdf01-log-sub-policy                enforce                                 3s
cnfdf01            common-cnfdf01.common-cnfdf01-mon-offload-policy            enforce                                 3s
cnfdf01            common-cnfdf01.common-cnfdf01-pao-sub-policy                enforce                                 3s
cnfdf01            common-cnfdf01.common-cnfdf01-ptp-sub-policy                enforce                                 3s
cnfdf01            common-cnfdf01.common-cnfdf01-sriov-sub-policy              enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-console-policy                  enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-log-forwarder-policy            enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-log-policy                      enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-mc-accelerated-startup-policy   enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-mc-sctp-policy                  enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-sno-network-policy              enforce                                 3s
cnfdf01            group-cnfdf01.group-cnfdf01-sriov-conf-policy               enforce                                 3s
cnfdf02-policies   cnfdf02-config-policy                                       enforce              Compliant          6d19h
cnfdf02            cnfdf02-policies.cnfdf02-config-policy                      enforce              Compliant          3s
cnfdf02            common-cnfdf02.common-cnfdf02-config-policy                 enforce              NonCompliant       3s
cnfdf02            group-cnfdf02.group-cnfdf02-config-policy                   enforce              NonCompliant       3s
common-cnfdf01     common-cnfdf01-log-sub-policy                               enforce                                 6d18h
common-cnfdf01     common-cnfdf01-mon-offload-policy                           enforce                                 6d18h
common-cnfdf01     common-cnfdf01-pao-sub-policy                               enforce                                 6d18h
common-cnfdf01     common-cnfdf01-ptp-sub-policy                               enforce                                 6d18h
common-cnfdf01     common-cnfdf01-sriov-sub-policy                             enforce                                 6d18h
common-cnfdf02     common-cnfdf02-config-policy                                enforce              NonCompliant       6d19h
group-cnfdf01      group-cnfdf01-console-policy                                enforce                                 6d18h
group-cnfdf01      group-cnfdf01-log-forwarder-policy                          enforce                                 6d18h
group-cnfdf01      group-cnfdf01-log-policy                                    enforce                                 6d18h
group-cnfdf01      group-cnfdf01-mc-accelerated-startup-policy                 enforce                                 6d18h
group-cnfdf01      group-cnfdf01-mc-sctp-policy                                enforce                                 6d18h
group-cnfdf01      group-cnfdf01-sno-network-policy                            enforce                                 6d18h
group-cnfdf01      group-cnfdf01-sriov-conf-policy                             enforce                                 6d18h
group-cnfdf02      group-cnfdf02-config-policy                                 enforce              NonCompliant       6d19h

Both cnfdf02 and cnfdf01 have the exact same policySource, but cnfdf02 uses the new policyName convention.

But I'll need some help to see if they are actually working on the SNOs.
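
To compare the two clusters at a glance, the listing above can be tallied per namespace. A small Python sketch, assuming the whitespace-separated column layout shown in that output; the helper name count_policies_by_namespace is hypothetical:

```python
from collections import Counter


def count_policies_by_namespace(kubectl_output: str) -> Counter:
    """Count policy rows per namespace in `kubectl get policy -A` output."""
    counts = Counter()
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and client log lines ("I1216 ...");
        # a policy row has the remediation action in the third column.
        if len(fields) < 3 or fields[2] not in {"enforce", "inform"}:
            continue
        counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    sample = """\
NAMESPACE   NAME                                          REMEDIATION ACTION   COMPLIANCE STATE   AGE
cnfdf01     common-cnfdf01.common-cnfdf01-log-sub-policy  enforce                                 3s
cnfdf01     group-cnfdf01.group-cnfdf01-console-policy    enforce                                 3s
cnfdf02     common-cnfdf02.common-cnfdf02-config-policy   enforce              NonCompliant       3s
"""
    print(count_policies_by_namespace(sample))
```

Applied to the full listing, the cnfdf01 cluster namespace holds 17 policies while cnfdf02 holds 3.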

@pixelsoccupied pixelsoccupied marked this pull request as draft December 16, 2021 21:53
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2021
Contributor

@imiller0 imiller0 left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2021
Contributor

@imiller0 imiller0 left a comment

/lgtm
As discussed, the subscription-related CRs are grouped separately from the configuration (within the common PGT). This results in 4 total policies being created:
common-subscriptions-policy
common-config-policy
group-du-sno-config-policy
example-sno-config-policy.

@openshift-ci
Contributor

openshift-ci bot commented Dec 17, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: imiller0, pixelsoccupied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 17, 2021
@pixelsoccupied pixelsoccupied marked this pull request as ready for review December 17, 2021 21:15
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 17, 2021
@pixelsoccupied
Contributor Author

/cc @imiller0

@openshift-ci openshift-ci bot requested a review from imiller0 December 17, 2021 21:15
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2021
@openshift-ci openshift-ci bot removed lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 20, 2021
@browsell

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2021
@openshift-merge-robot openshift-merge-robot merged commit bf53398 into openshift-kni:master Dec 20, 2021
@imiller0
Contributor

imiller0 commented Feb 4, 2022

/cherry-pick release-4.9

@openshift-cherrypick-robot

@imiller0: #836 failed to apply on top of branch "release-4.9":

Applying: ztp: CNF-3661 Reduce number of policies
Using index info to reconstruct a base tree...
A	ztp/gitops-subscriptions/argocd/example/policygentemplates/common-ranGen.yaml
A	ztp/gitops-subscriptions/argocd/example/policygentemplates/example-sno-site.yaml
A	ztp/gitops-subscriptions/argocd/example/policygentemplates/group-du-sno-ranGen.yaml
Falling back to patching base and 3-way merge...
Auto-merging ztp/ztp-policy-generator/testPolicyGenTemplate/site-du-sno-1-ranGen.yaml
Auto-merging ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates/group-du-sno-ranGen.yaml
CONFLICT (content): Merge conflict in ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates/group-du-sno-ranGen.yaml
Auto-merging ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates/common-ranGen.yaml
CONFLICT (content): Merge conflict in ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates/common-ranGen.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 ztp: CNF-3661 Reduce number of policies
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
