-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ztp: CNF-3661 Reduce number of policies #836
ztp: CNF-3661 Reduce number of policies #836
Conversation
/cc @imiller0 |
@pixelsoccupied You should run You'll need to edit your commit log so the first line starts with |
f9c397a
to
35fcbd9
Compare
@@ -16,14 +16,14 @@ spec: | |||
mcp: "master" | |||
sourceFiles: | |||
- fileName: SriovNetwork.yaml | |||
policyName: "sriov-nw-fh-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each file (common, group, test-sno) will need a uniqe policy name, otherwise they will overwrite one another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah wouldn't the policy names get prepended by common
or test-sno
and are in different namespaces?
Do have any suggestion for the name? config-policy-common
, config-policy-group
and config-policy-sno
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, apologies. The generated policy name will be <pgtName>-<policyName>
. So common-config-policy
, group-du-sno-config-policy
and test-sno-config-policy
.
@@ -18,50 +18,50 @@ spec: | |||
- fileName: validatorCRs/informDuValidator.yaml | |||
complianceType: musthave | |||
remediationAction: inform | |||
policyName: "du-validator-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Missxiaoguo is there any reason this policy needs to be separate from the config policies or can it be combined with them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't combine this policy with other config policies as this is an inform policy but others are enforce and policyGen doesn't support have different types of remedicationAction in one policy.
Even we will eventually have the default to inform, we should leave it separated for LO to know this policy shouldn't be copied for enforce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -72,7 +72,7 @@ spec: | |||
displayName: disconnected-redhat-operators | |||
image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9 | |||
- fileName: DisconnectedICSP.yaml | |||
policyName: "registry-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user chooses to adapt an existing deployment to this new convention I believe it will result in duplicate/parallel policies. We don't have a good way to automatically "move" already existing policies to the new names. At a minimum we need to document for the user how to clean up the old policies, but we should think about ways to make this easier if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically we need to help user migrate and since this feature is actually benefiting everyone without a loss in functionality it could be made default?
To help migrate: we can introduce a special variable migrateToMakeItBetter
(maybe added using patch), which basically deletes the old ones and recreates policies with the same name?
To help make it default: on the docs say policyName
is "something you shouldn't be explicit about" and then we programatically add config-policy
to it when it's missing/empty.
Migration might be overkill (used by small number of users and for a limited amount of time?) but default behaviour should be implanted I think. Seeing policyName: "config-policy"
in so many places is bad UX IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the new image is instantiated, couldn't we just have a one shot cleanup that deletes the redundant policies ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Automating the move/cleanup/rename/etc is something we could (and should) do. I suspect that we will need (or want) traceability between source PolicyGenTemplate and created artifacts to make this easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to reconsider these groupings.
We are grouping s/w installation and operator config together. For upgrades for instance we want a single policy that encapsulates all the olm subscriptions but I don't think that should include other config, so many something like:
common-subscriptions
common-config
@@ -49,12 +49,12 @@ spec: | |||
type: "fluentd" | |||
fluentd: {} | |||
- fileName: MachineConfigSctp.yaml | |||
policyName: "mc-sctp-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a policy for this. It should be applied as part of the day 1 extra manifests, delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Removing them from the templates is being done under PR #834.
@@ -17,10 +17,10 @@ spec: | |||
mcp: "master" | |||
sourceFiles: | |||
- fileName: ConsoleOperatorDisable.yaml | |||
policyName: "console-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a group vs common policy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about making it common but the point was raised that in a deployment with a mix of SNO and standard clusters they may want the console for standard(?). In that case a separate group could/would be made for group-du-std which doesn't contain the Console disable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point
@@ -64,14 +64,14 @@ spec: | |||
ptp4lOpts: "-2 -s --summary_interval -4" | |||
phc2sysOpts: "-a -r -n 24" | |||
- fileName: SriovOperatorConfig.yaml | |||
policyName: "sriov-conf-policy" | |||
policyName: "config-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this group ? Mandatory for all DU deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is in the group because it contains the mapping to a machine config pool and the disableDrain flag. The thinking was that these might be different between SNO and standard clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, although we do not expose the selector in our current reference example,
spec: | ||
disableDrain: true | ||
- fileName: MachineConfigAcceleratedStartup.yaml | ||
policyName: "mc-accelerated-startup-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not an extra manifest vs a day2 policy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Part of #834
35fcbd9
to
88af83e
Compare
88af83e
to
37615a6
Compare
7f800dd
to
859071d
Compare
/cc @imiller0 |
34dce84
to
e7a9e1d
Compare
2232be3
to
658d6b3
Compare
29ce19e
to
d8c99c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks good. When policies are built the name of the policy is the combined PolicyGenTemplate name and policyName field. In this PR each PGT uses "config-policy" for the policyName of all included CRs, so this will result in 3 policies created:
common-config-policy (one shared by all nodes)
group-du-sno-config-policy (one shared by all nodes in this group)
example-sno-config-policy (one per cluster, example-sno replaced by site name)
The common policy will contain all the CRs necessary to install operators along with configuration unrelated to the operators (disconnected registry access and Monitoring footprint).
I'll wait for others to comment but otherwise will approve tomorrow.
Policies are same, I'm good with that. Would you just confirm that it has been tested. Like SNO is provisioned and then those 3 policies get applied and everything went fine. |
@serngawy Here's the output with and without the consolidation from the hub.
Both But I'll need some help to see if they are actually working on the SNOs. |
3eb9cba
to
bb9325b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
As discussed the subscription related CRs are grouped separately from the configuration (within common PGT). This results in 4 total policies being created:
common-subscriptions-policy
common-config-policy
group-du-sno-config-policy
example-sno-config-policy.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: imiller0, pixelsoccupied The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc @imiller0 |
bb9325b
to
d1e51ff
Compare
/lgtm |
/cherry-pick release-4.9 |
@imiller0: #836 failed to apply on top of branch "release-4.9":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The current reference PolicyGenTemplates (common, group, and test-sno) result in ~20 policies being created for the cluster. Managing this number of policies per cluster results in increased CPU use on the spoke cluster and scaling issues in the hub.
The work of this story is to reduce the number of policies created by using a common policy name for all policies in each of the common-ranGen.yaml, group-du-sno-ranGen.yaml, and test-sno.yaml files.
Naming convention -->
Before -->
After -->