Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving the defaults for rookceph operator from configmap to csv #1648

Merged

Conversation

malayparida2000
Copy link
Contributor

@malayparida2000 malayparida2000 commented Apr 29, 2022

Going forward the configmap created by ocsinitialization will be
reserved purely for customer overrides (higher precedence), and
the csv will have the env vars for our defaults.

We had discussed this here in odf virtual teams space,
https://chat.google.com/room/AAAAREGEba8/JO7SyyGIA0k

Also the corresponding BZ link for the discussion that happened there.
https://bugzilla.redhat.com/show_bug.cgi?id=1986016

@malayparida2000 malayparida2000 force-pushed the rook-ceph-operator-config branch 2 times, most recently from cabb4f8 to 0e66f55 Compare April 29, 2022 13:07
@malayparida2000
Copy link
Contributor Author

/test ocs-operator-bundle-e2e-aws

@malayparida2000
Copy link
Contributor Author

/cc @jarrpa /cc @Madhu-1

@openshift-ci openshift-ci bot requested review from jarrpa and Madhu-1 May 12, 2022 14:56
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 12, 2022

@malayparida2000: GitHub didn't allow me to request PR reviews from the following users: /cc.

Note that only red-hat-storage members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jarrpa /cc @Madhu-1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Madhu-1
Copy link
Member

Madhu-1 commented May 13, 2022

/assign @travisn
/assign @umangachapagain

Copy link
Contributor

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nice simplification!

@agarwal-mudit
Copy link
Member

@travisn what about these parameters, are these taken care during upgrade?

@malayparida2000
Copy link
Contributor Author

@travisn what about these parameters, are these taken care during upgrade?

@agarwal-mudit According to Jose This is a CephCluster-specific config, not a rook-ceph-operator config, so we should not worry about that one.

@nb-ohad
Copy link
Contributor

nb-ohad commented May 16, 2022

@Madhu-1 The ocs-osd-deployer operator is using the configmap here: https://github.com/red-hat-storage/ocs-osd-deployer/blob/main/controllers/managedocs_controller.go#L1381

Does this change is going to affect/break the deployer operator?

@Madhu-1
Copy link
Member

Madhu-1 commented May 16, 2022

@Madhu-1 The ocs-osd-deployer operator is using the configmap here: https://github.com/red-hat-storage/ocs-osd-deployer/blob/main/controllers/managedocs_controller.go#L1381

After discussing with @nb-ohad we have a problem with the upgraded cluster. suppose the configmap contains the CSI_LOG_LEVEL:5 and we are moving it to the ENV variable. This is fine as a customer also wanted this value to be set to 5 what if we're going to change the default value to 4 in the next release? we can update the ENV but what about the configmap, we don't know if the ocs-operator or the customer added the configuration. We can make a statement old configuration will continue to work as it is and we don't override the configurations and the default will be applied to new clusters in ENV another way is to go over the configmap and change the value from 5 to 4 but that creates a problem as the customer wanted it to be 5 in his cluster. We need to make a decision here about the expectation.

@Madhu-1
Copy link
Member

Madhu-1 commented May 16, 2022

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2022
@travisn
Copy link
Contributor

travisn commented May 16, 2022

@Madhu-1 The ocs-osd-deployer operator is using the configmap here: https://github.com/red-hat-storage/ocs-osd-deployer/blob/main/controllers/managedocs_controller.go#L1381

After discussing with @nb-ohad we have a problem with the upgraded cluster. suppose the configmap contains the CSI_LOG_LEVEL:5 and we are moving it to the ENV variable. This is fine as a customer also wanted this value to be set to 5 what if we're going to change the default value to 4 in the next release? we can update the ENV but what about the configmap, we don't know if the ocs-operator or the customer added the configuration. We can make a statement old configuration will continue to work as it is and we don't override the configurations and the default will be applied to new clusters in ENV another way is to go over the configmap and change the value from 5 to 4 but that creates a problem as the customer wanted it to be 5 in his cluster. We need to make a decision here about the expectation.

In any case, this PR seems like the right fix. If the desire is to change the csi log level from 5 to 4 for upgraded clusters, how about a separate PR to update this value in the configmap as a one-time operation, then never do it again?

Copy link
Member

@jarrpa jarrpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 17, 2022
@jarrpa
Copy link
Member

jarrpa commented May 17, 2022

@Madhu-1 The ocs-osd-deployer operator is using the configmap here: https://github.com/red-hat-storage/ocs-osd-deployer/blob/main/controllers/managedocs_controller.go#L1381

After discussing with @nb-ohad we have a problem with the upgraded cluster. suppose the configmap contains the CSI_LOG_LEVEL:5 and we are moving it to the ENV variable. This is fine as a customer also wanted this value to be set to 5 what if we're going to change the default value to 4 in the next release? we can update the ENV but what about the configmap, we don't know if the ocs-operator or the customer added the configuration. We can make a statement old configuration will continue to work as it is and we don't override the configurations and the default will be applied to new clusters in ENV another way is to go over the configmap and change the value from 5 to 4 but that creates a problem as the customer wanted it to be 5 in his cluster. We need to make a decision here about the expectation.

Yes, this has been the core issue from the beginning.

In any case, this PR seems like the right fix. If the desire is to change the csi log level from 5 to 4 for upgraded clusters, how about a separate PR to update this value in the configmap as a one-time operation, then never do it again?

No. The notion of a "one-time operation" is antithetical to the reconciliation process and requires needless overhead in state tracking to determine if we ever did it, plus further work to either continue supporting it or dropping it for future upgrades.

In the end, all of this is so freaking minor that I'm fine with just not worrying about it. We never used these for truly significant configuration values, so if a customer detects "oh no, my logging is slightly wrong" then we can just either document or tell them how to fix it. Yes it'll be a slight headache in debugging on our end, but it's not like we weren't already dealing with it before we introduced these ConfgiMap values. Maybe we can include a note on optional maintenance in the release notes (or whatever) that admins are encouraged to delete the ConfigMap in question if they're not actively using it.

@Madhu-1
Copy link
Member

Madhu-1 commented May 18, 2022

@Madhu-1 The ocs-osd-deployer operator is using the configmap here: https://github.com/red-hat-storage/ocs-osd-deployer/blob/main/controllers/managedocs_controller.go#L1381

After discussing with @nb-ohad we have a problem with the upgraded cluster. suppose the configmap contains the CSI_LOG_LEVEL:5 and we are moving it to the ENV variable. This is fine as a customer also wanted this value to be set to 5 what if we're going to change the default value to 4 in the next release? we can update the ENV but what about the configmap, we don't know if the ocs-operator or the customer added the configuration. We can make a statement old configuration will continue to work as it is and we don't override the configurations and the default will be applied to new clusters in ENV another way is to go over the configmap and change the value from 5 to 4 but that creates a problem as the customer wanted it to be 5 in his cluster. We need to make a decision here about the expectation.

Yes, this has been the core issue from the beginning.

This might be the right time to make this change as we have minimal changes in configmap.

In any case, this PR seems like the right fix. If the desire is to change the csi log level from 5 to 4 for upgraded clusters, how about a separate PR to update this value in the configmap as a one-time operation, then never do it again?

No. The notion of a "one-time operation" is antithetical to the reconciliation process and requires needless overhead in state tracking to determine if we ever did it, plus further work to either continue supporting it or dropping it for future upgrades.

In the end, all of this is so freaking minor that I'm fine with just not worrying about it. We never used these for truly significant configuration values, so if a customer detects "oh no, my logging is slightly wrong" then we can just either document or tell them how to fix it. Yes it'll be a slight headache in debugging on our end, but it's not like we weren't already dealing with it before we introduced these ConfgiMap values. Maybe we can include a note on optional maintenance in the release notes (or whatever) that admins are encouraged to delete the ConfigMap in question if they're not actively using it.

Sounds good, Thanks @jarrpa.

@nb-ohad
Copy link
Contributor

nb-ohad commented May 18, 2022

No, I say! 😆 Having documentation to "update" versus "create" is a trivial and unhelpful distinction. Just don't create it at all, and don't remove it if it exists.

@jarrpa I do not agree with this statement. A reconciliation can be a "create reconciliation" where the reconciler just makes sure of the existence of a resource and set an ownership ref on it (not a controller ref) - regardless of the content.
This use case exists, mainly when you want to ensure that whatever config is there (regardless of the content) will get wiped out with uninstallation.

I would prefer for the config map to existing as part of the installation because clients (including the deployer) assume it. Now it is true that we can manage an update for our internal client (deployer) that will make sure to create it if it does not exist, but this will just add additional upgrade code (not in the product, but non the less "upgrade code") for us and inconvenient to whatever clients that introduced automation on top of fresh deployments that adds configuration.

It also introduces more toil when that part of the automation is uninstalling, as it now needs to make sure to delete the configmap.

Values like CSI_LOG_LEVEL & CSI_ENABLE_CSIADDONS will be
configured with defaults in the csv generation it self as ENV
variables.

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
@humblec
Copy link
Collaborator

humblec commented Jun 7, 2022

the attempt here (csv->env) looks good to me . However we may have to revisit the default verbosity level of CSI pods . It can be a different PR altogether.

@malayparida2000
Copy link
Contributor Author

@nb-ohad Can you please take another look here, This is now a blocker for RHSTOR-2511. cc @agarwal-mudit @humblec @Madhu-1

Copy link
Member

@jarrpa jarrpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 26, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jarrpa, malayparida2000

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jarrpa
Copy link
Member

jarrpa commented Jul 26, 2022

/override ci/prow/red-hat-storage-ocs-ci-e2e-aws

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2022

@jarrpa: Overrode contexts on behalf of jarrpa: ci/prow/red-hat-storage-ocs-ci-e2e-aws

In response to this:

/override ci/prow/red-hat-storage-ocs-ci-e2e-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jarrpa
Copy link
Member

jarrpa commented Jul 26, 2022

@nb-ohad @Madhu-1 I talked to @malayparida2000 about this, and there was a suggestion to keep a list of the "previous" default values of the ConfigMap in memory, and if we find those values are still the same then we can remove them from the ConfigMap so that the environment variables in the Deployment become the new defaults. I suggested we could fire an Event or similar to inform the admin that we modified/deleted the ConfigMap and that if they don't like this they can recreate it with the old values (we should even be able to provide the exact YAML to use). I'll leave this PR on hold for a few more days, but if no compelling arguments are raised against it I'll just merge it and we can implement further changes in future PRs.

@Madhu-1
Copy link
Member

Madhu-1 commented Jul 27, 2022

@nb-ohad @Madhu-1 I talked to @malayparida2000 about this, and there was a suggestion to keep a list of the "previous" default values of the ConfigMap in memory, and if we find those values are still the same then we can remove them from the ConfigMap so that the environment variables in the Deployment become the new defaults. I suggested we could fire an Event or similar to inform the admin that we modified/deleted the ConfigMap and that if they don't like this they can recreate it with the old values (we should even be able to provide the exact YAML to use). I'll leave this PR on hold for a few more days, but if no compelling arguments are raised against it I'll just merge it and we can implement further changes in future PRs.

sounds good to me.

@humblec
Copy link
Collaborator

humblec commented Jul 29, 2022

@malayparida2000 jfyi, we are adding one more control for sidecar logs instead of just having one CSI_LOG_LEVEL . The details are here ceph/ceph-csi#3264 and in rook PR (rook/rook#10639).. so we need to have this configuation refreshed either in this PR or as a follow up PR.

@malayparida2000
Copy link
Contributor Author

@humblec Let’s keep this PR limited to just for the change in implementation, The new values for sidecar logs should be another separate PR in my opinion.

@humblec
Copy link
Collaborator

humblec commented Aug 1, 2022

@humblec Let’s keep this PR limited to just for the change in implementation, The new values for sidecar logs should be another separate PR in my opinion.

Yeah,, that should be fine @malayparida2000 👍

@jarrpa
Copy link
Member

jarrpa commented Aug 3, 2022

Based on the latest comments over the past week, sounds like we have enough concensus! LET'S GET THIS IN!

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2022
@jarrpa jarrpa merged commit 8f9debc into red-hat-storage:main Aug 3, 2022
9 of 10 checks passed
@malayparida2000 malayparida2000 deleted the rook-ceph-operator-config branch August 17, 2022 19:23
@openshift-cherrypick-robot

@malayparida2000: new pull request created: #1954

In response to this:

/cherry-pick release-4.11
/cherry-pick release-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@malayparida2000
Copy link
Contributor Author

/cherry-pick release-4.11

@malayparida2000
Copy link
Contributor Author

/cherry-pick release-4.10

@openshift-cherrypick-robot

@malayparida2000: new pull request could not be created: failed to create pull request against red-hat-storage/ocs-operator#release-4.11 from head openshift-cherrypick-robot:cherry-pick-1648-to-release-4.11: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for openshift-cherrypick-robot:cherry-pick-1648-to-release-4.11."}],"documentation_url":"https://docs.github.com/rest/reference/pulls#create-a-pull-request"}

In response to this:

/cherry-pick release-4.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@malayparida2000: new pull request created: #1955

In response to this:

/cherry-pick release-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Nikhil-Ladha added a commit to Nikhil-Ladha/ocs-operator that referenced this pull request Mar 8, 2023
…from configmap to csv

This is an modified cherry-pick of red-hat-storage#1648, where we are only backporting the
CSI_LOG_LEVEL changes in the csv.

Signed-off-by: Nikhil-Ladha <nikhilladha1999@gmail.com>
Nikhil-Ladha added a commit to Nikhil-Ladha/ocs-operator that referenced this pull request Mar 8, 2023
… from configmap to csv

This is a backport of red-hat-storage#1648

Signed-off-by: Nikhil-Ladha <nikhilladha1999@gmail.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ocs-operator that referenced this pull request Mar 28, 2023
…from configmap to csv

This is an modified cherry-pick of red-hat-storage#1648, where we are only backporting the
CSI_LOG_LEVEL changes in the csv.

Signed-off-by: Nikhil-Ladha <nikhilladha1999@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants