Allow user workload monitoring configuration ConfigMap to be created in UWM ns #804

Merged
merged 1 commit into from Jun 23, 2020

Conversation

lilic
Contributor

@lilic lilic commented Jun 9, 2020

  • I added CHANGELOG entry for this change.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 9, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2020
@lilic lilic force-pushed the user-workload-config branch 7 times, most recently from dbef134 to f8f027e Compare June 10, 2020 13:18
@lilic
Contributor Author

lilic commented Jun 10, 2020

/retest

--- FAIL: TestAlertmanagerOAuthProxy (5.13s)
alertmanager_test.go:383: unexpected status code response, want 200, got 403

strange failure...

@lilic
Contributor Author

lilic commented Jun 15, 2020

/retest

@lilic lilic changed the title WIP: Allow user workload monitoring configuration configmap to be cre… Allow user workload monitoring configuration ConfigMap to be created in UWM ns Jun 15, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2020
Contributor Author

@lilic lilic left a comment

cc @openshift/openshift-team-monitoring: this is ready for a first pass, please take a look. I still need to update the CHANGELOG and add an example once it looks good to you.

@@ -89,6 +89,7 @@ func Main() int {
namespaceUserWorkload := flagset.String("namespace-user-workload", "openshift-user-workload-monitoring", "Namespace to deploy and manage user workload monitoring stack in.")
namespaceSelector := flagset.String("namespace-selector", "openshift.io/cluster-monitoring=true", "Selector for namespaces to monitor.")
configMapName := flagset.String("configmap", "cluster-monitoring-config", "ConfigMap name to configure the cluster monitoring stack.")
userWorkloadConfigMapName := flagset.String("userWorkloadConfigmap", "user-workload-monitoring-config", "ConfigMap name to configure the user workload monitoring stack.")
Contributor

why does this need to be configurable?

Contributor Author

Just followed the pattern we had (see the line above). Honestly, happy to leave it out as well!

Contributor

My intuition is that we should limit exposure here, as we want to move to a CRD-based setup 🤔

This CLI flag increases the public API surface.

Contributor Author

good point, removed it!
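
A minimal sketch of where this thread ends up, assuming the ConfigMap name becomes a fixed value instead of a flag. The `configmap` flag and both default names come from the diff above. The constant name `userWorkloadConfigMapName` and the helper `parseArgs` are hypothetical and only illustrate the idea, not the merged code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// userWorkloadConfigMapName is fixed rather than exposed as a CLI flag, so it
// does not become part of the binary's public interface.
// Hypothetical constant name; the value is the default shown in the diff.
const userWorkloadConfigMapName = "user-workload-monitoring-config"

// parseArgs is a hypothetical helper that registers only the flag that stays
// configurable and returns the parsed cluster monitoring ConfigMap name.
func parseArgs(args []string) (string, error) {
	flagset := flag.NewFlagSet("operator", flag.ContinueOnError)
	flagset.SetOutput(io.Discard) // keep usage output quiet in this sketch
	configMapName := flagset.String("configmap", "cluster-monitoring-config", "ConfigMap name to configure the cluster monitoring stack.")
	if err := flagset.Parse(args); err != nil {
		return "", err
	}
	return *configMapName, nil
}

func main() {
	name, err := parseArgs([]string{"-configmap", "custom-config"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cluster ConfigMap:", name)
	fmt.Println("user workload ConfigMap:", userWorkloadConfigMapName)
}
```

With this shape, passing `-userWorkloadConfigmap` is rejected as an unknown flag, which is the reduced API surface the review asks for.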

pkg/manifests/config.go (outdated)
@s-urbaniak
Contributor

looking great so far! 🎉

@s-urbaniak
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 16, 2020
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 16, 2020
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2020
@lilic
Contributor Author

lilic commented Jun 16, 2020

@s-urbaniak PTAL changes since last time:

  • I removed the flag, good call!
  • added example configmap and used it in one of the config unit tests
  • added CHANGELOG entry.

Thanks!

@lilic
Contributor Author

lilic commented Jun 16, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 16, 2020
@lilic
Contributor Author

lilic commented Jun 16, 2020

/retest

userCM, err := o.client.GetConfigmap(o.namespaceUserWorkload, o.userWorkloadConfigMapName)
if err != nil {
	if apierrors.IsNotFound(err) {
		klog.Warning("No User Workload Monitoring ConfigMap was found. Using defaults.")
Contributor

Info rather than Warning?

Contributor Author

We use the warning level for the cluster monitoring ConfigMap, and I just followed that pattern. It makes sense to me in a way, as you are warning users that they did not configure their stack at all. I don't have a strong opinion, so I can change it, but then we should change both?

Contributor

I'd rather have the message at the info level because, for me, a warning is something I need to at least investigate, and not customizing the monitoring isn't suspicious. At the info level, the message would still be logged, so it would be available for support requests.
And +1000 that if we change it here, it should be consistent across the board.
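
A rough sketch of the behaviour argued for in this thread: a missing ConfigMap is an expected state, so it is logged as informational and defaults are used, while any other error is returned. Everything here is a stand-in chosen so the example runs with the standard library alone: `errNotFound` replaces the `apierrors.IsNotFound` check, the `getter` type replaces `o.client.GetConfigmap`, `log.Print` replaces klog, and `userWorkloadConfig` is a hypothetical reduced type.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Sentinel errors standing in for Kubernetes API errors (assumption for this sketch).
var (
	errNotFound    = errors.New("configmap not found")
	errUnavailable = errors.New("api unavailable")
)

// userWorkloadConfig is a simplified, hypothetical configuration type.
type userWorkloadConfig struct {
	LogLevel string
}

// getter is a hypothetical stand-in for the client call that fetches the ConfigMap.
type getter func(namespace, name string) (*userWorkloadConfig, error)

// loadUserWorkloadConfig returns the stored configuration, or an empty default
// when no ConfigMap exists. The missing case is logged at info level because
// it does not need investigation. Any other error is returned to the caller.
func loadUserWorkloadConfig(get getter, namespace, name string) (*userWorkloadConfig, error) {
	cfg, err := get(namespace, name)
	if err != nil {
		if errors.Is(err, errNotFound) {
			log.Print("INFO: No User Workload Monitoring ConfigMap was found. Using defaults.")
			return &userWorkloadConfig{}, nil
		}
		return nil, fmt.Errorf("loading user workload ConfigMap %s/%s: %w", namespace, name, err)
	}
	return cfg, nil
}

func main() {
	missing := func(namespace, name string) (*userWorkloadConfig, error) {
		return nil, errNotFound
	}
	broken := func(namespace, name string) (*userWorkloadConfig, error) {
		return nil, errUnavailable
	}

	cfg, err := loadUserWorkloadConfig(missing, "openshift-user-workload-monitoring", "user-workload-monitoring-config")
	fmt.Println("defaults used:", cfg != nil, "error:", err)

	_, err = loadUserWorkloadConfig(broken, "openshift-user-workload-monitoring", "user-workload-monitoring-config")
	fmt.Println("error:", err)
}
```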

return uwc, nil
}

klog.Warning("No User Workload Monitoring ConfigMap was found. Using defaults.")
Contributor

ditto?

pkg/operator/operator.go
PrometheusK8sConfig *PrometheusK8sConfig `json:"prometheusK8s"`
PrometheusUserWorkloadConfig *PrometheusK8sConfig `json:"prometheusUserWorkload"`
ClusterMonitoringConfiguration *ClusterMonitoringConfiguration `json:"-"`
UserWorkloadConfiguration *UserWorkloadConfiguration `json:"-"`
Contributor

Having both UserWorkloadConfiguration and UserWorkloadConfig is confusing.

Contributor Author

UserWorkloadConfig?

Contributor

We have a UserWorkloadConfig struct:

type UserWorkloadConfig struct {
	Enabled *bool `json:"enabled"`
}

And another UserWorkloadConfiguration struct:

type UserWorkloadConfiguration struct {
	PrometheusOperator *PrometheusOperatorConfig `json:"prometheusOperator"`
	Prometheus         *PrometheusK8sConfig      `json:"prometheus"`
	ThanosRuler        *ThanosRulerConfig        `json:"thanosRuler"`
}

The names are close which makes it confusing for me when reviewing the code.

Contributor Author

I see, makes sense! Any suggestions for new or old name?
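
To make the difference between the two similarly named structs concrete, here is a small sketch that decodes a payload into each of them. The struct names, field names, and JSON tags are the ones quoted above. The reduced `componentConfig` type, its `logLevel` tag, and the helpers `parseToggle` and `parseConfiguration` are assumptions for illustration. JSON is used only because the standard library can decode it; the payload format of the real ConfigMap is not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// componentConfig is a hypothetical, reduced stand-in for the per-component
// types (PrometheusOperatorConfig, PrometheusK8sConfig, ThanosRulerConfig).
type componentConfig struct {
	LogLevel string `json:"logLevel"`
}

// UserWorkloadConfig only toggles the feature.
type UserWorkloadConfig struct {
	Enabled *bool `json:"enabled"`
}

// UserWorkloadConfiguration carries the per-component settings.
type UserWorkloadConfiguration struct {
	PrometheusOperator *componentConfig `json:"prometheusOperator"`
	Prometheus         *componentConfig `json:"prometheus"`
	ThanosRuler        *componentConfig `json:"thanosRuler"`
}

// parseToggle decodes the on/off switch.
func parseToggle(data string) (*UserWorkloadConfig, error) {
	var c UserWorkloadConfig
	if err := json.Unmarshal([]byte(data), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

// parseConfiguration decodes the per-component configuration.
func parseConfiguration(data string) (*UserWorkloadConfiguration, error) {
	var c UserWorkloadConfiguration
	if err := json.Unmarshal([]byte(data), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	toggle, err := parseToggle(`{"enabled": true}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	full, err := parseConfiguration(`{"prometheus": {"logLevel": "debug"}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("enabled:", *toggle.Enabled)
	fmt.Println("prometheus log level:", full.Prometheus.LogLevel)
	fmt.Println("thanosRuler configured:", full.ThanosRuler != nil)
}
```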

ThanosQuerierConfig *ThanosQuerierConfig `json:"thanosQuerier"`
UserWorkloadEnabled *bool `json:"enableUserWorkload"`
// TODO: Remove in 4.7 release.
PrometheusUserWorkloadConfig *PrometheusK8sConfig `json:"prometheusUserWorkload"`
Contributor

Instead of duplicating PrometheusUserWorkloadConfig, PrometheusOperatorUserWorkloadConfig and ThanosRulerConfig here and in UserWorkloadConfiguration, maybe it would be simpler to use only UserWorkloadConfiguration? IOW, handle the "legacy" fields only when loading the config map, but don't expose them here.

Contributor Author

@lilic lilic Jun 18, 2020

The point was to be able to easily remove the current logic (we will just remove it after 4.7), hence the TODO comments, as we will stop supporting this in 4.7. It also separates out the two clear tenants we have right now: cluster and user workload.
If you have a better suggestion, can you clarify? Thanks! :)

IOW? :)

Contributor

An alternative approach would be to remove the user workload monitoring fields from the ClusterMonitoringConfiguration struct and only use Config.UserWorkloadConfiguration. NewConfig() would try to load the uwm fields until we decide that it shouldn't.

Something like this: simonpasquier@44bd3dd

The side effect is that the uwm configuration will be taken either from the "legacy" fields or from the new configmap in openshift-user-workload-monitoring, but it won't be a merge of both. In the current state, PrometheusOperatorConfig can be defined in the openshift-monitoring configmap and PrometheusConfig in the openshift-user-workload-monitoring configmap, which can be confusing or desired, depending on how you see things :)

(IOW: in other words)
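
The either/or behaviour described in this comment could be sketched as follows. This is not the code from the linked commit. The types are reduced stand-ins, `chooseSource` is a hypothetical helper, and which source wins when both are present is an assumption (here, the new ConfigMap), since the comment does not say.

```go
package main

import "fmt"

// componentConfig and userWorkloadConfiguration are reduced, hypothetical
// stand-ins for the operator's configuration types.
type componentConfig struct {
	LogLevel string
}

type userWorkloadConfiguration struct {
	PrometheusOperator *componentConfig
	Prometheus         *componentConfig
}

// chooseSource returns exactly one of the two configurations and never
// combines fields from both.
// Assumption: the new ConfigMap wins whenever it is present.
func chooseSource(legacy, current *userWorkloadConfiguration) *userWorkloadConfiguration {
	if current != nil {
		return current
	}
	if legacy != nil {
		return legacy
	}
	return &userWorkloadConfiguration{}
}

func main() {
	// Legacy fields from the cluster monitoring ConfigMap.
	legacy := &userWorkloadConfiguration{
		PrometheusOperator: &componentConfig{LogLevel: "debug"},
	}
	// New ConfigMap in the user workload monitoring namespace.
	current := &userWorkloadConfiguration{
		Prometheus: &componentConfig{LogLevel: "warn"},
	}

	got := chooseSource(legacy, current)
	// The legacy PrometheusOperator setting is dropped, not merged in.
	fmt.Println("prometheusOperator set:", got.PrometheusOperator != nil)
	fmt.Println("prometheus log level:", got.Prometheus.LogLevel)
}
```

The dropped `PrometheusOperator` setting in `main` is the side effect mentioned above: a setting left in the legacy location is silently ignored once the new ConfigMap exists.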

Contributor

Hmm, especially with configuration I much prefer duplicating structs. This is a lesson learned the hard way in Kubernetes. Just because things happen to be identical at this point in time doesn't mean they won't develop in different directions in the future, and they almost always do unless the concept is literally the same (e.g. TLS configuration in Prometheus). In this case we even know it will be deleted in the future, so I actually prefer @lilic's currently proposed way.


// TODO: remove after 4.7

if f.config.ClusterMonitoringConfiguration.PrometheusUserWorkloadConfig.LogLevel != "" {
Contributor

When you have config fields defined both in f.config.ClusterMonitoringConfiguration and f.config.UserWorkloadConfiguration then the former will take precedence. Couldn't this cause confusion?

Contributor Author

techPreview is what we primarily have to support right now, which is why it takes precedence here, but that is a good point to document.
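
The precedence discussed here could be written down roughly like this, assuming a field counts as "set" when it is a non-empty string, as in the `LogLevel != ""` check above. The helper `effectiveLogLevel` and its parameter names are hypothetical.

```go
package main

import "fmt"

// effectiveLogLevel resolves a single setting from two sources.
// The legacy (techPreview) value from the cluster monitoring ConfigMap wins
// when it is set. Otherwise the value from the new user workload ConfigMap is
// used, and the default applies only when neither is set.
func effectiveLogLevel(legacyLevel, newLevel, defaultLevel string) string {
	if legacyLevel != "" {
		return legacyLevel
	}
	if newLevel != "" {
		return newLevel
	}
	return defaultLevel
}

func main() {
	fmt.Println(effectiveLogLevel("debug", "warn", "info")) // both set: legacy wins
	fmt.Println(effectiveLogLevel("", "warn", "info"))      // only the new ConfigMap is set
	fmt.Println(effectiveLogLevel("", "", "info"))          // nothing set: default
}
```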

@lilic
Contributor Author

lilic commented Jun 18, 2020

/retest
🤔

Contributor

@brancz brancz left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, lilic, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [brancz,lilic,s-urbaniak]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 31f0ca4 into openshift:master Jun 23, 2020
@lilic lilic deleted the user-workload-config branch June 26, 2020 06:47
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
7 participants