Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended PrometheusK8sConfig and PrometheusRestrictedConfig with AdditionalAlertManagerConfigs #1132

Conversation

dislbenn
Copy link
Contributor

@dislbenn dislbenn commented Apr 27, 2021

  • I added CHANGELOG entry for this change.
  • No user-facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dislbenn
To complete the pull request process, please assign dgrisonnet after the PR has been reviewed.
You can assign the PR to them by writing /assign @dgrisonnet in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dislbenn dislbenn marked this pull request as draft April 27, 2021 01:56
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2021
@dislbenn dislbenn changed the title Modified PrometheusK8sConfig to allow a list AlertmanagerEndpoints Modified PrometheusK8sConfig to allow a list AdditionalAlertManagerConfigs May 7, 2021
@dislbenn dislbenn marked this pull request as ready for review May 7, 2021 20:03
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2021
@@ -39,6 +39,7 @@ spec:
configMaps:
- serving-certs-ca-bundle
- kubelet-serving-ca-bundle
- hub-router-ca
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you should add hub-router-ca here. I think you can make the cluster-monitoring-config configmap to support configuring configmap so that it can be able to update with customized configmap externally. I am not sure if it is correct approach. @simonpasquier can you give us recommendations? Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, we don't want to hard-code an arbitrary CA.

@dislbenn dislbenn changed the title Modified PrometheusK8sConfig to allow a list AdditionalAlertManagerConfigs Modified PrometheusK8sConfig and PrometheusRestrictedConfig to allow a list AdditionalAlertManagerConfigs May 10, 2021
@dislbenn dislbenn changed the title Modified PrometheusK8sConfig and PrometheusRestrictedConfig to allow a list AdditionalAlertManagerConfigs Extended PrometheusK8sConfig and PrometheusRestrictedConfig with AdditionalAlertManagerConfigs May 10, 2021
@@ -39,6 +39,7 @@ spec:
configMaps:
- serving-certs-ca-bundle
- kubelet-serving-ca-bundle
- hub-router-ca
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, we don't want to hard-code an arbitrary CA.

VolumeClaimTemplate *monv1.EmbeddedPersistentVolumeClaim `json:"volumeClaimTemplate"`
RemoteWrite []monv1.RemoteWriteSpec `json:"remoteWrite"`
TelemetryMatches []string `json:"-"`
AdditionalAlertManagerConfigs *v1.SecretKeySelector `json:"additionalAlertManagerConfigs"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about the user experience here, I'm not sure that the configuration should just be a reference to an uncontrolled secret. The risk is high that the provided configuration is invalid and breaks Prometheus. Also not all fields of the alertmanager_config section are worth exposing (for instance configuring an external Alertmanager endpoint only needs the static_configs service discovery) and some might be not be usable in practice (for instance any field like ca_file that reference a file on disk).

It might be more appropriate to expose the main fields that are required to configure external Alertmanager endpoints. From there, CMO could generate the secret to be referenced in the Prometheus resource.

Here is the rough idea:

type PrometheusK8sConfig struct {
    ...
    AlertmanagerConfigs []AlertmanagerConfig `json:"additionalAlertManagerConfigs"`
}

type AlertmanagerConfig struct {
    # URL to the Alertmanager server.
    URL net.URL `json:"url"`
    Timeout string `json:"timeout"`
    # Reference to a key in a Secret containing the certificate authority to verify the server's certificate. 
    CA *v1.SecretKeySelector
    # Authn parameters (to be defined)
    ...
}

@bjoydeep
Copy link
Contributor

@simonpasquier @dislbenn when the Prometheus on cluster A will post the alert to Alertmanager on the Hub via this, how will the user know from which cluster this came? in ACM metric collector, when we take in data from Prometheus on cluster A and send it to the Thanos API Gateway on the Hub, we do append the clusterid and clustername as labels. Is this happening already - may be I missed it in the code?

@clyang82
Copy link
Contributor

@simonpasquier @dislbenn when the Prometheus on cluster A will post the alert to Alertmanager on the Hub via this, how will the user know from which cluster this came? in ACM metric collector, when we take in data from Prometheus on cluster A and send it to the Thanos API Gateway on the Hub, we do append the clusterid and clustername as labels. Is this happening already - may be I missed it in the code?

@bjoydeep @morvencao is trying to use relabel_configs to add clusterid and clustername in managed cluster before send alerts to hub alertmanager.

@morvencao
Copy link
Member

I also found another configuration AdditionalAlertRelabelConfigs that may resolve the relabel issue for alerts.
@simonpasquier may know more details.
see: https://github.com/openshift/prometheus-operator/blob/34a5b31452901e7d88240420f372969a01128691/pkg/apis/monitoring/v1/types.go#L299
Will try that and give feedback here.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2021
@clyang82 clyang82 force-pushed the dbennett/alertmanager-configmap-extend branch from b8f96e7 to 3225923 Compare May 13, 2021 12:40
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2021
@clyang82 clyang82 force-pushed the dbennett/alertmanager-configmap-extend branch 2 times, most recently from a04a5fb to 96a91fd Compare May 14, 2021 09:57
@clyang82
Copy link
Contributor

/test e2e-aws-single-node

@clyang82 clyang82 force-pushed the dbennett/alertmanager-configmap-extend branch 2 times, most recently from b4cb43f to ff5bfcc Compare May 17, 2021 09:45
@clyang82
Copy link
Contributor

/assign @simonpasquier

@simonpasquier
Copy link
Contributor

Regarding alert relabeling, I would expect that using external labels is enough for your use case?

@clyang82
Copy link
Contributor

Regarding alert relabeling, I would expect that using external labels is enough for your use case?

Yes. external labels is enough for our cases. Do you want to reduce the risk to not support alert relabeling right now?

@simonpasquier
Copy link
Contributor

simonpasquier commented May 20, 2021

Do you want to reduce the risk to not support alert relabeling right now?

yes

@clyang82
Copy link
Contributor

Do you want to reduce the risk to not support alert relabeling right now?
yes

Can we keep using []*monv1.ProbeTargetStaticConfig and just ignore RelabelConfigs? or do you suggest to define a new object in CMO? Thanks.

@clyang82
Copy link
Contributor

/retest

Signed-off-by: clyang82 <chuyang@redhat.com>
@clyang82
Copy link
Contributor

clyang82 commented Jun 4, 2021

/retest

if err != nil {
return errors.Wrap(err, "reconciling Prometheus additionalAlertManagerConfigs secret failed")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late realization for me (sorry!) but we need to delete the secret in case the admin had configured additional Alertmanagers and then remove them all.

…gured

Signed-off-by: clyang82 <chuyang@redhat.com>
@simonpasquier
Copy link
Contributor

/test e2e-agnostic-upgrade

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

This looks good from my end. The code changes are well isolated meaning that I'm confident that users that don't configure additional Alertmanagers have no risk of being impacted by the change.

pkg/manifests/config.go Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 8, 2021
@clyang82
Copy link
Contributor

clyang82 commented Jun 9, 2021

/test e2e-agnostic-upgrade

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2021
@eparis
Copy link
Member

eparis commented Jun 10, 2021

/lgtm
looks like just a rebase

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2021
@bjoydeep
Copy link
Contributor

/test e2e-agnostic-upgrade

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@simonpasquier
Copy link
Contributor

/lgtm

ping @eparis for overriding the bugzilla/valid-bug label

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dislbenn, eparis, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eparis eparis added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 11, 2021
@openshift-merge-robot openshift-merge-robot merged commit 7168290 into openshift:master Jun 11, 2021
@clyang82 clyang82 deleted the dbennett/alertmanager-configmap-extend branch June 11, 2021 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants