Extended PrometheusK8sConfig and PrometheusRestrictedConfig with AdditionalAlertManagerConfigs #1132
dislbenn commented on Apr 27, 2021 (edited)
- I added CHANGELOG entry for this change.
- No user-facing changes, so no entry in CHANGELOG was needed.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dislbenn The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -39,6 +39,7 @@ spec:
configMaps:
- serving-certs-ca-bundle
- kubelet-serving-ca-bundle
- hub-router-ca
I do not think you should add `hub-router-ca` here. Instead, the `cluster-monitoring-config` configmap could support configuring additional configmaps, so that a customized configmap can be supplied externally. I am not sure whether that is the correct approach. @simonpasquier can you give us recommendations? Thanks.
That's right, we don't want to hard-code an arbitrary CA.
pkg/manifests/config.go
Outdated
VolumeClaimTemplate *monv1.EmbeddedPersistentVolumeClaim `json:"volumeClaimTemplate"`
RemoteWrite []monv1.RemoteWriteSpec `json:"remoteWrite"`
TelemetryMatches []string `json:"-"`
AdditionalAlertManagerConfigs *v1.SecretKeySelector `json:"additionalAlertManagerConfigs"`
Thinking more about the user experience here, I'm not sure that the configuration should just be a reference to an uncontrolled secret. The risk is high that the provided configuration is invalid and breaks Prometheus. Also, not all fields of the `alertmanager_config` section are worth exposing (for instance, configuring an external Alertmanager endpoint only needs the `static_configs` service discovery), and some might not be usable in practice (for instance, any field like `ca_file` that references a file on disk).
It might be more appropriate to expose the main fields that are required to configure external Alertmanager endpoints. From there, CMO could generate the secret to be referenced in the Prometheus resource.
Here is the rough idea:
type PrometheusK8sConfig struct {
	...
	AlertmanagerConfigs []AlertmanagerConfig `json:"additionalAlertManagerConfigs"`
}

type AlertmanagerConfig struct {
	// URL to the Alertmanager server (url.URL from the net/url package).
	URL url.URL `json:"url"`
	Timeout string `json:"timeout"`
	// Reference to a key in a Secret containing the certificate authority used to verify the server's certificate.
	CA *v1.SecretKeySelector
	// Authn parameters (to be defined)
	...
}
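To make the "CMO could generate the secret" step more concrete, here is a minimal sketch of how a restricted set of fields might be validated and rendered into `alertmanager_config` entries. It is an illustration under stated assumptions, not the code in this PR: `renderAlertmanagerConfigs` and the helper types are hypothetical, the URL is taken as a plain string for simplicity, the CA and authn fields are left out, and the output is JSON (which a YAML parser also accepts) to avoid a third-party dependency.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// AlertmanagerConfig is a hypothetical, reduced version of the struct
// proposed in the review comment (URL as a string, no CA or authn).
type AlertmanagerConfig struct {
	URL     string `json:"url"`
	Timeout string `json:"timeout"`
}

// staticConfig and promAlertmanagerConfig model only the fields of the
// Prometheus alertmanager_config section that this sketch needs.
type staticConfig struct {
	Targets []string `json:"targets"`
}

type promAlertmanagerConfig struct {
	Scheme        string         `json:"scheme"`
	PathPrefix    string         `json:"path_prefix,omitempty"`
	Timeout       string         `json:"timeout,omitempty"`
	StaticConfigs []staticConfig `json:"static_configs"`
}

// renderAlertmanagerConfigs validates each entry and returns the payload
// that could be stored in the generated secret. Invalid input is rejected
// up front instead of being handed to Prometheus.
func renderAlertmanagerConfigs(cfgs []AlertmanagerConfig) ([]byte, error) {
	out := make([]promAlertmanagerConfig, 0, len(cfgs))
	for _, c := range cfgs {
		u, err := url.Parse(c.URL)
		if err != nil {
			return nil, fmt.Errorf("invalid Alertmanager URL %q: %w", c.URL, err)
		}
		if u.Scheme != "http" && u.Scheme != "https" {
			return nil, fmt.Errorf("unsupported scheme %q in %q", u.Scheme, c.URL)
		}
		if u.Host == "" {
			return nil, fmt.Errorf("missing host in %q", c.URL)
		}
		out = append(out, promAlertmanagerConfig{
			Scheme:        u.Scheme,
			PathPrefix:    u.Path,
			Timeout:       c.Timeout,
			StaticConfigs: []staticConfig{{Targets: []string{u.Host}}},
		})
	}
	return json.Marshal(out)
}

func main() {
	b, err := renderAlertmanagerConfigs([]AlertmanagerConfig{
		{URL: "https://alertmanager.example.com:9093/prefix", Timeout: "30s"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b))
}
```

Validating in the operator keeps a malformed entry from ever reaching the Prometheus configuration, which is the risk raised above about an uncontrolled secret.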
@simonpasquier @dislbenn when the Prometheus on cluster A posts an alert to the Alertmanager on the Hub via this, how will the user know which cluster it came from? In the ACM metric collector, when we take in data from Prometheus on cluster A and send it to the Thanos API Gateway on the Hub, we append the clusterid and clustername as labels. Is this already happening here? Maybe I missed it in the code.
@bjoydeep @morvencao is trying to use |
I also found another configuration |
b8f96e7 to 3225923
a04a5fb to 96a91fd
/test e2e-aws-single-node |
b4cb43f to ff5bfcc
/assign @simonpasquier |
Regarding alert relabeling, I would expect that using external labels is enough for your use case? |
Yes, external labels are enough for our cases. Do you want to reduce risk by not supporting alert relabeling for now?
yes |
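For readers following the cluster-identification question, the sketch below illustrates how external labels can tag an outgoing alert with its source cluster. It is a simplified model rather than Prometheus code: `withExternalLabels` and the `cluster` label name are hypothetical, and it assumes that an external label is only added when the alert does not already carry a label of the same name.

```go
package main

import "fmt"

// withExternalLabels returns a copy of the alert labels with the external
// labels added. Labels already present on the alert are kept as-is
// (assumption: external labels never override alert labels).
func withExternalLabels(alert, external map[string]string) map[string]string {
	out := make(map[string]string, len(alert)+len(external))
	for k, v := range alert {
		out[k] = v
	}
	for k, v := range external {
		if _, exists := out[k]; !exists {
			out[k] = v
		}
	}
	return out
}

func main() {
	alert := map[string]string{"alertname": "KubePodCrashLooping", "severity": "warning"}
	external := map[string]string{"cluster": "cluster-a"} // hypothetical label name and value
	fmt.Println(withExternalLabels(alert, external))
}
```

With a per-cluster external label in place, the Alertmanager on the hub can route or group on that label without any alert relabeling.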
Can we keep using |
/retest |
Signed-off-by: clyang82 <chuyang@redhat.com>
/retest |
if err != nil {
return errors.Wrap(err, "reconciling Prometheus additionalAlertManagerConfigs secret failed")
}
}
Late realization on my part (sorry!), but we need to delete the secret in case the admin had configured additional Alertmanagers and then removed them all.
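The cleanup described here can be sketched as a small reconcile function. This is only an illustration of the create-or-delete logic, not the change that was pushed: `secretStore` is a hypothetical in-memory stand-in for the Kubernetes client, and the secret name and `reconcileAdditionalAlertmanagerSecret` are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// secretStore is a hypothetical in-memory replacement for the Kubernetes
// API, keyed by secret name.
type secretStore map[string][]byte

// reconcileAdditionalAlertmanagerSecret applies the secret when at least one
// additional Alertmanager is configured, and deletes a leftover secret when
// the list is empty. It returns the action taken.
func reconcileAdditionalAlertmanagerSecret(store secretStore, name string, cfgs []string) string {
	if len(cfgs) == 0 {
		if _, exists := store[name]; exists {
			delete(store, name) // stale secret from a previous configuration
			return "deleted"
		}
		return "noop"
	}
	store[name] = []byte(strings.Join(cfgs, "\n"))
	return "applied"
}

func main() {
	store := secretStore{}
	const name = "additional-alertmanager-configs" // hypothetical secret name

	fmt.Println(reconcileAdditionalAlertmanagerSecret(store, name, []string{"am-1"}))
	fmt.Println(reconcileAdditionalAlertmanagerSecret(store, name, nil))
	fmt.Println(reconcileAdditionalAlertmanagerSecret(store, name, nil))
}
```

Handling the empty case explicitly means a removed configuration does not leave a secret behind that the Prometheus resource no longer references.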
…gured Signed-off-by: clyang82 <chuyang@redhat.com>
/test e2e-agnostic-upgrade |
/lgtm
This looks good from my end. The code changes are well isolated meaning that I'm confident that users that don't configure additional Alertmanagers have no risk of being impacted by the change.
/test e2e-agnostic-upgrade |
/retest Please review the full test history for this PR and help us cut down flakes. |
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
/lgtm |
/test e2e-agnostic-upgrade |
/retest Please review the full test history for this PR and help us cut down flakes. |
/lgtm ping @eparis for overriding the |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dislbenn, eparis, simonpasquier The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |