Add custom alertmanager go template to enhance email config #1721

@Priyankasaggu11929 commented Apr 16, 2021
Contributor

Description

This PR adds a custom alertmanager Go template (as a second data item in the alertmanager-application-monitoring secret). The new template enhances the existing email configuration.

Related JIRA ticket: https://issues.redhat.com/browse/MGDAPI-1580
Files updated:

  • pkg/products/monitoring/reconciler.go
  • pkg/products/monitoring/reconciler_test.go
  • pkg/products/monitoring/templateHelper.go
  • templates/monitoring/alertmanager/alertmanager-email-config.tmpl ~ (New alertmanager custom go template)
  • templates/monitoring/alertmanager/alertmanager-application-monitoring.yaml

Steps to verify changes

  • Install RHOAM using this PR branch.
  • Verify the following fields in the alertmanager config:
    • email configs:
      • Subject: '{{template "email.integreatly.subject" . }}'
      • html: '{{ template "email.integreatly.html" . }}'
    • pagerduty configs:
      • description: '{{template "email.integreatly.subject" . }}'
      • cluster_ID: <cluster-id>
      • cluster_name: <cluster-name>
      • console: <console-url>
      • grafana: <grafana-url>
    • Finally, at the bottom of the config, verify that the external custom template is linked properly:
      templates:
      - /etc/alertmanager/config/alertmanager-email-config.tmpl
      
  • Verify all unit tests defined under the pkg/products/monitoring/... are passing:
    command: ~ go test -v -coverprofile cover.out ./pkg/products/monitoring/...

[Image: custom_email_config]
[Image: custom_email_template]

  • The final result of the above changes would look like:

    • In Email:

      Following is the new email subject format:

        1. For single alert: 
               [cluster_name] alert_name (alert_severity) Firing | Resolved
       
               For example: 
                  [psaggu-stage-2865n] ThreeScaleContainerHighMemory (Critical) Firing
                  [psaggu-stage-2865n] ThreeScaleContainerHighMemory (Critical) Resolved
       
       2. For multiple alerts grouped by alertmanager:
              [cluster_name] Multiple Alerts* (alert_severity) Firing | Resolved
       
              For example:
                 [psaggu-stage-2865n] Multiple Alerts* (Critical) Firing
                 [psaggu-stage-2865n] Multiple Alerts* (Critical) Resolved
      

    [Image: email_subject_template]
    [Image: email_html_template]

    • In PagerDuty:
      [Image: pagerduty]
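
The email subject format described above can be sketched in plain Go. This is only an illustration of the formatting rule, not the template shipped in this PR: the function name `emailSubject`, its parameters, and the assumption that severity and status arrive in lower case and are capitalized for display are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// capitalize upper-cases the first byte of s. This is enough for the ASCII
// values assumed here ("critical", "firing", "resolved").
func capitalize(s string) string {
	if s == "" {
		return s
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

// emailSubject is a hypothetical helper that mirrors the subject format:
//
//	[cluster_name] alert_name (alert_severity) Firing | Resolved
//
// Exactly one alert name is shown as-is; any other count is treated as a
// group and rendered as "Multiple Alerts*".
func emailSubject(clusterName, status, severity string, alertNames []string) string {
	name := "Multiple Alerts*"
	if len(alertNames) == 1 {
		name = alertNames[0]
	}
	return fmt.Sprintf("[%s] %s (%s) %s", clusterName, name, capitalize(severity), capitalize(status))
}

func main() {
	fmt.Println(emailSubject("psaggu-stage-2865n", "firing", "critical",
		[]string{"ThreeScaleContainerHighMemory"}))
	fmt.Println(emailSubject("psaggu-stage-2865n", "resolved", "critical",
		[]string{"ThreeScaleContainerHighMemory", "AnotherAlert"}))
}
```

In the PR itself this logic lives in the alertmanager template (the "email.integreatly.subject" definition), so the Go function above is only a reading aid for the expected output.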

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added a test case that will be used to verify my changes
  • Verified independently on a cluster by reviewer

@openshift-ci-robot

Hi @Priyankasaggu11929. Thanks for your PR.

I'm waiting for an integr8ly member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Priyankasaggu11929
Contributor Author

/assign @laurafitzgerald

Please review. Thanks!

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from laurafitzgerald after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Priyankasaggu11929 Priyankasaggu11929 force-pushed the add-custom-alertmanager-email-template branch 2 times, most recently from cdca6e3 to 0e7a935 Compare April 16, 2021 14:43
@pepedocs

Hi @laurafitzgerald, just to add: the following describes how we customized the alertmanager email output and some fields of the pagerduty output.

Here, an output is what we get as a result of a firing alert (e.g. an alertmanager email or a pagerduty incident).

  1. Alertmanager supports custom templates to customize alert outputs, for example emails, pagerduty incidents, slack messages, etc. Currently, integreatly uses the default email template, which can be found here.

  2. We used a similar method here to customize the alertmanager email output so that it carries more information that is relevant and useful to SRE. For example, we need the alert title to be much more readable in the email subject, and the subject itself to be as uncluttered as possible.

  3. The goal was to customize the default email template in 1. by adding some information (e.g. cluster info) at runtime in the integreatly operator.

Challenge:

  • The default alertmanager template in 1. is already in Go template format, with variables that can only be resolved in the alertmanager scope (e.g. .Alerts). Therefore, we cannot easily add evaluations/variables that have the integreatly scope (e.g. cluster information).

Solution:

  • Initialize the email template alertmanager-email-config.tmpl with cluster info placeholder strings, for example ${CLUSTERID}. At runtime, in the integreatly operator scope, replace these placeholders with the actual cluster information using a simple string replace. The output is the new alertmanager template alertmanager-email-config.tmpl that alertmanager consumes.
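
The placeholder replacement described in the solution could look roughly like the sketch below. It is not the code from this PR: the function name `renderClusterPlaceholders` is hypothetical, and `${CLUSTERNAME}` is an assumed placeholder alongside the `${CLUSTERID}` one mentioned above. The point is that only the literal `${...}` tokens are touched, so alertmanager's own `{{ ... }}` actions pass through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// renderClusterPlaceholders is a hypothetical helper. It replaces each
// "${NAME}" token in raw with the matching value from values and leaves
// everything else, including alertmanager's {{ ... }} actions, untouched.
func renderClusterPlaceholders(raw string, values map[string]string) string {
	pairs := make([]string, 0, len(values)*2)
	for name, value := range values {
		pairs = append(pairs, "${"+name+"}", value)
	}
	return strings.NewReplacer(pairs...).Replace(raw)
}

func main() {
	// A tiny stand-in for alertmanager-email-config.tmpl (assumed content).
	raw := `{{ define "email.integreatly.subject" }}[${CLUSTERNAME}] {{ .GroupLabels.alertname }}{{ end }}
Cluster ID: ${CLUSTERID}`

	out := renderClusterPlaceholders(raw, map[string]string{
		"CLUSTERID":   "example-cluster-id",
		"CLUSTERNAME": "example-cluster",
	})
	fmt.Println(out)
}
```

Using a distinct `${...}` syntax for the operator-side values avoids having to escape the alertmanager template actions, which is the difficulty noted under "Challenge".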

@laurafitzgerald
Contributor

@Priyankasaggu11929 thanks for the PR. 👏 This change should get visibility from more than one engineer on the team. Could you also share it in either the RHOAM engineering or the customer success team channel for visibility?

Contributor

@briangallagher left a comment

@pepedocs @Priyankasaggu11929 One small issue above and a question. Do you think there is any risk in the new email template being incompatible with newer versions of AlertManager?


// For OpenShift console
grafanaRouteName = "grafana-route"
grafanaRouteNamespace = "redhat-rhoam-middleware-monitoring-operator"

Contributor

@Priyankasaggu11929 We need to consider RHMI here. The namespace may be "redhat-rhmi-middleware-monitoring-operator". You can probably use "config.getOperatorNamespace()".

Contributor Author

@briangallagher,

I'm testing the grafana route by passing the namespace value from config.getOperatorNamespace() instead of creating a constant such as grafanaRouteNamespace above. It works fine in the reconciler.go file (L807-L810), and I'm able to verify the changes in the alertmanager config.

	grafanaRoute := &routev1.Route{}
	if err := serverClient.Get(context.TODO(), types.NamespacedName{Name: grafanaRouteName, Namespace: r.Config.GetOperatorNamespace()}, grafanaRoute); err != nil {
		return integreatlyv1alpha1.PhaseFailed, fmt.Errorf("failed to fetch OpenShift console URL details for alertmanager config: %w", err)
	}

However, I'm unable to find a way to use the same function, config.getOperatorNamespace(), in the reconciler_test.go file (L414-L422). Could you suggest how I can do that?

	grafanaRoute := &routev1.Route{
		ObjectMeta: metav1.ObjectMeta{
			Name:      grafanaRouteName,
			Namespace: <THIS_PLACE_HOLDER>,
		},
		Spec: routev1.RouteSpec{
			Host: "example-grafana.com",
		},
	}

Contributor

For the test, just hardcode in a relevant namespace

Contributor Author

Hard-coding the grafana route namespace in the reconciler_test.go file causes one of the unit tests defined here to fail.

For now, I have rebased the branch on upstream master and removed the grafana route changes from all the files where they were referenced. I've tested it again on a staging OSD cluster and it's working fine for me.

I'll raise a separate PR for the grafana route changes once this is merged.

@briangallagher
Contributor

Confirmed on cluster with RHOAM install.

@pepedocs

pepedocs commented Apr 30, 2021

Sorry forgot to add mention. @briangallagher

Do you think there is any risk in the new email template being incompatible with newer versions of AlertManager?

Yes, but if that happens it would also break all current assumptions about writing a custom template. We basically just duplicated the default template and then modified the areas that will contain cluster-specific info. If AM deprecates a template variable and makes it unusable, for example, this will break the custom template we made here, but all other users of that variable will also be affected, so it is unlikely. Apart from this risk, I don't see anything else TBH.

@Priyankasaggu11929 Priyankasaggu11929 force-pushed the add-custom-alertmanager-email-template branch from 0e7a935 to 5290159 Compare May 1, 2021 07:21
@Priyankasaggu11929 Priyankasaggu11929 changed the title Add custom alertmanager go template to enhance email config WIP: Add custom alertmanager go template to enhance email config May 1, 2021
@Priyankasaggu11929 Priyankasaggu11929 force-pushed the add-custom-alertmanager-email-template branch from 5290159 to f19befe Compare May 1, 2021 08:26
@Priyankasaggu11929 Priyankasaggu11929 changed the title WIP: Add custom alertmanager go template to enhance email config Add custom alertmanager go template to enhance email config May 1, 2021
@Priyankasaggu11929
Contributor Author

/ok-to-test

@openshift-ci-robot

@Priyankasaggu11929: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Priyankasaggu11929
Contributor Author

@briangallagher @laurafitzgerald, could either of you please mark the PR /ok-to-test?

Thank you!

@briangallagher
Contributor

/ok-to-test

@Priyankasaggu11929
Contributor Author

Hi @briangallagher, just re-iterating.

All the prow tests passed successfully. Is it fine to merge now? Thank you.

@briangallagher
Contributor

/lgtm Verified based on above steps

@briangallagher
Contributor

@Priyankasaggu11929 The PR is blocked from merging due to the ongoing release. As soon as that blocking JIRA is removed (release complete), you can ask one of the approvers to approve.

@pmccarthy
Contributor

/approve

@openshift-ci
Contributor

openshift-ci bot commented May 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmccarthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@briangallagher
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 11, 2021
@openshift-merge-robot openshift-merge-robot merged commit 20036db into integr8ly:master May 11, 2021
MStokluska pushed a commit to MStokluska/integreatly-operator that referenced this pull request May 11, 2021
…alertmanager-email-template

Add custom alertmanager go template to enhance email config