@shalomtuby shalomtuby commented Sep 28, 2023

Why

This enhancement adds the ability to send alert notifications to a GCP PubSub topic.

We found several examples that implement this by adding components or services, such as a cloud function or a local server, to "proxy" the alerts into the GCP PubSub topic (for instance, https://modules.prosody.im/mod_pubsub_alertmanager.html).

We want to avoid unnecessary additional components in that process.

This use case can benefit more developers and projects and add more opportunities to use the Alert Manager system for their needs.

How

There is more than one way to implement this integration:

  1. Using the official Google PubSub client, which is the official way to publish messages to a GCP PubSub topic.
  2. Using the Google PubSub REST API endpoint.

The REST API approach was chosen to reduce the dependencies that will be added to the code base for GCP PubSub notifier support, and to reuse the existing code base for the Webhook integration.

Do we need a new notifier (instead of using the existing Webhook)?
The existing webhook notifier lacks two things that the GCP PubSub REST API requires:

  1. Authentication using Google OAuth2, with the ability to refresh the token on the fly.
  2. The :publish endpoint requires a strict message structure with a different body schema, i.e.:
{
   "messages": [
      {
         "data": "the message content",
         "attributes": {"key": "value"},
         "orderingKey": "ordering_key"
      }
   ]
}

Prerequisite

  1. GCP account
  2. PubSub Topic
  3. Service Account with a minimal permission role roles/pubsub.publisher

Configuration

type GooglePubsubConfig struct {
  // The GCP project ID.
  Project string `yaml:"project" json:"project"`
  // The PubSub target topic name.
  Topic string `yaml:"topic" json:"topic"`

  Authorization *GooglePubsubAuthorization `yaml:"authorization,omitempty" json:"authorization,omitempty"`
  // Used in unit tests to mock the endpoint.
  TemplateURL *URL `yaml:"templateURL,omitempty" json:"templateUrl,omitempty"`
  ...
}

type GooglePubsubAuthorization struct {
  // The GCP service-account JSON file on disk.
  // We can add more auth implementations later, like `google auth default credentials`.
  ServiceAccountFile string `yaml:"service_account_file,omitempty" json:"service_account_file,omitempty"`
}

The receiver config also supports a global setting, google_service_account_file, as a shared configuration for all google_pubsub receiver instances. (We can reuse the global GCP service account setting for future GCP-related integrations.)

An example alertmanager.yml configuration:

global:
  # as a global variable
  google_service_account_file: "credentials.json"

...

receivers:
  - name: 'pubsub'
    google_pubsub_configs:
      - project: 'project-name'
        topic: 'topic-name'
        authorization:
          service_account_file: "credentials.json"
        send_resolved: true

Implementation

On receiver initialization, the receiver reads the service account file from disk and constructs a new TokenSource from it, with the required scope https://www.googleapis.com/auth/pubsub, using the Google OAuth2 library.

The receiver creates a new HttpClient attached to the new TokenSource and stores the new client on the Notifier config.

The receiver generates the target URL based on the URL template, Project, and Topic configuration, and stores it on the Notifier config.

When the notifier receives a new group event, it constructs a new PubSub publish request containing a single message that carries the "original" Alert Manager group message.

The notifier fills the PubSub message attributes (a key-value structure) with all available attributes of the source: GroupKey, GroupLabels, CommonLabels, and CommonAnnotations.

The notifier sets the PubSub message orderingKey to the GroupKey value.

The notifier sends the message using the httpClient from the notifier config.

The notifier handles the response based on the status code family (same as Webhook does):

  • 5xx status code family → retry
  • 4xx status code family → don't retry

On success, the :publish endpoint returns the created PubSub message ID, and we print it to the log.

@shalomtuby shalomtuby force-pushed the pubsub-receiver branch 3 times, most recently from 50ee2eb to 58c6160 Compare October 5, 2023 06:41
@shalomtuby shalomtuby marked this pull request as ready for review October 5, 2023 06:48
@TheMeier
Contributor

TheMeier commented Mar 9, 2024

@shalomtuby needs rebase

@LotadTech

@TheMeier

Hey, are we able to rebase this and try to get this implemented? It would be very handy for users on GCP.

If @shalomtuby is not here we can move this to a new PR and push this along?

@shalomtuby shalomtuby force-pushed the pubsub-receiver branch 4 times, most recently from 3432233 to e4ee1ee Compare May 27, 2024 17:48
…ish endpoint

Signed-off-by: Shalom Tuby <shalom.tuby@satoricyber.com>
@shalomtuby
Author

@TheMeier

Hey, are we able to rebase this and try to get this implemented? It would be very handy for users on GCP.

If @shalomtuby is not here we can move this to a new PR and push this along?

Hi @BrandonDalton,
PR rebased onto 'main'.
Could you please help me understand why the test is failing? Should I include the react-app dist output within the PR?
I'm not a Go expert, so code review will be helpful here.
Thanks 🙏

@jkroepke
Member

@shalomtuby the tests are just unstable, you can observe them here, too: #3845

Someone had to retry your tests.

@LotadTech

LotadTech commented May 29, 2024

Looks like some issue with

port 44911: listen tcp 127.0.0.1:44911: bind: address already in use\n" ts=2024-05-27T17:58:06.774Z caller=main.go:225 level=error msg="unable to initialize gossip mesh" err="create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on \"127.0.0.1\" port 44911: listen tcp 127.0.0.1:44911: bind: address already in use"

@LotadTech

Can we get someone to rerun the tests?

@shalomtuby shalomtuby closed this Dec 3, 2025