Emit Datadog monitors based on Kubernetes state.


Astro


Astro is designed to simplify Datadog monitor administration. It is an operator that emits Datadog monitors based on Kubernetes state: it responds to changes to resources in your Kubernetes cluster and manages Datadog monitors according to the configured state.

Want to learn more? Fairwinds holds office hours on Zoom the first Friday of every month, at 12pm Eastern. You can also reach out via email at opensource@fairwinds.com.

Installing

The Astro helm chart is the preferred way to install Astro into your cluster.

Configuration

A combination of environment variables and a YAML file is used to configure the application. An example configuration file, conf-example.yml, is available in the repository.

Environment Variables

| Variable | Description | Required | Default |
|---|---|---|---|
| DD_API_KEY | The API key for your Datadog account. | Y | |
| DD_APP_KEY | The app key for your Datadog account. | Y | |
| OWNER | A unique name to designate as the owner. This is applied as a tag to identify managed monitors. | N | astro |
| DEFINITIONS_PATH | The path to monitor definition configurations. This can be a local path or a URL. Multiple paths should be separated by a ; | N | conf.yml |
| DRY_RUN | When set to true, monitors will not be managed in Datadog. | N | false |
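
To make the defaults and the ;-separated DEFINITIONS_PATH concrete, here is a minimal Go sketch of how such settings could be read. The Settings type and loadSettings function are hypothetical names for illustration and are not taken from Astro's code; treating DRY_RUN as a case-insensitive "true" is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Settings mirrors the environment variables in the table above.
// The type and field names are illustrative, not Astro's own.
type Settings struct {
	APIKey          string
	AppKey          string
	Owner           string
	DefinitionPaths []string
	DryRun          bool
}

// loadSettings reads values through lookup (for example os.Getenv),
// applies the documented defaults, and rejects missing required keys.
func loadSettings(lookup func(string) string) (Settings, error) {
	s := Settings{
		APIKey: lookup("DD_API_KEY"),
		AppKey: lookup("DD_APP_KEY"),
		Owner:  lookup("OWNER"),
		// Assumption: any casing of "true" enables dry-run mode.
		DryRun: strings.EqualFold(lookup("DRY_RUN"), "true"),
	}
	if s.APIKey == "" || s.AppKey == "" {
		return Settings{}, fmt.Errorf("DD_API_KEY and DD_APP_KEY are required")
	}
	if s.Owner == "" {
		s.Owner = "astro" // documented default
	}
	paths := lookup("DEFINITIONS_PATH")
	if paths == "" {
		paths = "conf.yml" // documented default
	}
	// Multiple paths are separated by a semicolon.
	for _, p := range strings.Split(paths, ";") {
		if p = strings.TrimSpace(p); p != "" {
			s.DefinitionPaths = append(s.DefinitionPaths, p)
		}
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("owner=%s dry_run=%t definitions=%v\n", s.Owner, s.DryRun, s.DefinitionPaths)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the sketch easy to exercise with a plain map.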

Configuration File

A configuration file is used to define your monitors. These are organized as rulesets, each of which consists of the type of resource the ruleset applies to, the annotations a resource must have to match, and a set of monitors to manage for that resource. Go templating syntax may be used in your monitors, and values will be inserted from each Kubernetes object that matches the ruleset. There is also a section called cluster_variables in which you can define your own variables; these can be inserted into the monitor templates.

---
cluster_variables:
  var1: test
  var2: test2
rulesets:
- type: deployment
  match_annotations:
  - name: astro/owner
    value: astro
  monitors:
    dep-replica-alert:
      name: "Deployment Replica Alert - {{ .ObjectMeta.Name }}"
      type: metric alert
      query: "max(last_10m):max:kubernetes_state.deployment.replicas_available{kubernetescluster:foobar,deployment:{{ .ObjectMeta.Name }}} <= 0"
      message: |-
        {{ "{{#is_alert}}" }}
        Available replicas is currently 0 for {{ .ObjectMeta.Name }}
        {{ "{{/is_alert}}" }}
        {{ "{{^is_alert}}" }}
        Available replicas is no longer 0 for {{ .ObjectMeta.Name }}
        {{ "{{/is_alert}}" }}
      tags: []
      options:
        no_data_timeframe: 60
        notify_audit: false
        notify_no_data: false
        renotify_interval: 5
        new_host_delay: 5
        evaluation_delay: 300
        timeout_h: 1
        escalation_message: ""
        thresholds:
          critical: 2
          warning: 1
          unknown: -1
          ok: 0
          critical_recovery: 0
          warning_recovery: 0
        include_tags: true
        require_full_window: true
        locked: false
  • cluster_variables: (dict). A collection of variables that can be used in monitors. Reference them in monitor templates by prefixing the variable name with ClusterVariables, e.g. {{ ClusterVariables.var1 }}.
  • rulesets: (List). A collection of rulesets. A ruleset consists of a Kubernetes resource type, annotations the resource must have to be considered valid, and a collection of monitors to manage for the resource.
    • match_annotations: (List). A collection of name/value pairs of annotations that must be present on the resource to manage it.
    • bound_objects: (List). A collection of object types that are bound to this object. For instance, if you have a ruleset for a namespace, you can bind other objects like deployments, services, etc. Then, when a bound object in the namespace is updated, the ruleset applies to it.
    • monitors: (Map). A collection of monitors to manage for any resource that matches the rules defined.
      • Monitor Identifier (map key: unique and arbitrary, it should only include alpha characters and -)
        • name: Name of the Datadog monitor.
        • type: The type of the monitor, chosen from:
          • metric alert
          • service check
          • event alert
          • query alert
          • composite
          • log alert
        • query: The monitor query to notify on.
        • message: A message included with monitor notifications.
        • tags: A list of tags to add to your monitor.
        • options: A dict of options, consisting of the following:
          • no_data_timeframe: Number of minutes before a monitor will notify if data stops reporting.
          • notify_audit: boolean that indicates whether tagged users are notified if the monitor changes.
          • notify_no_data: boolean that indicates if the monitor notifies if data stops reporting.
          • renotify_interval: Number of minutes after the last notification a monitor will re-notify.
          • new_host_delay: Number of seconds to wait for a new host before evaluating the monitor status.
          • evaluation_delay: Number of seconds to delay evaluation.
          • timeout_h: Number of hours before the monitor will automatically resolve if it's not reporting data.
          • escalation_message: Message to include with re-notifications.
          • thresholds: Map of thresholds for the alert. Valid options are:
            • ok
            • critical
            • warning
            • unknown
            • critical_recovery
            • warning_recovery
          • include_tags: When true, notifications from this monitor automatically insert triggering tags into the title.
          • require_full_window: boolean indicating if a monitor needs a full window of data to be evaluated.
          • locked: boolean indicating if changes are only allowed from the creator or admins.
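
As a rough illustration of how match_annotations could decide whether a ruleset applies, the Go sketch below checks a resource's annotations against a list of name/value pairs. AnnotationRule and matches are hypothetical names, and the sketch assumes every listed annotation must be present with exactly the given value; Astro's own matching logic may differ.

```go
package main

import "fmt"

// AnnotationRule is a hypothetical stand-in for one match_annotations entry.
type AnnotationRule struct {
	Name  string
	Value string
}

// matches reports whether every rule is satisfied by the resource's
// annotations (assumed semantics: all rules must match exactly).
func matches(rules []AnnotationRule, annotations map[string]string) bool {
	for _, r := range rules {
		if v, ok := annotations[r.Name]; !ok || v != r.Value {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors the match_annotations entry in the example configuration.
	rules := []AnnotationRule{{Name: "astro/owner", Value: "astro"}}

	fmt.Println(matches(rules, map[string]string{"astro/owner": "astro"}))
	fmt.Println(matches(rules, map[string]string{"team": "web"}))
}
```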

A Note on Templating

Datadog's templating language is very similar to Go templating, so a template variable intended for Datadog must be "escaped" by inserting it as a Go template string literal:

{{ "{{/is_alert}}" }}
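
The effect of this escaping can be seen with Go's standard text/template package. The sketch below is illustrative only: the resource and objectMeta types are minimal stand-ins for a Kubernetes object, and render is a hypothetical helper, not Astro's implementation.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// objectMeta and resource are minimal stand-ins for a Kubernetes object.
type objectMeta struct{ Name string }
type resource struct{ ObjectMeta objectMeta }

// render executes a monitor field as a Go template against obj.
func render(text string, obj resource) (string, error) {
	tmpl, err := template.New("monitor").Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, obj); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// The quoted literals pass through as Datadog template tags, while
	// .ObjectMeta.Name is filled in from the Kubernetes object.
	msg := `{{ "{{#is_alert}}" }}Available replicas is currently 0 for {{ .ObjectMeta.Name }}{{ "{{/is_alert}}" }}`
	out, err := render(msg, resource{ObjectMeta: objectMeta{Name: "web"}})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(out)
}
```

Without the quoting, a bare Datadog tag such as {{#is_alert}} is rejected by the Go template parser, which is why the literal form is needed.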

Overriding Configuration

It is possible to override monitor elements using Kubernetes resource annotations.

You can annotate an object like so to override the name of the monitor:

annotations:
  astro.fairwinds.com/override.dep-replica-alert.name: "Deployment Replicas Alert"

In the example above, we modify the dep-replica-alert monitor (dep-replica-alert is the Monitor Identifier from the config) to have a new name. As of now, the only fields that can be overridden are:

  • name
  • message
  • query
  • type

Additionally, templating in the override is currently not available.
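
The annotation key follows the pattern astro.fairwinds.com/override.<Monitor Identifier>.<field>. As a hedged sketch of how such keys could be interpreted, the Go code below applies override annotations to a monitor. The Monitor type and applyOverrides function are hypothetical, not Astro's code; values are copied verbatim, consistent with templating being unavailable in overrides.

```go
package main

import (
	"fmt"
	"strings"
)

const overridePrefix = "astro.fairwinds.com/override."

// Monitor holds only the fields listed above as overridable.
type Monitor struct {
	Name, Message, Query, Type string
}

// applyOverrides returns a copy of m with any override annotations for the
// monitor identifier id applied. Unknown fields and other monitors' keys
// are ignored.
func applyOverrides(id string, m Monitor, annotations map[string]string) Monitor {
	for key, value := range annotations {
		if !strings.HasPrefix(key, overridePrefix) {
			continue
		}
		rest := strings.TrimPrefix(key, overridePrefix)
		// Identifiers contain only letters and "-", so the last "."
		// separates the identifier from the field name.
		dot := strings.LastIndex(rest, ".")
		if dot < 0 || rest[:dot] != id {
			continue
		}
		switch rest[dot+1:] {
		case "name":
			m.Name = value
		case "message":
			m.Message = value
		case "query":
			m.Query = value
		case "type":
			m.Type = value
		}
	}
	return m
}

func main() {
	annotations := map[string]string{
		"astro.fairwinds.com/override.dep-replica-alert.name": "Deployment Replicas Alert",
	}
	base := Monitor{Name: "Deployment Replica Alert - web", Type: "metric alert"}
	fmt.Printf("%+v\n", applyOverrides("dep-replica-alert", base, annotations))
}
```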

Contributing

PRs welcome! Check out the Contributing Guidelines, Code of Conduct, and Roadmap for more information.

Further Information

A history of changes to this project can be viewed in the Changelog.

If you'd like to learn more about Astro, or if you'd like to speak with a Kubernetes expert, you can contact info@fairwinds.com or visit our website.

License

Apache License 2.0
