Kennel

Keep datadog monitors/dashboards/etc in version control, avoid chaotic management via UI.

  • Documented, reusable, automated, and searchable configuration
  • Changes are PR reviewed and auditable
  • Good defaults like no-data / re-notify are preselected
  • Reliable cleanup with automated deletion

Install

  • create a new private kennel repo for your organization (do not fork this repo)
  • use the template folder as starting point:
    git clone git@github.com:your-org/kennel.git
    git clone git@github.com:grosser/kennel.git seed
    mv seed/template/* kennel/
    cd kennel && git add . && git commit -m 'initial'
  • add a basic project and team so others can copy-paste to get started
  • set up a travis build for your repo
  • uncomment the .travis.yml section for automated github PR feedback and datadog updates on merge
  • follow Setup in your repo's Readme.md

Structure

  • projects/ monitors/dashboards/etc scoped by project
  • teams/ team definitions
  • parts/ monitors/dashes/etc that are used by multiple projects
  • generated/ projects as json, to show current state and proposed changes in PRs

Workflows

Adding a team

# teams/my_team.rb
module Teams
  class MyTeam < Kennel::Models::Team
    defaults(
      slack: -> { "my-alerts" },
      email: -> { "my-team@example.com" }
    )
  end
end

Adding a new monitor

  • use datadog monitor UI to create a monitor
  • get the id from the url
  • RESOURCE=monitor ID=12345 bundle exec rake kennel:import
  • see below
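
The id in the second step is the number in the monitor's URL. A minimal sketch of pulling it out in Ruby, assuming a URL of the form https://app.datadoghq.com/monitors/12345; the host, the path layout and the helper name monitor_id_from_url are assumptions made for illustration and are not part of kennel:

```ruby
require "uri"

# Hypothetical helper, not part of kennel: extract the numeric id from a monitor URL.
# Assumes the id is the last all-digit path segment.
def monitor_id_from_url(url)
  segments = URI.parse(url).path.split("/")
  id = segments.reverse.find { |segment| segment.match?(/\A\d+\z/) }
  raise ArgumentError, "no numeric id found in #{url}" unless id
  Integer(id, 10)
end

puts monitor_id_from_url("https://app.datadoghq.com/monitors/12345")
```

The resulting number is what gets passed as ID= to the import task.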

Updating an existing monitor

  • find or create a project in projects/
  • add a monitor to parts: [ list
# projects/my_project.rb
class MyProject < Kennel::Models::Project
  defaults(
    team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
    parts: -> {
      [
        Kennel::Models::Monitor.new(
          self,
          id: -> { 123456 }, # id from datadog url, not necessary when creating a new monitor
          type: -> { "query alert" },
          kennel_id: -> { "load-too-high" }, # make up a unique name
          name: -> { "Foobar Load too high" }, # nice descriptive name that will show up in alerts and emails
          message: -> {
            # Explain what behavior to expect and how to fix the cause. Use #{super()} to add team notifications.
            <<~TEXT
              Foobar will be slow and that could cause Barfoo to go down.
              Add capacity or debug why it is suddenly slow.
              #{super()}
            TEXT
          },
          query: -> { "avg(last_5m):avg:system.load.5{hostgroup:api} by {pod} > #{critical}" }, # replace actual value with #{critical} to keep them in sync
          critical: -> { 20 }
        )
      ]
    }
  )
end
  • run bundle exec rake plan; an update to the existing monitor should be shown (not Create / Delete)
  • alternatively: bundle exec rake generate to only update the generated json files
  • review changes then git commit
  • make a PR ... get reviewed ... merge
  • datadog is updated by travis

Adding a new dashboard

  • go to datadog dashboard UI and click on New Dashboard to create a dashboard
  • get the id from the url
  • RESOURCE=dash ID=12345 bundle exec rake kennel:import
  • see below

Updating an existing dashboard

  • find or create a project in projects/
  • add a dashboard to parts: [ list
class MyProject < Kennel::Models::Project
  defaults(
    team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
    parts: -> {
      [
        Kennel::Models::Dash.new(
          self,
          id: -> { 123457 }, # id from datadog url, not needed when creating a new dashboard
          title: -> { "My Dashboard" },
          description: -> { "Overview of foobar" },
          template_variables: -> { ["environment"] }, # see https://docs.datadoghq.com/api/?lang=ruby#timeboards
          kennel_id: -> { "overview-dashboard" }, # make up a unique name
          definitions: -> {
            [ # An array of arrays, each one a graph in the dashboard; alternatively use a hash for finer control
              [
                # title, viz, type, query; to find the values, edit an existing graph and look at its json definition
                "Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"
              ],
              [
                # queries can be an Array as well, this will generate multiple requests
                # for a single graph
                "Graph name", "timeseries", "area", ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"],
                # add events too ...
                events: [{q: "tags:foobar,deploy", tags_execution: "and"}]
              ]
            ]
          }
        )
      ]
    }
  )
end
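
To make the shorthand in definitions more concrete, here is a rough sketch of how one row could be expanded into a graph hash with one request per query. The helper name expand_graph and the exact output shape are assumptions for illustration; kennel's real expansion may differ:

```ruby
# Hypothetical helper, not kennel's actual code: expand one shorthand row
# [title, viz, type, queries, options] into a graph hash.
def expand_graph(title, viz, type, queries, options = {})
  # A single query string and an array of queries are treated alike:
  # every query becomes its own request.
  requests = Array(queries).map { |query| { q: query, type: type } }
  definition = { viz: viz, requests: requests }
  definition[:events] = options[:events] if options[:events]
  { title: title, definition: definition }
end

rows = [
  ["Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"],
  [
    "Graph name", "timeseries", "area",
    ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"],
    events: [{ q: "tags:foobar,deploy", tags_execution: "and" }]
  ]
]

graphs = rows.map { |row| expand_graph(*row) }
graphs.each do |graph|
  puts "#{graph[:title]}: #{graph[:definition][:requests].size} request(s)"
end
```

The trailing events: entry in the second row is an ordinary Ruby hash at the end of the array, which is why it can be passed along as the options argument.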

Adding a new screenboard

  • similar to dash.rb
  • add to parts: list
Kennel::Models::Screen.new(
  self,
  board_title: -> { "test-board" },
  kennel_id: -> { "test-screen" },
  widgets: -> {
    [
      {text: "Hello World", height: 6, width: 24, x: 0, y: 0, type: "free_text"},
      {title_text: "CPU", height: 12, width: 36, timeframe: "1mo", x: 0, y: 6, type: "timeseries", tile_def: {viz: "timeseries", requests: [{q: "avg:system.cpu.user{*}", type: "line"}]}}
    ]
  }
)

Skipping validations

Some validations might be too strict for your use case or just wrong. Please open an issue, and use the validate: -> { false } option to unblock yourself in the meantime.

Linking with kennel_ids

To link to existing monitors via their kennel_id

  • Screens uptime widgets can use monitor: {id: "foo:bar"}
  • Screens alert_graph widgets can use alert_id: "foo:bar"

Debugging changes locally

  • rebase on updated master to not undo other changes
  • figure out project name by converting the class name to snake-case
  • run PROJECT=foo bundle exec rake kennel:update_datadog to test changes for a single project
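
For the snake-case step, a small helper along these lines can do the conversion. It is a sketch, not kennel's own code, and kennel may treat acronyms or namespaced class names differently:

```ruby
# Hypothetical helper: convert a class name such as "MyProject" to snake-case.
def snake_case(class_name)
  class_name
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # "APIGateway" -> "API_Gateway"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # "MyProject"  -> "My_Project"
    .downcase
end

puts snake_case("MyProject") # prints "my_project"
```

With that name, the command from the last step becomes PROJECT=my_project bundle exec rake kennel:update_datadog.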

Listing un-muted alerts

Run rake kennel:alerts TAG=service:my-service to see all un-muted alerts for a given datadog monitor tag.
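
Conceptually, this is a filter over monitors by tag, alert state and mute status. The sketch below shows that idea on in-memory sample data. The field names (tags, overall_state, options.silenced) and the helper unmuted_alerts are assumptions for illustration; this is not the implementation of the rake task:

```ruby
# Sample data shaped like a trimmed-down monitor list; field names are assumptions.
MONITORS = [
  { name: "Load too high", tags: ["service:my-service"], overall_state: "Alert", options: { silenced: {} } },
  { name: "Disk full", tags: ["service:my-service"], overall_state: "Alert", options: { silenced: { "*" => nil } } },
  { name: "Queue backlog", tags: ["service:other"], overall_state: "Alert", options: { silenced: {} } },
  { name: "Latency", tags: ["service:my-service"], overall_state: "OK", options: { silenced: {} } }
].freeze

# Hypothetical filter: alerting monitors that carry the tag and have no silencing configured.
def unmuted_alerts(monitors, tag)
  monitors.select do |monitor|
    monitor[:tags].include?(tag) &&
      monitor[:overall_state] == "Alert" &&
      monitor.dig(:options, :silenced).to_h.empty?
  end
end

puts unmuted_alerts(MONITORS, "service:my-service").map { |monitor| monitor[:name] }
```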

Examples

Reusable monitors/dashes/etc

Add to parts/<folder>.

module Monitors
  class LoadTooHigh < Kennel::Models::Monitor
    defaults(
      name: -> { "#{project.name} load too high" },
      message: -> { "Shut it down!" },
      type: -> { "query alert" },
      query: -> { "avg(last_5m):avg:system.load.5{hostgroup:#{project.kennel_id}} by {pod} > #{critical}" }
    )
  end
end

Reuse it in multiple projects.

class Database < Kennel::Models::Project
  defaults(
    team: -> { Kennel::Models::Team.new(slack: -> { 'foo' }, kennel_id: -> { 'foo' }) },
    parts: -> { [Monitors::LoadTooHigh.new(self, critical: -> { 13 })] }
  )
end
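
The reason critical: -> { 13 } changes the query is that every setting is a lambda evaluated on the instance, so #{critical} inside query picks up whichever value is in effect. A minimal plain-Ruby sketch of that pattern follows. SketchRecord and LoadTooHighSketch are hypothetical stand-ins and the query is shortened; this is not kennel's real implementation:

```ruby
# Hypothetical stand-in for a settings base class; not kennel's real implementation.
class SketchRecord
  # Each `name: -> { ... }` pair becomes an instance method on the class.
  def self.defaults(options)
    options.each { |name, block| define_method(name, &block) }
  end

  # Options given to `new` become singleton methods, so they override class defaults.
  def initialize(options = {})
    options.each { |name, block| define_singleton_method(name, &block) }
  end
end

class LoadTooHighSketch < SketchRecord
  defaults(
    critical: -> { 20 },
    query: -> { "avg(last_5m):avg:system.load.5 > #{critical}" }
  )
end

puts LoadTooHighSketch.new.query
puts LoadTooHighSketch.new(critical: -> { 13 }).query
```

The first call uses the class-level default threshold, the second uses the per-instance override, and the query text stays in sync with the threshold in both cases.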

Integration testing

rake play
cd template
rake kennel:plan

Then make changes to play around. Do not commit these changes, and when you are done, delete everything and run rake kennel:update to revert.

To make changes via the UI, create a new free datadog account and use its credentials instead.

Author

Michael Grosser
michael@grosser.it
License: MIT