Keep Datadog monitors/dashboards/etc. in version control and avoid chaotic management via the UI.
- Documented, reusable, automated, and searchable configuration
- Changes are PR reviewed and auditable
- Good defaults like no-data / re-notify are preselected
- Reliable cleanup with automated deletion
- create a new private `kennel` repo for your organization, clone this repo, and push the contents of the `template` folder into the private repo
- uncomment the `.travis.yml` section for automated GitHub PR feedback and Datadog updates on merge
- set up a Travis build for the repo
- add basic projects and teams so others can copy-paste to get started
- `projects/` monitors/dashboards/etc. scoped by project
- `teams/` team definitions
- `parts/` monitors/dashes/etc. that are used by multiple projects
- `generated/` projects as json, to show current state and proposed changes in PRs
- clone the repo
- `gem install bundler && bundle install`
- go to Datadog API Settings
- find or create your personal "Application Key" and add it to `.env` as `DATADOG_APP_KEY=` (it will be on the last page if new)
- copy any "API Key" and add it to `.env` as `DATADOG_API_KEY=`
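To double-check the setup steps above, a small script can parse `.env` and report which Datadog key is still missing. This is a minimal sketch, not part of kennel: it assumes plain `KEY=value` lines (no quoting or `export` prefixes), and `parse_env` / `missing_keys` are hypothetical helper names.

```ruby
# Hypothetical setup check, not part of kennel.
# Assumes .env holds plain KEY=value lines; quotes and `export` are not handled.
REQUIRED_KEYS = %w[DATADOG_APP_KEY DATADOG_API_KEY].freeze

# Parses KEY=value lines into a Hash, skipping blank lines and # comments.
def parse_env(text)
  text.each_line.each_with_object({}) do |raw_line, env|
    line = raw_line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    env[key.strip] = value.to_s.strip
  end
end

# Returns the required keys that are absent or have an empty value.
def missing_keys(env)
  REQUIRED_KEYS.select { |key| env.fetch(key, "").empty? }
end

# Example: the API key line exists but has no value yet.
example = parse_env("DATADOG_APP_KEY=abc123\nDATADOG_API_KEY=\n")
missing_keys(example) # => ["DATADOG_API_KEY"]

# Check the real file when run from the repo root.
if File.exist?(".env")
  missing = missing_keys(parse_env(File.read(".env")))
  puts(missing.empty? ? ".env has both Datadog keys" : "missing in .env: #{missing.join(', ')}")
end
```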
- use the Datadog monitor UI to create a monitor
- get the `id` from the URL, then click "Export Monitor" on the monitor's edit tab to get the `query` and `type`
- see below
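Copying the `id` out of the URL can also be scripted. A minimal sketch, assuming monitor URLs shaped like `https://app.datadoghq.com/monitors/123456` (possibly followed by `/edit` or a query string); `monitor_id_from_url` is a hypothetical helper, and other Datadog sites or URL layouts may need a different pattern.

```ruby
require "uri"

# Hypothetical helper, not part of kennel: extracts the numeric monitor id
# from a Datadog monitor URL such as https://app.datadoghq.com/monitors/123456
def monitor_id_from_url(url)
  match = URI.parse(url).path.match(%r{/monitors/(\d+)})
  raise ArgumentError, "no monitor id found in #{url}" unless match

  Integer(match[1])
end

monitor_id_from_url("https://app.datadoghq.com/monitors/123456")            # => 123456
monitor_id_from_url("https://app.datadoghq.com/monitors/123456/edit?tab=x") # => 123456
```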
- find or create a project in `projects/`
- add a monitor to the `parts: [` list
class MyProject < Kennel::Models::Project
defaults(
team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
parts: -> {
[
Kennel::Models::Monitor.new(
self,
id: -> { 123456 }, # id from datadog url
type: -> { "query alert" },
kennel_id: -> { "load-too-high" }, # make up a unique name
name: -> { "Foobar Load too high" }, # nice descriptive name that will show up in alerts and emails
message: -> {
# Explain what behavior to expect and how to fix the cause. Use #{super()} to add team notifications.
<<~TEXT
Foobar will be slow and that could cause Barfoo to go down.
Add capacity or debug why it is suddenly slow.
#{super()}
TEXT
},
query: -> { "avg(last_5m):avg:system.load.5{hostgroup:api} by {pod} > #{critical}" }, # replace actual value with #{critical} to keep them in sync
critical: -> { 20 }
)
]
}
)
end
- `bundle exec rake plan`: it should show an update to the existing monitor (not Create / Delete)
- alternatively: `bundle exec rake generate` to only update the generated `json` files
- review the changes, then `git commit`
- make a PR ... get reviewed ... merge
- Datadog is updated by Travis
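The `plan` check should report an update rather than Create / Delete because the `id` ties a local definition to an existing Datadog object. The sketch below is a simplified, hypothetical model of that bookkeeping, not kennel's actual implementation (which also compares settings and only touches objects it manages); `sketch_plan` is a made-up name and `remote_ids` stands for the ids of objects kennel already manages.

```ruby
# Simplified, hypothetical model of plan bookkeeping, not kennel's real code.
# local:      definitions from projects/, each with a kennel_id and optional id
# remote_ids: ids of existing Datadog objects that kennel already manages
def sketch_plan(local, remote_ids)
  with_id, without_id = local.partition { |item| item[:id] }
  claimed_ids = with_id.map { |item| item[:id] }

  {
    create: without_id.map { |item| item[:kennel_id] },
    update: with_id.select { |item| remote_ids.include?(item[:id]) }.map { |item| item[:kennel_id] },
    delete: remote_ids - claimed_ids
  }
end

local = [
  { kennel_id: "load-too-high", id: 123456 }, # id copied from the Datadog URL
  { kennel_id: "disk-full", id: nil }         # id forgotten
]
result = sketch_plan(local, [123456, 999])
result[:create] # => ["disk-full"]
result[:update] # => ["load-too-high"]
result[:delete] # => [999]
```

In this model, a definition that loses its `id` moves from `update` to `create`, and the managed object it used to claim moves to `delete`, which is the pattern the `plan` step is meant to catch.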
- go to the Datadog dashboard UI and click "New Dashboard" to create a dashboard
- get the `id` from the URL
- see below
- find or create a project in `projects/`
- add a dashboard to the `parts: [` list
class MyProject < Kennel::Models::Project
defaults(
team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
parts: -> {
[
Kennel::Models::Dash.new(
self,
id: -> { 123457 }, # id from datadog url
title: -> { "My Dashboard" },
description: -> { "Overview of foobar" },
template_variables: -> { ["environment"] }, # see https://docs.datadoghq.com/api/?lang=ruby#timeboards
kennel_id: -> { "overview-dashboard" }, # make up a unique name
definitions: -> {
[ # An array of arrays, each one is a graph in the dashboard; alternatively a hash for finer control
[
# title, viz, type, query; to see valid values, edit an existing graph and look at its json definition
"Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"
],
[
# queries can be an Array as well, this will generate multiple requests
# for a single graph
"Graph name", "timeseries", "area", ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"]
]
]
}
)
]
}
)
end
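Each array in `definitions` is shorthand for a full graph, and an Array of queries becomes one request per query. The expansion could look roughly like the sketch below. `expand_graph` is a hypothetical name, the output keys (`title`, `definition`, `viz`, `requests`, `q`, `type`) follow Datadog's v1 timeboard graph JSON as we understand it, and kennel's real conversion may add or rename fields.

```ruby
# Hypothetical expansion of the [title, viz, type, query] shorthand into a
# timeboard-style graph hash; kennel's actual output may differ.
def expand_graph(definition)
  return definition if definition.is_a?(Hash) # hashes pass through for finer control

  title, viz, type, queries = definition
  # Array() wraps a single query String and leaves an Array of queries as-is,
  # so each query becomes its own request on the same graph.
  requests = Array(queries).map { |query| { q: query, type: type } }
  { title: title, definition: { viz: viz, requests: requests } }
end

single = expand_graph(["Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"])
multi = expand_graph(
  ["Graph name", "timeseries", "area",
   ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"]]
)
single[:definition][:requests].size # => 1
multi[:definition][:requests].size  # => 2
```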
- needs to be implemented; it would be similar to `dash.rb`
- add it to the `parts:` list
Kennel::Models::Screen.new(
self,
board_title: -> { "test-board" },
widgets: -> { [{text: "Hello World", height: 6, width: 24, x: 0, y: 0, type: "free_text"}] }
)
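Screen widgets are positioned on a grid with `x`, `y`, `width` and `height`. When a screen needs several text blocks, the hashes can be generated instead of placed by hand. A minimal sketch using the same keys as the example above; `stacked_text_widgets` is a hypothetical helper, and its default sizes are simply copied from that example.

```ruby
# Hypothetical helper, not part of kennel: builds free_text widgets stacked
# vertically, each starting at the y where the previous one ends.
def stacked_text_widgets(texts, height: 6, width: 24)
  texts.each_with_index.map do |text, index|
    { text: text, height: height, width: width, x: 0, y: index * height, type: "free_text" }
  end
end

widgets = stacked_text_widgets(["Hello World", "Runbook: see wiki"])
widgets.map { |widget| widget[:y] } # => [0, 6]
```

If the helper is defined somewhere the project file can reach, it could be used as `widgets: -> { stacked_text_widgets(["Hello World"]) }`.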
Some validations might be too strict for your use case or just wrong; please open an issue for that, and in the meantime use the `validate: -> { false }` option.
- make sure to be on an up-to-date `master` so you do not undo other changes
- run `bundle exec rake kennel:update_datadog`
Add to `parts/<folder>`.
module Monitors
class LoadTooHigh < Kennel::Models::Monitor
defaults(
name: -> { "#{project.name} load too high" },
message: -> { "Shut it down!" },
query: -> { "avg(last_5m):avg:system.load.5{hostgroup:#{project.kennel_id}} by {pod} > #{critical}" }
)
end
end
Reuse it in multiple projects.
class Database < Kennel::Models::Project
defaults(
team: -> { Kennel::Models::Team.new(slack: -> { 'foo' }) },
parts: -> { [Monitors::LoadTooHigh.new(self, critical: -> { 13 })] }
)
end
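Settings are passed as lambdas so they are evaluated lazily on the part itself, which is what lets a shared part read `project.name` and a `critical` value that each project supplies. The plain-Ruby sketch below imitates that pattern without the kennel gem. `SketchPart`, `SketchProject` and `SketchLoadTooHigh` are hypothetical stand-ins, and kennel's real `defaults` implementation differs in detail.

```ruby
# Hypothetical stand-in for a kennel part; not kennel's implementation.
class SketchPart
  # Each default becomes an instance method. A lambda passed to .new under the
  # same name wins; either way the lambda runs with the part as `self`.
  def self.defaults(**settings)
    settings.each do |name, default|
      define_method(name) { instance_exec(&@overrides.fetch(name, default)) }
    end
  end

  attr_reader :project

  def initialize(project, **overrides)
    @project = project
    @overrides = overrides
  end

  # Settings that only exist as overrides (like `critical` below) resolve here.
  def method_missing(name, *args)
    return super unless @overrides.key?(name)

    instance_exec(&@overrides[name])
  end

  def respond_to_missing?(name, include_private = false)
    @overrides.key?(name) || super
  end
end

SketchProject = Struct.new(:name, :kennel_id)

# Shared part, mirroring Monitors::LoadTooHigh above.
class SketchLoadTooHigh < SketchPart
  defaults(
    name: -> { "#{project.name} load too high" },
    message: -> { "Shut it down!" },
    query: -> { "avg(last_5m):avg:system.load.5{hostgroup:#{project.kennel_id}} by {pod} > #{critical}" }
  )
end

database = SketchProject.new("Database", "database")
monitor = SketchLoadTooHigh.new(database, critical: -> { 13 })
monitor.name  # => "Database load too high"
monitor.query # => "avg(last_5m):avg:system.load.5{hostgroup:database} by {pod} > 13"
```

Because `critical` is only looked up when `query` runs, each project can reuse the same part with its own threshold, and the number in the query always matches it.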