Add unbound input plugin #3434

Merged
merged 10 commits into from
Nov 20, 2017
Changes from 5 commits
13 changes: 13 additions & 0 deletions etc/telegraf.conf
@@ -2865,6 +2865,19 @@
# # socket_listener plugin
# # see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener

# # A plugin to collect stats from Unbound - a validating, recursive, and caching DNS resolver
# [[inputs.unbound]]
# ## If running as a restricted user you can prepend sudo for additional access:
# #use_sudo = false
#
# ## The default location of the unbound-control binary can be overridden with:
# binary = "/usr/sbin/unbound-control"
#
# ## By default, telegraf gathers stats for 4 metric points.
# ## Setting stats will override the defaults shown below.
# ## Glob matching can be used, ie, stats = ["total.*"]
# ## stats may also be set to ["*"], which will collect all stats
# stats = ["total.*", "num.*","time.up", "mem.*"]

# # A Webhooks Event collector
# [[inputs.webhooks]]
1 change: 1 addition & 0 deletions plugins/inputs/all/all.go
@@ -91,6 +91,7 @@ import (
_ "github.com/influxdata/telegraf/plugins/inputs/trig"
_ "github.com/influxdata/telegraf/plugins/inputs/twemproxy"
_ "github.com/influxdata/telegraf/plugins/inputs/udp_listener"
_ "github.com/influxdata/telegraf/plugins/inputs/unbound"
_ "github.com/influxdata/telegraf/plugins/inputs/varnish"
_ "github.com/influxdata/telegraf/plugins/inputs/webhooks"
_ "github.com/influxdata/telegraf/plugins/inputs/win_perf_counters"
186 changes: 186 additions & 0 deletions plugins/inputs/unbound/README.md
@@ -0,0 +1,186 @@
# Unbound Input Plugin

This plugin gathers stats from [Unbound - a validating, recursive, and caching DNS resolver](https://www.unbound.net/)

### Configuration:

```toml
# A plugin to collect stats from Unbound - a validating, recursive, and caching DNS resolver
[[inputs.unbound]]
## If running as a restricted user you can prepend sudo for additional access:
#use_sudo = false

## The default location of the unbound-control binary can be overridden with:
binary = "/usr/sbin/unbound-control"

## By default, telegraf gathers stats for 4 metric points.
## Setting stats will override the defaults shown below.
## stats may also be set to ["all"], which will collect all stats
stats = ["total.*", "num.*","time.up", "mem.*"]
```
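
To illustrate how glob patterns in `stats` select stat names, here is a minimal sketch. It assumes the standard library's `path.Match` as a stand-in for Telegraf's `filter` package, which the plugin itself uses; `matchesAny` is a hypothetical helper, not part of the plugin.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether stat matches at least one glob pattern.
// path.Match is a stdlib stand-in (an assumption of this sketch); the
// plugin itself relies on Telegraf's filter package.
func matchesAny(patterns []string, stat string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, stat); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	defaults := []string{"total.*", "num.*", "time.up", "mem.*"}
	stats := []string{"total.num.queries", "time.up", "time.now", "thread0.num.queries"}
	for _, stat := range stats {
		fmt.Printf("%-22s collected=%v\n", stat, matchesAny(defaults, stat))
	}
}
```

With the default patterns, `time.up` is selected by its exact name, while `time.now` and the per-thread stats are not.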

### Measurements & Fields:

This is the full list of stats provided by unbound. Stats will be grouped by their prefix (eg thread0,
total, etc). In the output, the prefix will be used as a tag, and removed from field names. See
https://www.unbound.net/documentation/unbound-control.html for details.

- unbound
thread0.num.queries
Contributor

In this section, list the fields as they will be emitted from Telegraf, so thread0.num.queries will be num.queries.

Contributor

I think we should consider replacing dots with underscores, giving the field num_queries, so the fields fit stylistically with the other plugins.
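
The renaming discussed in these two comments could look roughly like the sketch below. `splitStat` is a hypothetical helper, not code from this PR; it assumes the prefix is everything before the first dot.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStat separates the section prefix from the rest of a stat name and
// replaces the remaining dots with underscores. A name without a dot is
// returned unchanged with an empty section.
func splitStat(stat string) (section, field string) {
	parts := strings.SplitN(stat, ".", 2)
	if len(parts) < 2 {
		return "", stat
	}
	return parts[0], strings.Replace(parts[1], ".", "_", -1)
}

func main() {
	for _, stat := range []string{"thread0.num.queries", "total.requestlist.current.all", "time.up"} {
		section, field := splitStat(stat)
		fmt.Printf("%s -> section=%s field=%s\n", stat, section, field)
	}
}
```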

thread0.num.cachehits
thread0.num.cachemiss
thread0.num.prefetch
thread0.num.recursivereplies
thread0.requestlist.avg
thread0.requestlist.max
thread0.requestlist.overwritten
thread0.requestlist.exceeded
thread0.requestlist.current.all
thread0.requestlist.current.user
thread0.recursion.time.avg
thread0.recursion.time.median
total.num.queries
total.num.cachehits
total.num.cachemiss
total.num.prefetch
total.num.recursivereplies
total.requestlist.avg
total.requestlist.max
total.requestlist.overwritten
total.requestlist.exceeded
total.requestlist.current.all
total.requestlist.current.user
total.recursion.time.avg
total.recursion.time.median
time.now
time.up
time.elapsed
mem.total.sbrk
mem.cache.rrset
mem.cache.message
mem.mod.iterator
mem.mod.validator
histogram.000000.000000.to.000000.000001
Contributor

How are these histogram metrics encoded? Is the field named 000000.000000.to.000000.000001? Maybe we can do special processing of these; normally we use the format produced by the histogram aggregator.

Contributor Author

Yes, the field name will actually be 000000.000000.to.000000.000001, which is not really useful. In any case, I don't think this kind of information is relevant for telegraf: it is easier to construct the histogram from the data telegraf collects than to use these fields. I think I will drop them from the collected metrics.

histogram.000000.000001.to.000000.000002
histogram.000000.000002.to.000000.000004
histogram.000000.000004.to.000000.000008
histogram.000000.000008.to.000000.000016
histogram.000000.000016.to.000000.000032
histogram.000000.000032.to.000000.000064
histogram.000000.000064.to.000000.000128
histogram.000000.000128.to.000000.000256
histogram.000000.000256.to.000000.000512
histogram.000000.000512.to.000000.001024
histogram.000000.001024.to.000000.002048
histogram.000000.002048.to.000000.004096
histogram.000000.004096.to.000000.008192
histogram.000000.008192.to.000000.016384
histogram.000000.016384.to.000000.032768
histogram.000000.032768.to.000000.065536
histogram.000000.065536.to.000000.131072
histogram.000000.131072.to.000000.262144
histogram.000000.262144.to.000000.524288
histogram.000000.524288.to.000001.000000
histogram.000001.000000.to.000002.000000
histogram.000002.000000.to.000004.000000
histogram.000004.000000.to.000008.000000
histogram.000008.000000.to.000016.000000
histogram.000016.000000.to.000032.000000
histogram.000032.000000.to.000064.000000
histogram.000064.000000.to.000128.000000
histogram.000128.000000.to.000256.000000
histogram.000256.000000.to.000512.000000
histogram.000512.000000.to.001024.000000
histogram.001024.000000.to.002048.000000
histogram.002048.000000.to.004096.000000
histogram.004096.000000.to.008192.000000
histogram.008192.000000.to.016384.000000
histogram.016384.000000.to.032768.000000
histogram.032768.000000.to.065536.000000
histogram.065536.000000.to.131072.000000
histogram.131072.000000.to.262144.000000
histogram.262144.000000.to.524288.000000
num.query.type.A
num.query.type.PTR
num.query.type.TXT
num.query.type.AAAA
num.query.type.SRV
num.query.type.ANY
num.query.class.IN
num.query.opcode.QUERY
num.query.tcp
num.query.ipv6
num.query.flags.QR
num.query.flags.AA
num.query.flags.TC
num.query.flags.RD
num.query.flags.RA
num.query.flags.Z
num.query.flags.AD
num.query.flags.CD
num.query.edns.present
num.query.edns.DO
num.answer.rcode.NOERROR
num.answer.rcode.SERVFAIL
num.answer.rcode.NXDOMAIN
num.answer.rcode.nodata
num.answer.secure
num.answer.bogus
num.rrset.bogus
unwanted.queries
unwanted.replies

### Tags:

As indicated above, the prefix of an unbound stat will be used as its 'section' tag, so the section tag may have one of
the following values:
Contributor

It seems like each section is very different; maybe we should call the measurement unbound_<section> and remove the tag. If we did this, we would probably want to do something special for the threadX section.

Contributor Author

As there is not much data (no real need to spread the information across several measurements) and I do not control how unbound-control evolves, I will start by simply dropping the section tag and pushing all fields as they are (just converting the dots in field names to underscores).
I will also look at the "histogram" fields and decide whether to drop them or do something sensible with them. I am not sure these fields are really interesting for telegraf: we are collecting time-series data, so we don't really need partially pre-processed historical information.
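
A rough sketch of that plan, assuming the histogram entries are simply dropped and every other stat keeps its full name with dots turned into underscores. `flatFieldName` is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// flatFieldName converts a raw unbound-control stat name into a field name
// for a single "unbound" measurement. The second result is false for
// histogram entries, which this sketch leaves out.
func flatFieldName(stat string) (string, bool) {
	if strings.HasPrefix(stat, "histogram.") {
		return "", false
	}
	return strings.Replace(stat, ".", "_", -1), true
}

func main() {
	stats := []string{
		"total.num.queries",
		"mem.cache.rrset",
		"histogram.000000.000000.to.000000.000001",
	}
	for _, stat := range stats {
		if name, ok := flatFieldName(stat); ok {
			fmt.Println(stat, "->", name)
		} else {
			fmt.Println(stat, "-> skipped")
		}
	}
}
```

Keeping the full name avoids having to special-case the threadX prefixes, at the cost of longer field names.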

- section:
- thread0
- total
- time
- mem
- histogram
- num
- unwanted

### Permissions:

It's important to note that this plugin references unbound-control, which may require additional permissions to execute successfully.
Depending on the user/group permissions of the telegraf user executing this plugin, you may need to alter the group membership, set facls, or use sudo.

**Group membership (Recommended)**:
```bash
$ groups telegraf
telegraf : telegraf

$ usermod -a -G unbound telegraf

$ groups telegraf
telegraf : telegraf unbound
```

**Sudo privileges**:
If you use this method, you will need the following in your telegraf config:
```toml
[[inputs.unbound]]
use_sudo = true
```

You will also need to update your sudoers file:
```bash
$ visudo
# Add the following line:
telegraf ALL=(ALL) NOPASSWD: /usr/sbin/unbound-control
```

Please use the solution you see as most appropriate.

### Example Output:

```
telegraf --config etc/telegraf.conf --input-filter unbound --test
* Plugin: inputs.unbound, Collection 1
> unbound,section=total,host=laptop-aromeyer num.cachemiss=0,requestlist.current.all=0,num.cachehits=0,requestlist.overwritten=0,requestlist.max=0,num.recursivereplies=0,requestlist.avg=0,recursion.time.avg=0,recursion.time.median=0,num.prefetch=0,requestlist.exceeded=0,requestlist.current.user=0,tcpusage=0,num.queries=0 1509977403000000000
> unbound,section=time,host=laptop-aromeyer up=5794.844261,elapsed=12.484727,now=1509977402.617432 1509977403000000000

```
163 changes: 163 additions & 0 deletions plugins/inputs/unbound/unbound.go
@@ -0,0 +1,163 @@
// +build !windows
Contributor

Can you remove this build flag, both here and in the test file?


package unbound

import (
"bufio"
"bytes"
"fmt"
"os/exec"
"strconv"
"strings"
"time"

"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/filter"
"github.com/influxdata/telegraf/internal"
"github.com/influxdata/telegraf/plugins/inputs"
)

type runner func(cmdName string, UseSudo bool) (*bytes.Buffer, error)

// Unbound is used to store configuration values
type Unbound struct {
Stats []string
Binary string
UseSudo bool

filter filter.Filter
run runner
}

var defaultStats = []string{"total.*", "num.*", "time.up", "mem.*"}
var defaultBinary = "/usr/sbin/unbound-control"

var sampleConfig = `
## If running as a restricted user you can prepend sudo for additional access:
#use_sudo = false

## The default location of the unbound-control binary can be overridden with:
binary = "/usr/sbin/unbound-control"

## By default, telegraf gathers stats for 4 metric points.
## Setting stats will override the defaults shown below.
## Glob matching can be used, ie, stats = ["total.*"]
## stats may also be set to ["*"], which will collect all stats
stats = ["total.*", "num.*","time.up", "mem.*"]
`

func (s *Unbound) Description() string {
return "A plugin to collect stats from Unbound - a validating, recursive, and caching DNS resolver "
}

// SampleConfig displays configuration instructions
func (s *Unbound) SampleConfig() string {
return sampleConfig
}

// Shell out to unbound_stat and return the output
func unboundRunner(cmdName string, UseSudo bool) (*bytes.Buffer, error) {
cmdArgs := []string{"stats"}

cmd := exec.Command(cmdName, cmdArgs...)

if UseSudo {
cmdArgs = append([]string{cmdName}, cmdArgs...)
cmd = exec.Command("sudo", cmdArgs...)
}

var out bytes.Buffer
cmd.Stdout = &out
err := internal.RunTimeout(cmd, time.Millisecond*200)
Contributor

This seems like an aggressive timeout; I would raise it to at least a second and consider making it configurable.
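
One way to make the timeout a parameter is sketched below. It uses only the standard library (`context` and `os/exec`) rather than Telegraf's `internal.RunTimeout`, and `runWithTimeout` is a hypothetical helper; how the value would be exposed in the plugin configuration is left open.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout runs name with args and kills the process if it has not
// finished within timeout. It returns whatever was written to stdout.
func runWithTimeout(timeout time.Duration, name string, args ...string) (*bytes.Buffer, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			return &out, fmt.Errorf("%s timed out after %s", name, timeout)
		}
		return &out, fmt.Errorf("error running %s: %s", name, err)
	}
	return &out, nil
}

func main() {
	// In the plugin the timeout would come from configuration; one second is
	// the lower bound suggested in the review.
	out, err := runWithTimeout(time.Second, "/usr/sbin/unbound-control", "stats")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out.String())
}
```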

if err != nil {
return &out, fmt.Errorf("error running unbound-control: %s", err)
}

return &out, nil
}

// Gather collects the configured stats from unbound_stat and adds them to the
// Accumulator
//
// The prefix of each stat (eg MAIN, MEMPOOL, LCK, etc) will be used as a
Contributor

These aren't actual sections, and there is no unbound_stat. I guess these are holdovers from an earlier version of the code; can you update the comment?

// 'section' tag and all stats that share that prefix will be reported as fields
// with that tag
func (s *Unbound) Gather(acc telegraf.Accumulator) error {
if s.filter == nil {
var err error
if len(s.Stats) == 0 {
s.filter, err = filter.Compile(defaultStats)
} else {
// legacy support, change "all" -> "*":
Contributor

We can't have legacy support already since we are a new plugin :)

if s.Stats[0] == "all" {
s.Stats[0] = "*"
}
s.filter, err = filter.Compile(s.Stats)
}
if err != nil {
return err
}
}

out, err := s.run(s.Binary, s.UseSudo)
if err != nil {
return fmt.Errorf("error gathering metrics: %s", err)
}

sectionMap := make(map[string]map[string]interface{})
scanner := bufio.NewScanner(out)
for scanner.Scan() {

cols := strings.Split(scanner.Text(), "=")

stat := cols[0]
value := cols[1]
Contributor

This could panic if somehow there are not two fields; make sure to guard against this.
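
A minimal sketch of such a guard, assuming each line has the form `name=value`. `parseLine` is a hypothetical helper; it uses `strings.SplitN` so that a value containing `=` stays intact, and it reports malformed lines instead of indexing past the end of the slice.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLine splits a "name=value" line. ok is false when the line has no
// "=" separator or an empty name, so callers can skip it safely.
func parseLine(line string) (stat, value string, ok bool) {
	cols := strings.SplitN(strings.TrimSpace(line), "=", 2)
	if len(cols) != 2 || cols[0] == "" {
		return "", "", false
	}
	return cols[0], cols[1], true
}

func main() {
	input := "total.num.queries=42\n\nnot a stat line\ntime.up=5794.844261\n"
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		stat, value, ok := parseLine(scanner.Text())
		if !ok {
			continue // malformed or empty line
		}
		fmt.Println(stat, value)
	}
}
```

The same length check applies to the later split on `.`, which would also index out of range for a stat name without a dot.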


if s.filter != nil && !s.filter.Match(stat) {
continue
}

parts := strings.SplitN(stat, ".", 2)

section := parts[0]
field := parts[1]

// Init the section if necessary
if _, ok := sectionMap[section]; !ok {
sectionMap[section] = make(map[string]interface{})
}

sectionMap[section][field], err = strconv.ParseUint(value, 10, 64)
Contributor

Parse as int64, since this is what the Accumulator holds. If there is an error parsing, don't add the value to the fields.

if err != nil {
sectionMap[section][field], err = strconv.ParseFloat(value, 64)
Contributor

This fallback method could switch the type of the field, which will cause the metrics to be impossible to add to InfluxDB. We may need to parse all fields as floats, or have a list of the types.
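
The first option mentioned here, parsing every value as a float, might be sketched as follows. `parseFields` is a hypothetical helper; values that fail to parse are left out of the result and reported separately, so a field never changes type between collections.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseFields converts raw string values to float64. Values that do not
// parse are left out of the result and reported in errs.
func parseFields(raw map[string]string) (fields map[string]interface{}, errs []error) {
	fields = make(map[string]interface{}, len(raw))
	for name, value := range raw {
		f, err := strconv.ParseFloat(value, 64)
		if err != nil {
			errs = append(errs, fmt.Errorf("expected a numeric value for %s = %q", name, value))
			continue
		}
		fields[name] = f
	}
	return fields, errs
}

func main() {
	fields, errs := parseFields(map[string]string{
		"total_num_queries": "42",
		"time_up":           "5794.844261",
		"broken":            "n/a",
	})
	fmt.Println(len(fields), "fields kept,", len(errs), "skipped")
}
```

The alternative raised in the comment, a per-field list of types, would keep counters as integers but needs maintaining as unbound-control adds stats.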

if err != nil {
acc.AddError(fmt.Errorf("Expected a numeric or a float value for %s = %v\n",
stat, value))
}
}

}

for section, fields := range sectionMap {
tags := map[string]string{
"section": section,
}
if len(fields) == 0 {
continue
}
acc.AddFields("unbound", fields, tags)
}

return nil
}

func init() {
inputs.Add("unbound", func() telegraf.Input {
return &Unbound{
run: unboundRunner,
Stats: defaultStats,
Binary: defaultBinary,
UseSudo: false,
}
})
}