Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an input plugin to monitor basic info of Windows Services #3023

Merged
merged 48 commits into from Aug 7, 2017
Merged
Show file tree
Hide file tree
Changes from 35 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
0db2520
Initial win_services plugin code: a hardcoded service reporting rando…
vlastahajek Jul 10, 2017
7400e87
Newer golang.org/x/sys for mgr.ListServices
vlastahajek Jul 11, 2017
1d52da8
Completed basic functionality
vlastahajek Jul 11, 2017
3b4df68
Marked with TODO
vlastahajek Jul 12, 2017
b67c645
All services log under same measurement
vlastahajek Jul 12, 2017
017c131
Requirement analysis for windows services
vlastahajek Jul 12, 2017
5f93ba1
Fixed typo, better formation to highligh questions
vlastahajek Jul 12, 2017
f895ef3
Adjusted based on feedback from Daniel Nelson
vlastahajek Jul 13, 2017
570f613
Adjusted based on feedback from Daniel Nelson
vlastahajek Jul 13, 2017
45fb743
Adjusted based na analysis feedback:
vlastahajek Jul 13, 2017
96af3b7
Fix: Fixed error reporting
vlastahajek Jul 13, 2017
37c2952
- proper readme for win_services plugin
vlastahajek Jul 13, 2017
03447bc
proof reading
karel-rehor Jul 13, 2017
b05aa4a
small english correction
vlastahajek Jul 13, 2017
2bd9a78
using state and startup_mode as fields and display_name as a tag
vlastahajek Jul 14, 2017
3d4e3db
Added win_services unit tests
vlastahajek Jul 15, 2017
b88e0e5
Added comment explaining condition for running
vlastahajek Jul 15, 2017
3d00f89
Merge branch 'master' into vh-win-services
vlastahajek Jul 16, 2017
1ab4cdc
- Removing prefix 'service_' from state and startup mode enumerations
vlastahajek Jul 16, 2017
b5a5a1f
Edited to reflect actual state
vlastahajek Jul 16, 2017
957015c
Removing prefix 'service_' from state and startup mode enumerations
vlastahajek Jul 16, 2017
b54b8a0
Revert "Marked with TODO"
vlastahajek Jul 16, 2017
3b3e973
- Better script description
vlastahajek Jul 17, 2017
c568b19
proof read new section on TICK Scripts
karel-rehor Jul 17, 2017
e0a8773
- Fixed TICK script
vlastahajek Jul 17, 2017
d203d20
Delete obsolete file
vlastahajek Jul 17, 2017
4a0d3e8
Added important note
vlastahajek Jul 17, 2017
75407cc
Typo
vlastahajek Jul 17, 2017
d570894
Reformatted with go fmt
vlastahajek Jul 18, 2017
3b3106f
Reformated with go fmt
vlastahajek Jul 18, 2017
721264a
- Implemented PR review feedback
vlastahajek Jul 18, 2017
e0dbcab
Changed state and startup mode fields from string to int
vlastahajek Jul 19, 2017
86b989a
- Changed state and startup mode fields from string to int
vlastahajek Jul 19, 2017
a96932c
Changed state and startup mode fields from string to int
vlastahajek Jul 19, 2017
3568a10
Merge branch 'master' into vh-win-services
vlastahajek Jul 19, 2017
5eda565
Small improvements based on PR discussion:
vlastahajek Aug 1, 2017
1966f42
Changing existing tests, which test real service manager, to integrat…
vlastahajek Aug 1, 2017
ddbdcfc
Added interfaces and real impl wrappers to enable unit testing
vlastahajek Aug 1, 2017
8c75562
Added data driven unit tests with mock implementation
vlastahajek Aug 1, 2017
5b71ba8
Fixed getter name
vlastahajek Aug 1, 2017
e444c75
Merge branch 'master' into vh-win-services
vlastahajek Aug 1, 2017
9085744
Added documentation to types
vlastahajek Aug 2, 2017
547cb9b
Support types made internal
vlastahajek Aug 2, 2017
9f6ca38
Merge branch 'master' into vh-win-services
vlastahajek Aug 4, 2017
86cef94
Merged changes from vh-win-services branch from Godeps_windows
vlastahajek Aug 4, 2017
215828e
Proper skipping of tests in short mode
vlastahajek Aug 4, 2017
d307493
Unit test don't need admin permission
vlastahajek Aug 7, 2017
2b1ee6d
Win_services folder added to windows tests
vlastahajek Aug 7, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion Godeps_windows
Expand Up @@ -2,7 +2,7 @@ github.com/Microsoft/go-winio ce2922f643c8fd76b46cadc7f404a06282678b34
github.com/StackExchange/wmi f3e2bae1e0cb5aef83e319133eabfee30013a4a5
github.com/go-ole/go-ole be49f7c07711fcb603cff39e1de7c67926dc0ba7
github.com/shirou/w32 3c9377fc6748f222729a8270fe2775d149a249ad
golang.org/x/sys a646d33e2ee3172a661fc09bca23bb4889a41bc8
golang.org/x/sys 739734461d1c916b6c72a63d7efda2b27edb369f
github.com/go-ini/ini 9144852efba7c4daf409943ee90767da62d55438
github.com/jmespath/go-jmespath bd40a432e4c76585ef6b72d3fd96fb9b6dc7b68d
github.com/pmezard/go-difflib/difflib 792786c7400a136282c1664665ae0a8db921c6c2
Expand Down
1 change: 1 addition & 0 deletions plugins/inputs/all/all.go
Expand Up @@ -87,6 +87,7 @@ import (
_ "github.com/influxdata/telegraf/plugins/inputs/varnish"
_ "github.com/influxdata/telegraf/plugins/inputs/webhooks"
_ "github.com/influxdata/telegraf/plugins/inputs/win_perf_counters"
_ "github.com/influxdata/telegraf/plugins/inputs/win_services"
_ "github.com/influxdata/telegraf/plugins/inputs/zfs"
_ "github.com/influxdata/telegraf/plugins/inputs/zookeeper"
)
68 changes: 68 additions & 0 deletions plugins/inputs/win_services/README.md
@@ -0,0 +1,68 @@
# Telegraf Plugin: win_services
Input plugin to report Windows services info.

It requires that Telegraf must be running under the administrator privileges.
### Configuration:

```toml
[[inputs.win_services]]
## Names of the services to monitor. Leave empty to monitor all the available services on the host
service_names = [
"LanmanServer",
"TermService",
]
```

### Measurements & Fields:

- win_services
- state : integer
- startup_mode : integer

The `state` field can have the following values:
- 1 - stopped
- 2 - start pending
- 3 - stop pending
- 4 - running
- 5 - continue pending
- 6 - pause pending
- 7 - paused

The `startup_mode` field can have the following values:
- 0 - boot start
- 1 - system start
- 2 - auto start
- 3 - demand start
- 4 - disabled

### Tags:

- All measurements have the following tags:
- service_name
- display_name

### Example Output:
```
* Plugin: inputs.win_services, Collection 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove everything in this section up through this line, so the only items are the example line protocol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, np.
I made it this way cause it's done so in the example plugins you advised me (ngnix, kapacitor).

> win_services,host=WIN2008R2H401,display_name=Server,service_name=LanmanServer state=4i,startup_mode=2i 1500040669000000000
> win_services,display_name=Remote\ Desktop\ Services,service_name=TermService,host=WIN2008R2H401 state=1i,startup_mode=3i 1500040669000000000
```
### TICK Scripts

A sample TICK script for a notification about a not running service.
It sends a notification whenever any service changes its state to be not _running_ and when it changes that state back to _running_.
The notification is sent via an HTTP POST call.

```
stream
|from()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('win_services')
.groupBy('host','service_name')
|alert()
.crit(lambda: "state" != 4)
.stateChangesOnly()
.message('Service {{ index .Tags "service_name" }} on Host {{ index .Tags "host" }} is in state {{ index .Fields "state" }} ')
.post('http://localhost:666/alert/service')
```
172 changes: 172 additions & 0 deletions plugins/inputs/win_services/win_services.go
@@ -0,0 +1,172 @@
// +build windows

package win_services

import (
"fmt"
"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/plugins/inputs"
"golang.org/x/sys/windows/svc/mgr"
)

var sampleConfig = `
## Names of the services to monitor. Leave empty to monitor all the available services on the host
service_names = [
"LanmanServer",
"TermService",
]
`

var description = "Input plugin to report Windows services info."

type Win_Services struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this WinServices

ServiceNames []string `toml:"service_names"`
}

type ServiceInfo struct {
ServiceName string
DisplayName string
State int
StartUpMode int
Error error
}

var ServiceStatesMap = map[int]string{
0x00000001: "stopped",
0x00000002: "start_pending",
0x00000003: "stop_pending",
0x00000004: "running",
0x00000005: "continue_pending",
0x00000006: "pause_pending",
0x00000007: "paused",
}

var ServiceStartupModeMap = map[int]string{
0x00000000: "boot_start",
0x00000001: "system_start",
0x00000002: "auto_start",
0x00000003: "demand_start",
0x00000004: "disabled",
}

func (m *Win_Services) Description() string {
return description
}

func (m *Win_Services) SampleConfig() string {
return sampleConfig
}

func (m *Win_Services) Gather(acc telegraf.Accumulator) error {

serviceInfos, err := listServices(m.ServiceNames)

if err != nil {
return err
}

for _, service := range serviceInfos {
if service.Error == nil {
fields := make(map[string]interface{})
tags := make(map[string]string)

tags["display_name"] = service.DisplayName
tags["service_name"] = service.ServiceName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tags should not contain the empty string, since InfluxDB will not accept them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best practice for storing a measurement in case of a tag has empty value?

  1. Skip the tag? Store just the service_name tag and fields?
  2. Insert a place holder text, e.g. "" or "<empty_display_name>"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either skip the tag or omit the entire point, depending on if the field needs to be required. For display name I would skip, since it is optional. For service name I would omit the point.


fields["state"] = service.State
fields["startup_mode"] = service.StartUpMode

acc.AddFields("win_services", fields, tags)
} else {
acc.AddError(service.Error)
}
}

return nil
}

func listServices(userServices []string) ([]ServiceInfo, error) {
scmgr, err := mgr.Connect()
if err != nil {
return nil, fmt.Errorf("Could not open service manager: %s", err)
}
defer scmgr.Disconnect()

var serviceNames []string
if len(userServices) == 0 {
//Listing service names from system
serviceNames, err = scmgr.ListServices()
if err != nil {
return nil, fmt.Errorf("Could not list services: %s", err)
}
} else {
serviceNames = userServices
}
serviceInfos := make([]ServiceInfo, len(serviceNames))

for i, srvName := range serviceNames {
serviceInfos[i] = collectServiceInfo(scmgr, srvName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest way to deal with zero value tags/fields IMO is to return an (ServiceInfo, error) and append if err != nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would then need to propagate a service error to Gather, or pass Accumulator here.
I want to keep the collecting info separated from plugin API.

The ServiceInfo structure is a domain model of a service, any error occurred during the collecting of service info is a property of such model.

This allows future enhancement in case a user will want to record also service errors into db. Because I still think that if user wants to monitor a particular service and on some hosts it is not possible due to an error, user has to look into the telegraf log instead into the db.

But maybe not, we will see.

}

return serviceInfos, nil
}

func collectServiceInfo(scmgr *mgr.Mgr, serviceName string) (serviceInfo ServiceInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend returning (ServiceInfo, error) then you can remove the Error field from ServiceInfo.


serviceInfo.ServiceName = serviceName
srv, err := scmgr.OpenService(serviceName)
if err != nil {
serviceInfo.Error = fmt.Errorf("Could not open service '%s': %s", serviceName, err)
return
}
defer srv.Close()

//While getting service info there could a theoretically a lot of errors on different places.
//However in reality if there is a problem with a service then usually openService fails and if it passes, other calls will most probably be ok
//So, following error checking is just for sake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment, basically applies to almost any error :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Network, or more generally communication, errors are for example more probable than subsequent errors from Windows Service API. I still think that checking valid ranges of state and startup_mode is not necessary cause if invalid value would be return than the function will return error anyway.

This comment was intended to explain this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get rid of the checkState logic since we are now operating on ints, but we need to check all errors from the external api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, checking errors of functions is must, no doubts.

srvStatus, err := srv.Query()
if err == nil {
state := int(srvStatus.State)
if !checkState(state) {
serviceInfo.Error = fmt.Errorf("Uknown state of Service %s: %d", serviceName, state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in Uknown -> Unknown

//finish collecting info on first found error
return
}
serviceInfo.State = state
} else {
serviceInfo.Error = fmt.Errorf("Could not query service '%s': %s", serviceName, err)
//finish collecting info on first found error
return
}

srvCfg, err := srv.Config()
if err == nil {
startupMode := int(srvCfg.StartType)
if !checkStartupMode(startupMode) {
serviceInfo.Error = fmt.Errorf("Uknown startup mode of Service %s: %d", serviceName, startupMode)
//finish collecting info on first found error
return
}
serviceInfo.DisplayName = srvCfg.DisplayName
serviceInfo.StartUpMode = startupMode
} else {
serviceInfo.Error = fmt.Errorf("Could not get config of service '%s': %s", serviceName, err)
}
return
}

//returns true of state is in valid range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't super important to me, but go has the convention of starting the comment with the name of the function. https://blog.golang.org/godoc-documenting-go-code, ex:

// checkState returns true if state is valid

func checkState(state int) bool {
_, ok := ServiceStatesMap[state]
return ok
}

//returns true of startup mode is in valid range
func checkStartupMode(startupMode int) bool {
_, ok := ServiceStartupModeMap[startupMode]
return ok
}

func init() {
inputs.Add("win_services", func() telegraf.Input { return &Win_Services{} })
}
3 changes: 3 additions & 0 deletions plugins/inputs/win_services/win_services_notwindows.go
@@ -0,0 +1,3 @@
// +build !windows

package win_services
101 changes: 101 additions & 0 deletions plugins/inputs/win_services/win_services_test.go
@@ -0,0 +1,101 @@
// +build windows

//this test must be run under administrator account
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still required?

package win_services

import (
"github.com/influxdata/telegraf/testutil"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"golang.org/x/sys/windows/svc/mgr"
"testing"
)

var InvalidServices = []string{"XYZ1@", "ZYZ@", "SDF_@#"}
var KnownServices = []string{"LanmanServer", "TermService"}

func TestList(t *testing.T) {
services, err := listServices(KnownServices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to add any tests that require special permissions or certain services to running (unless it can also set them up). It would be better to use an interface with a test version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the best goal here.

I can easily mock listServices to test the Gather method, but to properly test listServices function I will need to mock Mgr and Service from mgr package. As there is type dependency, as Mgr.OpenService returns mgr.Service, I can not just create an interface which will be satisfied by existing structs in mgr package and their methods.
I would need to create proxies of both to satisfy new interface in real implementation and this will complicate the code. Did you mean this?
Direct API calls are always harder co mock than RPC calls most current plugins use, but I don't mean I don't want to do it, in fact, it will be fun.. ;),

Cause current tests use services available on every Windows edition. I would rather create a special for test purposes, but that would be more E2E test and that's probably not suitable for such test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chronograf uses a nice pattern for mocks that can handle this. I wrote an example and added it to this gist https://gist.github.com/danielnelson/79908f0bc7145d3feb7a91e0ef56d88c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I meant.

You have to create on struct (proxy) that satisfies interface and redirect call to real implementation,
a lot boilerplate code .

But if it has to be that way...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to have unit tests. The integration tests can remain but should be guarded with a testing.Short() test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
What is the best practise to store unit tests and integration test. I would separate them into two files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have win_services_test.go and win_services_integration_test.go.

require.NoError(t, err)
assert.Len(t, services, 2, "Different number of services")
assert.Equal(t, services[0].ServiceName, KnownServices[0])
assert.Nil(t, services[0].Error)
assert.Equal(t, services[1].ServiceName, KnownServices[1])
assert.Nil(t, services[1].Error)
}

func TestEmptyList(t *testing.T) {
services, err := listServices([]string{})
require.NoError(t, err)
assert.Condition(t, func() bool { return len(services) > 20 }, "Too few service")
}

func TestListEr(t *testing.T) {
services, err := listServices(InvalidServices)
require.NoError(t, err)
assert.Len(t, services, 3, "Different number of services")
for i := 0; i < 3; i++ {
assert.Equal(t, services[i].ServiceName, InvalidServices[i])
assert.NotNil(t, services[i].Error)
}
}

func TestGather(t *testing.T) {
ws := &Win_Services{KnownServices}
assert.Len(t, ws.ServiceNames, 2, "Different number of services")
var acc testutil.Accumulator
require.NoError(t, ws.Gather(&acc))
assert.Len(t, acc.Errors, 0, "There should be no errors after gather")

for i := 0; i < 2; i++ {
fields := make(map[string]interface{})
tags := make(map[string]string)
si := getServiceInfo(KnownServices[i])
fields["state"] = int(si.State)
fields["startup_mode"] = int(si.StartUpMode)
tags["service_name"] = si.ServiceName
tags["display_name"] = si.DisplayName
acc.AssertContainsTaggedFields(t, "win_services", fields, tags)
}

}

func TestGatherErrors(t *testing.T) {
ws := &Win_Services{InvalidServices}
assert.Len(t, ws.ServiceNames, 3, "Different number of services")
var acc testutil.Accumulator
require.NoError(t, ws.Gather(&acc))
assert.Len(t, acc.Errors, 3, "There should be 3 errors after gather")
}

func getServiceInfo(srvName string) *ServiceInfo {

scmgr, err := mgr.Connect()
if err != nil {
return nil
}
defer scmgr.Disconnect()

srv, err := scmgr.OpenService(srvName)
if err != nil {
return nil
}
var si ServiceInfo
si.ServiceName = srvName
srvStatus, err := srv.Query()
if err == nil {
si.State = int(srvStatus.State)
} else {
si.Error = err
}

srvCfg, err := srv.Config()
if err == nil {
si.DisplayName = srvCfg.DisplayName
si.StartUpMode = int(srvCfg.StartType)
} else {
si.Error = err
}
srv.Close()
return &si
}