Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an input plugin to monitor basic info of Windows Services #3023

Merged
merged 48 commits into from Aug 7, 2017

Conversation

vlastahajek
Copy link
Contributor

Info

Input plugin to report Windows services info: service name, display name, state, startup mode

Tested on Windows Server 2008 R2 Enterprise, Windows 10 Enterprise

It requires Telegraf must run under the administrator privileges

Readme includes sample TICK script

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

vlastahajek and others added 28 commits July 10, 2017 15:08
Measurement name made configurable
Error service is indicated with state -1
- snake case names of fields/tag
- simplified configuration
- state and startup mode as string
- better error reporting
This reverts commit 3b4df68.
- Better script alert message
It produces:
```
* Plugin: inputs.win_services, Collection 1
> win_services,host=WIN2008R2H401,display_name=Server,service_name=LanmanServer state="running",startup_mode="auto_start" 15 00040669000000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a stray space in 15 00040669000000000

```
It produces:
```
* Plugin: inputs.win_services, Collection 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove everything in this section up through this line, so the only items are the example line protocol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, np.
I made it this way cause it's done so in the example plugins you advised me (ngnix, kapacitor).

"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/plugins/inputs"
"golang.org/x/sys/windows/svc/mgr"
"fmt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you are using go fmt or the tests won't pass.

]
`

var description = "Input plugin to report Windows services info: service name, display name, state, startup mode"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mention the specific fields, this will be enough: "Input plugin to report Windows services info"

serviceInfos[i].DisplayName = srvCfg.DisplayName
serviceInfos[i].StartUpMode = int(srvCfg.StartType)
} else {
serviceInfos[i].Error = fmt.Errorf("Could not get config of service '%s': %s", srvName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If either Query or Config fail the ServiceInfo could have zero values on some of the fields. This will cause tags/fields to be empty strings which we don't want. I would append the ServiceInfo only if there are no errors.

For error handling this will probably require you to pass the accumulator into this function. Also, you probably only want to report the first error per service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to the issue above. A weird state value will reported as an error, but for example empty display name is a valid value, as Windows API allows this.

var KnownServices = []string{"LanmanServer", "TermService"}

func TestList(t *testing.T) {
services, err := listServices(KnownServices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to add any tests that require special permissions or certain services to running (unless it can also set them up). It would be better to use an interface with a test version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the best goal here.

I can easily mock listServices to test the Gather method, but to properly test listServices function I will need to mock Mgr and Service from mgr package. As there is type dependency, as Mgr.OpenService returns mgr.Service, I can not just create an interface which will be satisfied by existing structs in mgr package and their methods.
I would need to create proxies of both to satisfy new interface in real implementation and this will complicate the code. Did you mean this?
Direct API calls are always harder co mock than RPC calls most current plugins use, but I don't mean I don't want to do it, in fact, it will be fun.. ;),

Cause current tests use services available on every Windows edition. I would rather create a special for test purposes, but that would be more E2E test and that's probably not suitable for such test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chronograf uses a nice pattern for mocks that can handle this. I wrote an example and added it to this gist https://gist.github.com/danielnelson/79908f0bc7145d3feb7a91e0ef56d88c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I meant.

You have to create on struct (proxy) that satisfies interface and redirect call to real implementation,
a lot boilerplate code .

But if it has to be that way...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to have unit tests. The integration tests can remain but should be guarded with a testing.Short() test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
What is the best practise to store unit tests and integration test. I would separate them into two files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have win_services_test.go and win_services_integration_test.go.

tags["service_name"] = service.ServiceName

fields["state"] = ServiceStatesMap[service.State]
fields["startup_mode"] = ServiceStartupModeMap[service.StartUpMode]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do something smart if the value returned is not in the map, probably return an error and skip the service. This could be tested earlier on in listServices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussion in the issue leads to using numbers I will use directly what I get from Windows API and report unusual values as error, ok?

@@ -0,0 +1,76 @@
# Telegraf Plugin: win_services
Input plugin to report Windows services info: service name, display name, state, startup mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't list the tags/fields here, so we won't need to keep this updated if we add anything in the future.

@vlastahajek
Copy link
Contributor Author

I implemented all changes except changing the testing approach.
I would rather have a short call, or so, on this topic..

if err == nil {
state := int(srvStatus.State)
if !checkState(state) {
serviceInfo.Error = fmt.Errorf("Uknown state of Service %s: %d", serviceName, state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in Uknown -> Unknown

return
}

//returns true of state is in valid range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't super important to me, but go has the convention of starting the comment with the name of the function. https://blog.golang.org/godoc-documenting-go-code, ex:

// checkState returns true if state is valid

tags := make(map[string]string)

tags["display_name"] = service.DisplayName
tags["service_name"] = service.ServiceName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tags should not contain the empty string, since InfluxDB will not accept them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best practice for storing a measurement in case of a tag has empty value?

  1. Skip the tag? Store just the service_name tag and fields?
  2. Insert a place holder text, e.g. "" or "<empty_display_name>"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either skip the tag or omit the entire point, depending on if the field needs to be required. For display name I would skip, since it is optional. For service name I would omit the point.

return serviceInfos, nil
}

func collectServiceInfo(scmgr *mgr.Mgr, serviceName string) (serviceInfo ServiceInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend returning (ServiceInfo, error) then you can remove the Error field from ServiceInfo.

serviceInfos := make([]ServiceInfo, len(serviceNames))

for i, srvName := range serviceNames {
serviceInfos[i] = collectServiceInfo(scmgr, srvName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest way to deal with zero value tags/fields IMO is to return an (ServiceInfo, error) and append if err != nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would then need to propagate a service error to Gather, or pass Accumulator here.
I want to keep the collecting info separated from plugin API.

The ServiceInfo structure is a domain model of a service, any error occurred during the collecting of service info is a property of such model.

This allows future enhancement in case a user will want to record also service errors into db. Because I still think that if user wants to monitor a particular service and on some hosts it is not possible due to an error, user has to look into the telegraf log instead into the db.

But maybe not, we will see.


//While getting service info there could a theoretically a lot of errors on different places.
//However in reality if there is a problem with a service then usually openService fails and if it passes, other calls will most probably be ok
//So, following error checking is just for sake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment, basically applies to almost any error :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Network, or more generally communication, errors are for example more probable than subsequent errors from Windows Service API. I still think that checking valid ranges of state and startup_mode is not necessary cause if invalid value would be return than the function will return error anyway.

This comment was intended to explain this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get rid of the checkState logic since we are now operating on ints, but we need to check all errors from the external api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, checking errors of functions is must, no doubts.


var description = "Input plugin to report Windows services info."

type Win_Services struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this WinServices

var KnownServices = []string{"LanmanServer", "TermService"}

func TestList(t *testing.T) {
services, err := listServices(KnownServices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chronograf uses a nice pattern for mocks that can handle this. I wrote an example and added it to this gist https://gist.github.com/danielnelson/79908f0bc7145d3feb7a91e0ef56d88c

@vlastahajek
Copy link
Contributor Author

vlastahajek commented Aug 4, 2017

Hi Daniel,
Thanks again for your review.
I've completed almost all required changes, except those commented.
Please, review, for the last time, hopefully ;)
Thanks,
Vlasta

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this plugin to the test-windows makefile target? You might need to rebase since we have modified it recently.

@@ -0,0 +1,180 @@
// +build windows

//this test must be run under administrator account
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still required?

@vlastahajek
Copy link
Contributor Author

Both done

@danielnelson danielnelson merged commit e21f2de into influxdata:master Aug 7, 2017
@vlastahajek vlastahajek deleted the vh-win-services branch August 8, 2017 07:15
maxunt pushed a commit that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants