Add Windows Performance Counters Receiver #1088

james-bebbington · 2020-09-22T04:24:40Z

Windows Performance Counters Receiver Proposal

This receiver, for Windows only, will capture any system, application, or custom performance counter data from the Windows registry. This receiver will use the PDH interface to capture this data.

Note there are multiple ways to consume counter data from Windows as described here (or see a more detailed explanation here). The primary methods are:

WMI: The highest level interface for accessing "typed" metric data. I don't think this supports accessing custom counters (not 100% sure). Regardless, we want to avoid using this due to performance concerns as documented in detail here: CPU collector blocks every ~17 minutes on call to wmi.Query prometheus-community/windows_exporter#89
PDH: A low level interface for accessing performance counter data. This is a relatively straightforward to use interface that returns performance counter data in the format expected by users.
Registry Counters: The lowest level interface for accessing performance counter data. This returns raw values, often Sums, instead of rates / averages. While this is useful in some cases, this is generally difficult to use, and returns values that will be difficult for users to interpret, so is probably not appropriate for a generic receiver.

==> We will use (2), and port the existing implementation from https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver/hostmetricsreceiver/internal/windows/pdh as described in open-telemetry/opentelemetry-collector#973.

Proposed Configuration

Option One: Build expressive syntax for generating metrics from Performance Counters.

windowsperformancecountersreceiver:
  metrics:
    # simple gauge metric populated from single counter value
    - name: iis/current_connections
      type: gauge # could consider making gauge the default since this will be the case for 90% of perf counters
      counter: "Web Service (_Total)/Current Connections"

    # counter metric that has two dimensions (one populated from the perf counter instance name & another populated from a set of perf counters)
    - name: iis/request_count
      type: counter
      instance_label: website # optional (if this value is specified, then the counter is expected to return multiple results)
      instance_filter: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counter_label: http_method
      counters:
        - counter: "Web Service (*)/Total Get Requests"
          label_value: get
        - counter: "Web Service (*)/Total Post Requests"
          label_value: post

Option Two: Build simple syntax for generating metrics from Performance Counters, and extend the Metrics Transform processor instead.

windowsperformancecountersreceiver:
  counters:
    - object: "Web Service"
      instances: [ "_Total" ]
      counters:
        - "Current Connections"
    - object: "Web Service"
      instances: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counters:
        - "Total Get Requests"
        - "Total Post Requests"

...

metricstransformprocessor:
  transforms:
    # rename "Web Service (*)/Current Connections" -> iis/current_connections
    - metric_name: "Web Service (*)/Current Connections"
      action: update
      new_name: iis/current_connections

    # combine "Web Service (*)/Total * Requests" -> iis/request_count (this is not existing functionality)
    - metric_names:
        - name: "Web Service (*)/Total Get Requests"
          label_value: get
        - name: "Web Service (*)/Total Post Requests"
          label_value: post
      action: combine
      label: http_method
      new_name: iis/request_count
      # change label instance -> website
        - action: update_label
          label: instance
          new_label: website

jrcamp · 2020-09-23T15:30:38Z

Do you think IIS would have its own dedicated receiver in the future (that wraps or leverages this one)? That way once we have docs, etc. it'll show up there? Or were you thinking of just supporting it (and the other performance counter ones) via configuration? I think the best user experience is to have a dedicated receiver for it which doesn't rely user having to add lots of special stuff to their configuration.

james-bebbington · 2020-09-23T21:48:03Z

I don't have any plans to add a receiver for IIS, etc, specifically. Specifying this in config is fine for us, and super flexible for future uses. If you think that would be useful, it could easily be added later though.

As another alternative, it could be worth investigating how config works so that we can split up config to make this tidier, rather than having one giant config file, e.g.

<config.yaml>
includes:
  - config_self.yaml
  - config_iis.yaml
  - config_sqlserver.yaml

# shared components listed here

<config_self.yaml>
pipelines:
  metrics/self:
    receivers: [ prometheus/self ]
    processors: [ metricstransformprocessor/self ]
    exporters: [ ... ]

receivers:
  prometheus/self:
    # scrape self metrics

processors:
  metricstransformprocessor/self:
    ...

<config_iis.yaml>
pipelines:
  metrics/iis:
    receivers: [ windowsperformancecountersreceiver/iis ]
    processors: [ metricstransformprocessor/iis ]
    exporters: [ ... ]

receivers:
  windowsperformancecountersreceiver/iis:
    ...

processors:
  metricstransformprocessor/iis:
    ...

<config_sqlserver.yaml>
pipelines:
  metrics/sqlserver:
    receivers: [ windowsperformancecountersreceiver/sqlserver ]
    processors: [ metricstransformprocessor/sqlserver ]
    exporters: [ ... ]

receivers:
  windowsperformancecountersreceiver/sqlserver:
    ...

processors:
  metricstransformprocessor/sqlserver:
    ...

I looked into this briefly a while ago. Viper does support loading config from multiple files, but with some limitations (arrays are replaced, not merged). It also doesn't support includes; if the following PR was merged it would allow us to do the suggestion above: spf13/viper#893

jrcamp · 2020-09-23T22:00:38Z

One thing I've been thinking about for a while is a concept of "scoped processors". I think sometimes it's confusing to have to do filtering or renaming in a separate processor section away from the receiver that's collecting it. What if any receiver could configure processors within it? For example (calling option #3):

windowsperformancecounters:
  counters:
    - object: "Web Service"
      instances: [ "_Total" ]
      counters:
        - "Current Connections"
    - object: "Web Service"
      instances: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counters:
        - "Total Get Requests"
        - "Total Post Requests"

  processors:
    - metricstransform:
        transforms:
          # rename "Web Service (*)/Current Connections" -> iis/current_connections
          - metric_name: "Web Service (*)/Current Connections"
            action: update
            new_name: iis/current_connections

I don't think this is as good in terms of UX as Option #1 but better than Option #2. It would also work well if we were able to split up configs like you showed above. However most people will probably either a) be using a wrapped receiver (which is why I wanted to ask the above question first) or b) pre-canned configs provided to them. Given that I probably slightly prefer #2 or #3 due to being able to reuse metricstransform processor logic. And I prefer #3 because people can copy/include just 1 section of config rather than having to copy a receiver section and a processor section separately.

james-bebbington · 2020-09-25T10:14:18Z

That is an interesting proposal. Note I updated my example above to include processors as well to show that they can still be described in the same block / file as the related receiver.

Having processors "scoped" to a receiver doesn't seem all that different from having a separate pipeline for each receiver, so I feel that (2) & (3) are quite similar. But being able to separate a giant config across multiple files seems quite useful.

james-bebbington · 2020-10-12T07:01:00Z

Based on feedback, I went with Option 2. We could extend this to Option 3 in the future if desired, but I will leave that as possible future work.

So the next question is how to best implement the combine operation in the Metrics transform processor? My initial thought was to configure this as described in the issue description, but that config is very specific to this operation.

A better alternative is proposed below:

Step 1: Change the metric_name filter to be a filterset. This will apply to all existing transforms, meaning you can do things like:

# update label keys and values for all system.network metrics
- metric_names: ^system\.network\.
  match_type: regexp
  action: update
  operations:
    # change label interface -> device
    - action: update_label
      label: interface
      new_label: device
    # change direction label values receive -> rx
    - action: update_label
      label: direction
      value_actions:
        # receive -> rx
        - value: receive
          new_value: rx

Step 2: Add a combine action:

# combine "Web Service (*)/Total .* Requests" -> iis/request_count
- metric_names: ^Web Service (*)/Total .* Requests$
  match_type: regexp
  action: combine
  new_name: iis/request_count # mandatory if action = combine (or new)
  combine_labels: # mandatory if action = combine 
    - name: "Web Service (*)/Total Get Requests"
      labels: { http_method: get }
    - name: "Web Service (*)/Total Post Requests"
      label_value: { http_method: post }
    # could add a fallback option here; for now will ignore + report if any matched metrics are not in this list
  operations:
    # change label instance -> website
    - action: update_label
      label: instance
      new_label: website

^ Note combine differs from merge, which is another future transform that might be desired (i.e. merge / aggregate data points across metrics).

…#1211) Adds skeleton for Windows Performance Counters Receiver with very basic initial functionality that can read simple counters that do not include any instance into a Gauge metric - see the added README.md for more details. **Link to tracking Issue:** #1088 Note the scaffolding code makes use of the new `receiverhelper` & `scraperhelper` functions added in the Core PRs referenced below. **Depends on:** #1175, open-telemetry/opentelemetry-collector#1886, open-telemetry/opentelemetry-collector#1890

james-bebbington · 2020-11-03T06:28:34Z

A much more simplified variant of the second option is to just use regexp substitution to specify label names, i.e. the above becomes:

# combine "Web Service (*)/Total .* Requests" -> iis/request_count
- metric_names: ^Web Service \(\*\)/Total (?P<http_method>.*) Requests$
  match_type: regexp
  action: combine
  new_name: iis/request_count
  operations:
    # change label instance -> website
    - action: update_label
      label: instance
      new_label: website

I will start with this and we can add more complex alternatives in the future if needed.

The failing test was introduced I believe by a change in behaviour noted in this issue: getmoto/moto#3292. The solution is to set a region in the create_bucket call. Fixes #1088

* Use a shorter timeout for AWS EC2 metadata requests Fix #1088 According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds seems slow and in some cases can block the startup of the program being instrumented (see #1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1 second timeout seems like a reasonable workaround for this. * add changelog entry for timeout change * use 5s timeout for ECS and EKS, update changelog Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>

james-bebbington added the feature request label Sep 22, 2020

james-bebbington mentioned this issue Sep 22, 2020

Deprecate third party code from host metrics receiver open-telemetry/opentelemetry-collector#973

Closed

This was referenced Oct 2, 2020

1/3 Port PDH helpers from Core #1175

Merged

2/3 Add Windows Performance Counters Receiver skeleton implementation #1211

Merged

3/3 Extend Windows Performance Counters Receiver to support counters with instances #1229

Merged

james-bebbington mentioned this issue Oct 13, 2020

Improve filtering configuration with globs and make match_type optional open-telemetry/opentelemetry-collector#1081

Closed

james-bebbington mentioned this issue Oct 20, 2020

Proposal: Metrics Label Processing Solutions open-telemetry/opentelemetry-collector#1892

Closed

james-bebbington mentioned this issue Nov 2, 2020

Add metric filter option to metrics transform processor #1447

Merged

james-bebbington mentioned this issue Nov 6, 2020

Add combine action #1506

Merged

bogdandrutu closed this as completed in #1506 Nov 17, 2020

andrewhsu added the enhancement New feature or request label Jan 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Windows Performance Counters Receiver #1088

Add Windows Performance Counters Receiver #1088

james-bebbington commented Sep 22, 2020 •

edited

jrcamp commented Sep 23, 2020

james-bebbington commented Sep 23, 2020 •

edited

jrcamp commented Sep 23, 2020 •

edited

james-bebbington commented Sep 25, 2020

james-bebbington commented Oct 12, 2020 •

edited

james-bebbington commented Nov 3, 2020 •

edited

Add Windows Performance Counters Receiver #1088

Add Windows Performance Counters Receiver #1088

Comments

james-bebbington commented Sep 22, 2020 • edited

Windows Performance Counters Receiver Proposal

Proposed Configuration

jrcamp commented Sep 23, 2020

james-bebbington commented Sep 23, 2020 • edited

jrcamp commented Sep 23, 2020 • edited

james-bebbington commented Sep 25, 2020

james-bebbington commented Oct 12, 2020 • edited

james-bebbington commented Nov 3, 2020 • edited

james-bebbington commented Sep 22, 2020 •

edited

james-bebbington commented Sep 23, 2020 •

edited

jrcamp commented Sep 23, 2020 •

edited

james-bebbington commented Oct 12, 2020 •

edited

james-bebbington commented Nov 3, 2020 •

edited