SignalFx Metrics Exporter

Status
  • Stability: beta (traces, metrics, logs)
  • Distributions: contrib
  • Code Owners: @dmitryax, @crobert-1

This exporter can be used to send metrics, events, and trace correlation to SignalFx.

Apart from metrics, the exporter is also capable of sending metric metadata (properties and tags) to SignalFx. Currently, only metric metadata updates from the k8s_cluster receiver are supported.

Metrics Configuration

The following configuration options are required:

  • access_token (no default): The access token is the authentication token provided by Splunk Observability Cloud. It can be obtained from the web app. For details, refer to the documentation here.
  • Either realm or both api_url and ingest_url. Both api_url and ingest_url take precedence over realm.
    • realm (no default): SignalFx realm where the data will be received.
    • api_url (no default): Destination to which properties and tags are sent. If realm is set, this option is derived and will be https://api.{realm}.signalfx.com. If a value is explicitly set, the value of realm will not be used in determining api_url. The explicit value will be used instead.
    • ingest_url (no default): Destination where SignalFx metrics are sent. If realm is set, this option is derived and will be https://ingest.{realm}.signalfx.com. If a value is explicitly set, the value of realm will not be used in determining ingest_url. The explicit value will be used instead. The exporter will automatically append the appropriate path: "/v2/datapoint" for metrics, and "/v2/event" for events.
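
As a minimal sketch of the required options, either of the two exporter definitions below should satisfy the requirements above. The signalfx/realm and signalfx/urls instance names, the us1 realm, and the example.com hosts are placeholders chosen for illustration, not values prescribed by this exporter.

```yaml
exporters:
  # Option 1: derive both endpoints from the realm.
  signalfx/realm:
    access_token: <replace_with_actual_access_token>
    # api_url becomes https://api.us1.signalfx.com
    # ingest_url becomes https://ingest.us1.signalfx.com
    realm: us1

  # Option 2: set both endpoints explicitly (placeholder hosts); realm is not needed.
  signalfx/urls:
    access_token: <replace_with_actual_access_token>
    api_url: https://api.example.com
    # "/v2/datapoint" and "/v2/event" are appended by the exporter.
    ingest_url: https://ingest.example.com
```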

The following configuration options can also be configured:

  • access_token_passthrough: (default = true) Whether to use the "com.splunk.signalfx.access_token" metric resource attribute, if present, as the SignalFx access token. In either case, this exporter drops the attribute during final translation. Intended to be used in tandem with the identical configuration option of the SignalFx receiver to preserve datapoint origin. Only this exporter filters the attribute; other exporters do not, and so would reveal the organization access token.
  • exclude_metrics: List of metric filters that determine which metrics are excluded from sending to the SignalFx backend. The filtering is applied after the default translations controlled by the disable_default_translation_rules option. See here for examples. In addition to the values explicitly provided via this option, these are appended to the list by default. Setting this option to [] overrides all the default excludes.
  • include_metrics: List of filters to override exclusion of any metrics. This option can be used to include metrics that are otherwise dropped by default. See here for a list of metrics that are dropped by default. For example, the following configuration can be used to send through some of the metrics that are dropped by default.
    include_metrics:
      # When sending in translated metrics.
      - metric_names: [cpu.interrupt, cpu.user, cpu.system]
      # When sending in metrics in OTel convention.
      - metric_name: system.cpu.time
        dimensions:
          state: [interrupt, user, system]
  • log_data_points (default = false): If the log level is set to debug and this is true, all datapoints dispatched to Splunk Observability Cloud will be logged.
  • log_dimension_updates (default = false): Whether or not to log dimension updates.
  • disable_default_translation_rules (default = false): Disable default translation of the OTel metrics to a SignalFx compatible format. The default translation rules are defined in translation/constants.go.
  • timeout (default = 10s): Amount of time to wait for a send operation to complete.
  • http2_read_idle_timeout (default = 10s): Send a ping frame for a health check if the connection has been idle for the configured value. 0s means http/2 health check will be disabled.
  • http2_ping_timeout (default = 10s): Triggered by http2_read_idle_timeout. If there is no response to the ping within the configured value, the connection is closed. If this value is set to 0, it will default to 15s.
  • headers (no default): Headers to pass in the payload.
  • max_idle_conns (default = 100): The maximum idle HTTP connections the client can keep open.
  • max_idle_conns_per_host (default = 100): The maximum idle HTTP connections the client can keep open per host.
  • idle_conn_timeout (default = 30s): The maximum amount of time an idle connection will remain open before closing itself.
  • More HTTP settings are available, see HTTP settings.
  • sync_host_metadata: Defines whether the exporter should scrape host metadata and send it as property updates to the SignalFx backend. Disabled by default. IMPORTANT: Host metadata synchronization relies on the resourcedetection processor. If this option is enabled, make sure that the resourcedetection processor is enabled in the pipeline with one of the cloud provider detectors, or with the environment variable detector setting a unique value for the host.name attribute within your k8s cluster. Also keep override=true in the resourcedetection config.
  • exclude_properties: A list of property filters to limit dimension update content. Property filters can contain any number of the following fields, supporting (negated) string literals, re2 /regex/, and glob syntax values: dimension_name, dimension_value, property_name, and property_value. For any field not explicitly configured in a filter object, a default catch-all value of /^.*$/ is used, so that only the specified fields must match for the filter to take effect:
    # will filter all 'k8s.workload.name' properties from 'k8s.pod.uid' dimension updates:
    exclude_properties:
      - dimension_name: k8s.pod.uid
        property_name: k8s.workload.name
  • dimension_client: Contains options controlling the dimension update client configuration used for metadata updates.
    • max_buffered (default = 10,000): The buffer size for queued dimension updates.
    • send_delay (default = 10s): The time to wait between dimension updates for a given dimension.
    • max_idle_conns (default = 20): The maximum idle HTTP connections the client can keep open.
    • max_idle_conns_per_host (default = 20): The maximum idle HTTP connections the client can keep open per host.
    • max_conns_per_host (default = 20): The maximum total number of connections the client can keep open per host.
    • idle_conn_timeout (default = 30s): The maximum amount of time an idle connection will remain open before closing itself.
    • timeout (default = 10s): Amount of time to wait for the dimension HTTP request to complete.
  • nonalphanumeric_dimension_chars: (default = "_-.") A string of characters that are allowed in a dimension key in addition to alphanumeric characters. Each nonalphanumeric dimension key character that isn't in this string will be replaced with a _.
  • ingest_tls: (no default) Exposes a list of TLS settings to establish a secure connection with a signalfx receiver configured on another collector instance.
    • ca_file needs to be set if the exporter's ingest_url is pointing to a signalfx receiver with TLS enabled and using a self-signed certificate where its CA is not loaded in the system cert pool. The full list of TLS options can be found in the configtls README. The following example instructs the signalfx exporter ingest client to use a custom ca_file to verify the server certificate.
    ingest_tls:
        ca_file: "/etc/opt/certs/ca.pem"
  • api_tls: (no default) exposes a list of TLS settings to establish a secure connection with http_forwarder extension configured on another collector instance.
    • ca_file needs to be set if the exporter's api_url is pointing to an http_forwarder extension with TLS enabled and using a self-signed certificate where its CA is not loaded in the system cert pool. The full list of TLS options can be found in the configtls README. The following example instructs the signalfx exporter api client to use a custom ca_file to verify the server certificate.
    api_tls:
        ca_file: "/etc/opt/certs/ca.pem"
  • drop_histogram_buckets: (default = false) if set to true, histogram buckets will not be translated into datapoints with _bucket suffix but will be dropped instead, only datapoints with _sum, _count, _min (optional) and _max (optional) suffixes will be sent. Please note that this option does not apply to histograms sent in OTLP format with send_otlp_histograms enabled.
  • send_otlp_histograms: (default = false) If set to true, any histogram metrics received by the exporter will be sent to the Splunk Observability backend in OTLP format without conversion to SignalFx format. This can only be enabled if the Splunk Observability environment (realm) has the new Histograms feature rolled out. Please note that the exporter configurations include_metrics and exclude_metrics do not apply to histograms sent in OTLP format.

In addition, this exporter offers queued retry, which is enabled by default. Information about queued retry configuration parameters can be found here.
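
To illustrate how several of the optional settings above fit together, here is a hedged configuration fragment. It assumes that exclude_metrics accepts the same filter shape shown for include_metrics, that the environment variable detector is named env in the resourcedetection processor, and that my.noisy.metric is a purely hypothetical metric name. The resourcedetection processor would still need to be listed in the metrics pipeline for host metadata sync to work.

```yaml
processors:
  resourcedetection:
    # Assumption: the env detector supplies a unique host.name;
    # a cloud provider detector could be used instead.
    detectors: [env]
    override: true

exporters:
  signalfx:
    access_token: <replace_with_actual_access_token>
    realm: us1
    # Disabled by default; relies on the resourcedetection processor above.
    sync_host_metadata: true
    # Hypothetical metric name; the default excludes are still appended.
    exclude_metrics:
      - metric_names: [my.noisy.metric]
    # Send only _sum, _count, _min and _max datapoints for histograms.
    drop_histogram_buckets: true
    dimension_client:
      # These two values restate the documented defaults.
      max_buffered: 10000
      send_delay: 10s
```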

Traces Configuration (correlation only)

⚠️ Note that traces must still be sent using the sapmexporter to be visible in SignalFx.

When traces are sent to the signalfx exporter it correlates traces to metrics. When a new service or environment is seen it associates the source (e.g. host or pod) to that service or environment in SignalFx. Metrics can then be filtered based on that trace service and environment (sf_service and sf_environment).

One of realm or api_url is required.

  • access_token (required, no default): The access token is the authentication token provided by SignalFx.
  • realm (no default): SignalFx realm where the data will be received.
  • api_url (default = https://api.{realm}.signalfx.com/): Destination to which correlation updates are sent. If a value is explicitly set, the value of realm will not be used in determining api_url. The explicit value will be used instead.
  • correlation: Contains options controlling the syncing of service and environment properties onto dimensions.
    • endpoint (required, default = api_url or https://api.{realm}.signalfx.com/): This is the base URL for API requests (e.g. https://api.us0.signalfx.com).
    • timeout (default = 5s): The timeout for every attempt to send data to the backend.
    • stale_service_timeout (default = 5 minutes): How long to wait after a span's service name is last seen before uncorrelating it.
    • max_requests (default = 20): Max HTTP requests to be made in parallel.
    • max_buffered (default = 10,000): Max number of correlation updates that can be buffered before updates are dropped.
    • max_retries (default = 2): Max number of retries that will be made for failed correlation updates.
    • log_updates (default = false): Whether or not to log correlation updates to dimensions (at DEBUG level).
    • retry_delay (default = 30 seconds): How long to wait between retries.
    • cleanup_interval (default = 1 minute): How frequently to purge duplicate requests.
    • sync_attributes (default = {"k8s.pod.uid": "k8s.pod.uid", "container.id": "container.id"}): Map whose keys are the attributes to read from spans and whose values are the dimensions to sync them to.
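
A correlation block might look like the sketch below. Every value simply restates a default listed above, written as duration strings (5m, 30s, 1m) on the assumption that these fields accept that form. The endpoint is left unset so that it falls back to the api_url derived from the realm; us1 is a placeholder.

```yaml
exporters:
  signalfx:
    access_token: <replace_with_actual_access_token>
    realm: us1
    correlation:
      timeout: 5s
      stale_service_timeout: 5m
      max_requests: 20
      max_buffered: 10000
      max_retries: 2
      retry_delay: 30s
      cleanup_interval: 1m
      log_updates: false
      # Span attribute (key) -> dimension to sync it to (value).
      sync_attributes:
        k8s.pod.uid: k8s.pod.uid
        container.id: container.id
```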

Default Metric Filters

List of metrics excluded by default

Some OpenTelemetry receivers may send metrics that SignalFx categorizes as custom metrics. To prevent unwanted overage usage due to custom metrics from these receivers, the SignalFx exporter has a set of metrics excluded by default. Some exclusion rules use regex to exclude multiple metric names. Some metrics are only excluded if specific resource labels (dimensions) are present. If translation_rules are configured and new metrics match a default exclusion, the new metric will still be excluded. Users may configure the SignalFx exporter's include_metrics config option to override any of the default exclusions, as include_metrics will always take precedence over any exclusions. An example of include_metrics is shown below.

exporters:
  signalfx:
    include_metrics:
      - metric_names: [cpu.interrupt, cpu.user, cpu.system]
      - metric_name: system.cpu.time
        dimensions:
          state: [interrupt, user, system]

The following include_metrics example would instruct the exporter to send only cpu.interrupt metrics with a cpu dimension value ("per core" datapoints), and both "per core" and aggregate cpu.idle metrics:

exporters:
  signalfx:
    include_metrics:
      - metric_name: "cpu.idle"
      - metric_name: "cpu.interrupt"
        dimensions:
          cpu: ["*"]

Translation Rules and Metric Transformations

The translation_rules metrics configuration field accepts a list of metric-transforming actions to help ensure compatibility with custom charts and dashboards when using the OpenTelemetry Collector. It also provides the ability to produce custom metrics by copying, calculating new, or aggregating other metric values without requiring an additional processor. The rule language is expressed in yaml mappings and is documented here. Translation rules currently allow the following actions:

  • aggregate_metric - Aggregates a metric through removal of specified dimensions
  • calculate_new_metric - Creates a new metric via operating on two constituent ones
  • convert_values - Converts float values to int or int to float for specified metric names
  • copy_metrics - Creates a new metric as a copy of another
  • delta_metric - Creates a new delta metric for a specified non-delta one
  • divide_int - Scales a metric's integer value by a given factor
  • drop_dimensions - Drops dimensions for specified metrics, or globally
  • drop_metrics - Drops all metrics with a given name
  • multiply_float - Scales a metric's float value by a given float factor
  • multiply_int - Scales a metric's int value by a given int factor
  • rename_dimension_keys - Renames dimensions for specified metrics, or globally
  • rename_metrics - Replaces a given metric name with specified one
  • split_metric - Splits a given metric into multiple new ones for a specified dimension
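
As a rough sketch of what a few of these actions could look like in configuration: the rule language is documented separately, so the field names used here (action, mapping, metric_names, metric_name, aggregation_method, without_dimensions) are recalled assumptions to be verified against that documentation, and all metric names are hypothetical.

```yaml
exporters:
  signalfx:
    access_token: <replace_with_actual_access_token>
    realm: us1
    translation_rules:
      # rename_metrics: replace a metric name with another (hypothetical names).
      - action: rename_metrics
        mapping:
          my.old.metric: my.new.metric
      # drop_metrics: drop all metrics with the given name.
      - action: drop_metrics
        metric_names:
          my.unwanted.metric: true
      # aggregate_metric: aggregate by removing the listed dimensions.
      - action: aggregate_metric
        metric_name: my.per_core.metric
        aggregation_method: sum
        without_dimensions: [cpu]
```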

The translation rules defined in translation/constants.go are used by default for this value. The default rules will create the following aggregated metrics from the hostmetrics receiver:

  • cpu.idle
  • cpu.interrupt
  • cpu.nice
  • cpu.num_processors
  • cpu.softirq
  • cpu.steal
  • cpu.system
  • cpu.user
  • cpu.utilization
  • cpu.utilization_per_core
  • cpu.wait
  • disk.summary_utilization
  • disk.utilization
  • disk_ops.pending
  • disk_ops.total
  • memory.total
  • memory.utilization
  • network.total
  • process.cpu_time_seconds
  • system.disk.io.total
  • system.disk.operations.total
  • system.network.io.total
  • system.network.packets.total
  • vmpage_io.memory.in
  • vmpage_io.memory.out
  • vmpage_io.swap.in
  • vmpage_io.swap.out

In addition to the aggregated metrics, the default translation rules make available the following "per core" custom hostmetrics. The CPU number is assigned to the dimension cpu.

  • cpu.interrupt
  • cpu.nice
  • cpu.softirq
  • cpu.steal
  • cpu.system
  • cpu.user
  • cpu.wait

These metrics are intended to be reported directly to Splunk IM by the SignalFx exporter. Any desired changes to their attributes or values should be made via additional translation rules or from their constituent host metrics.

Example Config

exporters:
  signalfx:
    access_token: <replace_with_actual_access_token>
    access_token_passthrough: true
    headers:
      added-entry: "added value"
      dot.test: test
    realm: us1
    timeout: 5s
    max_idle_conns: 80

⚠️ When enabling the SignalFx receiver or exporter, configure both the metrics and logs pipelines.

service:
  pipelines:
    metrics:
      receivers: [signalfx]
      processors: [memory_limiter, batch]
      exporters: [signalfx]
    logs:
      receivers: [signalfx]
      processors: [memory_limiter, batch]
      exporters: [signalfx]
    traces:
      receivers: [zipkin]
      processors: []
      exporters: [signalfx]

The full list of settings exposed for this exporter is documented here with detailed sample configurations here.

This exporter also offers proxy support as documented here.

Advanced Configuration

Several helper files are leveraged to provide additional capabilities automatically: