This repository has been archived by the owner on Jul 19, 2022. It is now read-only.

Indicator Document Reference

John McBride edited this page Nov 18, 2019 · 27 revisions

Indicator Documents are the data interchange format specified by the Indicator Protocol. The indicator-registry identifies each document by its product name and metadata, so a document registered with the same product name and metadata as an existing one overwrites it. If you have multiple documents with the same product name, put a uniquely identifying key/value pair in each document's metadata field. The convention is to use component: bosh_job_name for BOSH components (jobs).
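As a sketch of this convention, the two documents below share a product name but differ in their component label, so they should be stored as separate documents rather than overwriting each other. The product name, deployment name, and job names are hypothetical, and the --- separator only divides the two documents for display.

```yaml
# Document for the first BOSH job (job name is hypothetical).
apiVersion: indicatorprotocol.io/v1
kind: IndicatorDocument
metadata:
  labels:
    deployment: my-deployment
    component: router
spec:
  product:
    name: my-product
    version: 0.0.1
---
# Same product name, different component label.
apiVersion: indicatorprotocol.io/v1
kind: IndicatorDocument
metadata:
  labels:
    deployment: my-deployment
    component: scheduler
spec:
  product:
    name: my-product
    version: 0.0.1
```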

Now let's take a look at each field in detail.

apiVersion

[string, required]: The API version used in this document. 'indicatorprotocol.io/v1' is currently the only supported version.

apiVersion: indicatorprotocol.io/v1

kind

[string, required]: The kind of the object. For documents, this value is always IndicatorDocument to differentiate them from patches.

kind: IndicatorDocument

metadata

[map, optional]: A map containing just the key labels, following the k8s conventions for metadata.

metadata:
  labels: … # see below

metadata/labels

[map, optional]: A key-value map of metadata label information. These fields can be referenced in the promql field as $var.

metadata:
  labels:
    deployment: awesome_cluster_1
    source_id: abcd-1234

Reserved variable names: Some variable names are reserved for specific uses and should not be used as metadata keys. This includes step, since $step is substituted into promql range selectors (see spec/[indicator]/promql).

Common metadata keys:

  • deployment: The bosh deployment name.
  • service_broker_guid: A guid that identifies a service broker. Only service broker deployments should send this key.
  • parent_service_broker_guid: A guid that identifies the service broker which created this service instance. Only service instances should send this key. Allows monitoring tools to nest broker/instance dashboards.
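To illustrate how the two broker keys relate, here is a sketch of the metadata for a service broker and for one service instance it created. The GUID and deployment names are made up. The assumption is that the instance's parent_service_broker_guid carries the same value as the broker's service_broker_guid, which is what would let a monitoring tool nest the two dashboards.

```yaml
# Metadata of the service broker's document.
metadata:
  labels:
    deployment: broker-deployment
    service_broker_guid: 1a2b3c4d-0000-0000-0000-000000000000
---
# Metadata of a service instance created by that broker.
metadata:
  labels:
    deployment: service-instance-deployment
    parent_service_broker_guid: 1a2b3c4d-0000-0000-0000-000000000000
```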

spec

[map, required]: The spec of this document, following the k8s API conventions. Contains the keys product, indicators, and optionally layout.

spec:
  product: … # see below
  indicators: … # see below
  layout: … # see below

spec/product

[map, required]: Information about the product.

product:
  name: The Product
  version: 0.0.1

spec/product/name

[string, required]: The name of the product, used to name dashboards, determine icons, etc. (e.g. rabbitmq, redis, mysql). Used along with metadata by the indicator-registry to identify and upsert this document.

spec/product/version

[string, required]: The product's version. Used by monitoring tools so Operators know which version they are dealing with.

spec/indicators

[array, optional]: Defines a list of things that are measured, in line with SRE principles.

indicators:
- name: http_traffic
  promql: rate(http_responses_total{deployment="$deployment",source_id="my-product-source"}[1m])
  thresholds:
  - level: critical
    operator: gte
    value: 2000
  - level: warning
    operator: gte
    value: 1500
  documentation:
    title: HTTP Traffic
    description: Requests per second, per instance. This service starts to degrade in performance around 2000 requests/second.

spec/[indicator]/name

[string,required]: A unique name used for reference in the documentation block.

name: http_traffic

spec/[indicator]/promql

[string,required]: The Prometheus Query Language (PromQL) expression for producing the measurement value.

promql: rate(http_responses_total{deployment="$deployment",source_id="my-product-source"}[1m])

Note: $step can be used in the promql range selector, for example: promql: rate(http_responses_error[$step])
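As an illustration of how metadata labels feed into promql, the fragment below pairs a deployment label with a query that references it as $deployment. It is a fragment, not a complete document, and the resolved query in the comment is what one would expect after substitution; the exact substitution mechanics are an assumption.

```yaml
metadata:
  labels:
    deployment: awesome_cluster_1
spec:
  indicators:
  - name: http_traffic
    # $deployment is expected to resolve to the label value above, giving:
    # rate(http_responses_total{deployment="awesome_cluster_1",source_id="my-product-source"}[1m])
    promql: rate(http_responses_total{deployment="$deployment",source_id="my-product-source"}[1m])
```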

spec/[indicator]/type

[string,optional]: The indicator type, one of [sli, kpi, other] (default other).

type: sli

spec/[indicator]/thresholds

[array,optional]: Specifies the alerting thresholds for the indicator.

thresholds:
- level: critical
  operator: gte
  value: 2000
- level: warning
  operator: gt
  value: 1500

spec/[indicator]/[threshold]/level

[string,required]: The severity level. The levels critical and warning produce "Red critical" and "Yellow warning" in documentation. Also impacts Indicator Status.

spec/[indicator]/[threshold]/operator

[string,required]: one of [gt,gte,eq,neq,lte,lt]

spec/[indicator]/[threshold]/value

[number,required]: The value the measurement is compared against, using operator, to decide whether this threshold is met.
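The operator compares the measured value against value. As a sketch, the thresholds below cover a metric where lower readings are worse, so they use lte rather than gte; the numbers are illustrative only.

```yaml
thresholds:
# Critical when the measured value is less than or equal to 10.
- level: critical
  operator: lte
  value: 10
# Warning when the measured value is less than or equal to 20.
- level: warning
  operator: lte
  value: 20
```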

spec/[indicator]/[threshold]/alert

[map, optional]: Specifies alerting configuration.

alert:
  for: 5m
  step: 2m

spec/[indicator]/[threshold]/alert/for

[duration string, optional]: Length of time a threshold must be breached before an alert is sent (default "1m").

spec/[indicator]/[threshold]/alert/step

[duration string, optional]: The $step value used in alerting queries (default "1m").
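The fragment below sketches how for and step might combine with a $step range selector in promql. The indicator name and threshold value are hypothetical, and alert is nested under the threshold here, following the spec/[indicator]/[threshold]/alert heading.

```yaml
indicators:
- name: http_errors
  promql: rate(http_responses_error[$step])
  thresholds:
  - level: critical
    operator: gte
    value: 50
    alert:
      for: 5m   # the threshold must be breached for 5 minutes before an alert is sent
      step: 2m  # alerting queries would use rate(http_responses_error[2m])
```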

spec/[indicator]/presentation

[map,optional]: Specification for display of this indicator.

presentation:
  chartType: status
  currentValue: false
  frequency: 300
  labels: [source_id, deployment]
  units: seconds

spec/[indicator]/presentation/chartType

[string,optional]: The type of chart to display. Valid options are step, bar, status and quota (default step).

spec/[indicator]/presentation/currentValue

[boolean,optional]: Whether to display the latest value in place of a chart (default false).

spec/[indicator]/presentation/frequency

[number,optional]: Frequency (in seconds) of data emission, i.e. the minimum charting interval (default 0).

spec/[indicator]/presentation/labels

[string array,optional]: A subset of labels returned from the PromQL query that are relevant in displaying this indicator (default empty).

spec/[indicator]/presentation/units

[string,optional]: The type of data that the promql returns, e.g. "bytes", "percentage", "seconds" (default "").

spec/[indicator]/documentation

[map,optional]: A freeform map of documentation attributes.

documentation:
  title: HTTP Traffic
  description: Requests per second, per instance. This service starts to degrade in performance around 2000 requests/second.
  recommendedResponse: Panic! Run around in circles waving your hands.

Common keys:

  • title: The human-readable title of the indicator.
  • description: A description of the indicator.
  • recommendedResponse: What to do if this indicator is in an unhealthy state.
  • thresholdNote: A note that will appear next to thresholds in documentation. Commonly used for guidance on how to determine thresholds.

spec/layout

[map, optional]: If no layout is provided, a default layout that includes all indicators is generated.

layout:
  owner: The Product Team
  title: Monitoring the Product
  description: This topic explains how to monitor the health of The Product.
  sections:
  - title: Service Level Indicators
    description: Service Level Indicators monitor key features of The Product
    indicators:
    - http_traffic

  • title [string,optional]: The top level page header.
  • description [markdown,optional]: A formatted text block that appears under the table of contents.
  • sections [array,optional]
    • title [string,optional]: The title of the section.
    • description [markdown,optional]: A formatted text block that appears under the title.
    • indicators [string array,optional]: An array of indicator names.
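Putting the fields together, a complete document might look like the sketch below. It is assembled from the per-field examples on this page rather than taken from a real product, so treat the values as placeholders.

```yaml
apiVersion: indicatorprotocol.io/v1
kind: IndicatorDocument

metadata:
  labels:
    deployment: awesome_cluster_1
    source_id: abcd-1234

spec:
  product:
    name: The Product
    version: 0.0.1

  indicators:
  - name: http_traffic
    type: sli
    promql: rate(http_responses_total{deployment="$deployment",source_id="my-product-source"}[1m])
    thresholds:
    - level: critical
      operator: gte
      value: 2000
    - level: warning
      operator: gte
      value: 1500
    documentation:
      title: HTTP Traffic
      description: Requests per second, per instance.

  # The layout refers to indicators by their name.
  layout:
    owner: The Product Team
    title: Monitoring the Product
    sections:
    - title: Service Level Indicators
      indicators:
      - http_traffic
```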