Skip to content

Latest commit

 

History

History
1178 lines (935 loc) · 26 KB

hashicorp-consul-monitoring-integration.mdx

File metadata and controls

1178 lines (935 loc) · 26 KB
title tags metaDescription redirects freshnessValidatedDate
HashiCorp Consul monitoring integration
Integrations
On-host integrations
On-host integrations list
New Relic's HashiCorp integration: what data it reports and how to configure it.
/docs/integrations/host-integrations/host-integrations-list/hashicorp-consul-monitoring-integration
/docs/consul-integration-new-relic-infrastructure
/docs/integrations/host-integrations/host-integrations-list/consul-monitoring-integration
never

Our HashiCorp Consul on-host integration collects and sends inventory and metrics from your Consul data center environment to New Relic, where you can see the health of your environment. We collect data on both the data center and agent/node levels.

Note that we also have a [HashiCorp Cloud Platform Consul integration](/docs/infrastructure/host-integrations/host-integrations-list/hashicorp-consul-monitoring-integration).

Compatibility and requirements [#comp-req]

Before installation, ensure you meet these requirements:

  • Our integration is compatible with HashiCorp Consul 1.0 or higher.
  • If you're using ACL, the credentials for the Consul integration must have the policies agent:read, node:read, and service:read.

Quick start [#quick]

Instrument your Consul environment quickly and send your telemetry data with guided install. Our guided install uses our infrastructure agent and our CLI to set up the HashiCorp integration, and discovers other applications and log sources running in your environment and then recommends which ones you should instrument.

The guided install works with most setups. But if it doesn't suit your needs, there are other install options below.

Ready to get started? Click the relevant button, depending on which data center region you use. When you're done with the install, return to this documentation to review the configuration options.

Guided install, US region

<ButtonLink role="button" to="https://one.eu.newrelic.com/launcher/nr1-core.explorer?pane=eyJuZXJkbGV0SWQiOiJucjEtY29yZS5saXN0aW5nIn0=&cards[0]=eyJuZXJkbGV0SWQiOiJucjEtaW5zdGFsbC1uZXdyZWxpYy5ucjEtaW5zdGFsbC1uZXdyZWxpYyIsInBhdGgiOiJvaGkiLCJyZWNpcGVOYW1lIjoiYXBhY2hlLW9wZW4tc291cmNlLWludGVncmF0aW9uIiwiYWN0aXZlQ29tcG9uZW50IjoiVlRTT0NvbW1hbmQiLCJhY3RpdmVFbnZpcm9ubWVudCI6ImFwYWNoZS1vcGVuLXNvdXJjZS1pbnRlZ3JhdGlvbiJ9" variant="primary"

Guided install, EU region

Install [#install]

To install the HashiCorp Consul integration, follow the instructions for your environment:

See [Monitor service running on ECS](/docs/integrations/host-integrations/host-integrations-list/monitor-services-running-amazon-ecs).

{ ' ' }

<Collapser id="k8s-install" title="Kubernetes"

See [Monitor service running on
Kubernetes](/docs/monitor-service-running-kubernetes).

<Collapser id="linux-install" title="Linux"

1. Install [the infrastructure agent](/docs/integrations/host-integrations/installation/install-infrastructure-host-integrations/#install), and replace the `INTEGRATION_FILE_NAME` variable with `nri-consul`.
2. Change directory to the integrations folder:

   ```
   cd /etc/newrelic-infra/integrations.d
   ```
3. Copy the sample configuration file:

   ```
   sudo cp consul-config.yml.sample consul-config.yml
   ```
4. Edit the `consul-config.yml` file as described in the [configuration settings](#config).

<Collapser id="windows-install" title="Windows"

1. Download the `nri-consul` .MSI installer image from:

   [https://download.newrelic.com/infrastructure_agent/windows/integrations/nri-consul/nri-consul-amd64.msi](https://download.newrelic.com/infrastructure_agent/windows/integrations/nri-consul/nri-consul-amd64.msi)
2. To install from the Windows command prompt, run:

   ```
   msiexec.exe /qn /i PATH\TO\nri-consul-amd64.msi
   ```
3. In the Integrations directory, `C:\Program Files\New Relic\newrelic-infra\integrations.d\`, create a copy of the sample configuration file by running:

   ```
   cp consul-config.yml.sample consul-config.yml
   ```
4. Edit the `consul-config.yml` file as described in the [configuration settings](#config).

Update your integration [#update]

This integration doesn't automatically update. For best results, regularly update the integration package and the infrastructure agent.

Post-installation tasks [#after-install]

When you're done with the install, you can set configuration options. Some configurations are required to get the integration to work, while some are optional.

Configure the integration [#config]

If you enabled this integration via our ECS or Kubernetes integration, see those docs:

For the standard on-host installation, this integration comes with a YAML config file, apache-config.yml. This configuration is where you can place required login credentials and configure how data is collected. Which options you change depends on your setup and preferences. It comes with a sample config file apache-config.yml.sample that you can copy and edit.

Specific settings related to Consul are defined using the env section of the configuration file. These settings control the connection to your Consul instance as well as other security settings and features.

If you are still using our legacy configuration/definition files, please refer to this [document](/docs/create-integrations/infrastructure-integrations-sdk/specifications/host-integrations-standard-configuration-format/) for help.

Consul configuration options [#config-options]

The Consul integration collects both metrics and inventory information. This table shows what each configuration option applies to.

  <th>
    Description
  </th>

  <th>
    Default
  </th>

  <th>
    Applies to
  </th>
</tr>
  <td>
    Hostname or IP where Consul is running.
  </td>

  <td>
    localhost
  </td>

  <td style={{ "text-align": "center" }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**PORT**</DNT>
  </td>

  <td>
    Port on which Consul is listening.
  </td>

  <td>
    8500
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**TOKEN**</DNT>
  </td>

  <td>
    ACL Token if token authentication is enabled.
  </td>

  <td>
    N/A
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**ENABLE_SSL**</DNT>
  </td>

  <td>
    Connect using SSL.
  </td>

  <td>
    false
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**CA_BUNDLE_FILE**</DNT>
  </td>

  <td>
    Alternative Certificate Authority bundle file.
  </td>

  <td>
    N/A
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**CA_BUNDLE_DIR**</DNT>
  </td>

  <td>
    Alternative Certificate Authority bundle directory.
  </td>

  <td>
    N/A
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**TRUST_SERVER_CERTIFICATE**</DNT>
  </td>

  <td>
    If set to true, server certificate is NOT verified for SSL.
  </td>

  <td>
    false
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**FAN_OUT**</DNT>
  </td>

  <td>
    If true will attempt to gather metrics from all other nodes in Consul
    cluster.
  </td>

  <td>
    true
  </td>

  <td style={{ 'text-align': 'center' }}>
    M
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**CHECK_LEADERSHIP**</DNT>
  </td>

  <td>
    Check leadership on consul server. This should be disabled on consul in
    client mode.
  </td>

  <td>
    true
  </td>

  <td style={{ 'text-align': 'center' }}>
    M
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**TIMEOUT**</DNT>
  </td>

  <td>
    Timeout for each of the consul client calls.
  </td>

  <td>
    30s
  </td>

  <td style={{ 'text-align': 'center' }}>
    M/I
  </td>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**METRICS**</DNT>
  </td>

  <td>
    Set to `true` to enable metrics-only collection.
  </td>

  <td>
    false
  </td>

  <td style={{ 'text-align': 'center' }}/>
</tr>

{
  ' '
}

<tr>
  <td>
    <DNT>**INVENTORY**</DNT>
  </td>

  <td>
    Set to `true` to enable inventory-only collection.
  </td>

  <td>
    false
  </td>

  <td style={{ 'text-align': 'center' }}/>
</tr>
Setting
**HOSTNAME**

The values for these settings can be defined in several ways:

  • Adding the value directly in the config file. This is the most common way.
  • Replacing the values from environment variables using the {{}} notation. This requires infrastructure agent v1.14.0+. Read more here or see the example below.
  • Using secrets management. Use this to protect sensitive information, such as passwords that would be exposed in plain text on the configuration file. For more information, see Secrets management.

Labels [#labels]

You can further decorate your metrics using labels. Labels allow you to add attributes (key/value pairs) to your metrics, which you can then use to query, filter, or group your metrics.

Our default sample config file includes examples of labels but, because they're not mandatory, you can remove, modify, or add new ones of your choice.

 labels:
   env: production
   role: load_balancer

Example configurations [#examples]

This is the basic configuration used to collect metrics and inventory from your localhost:
```
integrations:
  - name: nri-consul
    env:
      HOSTNAME: localhost
      PORT: 8500
    interval: 15s
    labels:
      environment: production
    inventory_source: config/consul
```

<Collapser id="basic-intervals" title="Basic configuration with different metric/inventory intervals"

This configuration collects metrics every 30 seconds and inventory every 60 seconds with timeout:

```
integrations:
  - name: nri-consul
    env:
      METRICS: true
      HOSTNAME: localhost
      PORT: 8500
      TIMEOUT: 10s
    interval: 15s
    labels:
      environment: production

  - name: nri-consul
    env:
      INVENTORY: true
      HOSTNAME: localhost
      PORT: 8500
      TIMEOUT: 10s
    interval: 60s
    labels:
      environment: production
    inventory_source: config/consul
```

<Collapser id="envvar-replacement" title="Environment variables replacement"

In this configuration we are using the environment variable `CONSUL_HOST` to populate the HOSTNAME setting of the integration:

```
integrations:
  - name: nri-consul
    env:
      METRICS: "true"
      HOSTNAME: {{CONSUL_HOST}}
      PORT: 8500
    interval: 15s
    labels:
      env: production
      role: load_balancer
```

<Collapser id="metrics-nofanout" title="Metrics-only with Token and no Fan Out"

In this configuration we usee an ACL token to connect and we are not collecting metrics from other connected consul nodes (FAN_OUT: false):

```
integrations:
  - name: nri-consul
    env:
      METRICS: "true"
      HOSTNAME: localhost
      PORT: 8500
      TOKEN: my_token
      FAN_OUT: false
    interval: 15s
    labels:
      env: production
      role: load_balancer
```

<Collapser id="multi-instance" title="Multi-instance monitoring"

In this configuration we are monitoring multiple Consul servers from the same integration. For the first instance (`HOSTNAME: 1st_consul_host`) we are collecting metrics and inventory while for the second instance (`HOSTNAME: 2nd_consul_host`) we will only collect metrics.

```
integrations:
  - name: nri-consul
    env:
      METRICS: "true"
      HOSTNAME: 1st_consul_host
      PORT: 8500
    interval: 15s
    labels:
      env: production
      role: load_balancer
  - name: nri-consul
    env:
      INVENTORY: "true"
      HOSTNAME: 1st_consul_host
      PORT: 8500
    interval: 60s
    labels:
      env: production
      role: load_balancer
    inventory_source: config/consul

  - name: nri-consul
    env:
      METRICS: "true"
      HOSTNAME: 2nd_consul_host
      PORT: 8500
    interval: 15s
    labels:
      env: production
      role: load_balancer
```

Find and use data [#find-and-use]

Data from this integration can be found by going to: one.newrelic.com > Infrastructure > Third-party services > Apache.

Apache data is attached to the ConsulDatacenterSampleand ConsulAgentSample event types. You can query this data for troubleshooting purposes or to create custom charts and dashboards.

For more on how to find and use your data, see Understand integration data.

Metric data [#metrics]

The HashiCorp Consul integration collects the following metric data attributes:

Consul data center sample metrics [#consul-datacenter-sample]

These attributes are attached to the ConsulDatacenterSample event type:

  <th>
    Description
  </th>
</tr>
  <td>
    The number of nodes with service status `critical` from those registered.
  </td>
</tr>

<tr>
  <td>
    `consul.catalog.nodes_passing`
  </td>

  <td>
    The number of nodes with service status `passing` from those registered.
  </td>
</tr>

<tr>
  <td>
    `consul.catalog.nodes_up`
  </td>

  <td>
    The number of nodes.
  </td>
</tr>

<tr>
  <td>
    `consul.catalog.nodes_warning`
  </td>

  <td>
    The number of nodes with service status `warning` from those registered.
  </td>
</tr>

<tr>
  <td>
    `consul.catalog.total_nodes`
  </td>

  <td>
    The number of nodes registered in the consul cluster.
  </td>
</tr>

<tr>
  <td>
    `consul.memberlist.msg.suspect`
  </td>

  <td>
    The number of times an agent suspects another as failed while probing during gossip protocol.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.apply`
  </td>

  <td>
    The number of raft transactions occurring.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.commitTime.avg`
  </td>

  <td>
    The average time it takes to commit a new entry to the raft log on the leader.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.commitTime.count`
  </td>

  <td>
    The number of samples of `raft.commitTime`.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.commitTime.max`
  </td>

  <td>
    The max time it takes to commit a new entry to the raft log on the leader.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.commitTime.median`
  </td>

  <td>
    The median time it takes to commit a new entry to the raft log on the leader.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.dispatchLog.avg`
  </td>

  <td>
    The average time it takes for the leader to write log entries to disk.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.dispatchLog.count`
  </td>

  <td>
    The number of samples of `raft.leader.dispatchLog`.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.dispatchLog.max`
  </td>

  <td>
    The max time it takes for the leader to write log entries to disk.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.dispatchLog.median`
  </td>

  <td>
    The median time it takes for the leader to write log entries to disk.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.lastContact.avg`
  </td>

  <td>
    The average time elapsed since the leader was last able to check its lease with followers.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.lastContact.count`
  </td>

  <td>
    The number of samples of `raft.leader.lastContact`.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.lastContact.max`
  </td>

  <td>
    The max time elapsed since the leader was last able to check its lease with followers.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.leader.lastContact.median`
  </td>

  <td>
    The median time elapsed since the leader was last able to check its lease with followers.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.state.candidate`
  </td>

  <td>
    The number of initiated leader elections.
  </td>
</tr>

<tr>
  <td>
    `consul.raft.state.leader`
  </td>

  <td>
    The number of completed leader elections.
  </td>
</tr>

<tr>
  <td>
    `consul.serf.member.flap`
  </td>

  <td>
    The number of times an agent is marked dead and then quickly recovers.
  </td>
</tr>
Metric
`consul.catalog.nodes_critical`

Consul agent sample metrics [#consul-agent-sample]

These attributes are attached to the ConsulAgentSample event type:

  <th>
    Description
  </th>
</tr>
  <td>
    ACL cache hits.
  </td>
</tr>

<tr>
  <td>
    `agent.aclCacheMiss`
  </td>

  <td>
    ACL cache misses.
  </td>
</tr>

<tr>
  <td>
    `agent.kvStores`
  </td>

  <td>
    The number of samples of `kvs.apply`.
  </td>
</tr>

<tr>
  <td>
    `agent.kvStoresAvgInMilliseconds`
  </td>

  <td>
    The average time it takes to complete an update to the KV store.
  </td>
</tr>

<tr>
  <td>
    `agent.kvStoresMaxInMilliseconds`
  </td>

  <td>
    The max time it takes to complete an update to the KV store.
  </td>
</tr>

<tr>
  <td>
    `agent.kvStoresMedianInMilliseconds`
  </td>

  <td>
    The median time it takes to complete an update to the KV store.
  </td>
</tr>

<tr>
  <td>
    `agent.peers`
  </td>

  <td>
    The number of peers in the peer set.
  </td>
</tr>

<tr>
  <td>
    `agent.staleQueries`
  </td>

  <td>
    Served queries within the allowed stale threshold.
  </td>
</tr>

<tr>
  <td>
    `agent.txnAvgInMilliseconds`
  </td>

  <td>
    The average time it takes to apply a transaction operation.
  </td>
</tr>

<tr>
  <td>
    `agent.txnMaxInMilliseconds`
  </td>

  <td>
    The max time it takes to apply a transaction operation.
  </td>
</tr>

<tr>
  <td>
    `agent.txnMedianInMilliseconds`
  </td>

  <td>
    The median time it takes to apply a transaction operation.
  </td>
</tr>

<tr>
  <td>
    `agent.txns`
  </td>

  <td>
    The number of samples of `txn.apply`.
  </td>
</tr>

<tr>
  <td>
    `client.rpcFailed`
  </td>

  <td>
    Measure of failed RPC requests.
  </td>
</tr>

<tr>
  <td>
    `client.rpcLoad`
  </td>

  <td>
    Measure of how much an agent is loading Consul servers.
  </td>
</tr>

<tr>
  <td>
    `client.rpcRateLimited`
  </td>

  <td>
    Measure of RPC requests that get rate limited.
  </td>
</tr>

<tr>
  <td>
    `net.agent.maxLatencyInMilliseconds`
  </td>

  <td>
    Maximum latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.medianLatencyInMilliseconds`
  </td>

  <td>
    Median latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.minLatencyInMilliseconds`
  </td>

  <td>
    Minimum latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.p25LatencyInMilliseconds`
  </td>

  <td>
    P25 latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.p75LatencyInMilliseconds`
  </td>

  <td>
    P75 latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.p90LatencyInMilliseconds`
  </td>

  <td>
    P90 latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.p95LatencyInMilliseconds`
  </td>

  <td>
    P95 latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `net.agent.p99LatencyInMilliseconds`
  </td>

  <td>
    P99 latency from this node to all others.
  </td>
</tr>

<tr>
  <td>
    `runtime.allocations`
  </td>

  <td>
    Cumulative count of heap objects allocated.
  </td>
</tr>

<tr>
  <td>
    `runtime.allocationsInBytes`
  </td>

  <td>
    The current bytes allocated by the Consul process.
  </td>
</tr>

<tr>
  <td>
    `runtime.frees`
  </td>

  <td>
    Cumulative count of heap objects freed.
  </td>
</tr>

<tr>
  <td>
    `runtime.gcCycles`
  </td>

  <td>
    The number of completed GC cycles.
  </td>
</tr>

<tr>
  <td>
    `runtime.gcPauseInMilliseconds`
  </td>

  <td>
    Cumulative nanoseconds in GC stop-the-world pauses since Consul started.
  </td>
</tr>

<tr>
  <td>
    `runtime.goroutines`
  </td>

  <td>
    The number of running go routines.
  </td>
</tr>

<tr>
  <td>
    `runtime.heapObjects`
  </td>

  <td>
    The number of objects allocated on the heap
  </td>
</tr>

<tr>
  <td>
    `runtime.virtualAddressSpaceInBytes`
  </td>

  <td>
    Total size of the virtual address space reserved by the go runtime.
  </td>
</tr>
Metric
`agent.aclCacheHit`

Inventory data [#inventory]

The HashiCorp Consul integration captures the configuration parameters and current settings of the Consul Agent nodes. It collects the results of the /v1/agent/self REST API endpoint. It pulls the Config and DebugConfig sections from that response.

**Note**: Nested sections within `Config` and `DebugConfig` are not collected.

The data is available on the Inventory page, under the config/consul source. For more about inventory data, see Understand integration data.

Check the source code [#source-code]

This integration is open source software. That means you can browse its source code and send improvements, or create your own fork and build it.