# Apache Druid metrics

Metrics give you insight into why your Druid instance performs the way it does.

In this notebook you will take a tour of the out-of-the-box configurations for metrics in Apache Druid, and use some simple terminal commands to inspect them.

## Prerequisites

This tutorial works with Druid 30.0.0 or later. It is designed to run from a Mac with a locally running instance of Druid but can also be run on common Linux distributions and on Windows with [WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/).

If you wish to use this tutorial within Jupyter through the [learn-druid](https://github.com/implydata/learn-druid) Docker Compose, use the `jupyter` profile to avoid starting a second instance of Druid that may cause conflicts.

## Initialization

In this step, you will find instructions to install prerequisite tools and to deploy Druid locally.

Before starting, open a terminal window.

### Install required tools

You will need the following tools:

* `brew` to install prerequisite tools.
* `wget` to pull Apache Druid from the official repository.
* `jq` to format the JSON metrics that Druid writes to its logs.

For instructions on installing `brew`, see the [Homebrew homepage](https://brew.sh/).

Install `wget` and `jq` using `brew`. For example:

```bash
brew install wget jq
```
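Before continuing, you may want to confirm that the tools are on your `PATH`. The following is a minimal sketch, not part of the original tutorial: `missing_tools` is a hypothetical helper name, and it checks for `jq` as well because later steps in this tutorial pipe log output through it.

```bash
# Print the name of each listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# wget downloads Druid; jq formats the JSON metrics later in this tutorial.
missing_tools wget jq
```

If the command prints nothing, both tools were found.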

### Install Apache Druid

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd ~ ; mkdir learn-druid-local
cd learn-druid-local
```

Pull and unpack a compatible version of Apache Druid from the [Apache Druid downloads page](https://dlcdn.apache.org/druid/), for example `30.0.0`.

```bash
version="30.0.0"
wget https://dlcdn.apache.org/druid/$version/apache-druid-$version-bin.tar.gz
tar -xzf apache-druid-$version-bin.tar.gz
```

Use the following commands to rename the folder.

```bash
mv apache-druid-$version apache-druid
cd apache-druid
```

# Metrics configuration

Metrics configuration is set in the `common.runtime.properties` file. This comprises:

* [Monitors](https://druid.apache.org/docs/latest/configuration/index.html#metrics-monitors), which extend Druid's built-in metrics.
* [Emitters](https://druid.apache.org/docs/latest/configuration/index.html#metrics-emitters), which push the metrics to a destination location.

## Emitters

In this section you will amend the emitter configuration in order to take a look at the JSON objects that contain the data and description for each metric.

Druid ships with several configuration file locations out-of-the-box. Run this command to view the emitter settings in the `auto` configuration file:

```bash
grep druid.emitter ~/learn-druid-local/apache-druid/conf/druid/auto/_common/common.runtime.properties
```

Notice that, by default, `druid.emitter` is set to `noop`, meaning that [no metrics are emitted](https://druid.apache.org/docs/latest/configuration/#metrics-emitters).

### Change the emitter to Logging

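Enable the emission of metrics from your instance to the [log files](https://druid.apache.org/docs/latest/configuration/#logging-emitter-module) by setting `druid.emitter` to `logging`.

Run the following command to update your configuration. The `sed -i ''` form used here and in later steps is the macOS (BSD) syntax. With GNU `sed` on Linux or WSL, use `sed -i` without the empty quotes.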

```bash
sed -i '' 's/druid.emitter=noop/druid.emitter=logging/' \
  ~/learn-druid-local/apache-druid/conf/druid/auto/_common/common.runtime.properties
```
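Because the in-place flag of `sed` differs between macOS and Linux, one alternative is to write through a temporary file. The sketch below is an illustration only: `set_property` is a hypothetical helper, it assumes the key already exists in the file, and it does not handle values containing `|` or `&`. It is demonstrated on a scratch file rather than on your real configuration.

```bash
# Hypothetical helper: replace the value of an existing key in a properties file.
# Writes through a temporary file instead of relying on sed's in-place flag.
set_property() {
  file=$1 key=$2 value=$3
  # Escape dots so they match literally in the sed pattern.
  pattern=$(printf '%s' "$key" | sed 's/[.]/\\./g')
  tmp=$(mktemp)
  sed "s|^${pattern}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstrate on a scratch file, not the real common.runtime.properties.
demo_props=$(mktemp)
printf 'druid.emitter=noop\ndruid.emitter.logging.logLevel=info\n' > "$demo_props"
set_property "$demo_props" druid.emitter logging
cat "$demo_props"
```

Only the `druid.emitter` line changes. The `druid.emitter.logging.logLevel` line is left alone because the pattern is anchored on the full key followed by `=`.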

Additional log entries will be created containing the JSON data for each metric according to the [Logging emitter](https://druid.apache.org/docs/latest/configuration/#logging-emitter-module) configuration. This includes the `druid.emitter.logging.logLevel` of INFO for these entries.

### Start a Druid instance

Start Druid with the following command:

```bash
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```
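Druid takes a little while to start, so the log files may be empty at first. As a rough sketch, you could poll until the instance responds. `wait_for` is a hypothetical helper, and the commented example assumes that `curl` is installed and that the Router is listening on its default quickstart port, 8888.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the attempt limit is reached.
wait_for() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (assumes curl and a Router on localhost:8888):
# wait_for 60 curl -sf http://localhost:8888/status/health && echo "Druid is up"
```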

### Look at the JSON metrics messages

Run the following command to display the JSON being emitted to the log files.

* `grep` finds only the lines in the log files that contain metrics from the `LoggingEmitter`.
* The `cut` command returns everything from the 7th space-separated field onward, which is the JSON payload.
* The result is made pretty through `jq`.

```bash
grep 'org.apache.druid.java.util.emitter.core.LoggingEmitter - \[metrics\]' ~/learn-druid-local/apache-druid/log/*.log \
  | cut -d ' ' -f 7- \
  | jq
```
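The trailing `-` in `-f 7-` matters whenever the JSON itself contains a space. The sketch below shows the difference on a placeholder line: the tokens `ts`, `LEVEL`, and `[thread]` and the JSON body are made up for illustration and are not taken from a real Druid log.

```bash
# Placeholder line: six space-separated prefix tokens, then a JSON payload
# that itself contains a space. Not a real Druid log entry.
sample='ts LEVEL [thread] org.apache.druid.java.util.emitter.core.LoggingEmitter - [metrics]: {"metric":"example/name", "value":1}'

# -f 7 alone stops at the next space and truncates the JSON.
printf '%s\n' "$sample" | cut -d ' ' -f 7

# -f 7- keeps everything from field 7 to the end of the line.
printf '%s\n' "$sample" | cut -d ' ' -f 7-
```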

Run the command a few times to build up a good sample.

You will see:

* A timestamp for the event.
* The server hostname and type that emitted the metric together with its running version, such as "druid/broker" and "30.0.0".
* The metric name, such as "serverview/init/time".
* A value for the metric.

## Monitors

Run this command to view the metrics monitor settings in the `auto` configuration file:

```bash
grep druid.monitoring ~/learn-druid-local/apache-druid/conf/druid/auto/_common/common.runtime.properties
```

### Inspect some metrics

The default configuration for Druid extends the basic metrics with:

* [JVM metrics](https://druid.apache.org/docs/latest/operations/metrics.html#jvm) from the `JvmMonitor` monitor.
* Service heartbeats from the `ServiceStatusMonitor` monitor.

Use the command below to see a specific JVM metric for your Coordinator process. You may want to run this command a few times to see what is happening.

```bash
grep 'org.apache.druid.java.util.emitter.core.LoggingEmitter - \[metrics\]' ~/learn-druid-local/apache-druid/log/*.log \
  | cut -d ' ' -f 7- \
  | jq 'select(.metric == "jvm/pool/used" and .service=="druid/coordinator")'
```

Notice that this metric has additional dimensions, `poolKind` and `poolName`. Other monitors emit [other dimensions](https://druid.apache.org/docs/latest/operations/metrics).

Run the following command to return your entire instance to the basic metrics for Druid:

```bash
sed -i '' 's/"org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.ServiceStatusMonitor"//' \
  ~/learn-druid-local/apache-druid/conf/druid/auto/_common/common.runtime.properties
```
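The substitution only takes effect if the pattern matches the file exactly, including the single space after the comma. To see what it leaves behind, you can run the same expression on a scratch string instead of the file. This sketch assumes the property line has the shape shown in `monitors_line`, which is assembled from the two class names in the command above.

```bash
# Scratch copy of the monitors line, built from the two class names used above.
monitors_line='druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.ServiceStatusMonitor"]'

# Same substitution as above, printed to stdout instead of edited in place.
printf '%s\n' "$monitors_line" \
  | sed 's/"org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.ServiceStatusMonitor"//'
```

The result is an empty list, `druid.monitoring.monitors=[]`. Re-running the earlier `grep druid.monitoring` command against your configuration file is a quick way to confirm the real edit.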

Now restart your instance and, for the purpose of this exercise, clear your logs.

```bash
kill $(ps -ef | grep 'supervise' | awk 'NF{print $2}' | head -n 1)
rm ~/learn-druid-local/apache-druid/log/*.log
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```

Run this command to see the base metrics that are now being emitted:

```bash
grep 'org.apache.druid.java.util.emitter.core.LoggingEmitter - \[metrics\]' ~/learn-druid-local/apache-druid/log/*.log \
  | cut -d ' ' -f 7- \
  | jq
```
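The pretty-printed output can be long. To get a quick summary of which metric names appear and how often, a sketch like the following may help. `count_metrics` is a hypothetical helper, and it assumes the emitted JSON is compact, with the name written as `"metric":"<name>"`. The input lines here are placeholders, not real Druid output.

```bash
# Hypothetical helper: count occurrences of each metric name read from stdin.
# Assumes compact JSON where the name appears as "metric":"<name>".
count_metrics() {
  grep -o '"metric":"[^"]*"' | cut -d '"' -f 4 | sort | uniq -c | sort -rn
}

# Placeholder JSON lines for illustration.
printf '%s\n' \
  '{"metric":"example/a","value":1}' \
  '{"metric":"example/b","value":2}' \
  '{"metric":"example/a","value":3}' \
  | count_metrics
```

To use it on real logs, pipe the output of the `grep` and `cut` steps above into `count_metrics` in place of `jq`.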

### Add a process-specific monitor

Some monitors are designed to work on specific processes. Enabling a monitor on a process that does not support it causes that process to fail during startup. In this section you will add the `HistoricalMetricsMonitor`, which emits [Historical](https://druid.apache.org/docs/latest/operations/metrics/#historical-1) metrics.

First, stop your cluster with the following command:

```bash
kill $(ps -ef | grep 'supervise' | awk 'NF{print $2}' | head -n 1)
```

Now run this command to add the Historical monitor to your Historical process's `runtime.properties`.

```bash
echo "druid.monitoring.monitors=[\"org.apache.druid.server.metrics.HistoricalMetricsMonitor\"]" >> \
  ~/learn-druid-local/apache-druid/conf/druid/auto/historical/runtime.properties
```
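Note that `echo ... >>` appends a new line every time it runs, so repeating the step leaves duplicate entries in the file. One way to guard against that is sketched below. `append_once` is a hypothetical helper, shown on a scratch file rather than on your Historical configuration.

```bash
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
  file=$1 line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Calling it twice with the same line still leaves a single entry.
demo_file=$(mktemp)
append_once "$demo_file" 'druid.monitoring.monitors=["org.apache.druid.server.metrics.HistoricalMetricsMonitor"]'
append_once "$demo_file" 'druid.monitoring.monitors=["org.apache.druid.server.metrics.HistoricalMetricsMonitor"]'
wc -l < "$demo_file"
```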

Start Druid with the following command:

```bash
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```

Now run this command to review some of the metrics data from the Historical:

```bash
grep 'org.apache.druid.java.util.emitter.core.LoggingEmitter - \[metrics\]' ~/learn-druid-local/apache-druid/log/historical.log \
  | cut -d ' ' -f 7- \
  | jq
```

### Shorten the emission period

There is a default [emission period](https://druid.apache.org/docs/latest/configuration/#enabling-metrics) of 1 minute. Set `druid.monitoring.emissionPeriod` in your configuration to have metrics emitted at a different rate. The value is an ISO 8601 duration, so `PT15S` means 15 seconds.

Run this command to have the Historical process emit metrics every 15 seconds:

```bash
echo "druid.monitoring.emissionPeriod=PT15S" >> \
  ~/learn-druid-local/apache-druid/conf/druid/auto/historical/runtime.properties
```

To apply the configuration, restart your instance:

```bash
kill $(ps -ef | grep 'supervise' | awk 'NF{print $2}' | head -n 1)
rm ~/learn-druid-local/apache-druid/log/*.log
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```

Run this command to see the metrics as they are being emitted. For ease of reading, the `jq` portion of this command only selects the timestamp, metric name, and its value.

```bash
grep 'org.apache.druid.java.util.emitter.core.LoggingEmitter - \[metrics\]' ~/learn-druid-local/apache-druid/log/historical.log \
  | cut -d ' ' -f 7- \
  | jq '"\(.timestamp) \(.metric) \(.value)"'
```

# Clean up

Run this command to stop Druid.

```bash
kill $(ps -ef | grep 'supervise' | awk 'NF{print $2}' | head -n 1)
```

Delete the `learn-druid-local` folder from your home folder in the usual way.

## Learn more

You've seen how the two components of Druid's metrics configuration, monitors and emitters, are controlled, and that you can configure them either at the cluster level or for individual processes.

* Read about [monitors](https://druid.apache.org/docs/latest/configuration/index.html#metrics-monitors) and [emitters](https://druid.apache.org/docs/latest/configuration/index.html#metrics-emitters) in the official documentation.
* Try out all the other monitors that are available, remembering that some monitors are only applicable to specific processes, requiring you to modify the `runtime.properties` for those processes only.
* Try out some of the [core emitters](https://druid.apache.org/docs/latest/configuration/#metrics-emitters) that are available as well as those available as [community extensions](https://druid.apache.org/docs/latest/configuration/extensions/#community-extensions), such as the [Apache Kafka](https://druid.apache.org/docs/latest/development/extensions-contrib/kafka-emitter), [statsd](https://druid.apache.org/docs/latest/development/extensions-contrib/statsd), and [Prometheus](https://druid.apache.org/docs/latest/development/extensions-contrib/prometheus) emitters.
* Experiment by using the Kafka emitter to push your instance's own metrics into a topic that you then consume back into the cluster with real-time ingestion.