# Apache Druid and Log4j

Apache Druid uses Log4j to emit log messages as it runs. These logs enable you not only to investigate issues and solve problems, but also to understand how each of Druid's processes works in isolation and in collaboration with the others.

In this learning module we will:

* Identify the various Druid process log files.
* Understand the role of the log files.
* Review some task log files.

The first step in making use of log files is to become aware of what logs are available. We'll see that the Druid processes each generate a couple of different files. In addition to the process log files, we will see that transient Druid worker tasks also generate log files.

As Druid processes run, they write status information into files called log files. We can use these files to understand the Druid processes' behaviors and diagnose problems.

Since Druid is a distributed system, we will find log files for each Druid process. In addition, Druid captures the output written to standard output.

Some processes may spin off tasks to perform sub-processing. In Druid, a task is a separate process that usually runs in its own JVM. Each of these tasks creates its own log file.

Many of the logs capture behavior during ingestion and other processing, but we can also configure Druid to capture specific query information.

## Prerequisites

This tutorial works with Druid 28.0.1 or later.

### Run with Docker

Launch this tutorial and all prerequisites using the `jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see the Learn Druid repository [readme](https://github.com/implydata/learn-druid).

## Initialization

To use this notebook, you will need to run Druid locally.

You will also make extensive use of the terminal, which we suggest you place alongside this notebook or on another screen.

### Install tools

Open a local terminal window.

If you have not already installed `wget` or `multitail`, run the following command to install these tools using `brew`.

```bash
brew install multitail && brew install wget
```

Run the following command to pull the default configuration for `multitail` to your home folder. Do not run this command if you already use `multitail`, as it will overwrite your own configuration.

```bash
curl https://raw.githubusercontent.com/halturin/multitail/master/multitail.conf > ~/.multitailrc
```

### Install Apache Druid

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd && mkdir learn-druid-local
cd learn-druid-local
```

Now pull a compatible version of Druid.

> If you do not have wget on your Mac, you can install it with brew.

```bash
wget https://dlcdn.apache.org/druid/28.0.1/apache-druid-28.0.1-bin.tar.gz && tar -xzf apache-druid-28.0.1-bin.tar.gz &&
cd apache-druid-28.0.1
```

## Query logging

The key processes involved in interactive queries (submitted through the API) are:

* Broker
* Historical
* Tasks (for streaming ingestion)

For non-interactive SQL queries, the processes are:

* Broker
* Overlord
* Middle Manager
* Tasks

Typically, you start at the Broker and then move on to the logs generated by the Historicals and Middle Managers (for interactive API queries), or by the Middle Manager and tasks (for non-interactive queries).

### Enable request logging

Sometimes it may be helpful to understand what queries Druid is fielding as well as who is making the queries. [Request logs](https://druid.apache.org/docs/latest/operations/request-logging/) give us this information.

By default, request logging is disabled. In the next couple of steps, we enable request logging and restart Druid so that the configuration change takes effect.

Run this command in your terminal to amend the `common.runtime.properties` file so that request logging is enabled.

```bash
sed -i '' 's/druid.startup.logging.logProperties=true/druid.startup.logging.logProperties=true\ndruid.request.logging.type=slf4j/' \
  ~/learn-druid-local/apache-druid-28.0.1/conf/druid/auto/_common/common.runtime.properties
```
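
To confirm that the edit took effect before restarting, you could check the file for the new property. The following is a minimal sketch rather than part of the original lab: `request_logging_type` is a hypothetical helper name, and it assumes the property appears on a line of its own in `key=value` form.

```bash
# Hypothetical helper (not part of Druid): print the value of
# druid.request.logging.type from the properties file given as $1.
# Assumes the property sits on its own line as key=value.
request_logging_type() {
    # Strip the key and "=" from matching lines and print what remains;
    # if the property appears more than once, keep the last value.
    sed -n 's/^druid\.request\.logging\.type=//p' "$1" | tail -n 1
}
```

For example, `request_logging_type ~/learn-druid-local/apache-druid-28.0.1/conf/druid/auto/_common/common.runtime.properties` should print `slf4j` if the change was applied. Empty output suggests the property is missing from the file.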

Restart your Druid deployment for the change to take effect.

Use CTRL+C to stop the running processes, then start Druid again with the same command you used to start it earlier.

### Monitor an interactive query

From the directory that contains the Druid log files, run this command to watch `broker.log` and `historical.log` together:

```bash
multitail -du -P a \
    -f broker.log \
    -f historical.log
```

In another terminal, run this command to submit a query:

```bash
curl "http://localhost:8888/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
    "query": "SELECT * FROM \"example-wikipedia-logs\" ",
    "context" : {"sqlQueryId" : "learn-druid-local-sample-query"},
    "header" : true
}'
```

Notice that the Historical log entry for the query contains a JSON object with:

* Metrics about how long the query took to run.
* An identifier for this query, as given by `sqlQueryId` in the query context.
* The SQL that was run.
* Context parameters.
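
Once several queries have run, the logs can get busy. Because the query above sets `sqlQueryId` in its context, one way to find its entries is to search the logs for that identifier. The sketch below assumes the identifier appears verbatim in the log line; `lines_for_query` is a hypothetical helper name, not a Druid tool.

```bash
# Hypothetical helper (not part of Druid): print every line that mentions
# a query ID. Usage: lines_for_query <query-id> <log-file>...
lines_for_query() {
    query_id="$1"
    shift
    # -F matches the ID as a fixed string rather than a regular expression;
    # "--" stops option parsing in case the ID starts with a dash.
    grep -F -- "$query_id" "$@"
}
```

For example, `lines_for_query learn-druid-local-sample-query broker.log historical.log`, run from the directory that holds those files. When more than one file is given, `grep` prefixes each match with the file name, which shows which process logged it.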

### Monitor a non-interactive query



## Stop Druid

Run this command to stop Druid, replacing `{pid}` with the process ID you noted earlier.

```bash
kill {pid}
```

For example:

```bash
kill 9864
```

> If you do not remember your PID, use `ps` to look for the `supervise` process.

## Learn more

In this lab you learned that you can turn on logging for query requests with the `druid.request.logging.type` setting. Read about all the options, including other possible targets for these logs, in the documentation. One interesting configuration, for example, filters the request logs for you.

* [Request logging](https://druid.apache.org/docs/latest/configuration/index.html#request-logging)
* [Filtered request logging](https://druid.apache.org/docs/latest/configuration/index.html#filtered-request-logging)

This information can be really powerful: watch this Druid Summit presentation by Amir Youssefi and Pawas Ranjan from Conviva, which describes how useful this information can be for tuning Druid clusters.

* [Druid optimizations for scaling customer facing analytics at Conviva](https://youtu.be/zkHXr-3GFJw?t=746)

Take a few minutes to scan the official documentation for information about logging configuration. You may want to keep this page to hand throughout the course.

* [Logging](https://druid.apache.org/docs/latest/configuration/logging.html)

You're about to learn more about Apache Druid's use of Apache Logging Services in the form of Log4j™. Get insight into the background and benefits of Log4j on the official project website:

* [Apache Logging Services](https://logging.apache.org/)