# Apache Druid and Log4j

Apache Druid uses Log4J to emit information as it runs. The resulting log files not only enable you to investigate issues and solve problems, but also help you understand how each of Druid's processes works in isolation and in collaboration with the others.

In this learning module we will:

* Identify the various Druid process log files.
* Understand the role of the log files.
* Review some task log files.

The first step in making use of log files is to become aware of what logs are available. We'll see that the Druid processes each generate a couple of different files. In addition to the process log files, we will see that transient Druid worker tasks also generate log files.

As Druid processes run, they write status information into files called log files. We can use these files to understand the Druid processes' behaviors and diagnose problems.

Since Druid is a distributed system, we will find log files for each Druid process. Druid also captures the output that each process writes to standard output.

Some processes may spin off tasks to perform sub-processing. In Druid, a task is a separate process that usually runs in its own JVM. Each of these tasks creates its own log file.

Many of the logs capture behavior during ingestion and other processing, but we can also configure Druid to capture specific query information.

# Installation

To use this notebook, you will need to run Druid locally.

You will also make extensive use of the terminal, which we suggest you place alongside this notebook.

>  You must install Druid locally. When running this tutorial in the learn-druid docker image, opening a terminal window will open a terminal on the pod in which JupyterLab is running, and you will not be able to install Druid.

## Install multitail

```bash
brew install multitail
```

## Install and start Apache Druid

> If you are running JupyterLab on your local machine, open a terminal window by clicking here.
> 
> <button data-commandLinker-command="terminal:create-new" href="#">Open a terminal</button>
> 
> Alternatively, start your local terminal in the usual way.

Use the following commands to install Druid.

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd && mkdir learn-druid-logs && cd learn-druid-logs
```

Now pull a compatible version of Druid:

```bash
wget https://dlcdn.apache.org/druid/28.0.1/apache-druid-28.0.1-bin.tar.gz && tar -xzf apache-druid-28.0.1-bin.tar.gz &&
cd apache-druid-28.0.1
```

Run the following command to start Druid:

```bash
bin/start-druid
```

# Process log files

Open a second terminal window.

> If you are running JupyterLab on your local machine, open a terminal window by clicking here.
> 
> <button data-commandLinker-command="terminal:create-new" href="#">Open a terminal</button>
> 
> Alternatively, start your local terminal in the usual way.

The log file locations are set in the `log4j2.xml` file, which sits alongside Druid's own configuration files.

Run this command to view the configuration file that's used by the `start-druid` script:

```bash
more ~/learn-druid-logs/apache-druid-28.0.1/conf/druid/auto/_common/log4j2.xml
```

* [`Properties`](https://logging.apache.org/log4j/2.x/manual/configuration.html#PropertySubstitution) provide key/value pairs that may be used throughout the configuration file.
* [`Appenders`](https://logging.apache.org/log4j/2.x/manual/appenders.html) designate the format of log messages and determine the target for the messages.
* [`Loggers`](https://logging.apache.org/log4j/2.x/manual/configuration.html#Loggers) filter the log messages and dispense them to Appenders. Loggers can filter messages based on the Java package and/or class and by message priority.

In Druid, `Properties` are used to set a single location for all logs. This location is calculated when Druid starts, passed as a parameter to the JVM, and from there to Log4J for that process. By default, this location is a `log` folder at the root of your Druid installation, but it [can be overridden](https://druid.apache.org/docs/latest/configuration/logging#log-directory) by using the `DRUID_LOG_DIR` environment variable.
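The lookup described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here (use `DRUID_LOG_DIR` when it is set, otherwise fall back to the `log` folder), not the start script's actual code; the function name `resolve_log_dir` and the example paths are made up for this sketch.

```python
import os
from pathlib import Path


def resolve_log_dir(druid_home, environ=None):
    """Return the directory where we would expect Druid's logs to be written.

    Hypothetical helper: it mirrors the behaviour described in the text,
    not Druid's own implementation.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("DRUID_LOG_DIR")
    # An unset or empty variable falls back to <druid_home>/log.
    return Path(override) if override else Path(druid_home) / "log"


# Example with made-up paths:
print(resolve_log_dir("/opt/apache-druid-28.0.1", {}))
print(resolve_log_dir("/opt/apache-druid-28.0.1", {"DRUID_LOG_DIR": "/var/log/druid"}))
```

Passing the environment in as a dictionary keeps the helper easy to try out without changing your real shell environment.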

There are two `appenders`:

* A `Console` appender for `SYSTEM_OUT`.
* A `RollingRandomAccessFile` appender called "FileAppender" which is used for detailed process logs.

Run the following command to see both these types of files in your running environment.

```bash
cd ~/learn-druid-logs/apache-druid-28.0.1/log
ls
```

For each process, you will see two files:

* `process name.stdout.log`: the information written by the process to stdout (that is, the terminal).
* `process name.log`: the various status, error, warning, and debug messages.
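If you want to work with these files programmatically, a small helper can tell the two kinds apart from the file name alone. The sketch below assumes only the two naming patterns listed above; the function name `classify_log_file` is hypothetical, and any other file in the folder is simply reported as unknown.

```python
def classify_log_file(filename):
    """Split a log file name into (process name, kind), or return None.

    Assumes the two naming patterns described in this notebook:
    '<process>.stdout.log' and '<process>.log'.
    """
    stdout_suffix = ".stdout.log"
    process_suffix = ".log"
    # Check the longer suffix first, because it also ends in '.log'.
    if filename.endswith(stdout_suffix):
        return filename[: -len(stdout_suffix)], "stdout"
    if filename.endswith(process_suffix):
        return filename[: -len(process_suffix)], "process"
    return None


for name in ["broker.stdout.log", "middleManager.log", "README"]:
    print(name, "->", classify_log_file(name))
```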


## Console output (stdout)

Take a look at the _standard out_ log file for the Broker service using the following command:

```bash
cat broker.stdout.log
```

A typical output will be:

```
Running [broker], logging to [/Users/yourname/learn-druid-logs/apache-druid-28.0.1/bin/../log/broker.log] if no changes made to log4j2.xml
```

## Processing logs

Use the following tail command to look at the last few lines of the Middle Manager's processing log:

```bash
tail -n 10 middleManager.log
```

If you are running JupyterLab and Druid locally, run the following cell to start an ingestion:

```python
import json
import requests

# This URL assumes a local Druid deployment; change the host and port if yours differ.
url = "http://localhost:8888/druid/v2/sql/task/"

# The SQL uses plain single quotes around each EXTERN argument.
payload = json.dumps({
  "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n    '{\"type\": \"json\"}',\n    '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n  )\n)\nPARTITIONED BY DAY"
})
headers = {
  'Content-Type': 'application/json'
}

# Submit the ingestion as a SQL task and print Druid's response.
response = requests.post(url, headers=headers, data=payload)

print(response.text)
```

Run the `tail` command again. In the Middle Manager log, you will see a number of new log entries, ending with something similar to:

```
2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS].
```

## Reading the log files

In this section you will:

* Understand the format of individual log entries.
* Learn how to configure log entries in `log4j2.xml`.
* Learn about log entry severity levels and how to filter using them.


### Druid log patterns

Refer back to the Log4J configuration and notice the `PatternLayout`.

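For the `FileAppender` this has a default of:

`%d{ISO8601} %p [%t] %c -%notEmpty{ [%markerSimpleName]} %m%n`

Each log entry is therefore made up of:

* Timestamp (`%d{ISO8601}`)
* Message priority (`%p`)
* Thread name (`[%t]`)
* Class name (`%c`)
* Marker, only when one is set (`%notEmpty{ [%markerSimpleName]}`)
* Message, followed by a line break (`%m%n`)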
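To make the pattern concrete, here is a minimal Python sketch that splits a log line written with the default `FileAppender` pattern into its parts. The regular expression and the `parse_entry` name are our own illustration, not something shipped with Druid. One caveat: a message that itself begins with square brackets cannot be told apart from a marker, so treat the `marker` field with care.

```python
import re

# Mirrors: %d{ISO8601} %p [%t] %c -%notEmpty{ [%markerSimpleName]} %m%n
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) "          # %d{ISO8601}
    r"(?P<level>[A-Z]+) "            # %p
    r"\[(?P<thread>.*?)\] "          # [%t]
    r"(?P<logger>\S+) -"             # %c followed by " -"
    r"(?: \[(?P<marker>[^\]]+)\])?"  # optional marker
    r" (?P<message>.*)$"             # %m
)


def parse_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


sample = (
    "2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] "
    "org.apache.druid.indexing.worker.WorkerTaskManager - "
    "Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS]."
)
for field, value in parse_entry(sample).items():
    print(f"{field}: {value}")
```

Lines that do not start a new entry, such as the continuation lines of a stack trace, return `None`.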


### Message priority

Community developers assign different log levels to different entries, indicating how severe an event is.

* FATAL (system failure)
* ERROR (functional failure)
* WARN (non-fatal issue)
* INFO (notable event)
* DEBUG (program debugging messages)
* TRACE (highly granular execution event)

The base level of logging is set in the `Root` section within `Loggers` in the `log4j2.xml` configuration file for the `FileAppender`. By default, `INFO` is set as the base level.

```xml
    <Root level="info">
        <AppenderRef ref="FileAppender"/>
    </Root>
```

Other levels are set for specific packages or classes, reducing log noise. For example:

```xml
    <!-- Quieter KafkaSupervisors -->
    <Logger name="org.apache.kafka.clients.consumer.internals" level="warn" additivity="false">
        <Appender-ref ref="FileAppender"/>
    </Logger>
```

Run the following command to see `WARN`-level log messages in the Middle Manager log.

```bash
grep WARN middleManager.log
```
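`grep WARN` matches one level only. Because the levels are ordered by severity, you may instead want everything at a given level or above. The following is a minimal sketch of that idea; it assumes the default pattern, in which the level is the second space-separated field, and the `at_least` name and the synthetic sample lines are ours.

```python
# Levels from least to most severe.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def at_least(lines, minimum="WARN"):
    """Yield the lines whose level is `minimum` or more severe."""
    threshold = LEVELS.index(minimum)
    for line in lines:
        parts = line.split(" ", 2)
        # Skip lines without a recognizable level, such as stack-trace lines.
        if len(parts) >= 2 and parts[1] in LEVELS:
            if LEVELS.index(parts[1]) >= threshold:
                yield line


# Synthetic lines, for illustration only.
example_lines = [
    "2024-02-07T12:00:00,000 INFO [main] example.Logger - started",
    "2024-02-07T12:00:01,000 WARN [main] example.Logger - slow response",
    "2024-02-07T12:00:02,000 ERROR [main] example.Logger - request failed",
]
for line in at_least(example_lines):
    print(line)
```

To run it against a real file, pass an open file object, for example `at_least(open("middleManager.log"))`.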

To amend the base logging level, change the `level` attribute of the `Root` element in `log4j2.xml`. For example, to include `DEBUG` messages, set it to `<Root level="debug">`.

### Threads

Log files are generated by individual processes as they run. In a Druid cluster where multiple instances of the same process are running independently, it can be helpful to collate the same types of process log together. In this case, the thread name becomes especially important.
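One simple way to see which threads are active in a log is to count the entries per thread name. The sketch below assumes the default log pattern, in which the thread name appears in square brackets after the level; the `entries_per_thread` name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Timestamp, level, then the thread name in square brackets, then the logger and " -".
THREAD = re.compile(r"^\S+ [A-Z]+ \[(.*?)\] \S+ -")


def entries_per_thread(lines):
    """Count log entries by thread name, ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        match = THREAD.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Synthetic lines, for illustration only.
thread_sample = [
    "2024-02-07T12:00:00,000 INFO [main] example.Logger - started",
    "2024-02-07T12:00:01,000 INFO [worker-1] example.Logger - step one",
    "2024-02-07T12:00:02,000 INFO [worker-1] example.Logger - step two",
]
for thread, count in entries_per_thread(thread_sample).most_common():
    print(thread, count)
```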



### Class names

You can filter log events to specific classes in the underlying Druid code.

A typical entry in a log file contains the package and the class name, for example:

`org.apache.curator.utils.Compatibility`

Here the package is `org.apache.curator.utils` and the class name is `Compatibility`.
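The same split can be done in code, which is handy when you want to select every entry from one package. This is a small sketch with hypothetical helper names (`split_logger_name`, `in_package`); it works purely on the text of the logger name.

```python
def split_logger_name(name):
    """Split 'a.b.C' into ('a.b', 'C'); a bare name has an empty package."""
    package, _, class_name = name.rpartition(".")
    return package, class_name


def in_package(logger_name, package):
    """True if the logger belongs to `package` or to one of its sub-packages."""
    return logger_name == package or logger_name.startswith(package + ".")


print(split_logger_name("org.apache.curator.utils.Compatibility"))
print(in_package("org.apache.curator.utils.Compatibility", "org.apache.curator"))
```

Comparing against `package + "."` avoids accidentally matching a sibling package whose name merely starts with the same letters.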

Run the following command to see events that have been emitted by the XXXX class:

something

something

something

### Messages

Developers write messages to describe what has happened, or to record the state of the process or of some significant variable.

## Operational logging



## Ingestion logging

The key processes involved in ingestion are:

* Overlord
* MiddleManager
* Tasks
* Historical

### Watch logs while ingesting

```bash
multitail -du -P a \
    -f coordinator-overlord.log \
    -f middleManager.log \
    -f historical.log
```

Whilst in the multitail window, type `O` to clear the output and then hit enter to mark a line.

In a new terminal window, run this command to start an ingestion.

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```

Take note of the activity that is logged across the processes as they cooperate with one another to:

* Plan and distribute the work to Middle Managers.
* Spawn tasks to handle the ingest, optimization, and push to deep storage.
* Load the data out of deep storage and into historicals.

Feel free to repeat the steps above to watch what happens.

### Task logs

The multitail windows do not show the task logs for the ingestion, because each task writes its own separate log, which we retrieve through the API. However, near the bottom of the Middle Manager log you will see the local location of the task log file, which you can open in an editor (or with the `less` command).

Use the Tasks API to list the tasks, and so find the IDs of the task logs that are available.

```bash
curl http://localhost:8081/druid/indexer/v1/tasks | jq
```

To get the log for a task, call the task log API, which takes the task ID followed by the `log` endpoint.

In the following command, replace `<ID>` with one of the task IDs returned above, and then run it to pull the log file for that task.

```bash
curl http://localhost:8888/druid/indexer/v1/task/<ID>/log
```
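If you prefer to stay in Python, the same request can be built with the standard library. The sketch below assumes the local address used in the `curl` command above; the helper names `task_log_url` and `fetch_task_log` are our own, and the task ID is percent-encoded in case it contains characters that are not safe in a URL path.

```python
from urllib.parse import quote
from urllib.request import urlopen


def task_log_url(task_id, base_url="http://localhost:8888"):
    """Build the task log URL for one task ID."""
    return f"{base_url}/druid/indexer/v1/task/{quote(task_id, safe='')}/log"


def fetch_task_log(task_id, base_url="http://localhost:8888"):
    """Download a task log as text. Needs a running Druid to succeed."""
    with urlopen(task_log_url(task_id, base_url)) as response:
        return response.read().decode("utf-8", errors="replace")


# The ID below is the one from the sample Middle Manager log entry earlier on.
print(task_log_url("query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c"))
```

Call `fetch_task_log` with one of your own task IDs to pull the log into a notebook cell.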

## Query logging

The key processes involved in interactive queries are:

* Broker
* Historical
* Tasks (for streaming ingestion)

For non-interactive SQL queries, the processes are:

* Broker
* Overlord
* Middle Manager
* Tasks


### Enable request logging

Sometimes it may be helpful to understand what queries Druid is fielding as well as who is making the queries. [Request logs](https://druid.apache.org/docs/latest/operations/request-logging/) give us this information.

By default, request logging is disabled. So, in the next couple of steps we enable query logging and restart Druid so that the configuration change takes effect.

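Run this script in your terminal to amend the `common.runtime.properties` file so that request logging is enabled. The `-i ''` argument is the form expected by the BSD `sed` that ships with macOS; with GNU `sed`, use `-i` on its own.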

```bash
sed -i '' 's/druid.startup.logging.logProperties=true/druid.startup.logging.logProperties=true\ndruid.request.logging.type=slf4j/' \
  ~/learn-druid-logs/apache-druid-28.0.1/conf/druid/auto/_common/common.runtime.properties
```
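As a portable alternative to `sed`, the same change can be expressed in Python. This is a sketch rather than an official procedure: the `enable_request_logging` function is hypothetical, and it works on the text of the properties file so that you can inspect the result before writing anything back. It sets `druid.request.logging.type` if the key is already present and appends it otherwise, so running it twice is harmless.

```python
def enable_request_logging(properties_text, logger_type="slf4j"):
    """Return the properties text with druid.request.logging.type set."""
    key = "druid.request.logging.type"
    setting = f"{key}={logger_type}"
    lines = properties_text.splitlines()
    for index, line in enumerate(lines):
        # Commented-out lines start with '#', so they are left alone.
        if line.strip().startswith(key + "="):
            lines[index] = setting
            break
    else:
        # The key was not found, so add it at the end of the file.
        lines.append(setting)
    return "\n".join(lines) + "\n"


print(enable_request_logging("druid.startup.logging.logProperties=true\n"))
```

To apply it, read `common.runtime.properties` into a string, pass it through the function, and write the result back to the same path.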

Restart your Druid deployment for the change to take effect.

Use CTRL+C in the terminal where Druid is running to stop its processes, then run `bin/start-druid` again as shown above.

### Monitor an interactive query

```bash
multitail -du -P a \
    -f broker.log \
    -f historical.log
```

In another terminal window, run this command to issue a query:

```bash
curl "http://localhost:8888/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
    "query": "SELECT * FROM \"example-wikipedia-logs\" ",
    "context" : {"sqlQueryId" : "learn-druid-logs-sample-query"},
    "header" : true
}'
```

Notice that the historical log entry for the query includes a JSON object containing:

* Metrics about how long it took to run.
* An identifier for this query, as given in the query parameter context.
* The SQL that was run.
* Context parameters.
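Since the entry embeds a JSON object, you can pull that object out of the line and read its fields directly. The sketch below makes no assumption about the text surrounding the object: it tries each `{` in the line until one parses. The `first_json_object` name and the sample line are invented for illustration, and the sample does not reproduce Druid's real request log layout.

```python
import json


def first_json_object(line):
    """Return the first JSON object embedded in `line`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = line.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores any trailing text.
            parsed, _ = decoder.raw_decode(line[start:])
            return parsed
        except json.JSONDecodeError:
            start = line.find("{", start + 1)
    return None


# Made-up line, for illustration only.
request_sample = 'some prefix {"sqlQueryId": "learn-druid-logs-sample-query", "success": true} some suffix'
print(first_json_object(request_sample)["sqlQueryId"])
```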

### Monitor a non-interactive query



## Learn more

In the lab you learned that you can turn on logging for query requests with the `druid.request.logging.type` setting. Read all the options, including other possible targets for these logs, in the documentation. One interesting configuration, for example, automatically filters query logging for you.

* [Request logging](https://druid.apache.org/docs/latest/configuration/index.html#request-logging)
* [Filtered request logging](https://druid.apache.org/docs/latest/configuration/index.html#filtered-request-logging)

This information can be really powerful: watch this Druid Summit presentation by Amir Youssefi and Pawas Ranjan from Conviva that describes how useful this information can be to tuning Druid clusters.

* [Druid optimizations for scaling customer facing analytics at Conviva](https://youtu.be/zkHXr-3GFJw?t=746)

Take a few minutes to scan the official documentation for information about logging configuration. You may want to keep this page to hand throughout the course.

* [Logging](https://druid.apache.org/docs/latest/configuration/logging.html)

To learn more about Apache Druid's use of Apache Logging Services in the form of Log4J™, read about the background and benefits of Log4J on the official project website:

* [Apache Logging Services](https://logging.apache.org/)