# Apache Druid and Log4j

Apache Druid uses Log4j to emit log information as it runs. These logs not only enable you to investigate issues and solve problems, but also help you understand how each of Druid's processes works in isolation and in collaboration with the others.

In this learning module we will:

* Identify the various Druid process log files.
* Understand the role of the log files.
* Review some task log files.

The first step in making use of log files is to become aware of what logs are available. We'll see that the Druid processes each generate a couple of different files. In addition to the process log files, we will see that transient Druid worker tasks also generate log files.

As Druid processes run, they write status information into files called log files. We can use these files to understand the Druid processes' behaviors and diagnose problems.

Since Druid is a distributed system, we will find log files for each Druid process. Druid also captures the output written to standard output.

Some processes may spin off tasks to perform sub-processing. In Druid, a task is a separate process that usually runs in its own JVM. Each of these tasks creates its own log file.

Many of the logs capture behavior during ingestion and other processing, but we can also configure Druid to capture specific query information.

# Installation

To use this notebook, you will need to run Druid locally.

You will also make extensive use of the terminal, which we suggest you place alongside this notebook.

>  You must install Druid locally. When you run this tutorial in the learn-druid Docker image, opening a terminal window opens a terminal on the pod in which JupyterLab is running, and you will not be able to install Druid there.

## Install tools

```bash
brew install multitail && brew install wget
curl https://raw.githubusercontent.com/halturin/multitail/master/multitail.conf > ~/.multitailrc
```

## Install and start Apache Druid

> If you are running JupyterLab on your local machine, open a terminal window by clicking here.
> 
> <button data-commandLinker-command="terminal:create-new" href="#">Open a terminal</button>
> 
> Alternatively, start your local terminal in the usual way.

Use the following commands to install Druid.

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd && mkdir learn-druid-logs
cd learn-druid-logs
```

Now pull a compatible version of Druid.

> If you do not have wget on your Mac, you can install it with brew.

```bash
wget https://dlcdn.apache.org/druid/28.0.1/apache-druid-28.0.1-bin.tar.gz && tar -xzf apache-druid-28.0.1-bin.tar.gz &&
cd apache-druid-28.0.1
```

## Ingestion logging

When monitoring and troubleshooting [ingestion](https://druid.apache.org/docs/latest/ingestion/), including [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html), the key processes are:

* Router
* Overlord
* MiddleManager
* Historical

Recall that the MiddleManager spawns multiple other tasks (peons) that actually carry out the ingestion work.

Typically, you would start with the Overlord, then move to the MiddleManager, where you can find information about individual ingestion tasks. From there, you can move on to the logs of the ingestion tasks themselves.

According to the rules configured in the Coordinator, Historicals may then pull the ingested segments from deep storage to local storage. The Coordinator goes on to balance these segments across the Historicals over time. The processes involved in this are:

* Coordinator
* Historical

Therefore, if you have segment loading or availability issues, look at the logs of these two processes.

### Process logs

Use multitail to monitor all these log files. Notice that the command below includes a filter that hides some messages.

```bash
multitail --config multitail.conf -CS log4jnew -du -P a -s 2 -sn 1,2 \
    -e "org.apache.druid.indexing" -f coordinator-overlord.log \
    -f middleManager.log \
    -f historical.log
```

The following cell submits the same ingestion from the notebook that the `curl` command later in this section submits from the terminal.

```python
import json
import requests

# Replace `localhost` and `8888` with the host and port of your Router if they differ.
url = "http://localhost:8888/druid/v2/sql/task/"

# The SQL string literals use plain single quotes here. The '\'' sequence in the
# curl version is shell quoting and must not be copied into Python.
payload = json.dumps({
  "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n    '{\"type\": \"json\"}',\n    '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n  )\n)\nPARTITIONED BY DAY"
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(response.text)
```

Whilst in the multitail window, type `shift+O` to clear the output of all windows. Then press 0, 1, and 2 in turn to add a line marker to all windows.

In your second terminal window, run this command to start an ingestion.

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```

Take note of the activity that is logged across the processes as they cooperate with one another to:

* Plan and distribute the work to MiddleManagers.
* Spawn tasks to handle the ingestion, optimization, and push to deep storage.
* Load the data out of deep storage and into Historicals.

Feel free to repeat the steps above several times.
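When you repeat the ingestion with small variations, the hand-escaped query string becomes awkward to edit. The following is a minimal sketch, not part of the original lab, that assembles the same request body from ordinary Python values. The helper names `sql_literal` and `build_ingest_payload` are hypothetical, and the column list is shortened for readability.

```python
import json

def sql_literal(value) -> str:
    """Serialize a value to JSON and wrap it as a SQL string literal."""
    text = json.dumps(value)
    # Double any single quote so that it survives inside a SQL string literal.
    return "'" + text.replace("'", "''") + "'"

def build_ingest_payload(table: str, uri: str, columns: list) -> str:
    """Return the JSON request body for the SQL task endpoint used above."""
    query = (
        f'INSERT INTO "{table}"\n'
        "SELECT\n"
        '  TIME_PARSE("timestamp") AS __time,\n'
        "  *\n"
        "FROM TABLE(\n"
        "  EXTERN(\n"
        f"    {sql_literal({'type': 'http', 'uris': [uri]})},\n"
        f"    {sql_literal({'type': 'json'})},\n"
        f"    {sql_literal(columns)}\n"
        "  )\n"
        ")\n"
        "PARTITIONED BY DAY"
    )
    return json.dumps({"query": query})

# Shortened column list, for illustration only.
columns = [
    {"name": "added", "type": "long"},
    {"name": "channel", "type": "string"},
    {"name": "timestamp", "type": "string"},
]
payload = build_ingest_payload(
    "example-wikipedia-logs",
    "https://druid.apache.org/data/wikipedia.json.gz",
    columns,
)
print(json.loads(payload)["query"])
```

You could pass the resulting `payload` as the `data` argument of `requests.post`, in the same way as the notebook cell earlier in this section.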

### Task logs

In the terminal monitoring the MiddleManager, you will see a number of log entries, concluding with something similar to:

```
2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS].
```
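Entries like the one above follow a regular layout: timestamp, level, thread name in square brackets, logger name, and then the message. As a rough sketch, assuming every line you care about uses this same layout, a regular expression can split an entry into its fields. The name `parse_log_line` is hypothetical, and lines that do not match (such as stack trace lines) simply return `None`.

```python
import re
from typing import Optional

# Pattern modelled on the sample entry above:
# <timestamp> <LEVEL> [<thread>] <logger> - <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)$"
)

def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log entry, or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None

sample = (
    "2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] "
    "org.apache.druid.indexing.worker.WorkerTaskManager - "
    "Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS]."
)
fields = parse_log_line(sample)
print(fields["level"], fields["logger"])
```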

Each task creates a separate log in a location you configure as an administrator.

Use the Tasks API to list the tasks. Each task in the list has its own log.

```bash
curl http://localhost:8081/druid/indexer/v1/tasks | jq
```
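If the list of tasks is long, it can help to group the task IDs by status before choosing a log to read. The sketch below assumes that each entry in the response carries `id` and `status` fields. The sample entries are made up for illustration, apart from the task ID taken from the log line above, and `summarize_tasks` is a hypothetical helper.

```python
def summarize_tasks(tasks: list) -> dict:
    """Group task IDs by status, assuming 'id' and 'status' keys in each entry."""
    summary = {}
    for task in tasks:
        status = task.get("status", "UNKNOWN")
        summary.setdefault(status, []).append(task.get("id"))
    return summary

# Hypothetical, abbreviated response body. A real response has more fields.
example_tasks = [
    {"id": "query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c", "status": "SUCCESS"},
    {"id": "query-hypothetical-0001", "status": "FAILED"},
    {"id": "query-hypothetical-0002", "status": "SUCCESS"},
]

for status, ids in summarize_tasks(example_tasks).items():
    print(status, ids)
```

Against a running cluster, you could pass in the parsed response instead, for example `requests.get("http://localhost:8081/druid/indexer/v1/tasks").json()`.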

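The actual location of the log file is recorded in the MiddleManager log for that task.

Run this command to list the MiddleManager log entries for a task, replacing `<ID>` with one of the task IDs listed above. The entry that records the log file location is among them:

```bash
grep "<ID>" middleManager.log
```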

Druid may migrate local log files to long-term storage, following the configuration at XXXXXX. This is an essential configuration task when you run more than one MiddleManager.

To get the log for a task, call the task log API with the task ID and the `log` endpoint.

In the following command, replace `<ID>` with one of the task IDs returned above, then run it to retrieve the log file for that task.

```bash
curl http://localhost:8888/druid/indexer/v1/task/<ID>/log
```
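To keep a copy of a task log for later inspection, you could wrap the same request in a few lines of Python. This is only a sketch: `task_log_url` and `save_task_log` are hypothetical helpers, the base URL assumes the local Router used throughout this notebook, and the download step needs a running Druid.

```python
from urllib.parse import quote
from urllib.request import urlopen

def task_log_url(task_id: str, base_url: str = "http://localhost:8888") -> str:
    """Build the task log URL, percent-encoding the task ID."""
    return f"{base_url.rstrip('/')}/druid/indexer/v1/task/{quote(task_id, safe='')}/log"

def save_task_log(task_id: str, path: str, base_url: str = "http://localhost:8888") -> int:
    """Download one task log to a local file and return the number of bytes written."""
    with urlopen(task_log_url(task_id, base_url)) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)

print(task_log_url("query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c"))
```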

## Example 1: errors in ingestion

In this part of the tutorial you will use the `less` and `grep` commands. For more information, see their documentation.

Run the following ingestion task.

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```

This ingestion failed.

Use this command to look at the MiddleManager log:

```bash
less middleManager.log
```

Use the search feature of less to look for the failure:

```
/FAIL
```
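The same search can be scripted when you want a list of failed task IDs rather than an interactive view. The sketch below assumes that completion entries follow the `Task [...] completed with status [...]` wording shown earlier in the MiddleManager log. The `FAILED` line in the sample is invented for illustration, and `failed_tasks` is a hypothetical helper.

```python
import re

# Matches completion messages such as:
# Task [query-...] completed with status [SUCCESS].
TASK_STATUS = re.compile(
    r"Task \[(?P<task_id>[^\]]+)\] completed with status \[(?P<status>[A-Z]+)\]"
)

def failed_tasks(lines) -> list:
    """Return the IDs of tasks whose completion status is not SUCCESS."""
    failed = []
    for line in lines:
        match = TASK_STATUS.search(line)
        if match and match.group("status") != "SUCCESS":
            failed.append(match.group("task_id"))
    return failed

sample_lines = [
    "2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] "
    "org.apache.druid.indexing.worker.WorkerTaskManager - "
    "Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS].",
    # Invented entry, for illustration only.
    "2024-02-07T13:01:02,003 INFO [WorkerTaskManager-NoticeHandler] "
    "org.apache.druid.indexing.worker.WorkerTaskManager - "
    "Task [query-hypothetical-0001] completed with status [FAILED].",
]
print(failed_tasks(sample_lines))
```

To scan the real file, you could pass an open file handle instead of the sample list, for example `failed_tasks(open("middleManager.log"))`.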



# Stop Druid

Run this command to stop Druid, replacing `{pid}` with the process ID you noted earlier.

```bash
kill {pid}
```

For example:

```bash
kill 9864
```

> If you do not remember your PID, use `ps` to look for the `supervise` process.

## Learn more

In the lab you learned that you can turn on logging for query requests with the `druid.request.logging.type` setting. Read about all the options, including other possible targets for these logs, in the documentation. One interesting configuration, for example, automatically filters query logging for you.

* [Request logging](https://druid.apache.org/docs/latest/configuration/index.html#request-logging)
* [Filtered request logging](https://druid.apache.org/docs/latest/configuration/index.html#filtered-request-logging)

This information can be really powerful: watch this Druid Summit presentation by Amir Youssefi and Pawas Ranjan from Conviva, which describes how useful it can be for tuning Druid clusters.

* [Druid optimizations for scaling customer facing analytics at Conviva](https://youtu.be/zkHXr-3GFJw?t=746)

Take a few minutes to scan the official documentation for information about logging configuration. You may want to keep this page to hand throughout the course.

* [Logging](https://druid.apache.org/docs/latest/configuration/logging.html)

You're about to learn more about Apache Druid's use of Apache Logging Services in the form of Log4j™. Get insight into the background and benefits of Log4j on the official project website:

* [Apache Logging Services](https://logging.apache.org/)