# Asynchronous processing logs

Apache Druid uses Log4j to emit log information as it runs. These logs not only enable you to investigate issues and solve problems, but also help you understand how each of Druid's processes works in isolation and in collaboration with the others.

In this learning module we will:

* Identify the various Druid process log files.
* Understand the role of the log files.
* Review some task log files.

The first step in making use of log files is to become aware of what logs are available. We'll see that the Druid processes each generate a couple of different files. In addition to the process log files, we will see that transient Druid worker tasks also generate log files.

As Druid processes run, they write status information into files called log files. We can use these files to understand the Druid processes' behaviors and diagnose problems.

Since Druid is a distributed system, we will find log files for each Druid process. Druid also captures the output that each process writes to standard output.

Some processes may spin off tasks to perform sub-processing. In Druid, a task is a separate process that usually runs in its own JVM. Each of these tasks creates its own log file.

Many of the logs capture behavior during ingestion and other processing, but we can also configure Druid to capture specific query information.

## Prerequisites

This tutorial works with Druid 28.0.1 or later.

### Run with Docker

Launch this tutorial and all prerequisites using the `jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see the Learn Druid repository [readme](https://github.com/implydata/learn-druid).

## Initialization

To use this notebook, you will need to run Druid locally.

You will also make extensive use of the terminal, which we suggest you place alongside this notebook or on another screen.

### Install tools

Open a local terminal window.

If you have not already installed `wget` or `multitail`, run the following command to install these tools using `brew`.

```bash
brew install multitail && brew install wget
```

Run the following command to pull the default configuration for `multitail` to your home folder. Do not run this command if you are already running `multitail` as it will overwrite your own configuration.

```bash
curl https://raw.githubusercontent.com/halturin/multitail/master/multitail.conf > ~/.multitailrc
```
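
If you already have a `multitail` configuration, one option is to back it up before pulling the default. The following is a minimal sketch, not part of the original lab: `backup_if_present` is a hypothetical helper name, and the demonstration runs against a scratch directory so that it leaves your real home folder untouched.

```bash
# Copy a file to <name>.bak if it exists; do nothing otherwise.
backup_if_present() {
  if [ -f "$1" ]; then
    cp -p "$1" "$1.bak"
  fi
}

# Demonstrate on a scratch directory rather than the real home folder.
scratch="$(mktemp -d)"
echo "my settings" > "$scratch/.multitailrc"
backup_if_present "$scratch/.multitailrc"
backup_if_present "$scratch/does-not-exist"   # no-op: nothing to back up
ls -A "$scratch"
```

To protect a real configuration, you would call `backup_if_present "$HOME/.multitailrc"` before running the `curl` command above.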

### Install Apache Druid

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd && mkdir learn-druid-local
cd learn-druid-local
```

Now pull a compatible version of Druid.

> If you do not have wget on your Mac, you can install it with brew.

```bash
wget https://dlcdn.apache.org/druid/28.0.1/apache-druid-28.0.1-bin.tar.gz && tar -xzf apache-druid-28.0.1-bin.tar.gz &&
cd apache-druid-28.0.1
```

## Process logs

Monitor and troubleshoot [ingestion](https://druid.apache.org/docs/latest/ingestion/) or [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html) by monitoring logs emitted from these processes:

* Router
* Overlord
* MiddleManager
* Historical

The MiddleManager spawns multiple other tasks (peons) that carry out the work.

Typically, you would start with the Overlord, then move to the MiddleManager, where you can find information about individual ingestion tasks. Then you can move on to look at the ingestion tasks themselves.

Additionally, once data has been ingested, Historicals pull the resulting segments onto their local storage, and these segments are then balanced across the Historicals over time. The processes involved in this are:

* Coordinator
* Historical

Therefore, if you have segment loading or availability issues, look at the logs of these two processes.

## Task logs

Each task creates a log in a location you configure as an administrator. The actual location of the log file is recorded in the MiddleManager log for that task.

Run this command to see the location:

```bash
more something blah location
```

To get the log for a task, use the task log API, providing the ID and the log endpoint.

In the following command, replace `<ID>` with one of the task IDs output above, and then run it to pull the log file for that task.

```bash
curl http://localhost:8888/druid/indexer/v1/task/<ID>/log
```
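
If you expect to pull several task logs, a small wrapper can save typing. This is only a sketch under stated assumptions: `ROUTER`, `task_log_url`, and `fetch_task_log` are hypothetical names, and the default address assumes the Router of the local deployment used in this tutorial.

```bash
# Hypothetical helpers; the default address assumes the local Router on port 8888.
ROUTER="${ROUTER:-http://localhost:8888}"

# Build the task log URL for a given task ID.
task_log_url() {
  printf '%s/druid/indexer/v1/task/%s/log\n' "$ROUTER" "$1"
}

# Download the log to <task id>.log; --fail makes curl exit non-zero on HTTP errors.
fetch_task_log() {
  curl --silent --show-error --fail --output "$1.log" "$(task_log_url "$1")"
}

# Print the URL that would be requested for the placeholder ID.
task_log_url "<ID>"
```

With Druid running, `fetch_task_log <ID>` saves the log next to your current directory so that you can inspect it with `less` or `grep`.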

Druid may migrate local log files to long-term storage, following the configuration at XXXXXX. This is an essential configuration task in a production cluster. It centralizes the logs when ingestion is parallelized across multiple tasks, which is especially important for highly parallelized ingestion, and it reduces the use of node-local disks.

## Task log retention

Use the `druid.indexer.logs.kill` settings in your Overlord `runtime.properties` to have the Overlord manage the retention of task log files centrally. In the single-server quickstart, the Overlord runs in the combined Coordinator-Overlord process.

For example, a configuration like the following automatically deletes task logs from the central store that are older than 120,000 milliseconds (two minutes):

```
druid.indexer.logs.kill.enabled=true
druid.indexer.logs.kill.durationToRetain=120000
druid.indexer.logs.kill.initialDelay=60000
druid.indexer.logs.kill.delay=60000
```

Note that you must restart the Overlord process for these settings to take effect.

* Setting `druid.indexer.logs.kill.enabled` to `true` tells Druid to delete old task log files.
* `druid.indexer.logs.kill.durationToRetain` tells Druid how old (in milliseconds) log files must be before they are deleted.
* `druid.indexer.logs.kill.initialDelay` tells Druid how long to wait (in milliseconds) after the process starts before attempting to delete old log files for the first time.
* `druid.indexer.logs.kill.delay` tells Druid how long to wait (in milliseconds) between successive attempts to delete old log files.
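
Because these values are expressed in milliseconds, it can be easier to derive longer periods with shell arithmetic than by hand. The sketch below works through one case. The seven-day retention and hourly check are illustrative values chosen for the arithmetic, not recommendations, and the variable names are hypothetical.

```bash
# Illustrative values only: keep logs for 7 days, check once per hour.
retain_days=7
delay_hours=1

# days -> hours -> minutes -> seconds -> milliseconds
retain_ms=$(( retain_days * 24 * 60 * 60 * 1000 ))
# hours -> minutes -> seconds -> milliseconds
delay_ms=$(( delay_hours * 60 * 60 * 1000 ))

# Print lines in runtime.properties format.
echo "druid.indexer.logs.kill.durationToRetain=${retain_ms}"
echo "druid.indexer.logs.kill.delay=${delay_ms}"
```

Seven days works out to 604,800,000 milliseconds and one hour to 3,600,000 milliseconds.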

## Example 1: Monitoring a SQL-based ingestion

Run the command below to use `multitail` to view these log files. Notice that the command includes a filter that hides some messages.

```bash
multitail --config multitail.conf -CS log4jnew -du -P a -s 2 -sn 1,2 \
    -e "org.apache.druid.indexing" -f coordinator-overlord.log \
    -f middleManager.log \
    -f historical.log
```

Whilst in the multitail window, type `shift+O` to clear the output of all windows. Then press 0, 1, and 2 in turn to add a line marker to all windows.

In your second terminal window, run this command to start an ingestion.

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```

Take note of the activity that is logged across the processes as they cooperate with one another to:

* Plan and distribute the work to MiddleManagers.
* Spawn tasks to handle the ingest, optimization, and push to deep storage.
* Load the data out of deep storage and into Historicals.

Feel free to repeat the steps above several times.

In the terminal monitoring the MiddleManager, you will see a number of log entries created, concluding with something similar to:

```
2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS].
```
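
A completion line like this one carries both the task ID and the final status, so you can pull them out with `sed` when you need the ID for a later API call. The following sketch runs against the sample line above. It assumes that completion messages keep this exact wording, which may differ between Druid versions.

```bash
# The sample completion line shown above.
line='2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c] completed with status [SUCCESS].'

# Capture the text inside the two pairs of square brackets: task ID, then status.
result="$(printf '%s\n' "$line" |
  sed -n 's/.*Task \[\([^]]*\)] completed with status \[\([^]]*\)].*/\1 \2/p')"

echo "$result"
# prints: query-58b4b3c6-617e-4201-b0a4-ef63af0ca39c SUCCESS
```

To apply the same expression to a whole file, pass the log file name to `sed` instead of piping in a single line.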

Use the Tasks API to list the tasks, including the task IDs that you need in order to retrieve task logs.

```bash
curl http://localhost:8081/druid/indexer/v1/tasks | jq
```

## Example 2: Diagnosing an error in ingestion

In this part of the tutorial you will use the `less` and `grep` commands. For more information, see their documentation.

Run the following ingestion task.

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```

This ingestion failed.

Use this command to look at the MiddleManager log:

```bash
less middleManager.log
```

Use the search feature of `less` to look for the failure:

```
/FAIL
```
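
The same search can be run non-interactively with `grep`. The sketch below uses a small synthetic file in place of `middleManager.log` so that it runs anywhere. Both sample lines and their task IDs are made up for illustration, and the `FAILED` status text is an assumption modelled on the completion message format shown in Example 1.

```bash
# Synthetic stand-in for middleManager.log; task IDs and lines are made up.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
2024-02-07T12:55:56,434 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-aaa] completed with status [SUCCESS].
2024-02-07T13:01:12,101 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-bbb] completed with status [FAILED].
EOF

# -n prefixes each matching line with its line number in the file.
failed_lines="$(grep -n 'FAIL' "$log_file")"
# -c counts the matching lines instead of printing them.
failed_count="$(grep -c 'FAIL' "$log_file")"

printf '%s\n' "$failed_lines"
echo "matching lines: ${failed_count}"
rm -f "$log_file"
```

Against the real log, `grep -n 'FAIL' middleManager.log` gives you line numbers that you can jump to in `less`.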



## Stop Druid

Run this command to stop Druid, replacing `{pid}` with the process ID you noted earlier.

```bash
kill {pid}
```

For example:

```bash
kill 9864
```
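
After sending the signal, you can confirm that the process has gone with `kill -0`, which checks that a PID exists without signalling it. The sketch below uses a background `sleep` as a stand-in for Druid so that it can run anywhere. The `wait` call applies only because the stand-in is a child of this shell.

```bash
# Stand-in for the Druid supervise process so the sketch can run anywhere.
sleep 300 &
pid=$!

# Ask the process to shut down (SIGTERM is the default signal for kill).
kill "$pid"
# Reap the background job so the check below sees it as gone.
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only reports whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  stopped=no
else
  stopped=yes
fi
echo "process ${pid} stopped: ${stopped}"
```

A real Druid deployment may take a few seconds to shut down, so you may need to repeat the `kill -0` check before it reports that the process has exited.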

> If you do not remember your PID, use `ps` to look for the `supervise` process.

## Learn more

In the lab you learned that you can turn on logging for query requests with the `druid.request.logging.type` setting. Read about all the options, including other possible targets for these logs, in the documentation. One interesting configuration, for example, automatically filters query logging for you.

Take a few minutes to scan the official documentation for information about logging configuration. You may want to keep this page to hand throughout the course.

* [Process logs](https://druid.apache.org/docs/latest/configuration/logging.html)
* [Task logs](https://druid.apache.org/docs/latest/ingestion/tasks.html#task-logs)
* [Task log configuration for remote storage](https://druid.apache.org/docs/latest/configuration/index.html#task-logging)
* [Task log configuration for retention](https://druid.apache.org/docs/latest/configuration/#log-retention-policy)
* [Setting up Min.io for remote log storage](something)