# Streaming and SQL-based ingestion logs

Apache Druid uses Log4j to emit logs as it runs. These logs not only enable you to investigate issues and solve problems, but also to understand how each of Druid's processes works in isolation and in collaboration with the others.

In addition to Druid's core processes, worker tasks carry out Druid's highly parallelised approach to both [streaming](https://druid.apache.org/docs/latest/ingestion/#streaming) and [batch](https://druid.apache.org/docs/latest/ingestion/#batch) ingestion. Each task is a separate process that creates its own log file.

In this notebook, you will look at log files for the key processes in SQL-based ingestion as well as for the peon processes they spawn, and see where to configure Druid so that the all-important task log files are written to durable storage.

## Prerequisites

This tutorial works with Druid 30.0.0 or later. It is designed to run from a Mac with a locally running instance of Druid but can also be run on common Linux distributions and on Windows with [WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/).

If you wish to use this tutorial within Jupyter through the [learn-druid](https://github.com/implydata/learn-druid) Docker Compose, use the `jupyter` profile to avoid starting a second instance of Druid that may cause conflicts.

## Initialization

In this step, you will find instructions to install prerequisite tools and to deploy Druid locally.

Before starting, open a terminal window.

### Install required tools

You will need the following tools:

* `brew` to install prerequisite tools.
* `wget` to pull Apache Druid from the official repository.
* `multitail` to view multiple log files simultaneously.

For instructions on installing `brew`, see the [Homebrew homepage](https://brew.sh/).

Install `wget` and `multitail` using `brew`. For example:

```bash
brew install multitail
brew install wget
```

You may need to manually fetch a default configuration for `multitail`.

Skip this step if you are already running `multitail` as it will overwrite your own configuration.

Execute the following command to pull the default configuration to your home folder.

```bash
curl https://raw.githubusercontent.com/halturin/multitail/master/multitail.conf > ~/.multitailrc
```

### Install Apache Druid

Run the following to create a dedicated folder for learn-druid in your home directory:

```bash
cd ~ ; mkdir learn-druid-local
cd learn-druid-local
```

Pull and unpack a compatible version of Apache Druid from the [Apache Druid downloads page](https://dlcdn.apache.org/druid/), for example `30.0.0`.

```bash
version="30.0.0"
wget https://dlcdn.apache.org/druid/$version/apache-druid-$version-bin.tar.gz
tar -xzf apache-druid-$version-bin.tar.gz
```

Use the following commands to rename the folder.

```bash
mv apache-druid-$version apache-druid
cd apache-druid
```

## Process logs

Understand [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html) planning and execution by monitoring logs emitted from the [core processes for ingestion](https://druid.apache.org/docs/latest/ingestion/):

* Router
* Overlord
* MiddleManager
* Historical

Start Druid as a background task with the following command:

```bash
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```
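
Druid can take a little while to come up after the start command returns. The following is a minimal sketch of a readiness check, assuming the Router is listening on `localhost:8888` as in the rest of this tutorial and that the `/status/health` endpoint is reachable there. The function name `wait_for_druid` and its default attempt count and pause are illustrative choices, not part of Druid.

```bash
# Poll a Druid health endpoint until it answers, or give up.
# Arguments (hypothetical defaults): URL, maximum attempts, seconds between attempts.
wait_for_druid() {
  url="${1:-http://localhost:8888/status/health}"
  attempts="${2:-30}"
  pause="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s keeps it quiet.
    if curl -fs "$url" > /dev/null 2>&1; then
      echo "Druid responded after $i attempt(s)"
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "Druid did not respond at $url" >&2
  return 1
}
```

Calling `wait_for_druid` with no arguments polls the Router for roughly a minute before giving up.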

In the terminal window, switch to the default log location:

```bash
cd log
```

Run the following command to start monitoring these logs.

```bash
multitail -CS log4jnew -du -P a -s 2 -sn 1,2 \
    -f coordinator-overlord.log \
    -f middleManager.log \
    -f historical.log
```

In the multitail window, press `0`, `1`, and `2` in turn to add a line marker to the log output.

Open a second terminal window, and run the following command to start an ingestion:

```bash
curl --location --request POST 'http://localhost:8888/druid/v2/sql/task/' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": "INSERT INTO \"example-wikipedia-logs\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n    '\''{\"type\": \"json\"}'\'',\n    '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n  )\n)\nPARTITIONED BY DAY"
  }'
```
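
The request above returns a small JSON document that includes the ID of the controller task. As a rough sketch, the helpers below pull that ID out of the response and read the task state from the task status API. They assume the response carries a `taskId` field and that the status response nests the state under a `status` field; the function names are hypothetical, and `jq` would do the same job if you have it installed.

```bash
# Read JSON on stdin and print the value of its "taskId" field.
extract_task_id() {
  sed -n 's/.*"taskId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Read a task status response on stdin and print the state, such as RUNNING or SUCCESS.
parse_task_state() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p'
}

# Fetch the status of the task ID given as the first argument through the Router.
task_state() {
  curl -s "http://localhost:8888/druid/indexer/v1/task/$1/status" | parse_task_state
}
```

To use these, pipe the output of the ingestion request into `extract_task_id`, keep the result in a variable, and pass it to `task_state` until the state is no longer `RUNNING`.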

Open the combined Coordinator and Overlord log in a separate panel by pressing `b` and selecting the `coordinator-overlord` option.

Repeat pressing `b` until you find your line marker and notice the execution history is logged in detail:

* Tasks are queued up (`org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner`).
* Detail is written to the Metadata Database (`org.apache.druid.indexing.overlord.MetadataTaskStorage`).
* Segment identifiers are allocated and then created (`org.apache.druid.indexing.overlord.TaskLockbox`).
* Segments are completed and the shutdown is recognised.
* Segments are published to the metadata database (`Published segments to DB...`).
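
If you prefer a quick summary to scrolling, the stages listed above can be counted with `grep`. This is only a sketch: it assumes you are still in the `log` directory and uses the class names and message text quoted in the list as search strings. The function name `count_ingestion_events` is made up for this example.

```bash
# Count how many lines in a log file mention each stage of the ingestion.
count_ingestion_events() {
  log_file="$1"
  for pattern in HttpRemoteTaskRunner MetadataTaskStorage TaskLockbox "Published segments to DB"; do
    # -c prints the number of matching lines; -F treats the pattern as a literal string.
    printf '%-26s %s\n' "$pattern" "$(grep -cF -- "$pattern" "$log_file")"
  done
}

# Run from the log directory once the ingestion has finished.
if [ -f coordinator-overlord.log ]; then
  count_ingestion_events coordinator-overlord.log
fi
```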

Press `q` to exit the panel, and then hit `b` and open the Middle Manager logs.

> Use `Y` in the `multitail` window to switch line wrapping on and off.

In the `middleManager` log, notice:

* The creation of Task Runners, each with their own Java configurations (`org.apache.druid.indexing.overlord.ForkingTaskRunner`).
* `FileTaskLogs` writing logs to specific folders for each Task Runner.
* Tasks being recorded as completed (`Task [---] completed with status [SUCCESS]`).
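
The completion messages can also be pulled out of the file directly. The sketch below assumes the message has the shape shown in the last bullet, `Task [<id>] completed with status [<STATUS>]`, and prints the status followed by the task ID. The function name `list_completed_tasks` is hypothetical.

```bash
# Print "<STATUS> <task id>" for every task completion recorded in a log file.
list_completed_tasks() {
  sed -n 's/.*Task \[\([^]]*\)] completed with status \[\([A-Z]*\)].*/\2 \1/p' "$1"
}

# Run from the log directory once the ingestion has finished.
if [ -f middleManager.log ]; then
  list_completed_tasks middleManager.log
fi
```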

Press `q` to exit the panel, then hit `b` and open the Historical log.

Notice how the Historical process sees the segments and loads them from deep storage (`var/druid/segments` in this environment) into its own local segment cache (`var/druid/segment-cache`).

Spend some time looking through these logs. Can you find the following important information about Druid's ingestion service?

* Ingestion planning at the Overlord.
* Parallel ingestion task creation and execution from the Overlord to the Middle Manager.
* Segment publishing by the Overlord after task completion.
* Data distribution by the Coordinator and Historicals.

When you are finished, type `q` to quit `multitail`.

## Task logs

In the example above, the MiddleManager spawned multiple tasks (peons), each of which created a log that was then [written to a log location](https://druid.apache.org/docs/latest/ingestion/tasks#task-logs) that you can configure as an administrator.

Run this command to view the `common.runtime.properties` for the `start-druid` environment.

```bash
cat ~/learn-druid-local/apache-druid/conf/druid/auto/_common/common.runtime.properties
```

### Viewing local logs

The "Indexing Service Logs" section contains the configuration that caused the Middle Manager to write task logs to local disk:

```
druid.indexer.logs.type=file
druid.indexer.logs.directory=var/druid/indexing-logs
```

Run this command to see where task logs were written by looking at the Middle Manager log:

```bash
grep "Wrote task log to" middleManager.log
```

Notice that the log filename is made up of the query ID and the suffix `.log`.

Adapt the following `vi` command to open the log for viewing:

```bash
vi ~/learn-druid-local/apache-druid/<logfilelocation>
```

For example,

```bash
vi ~/learn-druid-local/apache-druid/var/druid/indexing-logs/query-83651faf-d179-4f4d-b0ad-ce61e36e85bd-worker0_0.log
```

The logs contain a large amount of information that helps you understand what Druid does during an ingestion, including:

* The initial Java process spin-up.
* Records of activity, broken up into "lifecycle stages" (`org.apache.druid.java.util.common.lifecycle.Lifecycle`).
* The actual creation of segment files and their publication.
* A JSON object containing the task completion status.

Type `:q` and press enter to quit `vi`.

### Viewing logs via the API

Use the task log API with a query ID to access the logs over HTTP. This is the same API that the console uses to display logs.

When calling the API, you can use the IDs from the log files, or call the Tasks API to list all available task IDs. The following example uses `jq` to extract the IDs from the response:

```bash
curl -s http://localhost:8081/druid/indexer/v1/tasks | jq '.[].id'
```

Replace `<queryID>` in the following command with a task ID, then run it to retrieve the log file for that task.

```bash
curl http://localhost:8888/druid/indexer/v1/task/<queryID>/log
```

For example:

```bash
curl http://localhost:8888/druid/indexer/v1/task/query-83651faf-d179-4f4d-b0ad-ce61e36e85bd-worker0_0/log
```

Repeat the API call removing the `-worker0_0` portion. This will retrieve the overall log for the ingestion.
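
Stripping the worker suffix by hand is easy to get wrong, so here is a small sketch that derives the controller ID with shell parameter expansion and builds the matching log URL. It assumes worker IDs end in `-worker` followed by the worker number, as in the example above; the function names `controller_id` and `task_log_url` are illustrative.

```bash
# Derive the controller task ID by dropping the trailing "-worker..." suffix.
controller_id() {
  printf '%s\n' "${1%-worker*}"
}

# Build the task log URL served through the Router.
task_log_url() {
  printf 'http://localhost:8888/druid/indexer/v1/task/%s/log\n' "$1"
}

worker="query-83651faf-d179-4f4d-b0ad-ce61e36e85bd-worker0_0"
task_log_url "$(controller_id "$worker")"
# prints http://localhost:8888/druid/indexer/v1/task/query-83651faf-d179-4f4d-b0ad-ce61e36e85bd/log
```

Pass the printed URL to `curl` to fetch the overall log, for example with `-o` to save it to a file.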

### Configuring long-term log storage

In a typical cluster with multiple nodes and Middle Managers, [configure a centralized log location](https://druid.apache.org/docs/latest/configuration/#task-logging). Processes will then migrate local log files to the long-term storage area, which is an important implementation step in a production environment.

You can then view logs with a tool suited to your configured long-term storage, through the API, or in the console.
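
As an illustration only, the sketch below prints the kind of properties you might set to send task logs to Amazon S3 instead of local disk. The bucket name and prefix are placeholders, the function name `s3_task_log_properties` is hypothetical, and S3 task logs also depend on the S3 extension being loaded, so check the linked configuration reference before applying anything like this.

```bash
# Print task log properties for S3. Arguments: bucket name, key prefix (both placeholders).
s3_task_log_properties() {
  cat <<EOF
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=$1
druid.indexer.logs.s3Prefix=$2
EOF
}

s3_task_log_properties my-druid-logs druid/indexing-logs
```

The output is intended to replace the `druid.indexer.logs.type=file` and `druid.indexer.logs.directory` lines in `common.runtime.properties`.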

## Task log retention

Use `druid.indexer.logs.kill` settings in your Overlord `runtime.properties` to have Druid [manage the retention of task log files](https://druid.apache.org/docs/latest/configuration/#log-retention-policy) centrally as part of [on-going cluster hygiene](https://druid.apache.org/docs/latest/operations/clean-metadata-store#indexer-task-logs).

Run the following to take a copy of your existing runtime configuration for the Coordinator and Overlord:

```bash
cp ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties.old
```

Before you change the configuration, stop your Druid instance by running this command:

```bash
kill $(ps -ef | grep '[s]upervise' | awk 'NF{print $2}' | head -n 1)
```

Now run the following to add a retention configuration that will delete task logs after 30 seconds:

```bash
echo -e "druid.indexer.logs.kill.enabled=true\ndruid.indexer.logs.kill.durationToRetain=30000\ndruid.indexer.logs.kill.initialDelay=100\ndruid.indexer.logs.kill.delay=100" >> ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties
```

* Setting `druid.indexer.logs.kill.enabled` to `true` tells Druid to delete old task log files.
* `druid.indexer.logs.kill.durationToRetain` tells Druid how old (in milliseconds) log files must be before they are deleted.
* `druid.indexer.logs.kill.initialDelay` tells Druid how long to wait (in milliseconds) after the process starts before attempting to delete old log files for the first time.
* `druid.indexer.logs.kill.delay` tells Druid how long to wait (in milliseconds) between subsequent attempts to delete old log files.
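
The 30-second retention used here is only for demonstration. Because these settings are expressed in milliseconds, a realistic retention period becomes a large number that is easy to mistype. The sketch below converts days to milliseconds; the function name `days_to_millis` and the 30-day figure are illustrative assumptions.

```bash
# Convert a number of days to milliseconds.
days_to_millis() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

echo "druid.indexer.logs.kill.durationToRetain=$(days_to_millis 30)"
# prints druid.indexer.logs.kill.durationToRetain=2592000000
```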

View the resulting configuration with a `cat`:

```bash
cat ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties
```

Run the following command to switch to the Indexing Logs location:

```bash
cd ~/learn-druid-local/apache-druid/var/druid/indexing-logs ; ls -l
```

Now restart Druid:

```bash
nohup ~/learn-druid-local/apache-druid/bin/start-druid > log.out 2> log.err < /dev/null & disown
```

Running this command will show that the older task logs have now been deleted:

```bash
ls -l
```
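
Druid needs a moment to start and run its first clean-up, so the listing may not be empty straight away. As a small sketch, the helper below counts the remaining files that end in `.log`, which you can re-run until the count drops. The function name `count_task_logs` is hypothetical, and the directory is the local task log location used earlier in this tutorial.

```bash
# Count the files ending in .log beneath a directory.
count_task_logs() {
  find "$1" -type f -name '*.log' | wc -l | tr -d ' '
}

logs_dir="$HOME/learn-druid-local/apache-druid/var/druid/indexing-logs"
if [ -d "$logs_dir" ]; then
  echo "Task logs remaining: $(count_task_logs "$logs_dir")"
fi
```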

## Stop Druid

Stop Druid before restoring the original configuration:

```bash
kill $(ps -ef | grep '[s]upervise' | awk 'NF{print $2}' | head -n 1)
```

Run this final cell to restore the default configuration for your runtime properties files:

```bash
rm ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties ; \
mv ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties.old ~/learn-druid-local/apache-druid/conf/druid/auto/coordinator-overlord/runtime.properties
```

## Learn more

In this notebook you've seen how you can use Linux tools to inspect the logs of processes involved in ingestion, and seen the kinds of information that they contain. You've also seen the task-specific log files and read about how to configure Druid for long-term retention.

* Read more about [Process logs](https://druid.apache.org/docs/latest/configuration/logging.html) and [Task logs](https://druid.apache.org/docs/latest/ingestion/tasks.html#task-logs) in the official documentation, as well as:
  * [Task log configuration for remote storage](https://druid.apache.org/docs/latest/configuration/index.html#task-logging) and
  * [Task log configuration for retention](https://druid.apache.org/docs/latest/configuration/#log-retention-policy).