Merge branch 'master' into msq-null-string-arrays
gianm committed Jul 27, 2023
2 parents fb2a498 + dd204e5 commit 9893393
Showing 33 changed files with 1,761 additions and 630 deletions.
1,214 changes: 1,129 additions & 85 deletions docs/api-reference/service-status-api.md

Large diffs are not rendered by default.

Binary file added docs/assets/web-console-0.7-tasks.png
Binary file modified docs/assets/web-console-01-home-view.png
Binary file modified docs/assets/web-console-02-data-loader-1.png
Binary file modified docs/assets/web-console-03-data-loader-2.png
Binary file modified docs/assets/web-console-04-datasources.png
Binary file modified docs/assets/web-console-05-retention.png
Binary file modified docs/assets/web-console-06-segments.png
Binary file modified docs/assets/web-console-07-supervisors.png
Binary file modified docs/assets/web-console-08-supervisor-status.png
Binary file modified docs/assets/web-console-09-task-status.png
Binary file modified docs/assets/web-console-10-servers.png
Binary file modified docs/assets/web-console-13-lookups.png
747 changes: 391 additions & 356 deletions docs/development/extensions-core/kinesis-ingestion.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/ingestion/ingestion-spec.md
@@ -485,7 +485,7 @@ is:
|skipBytesInMemoryOverheadCheck|The calculation of maxBytesInMemory takes into account overhead objects created during ingestion and each intermediate persist. Setting this to true can exclude the bytes of these overhead objects from the maxBytesInMemory check.|false|
|indexSpec|Defines segment storage format options to use at indexing time.|See [`indexSpec`](#indexspec) for more information.|
|indexSpecForIntermediatePersists|Defines segment storage format options to use at indexing time for intermediate persisted temporary segments.|See [`indexSpec`](#indexspec) for more information.|
|Other properties|Each ingestion method has its own list of additional tuning properties. See the documentation for each method for a full list: [Kafka indexing service](../development/extensions-core/kafka-supervisor-reference.md#tuningconfig), [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md#tuningconfig), [Native batch](native-batch.md#tuningconfig), and [Hadoop-based](hadoop.md#tuningconfig).||
|Other properties|Each ingestion method has its own list of additional tuning properties. See the documentation for each method for a full list: [Kafka indexing service](../development/extensions-core/kafka-supervisor-reference.md#tuningconfig), [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md#supervisor-tuning-configuration), [Native batch](native-batch.md#tuningconfig), and [Hadoop-based](hadoop.md#tuningconfig).||
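
For orientation, here is a minimal sketch (not a recommendation) of where `indexSpec` sits inside a `tuningConfig`; the type and values shown are illustrative defaults:

```json
{
  "type": "index_parallel",
  "maxRowsInMemory": 1000000,
  "indexSpec": {
    "bitmap": { "type": "roaring" },
    "dimensionCompression": "lz4",
    "metricCompression": "lz4",
    "longEncoding": "longs"
  }
}
```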

### `indexSpec`

46 changes: 25 additions & 21 deletions docs/operations/web-console.md
@@ -52,14 +52,14 @@ The **Home** view displays the following cards:
* __Status__. Click this card for information on the Druid version and any extensions loaded on the cluster.
* [Datasources](#datasources)
* [Segments](#segments)
* [Supervisors](#supervisors-and-tasks)
* [Tasks](#supervisors-and-tasks)
* [Supervisors](#supervisors)
* [Tasks](#tasks)
* [Services](#services)
* [Lookups](#lookups)

You can access the [data loader](#data-loader) and [lookups view](#lookups) from the top-level navigation of the **Home** view.

![home-view](../assets/web-console-01-home-view.png "home view")
![Web console home view](../assets/web-console-01-home-view.png "home view")

## Query

@@ -107,15 +107,15 @@ After queries finish, you can access them by clicking on the query time indicato

You can use the data loader to build an ingestion spec with a step-by-step wizard.

![data-loader-1](../assets/web-console-02-data-loader-1.png)
![Data loader tiles](../assets/web-console-02-data-loader-1.png)

After selecting the location of your data, follow the series of steps displaying incremental previews of the data as it is ingested.
After filling in the required details on every step, you can navigate to the next step by clicking **Next**.
You can also freely navigate between the steps from the top navigation.

Navigating with the top navigation leaves the underlying spec unmodified, while clicking **Next** attempts to fill in the subsequent steps with appropriate defaults.

![data-loader-2](../assets/web-console-03-data-loader-2.png)
![Data loader ingestion](../assets/web-console-03-data-loader-2.png)

## Datasources

@@ -127,50 +127,54 @@ To display a timeline of segments, toggle the option for **Show segment timeline**.

Like any view that is powered by a Druid SQL query, you can click **View SQL query for table** from the ellipsis menu to run the underlying SQL query directly.
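
As a hedged illustration, these views issue ordinary Druid SQL against the system tables, so the kind of request they reflect can be posted to the `/druid/v2/sql` endpoint as a JSON body like the following (the exact query the console runs may differ):

```json
{
  "query": "SELECT datasource, COUNT(*) AS num_segments FROM sys.segments GROUP BY datasource"
}
```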

![datasources](../assets/web-console-04-datasources.png)
![Datasources](../assets/web-console-04-datasources.png)

You can view and edit retention rules to determine the general availability of a datasource.

![retention](../assets/web-console-05-retention.png)
![Retention](../assets/web-console-05-retention.png)

## Segments

The **Segments** view shows all the [segments](../design/segments.md) in the cluster.
Each segment has a detail view that provides more information.
The Segment ID is also conveniently broken down into Datasource, Start, End, Version, and Partition columns for ease of filtering and sorting. For example, a hypothetical segment ID such as `wikipedia_2023-07-01T00:00:00.000Z_2023-07-02T00:00:00.000Z_2023-07-27T12:00:00.000Z_1` decomposes into the datasource (`wikipedia`), the interval start and end, the version timestamp, and the partition number (`1`).

![segments](../assets/web-console-06-segments.png)
![Segments](../assets/web-console-06-segments.png)

## Supervisors and tasks
## Supervisors

From this view, you can check the status of existing supervisors as well as suspend, resume, and reset them.
The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.
The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. Submit a supervisor spec manually by clicking the ellipsis icon and selecting **Submit JSON supervisor**.
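
For example, here is a minimal sketch of the kind of streaming supervisor spec you might submit this way; the topic, bootstrap server, datasource, and timestamp column are hypothetical placeholders:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "topic": "example-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka:9092" }
    },
    "dataSchema": {
      "dataSource": "example-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "segmentGranularity": "hour", "queryGranularity": "none", "rollup": false }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```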

![Supervisors](../assets/web-console-07-supervisors.png)

Click the magnifying glass icon for any supervisor to see detailed reports of its progress.

![Supervisors status](../assets/web-console-08-supervisor-status.png)

## Tasks

The tasks table allows you to see the currently running and recently completed tasks.
To navigate your tasks more easily, you can group them by their **Type**, **Datasource**, or **Status**.
Submit a task manually by clicking the ellipsis icon and selecting **Submit JSON task**.
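
As a hedged example, a minimal native batch task spec of the kind you might submit; the input URI, datasource name, and timestamp column are hypothetical:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "http", "uris": ["https://example.com/events.json.gz"] },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "example-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "segmentGranularity": "day" }
    }
  }
}
```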

![supervisors](../assets/web-console-07-supervisors.png)

Click on the magnifying glass for any supervisor to see detailed reports of its progress.

![supervisor-status](../assets/web-console-08-supervisor-status.png)
![Tasks](../assets/web-console-0.7-tasks.png)

Click on the magnifying glass for any task to see more detail about it.
Click the magnifying glass icon for any task to see more detail about it.

![tasks-status](../assets/web-console-09-task-status.png)
![Tasks status](../assets/web-console-09-task-status.png)

## Services

The **Services** view lets you see the current status of the nodes making up your cluster.
You can group the nodes by type or by tier to get meaningful summary statistics.
You can group the nodes by **Type** or by **Tier** to get meaningful summary statistics.

![servers](../assets/web-console-10-servers.png)
![Services](../assets/web-console-10-servers.png)


## Lookups

Access the **Lookups** view from the **Lookups** card in the home view or by clicking on the gear icon in the upper right corner.
Access the **Lookups** view from the **Lookups** card in the home view or by clicking the ellipsis icon in the top-level navigation.
Here you can create and edit query time [lookups](../querying/lookups.md).
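
The simplest lookup you can define here is a static map; the keys and values below are hypothetical:

```json
{
  "type": "map",
  "map": {
    "us": "United States",
    "de": "Germany"
  }
}
```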

![lookups](../assets/web-console-13-lookups.png)
![Lookups](../assets/web-console-13-lookups.png)
2 changes: 1 addition & 1 deletion docs/querying/datasourcemetadataquery.md
@@ -29,7 +29,7 @@ sidebar_label: "DatasourceMetadata"
Data Source Metadata queries return metadata information for a dataSource. These queries return information about:

* The timestamp of latest ingested event for the dataSource. This is the ingested event without any consideration of rollup.
* The timestamp of the latest ingested event for the dataSource. This is the ingested event without any consideration of rollup.

The grammar for these queries is a small JSON object, where `sample_datasource` below stands in for a real datasource name:
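
```json
{
  "queryType": "dataSourceMetadata",
  "dataSource": "sample_datasource"
}
```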

2 changes: 1 addition & 1 deletion docs/querying/multitenancy.md
@@ -75,7 +75,7 @@ stored on this tier.

## Supporting high query concurrency

Druid uses a [segment](../design/segments.md) as its fundamental unit of computation. Processes scan segments in parallel and a given process can scan `druid.processing.numThreads` concurrently. You can add more cores to a cluster to process more data in parallel and increase performance. Size your Druid segments such that any computation over any given segment should complete in at most 500ms. Use the the [`query/segment/time`](../operations/metrics.md#historical) metric to monitor computation times.
Druid uses a [segment](../design/segments.md) as its fundamental unit of computation. Processes scan segments in parallel and a given process can scan `druid.processing.numThreads` concurrently. You can add more cores to a cluster to process more data in parallel and increase performance. Size your Druid segments such that any computation over any given segment should complete in at most 500ms. Use the [`query/segment/time`](../operations/metrics.md#historical) metric to monitor computation times.
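
As a hypothetical sizing check: three Historicals each running `druid.processing.numThreads=15` give the cluster 45 concurrent segment scans, so a query touching 450 segments needs roughly ten waves, or about five seconds if every scan uses the full 500ms budget.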

Druid internally stores requests to scan segments in a priority queue. If a given query requires scanning
more segments than the total number of available processors in a cluster, and many similarly expensive queries are concurrently
2 changes: 1 addition & 1 deletion docs/querying/querying.md
@@ -57,7 +57,7 @@ are designed to be lightweight and complete very quickly. This means that for mo
more complex visualizations, multiple Druid queries may be required.

Even though queries are typically made to Brokers or Routers, they can also be accepted by
[Historical](../design/historical.md) processes and by [Peons (task JVMs)](../design/peons.md)) that are running
[Historical](../design/historical.md) processes and by [Peons (task JVMs)](../design/peons.md) that are running
stream ingestion tasks. This may be valuable if you want to query results for specific segments that are served by
specific processes.

2 changes: 1 addition & 1 deletion docs/querying/searchquery.md
@@ -159,7 +159,7 @@ If any part of a dimension value contains the value specified in this search que

### `fragment`

If any part of a dimension value contains all of the values specified in this search query spec, regardless of case by default, a "match" occurs. The grammar is:
If any part of a dimension value contains all the values specified in this search query spec, regardless of case by default, a "match" occurs. The grammar is:

```json
{
  "type" : "fragment",
  "case_sensitive" : false,
  "values" : ["fragment1", "fragment2"]
}
```
@@ -71,7 +71,7 @@ services:
- KAFKA_ENABLE_KRAFT=false

coordinator:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -88,7 +88,7 @@
- environment

broker:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -105,7 +105,7 @@
- environment

historical:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -123,7 +123,7 @@
- environment

middlemanager:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -142,7 +142,7 @@
- environment

router:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -169,6 +169,8 @@
JUPYTER_TOKEN: "docker"
DOCKER_STACKS_JUPYTER_CMD: "lab"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
DRUID_HOST: "${DRUID_HOST:-router}"
KAFKA_HOST: "${KAFKA_HOST:-kafka}"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
@@ -71,7 +71,7 @@ services:
- KAFKA_ENABLE_KRAFT=false

coordinator:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -88,7 +88,7 @@
- environment

broker:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -105,7 +105,7 @@
- environment

historical:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -123,7 +123,7 @@
- environment

middlemanager:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -142,7 +142,7 @@
- environment

router:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -167,6 +167,8 @@
JUPYTER_TOKEN: "docker"
DOCKER_STACKS_JUPYTER_CMD: "lab"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
DRUID_HOST: "${DRUID_HOST:-router}"
KAFKA_HOST: "${KAFKA_HOST:-kafka}"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
@@ -0,0 +1,129 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0cb3b009-ebde-4d56-9d59-a028d66d8309",
"metadata": {},
"source": [
"# Title\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"Introduction to Notebook\n",
"Lorem Ipsum"
]
},
{
"cell_type": "markdown",
"id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"This tutorial works with Druid XX.0.0 or later.\n",
"\n",
"Launch this tutorial and all prerequisites using the `all-services` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n"
]
},
{
"cell_type": "markdown",
"id": "7ee6aef8-a11d-48d5-bcdc-e6231ba594b7",
"metadata": {},
"source": [
"<details><summary> \n",
"<b>Run without Docker Compose</b> \n",
"</summary>\n",
"\n",
"In order to run this notebook you will need:\n",
"\n",
"<b>Required Services</b>\n",
"* <!-- include list of components needed for notebook, i.e. kafka, druid instance, etc. -->\n",
"\n",
"<b>Python packages</b>\n",
"* druidapi, a [Python client for Apache Druid](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md)\n",
"* <!-- include any python package dependencies -->\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "5007a243-b81a-4601-8f57-5b14940abbff",
"metadata": {},
"source": [
"### Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1ec783b-df3f-4168-9be2-cdc6ad3e33c2",
"metadata": {},
"outputs": [],
"source": [
"import druidapi\n",
"import os\n",
"\n",
"if 'DRUID_HOST' not in os.environ.keys():\n",
" druid_host=f\"http://localhost:8888\"\n",
"else:\n",
" druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
" \n",
"print(f\"Opening a connection to {druid_host}.\")\n",
"druid = druidapi.jupyter_client(druid_host)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c075de81-04c9-4b23-8253-20a15d46252e",
"metadata": {},
"outputs": [],
"source": [
"# INCLUDE THIS CELL IF YOUR NOTEBOOK USES KAFKA \n",
"# Use kafka_host variable when connecting to kafka \n",
"import os\n",
"\n",
"if 'KAFKA_HOST' not in os.environ.keys():\n",
" kafka_host=f\"http://localhost:9092\"\n",
"else:\n",
" kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}