Merge branch 'master' into msq-null-string-arrays
gianm committed Jul 27, 2023
2 parents fb2a498 + dd204e5 commit 9893393
Showing 33 changed files with 1,761 additions and 630 deletions.
1,214 changes: 1,129 additions & 85 deletions docs/api-reference/service-status-api.md

Large diffs are not rendered by default.

Binary file added docs/assets/web-console-0.7-tasks.png
Binary file modified docs/assets/web-console-01-home-view.png
Binary file modified docs/assets/web-console-02-data-loader-1.png
Binary file modified docs/assets/web-console-03-data-loader-2.png
Binary file modified docs/assets/web-console-04-datasources.png
Binary file modified docs/assets/web-console-05-retention.png
Binary file modified docs/assets/web-console-06-segments.png
Binary file modified docs/assets/web-console-07-supervisors.png
Binary file modified docs/assets/web-console-08-supervisor-status.png
Binary file modified docs/assets/web-console-09-task-status.png
Binary file modified docs/assets/web-console-10-servers.png
Binary file modified docs/assets/web-console-13-lookups.png
747 changes: 391 additions & 356 deletions docs/development/extensions-core/kinesis-ingestion.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/ingestion/ingestion-spec.md
@@ -485,7 +485,7 @@ is:
|skipBytesInMemoryOverheadCheck|The calculation of maxBytesInMemory takes into account overhead objects created during ingestion and each intermediate persist. Setting this to true can exclude the bytes of these overhead objects from the maxBytesInMemory check.|false|
|indexSpec|Defines segment storage format options to use at indexing time.|See [`indexSpec`](#indexspec) for more information.|
|indexSpecForIntermediatePersists|Defines segment storage format options to use at indexing time for intermediate persisted temporary segments.|See [`indexSpec`](#indexspec) for more information.|
|Other properties|Each ingestion method has its own list of additional tuning properties. See the documentation for each method for a full list: [Kafka indexing service](../development/extensions-core/kafka-supervisor-reference.md#tuningconfig), [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md#tuningconfig), [Native batch](native-batch.md#tuningconfig), and [Hadoop-based](hadoop.md#tuningconfig).||
|Other properties|Each ingestion method has its own list of additional tuning properties. See the documentation for each method for a full list: [Kafka indexing service](../development/extensions-core/kafka-supervisor-reference.md#tuningconfig), [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md#supervisor-tuning-configuration), [Native batch](native-batch.md#tuningconfig), and [Hadoop-based](hadoop.md#tuningconfig).||
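
For orientation, here is a minimal sketch (not a recommendation) of where `indexSpec` sits inside a `tuningConfig`; the type and values shown are illustrative defaults:

```json
{
  "type": "index_parallel",
  "maxRowsInMemory": 1000000,
  "indexSpec": {
    "bitmap": { "type": "roaring" },
    "dimensionCompression": "lz4",
    "metricCompression": "lz4",
    "longEncoding": "longs"
  }
}
```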

### `indexSpec`

46 changes: 25 additions & 21 deletions docs/operations/web-console.md
@@ -52,14 +52,14 @@ The **Home** view displays the following cards:
* __Status__. Click this card for information on the Druid version and any extensions loaded on the cluster.
* [Datasources](#datasources)
* [Segments](#segments)
* [Supervisors](#supervisors-and-tasks)
* [Tasks](#supervisors-and-tasks)
* [Supervisors](#supervisors)
* [Tasks](#tasks)
* [Services](#services)
* [Lookups](#lookups)

You can access the [data loader](#data-loader) and [lookups view](#lookups) from the top-level navigation of the **Home** view.

![home-view](../assets/web-console-01-home-view.png "home view")
![Web console home view](../assets/web-console-01-home-view.png "home view")

## Query

@@ -107,15 +107,15 @@ After queries finish, you can access them by clicking on the query time indicato

You can use the data loader to build an ingestion spec with a step-by-step wizard.

![data-loader-1](../assets/web-console-02-data-loader-1.png)
![Data loader tiles](../assets/web-console-02-data-loader-1.png)

After selecting the location of your data, follow the series of steps displaying incremental previews of the data as it is ingested.
After filling in the required details on every step, you can navigate to the next step by clicking **Next**.
You can also freely navigate between the steps from the top navigation.

Navigating with the top navigation leaves the underlying spec unmodified, while clicking **Next** attempts to fill in the subsequent steps with appropriate defaults.

![data-loader-2](../assets/web-console-03-data-loader-2.png)
![Data loader ingestion](../assets/web-console-03-data-loader-2.png)

## Datasources

@@ -127,50 +127,54 @@ To display a timeline of segments, toggle the option for **Show segment timeline**.

Like any view that is powered by a Druid SQL query, you can click **View SQL query for table** from the ellipsis menu to run the underlying SQL query directly.
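
As a hedged illustration, these views issue ordinary Druid SQL against the system tables, so the kind of request they reflect can be posted to the `/druid/v2/sql` endpoint as a JSON body like the following (the exact query the console runs may differ):

```json
{
  "query": "SELECT datasource, COUNT(*) AS num_segments FROM sys.segments GROUP BY datasource"
}
```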

![datasources](../assets/web-console-04-datasources.png)
![Datasources](../assets/web-console-04-datasources.png)

You can view and edit retention rules to determine the general availability of a datasource.

![retention](../assets/web-console-05-retention.png)
![Retention](../assets/web-console-05-retention.png)

## Segments

The **Segments** view shows all the [segments](../design/segments.md) in the cluster.
Each segment has a detail view that provides more information.
The Segment ID is also conveniently broken down into Datasource, Start, End, Version, and Partition columns for ease of filtering and sorting. For example, a hypothetical segment ID such as `wikipedia_2023-07-01T00:00:00.000Z_2023-07-02T00:00:00.000Z_2023-07-27T12:00:00.000Z_1` decomposes into the datasource (`wikipedia`), the interval start and end, the version timestamp, and the partition number (`1`).

![segments](../assets/web-console-06-segments.png)
![Segments](../assets/web-console-06-segments.png)

## Supervisors and tasks
## Supervisors

From this view, you can check the status of existing supervisors as well as suspend, resume, and reset them.
The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained.
The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. Submit a supervisor spec manually by clicking the ellipsis icon and selecting **Submit JSON supervisor**.
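
For example, here is a minimal sketch of the kind of streaming supervisor spec you might submit this way; the topic, bootstrap server, datasource, and timestamp column are hypothetical placeholders:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "topic": "example-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka:9092" }
    },
    "dataSchema": {
      "dataSource": "example-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "segmentGranularity": "hour", "queryGranularity": "none", "rollup": false }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```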

![Supervisors](../assets/web-console-07-supervisors.png)

Click the magnifying glass icon for any supervisor to see detailed reports of its progress.

![Supervisors status](../assets/web-console-08-supervisor-status.png)

## Tasks

The tasks table allows you to see the currently running and recently completed tasks.
To navigate your tasks more easily, you can group them by their **Type**, **Datasource**, or **Status**.
Submit a task manually by clicking the ellipsis icon and selecting **Submit JSON task**.
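
As a hedged example, a minimal native batch task spec of the kind you might submit; the input URI, datasource name, and timestamp column are hypothetical:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "http", "uris": ["https://example.com/events.json.gz"] },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "example-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "segmentGranularity": "day" }
    }
  }
}
```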

![supervisors](../assets/web-console-07-supervisors.png)

Click on the magnifying glass for any supervisor to see detailed reports of its progress.

![supervisor-status](../assets/web-console-08-supervisor-status.png)
![Tasks](../assets/web-console-0.7-tasks.png)

Click on the magnifying glass for any task to see more detail about it.
Click the magnifying glass icon for any task to see more detail about it.

![tasks-status](../assets/web-console-09-task-status.png)
![Tasks status](../assets/web-console-09-task-status.png)

## Services

The **Services** view lets you see the current status of the nodes making up your cluster.
You can group the nodes by type or by tier to get meaningful summary statistics.
You can group the nodes by **Type** or by **Tier** to get meaningful summary statistics.

![servers](../assets/web-console-10-servers.png)
![Services](../assets/web-console-10-servers.png)


## Lookups

Access the **Lookups** view from the **Lookups** card in the home view or by clicking on the gear icon in the upper right corner.
Access the **Lookups** view from the **Lookups** card in the home view or by clicking the ellipsis icon in the top-level navigation.
Here you can create and edit query time [lookups](../querying/lookups.md).
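
The simplest lookup you can define here is a static map; the keys and values below are hypothetical:

```json
{
  "type": "map",
  "map": {
    "us": "United States",
    "de": "Germany"
  }
}
```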

![lookups](../assets/web-console-13-lookups.png)
![Lookups](../assets/web-console-13-lookups.png)
2 changes: 1 addition & 1 deletion docs/querying/datasourcemetadataquery.md
@@ -29,7 +29,7 @@ sidebar_label: "DatasourceMetadata"
Data Source Metadata queries return metadata information for a dataSource. These queries return information about:

* The timestamp of latest ingested event for the dataSource. This is the ingested event without any consideration of rollup.
* The timestamp of the latest ingested event for the dataSource. This is the ingested event without any consideration of rollup.

The grammar for these queries is a small JSON object, where `sample_datasource` below stands in for a real datasource name:
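
```json
{
  "queryType": "dataSourceMetadata",
  "dataSource": "sample_datasource"
}
```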

2 changes: 1 addition & 1 deletion docs/querying/multitenancy.md
@@ -75,7 +75,7 @@ stored on this tier.

## Supporting high query concurrency

Druid uses a [segment](../design/segments.md) as its fundamental unit of computation. Processes scan segments in parallel and a given process can scan `druid.processing.numThreads` concurrently. You can add more cores to a cluster to process more data in parallel and increase performance. Size your Druid segments such that any computation over any given segment should complete in at most 500ms. Use the the [`query/segment/time`](../operations/metrics.md#historical) metric to monitor computation times.
Druid uses a [segment](../design/segments.md) as its fundamental unit of computation. Processes scan segments in parallel and a given process can scan `druid.processing.numThreads` concurrently. You can add more cores to a cluster to process more data in parallel and increase performance. Size your Druid segments such that any computation over any given segment should complete in at most 500ms. Use the [`query/segment/time`](../operations/metrics.md#historical) metric to monitor computation times.
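
As a hypothetical sizing check: three Historicals each running `druid.processing.numThreads=15` give the cluster 45 concurrent segment scans, so a query touching 450 segments needs roughly ten waves, or about five seconds if every scan uses the full 500ms budget.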

Druid internally stores requests to scan segments in a priority queue. If a given query requires scanning
more segments than the total number of available processors in a cluster, and many similarly expensive queries are concurrently
2 changes: 1 addition & 1 deletion docs/querying/querying.md
@@ -57,7 +57,7 @@ are designed to be lightweight and complete very quickly. This means that for mo
more complex visualizations, multiple Druid queries may be required.

Even though queries are typically made to Brokers or Routers, they can also be accepted by
[Historical](../design/historical.md) processes and by [Peons (task JVMs)](../design/peons.md)) that are running
[Historical](../design/historical.md) processes and by [Peons (task JVMs)](../design/peons.md) that are running
stream ingestion tasks. This may be valuable if you want to query results for specific segments that are served by
specific processes.

2 changes: 1 addition & 1 deletion docs/querying/searchquery.md
@@ -159,7 +159,7 @@ If any part of a dimension value contains the value specified in this search que

### `fragment`

If any part of a dimension value contains all of the values specified in this search query spec, regardless of case by default, a "match" occurs. The grammar is:
If any part of a dimension value contains all the values specified in this search query spec, regardless of case by default, a "match" occurs. The grammar is:

```json
{
  "type" : "fragment",
  "case_sensitive" : false,
  "values" : ["fragment1", "fragment2"]
}
```
@@ -71,7 +71,7 @@ services:
- KAFKA_ENABLE_KRAFT=false

coordinator:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -88,7 +88,7 @@
- environment

broker:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -105,7 +105,7 @@
- environment

historical:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -123,7 +123,7 @@
- environment

middlemanager:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -142,7 +142,7 @@
- environment

router:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -169,6 +169,8 @@
JUPYTER_TOKEN: "docker"
DOCKER_STACKS_JUPYTER_CMD: "lab"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
DRUID_HOST: "${DRUID_HOST:-router}"
KAFKA_HOST: "${KAFKA_HOST:-kafka}"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
@@ -71,7 +71,7 @@ services:
- KAFKA_ENABLE_KRAFT=false

coordinator:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -88,7 +88,7 @@
- environment

broker:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -105,7 +105,7 @@
- environment

historical:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -123,7 +123,7 @@
- environment

middlemanager:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -142,7 +142,7 @@
- environment

router:
image: apache/druid:${DRUID_VERSION}
image: apache/druid:${DRUID_VERSION:-26.0.0}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
@@ -167,6 +167,8 @@
JUPYTER_TOKEN: "docker"
DOCKER_STACKS_JUPYTER_CMD: "lab"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
DRUID_HOST: "${DRUID_HOST:-router}"
KAFKA_HOST: "${KAFKA_HOST:-kafka}"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
@@ -0,0 +1,129 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0cb3b009-ebde-4d56-9d59-a028d66d8309",
"metadata": {},
"source": [
"# Title\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"Introduction to Notebook\n",
"Lorem Ipsum"
]
},
{
"cell_type": "markdown",
"id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"This tutorial works with Druid XX.0.0 or later.\n",
"\n",
"Launch this tutorial and all prerequisites using the `all-services` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n"
]
},
{
"cell_type": "markdown",
"id": "7ee6aef8-a11d-48d5-bcdc-e6231ba594b7",
"metadata": {},
"source": [
"<details><summary> \n",
"<b>Run without Docker Compose</b> \n",
"</summary>\n",
"\n",
"In order to run this notebook you will need:\n",
"\n",
"<b>Required Services</b>\n",
"* <!-- include list of components needed for notebook, i.e. kafka, druid instance, etc. -->\n",
"\n",
"<b>Python packages</b>\n",
"* druidapi, a [Python client for Apache Druid](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md)\n",
"* <!-- include any python package dependencies -->\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "5007a243-b81a-4601-8f57-5b14940abbff",
"metadata": {},
"source": [
"### Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1ec783b-df3f-4168-9be2-cdc6ad3e33c2",
"metadata": {},
"outputs": [],
"source": [
"import druidapi\n",
"import os\n",
"\n",
"if 'DRUID_HOST' not in os.environ.keys():\n",
" druid_host=f\"http://localhost:8888\"\n",
"else:\n",
" druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
" \n",
"print(f\"Opening a connection to {druid_host}.\")\n",
"druid = druidapi.jupyter_client(druid_host)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c075de81-04c9-4b23-8253-20a15d46252e",
"metadata": {},
"outputs": [],
"source": [
"# INCLUDE THIS CELL IF YOUR NOTEBOOK USES KAFKA \n",
"# Use kafka_host variable when connecting to kafka \n",
"import os\n",
"\n",
"if 'KAFKA_HOST' not in os.environ.keys():\n",
" kafka_host=f\"http://localhost:9092\"\n",
"else:\n",
" kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}