The request contained an invalid host header [nifi:8443] in the request [/nifi-api]. Check for request manipulation or third-party intercept.
+ ENDPOINT = f"https://nifi-node-default-0.nifi-node-default.{os.environ['NAMESPACE']}.svc.cluster.local:8443" # For local testing/development you can point this at a reachable address; change it back afterwards
+ USERNAME = "admin"
+ PASSWORD = "adminadmin"
+ TEMPLATE_NAME = "IngestWaterLevelsToKafka"
+ TEMPLATE_FILE = f"{TEMPLATE_NAME}.xml"
+
+ urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
+
+ nipyapi.config.nifi_config.host = f"{ENDPOINT}/nifi-api"
+ nipyapi.config.nifi_config.verify_ssl = False
+
+ print("Logging in")
+ service_login(username=USERNAME, password=PASSWORD)
+ print("Logged in")
+
+ pg_id = get_root_pg_id()
+
+ upload_template(pg_id, TEMPLATE_FILE)
+
+ template_id = get_template(TEMPLATE_NAME).id
+ deploy_template(pg_id, template_id, 200, 0)
+
+ for controller in list_all_controllers():
+ schedule_controller(controller, scheduled=True)
+
+ schedule_process_group(pg_id, scheduled=True)
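The `ENDPOINT` in the hunk above is the in-cluster DNS name of the first NiFi pod, addressed via the StatefulSet's headless service. A small stdlib-only helper sketching how that address is composed (the `statefulset` and `port` defaults mirror the values in the script; the rest of the script relies on the `nipyapi` package):

```python
import os


def nifi_endpoint(namespace: str,
                  statefulset: str = "nifi-node-default",
                  port: int = 8443) -> str:
    """Build the in-cluster HTTPS endpoint of the first NiFi pod.

    Matches the ENDPOINT f-string in the script above: pod 0 of the
    NiFi StatefulSet, addressed via its headless service.
    """
    return (f"https://{statefulset}-0.{statefulset}."
            f"{namespace}.svc.cluster.local:{port}")


# In the real script the namespace comes from the environment:
endpoint = nifi_endpoint(os.environ.get("NAMESPACE", "default"))
```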
diff --git a/demos/kafka-druid-water-level-data/queries.txt b/demos/nifi-kafka-druid-water-level-data/queries.txt
similarity index 100%
rename from demos/kafka-druid-water-level-data/queries.txt
rename to demos/nifi-kafka-druid-water-level-data/queries.txt
diff --git a/demos/kafka-druid-water-level-data/setup-superset.yaml b/demos/nifi-kafka-druid-water-level-data/setup-superset.yaml
similarity index 93%
rename from demos/kafka-druid-water-level-data/setup-superset.yaml
rename to demos/nifi-kafka-druid-water-level-data/setup-superset.yaml
index 3a8358fd..7552fe85 100644
--- a/demos/kafka-druid-water-level-data/setup-superset.yaml
+++ b/demos/nifi-kafka-druid-water-level-data/setup-superset.yaml
@@ -9,17 +9,16 @@ spec:
containers:
- name: setup-superset
image: docker.stackable.tech/stackable/testing-tools:0.1.0-stackable0.1.0
- command: ["bash", "-c", "curl -o superset-assets.zip https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/kafka-druid-water-level-data/superset-assets.zip && python -u /tmp/script/script.py"]
+ command: ["bash", "-c", "curl -o superset-assets.zip https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/nifi-kafka-druid-water-level-data/superset-assets.zip && python -u /tmp/script/script.py"]
volumeMounts:
- name: script
mountPath: /tmp/script
- restartPolicy: OnFailure
volumes:
- name: script
configMap:
name: setup-superset-script
- restartPolicy: Never
- backoffLimit: 50 # It can take some time until Superset is ready
+ restartPolicy: OnFailure
+ backoffLimit: 50
---
apiVersion: v1
kind: ConfigMap
diff --git a/demos/kafka-druid-water-level-data/superset-assets.zip b/demos/nifi-kafka-druid-water-level-data/superset-assets.zip
similarity index 100%
rename from demos/kafka-druid-water-level-data/superset-assets.zip
rename to demos/nifi-kafka-druid-water-level-data/superset-assets.zip
diff --git a/demos/trino-taxi-data/create-table-in-trino.yaml b/demos/trino-taxi-data/create-table-in-trino.yaml
index a1934fa8..97627e39 100644
--- a/demos/trino-taxi-data/create-table-in-trino.yaml
+++ b/demos/trino-taxi-data/create-table-in-trino.yaml
@@ -13,13 +13,12 @@ spec:
volumeMounts:
- name: script
mountPath: /tmp/script
- restartPolicy: OnFailure
volumes:
- name: script
configMap:
name: create-ny-taxi-data-table-in-trino-script
- restartPolicy: Never
- backoffLimit: 50 # It can take some time until Trino is ready
+ restartPolicy: OnFailure
+ backoffLimit: 50
---
apiVersion: v1
kind: ConfigMap
diff --git a/demos/trino-taxi-data/load-test-data.yaml b/demos/trino-taxi-data/load-test-data.yaml
index 566b8929..f7263615 100644
--- a/demos/trino-taxi-data/load-test-data.yaml
+++ b/demos/trino-taxi-data/load-test-data.yaml
@@ -11,3 +11,4 @@ spec:
image: "bitnami/minio:2022-debian-10"
command: ["bash", "-c", "cd /tmp && for month in 2020-01 2020-02 2020-03 2020-04 2020-05 2020-06 2020-07 2020-08 2020-09 2020-10 2020-11 2020-12 2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08 2021-09 2021-10 2021-11 2021-12 2022-01 2022-02 2022-03 2022-04; do curl -O https://repo.stackable.tech/repository/misc/ny-taxi-data/yellow_tripdata_$month.parquet && mc --insecure alias set minio http://minio-trino:9000/ demo demodemo && mc cp yellow_tripdata_$month.parquet minio/demo/ny-taxi-data/raw/; done"]
restartPolicy: OnFailure
+ backoffLimit: 50
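The month list hard-coded in the `load-test-data` command above spans 2020-01 through 2022-04; it could equally be generated. A stdlib-only sketch (the range endpoints are taken from the command; each entry corresponds to one `yellow_tripdata_<month>.parquet` file):

```python
from datetime import date


def month_range(start: date, end: date) -> list[str]:
    """Return YYYY-MM strings from start to end, inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append(f"{y:04d}-{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months


months = month_range(date(2020, 1, 1), date(2022, 4, 1))
```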
diff --git a/demos/trino-taxi-data/setup-superset.yaml b/demos/trino-taxi-data/setup-superset.yaml
index 8c0884b6..bdc3e803 100644
--- a/demos/trino-taxi-data/setup-superset.yaml
+++ b/demos/trino-taxi-data/setup-superset.yaml
@@ -13,13 +13,12 @@ spec:
volumeMounts:
- name: script
mountPath: /tmp/script
- restartPolicy: OnFailure
volumes:
- name: script
configMap:
name: setup-superset-script
- restartPolicy: Never
- backoffLimit: 50 # It can take some time until Superset is ready
+ restartPolicy: OnFailure
+ backoffLimit: 50
---
apiVersion: v1
kind: ConfigMap
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/overview.png b/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/overview.png
deleted file mode 100644
index 594e77a4..00000000
Binary files a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/overview.png and /dev/null differ
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_1.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_1.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_1.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_1.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_2.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_2.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_2.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_2.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_3.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_3.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_3.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_3.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_4.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_4.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_4.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_4.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_5.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_5.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_5.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_5.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_6.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_6.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_6.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_6.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_7.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_7.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_7.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_7.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_8.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_8.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/druid_8.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/druid_8.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_1.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_1.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_1.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_1.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_2.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_2.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_2.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_2.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_3.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_3.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_3.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_3.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_4.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_4.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_4.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_4.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_5.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_5.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/minio_5.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/minio_5.png
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_1.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_1.png
new file mode 100644
index 00000000..b7301dc4
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_1.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_10.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_10.png
new file mode 100644
index 00000000..360bf0de
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_10.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_11.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_11.png
new file mode 100644
index 00000000..c8f916d4
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_11.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_12.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_12.png
new file mode 100644
index 00000000..59f8820a
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_12.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_2.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_2.png
new file mode 100644
index 00000000..f28f74a9
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_2.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_3.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_3.png
new file mode 100644
index 00000000..91663776
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_3.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_4.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_4.png
new file mode 100644
index 00000000..32f45be8
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_4.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_5.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_5.png
new file mode 100644
index 00000000..309551a4
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_5.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_6.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_6.png
new file mode 100644
index 00000000..054b1d89
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_6.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_7.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_7.png
new file mode 100644
index 00000000..20eb80d2
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_7.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_8.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_8.png
new file mode 100644
index 00000000..72a77f94
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_8.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_9.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_9.png
new file mode 100644
index 00000000..0ba8b007
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/nifi_9.png differ
diff --git a/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/overview.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/overview.png
new file mode 100644
index 00000000..eae66ee2
Binary files /dev/null and b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/overview.png differ
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_1.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_1.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_1.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_1.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_10.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_10.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_10.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_10.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_11.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_11.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_11.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_11.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_12.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_12.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_12.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_12.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_13.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_13.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_13.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_13.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_2.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_2.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_2.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_2.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_3.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_3.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_3.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_3.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_4.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_4.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_4.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_4.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_5.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_5.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_5.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_5.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_6.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_6.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_6.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_6.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_7.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_7.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_7.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_7.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_8.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_8.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_8.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_8.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_9.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_9.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/superset_9.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/superset_9.png
diff --git a/docs/modules/ROOT/images/demo-kafka-druid-water-level-data/topics.png b/docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/topics.png
similarity index 100%
rename from docs/modules/ROOT/images/demo-kafka-druid-water-level-data/topics.png
rename to docs/modules/ROOT/images/demo-nifi-kafka-druid-water-level-data/topics.png
diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc
index 37b35f0e..92655ae8 100644
--- a/docs/modules/ROOT/nav.adoc
+++ b/docs/modules/ROOT/nav.adoc
@@ -8,8 +8,8 @@
** xref:commands/stack.adoc[]
* xref:demos/index.adoc[]
** xref:demos/airflow-scheduled-job.adoc[]
-** xref:demos/kafka-druid-water-level-data.adoc[]
** xref:demos/nifi-kafka-druid-earthquake-data.adoc[]
+** xref:demos/nifi-kafka-druid-water-level-data.adoc[]
** xref:demos/trino-taxi-data.adoc[]
* xref:customization.adoc[]
* xref:troubleshooting.adoc[]
diff --git a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
index 00dc4567..32f169b1 100644
--- a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
+++ b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
@@ -16,7 +16,7 @@ This demo will
** *NiFi*: An easy-to-use, powerful system to process and distribute data. This demos uses it to fetch earthquake-data from the internet and ingest it into Kafka
** *Druid*: A real-time database to power modern analytics applications. This demo uses it to ingest the near real-time data from Kafka, store it and enable to access the data via SQL
** *MinIO*: A S3 compatible object store. This demo uses it as persistent storage for Druid to store all the data used
-* Continuously emit approximately 10.000 records/s of https://earthquake.usgs.gov/[earthquake data] into Kafka
+* Continuously emit approximately 10,000 records/s of https://earthquake.usgs.gov/[earthquake data] into Kafka
* Start a Druid ingestion job that ingests the data into the Druid instance
* Create Superset dashboards for visualization of the data
@@ -136,7 +136,7 @@ Head over to the Tab `PROPERTIES`.
image::demo-nifi-kafka-druid-earthquake-data/nifi_4.png[]
-Here you can see the setting `Remote URl` which specifies the download URL from where the CSV file is retrieved.
+Here you can see the setting `Remote URL`, which specifies the download URL from where the CSV file is retrieved.
Close the processor details popup by clicking `OK`.
Afterwards double-click on the processor `PublishKafkaRecord_2_6`.
@@ -309,7 +309,7 @@ image::demo-nifi-kafka-druid-earthquake-data/minio_4.png[]
If you open up a prefix for a specific year you can see that Druid has placed a file containing the data of that year there.
== Summary
-The demo streamed 10.000 earthquake records/s for a total of ~3 million earthquakes into a Kafka steaming pipeline.
+The demo streamed 10,000 earthquake records/s for a total of ~3 million earthquakes into a Kafka streaming pipeline.
Druid ingested the data near real-time into its data source and enabled SQL access to it.
Superset was used as a web-based frontend to execute SQL statements and build dashboards.
diff --git a/docs/modules/ROOT/pages/demos/kafka-druid-water-level-data.adoc b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
similarity index 68%
rename from docs/modules/ROOT/pages/demos/kafka-druid-water-level-data.adoc
rename to docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
index 9df1adb1..af6b4036 100644
--- a/docs/modules/ROOT/pages/demos/kafka-druid-water-level-data.adoc
+++ b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
@@ -1,10 +1,10 @@
-= kafka-druid-water-level-data
+= nifi-kafka-druid-water-level-data
[NOTE]
====
-This guide assumes you already have the demo `kafka-druid-water-level-data` installed.
+This guide assumes you already have the demo `nifi-kafka-druid-water-level-data` installed.
If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install kafka-druid-water-level-data`.
+To put it simply you have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
====
This demo will
@@ -12,7 +12,8 @@ This demo will
* Install the required Stackable operators
* Spin up the following data products
** *Superset*: A modern data exploration and visualization platform. This demo utilizes Superset to retrieve data from Druid via SQL queries and build dashboards on top of that data
-** *Kafka*: A distributed event streaming platform for high-performance data pipelines, streaming analytics and data integration. This demos uses it as a event streaming platform to stream the data in near real-time
+** *Kafka*: A distributed event streaming platform for high-performance data pipelines, streaming analytics and data integration. This demo uses it as an event streaming platform to stream the data in near real-time
+** *NiFi*: An easy-to-use, powerful system to process and distribute data. This demo uses it to fetch water level data from the internet and ingest it into Kafka
** *Druid*: A real-time database to power modern analytics applications. This demo uses it to ingest the near real-time data from Kafka, store it and enable to access the data via SQL
** *MinIO*: A S3 compatible object store. This demo uses it as persistent storage for Druid to store all the data used
* Ingest water level data from the https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE webservice] into Kafka. The data contains measured water levels of different measuring stations all around Germany. If the webservice is not available this demo will not work, as it needs the webservice to ingest the data.
@@ -25,7 +26,7 @@ The whole data pipeline will have a very low latency from putting a record into
You can see the deployed products as well as their relationship in the following diagram:
-image::demo-kafka-druid-water-level-data/overview.png[]
+image::demo-nifi-kafka-druid-water-level-data/overview.png[]
== List deployed Stackable services
To list the installed Stackable services run the following command:
@@ -42,13 +43,15 @@ $ stackablectl services list --all-namespaces
router-http http://172.18.0.4:30899
kafka kafka default kafka 172.18.0.3:32536
+
+ nifi nifi default https https://172.18.0.3:32440 Admin user: admin, password: adminadmin
- superset superset default external-superset http://172.18.0.4:32251 Admin user: admin, password: admin
+ superset superset default external-superset http://172.18.0.4:32251 Admin user: admin, password: admin
zookeeper zookeeper default zk 172.18.0.3:31615
- minio minio-druid default http http://172.18.0.5:30016 Third party service
- console-http http://172.18.0.5:32595 Admin user: root, password: rootroot
+ minio minio-druid default http http://172.18.0.5:30016 Third party service
+ console-http http://172.18.0.5:32595 Admin user: root, password: rootroot
----
[NOTE]
@@ -152,7 +155,7 @@ The records of the two topics only contain the needed data.
The measurement records contain a `station_uuid` to refer to the measuring station.
The relationship is illustrated below.
-image::demo-kafka-druid-water-level-data/topics.png[]
+image::demo-nifi-kafka-druid-water-level-data/topics.png[]
The reason for splitting the data up into two different topics is the improved performance.
One simpler solution would be to use a single topic and produce records that look like the following:
@@ -210,24 +213,111 @@ Topic measurements / Partition 0 / Offset: 7586541 / Timestamp: 1660831499070
The output shows that the last measurement record was produced at the timestamp `1660831499070` which translates to `Do 18. Aug 16:04:59 CEST 2022` (using the command `date -d @1660831499`).
You can also see that it was record number `7586541` sent to this topic, so ~7.6 million records have been produced so far.
+== NiFi
+
+NiFi is used to fetch water level data from the internet and ingest it into Kafka in near real-time.
+This demo includes a workflow ("process group") that fetches the last 30 days of historical measurements and produces the records into Kafka.
+It also keeps streaming near-realtime updates for every available measuring station.
+
+=== View testdata-generation job
+You can have a look at the ingestion job running in NiFi by opening the given `nifi` endpoint `https` from your `stackablectl services list` command output.
+You have to use the endpoint from your command output, in this case it is https://172.18.0.3:32440. Open it with your favorite browser.
+If you get a warning regarding the self-signed certificate generated by the xref:secret-operator::index.adoc[Secret Operator] (e.g. `Warning: Potential Security Risk Ahead`), you have to tell your browser to trust the website and continue.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_1.png[]
+
+Log in with the username `admin` and password `adminadmin`.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_2.png[]
+
+As you can see, the NiFi workflow consists of lots of components.
+It is split into two main parts:
+
+1. On the left is the part bulk-loading all the known stations and the historical data of the last 30 days
+2. On the right is the part iterating over all stations and emitting the current measurement in an endless loop
+
+You can zoom in by using your mouse and mouse wheel.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_3.png[]
+image::demo-nifi-kafka-druid-water-level-data/nifi_4.png[]
+
+The left workflow works as follows:
+
+1. The `Get station list` processor fetches the current list of stations as JSON via HTTP from the https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE webservice].
+2. `Produce stations records` takes the list and produces a Kafka record for every station into the topic `stations`
+3. `SplitRecords` takes the single FlowFile (NiFi record) containing all the stations and creates a new FlowFile for every station
+4. `Extract station_uuid` takes every FlowFile representing a station and extracts the attribute `station_uuid` into the metadata of the FlowFile
+5. `Get historic measurements` calls the https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE webservice] for every station and fetches the measurements of the last 30 days. All failures are routed to the `LogAttribute` processor to inspect them in case any failures occur.
+6. `Add station_uuid` will add the attribute `station_uuid` to the JSON list of measurements returned from the https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE webservice], which is missing this information.
+7. `PublishKafkaRecord_2_6` finally emits every measurement as a Kafka record to the topic `measurements`. All failures are routed to the `LogAttribute` processor to inspect them in case any failures occur.
+
+The right side works similarly, but is executed in an endless loop to stream the data in near real-time.
+
+Double-click on the `Get station list` processor to show the processor details.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_5.png[]
+
+Head over to the tab `PROPERTIES`.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_6.png[]
+
+Here you can see the setting `Remote URL`, which specifies the download URL from where the JSON file containing the stations is retrieved.
+Close the processor details popup by clicking `OK`.
+You can also have a detailed view of the `Produce station records` processor by double-clicking it.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_7.png[]
+
+Within this processor the Kafka connection details - like broker addresses and topic name - are specified.
+It uses the `JsonTreeReader` to parse the downloaded JSON and the `JsonRecordSetWriter` to split it into individual JSON records before writing it out.
+
+Double-click the `Get historic measurements` processor.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_8.png[]
+
+This processor fetches the historical data for every station.
+Click on the `Remote URL` property.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_9.png[]
+
+The `Remote URL` contains the `${station_uuid}` placeholder, which gets replaced for every station.
+
+Double-click the `PublishKafkaRecord_2_6` processor.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_10.png[]
+
+You can also see the number of produced records by right-clicking on `PublishKafkaRecord_2_6` and selecting `View status history`.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_11.png[]
+
+You have to choose `Messages Sent (5 mins)` in the top right corner.
+Afterwards you can see that ~10 million records got produced in ~5 minutes, which corresponds to ~30k measurements/s.
+Keep in mind that the demo uses a single-node NiFi setup; the performance could be increased by using multiple nodes.
+
+Speaking of NiFi resources, open the hamburger menu in the top right corner and select `Node Status History`.
+
+image::demo-nifi-kafka-druid-water-level-data/nifi_12.png[]
+
+The diagram shows the heap usage of the NiFi node.
+You can select other metrics to display in the top right corner.
+
== Druid
Druid is used to ingest the near real-time data from Kafka, store it and enable SQL access to it.
The demo has started two ingestion jobs - one reading from the topic `stations` and the other from `measurements` - and saving the records into Druid's deep storage.
The Druid deep storage is based on the S3 store provided by MinIO.
=== View ingestion job
-You can have a look at the ingestion jobs running in Druid by opening the given `druid` endpoint `router-http` from your `stackablectl services list` command output. You have to use the endpoint from your command output, in this case it is http://172.18.0.4:30899. Open it with your favorite browser.
+You can have a look at the ingestion jobs running in Druid by opening the given `druid` endpoint `router-http` from your `stackablectl services list` command output (http://172.18.0.4:30899 in this case).
-image::demo-kafka-druid-water-level-data/druid_1.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_1.png[]
By clicking on `Ingestion` at the top you can see the running ingestion jobs.
-image::demo-kafka-druid-water-level-data/druid_2.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_2.png[]
After clicking on the magnifying glass to the right of the `RUNNING` supervisor you can see additional information (here the supervisor `measurements` was chosen).
On the tab `Statistics` on the left you can see the number of processed records as well as the number of errors.
-image::demo-kafka-druid-water-level-data/druid_3.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_3.png[]
The statistics show that Druid is currently ingesting `3597` records/s and has ingested ~10 million records so far.
All records have been ingested successfully, which is indicated by having no `processWithError`, `thrownAway` or `unparseable` records.
@@ -236,7 +326,7 @@ All records have been ingested successfully, which is indicated by having no `pr
The started ingestion jobs have automatically created the Druid data sources `stations` and `measurements`.
You can see the available data sources by clicking on `Datasources` at the top.
-image::demo-kafka-druid-water-level-data/druid_4.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_4.png[]
The `Avg. row size (bytes)` shows that a typical `measurement` record is `4` bytes in size, while a `station` record is `213` bytes, more than 50 times as large.
So by choosing two dedicated topics over a single one, this demo saves roughly 50x in storage and computation costs.
@@ -244,12 +334,12 @@ So with choosing two dedicated topics over a single topic, this demo was able to
By clicking on the `measurements` data source you can see the segments the data source consists of.
In this case the `measurements` data source is partitioned by the day of the measurement, resulting in 33 segments.
-image::demo-kafka-druid-water-level-data/druid_5.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_5.png[]
Druid offers a web-based way of querying the data sources via SQL.
To achieve this you first have to click on `Query` at the top.
-image::demo-kafka-druid-water-level-data/druid_6.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_6.png[]
You can now enter arbitrary SQL statements. To list 10 stations, for example, run
@@ -258,7 +348,7 @@ You can now enter any arbitrary SQL statement, to e.g. list 10 stations run
select * from stations limit 10
----
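Besides the web console, Druid exposes the same SQL interface over HTTP at `/druid/v2/sql`. The sketch below only builds the request; actually sending it requires the running demo, and the router address is the example one from above, so replace it with your own endpoint:

```python
import json
import urllib.request

# Router endpoint from `stackablectl services list` (example value)
DRUID_ROUTER = "http://172.18.0.4:30899"

# Druid's SQL API expects a JSON body with the statement under "query"
payload = json.dumps({"query": "select * from stations limit 10"}).encode()
request = urllib.request.Request(
    f"{DRUID_ROUTER}/druid/v2/sql",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Sending is left out here; against the running demo you would call
# urllib.request.urlopen(request) and json.loads(...) the response body.
```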
-image::demo-kafka-druid-water-level-data/druid_7.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_7.png[]
To count the measurements per day run
@@ -272,40 +362,40 @@ group by 1
order by 1 desc
----
-image::demo-kafka-druid-water-level-data/druid_8.png[]
+image::demo-nifi-kafka-druid-water-level-data/druid_8.png[]
== Superset
Superset provides the ability to execute SQL queries and build dashboards.
Open the `superset` endpoint `external-superset` in your browser (http://172.18.0.4:32251 in this case).
-image::demo-kafka-druid-water-level-data/superset_1.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_1.png[]
Log in with the username `admin` and password `admin`.
-image::demo-kafka-druid-water-level-data/superset_2.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_2.png[]
=== View dashboard
The demo created a dashboard to visualize the water level data.
To open it, click on the tab `Dashboards` at the top.
-image::demo-kafka-druid-water-level-data/superset_3.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_3.png[]
Click on the dashboard called `Water level data`.
It might take some time until the dashboard renders all the included charts.
-image::demo-kafka-druid-water-level-data/superset_4.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_4.png[]
=== View charts
The dashboard `Water level data` consists of multiple charts.
To list the charts click on the tab `Charts` at the top.
-image::demo-kafka-druid-water-level-data/superset_5.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_5.png[]
Click on the chart `Measurements / hour`.
On the left side you can modify the chart and click `Run` to see the effect.
-image::demo-kafka-druid-water-level-data/superset_6.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_6.png[]
You can see that starting from `2022/08/12` some stations didn't measure or transmit their data.
They started sending measurements again on `2022/08/14`.
@@ -315,24 +405,24 @@ They started sending measurements again at `2022/08/14`.
To look at the geographical distribution of the stations you have to click on the tab `Charts` at the top again.
Afterwards click on the chart `Stations distribution`.
-image::demo-kafka-druid-water-level-data/superset_7.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_7.png[]
The stations are, of course, placed alongside bodies of water.
They are colored by the body of water they measure, so all stations along the same body of water share a color.
You can move and zoom the map with your mouse to explore it interactively.
You can e.g. have a detailed look at the river https://en.wikipedia.org/wiki/Rhine[Rhine].
-image::demo-kafka-druid-water-level-data/superset_8.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_8.png[]
=== Execute arbitrary SQL statements
Within Superset you can not only create dashboards but also run arbitrary SQL statements.
On the top click on the tab `SQL Lab` -> `SQL Editor`.
-image::demo-kafka-druid-water-level-data/superset_9.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_9.png[]
On the left select the database `druid`, the schema `druid` and set `See table schema` to `stations` or `measurements`.
-image::demo-kafka-druid-water-level-data/superset_10.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_10.png[]
Enter the desired SQL statement in the textbox on the right.
To get interesting results, we need to join the two tables.
@@ -348,7 +438,7 @@ group by 1
order by 2 desc
----
-image::demo-kafka-druid-water-level-data/superset_11.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_11.png[]
You can also find out the number of measurements for every body of water:
@@ -362,7 +452,7 @@ group by 1
order by 2 desc
----
-image::demo-kafka-druid-water-level-data/superset_12.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_12.png[]
What might also be interesting is the average and current measurement of the stations:
@@ -378,39 +468,39 @@ group by 1
order by 2 desc
----
-image::demo-kafka-druid-water-level-data/superset_13.png[]
+image::demo-nifi-kafka-druid-water-level-data/superset_13.png[]
== MinIO
The S3 store provided by MinIO serves as Druid's persistent deep storage, holding all the ingested data.
Open the `minio` endpoint `console-http` retrieved by `stackablectl services list` in your browser (http://172.18.0.5:32595 in this case).
-image::demo-kafka-druid-water-level-data/minio_1.png[]
+image::demo-nifi-kafka-druid-water-level-data/minio_1.png[]
Log in with the username `root` and password `rootroot`.
-image::demo-kafka-druid-water-level-data/minio_2.png[]
+image::demo-nifi-kafka-druid-water-level-data/minio_2.png[]
Click on the blue button `Browse` on the bucket `druid` and open the folder `data`.
-image::demo-kafka-druid-water-level-data/minio_3.png[]
+image::demo-nifi-kafka-druid-water-level-data/minio_3.png[]
You can see that Druid has created a folder for each data source.
Go ahead and open the folder `measurements`.
-image::demo-kafka-druid-water-level-data/minio_4.png[]
+image::demo-nifi-kafka-druid-water-level-data/minio_4.png[]
As you can see, Druid saved 35MB of data within 33 prefixes (folders).
One prefix corresponds to one segment, which in turn contains all the measurements of a day.
If you don't see any folders or files, Druid has not yet flushed its in-memory data to the deep storage.
After waiting a few minutes the data should have been flushed to S3 and show up.
-image::demo-kafka-druid-water-level-data/minio_5.png[]
+image::demo-nifi-kafka-druid-water-level-data/minio_5.png[]
If you open a prefix for a specific day, you can see that Druid placed a file there containing that day's data.
== Summary
The demo put station records into the Kafka topic `stations`.
-It also streamed ~3500 measurements/s for a total of ~11 million measurements into the topic `measurements`.
+It also streamed ~30,000 measurements/s for a total of ~11 million measurements into the topic `measurements`.
Druid ingested the data in near real-time into its data sources and enabled SQL access to them.
Superset was used as a web-based frontend to execute SQL statements and build dashboards.
@@ -427,6 +517,7 @@ You also have the possibility to create additional charts and bundle them togeth
Have a look at https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard#creating-charts-in-explore-view[the Superset documentation] on how to do that.
=== Load additional data
+You can use the NiFi web interface to collect arbitrary data and write it to Kafka (it's recommended to use new Kafka topics for that).
You can use a Kafka client like https://github.com/edenhill/kcat[kafkacat] to create new topics and ingest data.
Using the Druid web interface, you can start an ingestion job that consumes the data and stores it in an internal data source.
There is a great https://druid.apache.org/docs/latest/tutorials/tutorial-kafka.html#loading-data-with-the-data-loader[tutorial] from Druid on how to do this.
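Following that tutorial boils down to submitting a Kafka supervisor spec to Druid. A heavily trimmed sketch of such a spec is shown below; the data source name, topic name and broker address are placeholders, not values from the demo, and a real spec needs more tuning than this:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my-new-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "useSchemaDiscovery": true }
    },
    "ioConfig": {
      "topic": "my-new-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka:9092" }
    }
  }
}
```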