This repository was archived by the owner on Feb 16, 2024. It is now read-only.
[Merged by Bors] - Improve water-level demo #126
Closed
Conversation
Force-pushed from d746d6b to 12e94ce
sbernauer commented Sep 28, 2022
Force-pushed from 6739f9f to 6acc900
maltesander approved these changes Sep 29, 2022
LGTM!
bors r+
bors bot pushed a commit that referenced this pull request on Sep 30, 2022
Pull request successfully merged into main. Build succeeded.
Description
Run with:
`stackablectl --additional-demos-file demos/demos-v1.yaml --additional-stacks-file stacks/stacks-v1.yaml demo install nifi-kafka-druid-water-level-data`
Tested the demo with 2,500,000,000 records.
Hi all, here is a short summary of the observations from the water-level demo:
NiFi uses the content-repo PVC but keeps it at ~50% usage => should be fine forever.
Actions:
* Increase the content-repo PVC from 5 GB to 10 GB, better safe than sorry. I was able to crash it by using large queues and stalling processors (see the sketch below).
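A minimal sketch of what that size bump could look like, assuming the demo's NifiCluster exposes per-repository storage under `resources.storage` (the `contentRepo` key and the surrounding structure are assumptions, not taken from this PR):

```yaml
# Hypothetical excerpt of the demo's NifiCluster manifest; key names are assumptions.
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: nifi
spec:
  nodes:
    config:
      resources:
        storage:
          contentRepo:
            capacity: 10Gi  # bumped from 5Gi so large queues and stalled processors cannot fill the repo
    roleGroups:
      default:
        replicas: 1
```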
Kafka uses a PVC (currently 15 GB) => should work fine for ~1 week.
Actions:
* Look into retention settings (low priority, as it should work for ~1 week) so that it works forever (see the sketch below).
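One way that retention cap might be expressed, assuming the KafkaCluster CR accepts `configOverrides` for `server.properties` at the broker role level (the override placement and the concrete byte values are assumptions; the property names are standard Kafka broker settings):

```yaml
# Hypothetical excerpt of the demo's KafkaCluster manifest; values are illustrative only.
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: kafka
spec:
  brokers:
    configOverrides:
      server.properties:
        log.retention.hours: "168"         # drop records older than one week
        log.retention.bytes: "2147483648"  # ~2 GiB per partition, to stay well under the 15 GB PVC
        log.retention.check.interval.ms: "300000"
    roleGroups:
      default:
        replicas: 1
```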
Druid uses S3 for deep storage (S3 has 15 GB). But currently it also caches *everything* locally on the historical, because we set
`druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"300g"}]`
(hardcoded in https://github.com/stackabletech/druid-operator/blob/45525033f5f3f52e0997a9b4d79ebe9090e9e0a0/deploy/config-spec/properties.yaml#L725). This does *not* really affect the demo, as 100,000,000 records (let's call it data of ~1 week) take up ~400 MB.
I think the main problem with the demo is that queries take > 5 minutes to complete and Superset shows timeouts.
The historical pod suspiciously uses exactly one core of CPU, and the queries are really slow for a "big data" system IMHO.
This could be because Druid is only using a single core, or because we don't set any resources (yet!) and the node does not have more cores available. Going to research that.
Actions:
* Created stackabletech/druid-operator#306
* In the meantime, configure an override in the demo: `druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"3g","freeSpacePercent":"5.0"}]` (see the sketch below)
* Research the slow query performance
* Have a look at the queries the Superset dashboard executes and optimize them
* Maybe we should bump the druid-operator version in the demo (e.g. create a release 22.09-druid, which is basically 22.09 with a newer druid-operator version). That way we get stable resources.
* Enable Druid auto compaction to reduce the number of segments
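A sketch of how that interim override might be wired into the demo, assuming the DruidCluster CR supports `configOverrides` for `runtime.properties` on the historicals role (the placement is an assumption; the value is the one quoted above, including its properties-file colon escaping):

```yaml
# Hypothetical excerpt of the demo's DruidCluster manifest; override placement is an assumption.
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: druid
spec:
  historicals:
    configOverrides:
      runtime.properties:
        # Cap the local segment cache at 3g instead of the hardcoded 300g default
        druid.segmentCache.locations: '[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"3g","freeSpacePercent":"5.0"}]'
    roleGroups:
      default:
        replicas: 1
```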
Review Checklist
Once the review is done, comment `bors r+` (or `bors merge`) to merge. Further information