This repository was archived by the owner on Feb 16, 2024. It is now read-only.

Conversation

@sbernauer sbernauer commented Sep 28, 2022

Description

Run with
stackablectl --additional-demos-file demos/demos-v1.yaml --additional-stacks-file stacks/stacks-v1.yaml demo install nifi-kafka-druid-water-level-data

Tested demo with 2.500.000.000 records

Hi all, here is a short summary of my observations on the water-level demo:

NiFi uses the content-repo PVC but keeps it at ~50% usage => Should be fine indefinitely
Actions:

  • Increase the content-repo from 5 to 10 GB, better safe than sorry. I was able to crash it by using large queues and stalling processors.

Kafka uses a PVC (currently 15 GB) => Should work fine for ~1 week
Actions:

  • Look into the retention settings (low priority, as it should work for ~1 week) so that it works indefinitely
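
As a starting point for the retention settings, here is a rough sketch of how the numbers could be derived. It only uses the two figures from above (15 GB PVC, full after ~1 week) and assumes a steady ingest rate, a single broker holding all partitions on that PVC, and a hypothetical partition count and headroom factor. `retention.bytes` (applied per partition) and `retention.ms` are the standard Kafka topic configs; the helper name `suggest_retention` is made up for illustration.

```python
GB = 1000 ** 3
MS_PER_DAY = 24 * 60 * 60 * 1000


def suggest_retention(pvc_bytes, days_until_full, partitions, headroom=0.8):
    """Derive topic retention values that keep the log within the PVC.

    Assumptions (not from the demo): steady ingest rate, one broker,
    all partitions of the topic share the same PVC.
    """
    daily_bytes = pvc_bytes / days_until_full      # observed fill rate
    budget = pvc_bytes * headroom                  # leave some space free
    return {
        # Kafka applies retention.bytes per partition, not per topic.
        "retention.bytes": int(budget // partitions),
        # Time after which the budget would be used up at the observed rate.
        "retention.ms": int(budget / daily_bytes * MS_PER_DAY),
    }


# 15 GB PVC that fills up in ~7 days; 8 partitions is a hypothetical value.
settings = suggest_retention(15 * GB, days_until_full=7, partitions=8)
for key, value in settings.items():
    print(f"{key}={value}")
```

Note that Kafka deletes whole log segments, so actual disk usage can exceed `retention.bytes` by up to one segment per partition. The headroom factor is meant to cover that.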

Druid uses S3 for deep storage (S3 has 15 GB). However, it currently also caches everything locally on the historical, because we set druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"300g"}] (hardcoded in https://github.com/stackabletech/druid-operator/blob/45525033f5f3f52e0997a9b4d79ebe9090e9e0a0/deploy/config-spec/properties.yaml#L725)
This does not really affect the demo, as 100.000.000 records (let's call it ~1 week of data) take up ~400MB.
I think the main problem with the demo is that queries take > 5 minutes to complete and Superset shows timeouts.
The historical pod suspiciously uses exactly one CPU core, and the queries are really slow for a "big data" system IMHO.
This could be either because Druid only uses a single core, or because we don't set any resources (yet!) and the node does not have more cores available. Going to research that.
Actions:

  • Created "Make segment-cache size configurable and use emptyDir for it" (druid-operator#306)
  • In the meantime, configure an override in the demo: druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"3g","freeSpacePercent":"5.0"}]
  • Research slow query performance
  • Have a look at the queries the Superset Dashboard executes and optimize them
  • Maybe we should bump the druid-operator version in the demo (e.g. create a release 22.09-druid, which is basically 22.09 with a newer druid-op version). That way we get stable resources.
  • Enable Druid auto compaction to reduce number of segments
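
For reference, a back-of-the-envelope check of how the 3g segment cache relates to the numbers above. This is only a sketch: it assumes segment size grows linearly with the record count (ignoring rollup and compaction), reads "3g" as 3 * 10^9 bytes, and does not model `freeSpacePercent`. The function names are made up for illustration.

```python
MB = 1000 ** 2
GB = 1000 ** 3

# Figures from the observations above.
RECORDS_PER_WEEK = 100_000_000      # "let's call it ~1 week of data"
BYTES_PER_WEEK = 400 * MB           # ~400MB for those records

BYTES_PER_RECORD = BYTES_PER_WEEK / RECORDS_PER_WEEK  # 4.0 bytes per record


def segment_bytes(records):
    """Estimated on-disk segment size, assuming linear growth."""
    return records * BYTES_PER_RECORD


def weeks_until_full(cache_bytes):
    """Weeks of data that fit into a segment cache of the given size."""
    return cache_bytes / BYTES_PER_WEEK


print(segment_bytes(2_500_000_000) / GB)  # size of the tested data set in GB
print(weeks_until_full(3 * GB))           # weeks of data the 3g cache can hold
```

Under these assumptions the 2.500.000.000 test records come to roughly 10 GB, which fits into the 15 GB S3 bucket but not into the 3g cache, and the cache itself holds about 7.5 weeks of data.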

Review Checklist

  • Code contains useful comments
  • (Integration-)Test cases added (or not applicable)
  • Documentation added (or not applicable)
  • Changelog updated (or not applicable)
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Once the review is done, comment bors r+ (or bors merge) to merge. Further information

@sbernauer sbernauer force-pushed the improve-water-levels-demo branch from d746d6b to 12e94ce Compare September 28, 2022 16:31
@sbernauer sbernauer force-pushed the improve-water-levels-demo branch from 6739f9f to 6acc900 Compare September 28, 2022 18:32
@sbernauer sbernauer requested a review from a team September 28, 2022 18:35

@maltesander maltesander left a comment


LGTM!

@sbernauer

bors r+

bors bot pushed a commit that referenced this pull request Sep 30, 2022
@bors

bors bot commented Sep 30, 2022

Pull request successfully merged into main.

Build succeeded:

@bors bors bot changed the title Improve water-level demo [Merged by Bors] - Improve water-level demo Sep 30, 2022
@bors bors bot closed this Sep 30, 2022
@bors bors bot deleted the improve-water-levels-demo branch September 30, 2022 12:18