[FEATURE] Performance Testing for AD Extension #725

Closed
35 tasks done
owaiskazi19 opened this issue May 3, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@owaiskazi19
Member

owaiskazi19 commented May 3, 2023

Is your feature request related to a problem?

Development of the AD Extension is in progress on the feature/extensions branch, but its benchmarking numbers are still unknown. This issue proposes using opensearch-benchmark to measure the new extension's performance.
An OpenSearch cluster can be deployed using opensearch-cluster-cdk.

Goal: get performance numbers for the APIs on a single-node cluster, extension vs. plugin.

What solution would you like?

Deploy the cluster using the CDK mentioned above. Use the macrobenchmarking tool opensearch-benchmark to run performance tests against the deployed cluster.

Steps to set up performance testing:

  • Set up the OpenSearch cluster
  • Run the AD extension
  • Set up opensearch-benchmark on the instance
  • Benchmark the cluster running with the AD extension
  • Benchmark the cluster running with the AD plugin
  • After benchmarking the cluster, explore how to make an apples-to-apples comparison between running an API on the extension vs. the plugin
  • Write JavaScript code for performance testing of specific APIs to get the total execution time
  • Run the JS code for extensions and plugins
  • Plot the data in a graph
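
The "total time of execution" step could be sketched roughly as below. This is only an illustration, not the code used for this issue: `timeRequests` and its `doRequest` parameter are hypothetical names, and the host, port, and path in the usage comment are placeholders.

```javascript
// Hypothetical helper: measure the wall-clock time of repeated API calls.
// `doRequest` is any async function that performs one call; injecting it lets
// the same harness target either the extension or the plugin endpoint.
async function timeRequests(doRequest, iterations) {
  if (!Number.isInteger(iterations) || iterations < 1) {
    throw new Error("iterations must be a positive integer");
  }
  const durationsMs = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await doRequest();
    durationsMs.push(performance.now() - start);
  }
  const totalMs = durationsMs.reduce((sum, d) => sum + d, 0);
  return { durationsMs, totalMs, meanMs: totalMs / iterations };
}

// Usage sketch with Node's built-in fetch (Node 18+). The URL is a placeholder.
// timeRequests(
//   () => fetch("http://localhost:9200/_plugins/_anomaly_detection/stats"),
//   10
// ).then((r) => console.log(r.meanMs.toFixed(2), "ms mean"));
```

Running the same harness with the same iteration count against both deployments is one way to keep the comparison apples-to-apples.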

APIs

  • GET _plugins/_anomaly_detection/detectors/
  • DELETE _plugins/_anomaly_detection/detectors/
  • POST _plugins/_anomaly_detection/detectors/_preview
  • GET _plugins/_anomaly_detection/detectors/_search
  • POST _plugins/_anomaly_detection/detectors/_search
  • POST _plugins/_anomaly_detection/detectors/results/_search/
  • POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>
  • POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>?only_query_custom_result_index=false
  • POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>?only_query_custom_result_index=true
  • DELETE _plugins/_anomaly_detection/detectors/results
  • GET _plugins/_anomaly_detection/detectors//results/_topAnomalies?historical=false
  • GET _plugins/_anomaly_detection/detectors//results/_topAnomalies?historical=true
  • GET _plugins/_anomaly_detection/detectors/tasks/_search
  • POST _plugins/_anomaly_detection/detectors/tasks/_search
  • GET _plugins/_anomaly_detection/detectors/count
  • GET _plugins/_anomaly_detection/detectors/match
  • GET _plugins/_anomaly_detection/detectors//_profile/
  • GET _plugins/_anomaly_detection/detectors//_profile?_all=true
  • GET _plugins/_anomaly_detection/detectors//_profile/
  • GET _plugins/_anomaly_detection/detectors//_profile/,
  • GET _plugins/_anomaly_detection/detectors//_profile/ad_task
  • GET _plugins/_anomaly_detection/stats
  • GET _plugins/_anomaly_detection//stats
  • GET _plugins/_anomaly_detection//stats/
  • POST _plugins/_anomaly_detection/detectors/{detector_id}/_run
  • POST _plugins/_anomaly_detection/detectors//_start
@vibrantvarun
Member

Benchmarking of an OpenSearch cluster running with the AD extension:



| Metric | Task | Value | Unit |
|---|---|---|---|
| Cumulative indexing time of primary shards | | 0.0112 | min |
| Min cumulative indexing time across primary shards | | 0.0112 | min |
| Median cumulative indexing time across primary shards | | 0.0112 | min |
| Max cumulative indexing time across primary shards | | 0.0112 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0 | min |
| Cumulative merge count of primary shards | | 0 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.0022 | min |
| Cumulative refresh count of primary shards | | 5 | |
| Min cumulative refresh time across primary shards | | 0.0022 | min |
| Median cumulative refresh time across primary shards | | 0.0022 | min |
| Max cumulative refresh time across primary shards | | 0.0022 | min |
| Cumulative flush time of primary shards | | 0 | min |
| Cumulative flush count of primary shards | | 0 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0 | min |
| Total Young Gen GC time | | 0 | s |
| Total Young Gen GC count | | 0 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000253449 | GB |
| Translog size | | 5.12227e-08 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 8 | |
| Min Throughput | index | 5585.73 | docs/s |
| Mean Throughput | index | 5585.73 | docs/s |
| Median Throughput | index | 5585.73 | docs/s |
| Max Throughput | index | 5585.73 | docs/s |
| 50th percentile latency | index | 166.95 | ms |
| 100th percentile latency | index | 173.231 | ms |
| 50th percentile service time | index | 166.95 | ms |
| 100th percentile service time | index | 173.231 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 96.03 | ops/s |
| Mean Throughput | wait-until-merges-finish | 96.03 | ops/s |
| Median Throughput | wait-until-merges-finish | 96.03 | ops/s |
| Max Throughput | wait-until-merges-finish | 96.03 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 10.1884 | ms |
| 100th percentile service time | wait-until-merges-finish | 10.1884 | ms |
| error rate | wait-until-merges-finish | 0 | % |
| Min Throughput | default | 19.63 | ops/s |
| Mean Throughput | default | 19.63 | ops/s |
| Median Throughput | default | 19.63 | ops/s |
| Max Throughput | default | 19.63 | ops/s |
| 100th percentile latency | default | 55.6047 | ms |
| 100th percentile service time | default | 4.49441 | ms |
| error rate | default | 0 | % |
| Min Throughput | range | 60.82 | ops/s |
| Mean Throughput | range | 60.82 | ops/s |
| Median Throughput | range | 60.82 | ops/s |
| Max Throughput | range | 60.82 | ops/s |
| 100th percentile latency | range | 21.4162 | ms |
| 100th percentile service time | range | 4.84291 | ms |
| error rate | range | 0 | % |
| Min Throughput | distance_amount_agg | 36.31 | ops/s |
| Mean Throughput | distance_amount_agg | 36.31 | ops/s |
| Median Throughput | distance_amount_agg | 36.31 | ops/s |
| Max Throughput | distance_amount_agg | 36.31 | ops/s |
| 100th percentile latency | distance_amount_agg | 31.2152 | ms |
| 100th percentile service time | distance_amount_agg | 3.54171 | ms |
| error rate | distance_amount_agg | 0 | % |
| Min Throughput | autohisto_agg | 49.36 | ops/s |
| Mean Throughput | autohisto_agg | 49.36 | ops/s |
| Median Throughput | autohisto_agg | 49.36 | ops/s |
| Max Throughput | autohisto_agg | 49.36 | ops/s |
| 100th percentile latency | autohisto_agg | 25.6165 | ms |
| 100th percentile service time | autohisto_agg | 5.19717 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 104.86 | ops/s |
| Mean Throughput | date_histogram_agg | 104.86 | ops/s |
| Median Throughput | date_histogram_agg | 104.86 | ops/s |
| Max Throughput | date_histogram_agg | 104.86 | ops/s |
| 100th percentile latency | date_histogram_agg | 14.149 | ms |
| 100th percentile service time | date_histogram_agg | 4.4783 | ms |
| error rate | date_histogram_agg | 0 | % |

[INFO] SUCCESS (took 13 seconds)
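
In the report above, the min, mean, median, and max throughput are identical for every task. That is what one would expect if each task contributed only a single throughput sample, though this is an inference from the numbers and not something stated in the thread. The sketch below shows the effect with a simple nearest-rank summary; `percentile` and `summarize` are hypothetical helpers and are not claimed to match how opensearch-benchmark computes its statistics.

```javascript
// Nearest-rank percentile on an ascending-sorted, non-empty array.
// p is a percentage in (0, 100]. For even-length input the 50th percentile
// is the lower of the two middle values (a property of nearest-rank).
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Summarize samples using the same labels as the report rows.
function summarize(samples) {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return {
    min: sorted[0],
    mean,
    median: percentile(sorted, 50),
    max: percentile(sorted, 100),
  };
}

// With one sample, every statistic collapses to that sample.
console.log(summarize([5585.73]));
```

If that inference holds, a longer run with more samples per task would be needed before the spread between min and max says anything about variance.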

@vibrantvarun
Member

Performance graphs (images not preserved) for: PreviewDetector, ProfileDetector, SearchDetector, GetStatsDetector, GetDetector, ExecuteDetector, CreateDetector, DeleteDetector, Start/StopDetector

@minalsha
Collaborator

Thanks @vibrantvarun for updating the issue with benchmarking data. Is there anything else pending for this issue? If not, can we close it? Thanks cc @dbwiddis

@vibrantvarun
Member

vibrantvarun commented Jun 20, 2023

@minalsha There is nothing pending from my side. I have covered the following tasks from the issue:

  1. Benchmarking of the OpenSearch cluster running extensions
  2. Performance testing of the extension APIs vs. the plugin APIs

@dbwiddis, waiting for your insights on it.

Thanks

@minalsha
Collaborator

Thanks @vibrantvarun. Curious why the Create/Get/Search Detector graphs are not consistent? Please share the findings. Thanks

@dbwiddis
Member

Looks great, thanks @vibrantvarun !

Some context: the human brain can't really detect differences less than 250 ms, so that is somewhat of a good threshold for performance differences.

Some thoughts:

  1. As expected, there is some latency in the transport communications. Search Detector and Get Detector are probably the best ones for evaluating that impact, as each is mostly a single REST call, so there are 4 transport hops. The network latency only looks like about 25-50 ms total for those. It looks like there's a lot of variation visually, but it's a small number of milliseconds that is probably within various measurement tolerances.
  2. Create Detector also has a bit of variance, but there are a lot of back-and-forth network hops for that (checking non-existence of indices, creating them, checking existence). This call used to take almost 13 seconds, so it's awesome that @joshpalis's work to rewrite cluster state calls brought it down to a few hundred milliseconds.
  3. Preview and Profile probably shift some work from the cluster to the extension node. Not quite sure how to explain the length of time and variance, but could be related to different JVM settings on extension node vs. OpenSearch cluster nodes. If we do further fine tuning work we should probably focus on these to identify where the bottlenecks are and if there's any way to work around them.
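
The arithmetic behind these observations can be written out as a small sketch. The 250 ms threshold, the 4 hops, and the 25-50 ms range come from the comment above; the helper names and the plugin/extension timings in the example call are hypothetical, and splitting the overhead evenly across hops is a simplifying assumption.

```javascript
// Threshold quoted in the comment above.
const PERCEPTION_THRESHOLD_MS = 250;

// Average overhead per transport hop, assuming (as a simplification)
// that every hop contributes equally.
function perHopMs(totalOverheadMs, hops) {
  if (hops <= 0) {
    throw new Error("hops must be positive");
  }
  return totalOverheadMs / hops;
}

// Compare two latency measurements and flag whether the difference
// reaches the perception threshold.
function compareLatency(pluginMs, extensionMs, thresholdMs = PERCEPTION_THRESHOLD_MS) {
  const overheadMs = extensionMs - pluginMs;
  return { overheadMs, noticeable: Math.abs(overheadMs) >= thresholdMs };
}

// 25-50 ms spread over 4 hops is roughly 6-12.5 ms per hop.
console.log(perHopMs(25, 4), perHopMs(50, 4));
// Hypothetical timings: a 40 ms overhead stays well under the threshold.
console.log(compareLatency(100, 140));
```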

@minalsha
Collaborator

Closing this issue since we completed AD Extension API performance testing

4 participants