[RFC] OpenSearch Performance Testing Proposal #7499

Closed
anasalkouz opened this issue May 10, 2023 · 9 comments
Labels: discuss, enhancement, Performance, RFC

Comments

@anasalkouz
Member

Overview
OpenSearch performance is crucial to the success of the software. Having a mechanism to measure performance on a daily basis and flag any degradation, together with a standardized process for testing the performance of new features, is essential to keeping performance at a high standard.

Problem Statement
The issue has two main aspects. First, there is no standardized process for contributors to follow to test the performance of core features before releasing them (i.e., before merging the code or moving it out of experimental), such as Segment Replication GA. Second, there are no automated tools to identify ad-hoc changes and small commits that may degrade the performance of the software, such as the change to the GeoJson Point format that caused a degradation in point-data indexing and had to be reverted. The lack of proper processes often delays feature releases and erodes confidence in our software.

Proposal
Core Features: A proactive mechanism to identify any performance degradation in core features ahead of time, before the feature is released. We propose developing a public performance testing process that can be followed by all contributors, with the following requirements:

  1. The process should be designed for OpenSearch software and should be usable by any contributor.
  2. The process should provide templates for the testing plan and step-by-step instructions to make testing easier and more standardized; these templates can be defined by use case, such as indexing, search, geospatial, etc.
  3. The plan should recommend which workloads to use from the existing list of workloads.
  4. The plan should provide a set of metrics to measure the performance of the system per use case, such as query response time, indexing throughput, CPU utilization, etc.
  5. The plan should provide guidelines for testing different cluster configurations and settings, such as single/multi-node clusters, low-spec hardware, 100+ primary shards, and with/without replicas.
  6. The process should provide a template for a report summarizing the results, showing the benchmarking comparison, any issues or concerns discovered during testing, and the analysis.
  7. The process should provide the ability to customize the testing plan templates, performance metrics, and workloads based on feature requirements.
  8. Testing should use OpenSearch Benchmark (see the sketch after this list).
  9. The report should be reviewed by a group of maintainers to determine whether the testing results meet the criteria for sign-off.
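As a rough illustration of requirement 8 only, the sketch below wraps an OpenSearch Benchmark run from Python against an already-running cluster. The workload name, target host, and results path are placeholders, and the exact OSB subcommand and flag names should be verified against the installed OSB version; this is not a prescribed invocation.

```python
# Minimal sketch: run an OSB test against an existing cluster and save the summary.
# Flag/subcommand names are based on current OSB docs and may differ by version.
import subprocess

def run_benchmark(workload: str, target_hosts: str, results_file: str) -> None:
    """Run a single OpenSearch Benchmark test and write the summary report to a file."""
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            "--workload", workload,          # e.g. "nyc_taxis" from opensearch-benchmark-workloads
            "--target-hosts", target_hosts,  # the cluster under test
            "--pipeline", "benchmark-only",  # do not provision a cluster, just benchmark
            "--results-file", results_file,  # where to write the summary for the report
        ],
        check=True,
    )

if __name__ == "__main__":
    run_benchmark("nyc_taxis", "localhost:9200", "results/baseline.md")
```

The saved summary files (baseline vs. candidate) would then feed the benchmarking comparison called for in requirement 6.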

To meet these requirements, we propose creating a new repository under the OpenSearch project for performance testing. The repo will contain performance testing templates per use case, and users can submit an issue/PR to add templates that cover missing use cases. The repository will also include templates for reporting the testing results. The owner of a feature will submit the testing report as a PR, and two or more of the repository's maintainers must approve the PR to meet the sign-off criteria. Initially, we will start with a simple process that covers a few use cases, then evolve and improve it over time based on feedback and changes to the software.

Ad-hoc Changes: A reactive mechanism to identify any commits that may cause a performance degradation and address them promptly. It is particularly effective in cases where we cannot anticipate the potential for performance degradation until the code is merged. This approach also helps catch the small, hard-to-measure slowdowns that accumulate over time, like the proverbial boiling frog, and result in an overall drop in the software's performance.

To achieve this, we need a system that runs a nightly benchmark covering the most common use cases, such as logging and geospatial, and generates a public, read-only dashboard for reviewing and comparing against previous runs. This effort is already in progress and can be tracked here; it is similar to Lucene's nightly benchmarks. After the foundation of the nightly benchmarks is complete, there may be opportunities for further enhancements, such as:

  1. Notifications: The system should track and detect any performance degradation compared to previous runs or to a baseline, then automatically cut a GitHub issue in the corresponding repository (see the sketch after this list), instead of depending on maintainers to monitor and identify those issues manually.
  2. Profiling: The system should generate a flame graph profile report for each run, so contributors can easily investigate any potential performance issues.
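As a rough sketch of the notification idea (not an agreed design), the snippet below compares a nightly metric against a baseline and auto-cuts a GitHub issue through the REST API when the regression exceeds a threshold. The metric names, the 5% threshold, the result format, and the target repository are all placeholder assumptions.

```python
# Illustrative only: assumes nightly/baseline results are already parsed into dicts
# of "metric name -> value"; the real OSB output schema is not modeled here.
import os
import requests

DEGRADATION_THRESHOLD = 0.05  # flag anything more than 5% worse than the baseline

def detect_regressions(baseline: dict, nightly: dict) -> list[str]:
    """Return human-readable notes for metrics that degraded beyond the threshold."""
    notes = []
    for metric, base_value in baseline.items():
        new_value = nightly.get(metric)
        if new_value is None:
            continue
        # Assumes higher is better (e.g. indexing throughput in docs/s);
        # latency-style metrics would need the comparison inverted.
        change = (new_value - base_value) / base_value
        if change < -DEGRADATION_THRESHOLD:
            notes.append(f"{metric}: {base_value} -> {new_value} ({change:+.1%})")
    return notes

def open_issue(repo: str, notes: list[str]) -> None:
    """Auto-cut a GitHub issue in the corresponding repository via the REST API."""
    response = requests.post(
        f"https://api.github.com/repos/{repo}/issues",
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={
            "title": "[AUTOCUT] Nightly benchmark regression detected",
            "body": "\n".join(notes),
            "labels": ["Performance", "untriaged"],
        },
        timeout=30,
    )
    response.raise_for_status()

if __name__ == "__main__":
    baseline = {"indexing_throughput_docs_per_s": 52000}
    nightly = {"indexing_throughput_docs_per_s": 47000}
    regressions = detect_regressions(baseline, nightly)
    if regressions:
        open_issue("opensearch-project/OpenSearch", regressions)
```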

Over time, we should keep enriching the nightly benchmarks with more test cases to increase coverage. Even so, the nightly benchmarks will stay limited to a certain number of workloads and use cases. By also developing the mechanism to test and report the performance of new features, we will hold the software's performance to a high bar, and users will have more confidence to upgrade and adopt new features.

We are looking forward to your feedback and support for this proposal.

Related Issues:
opensearch-project/opensearch-benchmark#102
#3983

References:
https://blog.mikemccandless.com/2011/04/catching-slowdowns-in-lucene.html
https://webtide.com/the-jetty-performance-effort/

@anasalkouz added the enhancement, untriaged, discuss, RFC, and Performance labels and removed the untriaged label on May 10, 2023
@anasalkouz changed the title from [RFC] OpenSearch Performance Testing to [RFC] OpenSearch Performance Testing Proposal on May 10, 2023
@dblock
Member

dblock commented May 11, 2023

  1. A big +1 on documenting what a developer needs to do to performance test their change in DEVELOPER_GUIDE or TESTING, a new PERFORMANCE.md or the combination thereof. I'd like maintainers to be able to say "this change will negatively/positively impact performance, can you please run benchmarks on the change, see here about how", before code gets merged.
  2. Once code is merged, we want to run performance tests all the time on it with bigger clusters and better data sets. Since relatively recently, that has been the focus of the maintainers and contributors (@IanHoang @gkamat + @bbarani) of opensearch-benchmark. That project aims to create representative data sets and tools, and to operate infrastructure that should satisfy many of the requirements here after the code is committed; requirements are captured in issues like [RFC] Enhancements for OSB Workloads opensearch-benchmark#253. I imagine you proposed a new repo because you don't want to mix tools and data?
  3. We need good models for publicizing improvements. I think the segment replication post is a great start.

@reta
Collaborator

reta commented May 12, 2023

We also apparently have some automation already (on a limited set of workloads, opensearch-project/opensearch-build#129), but it is for releases only. Once we have a nightly one, it should not be needed anymore, I think. Certainly +1 to having benchmarks run regularly and results made public, similar to https://home.apache.org/~mikemccand/lucenebench/ and https://elasticsearch-benchmarks.elastic.co/

@anasalkouz
Member Author

  1. A big +1 on documenting what a developer needs to do to performance test their change in DEVELOPER_GUIDE or TESTING, a new PERFORMANCE.md or the combination thereof. I'd like maintainers to be able to say "this change will negatively/positively impact performance, can you please run benchmarks on the change, see here about how", before code gets merged.

Yes, we should do that. PERFORMANCE.md can explain the guidelines on how to run performance testing and point users to OpenSearch Benchmark and the workloads.

  2. Once code is merged, we want to run performance tests all the time on it with bigger clusters and better data sets. Since relatively recently, that has been the focus of the maintainers and contributors (@IanHoang @gkamat + @bbarani) of opensearch-benchmark. That project aims to create representative data sets and tools, and to operate infrastructure that should satisfy many of the requirements here after the code is committed; requirements are captured in issues like [RFC] Enhancements for OSB Workloads opensearch-benchmark#253. I imagine you proposed a new repo because you don't want to mix tools and data?

Currently, we have a repo for the tool (OpenSearch Benchmark) and a repo for datasets/workloads, but we don't have a place to record core features' performance testing plans and results, and that's why I am proposing a separate repo for that purpose. In addition, it will give us a sign-off mechanism to decide whether or not to ship features based on the performance testing results.

  3. We need good models for publicizing improvements. I think the segment replication post is a great start.

We will use the new repo to publish the details of performance testing results for new features, and we can certainly use blog posts to share the summary.

@bbarani
Member

bbarani commented May 19, 2023

@anasalkouz My 2 cents... Even if you end up creating your own repo for storing your specific workloads, I would also recommend adding any core-specific workloads to the https://github.com/opensearch-project/opensearch-benchmark-workloads repo to make it a single source of truth for all OpenSearch workloads. Our goal is to execute and surface perf metrics to the community at the core engine level (using generic and specific workloads), plugin level, and distribution level, for multiple versions, on a regular cadence, along with providing a run-book and template that let the community reproduce the setup with simple steps for localized testing as well.

You should be able to quickly set up the infra required to run your own tests, pointing to a workload branch, to get metrics for a blog and share the summary, but eventually we want all these tests to run on a regular cadence and surface the metrics to the community for better visibility and transparency.

@anasalkouz
Member Author

@anasalkouz My 2 cents... Even if you end up creating your own repo for storing your specific workloads,

I think there is a misunderstanding. The newly proposed repo is only to track the performance testing plans and results of specific features, such as SegRep. If you have new/customized workloads, those should still be added to opensearch-project/opensearch-benchmark-workloads.

TL;DR
Performance Tooling: we should use OpenSearch Benchmark to run all performance testing.
Workloads: we should use opensearch-project/opensearch-benchmark-workloads, adding new/customized workloads as needed.
Performance Testing plan and results: we will use the newly proposed repository to share our testing plan, then record our testing results to get sign-off.

@andrross
Member

Performance Testing plan and results: we will use the newly proposed repository to share our testing plan, then record our testing results to get sign-off.

Would it possibly make sense to start off by putting this data in a new folder within the OpenSearch repo itself? It's easy enough to split out to a new repo if that becomes necessary/desirable. We may end up doing that quite quickly, but it'll be easier to have an opinion about this once it is concrete with a specific example.

@anasalkouz
Member Author

anasalkouz commented May 24, 2023

Would it possibly make sense to start off by putting this data in a new folder within the OpenSearch repo itself? It's easy enough to split out to a new repo if that becomes necessary/desirable. We may end up doing that quite quickly, but it'll be easier to have an opinion about this once it is concrete with a specific example.

Sure, I think this makes sense. But I'm not sure whether this should be part of the main OpenSearch repo or opensearch-project/opensearch-benchmark-workloads.

@dblock @bbarani what do you think?

@Jon-AtAWS
Member

Jon-AtAWS commented May 25, 2023

First, +1 - anything we do that improves our testing is a good thing.

Second, why do we need a repo for test plans? The opensearch-project/opensearch-benchmark-workloads workloads are (or can be made to be) self-documenting. Many contributions could re-use the existing datasets with different tests.

The only thing not covered is where to store test results. Maybe there's some benefit to centralizing all results, but why not have a folder in each workload folder to store test results, and also send them to the OpenSearch cluster that's backing the read-only Dashboards?

So:
Workload - choose a data set from the existing ones, or create a more targeted one
Test Plan - a folder in opensearch-project/opensearch-benchmark-workloads with a README that describes it
Tests - in test_procedures
Output - a folder in the workload folder
Output 2 - when we have it, some helpful automation to deliver test results to a world-readable OpenSearch cluster (a sketch follows)
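A minimal sketch of what that "Output 2" automation might look like, using opensearch-py to index one summary document per run into a shared, world-readable cluster that the read-only Dashboards can chart across runs. The host, index name, and document fields are assumptions for illustration, not an agreed format.

```python
# Hypothetical result-publishing step run at the end of a benchmark.
from datetime import datetime, timezone
from opensearchpy import OpenSearch

# Placeholder endpoint for the shared, world-readable benchmarks cluster.
client = OpenSearch(hosts=[{"host": "benchmarks.example.org", "port": 443}], use_ssl=True)

result_doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "workload": "nyc_taxis",                      # which workload folder produced this
    "test_procedure": "append-no-conflicts",       # which test procedure was run
    "opensearch_version": "2.8.0",
    "metrics": {                                   # example metrics; real schema TBD
        "indexing_throughput_median_docs_per_s": 47000,
        "query_latency_p90_ms": 35.2,
    },
}

# One document per run; Dashboards compares documents across runs.
client.index(index="benchmark-results", body=result_doc)
```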

@anasalkouz
Member Author

anasalkouz commented Jun 27, 2023

Thanks all for the feedback.
We will start with a simple process: submitting performance testing results into a folder in opensearch-project/opensearch-benchmark-workloads. We are working on a live example and will share it as a template to use.
