Performance Testing Set Up For Extensibility #3012
For future reference: steps to set up an EC2 instance for the extensibility performance test
Initial benchmarking results. Both tests used the nyc_taxis workload, 2 warm-up iterations, and 3 test iterations, and ran sequentially on a custom EC2 instance of type m5.xlarge.
What’s the code diff between baseline and POC? What are the changes at a high level?
I don't see anything in this changeset that would yield that kind of throughput improvement.
To answer the diff between
Our tests use OpenSearch
Benchmarking Test Set 2
Benchmarking Test Set 3
These numbers feel bogus. There weren't any performance improvement PRs in baseline main between 4/29 and 5/3, so this looks like something is being mis-measured in the benchmarks. The extensibility branch doesn't change the indexing logic; it just connects a new node to the cluster whose sole purpose is to register an IndexModule listener. Are we sure mensor is measuring the core indexing operations and not just measuring some log events in the event listeners?
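For context, here is a minimal sketch of the kind of listener-only plugin described above. This is a hypothetical illustration, not the actual POC code; the class name ListenerOnlyPlugin is made up, and the package names follow the 2.x-era OpenSearch plugin APIs, which may differ between versions.

```java
// Hypothetical sketch of a plugin whose only job is to register an IndexModule listener.
import org.opensearch.index.IndexModule;
import org.opensearch.index.engine.Engine;
import org.opensearch.index.shard.IndexingOperationListener;
import org.opensearch.index.shard.ShardId;
import org.opensearch.plugins.Plugin;

public class ListenerOnlyPlugin extends Plugin {

    @Override
    public void onIndexModule(IndexModule indexModule) {
        // Observe indexing operations without doing any work on the hot path,
        // so the listener itself should add negligible overhead.
        indexModule.addIndexOperationListener(new IndexingOperationListener() {
            @Override
            public void postIndex(ShardId shardId, Engine.Index index, Engine.IndexResult result) {
                // Intentionally empty: the POC only needs to observe indexing events.
            }
        });
    }
}
```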
Check whether the same JDK is used in both measurements (the OpenSearch startup header reports the Java version).
JDK 11 is being used for all tests. All tests are run on separate EC2 instances with the same test and system configurations.
Totally agreed. We are surprised (rather shocked) to see improvements.
What is meant by "sequentially" here? Are the tests being run against the same EC2 instance with changes made between tests? If that is the case, I would try running the performance tests against the POC changes on a fresh instance. This more accurately matches our automated testing, where a new cluster is created for each test.
The first test set was run on the same instance; for the second and third test sets, each baseline and POC test ran on its own new instance.
Note that we now bundle JDK17 for 2.0 by default, so I would confirm that it's actually using your global JDK11 set via
If the clusters are still around, we should check the OpenSearch logs for the latency being reported for the indexing operations on both clusters and compare those.
Benchmarking Test Set 4. All tests use the OpenSearch feature/extensions branch. The first test uses just the feature tarball, the second uses the same feature tarball with the custom plugin installed, and the final test uses the same feature tarball with the custom plugin running on a separate process. Previous test sets used the OpenSearch main branch for the baseline; however, the system and test configuration for these new tests is exactly the same as in the previous three test sets. By using the feature/extensions branch as the baseline for this set, we hope to eliminate any additional uncertainty in the results.
@joshpalis @saratvemulapalli Can we run the tests on Rally/OpenSearch Benchmark to rule out anything wonky being added by the testing framework?
Here are the output logs for the POC test (d5f71bd7-2429-4b43-a52b-e1273250ce41) and the baseline test (621526c8-9da5-472c-aa07-c3f01b503589). The first 2000 lines of the OpenSearch output logs are pasted here; the original files are too large to post, but they are still available on the EC2 instance if needed.
These numbers make more sense. The fact that "Baseline - OpenSearch [feature/extensions] + Custom Plugin Installed" is a lot slower than "POC - OpenSearch [feature/extensions], Custom Plugin on a separate process" tells me that the custom plugin introduces some kind of contention. This could be as simple as a log statement that spews a very large amount of log output that needs to be written to disk. If you eliminate everything other than the boilerplate in that example, there should be almost zero overhead compared to the "OpenSearch [feature/extensions]" baseline.
Thanks @dblock for taking a look. @joshpalis helped us kick off another set of tests, removing the INFO logging:
Benchmarking Test Set 5. Previously, the custom plugin installed into OpenSearch logged every index event operation to an output file. The custom plugin has been modified to remove this additional logging in order to reduce the amount of data written to disk and eliminate any performance overhead it may cause. The following three tests were conducted using the same system and test configurations as before, and preliminary analysis of the results confirms that the latency overhead observed previously was due to the excess logging. Latencies for our baseline and plain OpenSearch are now comparable, while the POC test shows an increase in latency, which is expected.
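To make the overhead mechanism concrete, here is a hypothetical before/after of the listener body; the actual custom plugin's source is not shown in this thread, and the class name IndexEventListener is made up for illustration.

```java
// Hypothetical illustration of the logging overhead described above. Writing a log
// line for every indexing operation puts disk I/O on the hot path, which would show
// up as extra latency in the benchmark.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.opensearch.index.engine.Engine;
import org.opensearch.index.shard.IndexingOperationListener;
import org.opensearch.index.shard.ShardId;

public class IndexEventListener implements IndexingOperationListener {
    private static final Logger logger = LogManager.getLogger(IndexEventListener.class);

    @Override
    public void postIndex(ShardId shardId, Engine.Index index, Engine.IndexResult result) {
        // Before (Test Set 4): one INFO log line per indexed document, written to an output file.
        // logger.info("indexed doc [{}] on shard [{}]", index.id(), shardId);

        // After (Test Set 5): no per-operation logging; the listener only observes the event.
    }
}
```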
So an 8-10% variance; that looks like a reasonable trade-off. I think we'll have to continue offering users an in-proc option.
Benchmarking Test Set 6. System and test configuration are the same as in the previous tests, using the same feature/extensions tarball from Test Set 5.
@joshpalis could you help me understand the difference between Test Set 5 and Test Set 6?
The goal is to set up benchmarking for extensibility.