ML QA tasks are slow, take up 20 minutes of build time #37339
Comments
Pinging @elastic/ml-core |
I guess the slow part of
Having said that, I can move the slow parts into a separate package if that helps; I guess that's what you mean by "dedicated task". A follow-up option could be to split |
We are not generally thinking about a slow vs fast split for testing yet, and we shouldn't weaken the test to make it faster. We could maybe structure it so we can run more of them in parallel? This is what I meant by a separate task: tests that inherit from |
I'm pretty sure the reason it's so slow, and has got slower over the years, is the complex setup and teardown that is done for each of the hundreds of tests. We delete all the internal indices in between tests, including the
These same problems also affect watcher and monitoring REST tests, but there are fewer of them so it's not as noticeable. It's possible that this could be sped up by having a more streamlined setup and teardown process between the tests. But we need to be careful here - in 2017 we were plagued by X-Pack REST tests failing because of side effects of previous REST tests. We prevented these by having the complex setup and teardown steps we have today. However, maybe we can cut back on the setup and teardown time by making it more specific to different test suites. For example, instead of deleting every index including |
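To make the suite-specific cleanup idea above concrete, here is a minimal sketch in plain Java. It is not the Elasticsearch test framework's API: the class `ScopedCleanup`, the method `indicesToDelete` and the index names are all hypothetical, chosen only to show a suite deleting just the indices it owns instead of every internal index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the Elasticsearch test framework.
final class ScopedCleanup {

    // Returns only the indices whose names start with one of the suite's prefixes.
    static List<String> indicesToDelete(Set<String> existing, List<String> suitePrefixes) {
        List<String> toDelete = new ArrayList<>();
        for (String index : new TreeSet<>(existing)) { // sorted copy for a deterministic order
            for (String prefix : suitePrefixes) {
                if (index.startsWith(prefix)) {
                    toDelete.add(index);
                    break; // one matching prefix is enough
                }
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Illustrative index names only (assumption).
        Set<String> existing = Set.of(".ml-state", ".ml-anomalies-shared", ".watches");
        // An ML suite would clean up only the indices under its own prefix.
        System.out.println(indicesToDelete(existing, List.of(".ml-")));
    }
}
```

The trade-off is the one already mentioned: a narrower cleanup is faster, but anything a test leaves behind outside its declared prefixes can leak into later tests.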
After removing the
These tests create a 3-node cluster, which is one reason why they are so slow. Is it possible to parallelise this? I assume that would involve multiple clusters, each with 3 JVMs. |
The test plugin configures |
The slowest test in a local test I just ran was
Looking in the server log the following gaps exist: 26 seconds:
8 seconds:
11 seconds:
23 seconds followed by 49(!) seconds:
|
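For anyone who wants to repeat this kind of log inspection, the following is a rough sketch of how gaps between consecutive log lines could be found automatically. It assumes each log line starts with a bracketed timestamp such as `[2019-01-01T10:00:00,000]`; that format, the class `LogGaps` and the method `findGaps` are assumptions made for illustration, not taken from the logs discussed here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports pauses between consecutive timestamped log lines.
final class LogGaps {

    // Assumed timestamp layout, e.g. 2019-01-01T10:00:00,000
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    static List<String> findGaps(List<String> lines, Duration threshold) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            int end = line.indexOf(']');
            if (!line.startsWith("[") || end < 0) {
                continue; // no leading timestamp, e.g. a stack trace continuation line
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(1, end), TS);
            } catch (DateTimeParseException e) {
                continue; // bracketed prefix that is not a timestamp
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(threshold) >= 0) {
                    gaps.add(gap.getSeconds() + "s before: " + line);
                }
            }
            previous = current;
        }
        return gaps;
    }
}
```

Calling `LogGaps.findGaps(lines, Duration.ofSeconds(5))` on the lines of a server log would list every pause of five seconds or more, together with the line that ended the pause.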
This test was certainly not that slow when it was first written. Not sure if the test changed or it just takes longer to run. |
I had a PR failure due to the slow |
We should certainly spend some time understanding why it takes more time. I'll check it out. |
This commit parallelizes some parts of the test and removes an unnecessary refresh call. On my local machine it shaves off about 15 seconds, for a test execution time of ~64s (down from ~80s). This test is still slow, but progress over perfection. Relates #37339
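As a generic illustration of what parallelizing independent parts of a test can look like, here is a minimal sketch using `CompletableFuture`. It is not the code from the commit: `ParallelSteps`, `runAll` and the step contents are hypothetical, and it assumes the steps do not depend on each other.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch: runs independent test steps concurrently and waits for all of them.
final class ParallelSteps {

    // Results are returned in the same order as the submitted steps.
    static <T> List<T> runAll(List<Supplier<T>> steps, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every step first so they can overlap...
            List<CompletableFuture<T>> futures = steps.stream()
                .map(step -> CompletableFuture.supplyAsync(step, pool))
                .collect(Collectors.toList());
            // ...then block until each one has finished.
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

With steps that each take several seconds, for example opening several jobs, the total wall-clock time approaches that of the slowest step rather than the sum of all of them, provided the cluster under test can handle the concurrent requests.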
It's related to type removal. I started looking at the failure of
Simply logging all those errors must dramatically slow the test down. The test is using a hardcoded type of |
The problems with type removal logging and audit logging slowing down the tests are now resolved. (These problems appeared after this issue was first raised, so don't justify closing this issue.) Returning to the fundamental question of what to do about ML integration tests that take a long time, there is more discussion in #39859 (comment). |
@droberts195 both propositions seem too drastic to me right now. |
Looking at this build scan from a master intake CI job:
https://scans.gradle.com/s/wqhe4ax2bx7zc/performance/execution
It looks like two of the ML test tasks take up 20 minutes of the non-parallelized section of our build:
From the same logs:
and
We should look into why these tests are so slow and ways to make them faster.
One thing that sticks out is that there are both ESIntegTestCase and ESRestTestCase tests mixed in the same tasks. The former can be spread out across multiple JVMs running in parallel if we were to have a dedicated task for those.