Rationalise ML native multi node test base classes, setup and teardown #49582

droberts195 · 2019-11-26T11:16:04Z

The ML native multi node tests now contain a large number of test classes.

Some of these extend MlNativeIntegTestCase, which in turn extends ESIntegTestCase. Others extend ESRestTestCase. ESIntegTestCase and ESRestTestCase both extend ESTestCase and both add extra setup/teardown code and hooks, but what they add differs between the two.

All the ML native multi node tests share the same external test cluster. The differences in base classes between the various test classes make it extremely hard to reason about exactly what state the cluster is in when a particular test runs, as the test that ran previously could have been in a different test class running different teardown code. In particular, it is not clear whether the previous test removed all the ML index templates or not. This can lead to highly intermittent failures. For example, we suspect that https://gradle-enterprise.elastic.co/s/qbotz2vnmsija/tests/zcquf3hc3eoda-2zjlbn7rqx7gc was caused by this, but that is not a common failure.

To make the state of the external multi node test cluster easier to reason about, I propose that:

- All tests in the ML native multi node tests should (directly or indirectly) extend MlNativeIntegTestCase, and that should be changed to extend ESRestTestCase
- Index templates should be preserved between tests - these tests don't do upgrades or restarts, so there is no need to mess with the index templates
- There should be no custom cleanup methods in the individual test classes - it should be done by methods in the base classes MlNativeIntegTestCase, MlNativeAutodetectIntegTestCase or MlNativeDataFrameAnalyticsIntegTestCase
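The effect of these three points can be sketched with a toy model. This is only an illustration under stated assumptions: `FakeCluster`, `BaseNativeTest`, `JobTest` and `AnalyticsTest` are hypothetical stand-ins, not classes from the Elasticsearch test framework, and the "cluster" is just two in-memory sets. The sketch shows a single base class owning a `final` teardown that removes indices but leaves templates alone, so the state seen by the next test does not depend on which test class ran before it.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SharedClusterCleanupSketch {

    /** Hypothetical stand-in for the shared external cluster: tracks names only. */
    static final class FakeCluster {
        final Set<String> templates = new TreeSet<>();
        final Set<String> indices = new TreeSet<>();
    }

    /** Hypothetical common base class: every test shares one teardown path. */
    static abstract class BaseNativeTest {
        protected final FakeCluster cluster;

        BaseNativeTest(FakeCluster cluster) {
            this.cluster = cluster;
        }

        abstract void runTest();

        // Declared final so individual test classes cannot add custom cleanup.
        // Indices are removed; templates are deliberately preserved between tests.
        final void tearDown() {
            cluster.indices.clear();
        }
    }

    static final class JobTest extends BaseNativeTest {
        JobTest(FakeCluster cluster) {
            super(cluster);
        }

        @Override
        void runTest() {
            cluster.indices.add("job-results");
        }
    }

    static final class AnalyticsTest extends BaseNativeTest {
        AnalyticsTest(FakeCluster cluster) {
            super(cluster);
        }

        @Override
        void runTest() {
            cluster.indices.add("analytics-dest");
        }
    }

    /** Runs both tests against one shared cluster and returns its final state. */
    static FakeCluster runAll() {
        FakeCluster cluster = new FakeCluster();
        cluster.templates.add("ml-template"); // installed once, never removed by teardown
        List<BaseNativeTest> tests = List.of(new JobTest(cluster), new AnalyticsTest(cluster));
        for (BaseNativeTest test : tests) {
            test.runTest();
            test.tearDown();
        }
        return cluster;
    }

    public static void main(String[] args) {
        FakeCluster cluster = runAll();
        System.out.println("templates=" + cluster.templates + " indices=" + cluster.indices);
    }
}
```

Because the teardown lives in one place, the order in which `JobTest` and `AnalyticsTest` run does not change what the cluster looks like afterwards.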


elasticmachine · 2019-11-26T11:16:06Z

Pinging @elastic/ml-core (:ml)

dimitris-athanasiou · 2019-11-26T11:30:47Z

Why prefer ESRestTestCase over ESIntegTestCase? The latter allows us to call actions using the Java objects instead of REST.

I suppose another option would be to use the HLRC if we go with REST. Do we have that option?

droberts195 · 2019-11-26T11:59:47Z

> The latter allows us to call actions using the java objects instead of REST.

I guess at the time the tests were first written that was a good reason and probably explains why most tests indirectly extend ESIntegTestCase.

But that uses the node client, which is not something real users can do.

Now that the HLRC is feature complete, it seems desirable to switch to it. And it's supposed to be easy to migrate to because all the method signatures are identical between the transport client and HLRC. So migrating should just involve changing imports, and then we'll be using our external cluster in a more realistic way.

The only problem will come if there are places in the native multi node tests where we call actions that are meant to be internal only, and don't have corresponding REST endpoints. KillProcessAction springs to mind.

Another thing we could do if switching all the tests to use the same base class is problematic would be to split the tests into two parts: native-multi-node-rest-tests and native-multi-node-node-client-tests. Then at least the setup/teardown in between tests would be consistent within each of these.

benwtrent · 2019-11-26T12:16:33Z

Also, there have been multiple times when, working on a new project, the HLRC actions had yet to be written but I was writing a new API that needed integration testing across multiple nodes. Manually creating REST requests is a pain in this scenario.
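To give a rough idea of what hand-building a REST request involves, here is a minimal sketch. It is an assumption-laden illustration, not what the test suite does: it uses the JDK's `java.net.http` classes rather than the Elasticsearch REST client, the base URL `http://localhost:9200` and the job id `sketch-job` are made up, and the request is only built, never sent. The path and body follow the create anomaly detection job endpoint.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ManualRestRequestSketch {

    /** Builds (but does not send) a PUT request that would create an anomaly detection job. */
    static HttpRequest putJobRequest(String baseUrl, String jobId) {
        // The JSON body has to be assembled by hand, with no compile-time checking.
        String body = "{"
            + "\"analysis_config\":{"
            + "\"bucket_span\":\"15m\","
            + "\"detectors\":[{\"function\":\"count\"}]"
            + "},"
            + "\"data_description\":{\"time_field\":\"timestamp\"}"
            + "}";
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/_ml/anomaly_detectors/" + jobId))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and job id, for illustration only.
        HttpRequest request = putJobRequest("http://localhost:9200", "sketch-job");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Compared with calling an action through typed Java request objects, the field names and structure here are plain strings, so mistakes only surface when the request is actually executed.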

droberts195 · 2023-11-06T11:02:39Z

Since this issue was first opened things have changed because the transport client no longer exists. These ML tests are now the only significant part of the code still using the node client to automate external integration tests - see #101808.

We should start to plan the migration strategy to move all the ML native multi node tests to be REST tests.

`ExternalTestCluster` doesn't really make sense now that the transport client is removed. We only use it in the ML integ test suite and it'd be good to avoid expanding its usage further, so this commit deprecates it and removes the functionality in `ESIntegTestCase` that might quietly switch to using it in a new test suite if running with certain system properties. Relates elastic#49582

elasticsearchmachine · 2023-11-06T18:36:47Z

Pinging @elastic/ml-core (Team:ML)


droberts195 added >refactoring :ml Machine learning team-discuss labels Nov 26, 2019

droberts195 mentioned this issue Apr 19, 2020

Some ML Test are failing with "Accounting breaker not reset to" errors #55420

Closed

DaveCTurner mentioned this issue Nov 6, 2023

Deprecate ExternalTestCluster #101844

Merged

elasticsearchmachine added the Team:ML label Nov 6, 2023

droberts195 mentioned this issue Feb 13, 2024

[CI] DatafeedJobsRestIT multiple tests failing due to unfinished tasks #105239

Open
