Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalise ML native multi node test base classes, setup and teardown #49582

Open
droberts195 opened this issue Nov 26, 2019 · 6 comments
Open
Labels
:ml Machine learning >refactoring Team:ML Meta label for the ML team team-discuss

Comments

@droberts195
Copy link
Contributor

droberts195 commented Nov 26, 2019

The ML native multi node tests now contain a large number of test classes.

Some of these extend MlNativeIntegTestCase which in turn extends ESIntegTestCase. Others extend ESRestTestCase. ESIntegTestCase and ESRestTestCase both extend ESTestCase and both add extra setup/teardown code and hooks, but this is different between the two.

All the ML native multi node tests share the same external test cluster. The differences in base classes between the various test classes makes it extremely hard to reason about exactly what state the cluster is in when a particular test runs, as the test that ran previously could have been in a different test class running different teardown code. In particular, it is not clear whether the previous test removed all the ML index templates or not. This can lead to very intermittent failures. For example, we suspect that https://gradle-enterprise.elastic.co/s/qbotz2vnmsija/tests/zcquf3hc3eoda-2zjlbn7rqx7gc was caused by this, but that is not a common failure.

As a proposal to make the state of the external multi node test cluster easier to reason about, I propose that:

  1. All tests in the ML native multi node tests should (directly or indirectly) extend MlNativeIntegTestCase, and that should be changed to extend ESRestTestCase
  2. Index templates should be preserved between tests - these tests don't do upgrades or restarts, so there is no need to mess with the index templates
  3. There should be no custom cleanup methods in the individual test classes - it should be done by methods in the base classes MlNativeIntegTestCase, MlNativeAutodetectIntegTestCase or MlNativeDataFrameAnalyticsIntegTestCase
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou
Copy link
Contributor

Why prefer ESRestTestCase rather than ESIntegTestCase? The latter allows us to call actions using the java objects instead of REST.

I suppose another option would be to use the HLRC if we go with REST. Do we have that option?

@droberts195
Copy link
Contributor Author

The latter allows us to call actions using the java objects instead of REST.

I guess at the time the tests were first written that was a good reason and probably explains why most tests indirectly extend ESIntegTestCase.

But that is using the node client which is not possible for real users.

It seems like it would be desirable now the HLRC is feature complete to switch to it. And it's supposed to be easy to migrate to because all the method signatures are identical between the transport client and HLRC. So migrating to that should just involve changing imports, and then we'll be using our external cluster in a more realistic way.

The only problem will come if there are places in the native multi node tests where we call actions that are meant to be internal only, and don't have corresponding REST endpoints. KillProcessAction springs to mind.

Another thing we could do if switching all the tests to use the same base class is problematic would be to split the tests into two parts: native-multi-node-rest-tests and native-multi-node-node-client-tests. Then at least the setup/teardown in between tests would be consistent within each of these.

@benwtrent
Copy link
Member

Also, there have been multiple times when working on a new project when the HLRC actions have yet to be written. BUT I am writing a new API that needs integration testing across multiple nodes. Manually creating REST requests is a pain in this scenario.

@droberts195
Copy link
Contributor Author

Since this issue was first opened things have changed because the transport client no longer exists. These ML tests are now the only significant part of the code still using the node client to automate external integration tests - see #101808.

We should start to plan the migration strategy to move all the ML native multi node tests to be REST tests.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 6, 2023
`ExternalTestCluster` doesn't really make sense now that the transport
client is removed. We only use it in the ML integ test suite and it'd be
good to avoid expanding its usage further, so this commit deprecates it
and removes the functionality in `ESIntegTestCase` that might quietly
switch to using it in a new test suite if running with certain system
properties.

Relates elastic#49582
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Nov 6, 2023
elasticsearchmachine pushed a commit that referenced this issue Nov 6, 2023
`ExternalTestCluster` doesn't really make sense now that the transport
client is removed. We only use it in the ML integ test suite and it'd be
good to avoid expanding its usage further, so this commit deprecates it
and removes the functionality in `ESIntegTestCase` that might quietly
switch to using it in a new test suite if running with certain system
properties.

Relates #49582
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >refactoring Team:ML Meta label for the ML team team-discuss
Projects
None yet
Development

No branches or pull requests

5 participants