Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST test against clusters with dedicated master nodes #34563

Open
DaveCTurner opened this issue Oct 17, 2018 · 14 comments
Open

REST test against clusters with dedicated master nodes #34563

DaveCTurner opened this issue Oct 17, 2018 · 14 comments
Labels
:Delivery/Build Build or test infrastructure >enhancement help wanted adoptme Team:Delivery Meta label for Delivery team

Comments

@DaveCTurner
Copy link
Contributor

Today we run our REST tests against clusters with nodes that are both data nodes and master-eligible nodes. We therefore miss tests against the (recommended) configuration of dedicated master nodes and separate data nodes, and this includes things like tests of the rolling upgrade procedure which may be sensitive to the order in which nodes are upgraded.

Thanks to Armin for spotting this in #34560 (comment).

@DaveCTurner DaveCTurner added >enhancement :Delivery/Build Build or test infrastructure labels Oct 17, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@DaveCTurner DaveCTurner added the help wanted adoptme label Oct 17, 2018
@original-brownbear
Copy link
Member

@DaveCTurner I was gonna look into this. I guess one question that comes to mind is: How many nodes do we want to test here? We probably still want 3 master eligible nodes and then some data nodes (maybe one suffices ?) on top of that ... this may get a little trickier resource usage wise?

@nik9000
Copy link
Member

nik9000 commented Oct 25, 2018

I suspect we'd be ok with three master eligible nodes and one data node. It is much more realistic than one node....

As with all of our new QA style tests, I wonder if this is a thing that should run with gradlew check or not. I think, probably, yes, it should, mostly because we found so much good stuff when we switched the rolling upgrade tests to a more realistic cluster.

@DaveCTurner
Copy link
Contributor Author

3+1 sounds good, but there might be tests that require 2 data nodes. I don't have a good feeling for whether starting 4 (or even 5) nodes is too many for our poor CI machines to handle.

@original-brownbear
Copy link
Member

@DaveCTurner I guess CI machines can be upgraded :) I'm somewhat optimistic here too. I can run tests with 3 nodes on an 8 GB RAM box => 5 shouldn't be too bad?

@DaveCTurner
Copy link
Contributor Author

Agreed. Let's try it and see what breaks...

@original-brownbear original-brownbear self-assigned this Oct 25, 2018
@alpar-t
Copy link
Contributor

alpar-t commented Oct 25, 2018

We have 60GB or RAM for the ephemeral workers right now, and we dedicate 512MB to a node by default (tests.heap.size) so 6 nodes = 3GB not that bad, especially that we don't run these tests in parallel.
If we get to run say 8 of these in parallel we'll require 24GB of ram - not that bad.
I did see nodes linger trough the tests which can be worse if we run more of them.
The new test-clusters plugin #30904 will help with that.

@original-brownbear
Copy link
Member

@atorok I guess with #30904 incoming it may be better to hold off on this one? This will require adding some more logic to the existing cluster formation task and it looks like you're working on making this kind of thing easier at the moment anyway?

@alpar-t
Copy link
Contributor

alpar-t commented Oct 26, 2018

It would make sense to do this with the new code, and I'm actively working on it as a priority, but not sure when it will be complete. If extending the tests is not urgent I would definitely wait.

@original-brownbear original-brownbear removed their assignment Oct 26, 2018
@alpar-t
Copy link
Contributor

alpar-t commented Sep 5, 2019

We now have the testclusters plugin used in every project, so there's only one thing to change, but we don't have a way to pass on per node configuration from a build script.
The plugin itself tracks the configuration per node, so it's easy to implement there.

@original-brownbear @DaveCTurner which tests would we want to change for this ? Is it some in particular and if so how many or a change across the board ? If it's the lather it might make sense to implement it as a flag that would interpret the number of nodes configured for the node as data nodes and add 3 master nodes by default to the test cluster. We would set this up in a new CI job.
I'm not too worried about resource usage, we might have to reduce parallelism for this particular job until Gradle 6 comes out, but it should be doable.

@DaveCTurner
Copy link
Contributor Author

All tests should pass whether they are running on mixed master/data nodes or nodes dedicated for each role, so I think we should be choosing randomly (and deterministically).

@alpar-t
Copy link
Contributor

alpar-t commented Sep 5, 2019

I'm not entirely convinced randomization is the right approach, especially considering how that would interfere with how long a build takes an the amount of resources it needs.
If we think this is a source of bugs, we are effectively aiming for coverage and we should make that explicit by having a dedicated CI job that would add the master nodes to each test cluster and consider the number of configured nodes as data nodes.

@jasontedor
Copy link
Member

Similar to my comment on #35585, I would prefer that we not use randomness for this. Rather, as @atorok alludes to, a dedicated CI job could be considered. While I share the lack of concern for resource usage, I do have a concern for an explosion in the number of CI jobs.

@DaveCTurner
Copy link
Contributor Author

Ok, dedicated jobs for this are good too.

@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@mark-vieira mark-vieira added Team:Delivery Meta label for Delivery team and removed Team:Core/Infra Meta label for core/infra team labels Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Build Build or test infrastructure >enhancement help wanted adoptme Team:Delivery Meta label for Delivery team
Projects
None yet
Development

No branches or pull requests

8 participants