Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix backwards compatibility tests in 2.x #13477

Merged
merged 1 commit into from
Nov 2, 2015

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Sep 10, 2015

These tests have gone stale because 2.0 hasn't needed to be protocol compatible with anything. But 2.0.1 and 2.1.0 will have to be backwards compatible with 2.0.0. Since we don't have 2.0.0 released but we still want to work on features for 2.1.0 that have to be backwards compatible with 2.0.0 we should use 2.0.0-beta1 as a stand in for the eventual 2.0.0 release.

Closes #13425
Closes #13484

@nik9000 nik9000 added >test Issues or PRs that are addressing/adding tests WIP v2.1.0 v2.0.0 labels Sep 10, 2015
@nik9000
Copy link
Member Author

nik9000 commented Sep 10, 2015

This is very much a work in progress. It'll take a bit more banging around to fix the remaining BWC tests.

@nik9000 nik9000 changed the title [test] Fix backwards compatibility tests in 2.x Fix backwards compatibility tests in 2.x Sep 10, 2015

---------------------------------------------------------------------------
mvn test -Dtests.filter="@backwards" -Dtests.bwc.version=x.y.z -Dtests.security.manager=false
$ mkdir -p core/backwards && pushd core/backwards
$ curl -O https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.0.0-beta1/elasticsearch-2.0.0-beta1.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the provided dev-tools script for this? get-bwc-version.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I forgot about it. I'll update.

@rjernst
Copy link
Member

rjernst commented Sep 10, 2015

Thanks @nik9000! I left a couple comments. The changes so far look fine, I think.

@@ -62,7 +61,7 @@ public void testSimpleFunctionScoreParsingWorks() throws IOException, ExecutionE
.endObject()
.endObject()
.endObject()
.endObject()));
.endObject()).setSettings("number_of_shards", 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was burried in a commit message:
FunctionScoreBackwardCompatibilityIT relied on tf/idf information but didn't
set the number of shards to 1 or use dfs_query_then_fetch. Now it uses one
shard.

I'll add it to a comment here if you approve.

@nik9000 nik9000 removed the v2.0.0 label Sep 15, 2015
@nik9000
Copy link
Member Author

nik9000 commented Sep 15, 2015

I'm dropping 2.0.0 from this because its not strictly required for 2.0.0 - its needed for 2.0.1 and 2.1.0. We'll get this soon but I don't want it to be a blocker.

@nik9000
Copy link
Member Author

nik9000 commented Sep 17, 2015

Ok - I've finally got this ready for review. It doesn't fix all the tests but I was sinking hour and hours and hours per busted test and I didn't want each one to block getting the rest fixed.

I did squash and rebase this during the review and for that I'm sorry - it was getting pretty sprawling and had some twists and turns that ended up not being good ways to fix the tests. But its really really ready for review now.

@nik9000
Copy link
Member Author

nik9000 commented Sep 17, 2015

Added 3.0 label. This should be merged into 2.0, 2.x, and master branches even though we don't expect master to be backwards compatible with anything.

@nik9000
Copy link
Member Author

nik9000 commented Sep 17, 2015

Ping to @rjernst and probably @dakrone to review.

I've disabled custom data paths for these tests for now because they don't seem to be working properly - probably because the data paths are relative to the path.home and the upgraded nodes have a different path.home than the backwards nodes.

return settings;
}
Settings.Builder builder = Settings.settingsBuilder().put(settings);
builder.remove(IndexMetaData.SETTING_DATA_PATH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should log something here to make it apparent to anyone reading that the setting was removed? Maybe at warn level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nik9000 nik9000 added the v2.2.0 label Oct 21, 2015


1. Go to
https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/distribution/tar/elasticsearch/2.0.0-beta2-SNAPSHOT/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use rc1 now, but I certainly hope that by the time this PR gets, GA is released

@nik9000
Copy link
Member Author

nik9000 commented Oct 22, 2015

Quick update for anyone following along at home: as of the last patch the BWC tests that are not @AwaitsFix are now passing reasonably consistently. All but UnicastBackwardsCompatibilityIT pass all the times I've tried them. I'm working on that last one.

After that I think we have two options:

  1. Just merge this.
  2. I keep working on un @AwaitsFixing things until 2.0 is released and then I switch this to use 2.0 and roll back the Version hack I added way back at the beginning of this process and then we merge it.

Then you can run the backwards compatibility tests:

---------------------------------------------------------------------------
mvn verify -pl core -Dskip.unit.tests -Pdev -Dtests.filter="@backwards and not @awaitsfix" -Dtests.bwc.version=2.0.0-rc1 -Dtests.bwc.path="$(pwd)/backwards" -Dtests.security.manager=false -Dtests.seccomp=false -Dtests.jvms=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are -Pdev and -Dtests.jvms=1 required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Pdev isn't really required, but it does speed up the process.

@javanna
Copy link
Member

javanna commented Oct 23, 2015

I had a quick look and left some comments. Also ran tests on my machine, UpgradeIT failed with java.nio.file.NoSuchFileException: /Users/lucacavanna/dev/elasticsearch/core/core/target/J0/./temp/bwc/external_1/data, maybe something is wrong on my machine? Also added 2.0.1 label as this will need to go to 2.0 branch too.

@nik9000
Copy link
Member Author

nik9000 commented Oct 23, 2015

Also added 2.0.1 label as this will need to go to 2.0 branch too.

Awesome.

UpgradeIT failed

Booo! It didn't fail for me but let me try again.

// enable secure computing mode
if (seccomp) {
Natives.trySeccomp(tmpFile);
} else {
logger.warn("Not enabling seccomp because it was explicitly disabled");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the whitespace changes and the warning? I would also not like to document this in TESTING.md or anywhere else. It defeats the whole point of it. I also don't think we should support allowing tests to do this for any reason, but I know I stand alone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the whitespace changes and the warning?

I can.

I would also not like to document this in TESTING.md or anywhere else. It defeats the whole point of it. I also don't think we should support allowing tests to do this for any reason, but I know I stand alone.

I'm happy to spend the next few days working on getting it running under seccomp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think we can launch a little perl script in pre-integration-test, kill it in post-integration-test, that sits on a socket and starts up nodes. Its a perl script for testing. Then the rest of the stuff can run without us having to defeat every security mechanism we are trying to add...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it looks like some kind of remote executor helper is required for this. I feel like it'd be easier to yank some of the code we already use to start the bwc nodes into a java process started into pre-integration-test than to write it in perl and fight with windows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, i dont really think it is. I guess i don't understand why we are running backwards tests this way, versus doing real integration tests with the rest suite and similar (like we do in qa). All the stuff needed for that is published as maven artifacts. One of these approaches is a "fantasy world" that isn't quite realistic, it requires manual setup and is difficult to run, and flaky as far as jenkins goes. the other is automatic, realistic and reliable... what am I missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the dependency resolution is funky here. I think we'd be better off with dependency resolution handled by maven/gradle/whatever then how it works now. I suspect the reason the resolution isn't done that way now is because these are left over from single module days.

I think giving the test process some degree of control of the bwc nodes is useful though - it wants to be able to shut them down and spin up new nodes to replace them to simulate rolling upgrades which seems pretty fair.

The way I see this we need these tests working as soon as we want to make 2.0.1 so I don't know that we want to do major surgery to them. I'm all for figuring a way to get them working under seccomp and I'll work on it. With the goal of eventually moving them into the qa project with real resolution. And no elasticsearch nodes running in process with the tests.

@rmuir
Copy link
Contributor

rmuir commented Oct 23, 2015

I'm really really concerned about the execution here. I do not think it should be allowed, even in tests.

If these tests must really run this way (which i dont believe), we should start up some other background test listening on a socket or similar responsible for launching them, instead of allowing execution from the ES environment. (i already see this "setting a precendent" for more leniency).

@s1monw
Copy link
Contributor

s1monw commented Oct 23, 2015

I'm really really concerned about the execution here. I do not think it should be allowed, even in tests.

the reason why we run this way is a historic one. This is how I added it back in 1.2 when we had no BWC tests at all. From the todays perspective I think we should move away from how it's done. The question is if it makes sense to make the test work again and then work from there to cut over to a new system that is controlled from the outside ie. basic tests that replace the once that we have?
To me what is most important are 2 kinds of tests:

  • upgrade tests where we upgrade a cluster from one version to another either as rolling or full cluster restart
  • running simple tests against a mixed version cluster to ensure our wire protocols are working and we can operate in such a scenario

both can relatively simple be done without starting / stopping nodes from within ES and maybe thats a good start? I still wonder if we should make the ones we have work first and then go from there?

@nik9000
Copy link
Member Author

nik9000 commented Oct 23, 2015

I still wonder if we should make the ones we have work first and then go from there?

This patch as it stands gets them working in the same way they worked before.

I'm going to work on a little daemon we can start pre-integration-test that can shim the start/stop of ExternalNodes. We could cut over to using that and keep the tests as they stand and reenable seccomp. I'd be much happier with the PR if we can do that.

There are lots of great things we could do here - move to the qa module and get proper maven dependency resolution so they run on every run - invoke current version of elasticsearch as an ExternalNode so its less mock-ed.

@nik9000
Copy link
Member Author

nik9000 commented Oct 28, 2015

OK - I've build a service that can be forked pre-integration-test to run these the external nodes. Right now it forks in core's build only if tests.bwc.path is set. I'd love to move all of these tests to qa and turn on proper dependency resolution so we don't need tests.bwc.path.

I'm in no way proud of the service's implementation - it gets the job done but I'm sure I've done tons of non normal stuff so feel free to complain about it.

@nik9000
Copy link
Member Author

nik9000 commented Oct 28, 2015

Another update: the tests now pass with the security manager enabled so I've removed the recommendation to disable it. The external nodes and the internal nodes run under the security manager. Only ExternalNodeService runs without it and it runs in the maven JVM, at least it does the way I've got it rigged right now.

@nik9000
Copy link
Member Author

nik9000 commented Oct 29, 2015

Yet another update: the tests now no longer need the external setup and download step, they resolve their dependencies using maven.

* Key used to set the path for the elasticsearch executable used to run backwards compatibility tests from
* via the commandline -D{@value #TESTS_BACKWARDS_COMPATIBILITY}
*/
public static final String TESTS_BACKWARDS_COMPATIBILITY = "tests.bwc";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, can we rather allow customizing repro info from subclasses of ESIntegTestCase? I don't like having to add everything here...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I agree - I think this is nasty. I did it without a whole lot of thought trying to get the move working.

@rjernst
Copy link
Member

rjernst commented Oct 30, 2015

LGTM, let's get this into 2.x/2.1. It won't work on master until we port to gradle, but maybe by then we can re-evaluate how to do bwc tests to simplify this (and use all real nodes instead of some mock and some external).

Moves backwards compatibility tests from core to qa/backwards and uses a
service running inside of maven to fork the elasticsearch nodes so the tests
can continue to run in seccomp.

Because the tests were moved ReproduceInfoPrinter now cannot hardcode which
parameters it prints - it scans them using a new annotation.

`CompositeTestCluster#client` now remakes its client with new node addresses
after every node is upgraded. That makes it possiblie for it to survive a
rolling restart where the new nodes come up using different ports.

`CompositeTestCluster#upgradeOneNode` now brings up the upgraded node with
the same data directory as the old node which prevents the loss of data when
there are no replicas.

`ExternalNode` now uses a `unicastHosts` list to find the other running nodes
rather than relying on multicast which was moved to a plugin in 2.0.
`unicastHosts` is carefully setup by `CompositeTestCluster` to list all the
other nodes that are currently running.

`FunctionScoreBackwardCompatibilityIT` now forces a single shard so the tf/idf
parts of the score don't get juggled.

`ReproduceInfoPrinter` now appends properties required to properly reproduce
the backwards compatibility tests and properly quotes parameters like
`-Dtests.filter="@Backwards and not @awaitsfix`.

About 2/3 of the backwards compatibility tests are passing and the rest are
marked with
`@AwaitsFix(bugUrl="elastic#13522
That issue will track all work getting the backwards compatibility tests
back online. The following tests are so marked:
- BasicAnalysisBackwardCompatibilityIT
- BasicBackwardsCompatibilityIT
- RecoveryBackwardsCompatibilityIT
- SignificantTermsBackwardCompatibilityIT
- SnapshotBackwardsCompatibilityIT

Closes elastic#13425
@nik9000
Copy link
Member Author

nik9000 commented Oct 30, 2015

Squashed and rebased.

nik9000 added a commit that referenced this pull request Nov 2, 2015
Fix backwards compatibility tests in 2.x
@nik9000 nik9000 merged commit 38010f6 into elastic:2.x Nov 2, 2015
@nik9000
Copy link
Member Author

nik9000 commented Nov 2, 2015

Merged to 2.x. Cherry-picking to 2.1 now.

@nik9000
Copy link
Member Author

nik9000 commented Nov 2, 2015

And merged into 2.1: 10315d8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>test Issues or PRs that are addressing/adding tests v2.0.1 v2.1.0 v2.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants