Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail startup (and tests) on jar hell #11932

Merged
merged 3 commits into from
Jun 30, 2015
Merged

Fail startup (and tests) on jar hell #11932

merged 3 commits into from
Jun 30, 2015

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Jun 30, 2015

Note its imperfect, there is currently some intentional "hell" done by ES:

  • specifying elasticsearch.jar twice, in order to override a joda-time class
  • the overriding of said joda-time class
  • some clashes with log4j/log4j-extras

But I removed all the jar hell in the test classpath and so on.

Closes #11926

@rmuir
Copy link
Contributor Author

rmuir commented Jun 30, 2015

On my machine, the scan takes 18-20ms on average: so the cost is negligible, it just scans the entry names from the zip...

@rjernst
Copy link
Member

rjernst commented Jun 30, 2015

LGTM

rmuir added a commit that referenced this pull request Jun 30, 2015
Fail startup (and tests) on jar hell
@rmuir rmuir merged commit 0b2070e into elastic:master Jun 30, 2015
@clintongormley clintongormley added >enhancement :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts labels Jun 30, 2015
@synhershko
Copy link
Contributor

While I admire the effort and understand the drive behind this, this causes Elasticsearch hell for us on several projects due to issues that are beyond our control. This renders using the ES jar impossible for even the minimal use of TransportClient, and blocks us from fully moving to ES 2.x.

Here is a concrete example, one of many I'm afraid.

We are using Elasticsearch from a Storm bolt, and are using Maven as our build system. storm-core has a dependency in kryo, which in turn has some issues with shaded dependencies (EsotericSoftware/kryo#189). This may or may not have been fixed there, but the latest storm-core still uses the affected version (http://mvnrepository.com/artifact/org.apache.storm/storm-core/0.10.0).

As a result, we are seeing the following:

class: com.esotericsoftware.reflectasm.ConstructorAccess
jar1: /home/dan/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar
jar2: /home/dan/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar

Due to:

[INFO] +- org.apache.storm:storm-core:jar:0.10.0:compile
[INFO] |  +- com.esotericsoftware.kryo:kryo:jar:2.21:compile
[INFO] |  |  +- com.esotericsoftware.reflectasm:reflectasm:jar:shaded:1.07:compile
[INFO] |  |  |  \- org.ow2.asm:asm:jar:4.0:compile
[INFO] |  |  \- com.esotericsoftware.minlog:minlog:jar:1.2:compile

And all we need the ES jar for is for TransportClient, not going to run a full blown server / cluster there. And tests (Unit and ITs).

So after fixing all the Jar Hell issues that are indeed under our control (> 1 week worth of work), we are now stuck with Jar Hell issues that are beyond our control.

It appears you yourself experienced the same:

if (clazz.startsWith("org.apache.log4j")) {
. So guess what, the list of exceptions to the rule need to be extended, and the least you could do is allow this. Otherwise, again, the Java API will just be useless by projects with large codebases.

I'm aware of @rmuir's stand on the topic, but this practically renders the ES Java API useless in any large scale project, exactly because the current Jar Hell check bites way too deep. It at least needs to ignore mismatches in shaded dependencies or the like in certain scenarios, which I agree needs to be well defined.

Please allow customisations of exclusions, or disabling Jar Hell altogether At Your Own Risk(TM) at least while running ES as a library (as opposed to as a server instance).

@elastic elastic locked and limited conversation to collaborators Dec 31, 2015
@jpountz
Copy link
Contributor

jpountz commented Dec 31, 2015

I don't really care about your personal fucking opinion
Yes users are absolute idiots

Can we please avoid such aggressivity and only discuss facts and technical arguments.

@rjernst
Copy link
Member

rjernst commented Dec 31, 2015

Running ES either embedded, or as a java client does not run the jarhell check. The issue here is about tests. But testing transport client with an in memory ES is not a good test! If you want to test your use of transport client, run a real integration test, where ES is started in a separate process. That is the only real test, anything else is a fantasyland which does not actually confirm that the system works as expected. And regarding unit tests, you are free to not use the ES test framework. It is intended to test ES components, where jarhell is not allowed.

@mikemccand
Copy link
Contributor

Please keep the discussions here civil and technical! Don't attack the people raising ideas or asking questions; answer with technical counter arguments instead.

@elastic elastic unlocked this conversation Jan 4, 2016
@s1monw
Copy link
Contributor

s1monw commented Jan 4, 2016

@synhershko thanks for bringing this up, I think lots of users might run into problems with our strict checks and I this we should clarify why we are doing the checks and who they are supposed to work.

First lemme clarify your statement:

This renders using the ES jar impossible for even the minimal use of TransportClient, and blocks us from fully moving to ES 2.x.

as @rjernst already mentioned this is not true. The JarHell checks are run doing Bootstrap which is run in 2 ways:

  • if ES is started with the commandline as a service
  • if the ESTestCase is initialized in a static block.

if you run ES embedded or as a transport client this will not be executed. Hence you statement is not correct and will cause a lot of confusion.

If you use the ES test framework you will run into the JarHell checks and that is unfortunate. I think we might be able to help here and maybe move the checks under a special flag only for tests where you can disable the JarHell check for testing in BootstrapForTesting with a system property but not in the actual Bootstrap that we use to run ES. I can totally see this as a compromise. A switch that allows you to not run without the jarhell check in production is on my end not considered as an option since we have to maintain strictness on the server end. If your project can't use the test framework I think we can help with a system property but for the production case that's absolutely required. I still want to run this idea passed @rmuir and others since rob has put a lot of effort into locking the daemon down to pave the way for a painless future.

@synhershko please lemme know if that would help you along the lines...

@synhershko
Copy link
Contributor

Thanks @s1monw.

I stand corrected with regards to the usage of TransportClient. The blocker for us was building unit tests and integration tests with the excellent ES test suite. We have a rather complex system and ES is being used in various parts of it. We want to be able to both unit- and integration- test those components, and creating a test-local cluster is the only way we could do this efficiently, mocking an ES cluster with test data. Maintaining a test cluster is not an option in that case, and we are fully aware of the consequences of running in a non-clean environment. For us they are negligible, given our use cases.

We spent about 2 weeks cleaning up jar hell issues, and ended up complaining about it because we hit walls due to misconfigurations of dependencies which are out of our reach. Since then we were able to bypass Jar Hell with a nice and elegant hack, without forking.

Your suggestion is exactly what we were after, and I think I explicitly stated that above. I never wanted to discard strictness for service / server mode, only for test scope in clients due to various circumstances which are beyond our control. Quick googling on this will show we are not the only ones with this kind of issues.

I hope this clarifies things further, and thank you for chiming in.

@rjernst
Copy link
Member

rjernst commented Jan 4, 2016

@synhershko See the client qa test for an example of how to create a real integration test. This would be much better than using ESIntegTestCase, as it will test against a real ES node, not an in memory one that could be mucked up by jarhell. What does ESTestCase get you that you need for your own unit tests, that you would not get by using either junit directly, or LuceneTestCase?

@synhershko
Copy link
Contributor

@rjernst like I said, maintaining an external cluster for tests is an overkill for us. A test-scope cluster is exactly what we need, and if we get to a point where we see failures that we can attribute to jar hell or the like we will revisit this decision. I've been doing this for a while and it hasn't happened yet.

ESSmokeClientTestCase gives me nothing that I can't achieve with ESIntegTestCase with less headaches. LuceneTestCase is irrelevant for our purposes.

@s1monw
Copy link
Contributor

s1monw commented Jan 5, 2016

@rjernst like I said, maintaining an external cluster for tests is an overkill for us. A test-scope cluster is exactly what we need, and if we get to a point where we see failures that we can attribute to jar hell or the like we will revisit this decision. I've been doing this for a while and it hasn't happened yet.

@synhershko either way we might even go and make that mandatory in upcoming version since it is the right way to test the integration of elasticsearch. Now that in 3.0 you can reset cluster settings there is no real need for Test scope in 99% of the cases so in upcoming version this might be the way to go. For instance all our REST tests run against such a cluster, they are way more reliable than all other tests.

@synhershko
Copy link
Contributor

@s1monw I'd really appreciate if you wouldn't go that way. Having test-scoped, in-memory clusters is perfect for various tests which make light use of Elasticsearch and therefore aren't exposed (or don't care) about the various failures you are afraid of. This is exactly what we are using it for, and would really like to see this functionality stay as it greatly simplifies and speeds-up our testing infrastructure. Not many databases / data-stores provide volatile versions of themselves for testing purposes, and those which do are much easier to work with in dev environments.

@rjernst
Copy link
Member

rjernst commented Jan 5, 2016

I'd really appreciate if you wouldn't go that way. Having test-scoped, in-memory clusters is perfect for various tests which make light use of Elasticsearch and therefore aren't exposed (or don't care) about the various failures you are afraid of.

We have been gradually moving away from this for a reason. It doesn't really test integration, it is really just a complicated unit test. It is much better to have real unit tests (which are actually light weight, ESIntegTestCase is so far from light weight it is not funny), and test integration with a real node. The difference in time between an in memory cluster for tests, and actually spinning up an ES node (which the infrastructure provides in both maven for 2.x and gradle for master), isn't that much, and the latter actually checks it works against elasticsearch.

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Jan 17, 2016
This change moves the JarHell check under the test.security.manager guard
such that users that have to opt out of JarHell for testing can do so by disabling
security. It will all users to use our test framework but when doing so the parameter
`-Dtests.security.manager=false` will signal how serious it is to opt out of this check

Relates to elastic#11932
@s1monw
Copy link
Contributor

s1monw commented Jan 17, 2016

@synhershko here is a workaround for tests #16042

@synhershko
Copy link
Contributor

Looks good, thanks @s1monw . I did some homework wrt to the tests framework the past week, I'll post about it later today or tomorrow.

@synhershko
Copy link
Contributor

Took longer than than expected, but here is my take on testing with Elasticsearch.

There aren't many technologies which provide a good testing framework bundled with the tech, and even fewer data store technologies which provide in-memory testing capabilities. But those which do provide this, are by far easier to get things right with.

Worth noting that the advice given in the discussion here is different than what's advertised in the docs. With regard to the ESSmokeClientTestCase class, it is already implemented as ExternalTestCluster in the aforementioned Java Testing Framework. And then there is also the InternalTestCluster implementation which spins an in-memory, test-scoped cluster for a test to run with.

As a long time user and an Elasticsearch consultant I've seen pretty much every use case you can imagine, and at the end of the day when it comes to testing there are those systems with those Units that can only be really tested when they are paired with Elasticsearch. And for the cases where you are actually running on the JVM, you are particularly lucky.

Think about the Storm use case, where somewhere in the topology there are bolts that issue a query (we are not using es-hadoop). The only real way of testing those bolts and their logic is to have a contained Elasticsearch cluster, with contained index, issue the query based on the inputs, and compare the results. This is a Unit Test per se, one that can only be conducted on a clean index with a known set of documents, which is presumably populated at the beginning of each test method / class.

Now, sure, one could spin his own cluster somewhere and run those tests each against it's own index. But as everyone who had to deal with CI and deployments knows - infrastructure is a bitch. We all want to avoid work there. Every test will have to run on at least 2 different environments (dev local, and CI, and maybe via production probes etc) and requiring an Elasticsearch cluster just for testing on each is an overkill. Ideally, each test should be able to run it's own ES cluster if needed, at least as a fallback, so at least the local dev's experience is flawless. This is uber important in large teams, where many technologies are in use.

There's also questions of security and costs involved.

This isn't about testing ES plugins or cluster configurations, and I couldn't care less about performance or the like. Most of the time it's also not about integration tests. It is just about having (even a minimal) version of Elasticsearch available, with the least headache possible, to perform what we call Unit Tests, but ones that mean nothing without Elasticsearch playing along for basic functionality. For those cases, shard allocation, performance etc aren't a concern.

To facilitate that, we ended up doing the following:

  1. In our testing framework (which happens to be testing our classes and storm topologies we built with it) we took dependency of elasticsearch and the integration tests package.
  2. An abstract class BaseTestWithElasticsearch uses as our base class for unit tests requiring Elasticsearch. The initialiseElasticsearch() and shutdownElasticsearch() are executed at the beginning and end of each test suite / class using the runner's attributes (historically we use TestNG, so @BeforeClass and @AfterClass). Using TestNG wasn't possible before because of the randomizedtesting framework used internally by ES testing.
  3. Initialising Elasticsearch is done by first trying to figure out if there is a cluster available (mostly via configs and env vars) and if there isn't, we initialise an instance of InternalTestCluster. Since the original class has a lot of mocks and randomised testing stuff that we don't really need, we copied it over and stripped it of that complexity. Wasn't too hard to do.
  4. If an external, real ES cluster is indeed available, we initialise an instance of ExternalTestCluster with the URLs to initialise TransportClients with. We use the original class in that case.
  5. Since for some of our internal tests we need to inherit from another class to facilitate Storm topology testing etc, we also enable using BaseTestWithElasticsearch via composition. That is not currently possible via the ES testing infrastructure and provides great aid for our infrastructure.
  6. The BaseTestWithElasticsearch class is where we have all the REALLY USEFUL syntactic sugar around indexing, index creation, maintenance etc. Save lots of code when writing tests and populating initial test data, or validating state.

Wherever we need to test cluster configurations, real integration etc, we obviously use real cluster deployments. This isn't the case in many of our tests, and exactly why we could use such support from the official ES project. As it's already there, it might be worth knowing that there are people who find it useful before deciding to remove it or bluntly declaring that if you use it you are doing something wrong.

We will work on extracting this from our code base and open-sourcing it soon in case someone else finds it useful.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >enhancement Team:Delivery Meta label for Delivery team v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fail startup on jar hell
8 participants