Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Dependency Isolation #302

Closed
wants to merge 13 commits into
from

Conversation

Projects
None yet
8 participants
Collaborator

xumingming commented Aug 20, 2012

for issue #115

Owner

nathanmarz commented Aug 21, 2012

So I have a bunch of questions/comments. I want to make sure that we get this right.

  1. What happens if I define my bolt like this: new BasicBoltExecutor(new MyBoltImplementation(...))? The bolt that Storm deserializes is a system class, but the inner class I provide is a user class. Storm will use the system classloader for BasicBoltExecutor, will it then know to use the user classloader for the inner class?
  2. What happens is the user code creates a system object like backtype.storm.utils.RotatingMap that's only on the system classpath?

More generally, the semantics we should have are:

  1. When a system class loads a class, it should first check the system classpath and then check the user classpath.
  2. When a user class loads a class, it should first check the user classpath and then check the system classpath.

It would be good to explain exactly how this pull request satisfies these two properties.

Finally, we need to add tests to the codebase that validate the behavior. I propose something like this:

  1. We make a subproject in test/test-topology with its own build that makes a fake topology.
  2. We create a class backtype.storm.testing.Foo in both the Storm codebase and the test-topology codebase
  3. We add a hook to the build that will build an uberjar for test-topology before running the tests.
Collaborator

xumingming commented Aug 28, 2012

What happens if I define my bolt like this: new BasicBoltExecutor(new MyBoltImplementation(...))? The bolt that Storm deserializes is a system class, but the inner class I provide is a user class. Storm will use the system classloader for BasicBoltExecutor, will it then know to use the user classloader for the inner class?

new BasicBoltExecutor(new MyBoltImplementation(...)) this line of code will be part of a XXXTopology class, XXXTopology will be loaded by TopologyClassLoader, then TopologyClassLoader will try to load XXXTopology's symbolic links(which are BasicBoltExecutor and MyBoltImplementation in this case), BasicBoltExecutor will be loaded by the AppClassLoader; for MyBoltImplementation, it will first try to load it with TopologyClassLoader, and find it!.

Collaborator

xumingming commented Aug 28, 2012

What happens is the user code creates a system object like backtype.storm.utils.RotatingMap that's only on the system classpath?

Storm will first try to load RotatingMap with TopologyClassLoader, becuase it is the system class loader(xumingming/storm@d7ea887), but can not find it, so it will delegate the loading to the parent classloader: AppClassLoader, becuase storm jar is in the classpath, so AppClassLoader will find the class and load it.

Collaborator

xumingming commented Aug 28, 2012

More generally, the semantics we should have are:

  1. When a system class loads a class, it should first check the system classpath and then check the user classpath.
  2. When a user class loads a class, it should first check the user classpath and then check the system classpath.

Only the second semantics is necessary. System classloader doesn't need to care about user class, it is guaranteed by the symbolic links theory.

Collaborator

xumingming commented Aug 28, 2012

For the testing, I have tested in my local desktop, but there seems no good way to put the tests into the unit test: we need to run storm run xxxx command for local mode.

Owner

nathanmarz commented Aug 28, 2012

OK. At the very least, we should do the tests in its own project or subproject. Ideally the tests would be set up to automatically test against the latest version of Storm.

Collaborator

xumingming commented Aug 29, 2012

How about something like this?

  1. Add a subproject in $STORM_HOME/test/classloader_test
  2. We provide a script named run_test.sh in $STORM_HOME/test/classloader_test/bin
  3. run_test.sh will run the storm run xxx command and will show the classloader of backtype.storm.testing.Foo.

One thing is that in order to make backtype.storm.testing.Foo loaded by storm core code, we need to reference it somewhere in the codebase, how about put it in backtype.storm.testing.clj?

Collaborator

xumingming commented Sep 7, 2012

added the testing for BasicBoltExecutor

mkscrg commented Oct 9, 2012

Any recent movement on this PR? We're currently having trouble with the zookeeper artifact, where

storm -> curator -> zookeeper-3.3.3
hbase -> zookeeper-3.4.3-cdh4.0.1

And by trouble I mean I don't think we can use hbase from Storm until this is resolved. (Unless we do something drastic like fork hbase and zookeeper, re-root all of zookeeper's packages, and update all of hbase's imports.)

Owner

nathanmarz commented Oct 9, 2012

@mkscrg Theoretically, this patch should work, but it definitely could use more testing and feedback. If you could give it a shot, I'd appreciate it.

mkscrg commented Oct 9, 2012

Ok, I just tried using this to run our topology in local mode via the new storm run command. I get this error:

log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@37b90b39] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [backtype.storm.classloader.TopologyClassLoader@766e3d60].
log4j:ERROR Could not instantiate appender named "A1".
...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/mcraig/Development/storm/storm-0.8.2-wip8/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/mcraig/Development/voltron/application/target/voltron-application-0.0.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
...
log4j:WARN No appenders could be found for logger (backtype.storm.zookeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I think the first section is the most relevant. The A1 appender is specified by storm's logging config, but it's getting its ConsoleAppender instance from the TopologyClassLoader. How should this be dealt with?

Collaborator

xumingming commented Oct 9, 2012

I guess this is due to how log4j classes are loaded. Our solution is based on a assumption that all storm's dependencies will be referenced by the Storm core class, seems log4j is not? I will have a deep look into it.

Collaborator

xumingming commented Nov 13, 2012

@nathanmarz The reason for the logj4j failure is indeed due to how log4j locate classes, it is analyzed deeply in this article: http://articles.qos.ch/classloader.html . I quote:

"JCL relies on dynamic discovery to determine the user's preferred logging API. In theory, log4j, java.util.logging and Avalon logkit are supported. In order to locate classes and resources, JCL's discovery mechanism relies exclusively on the thread context class loader which is by definition the class loader returned by the getContextClassLoader() method for the current thread. As long as the thread context class loader (TCCL) is not explicitly defined, it defaults to the system class loader, that is the class loader used to load the application."

In short, log4j relies on the ContextClassLoader to locate the classes. i.e. If class A is loaded by ClassLoaderA, class A depends on log4j, the log4j MAY NOT loaded by ClassLoaderA, instead it is loaded by the ContextClassLoader which could be other class loaders, so we saw the error:

"org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.

So we still need to set the ContextClassLoader to the right classloader at every entry point from Storm Server context to user topology's context. But as you have said before, there are a lot of entry points from storm server code to user topology's code, we need to think more how to apropriatly implement this.

coltnz commented Nov 24, 2012

Its re-inventing the wheel not to use OSGi. Its really not that complicated.

I'd recommend squashing some of these commits to avoid unnecessary git log clutter

mkscrg commented Nov 30, 2012

Any updates on this issue? As our system grows and gains dependencies, we're running into this more and more often.

Contributor

strongh commented Apr 4, 2013

Bump.

Also running into similar problems when trying to use httpcore-4.2 in our code vs. the httpcore-4.1 dependency from storm.

lorthos commented Oct 10, 2013

Anyone aware of a Maven Shade Plugin equivalent for Clojure? I guess this would solve most of our problems

@xumingming xumingming closed this Dec 14, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment