Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IDEA] Add ability for Resource/Channel-specific classloaders to load "child-first" or "parent-first" #5644

Closed
pladesma opened this issue Jan 30, 2023 · 20 comments
Labels
enhancement New feature or request Fix-Commited Issue fixed and will be available in milestone Internal-Issue-Created An issue has been created in NextGen's internal issue tracker RS-9482
Milestone

Comments

@pladesma
Copy link
Collaborator

When users include custom libraries in a Library Resource and then include that resource in a channel, a custom ClassLoader gets created. Java’s built-in classloaders load classes “parent-first”, meaning that it first looks at the parent ClassLoader to see if the class is already loaded and, if so, just uses that class.

This means that if someone attempts to use a class that is already included with Mirth Connect (for example a PostgreSQL driver, a different version of a library like PDFBox, etc.), it will still use the class loaded from the parent ClassLoader, instead of the custom version the user wants to use.

This would entail adding a new “Load Classes Parent-First” or similar option to Resources. If disabled, then the ClassLoader that gets created would always attempt to load classes from the included libraries first, before delegating to the parent ClassLoader.

We will be changing the default from parent-first to child-first (so we will use the custom libraries by default) since this really how it should have always worked.

@pladesma pladesma added the enhancement New feature or request label Jan 30, 2023
@pladesma pladesma added this to the 4.3.0 milestone Jan 30, 2023
@pladesma pladesma added Internal-Issue-Created An issue has been created in NextGen's internal issue tracker RS-9482 Fix-Commited Issue fixed and will be available in milestone labels Jan 30, 2023
@jonbartels
Copy link
Contributor

#4075

@jonbartels
Copy link
Contributor

The label says "fix-committed" will this have a public branch or PR prior to the 4.3.0 release?

@pladesma
Copy link
Collaborator Author

pladesma commented Feb 1, 2023

Yes, I believe we should be able to push our development branch prior to the release.

@ChristopherSchultz
Copy link
Contributor

Looks like this was indeed pushed to development branch. It's not completely clear to me how to use this feature yet, as the commit doesn't include any documentation or instructions.

@pladesma
Copy link
Collaborator Author

pladesma commented Feb 8, 2023

We haven't updated our documentation yet, but basically on the Resources settings screen, there's a new checkbox for each resource called "Load Parent-First" that you can enable or disable. When disabled, the resource will load classes child-first.

@pladesma pladesma closed this as completed Feb 8, 2023
@ChristopherSchultz
Copy link
Contributor

Is MirthContextFactory considered to be a "public API"? If so, these changes break backward compatibility by changing the signature of the one-and-only-one constructor.

If the previous constructor is left there with a trivial call to a new constructor with false as the default value for loadParentFirst, then the whole PR could be smaller (existing code need not be changed unless necessary) plus you would achieve backward-compatibility for those who rely upon it.

@narupley
Copy link
Collaborator

I would not consider it to be "public API", no. In this case I considered that new flag to be important and should be actively considered anytime that MirthContextFactory class is used, rather than passively defaulted.

@ChristopherSchultz
Copy link
Contributor

there's a new checkbox for each resource called "Load Parent-First" that you can enable or disable. When disabled, the resource will load classes child-first.

It's not clear to me what the ClassLoader hierarchy looks like for a channel. You can assign one or more "Library Resources" to a channel. What ends up being the parent? What ends up being the child or children? The pre-existing "isolated ClassLoader" had even different semantics, like having the JVM's application ClassLoader as the parent without including the Mirth server libraries, etc.

Will the child-first ClassLoader behavior extend to everything related to the channel? e.g. plug-ins invoked from source transformers, any js code which attempts to call Java code, etc.

The existing implementation does not attempt to protect Mirth from a channel supplying a class that may conflict with something Mirth either provides itself (such as a dependency) or Mirth itself. I think the ChildFirstClassLoader should refuse to load classes within certain packages e.g. com.mirth.*.

@narupley
Copy link
Collaborator

There are no "parents" or "children" among different resources. Whatever resources are included with a channel or connector, all libraries from all those resources are included in the ClassLoader.

By default it was using Java's standard URLClassLoader, which always loaded parent-first. Now, by default, those channel-specific or connector-specific classloaders will load child-first, unless you enable the new Load Parent-First option.

This only extends to the places where that classloader is used, which is typically just any JavaScript script in the channel. There are a few other places that use it to load classes too, like the Database Reader/Writer, JMS Reader/Writer, and Web Service Listener/Sender.

You can also get more fine-grained in your Channel Dependencies settings, and decide which resources you want to include in which specific connectors. So if you need a resource just to load a driver for your Database Reader, then you only need to include that resource on the Source Connector.

There was some discussion about whether to disallow some packages to be loaded child-first like java. or com.mirth. or so on, but I don't think it's necessary. Custom libraries are already introducing custom code into the JVM, so it's up to the administrator of the Mirth Connect instance to ensure that those libraries are correct. Maybe they actually do want to override a com.mirth class for a specific channel. For example you could include an older version of mirth-client-core.jar to send API requests to a different MC server that may be on a different version than your server.

@ChristopherSchultz
Copy link
Contributor

ChristopherSchultz commented Mar 1, 2023

There are no "parents" or "children" among different resources.

But what is the parent of the channel's ClassLoader? Is it the Mirth class loader, or do you get something which is (somewhat?) separate from that?

This has significant bearing on whether the channel can use a library whose version is different from that provided by Mirth itself.

Maybe they actually do want to override a com.mirth class for a specific channel. For example you could include an older version of mirth-client-core.jar to send API requests to a different MC server that may be on a different version than your server.

That may be the case, but consider this situation:

The channel needs com.mirth.Foo to run properly. The channel supplies its own com.mirth.Foo class.

Assuming that Mirth's server ClassLoader is the parent of the channel's, if Mirth, during message processing, loads com.mirth.Foo first, then Mirth will work while the channel's libraries, etc. may fail because the version does not match. If the channel-supplied library loads that class first, then when Mirth uses it, it may fail due to a version-mismatch.

The child-first ClassLoader only loads child-first if the parent hasn't already loaded the class. It always checks the parent first to see if the class is defined at all before loading the class itself.

The same is true of any other library Mirth may supply, such as commons-foo or slf4-bar.

If a channel can't be isolated from the primary Mirth ClassLoader (which has loaded a large number of classes and libraries), then the child-first ClassLoader is of limited usefulness IMO.

@ChristopherSchultz
Copy link
Contributor

My mental model of this is that of the Java Servlet Specification: the container provides a small number of packages and nothing else. If the container needs to load supporting libraries, it can do that in a separate ClassLoader that the applications (the analog of Mirth channels, here) are never bothered with.

This model may not be perfect for Mirth because Mirth provides significant features, services, etc. e.g. extensions, connectors, etc. which are difficult to separate from e.g. stuff running from javascript.

What would be most useful is the ability to use "local" libraries to do work scripted from javascript and/or from the transitive libraries that are being used from there. Connectors and extensions probably don't need to be overridable in that way.

@narupley
Copy link
Collaborator

narupley commented Mar 1, 2023

But what is the parent of the channel's ClassLoader?

The same as it was before. The Mirth overall main classloader.

if Mirth, during message processing, loads com.mirth.Foo first, then Mirth will work while the channel's libraries, etc. may fail because the version does not match. If the channel-supplied library loads that class first, then when Mirth uses it, it may fail due to a version-mismatch.

That is not correct. Loading an overridden class in a channel-specific child-first classloader will not cause Mirth to start using that class. Nor vice versa.

The child-first ClassLoader only loads child-first if the parent hasn't already loaded the class. It always checks the parent first to see if the class is defined at all before loading the class itself.

That is not correct. It will only check the parent if the class has not been loaded by the child-first classloader yet.

@tonygermano
Copy link
Collaborator

I haven't looked closely at the implementation yet, but my assumption would be that a child-first classloader would

  1. use a class which it has already loaded
  2. else, attempt to load the class from the resources available to it
  3. else, check to see if the parent can provide the class.

If the child-first ClassLoader skipped loading a class when the parent already has it loaded, that's how the standard java ClassLoader model (and URLClassLoader) works.

This model may not be perfect for Mirth because Mirth provides significant features, services, etc. e.g. extensions, connectors, etc. which are difficult to separate from e.g. stuff running from javascript.

I disagree with this. As more third-party extensions are added, it is easy to have conflicts with

  1. library versions between two extensions
  2. library versions between an extension and the core mirth server

I already opened a separate issue (#4350) to address this, and it is perfectly illustrated in #4897

@ChristopherSchultz
Copy link
Contributor

I haven't looked closely at the implementation yet, but my assumption would be that a child-first classloader would

1. use a class which it has already loaded
2. else, attempt to load the class from the resources available to it
3. else, check to see if the parent can provide the class.

I think all hell breaks loose if you don't check the parent to see if the class has already been loaded "upstream".

If the child-first ClassLoader skipped loading a class when the parent already has it loaded, that's how the standard java ClassLoader model (and URLClassLoader) works.

No, the standard URLClassLoader asks the parent to load the class and, failing that, will load the class itself. The child-first loader will ask the parent if it's already loaded, then load it itself if it's not there. There is a subtle difference there.

This model may not be perfect for Mirth because Mirth provides significant features, services, etc. e.g. extensions, connectors, etc. which are difficult to separate from e.g. stuff running from javascript.

I disagree with this. As more third-party extensions are added, it is easy to have conflicts with

1. library versions between two extensions
2. library versions between an extension and the core mirth server

I already opened a separate issue (#4350) to address this, and it is perfectly illustrated in #4897

Given the complexity of Mirth, I'm not sure if a "truly useful" child-first ClassLoader is possible.

If extensions must be isolated from each other but also still be supported by the Mirth core (which itself relies on certain libraries) then you already have a problem: a single extension can conflict with Mirth directly. Adding multiple extensions to the mix further complicates things. Then you have all the places where js code can run which may or may not interact with any particular extension, library, etc.

I used to think that the "isolated ClassLoader" was useless because it forced you to add every single library you might want or need to support your script, etc. but maybe in hindsight, it's the only way for things to actually work. It's like the OSGi model where everything is isolated. I'm afraid that a child-first ClassLoader won't (or possibly can't?) provide the balance I was looking for between "Mirth provides everything I could need" and "I want to bring my own everything, some of which will actively break Mirth components used in the channel".

@narupley
Copy link
Collaborator

narupley commented Mar 1, 2023

The child-first loader will ask the parent if it's already loaded, then load it itself if it's not there. There is a subtle difference there.

Again, this is not correct. The child-first classloader will check if it itself (not the parent) has loaded the class. If not, it will attempt to load the class. If that fails, only then will it delegate to the parent.

@narupley
Copy link
Collaborator

narupley commented Mar 1, 2023

You can see for yourself with a simple test.

JavaScript Writer channel:

logger.info(new org.bouncycastle.jce.provider.BouncyCastleProvider().getInfo());

Spits out BouncyCastle Security Provider v1.71.

Now add a custom version of the BC JARs in your own custom resource, loading child-first, and include it on that channel.

Now it spits out BouncyCastle Security Provider v1.72 (or whatever custom version you included).

Go back to the resource and check "Load Parent-First", and save the resource. You do not need to redeploy the channel. Just send another message, and it'll spit out BouncyCastle Security Provider v1.71 again.

@tonygermano
Copy link
Collaborator

If extensions must be isolated from each other but also still be supported by the Mirth core (which itself relies on certain libraries) then you already have a problem: a single extension can conflict with Mirth directly

Yep

I think all hell breaks loose if you don't check the parent to see if the class has already been loaded "upstream".

Can you explain this? I thought the whole point was to override the parent class if a different version is available to the child.

No, the standard URLClassLoader asks the parent to load the class and, failing that, will load the class itself. The child-first loader will ask the parent if it's already loaded, then load it itself if it's not there. There is a subtle difference there.

I fail to see the effective difference between these two options. The parent will nearly always have the class loaded if it's a class being provided by mirth. And if it doesn't, there's nothing stopping another channel without a custom resource from causing the parent to load the class, thus changing the behavior of the channel with the custom resource.

I think I would actually prefer something like the OSGi model if I understand how it works correctly. If you could register bundles of jars with mirth that could then be utilized by the server, extensions, other bundles, and channels (similarly to the current resource dirs) without worrying about conflicting versions of the same library existing in different unused bundles, that would be ideal.

I realize that is out of scope for this issue, though, and I think this is a step in the right direction.

@ChristopherSchultz
Copy link
Contributor

ChristopherSchultz commented Mar 2, 2023

I think all hell breaks loose if you don't check the parent to see if the class has already been loaded "upstream".

Can you explain this? I thought the whole point was to override the parent class if a different version is available to the child.

Classes in Java are defined by both their FQCN and by their ClassLoader. If both the parent and child ClassLoader separately load com.example.Foo then the following behavior can occur:

From ClassLoader X:
Foo a = new Foo("a");
Foo b = new Foo("a")
System.out.println(a.equals(b)); // prints "true"
From ClassLoader Y:
Foo c = new Foo("a");
System.out.println(a.equals(c)); // prints "false"
System.out.println(a.equals(c)); // prints "false"

This of course depends upon the implementation of Foo.equals, but that is commonly something like:

public boolean equals(Object o) {
  return null != o && o instanceof Foo && ((Foo)[Something foo-related]);
}

(A better implementation checks for this.getClass().isAssignableFrom(o.getClass()) but the point is the same).

You can't treat o like a Foo unless it's actually a Foo. But this check fails because the objects were created from classes loaded by two separate ClassLoaders. So unless your class has been specifically written to allow cross-ClassLoader comparisons, it's likely to fail this kind of usage. Note that this includes heterogeneous Sets and Map keys, etc.

Note that this isn't any different from a situation where two unrelated ClassLoaders are being used. That is, this situation is not unique to parent/child loaders (in either load-order configuration); it can happen with ClassLoaders which are essentially unrelated. I just think that doing things this way makes it easier to mix things up.

This implementation allows a channel (resource, really) to bring its own copies of any class it wants, including things like java.lang.Object or, more likely, com.mirth.whatever. If this happens, the extensions running within the channel can become seriously confused or broken because an incompatible dependency (eg some lib) can be loaded from the child while the dependent code (eg. a Mirth extension, or even the core channel code itself) can be loaded from the parent and then you have a boom.

This is why I've asked for clarification: is Mirth doing anything to protect the running server/core from a developer/library/channel that is either malicious (unlikely, because as Nick rightly claims, anyone who can install classes can own the server) or misguided? I don't see such protections and I think it's likely to cause problems with all but trivial cases.

I had a version mismatch with http-client/http-core with one of my libraries and an older version shipped with Mirth and I had to re-write my code to use the older API because they were just incompatible. If I had the child-first loader available, I would have happily dropped those libraries into it and tried to use them, potentially breaking things in the core during interactions between the two.

No, the standard URLClassLoader asks the parent to load the class and, failing that, will load the class itself. The child-first loader will ask the parent if it's already loaded, then load it itself if it's not there. There is a subtle difference there.

@narupley points out that this is wrong, and he's correct: I was thinking that findLoadedClass was consulting the parent, but it's not: it only looks to see if this ClassLoader loaded that class.

I think I would actually prefer something like the OSGi model if I understand how it works correctly. If you could register bundles of jars with mirth that could then be utilized by the server, extensions, other bundles, and channels (similarly to the current resource dirs) without worrying about conflicting versions of the same library existing in different unused bundles, that would be ideal.

This may be where the project needs to go in order to solve "all possible problems" with custom libraries, etc. but it's a Big Job to do something like that.

I realize that is out of scope for this issue, though, and I think this is a step in the right direction.

+1

I need to build myself a running version of the dev branch and actually play with it instead of just reading code and stirring things up on GH.

@ChristopherSchultz
Copy link
Contributor

This only extends to the places where that classloader is used, which is typically just any JavaScript script in the channel. There are a few other places that use it to load classes too, like the Database Reader/Writer, JMS Reader/Writer, and Web Service Listener/Sender.

A thorough explanation of this will be very helpful. It's not clear to me (or probably most users) where "this ClassLoader is used." I think this was the root of my confusion.

@narupley
Copy link
Collaborator

narupley commented Mar 2, 2023

image

https://docs.nextgen.com/bundle/Mirth_User_Guide_42/page/connect/connect/topics/c_Library_Resources_set_dep_connect_ug.html

For the Source Connector Receiver:

  • Database Reader (to load the JDBC driver, or for the JS script)
  • JMS Reader (to load the connection factory)
  • JavaScript Reader (for the JS script)
  • Web Service Listener (to load the custom endpoint class, or for any Java/JS authentication provider)
  • HTTP Listener (for any Java/JS authentication provider)

For the Destination Connector Dispatcher:

  • Database Writer (to load the JDBC driver, or for the JS script)
  • JMS Writer (to load the connection factory)
  • JavaScript Writer (for the JS script)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Fix-Commited Issue fixed and will be available in milestone Internal-Issue-Created An issue has been created in NextGen's internal issue tracker RS-9482
Projects
None yet
Development

No branches or pull requests

5 participants