Support for Java class extension from within Python #420

Open
Thrameos opened this issue May 15, 2019 · 18 comments
Assignees
Labels
enhancement Improvement in capability planned for future release on-hold Problem requiring further user input to address
Projects
Milestone

Comments

@Thrameos
Contributor

I took a shot at extending Java objects using Python. To do so, we would need to construct two classes from within Python. One is an extension class which overrides each of the targeted methods and points them to the other, an interface holding the methods to implement. The annotations for the task are pretty easy and the rest of the mechanics are well supported by current JPype. We can even call superclass methods by virtue of JNI. The only limitation appears to be difficulty with overriding the constructor. Thus the question is how we get the two required pieces compiled on demand.

The need for this development is pretty compelling. There are a lot of Java classes that have abstract methods which must be overridden to implement. Though few new APIs use this style, there are plenty of older ones which still require it. Thus long term it is a requirement for JPype to be considered a complete solution.

I took the first shot at this using the MemoryCompiler. Compiling from within Python is a fairly tall order, but it is possible. However, I found a lot of downsides. The Java compiler is huge, incredibly slow, and spawns into native C code. Further, it is only available when running from a JDK. Thus I am looking for other solutions.

I have evaluated Java assembler solutions such as Jasmin and Lilac. Lilac is certainly the most advanced and capable, with exterior goals that cover our needs. The author also seems to have done the homework on what is needed for an assembler/disassembler capable of "a perfect round trip." Thus I started by studying their inner workings. It is certainly plausible to get what we want from them. However, thus far they are all far from ideal. The assemblers are general purpose and suffer from the typical Maven philosophy of pulling in whatever you need. The result is that for one or two functions you are holding on to 4 dependent jars, each of which must be made to work. For something as large as an assembler this leads to 20 or more dependencies. It is not that Maven doesn't make this possible, but as a programmer from the 80s, I don't like to depend on that much 3rd party stuff.

The second problem I have had with each of them is that the assemblers all have poor separation between the data pieces and the process. The data classes in the assemblers do the job of packing (and unpacking in the case of Lilac) and rule checking the code. This is way too heavyweight for our purpose. For our usage, we just need a Java class that takes a class and a list of method references, constructs our two classes within the data model directly, then calls the assembler to pack the data into a class file in the class loader. I don't really need to go through creating a big text file, just to have a lexer/parser chop it back into a tree, and then decode it back to a file, when constructing the tree directly is trivial and I want the class to go straight to the loader. Therefore, I wanted to strip them down to just a few K worth of functions that can carry that task out. But because of all the functionality built into the current assembler classes, it is like a big Jenga tower. I can't pull out the pieces I need because there is too much stuff above them.

Studying the assemblers shows it isn't that hard once all the boilerplate is complete. There are a bunch of table encoders needed to put the file back together, and some decoders to test that the process is working. Java has a few support classes that make it possible to do some of the work without 3rd party libraries (DataInput, DataOutput, JarFile, etc). But because the Java compiler is not actually written in Java, it is just pieces.

Since Lilac can't be cut to pieces and I haven't got a response from the author, it seems like I am going to craft the pieces I need. I can still test the back end encoding using a converter from the existing assembler to my mini one, so I can likely cut the work in half compared to a full tool suite. Studying jas, Jasmin, Lilac, and the Java docs makes the required path pretty clear, especially when I can peek over into well-trodden ground to see how others interpreted the spec even if I can't copy from it directly. After about 4 hours of effort, I pulled together most of the class file data structure. But I anticipate that at least 30 hours will be needed before it is complete, which puts it pretty far down the road before I have something to submit considering other priorities.

If anyone is compelled to assist in the effort I can post the code for the mini-jasm to GitHub.

@Thrameos Thrameos self-assigned this May 15, 2019
@Thrameos Thrameos added the enhancement Improvement in capability planned for future release label May 15, 2019
@Thrameos Thrameos added this to To do in JPype 1.0 via automation May 15, 2019
@Thrameos Thrameos added this to the 0.8 milestone May 20, 2019
@Thrameos
Contributor Author

This still has several pieces missing. We need to remove the bootstrap loader and add an altered copy of the asm library before we can make this work. Therefore, I am pushing this to 0.9.

@Thrameos Thrameos removed this from To do in JPype 1.0 Mar 30, 2020
@Thrameos Thrameos added the on-hold Problem requiring further user input to address label Apr 6, 2020
@petrushy

Hi,

This feature is quite necessary in some use cases. One way to do it is the way it is done in JCC: have specific Java classes that are used for the classes to be subclassed, and then take care of selecting which Python routine to call within Java. It is far from ideal but could maybe be used as an intermediate stage.

An example of how this can be used is one of the wrapping classes of orekit:
https://github.com/petrushy/Orekit/blob/python-wrapper-additions-v10.1/src/main/java/org/orekit/propagation/PythonFieldPropagator.java

As I understand it, it isn't possible to do something like this in JPype today?

Regards

@Thrameos
Contributor Author

This item is on the roadmap to 0.9. I have a basic prototype which uses the Java asm library to rewrite a Java class into an interface with an extension hook for Python. I am currently juggling a number of development items and have 3 items ahead of this on the schedule. I can try to bump it up in the schedule if there is a strong interest, though I doubt I can push it in front of the 0.8 release which is item 1.

The task list for this is

  • Replace the thunk bootstrap path with a jar loaded alongside the module. (Currently the thunk solution prevents us from pulling in the asm library.)
  • Consider using JarJar to rename asm symbols to avoid conflicts with user code.
  • install ivy to pull in external libraries to be installed with JPype. (complete in another branch)
  • Define decorators for extension.
  • Set up a prototype class implementing an extension of an Object class. (Done when I last attacked this to evaluate the difficulty.)
  • (*) Copy the prototype methods into a recipe file which is used to create the stub.
  • (*) Deal with all the edge cases (variable arguments, exceptions, exception frames)
  • Install stub generator classloader into JPype context (easy now, was hard earlier).
  • Write backend for decorator hook @JExtends which will call the stub generator, scan the class for @JOverride and push out the class definition.
  • (*) Deal with how the extended class will interact in terms of Java methods.
  • Test, test, test.
  • Document. (actually this is mostly complete as I wrote out a guide at the start of the process defining how this would look).

The items with the stars are the long poles on the tent.

@Thrameos Thrameos added this to To do in JPype 2.0 Apr 30, 2020
@Thrameos Thrameos modified the milestones: JPype 0.8.0, JPype 0.9.0 May 15, 2020
@enjoybeta

Extending classes is an important feature for inheritance. Looking forward to seeing this feature! Do we have any updates?

@Thrameos
Contributor Author

I completed a prototype of it a few months back as a proof of concept, but it has a lot of issues that need to be resolved.

In particular, the prototype requires that asm take the class and create two new classes. The first class is an extension of the existing class which holds a proxy object. The second, the proxy type, is the same class with all of the methods converted into an interface. So the hard part is how to decide if a method is implemented by the Python class.

One solution is to have the proxy invoker check for the existence of the Python method in the dict and throw an exception if it isn't there. Unfortunately, exception stacks are pretty expensive, so that is not a great solution. We could use a flag to verify that it was actually called, but this isn't thread safe: if two methods get called at once, the flag may not reflect the actual value for this call. So that leaves the two-part invocation model. In this we have one JNI call that fetches the Python method if present; if it is not present, the stub calls the base Java method. Otherwise it launches the Python method using a second JNI call. We didn't have the infrastructure for proxies to directly load complex JNI methods, but I think that is finally resolved.
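
The two-part invocation model can be sketched in plain Python, with no JVM involved. This is only an illustration under stated assumptions: `JavaBase`, `fetch_override`, and `invoke` are hypothetical names standing in for the Java base class and the two JNI entry points, not anything that exists in JPype.

```python
class JavaBase:
    """Hypothetical stand-in for the Java base class being extended."""

    def toString(self):
        return "JavaBase"


def fetch_override(py_overrides, name):
    # Stage 1: a cheap lookup that returns None rather than raising,
    # so a missing override never costs an exception stack.
    return py_overrides.get(name)


def invoke(base, py_overrides, name, *args):
    method = fetch_override(py_overrides, name)
    if method is None:
        # No Python override present: call the base implementation.
        return getattr(base, name)(*args)
    # Stage 2: launch the Python implementation.
    return method(base, *args)


base = JavaBase()
overrides = {"toString": lambda this: "Py:" + JavaBase.toString(this)}
with_override = invoke(base, overrides, "toString")
without_override = invoke(base, {}, "toString")
```

Because the lookup result is local to each call, two threads invoking different methods do not share any state, which is the property the flag approach lacks.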

The Python implementation is straightforward. We need the meta class to recognize the attempt to extend a concrete Java class and call the invoker hooks which generate the two classes. We then scan for the JOverride methods, create a Python proxy which implements those methods, and install the proxy in the object instance.

There is also a memory loop issue. The Python instance points to a Java object which has a reference back to the Python object. Thus for this to function at least one of these needs to be a weak reference type or every instance will live forever.
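
A minimal pure-Python sketch of breaking such a loop with a weak reference follows. It assumes nothing about JPype internals: `JavaSide` and `PySide` are hypothetical stand-ins, and in the real case the two ends live under two different garbage collectors, which this sketch does not model.

```python
import gc
import weakref


class JavaSide:
    """Hypothetical stand-in for the Java object; it only holds a weak
    reference back to its Python peer."""

    def __init__(self, py_obj):
        self._py = weakref.ref(py_obj)

    def python_peer(self):
        # Returns None once the Python object has been collected.
        return self._py()


class PySide:
    """Hypothetical Python instance holding a strong reference forward."""

    def __init__(self):
        self.java = JavaSide(self)


obj = PySide()
java = obj.java
alive_before = java.python_peer() is obj

del obj        # drop the only strong reference to the Python side
gc.collect()   # not needed on CPython, but harmless elsewhere
alive_after = java.python_peer() is not None
```

Which side should hold the weak reference is a design choice the thread leaves open; the sketch picks the back reference only for illustration.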

The last issue is properly cloning the methods that need to be overridden. Simple argument lists can be done with the ASM visitor pattern, but in some cases we also need to copy the exception list. I think I have mastered this pattern in my last attempt. But there may be edge cases.

Either way the newly created classes need to be loaded into memory to work. This means we have to call a custom class loader. The new dynamic classloader should be extendable for this task pretty simply. The alternative that I did was to make a custom class loader which is tied to ASM directly. That is a fairly common pattern in which the loader and generator are tied together as one class. I did also run into some security problems as the loader that creates the class is different from the one that creates the application classes. In some cases

The last wrinkle is how to make this work on Android. The Android "JVM" is not actually Java bytecode but DEX. I can likely get the new code to simply not work on DEX by using a patch to remove the code, but if we did want to work there it would require a completely different solution than ASM. There is also the concept of the security model that we face there. Either way we need to modularize so that the with and without options don't require a massive amount of patching.

I have put a fair amount of thought and prototyping effort, but I will likely need to make a hard push to actually get something which is usable. It is not terrible, but thus far I haven't had much in the way of compelling use cases within my local group to motivate me so it keeps getting put on the back burner.

@Thrameos
Contributor Author

Thrameos commented Nov 9, 2020

I took a shot at this one over the weekend.

The hardest part is how to properly allow the user to call "super" during the construction stage and potentially for members. In Java it is possible to call the base constructor only at specific times during the initialization process. It seems like I need a special object type when a proxy method gets called which will contain a special super_ member.

I think if possible it should look like this...

class MyObject(java.lang.Object):   # use direct inheritance to indicate that this will be implementing a Java object.

    @JPrivate(java.lang.String)  # A decorator is required to reserve a private dictionary spot in the Java extension.
    name = None

    @JConstructor(java.lang.String)  # A decorator is required to tell what the Java formal arguments are.
    def __init__(this, name):   # The actual name of the argument doesn't matter as we will need to name mangle it
        this.super_()  # Call Object init()
        this.name = name

    @JOverride   # We can use the usual JOverride to define a dispatch or use arguments to create a name-mangled one.
    def toString(this):
        return this.super_.toString() + ":" + this.name

Here when the proxy call gets back it would give you a special copy of "self" which has reserved access to the get/set and super_. Rather than following the usual rules, this would give access to private members (so long as they do not conflict with existing symbols) as well as directly calling the parent's method (using a non-virtual call). It would also have to "close" itself at the end of the method so that the access is removed.
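
The "special copy of self that closes itself" idea can be sketched in plain Python. Everything here is hypothetical: `ThisHandle`, `super_call`, and `dispatch` are illustrative names (the proposal above spells the access as `this.super_`), and the non-virtual call is imitated by looking the method up on the parent class rather than on the instance.

```python
class Base:
    def toString(self):
        return "Base"


class Impl(Base):
    def toString(self):
        return "Impl"


class ThisHandle:
    """Hypothetical temporary view of an instance, valid for one call."""

    def __init__(self, target, parent):
        self._target = target
        self._parent = parent
        self._open = True

    def super_call(self, name, *args):
        if not self._open:
            raise RuntimeError("this-handle used after the method returned")
        # Non-virtual: resolve on the parent class, bypassing overrides.
        return getattr(self._parent, name)(self._target, *args)

    def close(self):
        self._open = False


def dispatch(target, parent, func):
    handle = ThisHandle(target, parent)
    try:
        return func(handle)
    finally:
        handle.close()  # access is removed even if func raises


leaked = []


def body(this):
    leaked.append(this)  # deliberately leak the handle for the demo
    return this.super_call("toString") + ":name"


result = dispatch(Impl(), Base, body)
```

After `dispatch` returns, calling `leaked[0].super_call("toString")` raises `RuntimeError`, which is the "closed" behavior described above.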

I still need some more consideration on this topic before I can start implementing. Does this look reasonable?

@enjoybeta

I am not familiar with low-level logic at the moment, but your reasoning looks good to me.
Only a side note, maybe naming super_() to super_java() or something more obvious? An underscore is easy to miss

@Thrameos
Contributor Author

Thrameos commented Nov 10, 2020 via email

@Thrameos
Contributor Author

So let's start with the basics. To support Java, we need the Python class dictionary to do some things that Python doesn't allow. Python does not allow a method definition to get ahold of the class that it is being defined in, nor allow for overloading, nor does it support the concept of name mangling. So we are going to have to do some evil hacking first.

The usual way to do this is to add a decorator that will break into the process. The order of operations for creating a class is

  1. Create an outer scope with the class qualname.
  2. Execute each Python statement within that scope (this includes define statements).
  3. If a define function is called, evaluate the define and call each decorator (ordered last to first), merging the class qualname into the function qualname.
  4. Define the symbol in the local table based on the name of the define.
  5. Pass the local dictionary to the type.__new__ method as the argument members, where we can intercept the class creation process and redirect it to the class builder.
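
The ordering above can be observed directly with a small logging sketch. The names `events`, `deco`, `Meta`, and `Demo` are made up for the illustration; only standard Python behavior is relied on.

```python
events = []


def deco(tag):
    def wrap(func):
        events.append("decorator %s on %s" % (tag, func.__name__))
        return func
    return wrap


class Meta(type):
    def __new__(mcls, name, bases, members):
        # members is the class-body namespace; dunder entries added by the
        # interpreter are filtered out to keep the log readable.
        names = sorted(k for k in members if not k.startswith("__"))
        events.append("type.__new__ sees %s" % names)
        return super().__new__(mcls, name, bases, members)


class Demo(metaclass=Meta):
    events.append("class body starts")

    @deco("outer")
    @deco("inner")
    def method(self):
        pass
```

The log shows the class body running first, the decorators firing from the one nearest the function outward, and the metaclass receiving the finished namespace last, which is the point where a class builder can intercept.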

Unfortunately, in the CPython implementation the name is taken through a different path than the actual __name__ argument so that means we can't simply rename the function using the dictionary in the decorator.

Instead we have to get ahold of the outer scope, which for the decorator would be the class scope. Python has three methods to get ahold of the scope: globals, locals, and vars. And of course none of the three have access to the class scope, as it is a temporary scope which has no name. So we have to use inspect to grab the stack, walk back to find the frame, and then use it to get that dictionary. We can then name mangle the arguments and make a second private entry in the members dictionary where the class builder can find it and construct the required entry. This fixes our class overloading and name mangling issue.

Next we can then use inspect to look at the function and make sure it meets any other requirements that need to be set. This makes it so we can enforce those extra Java requirements like all methods must take Java class arguments only.

The resulting code looks something like this.

(Please note, I disavow any and all knowledge of Python. Any resemblance to actual Python is purely coincidental.)

import inspect

def JPublic(method):
    # Python doesn't make it easy to get the outer scope, so take the
    # caller's frame, which is the class body currently being executed.
    nonlocals = inspect.stack()[1][0].f_locals

    spec = inspect.getfullargspec(method)
    args = spec.args

    # Verify the requirements for arguments are met
    # Must have a this argument first
    if len(args) < 1:
        raise TypeError("Methods require this argument")
    if args[0] != "this":
        raise TypeError("Methods first argument must be this")

    # All other arguments must be annotated as JClass types
    for arg in args[1:]:
        if arg not in spec.annotations:
            raise TypeError("Methods types must have specifications")

    # Rename the method so we can support overloading. Hash the (name, type)
    # pairs so overloads that differ only in argument type get distinct names.
    hc = abs(hash(frozenset(spec.annotations.items())))
    name = "_%s$%d" % (method.__name__, hc)
    nonlocals[name] = method

    return method

class B:
    @JPublic
    def A(this):
        pass

    @JPublic
    def A(this, n:str):
        pass

    @JPublic
    def A(this, n:str, i:int):
        pass
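
What the class builder could do with those mangled entries is sketched below. This is an assumption-laden illustration, not the planned implementation: `JPublicSketch` and `JBuilder` are hypothetical names, the mangling uses readable type names instead of a hash, and the frame access via `sys._getframe` is CPython-specific.

```python
import sys


def JPublicSketch(method):
    # The caller's frame is the class body; its f_locals is the namespace
    # that will later be handed to the metaclass.
    scope = sys._getframe(1).f_locals
    types = [getattr(t, "__name__", repr(t))
             for t in method.__annotations__.values()]
    scope["_%s$%s" % (method.__name__, "_".join(types))] = method
    return method


class JBuilder(type):
    """Hypothetical class builder that groups mangled entries by name."""

    def __new__(mcls, name, bases, members):
        overloads = {}
        for key, value in members.items():
            if key.startswith("_") and "$" in key:
                plain = key[1:].split("$", 1)[0]
                overloads.setdefault(plain, []).append(value)
        cls = super().__new__(mcls, name, bases, dict(members))
        cls._overloads = overloads
        return cls


class B(metaclass=JBuilder):
    @JPublicSketch
    def A(this):
        pass

    @JPublicSketch
    def A(this, n: str):
        pass

    @JPublicSketch
    def A(this, n: str, i: int):
        pass


mangled = sorted(k for k in vars(B) if "$" in k)
```

Only the last plain `A` survives in the class, but all three definitions remain reachable through the mangled entries, so a dispatcher could pick between them by argument types.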

@enjoybeta

Thank you for your detailed explanation! I have a grip on your planned approach after reading it.
One question though: a Java library I am working on, which I want to port to Python, uses dynamic class loading. Is it feasible to load a JPype-enabled Python class in a way similar to what is shown below?

Java

public class Parent {
    ......
}

public class Schools {
    public void foobar(String className) {
        // load a child class and create objects
        ......
    }
}

Python

class Child(Parent):
    pass

def main():
    Schools.foobar(Child.__class__.__name__)

@Thrameos
Contributor Author

@enjoybeta with regard to dynamic loading we should likely take this to another thread.

@Thrameos
Contributor Author

It appears that Jython uses the __init__ rather than super.

So my best guess of syntax would be something like the following.

class MyImplementation(java.util.ArrayList):

    # Declare a Java slot of type int 
    JPrivate(JInt, count = 0)

    @JPublic
    def __init__(this, count:JInt):
        java.util.ArrayList.__init__(this, count)

    @JPublic
    def add(this, item:JObject) -> None:
        this.count += 1
        java.util.ArrayList.add(this, item)

@marscher Any thoughts? Unfortunately this is not very compatible with Jython as far as I can tell. We need a lot more annotation information as we need to support stuff from Java such as overloading. They pretty much skipped the overloading and dropped Java keyword support for stuff like super.

The Pythonic method of declaring variable slots is really not usable. I tried Python's var:type = value but there is no way to drag the type around properly. You also can't add any additional notations to an instance as decorators only work on functions and classes. The Python slot reservation mechanism is also pretty ugly with __slots__ = ["var", ...]. It can't carry types or initial values. Pushing the names and initial values as kwargs to a function works, but then we will have issues if we need additional specifiers like JPublic(JString, a="foo", b="bar", static=True, transient=True) => public static transient String a="foo", b="bar";. But then static and transient are keywords in Java, so you can't name a variable with them, and I guess that means there is no conflict potential.
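
The kwargs approach can be sketched as follows. `JFieldSketch` and the `_jfields` entry are hypothetical names chosen for the illustration, Python's `str` and `int` stand in for `JString` and `JInt` so that no JVM is needed, and the frame access is CPython-specific.

```python
import sys


def JFieldSketch(jtype, static=False, transient=False, **fields):
    # Record the declarations in the enclosing class-body namespace so a
    # class builder could find them later.
    scope = sys._getframe(1).f_locals
    decls = scope.setdefault("_jfields", {})
    for name, initial in fields.items():
        decls[name] = {
            "type": jtype,
            "initial": initial,
            "static": static,
            "transient": transient,
        }


class Example:
    # Roughly: public static transient String a = "foo", b = "bar";
    JFieldSketch(str, a="foo", b="bar", static=True, transient=True)
    # Roughly: int count = 0;
    JFieldSketch(int, count=0)


declared = Example._jfields
```

Since `static` and `transient` are named parameters, they are consumed as modifiers and never land in `fields`, which mirrors the observation that Java keywords cannot collide with field names.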

@Thrameos
Contributor Author

Thrameos commented Nov 14, 2020

@marscher I need an executive decision on something. In order to do extensions I will need to use the asm library. If I import the library, it will preclude a second copy of asm, such as the one from Kotlin, from being included at the same time, which may mess things up unless both are using the same version. The second approach would be to use JarJar or similar to rename all the symbols in the library to something else and then include it in the JPype jar. Other than the special magic to run JarJar and do the jar inclusion pattern, this one is doable. The third would be to copy their source into the JPype source tree and compile it in with a new package name. This adds headaches to maintenance.

As we have Scala and Kotlin users it seems like we should plan to avoid conflicts.

The license for asm seems pretty liberal (BSD) and would allow direct linking. Their documentation seems to indicate that is an acceptable solution so long as their copyright statement is included.

https://asm.ow2.io/license.html

Which of these options seems acceptable?

  • Just include the asm library in the JPype distribution and fail if there are conflicts.
  • Rename the symbols in asm using JarJar Links and embed the jar.
  • Embed the asm source code in the JPype source tree.

@marscher
Member

If we can include the source tree of asm as a subtree, that would be preferable as we can easily update it from upstream. How stable is asm? Do you expect lots of future updates? If not we could also do the static approach and rename using jarjar. I would just want to exclude option one for now.

@Thrameos
Contributor Author

JarJar was a no go. It is so out of date that it gets hung up on many modern jars as the ASM version it includes is ancient.

So I am going with including the source for now. ASM has changed a bit in recent years to keep up with the latest JVM changes, but we are going to be making "old" (and I mean really old... like JDK 1.5) bytecode for our hooks, so we likely have no need to track the latest ASM. We could even strip out a lot of it, as I will be using only about 20 op codes for the majority of the work and I can skip a lot of the fancy stuff. The plan is simple: create a set of native hooks that we will use to transfer control back to C, write a prototype class which exercises those hooks, decompile the prototype with javap, then write a class visitor which takes an existing class and replaces the methods with the redirection hooks. (Okay, maybe that didn't sound as simple as it should, but don't worry. I got this.)

I may have to bifurcate some things at the native directory level. Android uses a different machine so I will need to compile in ASMDEX rather than ASM when building hooks for that platform. But first things first, let's get the basic extension module complete.

@Yu-Vitaqua-fer-Chronos

Heya, it's been two years now (roughly), has there been any progress or new decisions regarding this?

@Thrameos
Contributor Author

Thrameos commented Mar 24, 2022

Status remains largely unchanged. Two years ago I completed a prototype for a reverse bridge that allows Java to call Python and uses ASM to convert directions into Java classes. That is the first step to full integration, such as extensions written in Python, but for it to enter production it would require additional programmer help, as we would need to test all of the different aspects of the reverse bridge capabilities. At the same time my employer decided that they would not sign the Python community contribution agreement. As I am only allowed to work on projects that do not require a signed agreement, this left me at an impasse on how to proceed, as a number of actions required being able to contribute the hooks to better support language integration.

If I can get together some interested users who are willing to help write the tests, then we can complete JPype 2.0, which would include this feature. But as it stands, my employer's prohibitions leave me with very little motivation to proceed by myself, which could potentially jeopardize my employment. The cost of working in a bureaucracy is that often silly and ill-informed decisions have unintended and harmful consequences.

@Thrameos
Contributor Author

The main technical issue is the ability to call private functions from within Python-defined extensions. JPype currently can only call public methods, and unsafe access has been largely eliminated by the module system post Java 9. So while I can allow someone to write a trivial extension class, it would only be for SAM or interfaces for which there are no private members to access. Still useful, but not much use over what we have now. You can always write a short piece of Java and include it in a package as a jar or class file, so just allowing extensions for limited use is not necessarily a big advancement. Of course, if we have the full reverse bridge that would be an advancement, as we would no longer be converting containers but instead passing them to Java. But this change will certainly break some code, as all code currently assumes that Java classes receive native Java types rather than Python wrappers. There is also the memory issue: once you pass a Python wrapper that is held in Java, you have memory cycles. Neither Python nor Java was designed to handle external memory management, and they lack a protocol to communicate with another memory management system. I left that as a "hard problem", as unless I rewrote portions of Java or Python it would be difficult to resolve. I considered trying to exploit Java RMI, which does have edge memory management, but it did not have hooks for local use.

Projects
JPype 2.0
  
To do
Development

No branches or pull requests

5 participants