Alternative Libraries

Tyler edited this page Dec 20, 2020 · 1 revision

Calling one programming language from another is not a new idea. As such, Pypeline is certainly not the first (or necessarily the best) way to call Python from Java. Setting Pypeline aside, there are three primary ways to execute Python from within Java, each discussed further below.

  1. Directly executing a Python script (using ProcessBuilder or similar)
  2. Jython
  3. Py4J

Method 1: Directly executing a Python script

This method is equivalent to calling a Python script from the command line, but in Java.

For example, say you had an 'add.py' file that took numbers as arguments and printed the sum of those values. From the command line, you might call it as: python add.py 1 2 3 4 5, which would print 15 to the console.
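As a rough illustration, such an 'add.py' script could look like the sketch below. The original file is not shown on this page, so the helper names parse_number and sum_args are assumptions made for this example.

```python
"""add.py: print the sum of the numbers given as command-line arguments."""
import sys


def parse_number(text):
    """Return an int when the text allows it, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def sum_args(args):
    """Sum a list of numeric strings such as ["1", "2", "3"]."""
    return sum(parse_number(arg) for arg in args)


if __name__ == "__main__":
    # sys.argv[0] is the script name, so the numbers start at index 1.
    print(sum_args(sys.argv[1:]))
```

Anything the script prints to standard output is what a calling process, Java or otherwise, would capture.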

In Java, you can use the ProcessBuilder class to construct this command and then use the BufferedReader and InputStreamReader classes to capture the output. The issue with this approach is that it is not interactive and requires starting the Python interpreter each time you want Python output. Because it is still useful in some contexts, Pypeline offers the runFile function, which does this without requiring you to create these Java objects yourself.

Method 2: Jython

As the Jython site explains it, Jython is a Java implementation of Python and allows embedded scripting, an interactive interpreter, and rapid application development via Python's more concise syntax.

The first issue with Jython is that it is not compatible with Python libraries that rely on compiled C extensions, such as NumPy, SciPy, and most AI-related libraries. This is a major issue for potential AnyLogic users, as those are core data science libraries. While projects such as JyNI build on Jython to add some compatibility, they do not yet fully support the range of desired features.

The second, and arguably larger, issue with Jython is that it only supports Python 2.7. Support for Python 3 is only in the MVP stages.

As official maintenance for Python 2 is ending this year (2020) and over 80% of Python users are using v3+ (based on the 2018 JetBrains survey), Jython was not considered as the primary "backend" for Pypeline.

Method 3: Py4J

As the Py4J website explains, Py4J enables Python programs to dynamically access Java objects. Methods are called as if the Java objects resided in the Python interpreter and Java collections can be accessed through standard Python collection methods.

Note, however, that the primary purpose of Py4J is to allow Python to control a Java environment (whereas Pypeline is seeking to do the opposite). As a secondary feature, Py4J also enables Java programs to execute Python code via callbacks (which is aligned with what Pypeline is seeking to do).

Had Py4J become the backend of Pypeline, end users would need to do two things whenever they wanted to call Python code:

  1. Create a new Java interface (in the AnyLogic model) that declares the name, return type, and argument types of the Python function(s) to be called.
  2. Create a new Python class that implements the function(s) declared in the Java interface. Both the arguments passed and the value returned must be Java-compatible objects. To use them with other Python code, you would need to convert the Java List-type to a Python list, send it to your other Python code, then convert the result back into a Java List-type.

This is a simplified explanation; the technical details are covered in the Py4J docs.
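To make the second step more concrete, here is a minimal sketch of what the Python side might look like. It assumes a hypothetical Java interface com.example.ListSummer declaring a single method, double sumValues(List<Double> values); that interface, the class name, and the serve helper are invented for illustration and are not part of Pypeline or Py4J. The nested Java class with an implements list and the ClientServer entry point follow the conventions described in the Py4J docs.

```python
class ListSummer:
    """Python implementation of the hypothetical Java interface ListSummer."""

    def sumValues(self, values):
        # `values` arrives as a Py4J proxy for a java.util.List; copying it
        # into a plain list lets ordinary Python code work with it.
        numbers = list(values)
        # A Python float maps to a Java double, so the return value needs
        # no extra conversion in this simple case.
        return float(sum(numbers))

    class Java:
        # Fully qualified name of the Java interface (hypothetical here).
        implements = ["com.example.ListSummer"]


def serve():
    """Expose ListSummer to Java. Requires the py4j package."""
    # Imported lazily so the class above can be read and tested without py4j.
    from py4j.clientserver import ClientServer, JavaParameters, PythonParameters

    return ClientServer(
        java_parameters=JavaParameters(),
        python_parameters=PythonParameters(),
        python_server_entry_point=ListSummer(),
    )
```

In this sketch the Python process would call serve() before the Java side connects. Returning a Python list instead of a number would additionally require converting it back to a Java collection.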

While there is nothing inherently wrong with this approach, there is no way for Pypeline to automate or abstract away any of this process. Users would have to wrap their existing code in extra code that follows a specific syntax, and repeat this for each model that uses Python. It also has more "moving parts" than desired: the user must convert between Java and Python types and initiate the connection from the Python side first (running the AnyLogic model/Java first would result in errors).

Method 4: Other approach

During Pypeline's initial design, several features were considered core requirements:

  • Allow for interactive Python execution
  • Be able to run Python 3
  • Not be limited in the available 3rd party libraries
  • Be streamlined into users' existing workflows (i.e., minimizing the amount of extra code required around existing code in order to execute it)
  • Be intuitive to use, based on workflows users are already familiar with

The approach decided on was heavily inspired by NetLogo's Python-Extension, which itself sought to interactively call Python from Scala. It did this by connecting to an existing Python environment on the user's machine, opening an interactive session via system calls, and then using Python's built-in string-to-code parsing for execution (sound familiar?). This approach ended up satisfying all the desired features, while also being able to expand on NetLogo's version with added support for JSON.
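The general idea of string-to-code execution with JSON results can be sketched in a few lines of Python. This is not Pypeline's or NetLogo's actual implementation; the InteractiveSession class and serve function are hypothetical names used only to illustrate the technique with Python's built-in compile, eval, and exec.

```python
import json


class InteractiveSession:
    """Runs source strings in one shared namespace, like an interactive prompt."""

    def __init__(self):
        # Variables and imports persist here between calls to run().
        self.namespace = {}

    def run(self, source):
        """Execute `source` and return its result encoded as JSON text."""
        try:
            # Expressions such as "x * 2" compile in eval mode.
            code = compile(source, "<remote>", "eval")
        except SyntaxError:
            # Statements such as "x = 5" or "import math" need exec mode
            # and produce no value, which is encoded as JSON null.
            exec(compile(source, "<remote>", "exec"), self.namespace)
            return json.dumps(None)
        return json.dumps(eval(code, self.namespace))


def serve(in_stream, out_stream):
    """Answer one line of source per line of input until the stream closes."""
    session = InteractiveSession()
    for line in in_stream:
        out_stream.write(session.run(line.rstrip("\n")) + "\n")
        out_stream.flush()
```

A calling program could start a long-lived Python process running serve(sys.stdin, sys.stdout), write one line of source at a time, and read back one line of JSON per request. A real implementation would also need to report exceptions and handle multi-line code, which this sketch omits.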

Conclusion

If you are looking for an alternative library for interactively calling Python from within Java, the most compatible would be Py4J. However, if the calls to Python are limited (e.g., requesting the values of parameters at the start of the model), it may be simplest to go with the ProcessBuilder approach; Pypeline uses this for its 'runFile' function, which can be called statically. Benchmarks for these methods will be examined on another page.