Use binary protocol for better performance #159

Open
davies opened this issue Apr 6, 2015 · 17 comments
davies commented Apr 6, 2015

Currently, the text protocol is very slow for large objects. We could add an optional binary protocol for better performance.

bartdag (Collaborator) commented Apr 6, 2015

Hi, could you be more specific and clarify what you mean by large objects (Py4J works with references except for primitives and byte arrays)?

I totally agree that for calling methods with large integers or byte arrays as parameters, the protocol is suboptimal because there is a non-trivial encoding/decoding that takes place. Another context where a binary protocol would help is if the client and the server are on high-latency/low-bandwidth network.

But for most of the use cases I encountered (this might not match your experience), the client and the server are on the same machine or the same LAN, and most operations do not involve numerical arguments. In this context, the tests I did a few years ago showed that a binary protocol does not help, because most arguments (e.g., class name, method name) need to be converted to strings anyway.

davies (Author) commented Apr 6, 2015

In Spark MLlib, it's possible to have a vector with several million doubles in it, and we also want to fetch it into Python (it will be serialized into a byte array first) so we can do the calculation in Python.

In the past, we created a special channel (local file or socket) to transfer large objects in PySpark (broadcast and collect), but it's not easy to do that here, because this sits in a general code path.

davies (Author) commented Apr 6, 2015

@bartdag If you think it's reasonable to do, I could try to send out a patch for this when I have some cycles.

bartdag (Collaborator) commented Apr 6, 2015

Sending byte arrays as bytes (rather than through the text protocol) is doable and probably preferable to trying to create a totally new binary protocol. I agree that transferring vectors with several million doubles in them will be slow with the current protocol, and using a side channel is for now the best way to go. I'll see what I can do, but you are welcome to provide a patch too!

davies (Author) commented Apr 6, 2015

We are already sending them back as bytes, but the overhead of string concatenation and encoding/decoding is still too high.

bartdag (Collaborator) commented Apr 6, 2015

Yup, I understood. Bytes are converted with base64 so it is really expensive for large arrays.
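
As a rough illustration of that cost (a standalone Python sketch, not Py4J code; the sizes are made up for the example):

```python
import base64

# Base64 maps every 3 raw bytes to 4 ASCII characters, so a large byte
# array grows by ~33% and must be fully copied on each side.
raw = bytes(3 * 1024 * 1024)         # a 3 MiB byte array of zeros
encoded = base64.b64encode(raw)      # 4 MiB of ASCII text
assert len(encoded) == 4 * 1024 * 1024

decoded = base64.b64decode(encoded)  # another full copy to recover the bytes
assert decoded == raw
```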

eperzhand commented
Could you please profile it? There are several possible reasons for the performance drawbacks:

  1. encoding/decoding bytes as b64
  2. passing larger datum through the network
  3. copying large objects in memory
  4. others?

If 2) is the reason, it is possible to compress the b64 payload before sending it (zip).

@bartdag, can you give a clue why b64 was chosen?
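
To illustrate the zip idea, a standalone Python sketch (zlib stands in here for whatever compressor would actually be used; the win depends entirely on how compressible the data is):

```python
import base64
import zlib

# Compressing the raw bytes before base64-encoding them shrinks the
# payload for repetitive data; for random data it would not help.
raw = b"\x00\x01\x02\x03" * (512 * 1024)       # 2 MiB, highly repetitive
payload = base64.b64encode(zlib.compress(raw))
assert len(payload) < len(raw)                 # far smaller than the original

restored = zlib.decompress(base64.b64decode(payload))
assert restored == raw
```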

bartdag (Collaborator) commented Oct 30, 2015

Py4J uses a text protocol and base 64 is a common strategy to transfer non-textual data in a text protocol. The text protocol made sense because Py4J was created to interact with the JVM: methods and class names need to be converted at some point to strings so even protobuf (at the time, maybe not now) was slower than just transferring text.

Py4J was not created for the use case of transferring large byte arrays between the JVM and a Python process, and I don't think there is a solution that will be the best for all scenarios. I've met with several Py4J users who transferred large byte arrays, but some were doing it over the network (zip might help, dedicated channel for byte transfer also) and others were running all their processes on the same host (memory mapped file might help? unix sockets instead of tcp? dumping the bytes to a file and reading it again?).
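
One of the same-host options above (dumping the bytes to a file and reading them back, here via mmap) can be sketched in standalone Python; the file-name handoff is assumed to happen over the existing Py4J channel:

```python
import mmap
import os
import tempfile

# Producer side dumps raw bytes to a file; consumer side maps it,
# so the payload never passes through the base64 text protocol.
data = os.urandom(1024 * 1024)  # 1 MiB of bytes to transfer

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(data)                 # "JVM side": write the raw bytes
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        received = bytes(mm)          # "Python side": read without encoding
        mm.close()
    assert received == data
finally:
    os.remove(path)
```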

I can make this particular case configurable (e.g., you could decide which transfer strategy you want to use for large byte arrays), but I'm not even sure which strategy is worth implementing at this point, so profiling would definitely help.

As a side note, base 64 encoding was contributed by developers from The Atlantic because I tried first a silly solution that utterly failed even if it was very fast :-)

eperzhand commented
What do you mean by The Atlantic? Doesn't it use java.util.Base64?
I think the most useful thing to implement is a "receiveRawDatum"/"send.." method for the gateway, which could be used to transfer anything.
A developer could then manually implement ZIP (on the Java side) and UNZIP (on the Python side, or vice versa), raw byte array transfer, or serialized object transfer.
I.e., for most cases they would use the nice Py4J interface, but for performance-critical sections, this "raw channel".
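
A hypothetical sketch of such a raw channel in Python (the framing and function names are invented here, not part of Py4J's actual API): length-prefixed binary frames on a dedicated socket, bypassing the text protocol entirely.

```python
import socket
import struct

def send_raw(sock: socket.socket, payload: bytes) -> None:
    # 8-byte big-endian length prefix, then the raw bytes: no base64 step.
    sock.sendall(struct.pack(">q", len(payload)))
    sock.sendall(payload)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)

def recv_raw(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">q", _recv_exact(sock, 8))
    return _recv_exact(sock, length)

# Demo over a local socket pair (payload kept small so the demo
# fits in the socket buffer without a concurrent reader).
a, b = socket.socketpair()
send_raw(a, b"x" * 4096)
assert recv_raw(b) == b"x" * 4096
a.close()
b.close()
```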

bartdag (Collaborator) commented Oct 30, 2015

Re The Atlantic: #95

eperzhand commented
I see - MiGBase64.

rapgro commented Oct 30, 2015

Performance comparison of Base64 implementations:
http://java-performance.info/base64-encoding-and-decoding-performance/

We don't use MiGBase64 in the Fedora package, partly because bundling that file is not allowed. There are patches to use the (internal) javax.xml.bind.DatatypeConverter on Java 7 and java.util.Base64 on Java 8.

http://pkgs.fedoraproject.org/cgit/py4j.git/tree/py4j-Base64-java7.patch
http://pkgs.fedoraproject.org/cgit/py4j.git/tree/py4j-Base64-java8.patch

bartdag (Collaborator) commented Oct 30, 2015

If I understand correctly, DatatypeConverter is available on all JVMs and is the fastest implementation. That might be a way to slightly improve performance and simplify fedora packaging.

kaytwo commented May 2, 2016

#201 is the likely culprit for this performance degradation, or at least a large chunk of it. @davies, can you see whether py4j built from master improves the performance you were seeing? The OOM crashes are a different story, however.

davies (Author) commented May 2, 2016

@kaytwo The slowness mostly happened on the Python side, so improvements on the JVM side will not actually help much.

kaytwo commented May 2, 2016

This is a change on the Python side (by "built from master" I meant "zip up the library from master" - sorry for being unclear).

yupbank commented Apr 19, 2018

Are there any updates on this?
