Performance Issue due to TCP Nagle's Algorithm write,write,read Situation #516

Open
trohwer opened this issue Apr 25, 2023 · 1 comment
trohwer commented Apr 25, 2023

I am aware that passing byte arrays via py4j is not (supposed to be) very efficient, although some copy operations along the path could be avoided.
Nevertheless, I was initially surprised by the following performance difference (measured with py4j-0.10.9.5.jar as bundled with PySpark on Linux):

import time

b=spark.sparkContext._jvm.java.nio.ByteBuffer.allocate(4096)
t0=time.time()
for i in range(0,100):
    u=b.array()
print(time.time()-t0)

0.04267597198486328

b=spark.sparkContext._jvm.java.nio.ByteBuffer.allocate(8192)
t0=time.time()
for i in range(0,100):
    u=b.array()
print(time.time()-t0)

4.404087543487549
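
Dividing the two totals by the 100 iterations gives the per-call cost, which makes the jump easier to interpret. The sketch below only redoes that arithmetic with the numbers reported above. The constant and variable names are made up for illustration.

```python
# Totals reported above for 100 calls to b.array().
ITERATIONS = 100
total_4096 = 0.04267597198486328  # seconds, 4096-byte buffer
total_8192 = 4.404087543487549    # seconds, 8192-byte buffer

# Convert each total to milliseconds per call.
per_call_4096_ms = total_4096 / ITERATIONS * 1000
per_call_8192_ms = total_8192 / ITERATIONS * 1000

# Ratio of the two totals for a payload that only doubled in size.
slowdown = total_8192 / total_4096

print(f"4096 bytes: {per_call_4096_ms:.2f} ms per call")
print(f"8192 bytes: {per_call_8192_ms:.2f} ms per call")
print(f"slowdown:   {slowdown:.0f}x for 2x the payload")
```

A fixed cost of roughly 44 ms per call, rather than a cost that grows with payload size, looks more like a timer expiring than like extra copying. It would be consistent with a delayed-ACK timeout of about 40 ms on Linux, but that figure is an assumption here and was not measured.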

It turns out that the code suffers from Nagle's algorithm here. For example, in CallCommand,

	writer.write(returnCommand);
	writer.flush();

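if writing returnCommand exceeds the buffer of the BufferedWriter, there are two writes to the socket output stream. Nagle's algorithm holds back the second, smaller write until the first one is acknowledged, and the Python side, which is still waiting for the complete reply before it sends anything, delays that ACK.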
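
Why the threshold sits between 4096 and 8192 bytes can be sketched with a small model. It assumes that py4j transfers byte arrays base64-encoded and that the BufferedWriter uses Java's default buffer of 8192 chars; it ignores the few extra protocol characters in the reply and any buffering inside the underlying OutputStreamWriter. The function names are hypothetical and this is not py4j code.

```python
import math


def base64_length(n_bytes: int) -> int:
    """Length of standard padded base64 text for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def downstream_writes(n_chars: int, buffer_chars: int = 8192) -> list:
    """Simplified model of a fixed-size buffered writer.

    Returns the sizes of the chunks handed downstream by one write()
    of n_chars characters followed by one flush().
    """
    chunks = []
    buffered = 0
    remaining = n_chars
    while remaining > 0:
        # Copy as much as still fits into the buffer.
        take = min(buffer_chars - buffered, remaining)
        buffered += take
        remaining -= take
        if buffered >= buffer_chars:
            # Buffer is full: it is written out before flush() is called.
            chunks.append(buffered)
            buffered = 0
    if buffered:
        # The explicit flush() pushes out whatever is left.
        chunks.append(buffered)
    return chunks


for size in (4096, 8192):
    encoded = base64_length(size)
    print(size, encoded, downstream_writes(encoded))
```

Under these assumptions the 4096-byte array encodes to 5464 characters and goes out in a single write, while the 8192-byte array encodes to 10924 characters and is split into a full 8192-character chunk plus a 2732-character remainder, which is the write, write, read pattern from the title.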

After disabling the Nagle algorithm for loopback sockets by adding the following in ClientServerConnection.java

super();
this.socket = socket;

// added
if (socket.getLocalAddress().isLoopbackAddress()) socket.setTcpNoDelay(true);

this.reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), Charset.forName("UTF-8")));

I get the following run time measurements:

0.047772884368896484
0.07696914672851562

I think that for loopback sockets disabling the algorithm has no disadvantages, since buffering already occurs in the BufferedWriter. It could possibly be disabled in general.
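
For readers more at home on the Python side, the same loopback check and TCP_NODELAY setting can be sketched with the standard socket module. This is only an illustration of the idea in the Java change above; the helper name disable_nagle_on_loopback is hypothetical, and it is not part of py4j.

```python
import ipaddress
import socket


def disable_nagle_on_loopback(sock: socket.socket) -> bool:
    """Set TCP_NODELAY if the socket's local address is a loopback address.

    Returns True when the option was set, False otherwise.
    """
    # Drop an IPv6 scope id such as "%eth0" before parsing the address.
    local_host = sock.getsockname()[0].split("%")[0]
    if ipaddress.ip_address(local_host).is_loopback:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    return False


# Small demonstration on a throwaway loopback connection.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    with socket.create_connection(server.getsockname()) as client:
        was_set = disable_nagle_on_loopback(client)
        nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
        print("option set:", was_set, "TCP_NODELAY enabled:", nodelay != 0)
```

Restricting the change to loopback addresses mirrors the cautious variant proposed above; dropping the address check would correspond to disabling the algorithm in general.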

trohwer commented Apr 25, 2023

See pull request #517.
