
UnixServer example doesn't work on Mac with either nc or socat #82

Closed
zonkhead opened this issue Mar 15, 2020 · 8 comments

@zonkhead

I'm trying to build a server using the UnixServer example as a starting point. On my Mac, when I send more than two buffers' worth of data to it, it just hangs and stops reading. Here are two example commands that should work (and do work on Linux):

➜ ~ cat .emacs | socat UNIX-CONNECT:/tmp/fubar.sock -
➜ ~ cat .emacs | nc -U /tmp/fubar.sock

All I'm doing is running the UnixServer class straight out of the box, with jnr-unixsocket version 0.28.

The funny thing is that if I make the ByteBuffer smaller than 1024 bytes (512, say), it hangs after just one buffer read. All buffer sizes work fine on Linux.
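
For reference, the read path I'm exercising is the example's rxready() handler, which (condensed from UnixServer.ServerActor; the full class is in the repo's test examples) does a single fixed-size read per select event:

    // Condensed from jnr.unixsocket.example.UnixServer.ServerActor#rxready():
    // one fixed-size read per READ-selected event, echoed back to the client.
    ByteBuffer buf = ByteBuffer.allocate(1024);
    int n = channel.read(buf);
    System.out.printf("Read in %d bytes from %s%n", n, channel.getRemoteSocketAddress());
    if (n > 0) {
        buf.flip();
        channel.write(buf);  // echo the single buffer back
    } else if (n < 0) {
        // client disconnected
    }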

@headius
Member

headius commented Mar 17, 2020

Hmm, ok... seems like a problem with the UNIX socket subsystem, but only on Darwin?

Can you make a simple script or test that shows the problem?

@zonkhead
Author

zonkhead commented Mar 18, 2020

You already have it: your UnixServer class. Invoke it with the command lines above. Maybe both socat and netcat don't work well with unix sockets on Darwin? Probably not, but I can't be certain.

@headius
Member

headius commented Mar 18, 2020

Ah, I understand now. Will investigate.

@headius
Member

headius commented Mar 18, 2020

I did not have a socat command on my macOS machine, so I assume you installed it separately.

With socat (from Homebrew), it appears to write 1024 bytes and then hang here:

"main" #1 prio=5 os_prio=31 tid=0x00007fa192004000 nid=0x1a03 runnable [0x0000700000b5f000]
   java.lang.Thread.State: RUNNABLE
	at com.kenai.jffi.Foreign.invokeL6(Native Method)
	at com.kenai.jffi.Invoker.invokeL6(Invoker.java:455)
	at jnr.enxio.channels.Native$LibC$jnr$ffi$1.kevent(Unknown Source)
	at jnr.enxio.channels.KQSelector.poll(KQSelector.java:165)
	at jnr.enxio.channels.KQSelector.select(KQSelector.java:145)
	at jnr.unixsocket.example.UnixServer.main(UnixServer.java:47)

I don't see why it hangs there, but I did notice that socat sends data in 8192-byte chunks by default. If I change that to 1024-byte blocks, it gets further: 8 blocks successfully transit the server, and then the server exits with a "Broken pipe" error, indicating the client has gone away.

For my test, I used a file that's 11423 bytes long, so I would expect to see that much data transit the server.

So two questions fall out of this:

  • Why does the server get stuck selecting for read after the first buffer of data? (you say two but I could only get one to go)
  • Why when the client matches the server's buffer size do I only get through 8 buffers before it disconnects?
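
For reference, the 1024-byte-block run above used socat's -b option to match the server's buffer size, roughly like this (somefile standing in for my 11423-byte test file):

➜ ~ cat somefile | socat -b 1024 UNIX-CONNECT:/tmp/fubar.sock -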

@headius
Member

headius commented Mar 18, 2020

Suspecting this might be an interaction between jnr-unixsocket and socat, I thought I'd play with the UnixClient we also ship in the examples.

With only 9 bytes written, it works fine.

If I modify it to send 9000 bytes, with a loop to read everything using the same 1024-byte buffer, it gets stuck after two 1024-byte buffers have been filled.

At that point, the server is in the same place it was for socat, and the client is stuck here:

"main" #1 prio=5 os_prio=31 tid=0x00007ff309805000 nid=0x2303 runnable [0x00007000011c7000]
   java.lang.Thread.State: RUNNABLE
	at com.kenai.jffi.Foreign.invokeN3O1(Native Method)
	at com.kenai.jffi.Invoker.invokeN3(Invoker.java:1061)
	at jnr.enxio.channels.Native$LibC$jnr$ffi$1.read(Unknown Source)
	at jnr.enxio.channels.Native.read(Native.java:115)
	at jnr.unixsocket.impl.Common.read(Common.java:51)
	at jnr.unixsocket.impl.AbstractNativeSocketChannel.read(AbstractNativeSocketChannel.java:72)
	at jnr.unixsocket.UnixSocketChannel.read(UnixSocketChannel.java:253)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59)
	- locked <0x000000076d9d4c78> (a java.lang.Object)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
	- locked <0x000000076daa2bd0> (a sun.nio.ch.ChannelInputStream)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	- locked <0x000000076daa2b40> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.Reader.read(Reader.java:100)
	at jnr.unixsocket.example.UnixClient.main(UnixClient.java:55)

The client appears to be stuck in a blocking read from the server, while the server is stuck waiting for more data... even though we've dumped 9000 bytes on the wire!

Here's the patch for the client:

diff --git a/src/test/java/jnr/unixsocket/example/UnixClient.java b/src/test/java/jnr/unixsocket/example/UnixClient.java
index 3bdfc6c..aaafafc 100644
--- a/src/test/java/jnr/unixsocket/example/UnixClient.java
+++ b/src/test/java/jnr/unixsocket/example/UnixClient.java
@@ -42,6 +42,7 @@ public class UnixClient {
             }
         }
         String data = "blah blah";
+        for (int i = 0; i < 1000; i++) data += "blah blah";
         UnixSocketAddress address = new UnixSocketAddress(path);
         UnixSocketChannel channel = UnixSocketChannel.open(address);
         System.out.println("connected to " + channel.getRemoteSocketAddress());
@@ -51,17 +52,19 @@ public class UnixClient {
 
         InputStreamReader r = new InputStreamReader(Channels.newInputStream(channel));
         CharBuffer result = CharBuffer.allocate(1024);
-        r.read(result);
-        result.flip();
-        System.out.println("read from server: " + result.toString());
-        final int status;
-        if (!result.toString().equals(data)) {
-            System.out.println("ERROR: data mismatch");
-            status = -1;
-        } else {
-            System.out.println("SUCCESS");
-            status = 0;
+        while (r.read(result) > 0) {
+            result.flip();
+            System.out.println("read from server: " + result.toString());
+            result.clear();
         }
-        System.exit(status);
+//        final int status;
+//        if (!result.toString().equals(data)) {
+//            System.out.println("ERROR: data mismatch");
+//            status = -1;
+//        } else {
+//            System.out.println("SUCCESS");
+//            status = 0;
+//        }
+//        System.exit(status);
     }
 }

@headius
Member

headius commented Mar 18, 2020

Ok, I think I have some answers. I'm not sure there's a bug here, but I do have an explanation for what we're seeing.

Because the UnixClient also seemed to hang in a read, I suspected that the server was only seeing a partial view of the content. I modified the ServerActor to read not just 1024 bytes, but as many bytes as it can before getting a 0 return value.

The result is that the server successfully reads and writes all 9000 bytes from my modified client.

Here's the patch:

diff --git a/src/test/java/jnr/unixsocket/example/UnixServer.java b/src/test/java/jnr/unixsocket/example/UnixServer.java
index a70a924..787f4f6 100644
--- a/src/test/java/jnr/unixsocket/example/UnixServer.java
+++ b/src/test/java/jnr/unixsocket/example/UnixServer.java
@@ -104,16 +104,20 @@ public class UnixServer {
         public final boolean rxready() {
             try {
                 ByteBuffer buf = ByteBuffer.allocate(1024);
-                int n = channel.read(buf);
-                UnixSocketAddress remote = channel.getRemoteSocketAddress();
-                System.out.printf("Read in %d bytes from %s%n", n, remote);
+                int n;
 
-                if (n > 0) {
-                    buf.flip();
-                    channel.write(buf);
-                    return true;
-                } else if (n < 0) {
-                    return false;
+                while ((n = channel.read(buf)) > 0) {
+                    UnixSocketAddress remote = channel.getRemoteSocketAddress();
+                    System.out.printf("Read in %d bytes from %s%n", n, remote);
+
+                    if (n > 0) {
+                        buf.flip();
+                        channel.write(buf);
+                        buf.clear();
+//                        return true;
+                    } else if (n < 0) {
+                        return false;
+                    }
                 }
 
             } catch (IOException ex) {

This change also fixes the socat example; the file I pipe to it now completely transits the server. And just for completeness, I confirmed that your nc example also completes successfully.

I think what we're seeing here is a bad interaction between IO buffering (at either the JVM or kernel level) and the poll call used for IO selection. On the server side, the poll for read does not see data left "on the wire" after a subsequent read event has fired. As a result, we eventually end up with some number of bytes "in limbo" and no poll events left to trigger the server to read them. I don't think this constitutes a bug in jnr-unixsocket, since select, read, and write all just bottom out in the system's poll, read, and write native calls.

It's possible that we're not configuring the buffering for the unix domain socket file descriptor properly, but we would need to research that. We're not doing anything unusual when setting it up, so I would expect the basic unix socket to work properly with poll.

I will commit this change to UnixServer for you to test. I am not entirely satisfied with this as a "solution", so perhaps you can help me figure out why we're seeing this buffering behavior?

headius added a commit that referenced this issue Mar 18, 2020
There appears to be some buffering interaction with the native
`poll` call here that causes unread bytes "on the wire" to be left
"in limbo" if they are not read before a subsequent POLLIN has
fired. As a result, there are too few POLLIN events to read all
the data off the wire and the server eventually hangs.

A client attempting to write and then read data from the server
also eventually hangs, because it has written all it can write but
not received enough data back.

More research is needed to know whether this is an expected
interaction between `poll` and the buffering for sockets (or more
specifically unix domain sockets), possibly specific to Darwin.

See #82
@headius
Member

headius commented Mar 18, 2020

With the UnixServer working properly now on Darwin, I'm going to close this issue.

From discussions and articles online, it appears this may be just one of the "quirks" of using poll across platforms. It does not appear that additional POLLIN events get triggered for unread data that happens to be lying around in a kernel buffer, so code that responds to a READ select should attempt to read as much data as is available before doing another select.
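
In code, that rule of thumb amounts to draining the channel on every READ event. A minimal sketch against java.nio's ByteChannel (the class and method names are illustrative; it assumes a nonblocking channel, as in the example's selector loop, and unlike the example patch it also loops on write() in case a nonblocking write is partial):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.ByteChannel;

    public class DrainOnRead {
        // Call once per READ-selected event. Keeps reading until the kernel
        // buffer is empty (read() returns 0), echoing each chunk back.
        // Returns false when the peer has closed (read() returns -1).
        static boolean drainAndEcho(ByteChannel channel) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            int n;
            while ((n = channel.read(buf)) > 0) {
                buf.flip();
                while (buf.hasRemaining()) {
                    channel.write(buf); // a nonblocking write may be partial
                }
                buf.clear();
            }
            return n == 0; // 0: drained, keep selecting; -1: EOF
        }
    }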

@headius headius closed this as completed Mar 18, 2020
@headius headius modified the milestones: 0.28, 0.29 Apr 22, 2020
@headius
Member

headius commented Apr 22, 2020

Releasing today in 0.29.
