Socket errors with unixsocket under load #1281

Closed
nickbabcock opened this issue Jan 20, 2017 · 14 comments
Labels
Bug For general bugs on Jetty side

Comments

@nickbabcock

nickbabcock commented Jan 20, 2017

I've been experimenting with Jetty's unix socket support (using 9.4.0 and c7c183c) and have hit a few errors when putting the setup under load (low volume works fine). For instance, in a Jersey app I've seen this exception:

org.glassfish.jersey.server.ServerRuntime$Responder: An I/O error has occurred while writing a response message entity to the container output stream.
! java.io.IOException: Broken pipe
! at jnr.enxio.channels.NativeSocketChannel.write(NativeSocketChannel.java:93)
! at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)

so I decided to isolate using just jetty code.

Given a simple echo server that uses unix sockets:

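import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.common.io.ByteStreams; // Guava
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;
import org.eclipse.jetty.unixsocket.UnixSocketConnector;

public class EchoApplication {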
    public static void main (String... args) throws Exception {
        Server server = new Server();
        UnixSocketConnector connector = new UnixSocketConnector(server);
        connector.setUnixSocket("/var/lib/haproxy/jetty.sock");
        server.addConnector(connector);
        server.setHandler(new EchoHandler());
        server.start();
        server.join();
    }

    public static class EchoHandler extends AbstractHandler {
        @Override
        public void handle(
                String target,
                Request baseRequest,
                HttpServletRequest request,
                HttpServletResponse response
        ) throws IOException, ServletException {
            response.setStatus(200);
            ByteStreams.copy(request.getInputStream(), response.getOutputStream());
            baseRequest.setHandled(true);
        }
    }
}

And given the partial config for an ssl terminating HAProxy setup:

frontend ft_myapp
 bind 192.168.137.83:9443 ssl crt /etc/ssl/private/jetty_new.key.pem
 mode tcp
 option tcplog
 default_backend bk_myapp

backend bk_myapp
 server srv unix@jetty.sock

Stress testing the setup from another box:

wrk -c 40 -d 60s -t 8 -s wrk1M.lua https://192.168.137.83:9443

where wrk1M.lua has contents

wrk.method = "POST"
wrk.body   = string.rep("a", 1024 * 1024)

wrk reports there are socket errors:

Socket errors: connect 0, read 6390, write 2844, timeout 0

Also I see the following (single) warning in the jetty log:

WARN:oeju.SharedBlockingCallback:qtp834600351-35: Blocker not complete Blocker@3f29209e{null}

To me there appear to be a few possibilities:

  • HAProxy has poor support for unix sockets (I personally never used them in HAProxy before, so I could be doing something wrong).
  • Fault lies somewhere in Jetty
  • Bug in the underlying jnr library

joakime added the Bug label Jan 21, 2017
@joakime
Contributor

joakime commented Jan 21, 2017

@nickbabcock thanks for the thorough bug report, we'll dig into this.
(perhaps jnr has a new release we should be using)

@nickbabcock
Author

I took another look at this issue. Used the latest master on Jetty and used jnr 0.18.

This might be a lead: I was able to cut out haproxy by using a recent version of curl (7.40+).

Have two shells running:

while [ 1 ]; do
    curl --unix-socket /var/run/jetty.sock 'http://localhost' -d @<large-file> >/dev/null
done

The two will run side by side for an indeterminate amount of time before one of them hangs indefinitely, while the other one continues to run without any problems. Which one hangs is indeterminate.

@gregw
Contributor

gregw commented May 10, 2017

@nickbabcock just a note to confirm that we can reproduce the problem. However, there are no indications yet as to exactly what the problem is. It does not look like a simple deadlock or anything similar.

But we should be able to work out a bit more now that we can reproduce it.

gregw added a commit that referenced this issue May 11, 2017
@gregw
Contributor

gregw commented May 11, 2017

@nickbabcock
I've been dumping the jetty server once one of the clients has hung. From Jetty's point of view it has either completed the handling of the request or not seen the connection attempt - either way I see no evidence of the hung client.

I have updated the Java test client to operate in the same style as curl, but it fails to reproduce the problem. So I'm not sure where in the conversation it is hung... I wonder if there is an equivalent of Wireshark for unix sockets?

@gregw
Contributor

gregw commented May 11, 2017

Using --verbose on curl, I see now that it is hanging at a connect, as the last line logged is the very first line:

*   Trying /tmp/jetty.sock...

So either the connection is not being received by Jetty, is not being noticed by jetty, or is noticed but somehow is lost. hhmmmmmmm....

@gregw
Contributor

gregw commented May 11, 2017

Still struggling to find anything in Jetty: I've tried both async and blocking accepts; I've tried looping on accept to ensure all connections are accepted, but no go.

I'm concerned it may be a bug in JNR... let me ask them to see if they can shed light...

@joakime
Contributor

joakime commented May 11, 2017

@gregw check your lsof / file descriptors / open files / ulimit settings. It sounds like you are running low.

@gregw
Contributor

gregw commented May 11, 2017

I'm configured for 80000 FDs and the most unix domain sockets I see running these tests has been about 580.
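The same check can also be made from inside the JVM instead of with lsof. What follows is only a minimal sketch: it assumes a HotSpot-style JDK on a Unix-like OS, where the platform `OperatingSystemMXBean` implements `com.sun.management.UnixOperatingSystemMXBean`. The class name `FdUsage` is made up for illustration and is not part of Jetty or JNR.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of Jetty: reports this JVM's file descriptor usage.
final class FdUsage {
    /** Returns {open, max} descriptor counts, or null if the Unix platform bean is unavailable. */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println(s == null
            ? "descriptor counts unavailable on this platform"
            : "open=" + s[0] + " max=" + s[1]);
    }
}
```

Logging this periodically from the server process during the load test would show whether the open count gets anywhere near the limit.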

A single busy client never fails. But one client will fail soon after a second test is started. I'm currently using 2 client loops:

while :; do curl -v --unix-socket /tmp/jetty.sock http://localhost/ ; done

and

while :; do echo -e "GET / HTTP/1.1\r\nHost: socket\r\n\r\n" | netcat -U /tmp/jetty.sock ; done

Running two curl clients, one or the other will hang within seconds; running two netcat clients I never see a failure; running a netcat and a curl, the curl will fail within seconds.

So it could still be a curl problem....

@nickbabcock
Author

I don't think it is curl code because load testing tools like wrk and k6 hang / error as well (but this is with haproxy in the middle).

gregw added a commit that referenced this issue May 11, 2017
@gregw
Contributor

gregw commented May 11, 2017

I can't make the following pure JNR server fail:

/*
 * This file is part of the JNR project.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.eclipse.jetty.unixsocket;

import jnr.enxio.channels.NativeSelectorProvider;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Set;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.eclipse.jetty.util.StringUtil;
import org.eclipse.jetty.util.TypeUtil;

import jnr.unixsocket.UnixServerSocket;
import jnr.unixsocket.UnixServerSocketChannel;
import jnr.unixsocket.UnixSocketAddress;
import jnr.unixsocket.UnixSocketChannel;

public class JNRServer
{

    public static void main(String[] args) throws IOException
    {
        java.io.File path = new java.io.File("/tmp/jetty.sock");
        path.deleteOnExit();
        UnixSocketAddress address = new UnixSocketAddress(path);
        UnixServerSocketChannel channel = UnixServerSocketChannel.open();

        try
        {
            Selector sel = NativeSelectorProvider.getInstance().openSelector();
            channel.configureBlocking(false);
            channel.socket().bind(address);
            channel.register(sel,SelectionKey.OP_ACCEPT,new ServerActor(channel,sel));

            while (sel.select() > 0)
            {
                Set<SelectionKey> keys = sel.selectedKeys();
                Iterator<SelectionKey> iterator = keys.iterator();
                while (iterator.hasNext())
                {
                    SelectionKey k = iterator.next();
                    Actor a = (Actor)k.attachment();
                    if (!a.rxready())
                    {
                        k.cancel();
                    }
                    iterator.remove();
                }
            }
        }
        catch (IOException ex)
        {
            Logger.getLogger(UnixServerSocket.class.getName()).log(Level.SEVERE,null,ex);
        }
        System.out.println("UnixServer EXIT");
    }

    static interface Actor
    {
        public boolean rxready();
    }

    static final class ServerActor implements Actor
    {
        private final UnixServerSocketChannel channel;
        private final Selector selector;

        public ServerActor(UnixServerSocketChannel channel, Selector selector)
        {
            this.channel = channel;
            this.selector = selector;
        }

        public final boolean rxready()
        {
            try
            {
                UnixSocketChannel client = channel.accept();
                client.configureBlocking(false);
                client.register(selector,SelectionKey.OP_READ,new ClientActor(client));
                return true;
            }
            catch (IOException ex)
            {
                return false;
            }
        }
    }

    static final class ClientActor implements Actor
    {
        String request = "";
        String response = "HTTP/1.1 200 OK\r\n"
            + "Content-Length: 14\r\n"
            + "Content-Type: text/plain\r\n"
            + "Connection: close\r\n"
            + "\r\n"
            + "Hello World!\r\n";
        
        private final UnixSocketChannel channel;

        public ClientActor(UnixSocketChannel channel)
        {
            this.channel = channel;
        }

        public final boolean rxready()
        {
            try
            {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                
                while (true)
                {
                    buf.clear();
                    int n = channel.read(buf);
                    UnixSocketAddress remote = channel.getRemoteSocketAddress();
                    System.err.printf("Read in %d bytes from %s\n",n,remote);

                    if (n == 0)
                        return true;
                    
                    if (n < 0)
                        return false;

                    buf.flip();
                    request += new String(buf.array(),buf.arrayOffset(),buf.remaining());

                    System.err.println(TypeUtil.toHexString(request.getBytes()));
                    if (request.endsWith("\r\n\r\n"))
                    {
                        System.err.println("Read request:");
                        System.err.println(request);
                        
                        channel.write(ByteBuffer.wrap(response.getBytes()));
                        channel.shutdownOutput();
                    }
                    return true;

                }
            }
            catch (IOException ex)
            {
                ex.printStackTrace();
                return false;
            }
        }
    }
}

with any client. So I guess it does indicate something in jetty.... but I cannot see what we are doing differently???

@vitezg

vitezg commented Apr 27, 2018

Calling UnixSocketConnector.setAcceptQueueSize(65530) seems to make the issue go away. Lower values could work too but I did not test them. @gregw @nickbabcock could you confirm?
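To illustrate what an accept queue size controls, here is a rough sketch using the Unix domain socket support built into JDK 16 and later. It is not the Jetty or JNR code path discussed in this issue; `BacklogDemo` and `roundTrip` are hypothetical names. It only shows that the backlog is a bind-time parameter and that a client's connect can complete, and its data can be sent, before the server has called accept.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: JDK 16+ Unix domain sockets, not Jetty's UnixSocketConnector or JNR.
final class BacklogDemo {
    /** Binds with the given backlog, connects before accept() is called, and returns what the server read. */
    static String roundTrip(int backlog, String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try {
            Path dir = Files.createTempDirectory("backlog-demo");
            Path sock = dir.resolve("demo.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(sock);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                // The second argument is the accept queue size requested from the OS.
                server.bind(address, backlog);
                try (SocketChannel client = SocketChannel.open(address)) {
                    // connect() has returned, but the server has not accepted yet:
                    // the connection is waiting in the accept queue.
                    client.write(ByteBuffer.wrap(payload));
                    client.shutdownOutput();
                    try (SocketChannel accepted = server.accept()) {
                        // One spare byte so the loop only ends at end-of-stream.
                        ByteBuffer buf = ByteBuffer.allocate(payload.length + 1);
                        while (accepted.read(buf) >= 0) {
                            // keep reading until the client's shutdownOutput() is seen
                        }
                        buf.flip();
                        return StandardCharsets.UTF_8.decode(buf).toString();
                    }
                }
            } finally {
                Files.deleteIfExists(sock);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1, "ping"));
    }
}
```

When more clients connect than the queue can hold and the server is slow to accept, the extra connects have to wait, which is at least consistent with the curl clients above hanging at the connect step.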

@gregw
Contributor

gregw commented Apr 30, 2018

I have found an issue in the JNR implementation that I believe could be responsible for these failures. However, I'm not sure how it relates to your queue size finding (which might just avoid the problem rather than fix it).

I have submitted a PR to JNR, but there has been no activity on that project for some time and the PR has not been accepted. However we have worked around the bug in commit 1921220, which was part of #2014 and included in jetty-9.4.9.v20180320. So can you try this release?

@vitezg

vitezg commented May 2, 2018

I verified that I tested jetty-9.4.9.v20180320 originally, and did the tests again just to make sure, so the two issues are separate.

However as PHP workers behind a unix socket need the same tweak I do not really think this is a problem, just something that needs a line in the UnixSocketConnector docs. Raising net.core.somaxconn is probably needed too.
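As background for the somaxconn remark: on Linux the backlog passed to listen(2) is silently capped at net.core.somaxconn, so a large accept queue size only takes full effect if that sysctl is at least as large. A minimal sketch of that arithmetic follows. `BacklogCheck` is a hypothetical helper, not a Jetty or JNR API, and the procfs path is Linux-specific.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not a Jetty or JNR API.
final class BacklogCheck {
    /** Linux caps the backlog passed to listen(2) at net.core.somaxconn. */
    static int effectiveBacklog(int requested, int somaxconn) {
        return Math.min(requested, somaxconn);
    }

    /** Reads the current limit from procfs (Linux only). */
    static int readSomaxconn() throws IOException {
        Path p = Paths.get("/proc/sys/net/core/somaxconn");
        return Integer.parseInt(new String(Files.readAllBytes(p)).trim());
    }

    public static void main(String[] args) throws IOException {
        int requested = 65530; // the accept queue size tried earlier in this thread
        int limit = readSomaxconn();
        System.out.println("requested=" + requested
            + " somaxconn=" + limit
            + " effective=" + effectiveBacklog(requested, limit));
    }
}
```

If the effective value printed is much smaller than the requested one, raising the sysctl (for example with `sysctl -w net.core.somaxconn=...`) is what lets the larger queue size apply.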

For the record with a queue size of 1 I needed 3 curls for one of them to freeze, with a queue size of 2 I needed 4.

@nickbabcock
Author

Excellent, I load tested again (this time simplified using jetty, socat, and wrk) and I see no errors 😄

Thanks for the hard work, working around that bug in jnr was incredible!
