Skip to content
This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Nailgun client read future was interrupted #112

Closed
jvican opened this issue Dec 12, 2017 · 22 comments
Closed

Nailgun client read future was interrupted #112

jvican opened this issue Dec 12, 2017 · 22 comments
Assignees

Comments

@jvican
Copy link
Contributor

jvican commented Dec 12, 2017

Hello, we're using nailgun in bloop: http://github.com/scalacenter/bloop.

We're using latest master for both the server and the python plugin, and we're getting this warning all the time in the server logs (after every executed command).

Dec 12, 2017 12:40:41 PM com.martiansoftware.nailgun.NGInputStream$1 run
WARNING: Nailgun client read future was interrupted
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:204)
	at com.martiansoftware.nailgun.NGInputStream$1.run(NGInputStream.java:91)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Why is this happening? How can we make this warning disappear?

@sbalabanov-zz
Copy link
Contributor

Unfortunately this is the way how server shutdown is currently designed - it just interrupts a listener thread by setting its flag to 'interrupted' and catching InterruptedException afterwards. I have plans to refactor it and implement a graceful shutdown. So far you can probably just have java logger configured to suppress com.martiansoftware.nailgun.NGInputStream warnings.

@jvican
Copy link
Contributor Author

jvican commented Jan 12, 2018

Thank you for the answer Sergey. I'll try with the logger supression. I think that fixing this at its root would be awesome -- users are confused when they see this stack trace, and they think something's acting up when it's not.

@sbalabanov-zz
Copy link
Contributor

Should be fixed with 0.9.3

@ilovezfs
Copy link

ilovezfs commented Feb 7, 2018

@sbalabanov this still seems to be happening in 0.9.3.

@sbalabanov-zz
Copy link
Contributor

Have log? It might happen if socket did not properly signal termination and Nailgun server has to close itself forcefully.
https://github.com/facebook/nailgun/blob/master/nailgun-server/src/main/java/com/martiansoftware/nailgun/NGCommunicator.java#L137
You may also want to try to update Nailgun to the latest (off latest master) - it had some other improvements in shutdown process.

@jvican
Copy link
Contributor Author

jvican commented Mar 8, 2018

Yes, I'm seeing it too.

Mar 08, 2018 2:17:44 PM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down
Mar 08, 2018 2:27:38 PM com.martiansoftware.nailgun.NGCommunicator lambda$new$1
WARNING: Nailgun client read future raised an exception
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.read(SocketInputStream.java:204)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:327)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Mar 08, 2018 2:27:38 PM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down

And this is from the very latest master.

@jvican
Copy link
Contributor Author

jvican commented Mar 12, 2018

Can we reopen this? It would be great if this is fixed... 😄. Most of my users run this manually and are scared when they see a stack trace be thrown at them.

@sbalabanov-zz
Copy link
Contributor

I am not seeing it at least not on mac. Have repro? Yes you can reopen.

@jvican
Copy link
Contributor Author

jvican commented Mar 12, 2018

Strange, we have reproduced on mac. I cannot reopen since I don't have contributor rights on the repo. Could you do it for me?

@edsilfer
Copy link

+1 I can confirm I'm having the same issue:

WARNING: Nailgun client read future raised an exception
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:327)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Mar 13, 2018 11:21:35 AM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down

@jvican
Copy link
Contributor Author

jvican commented Mar 16, 2018

ping @sbalabanov

@jvican
Copy link
Contributor Author

jvican commented Mar 23, 2018

I was able to reproduce this issue with the hello world nail. Running it several times makes the server print the stack trace. The reason why this happens is because there is a race condition between the readChunk piece of code running in the readExecutor and the close in NGCommunicator.

What happens is that, from time to time, to get a heartbeat, readChunk will try to read the chunk length immediately after close in NGCommunicator has closed the data input and output streams, but before the executors have been terminated. In my case, this exception was happening all the time, so I'd say this is a frequent race condition.

I've solved it in a less than ideal way in this patch. The patch forces Nailgun to ignore socket exceptions if eof is true, which is a sign that the close in NGCommunicator is running.

Let me know if you want me to PR this change. I've tried it with my application and the stack trace has disappeared completely.

@sbalabanov-zz
Copy link
Contributor

It seems you are using internet protocol for client-server communication, that's why there is SocketInputStream in the stack. In my test cases, local unix socket was used thus custom NGUnixDomainSocketInputStream as used as an underlying stream.
They do have a difference in behavior. Both NGUnixDomainSocketInputStream just returns -1 when the socket is closed causing DataInputStream to throw EOFException. However, SocketInputStream throws SocketException and returns -1 only if EOF is detected, making DataInputStream propagate SocketException. Nailgun only expects EOFException as an indication of socket closure.

The fix would probably be to rethrow SocketException as EOFException if eof or termination flag is set.

@sbalabanov-zz
Copy link
Contributor

#132

@jvican
Copy link
Contributor Author

jvican commented Apr 5, 2018

I’m pretty sure I’m using Unix domain sockets, though.

@edsilfer
Copy link

edsilfer commented Apr 5, 2018

Hey guys, I'm not sure if I missing something but I have a very simple and straight forward use case that seems to be degrading with these updates. The implementation goes as below:

The sole Nail in the project

public class MyNail {
    public static void nailMain(NGContext context) {
        new MyNail().execute(context.getArgs());
    }
}

Embedded NG Server in my project's jar

public class Main {
    private static final NGServer SERVER = new NGServer();
    public static void main(String[] args) {
        if (SERVER.isRunning()) exit(0);
        Main main = new Main();
        // Register MyNail
        main.registerAliases(SERVER);
        SERVER.run();
    }
}

Shell script I use for invoking NG Python Client

NG_CLIENT="ng.py"
NG_SERVER="myproject.jar"

function is_server_offline {
    "${NG_CLIENT}" &>/dev/null
    # 230 -> could not connect to server
    [[ $? == 230 ]] && return 0 || return 1
}

function start_server {
    if $(is_server_offline) ;
    then
        java -jar "$NG_SERVER" & &>/dev/null
        sleep 1
    fi
}

function process_aliases_command {
    start_server
    "${NG_CLIENT}" ng-alias
    finish_command "" "Unable to parse aliases. Please contact ${OWNERS}"
}

function process_my_nail_call {
    start_server
    "${NG_CLIENT}" mynail -- ${ARGS}
}

Stack trace I get whenever I try to reach my nail

Apr 05, 2018 10:36:05 AM com.martiansoftware.nailgun.NGCommunicator lambda$new$1
WARNING: Nailgun client read future raised an exception
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:327)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Apr 05, 2018 10:36:05 AM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down
Apr 05, 2018 10:36:05 AM com.martiansoftware.nailgun.NGCommunicator lambda$new$1
WARNING: Nailgun client read future raised an exception
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:327)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Apr 05, 2018 10:36:05 AM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down

Versions

NG Server: master
NG Client: master

@sbalabanov , are you suggesting I should handle something different in the server side?

@sbalabanov-zz
Copy link
Contributor

@edsilfer what kind of updates are you talking about? #132 should solve it (though it is not yet merged into trunk)

@sbalabanov-zz
Copy link
Contributor

@jvican you need to use local: in address string to start local listener

https://github.com/facebook/nailgun/blob/f9c1d5a437cbeaf3f54a793cc2f58d9b41da0c6a/nailgun-server/src/main/java/com/martiansoftware/nailgun/NGServer.java#L496

which instantiates NGUnixDomainServerSocket

https://github.com/facebook/nailgun/blob/f9c1d5a437cbeaf3f54a793cc2f58d9b41da0c6a/nailgun-server/src/main/java/com/martiansoftware/nailgun/NGServer.java#L391

which yields to NGUnixDomainSocketInputStream

@edsilfer
Copy link

@sbalabanov sorry taking to long to answer. I tested today (ater #132 merged) again using the master version of ng server and ng client. Every time I run a command - just like I posted in my previous comment - I get the stack trace:

Apr 13, 2018 3:10:57 PM com.martiansoftware.nailgun.NGCommunicator lambda$new$1
WARNING: Nailgun client read future raised an exception
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:327)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Apr 13, 2018 3:10:57 PM com.martiansoftware.nailgun.NGSession run
INFO: NGSession shutting down

Though the command gets executed. I'll see if I can put a simple project where this can be reproduced.

@jvican
Copy link
Contributor Author

jvican commented Apr 13, 2018

I've been using in production my fix scalacenter@84605e2 and the error is gone. Sometimes, maybe 1 in 20 times, I get the other following stack trace:

java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:328)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:124)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

But given the fact it rarely shows up, I can live with it.

@edsilfer
Copy link

Hi guys, I was having a problem and referencing an old version of ng all. That's why I was still getting the error message. Everything is working as it supposed to now. Thank you very much the help :)

@jvican
Copy link
Contributor Author

jvican commented Apr 13, 2018

I'm happy to hear that this is now fixed. I'll update to latest master then and drop my temporary fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants