New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epoll force closes connection #4170

Closed
ninja- opened this Issue Aug 29, 2015 · 23 comments

Comments

Projects
None yet
3 participants
@ninja-

ninja- commented Aug 29, 2015

a small (visual) regression when doing nio vs epoll:

When proxy is running on epoll, and the connection to backend is disconnected, the backend server shows a message:
Player lost connection: Internal Exception: java.io.IOException: Error while read(...): Connection reset by peer

instead of "Player lost connection: disconnected"

Any idea of what could be the cause?
(I'll update when I find the piece of code in epoll that could cause this regression)

random guess: there should be a shutdown() syscall before close().
where to look for comparsion: java nio socketchannel.close()

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Aug 31, 2015

Member

Without any infos what your code does it's hard to say.

Member

normanmaurer commented Aug 31, 2015

Without any infos what your code does it's hard to say.

@Scottmitch

This comment has been minimized.

Show comment
Hide comment
@Scottmitch

Scottmitch Aug 31, 2015

Member

Player lost connection: disconnected

I don't believe this would be an exception generated by Netty.

Member

Scottmitch commented Aug 31, 2015

Player lost connection: disconnected

I don't believe this would be an exception generated by Netty.

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Aug 31, 2015

i never said "disconnected" was an exception! It's an lack, of exception. Regarding code we are talking just about channel.close() here. I am 90% sure that adding shutdown syscall would make it match behaviour with nio.

ninja- commented Aug 31, 2015

i never said "disconnected" was an exception! It's an lack, of exception. Regarding code we are talking just about channel.close() here. I am 90% sure that adding shutdown syscall would make it match behaviour with nio.

@Scottmitch

This comment has been minimized.

Show comment
Hide comment
@Scottmitch

Scottmitch Aug 31, 2015

Member

@ninja- - It would be helpful if you describe your issue in terms of Netty expected behavior vs Netty observed behavior. Your description mixes in application concepts that don't exist in Netty. We generally don't have knowledge of your application running on top of Netty or what it is doing in these circumstances.

Like for example (I'm trying to guess based upon what you have provided thus far) in NIO are you observing a channelInactive(..) and in EPOLL are you observing an exceptionCaught(..)? If so in what situations, and can you provide a reproducer?

random guess: there should be a shutdown() syscall before close().
where to look for comparsion: java nio socketchannel.close()
..
I am 90% sure that adding shutdown syscall would make it match behaviour with nio.

Have you tried this? What did you observe?

Member

Scottmitch commented Aug 31, 2015

@ninja- - It would be helpful if you describe your issue in terms of Netty expected behavior vs Netty observed behavior. Your description mixes in application concepts that don't exist in Netty. We generally don't have knowledge of your application running on top of Netty or what it is doing in these circumstances.

Like for example (I'm trying to guess based upon what you have provided thus far) in NIO are you observing a channelInactive(..) and in EPOLL are you observing an exceptionCaught(..)? If so in what situations, and can you provide a reproducer?

random guess: there should be a shutdown() syscall before close().
where to look for comparsion: java nio socketchannel.close()
..
I am 90% sure that adding shutdown syscall would make it match behaviour with nio.

Have you tried this? What did you observe?

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 1, 2015

@Scottmitch I will try adding shutdown() and then update this bug with results.

ninja- commented Sep 1, 2015

@Scottmitch I will try adding shutdown() and then update this bug with results.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 4, 2015

Member

@ninja- any news ?

Member

normanmaurer commented Sep 4, 2015

@ninja- any news ?

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer
Member

normanmaurer commented Sep 10, 2015

@ninja- ping...

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

@normanmauer i can test it today. It's just one line od code

ninja- commented Sep 10, 2015

@normanmauer i can test it today. It's just one line od code

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

@normanmaurer it got more complicated because I now see shutdown() syscall is used in some places. so I am not sure where to insert it.

ninja- commented Sep 10, 2015

@normanmaurer it got more complicated because I now see shutdown() syscall is used in some places. so I am not sure where to insert it.

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

here's your fix, I tested it and it closes the connection properly now (no ugly message!)

Index: transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision 250a09df635d70853e1576a9e522c846e918938e)
+++ transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision )
@@ -535,7 +535,9 @@
             // Calling super.doClose() first so splceTo(...) will fail on next call.
             super.doClose();
         } finally {
+            safeShutdownPipe(pipeIn);
             safeClosePipe(pipeIn);
+            safeShutdownPipe(pipeOut);
             safeClosePipe(pipeOut);
             clearSpliceQueue();
         }
@@ -581,6 +583,18 @@
             } catch (IOException e) {
                 if (logger.isWarnEnabled()) {
                     logger.warn("Error while closing a pipe", e);
+                }
+            }
+        }
+    }
+
+    private void safeShutdownPipe(int pipe) {
+        if (pipe != -1) {
+            try {
+                Native.shutdown(pipe, true, true);
+            } catch (IOException e) {
+                if (logger.isWarnEnabled()) {
+                    logger.warn("Error while shutting down a pipe", e);
                 }
             }
         }

ninja- commented Sep 10, 2015

here's your fix, I tested it and it closes the connection properly now (no ugly message!)

Index: transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision 250a09df635d70853e1576a9e522c846e918938e)
+++ transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision )
@@ -535,7 +535,9 @@
             // Calling super.doClose() first so splceTo(...) will fail on next call.
             super.doClose();
         } finally {
+            safeShutdownPipe(pipeIn);
             safeClosePipe(pipeIn);
+            safeShutdownPipe(pipeOut);
             safeClosePipe(pipeOut);
             clearSpliceQueue();
         }
@@ -581,6 +583,18 @@
             } catch (IOException e) {
                 if (logger.isWarnEnabled()) {
                     logger.warn("Error while closing a pipe", e);
+                }
+            }
+        }
+    }
+
+    private void safeShutdownPipe(int pipe) {
+        if (pipe != -1) {
+            try {
+                Native.shutdown(pipe, true, true);
+            } catch (IOException e) {
+                if (logger.isWarnEnabled()) {
+                    logger.warn("Error while shutting down a pipe", e);
                 }
             }
         }
@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 10, 2015

Member

Thanks ... Looking

Am 10.09.2015 um 18:40 schrieb ninja- notifications@github.com:

here's your fix, I tested it and it closes the connection properly now (no ugly message!)

Index: transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP

<+>UTF-8

--- transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision 250a09d)
+++ transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision )
@@ -535,7 +535,9 @@
// Calling super.doClose() first so splceTo(...) will fail on next call.
super.doClose();
} finally {

  •        safeShutdownPipe(pipeIn);
         safeClosePipe(pipeIn);
    
  •        safeShutdownPipe(pipeOut);
         safeClosePipe(pipeOut);
         clearSpliceQueue();
     }
    
    @@ -581,6 +583,18 @@
    } catch (IOException e) {
    if (logger.isWarnEnabled()) {
    logger.warn("Error while closing a pipe", e);
  •            }
    
  •        }
    
  •    }
    
  • }
  • private void safeShutdownPipe(int pipe) {
  •    if (pipe != -1) {
    
  •        try {
    
  •            Native.shutdown(pipe, true, true);
    
  •        } catch (IOException e) {
    
  •            if (logger.isWarnEnabled()) {
    
  •                logger.warn("Error while shutting down a pipe", e);
             }
         }
     }
    

    Reply to this email directly or view it on GitHub.
Member

normanmaurer commented Sep 10, 2015

Thanks ... Looking

Am 10.09.2015 um 18:40 schrieb ninja- notifications@github.com:

here's your fix, I tested it and it closes the connection properly now (no ugly message!)

Index: transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP

<+>UTF-8

--- transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision 250a09d)
+++ transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java (revision )
@@ -535,7 +535,9 @@
// Calling super.doClose() first so splceTo(...) will fail on next call.
super.doClose();
} finally {

  •        safeShutdownPipe(pipeIn);
         safeClosePipe(pipeIn);
    
  •        safeShutdownPipe(pipeOut);
         safeClosePipe(pipeOut);
         clearSpliceQueue();
     }
    
    @@ -581,6 +583,18 @@
    } catch (IOException e) {
    if (logger.isWarnEnabled()) {
    logger.warn("Error while closing a pipe", e);
  •            }
    
  •        }
    
  •    }
    
  • }
  • private void safeShutdownPipe(int pipe) {
  •    if (pipe != -1) {
    
  •        try {
    
  •            Native.shutdown(pipe, true, true);
    
  •        } catch (IOException e) {
    
  •            if (logger.isWarnEnabled()) {
    
  •                logger.warn("Error while shutting down a pipe", e);
             }
         }
     }
    

    Reply to this email directly or view it on GitHub.

@normanmaurer normanmaurer self-assigned this Sep 10, 2015

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 10, 2015

Member

@ninja- do you use spliceTo(...) ?

Member

normanmaurer commented Sep 10, 2015

@ninja- do you use spliceTo(...) ?

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

@normanmaurer no I don't think so, I don't even know what's that

ninja- commented Sep 10, 2015

@normanmaurer no I don't think so, I don't even know what's that

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 10, 2015

Member

@ninja- ups sorry... I meant spliceIn(...)... If you not use it I think this can not have anything to do with this as pipeIn and pipeOut are only != -1 if `spliceIn(...) is called at some point.

Member

normanmaurer commented Sep 10, 2015

@ninja- ups sorry... I meant spliceIn(...)... If you not use it I think this can not have anything to do with this as pipeIn and pipeOut are only != -1 if `spliceIn(...) is called at some point.

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

@normanmaurer yea I guess I could quickly get rid of this fix to give you a 100% confirmation that it's doing it's job...

ninja- commented Sep 10, 2015

@normanmaurer yea I guess I could quickly get rid of this fix to give you a 100% confirmation that it's doing it's job...

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

@normanmaurer I can't get it to uncleanly close connection without the fix......that's weird, I doubt you have recently fixed this by accident. I'll get back to this bug when I could get it to show unclean disconnect. Maybe it would be easier to spot the difference on production, so I'll test it there.

ninja- commented Sep 10, 2015

@normanmaurer I can't get it to uncleanly close connection without the fix......that's weird, I doubt you have recently fixed this by accident. I'll get back to this bug when I could get it to show unclean disconnect. Maybe it would be easier to spot the difference on production, so I'll test it there.

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 10, 2015

ok I confirm the difference with and without the fix

    @Override
    protected void doClose() throws Exception {
        try {
            ChannelPromise promise = connectPromise;
            if (promise != null) {
                // Use tryFailure() instead of setFailure() to avoid the race against cancel().
                promise.tryFailure(CLOSED_CHANNEL_EXCEPTION);
                connectPromise = null;
            }

            ScheduledFuture<?> future = connectTimeoutFuture;
            if (future != null) {
                future.cancel(false);
                connectTimeoutFuture = null;
            }

            safeShutdownPipe(pipeIn);
            safeShutdownPipe(pipeOut);

            // Calling super.doClose() first so splceTo(...) will fail on next call.
            super.doClose();
        } finally {
            safeClosePipe(pipeIn);
            safeClosePipe(pipeOut);
            clearSpliceQueue();
        }
    }

but hm, it's not a 100% fix...it doesn't handle every case where in nio it would "shutdown" the socket every time. Maybe you know which case I missed when it's closed but not shutdown ? probably something relating to FileDescriptor.close() where it's called instead.

ninja- commented Sep 10, 2015

ok I confirm the difference with and without the fix

    @Override
    protected void doClose() throws Exception {
        try {
            ChannelPromise promise = connectPromise;
            if (promise != null) {
                // Use tryFailure() instead of setFailure() to avoid the race against cancel().
                promise.tryFailure(CLOSED_CHANNEL_EXCEPTION);
                connectPromise = null;
            }

            ScheduledFuture<?> future = connectTimeoutFuture;
            if (future != null) {
                future.cancel(false);
                connectTimeoutFuture = null;
            }

            safeShutdownPipe(pipeIn);
            safeShutdownPipe(pipeOut);

            // Calling super.doClose() first so splceTo(...) will fail on next call.
            super.doClose();
        } finally {
            safeClosePipe(pipeIn);
            safeClosePipe(pipeOut);
            clearSpliceQueue();
        }
    }

but hm, it's not a 100% fix...it doesn't handle every case where in nio it would "shutdown" the socket every time. Maybe you know which case I missed when it's closed but not shutdown ? probably something relating to FileDescriptor.close() where it's called instead.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 22, 2015

Member

@ninja- I think I not understand what you suggest here. For me I think the provide patch makes no sense in terms of a fix if you not use splicing.

Member

normanmaurer commented Sep 22, 2015

@ninja- I think I not understand what you suggest here. For me I think the provide patch makes no sense in terms of a fix if you not use splicing.

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 22, 2015

@normanmaurer well it's about a difference in how nio vs epoll closes connection. nio in every case calls shutdown() before close and epoll doesn't which is visible to other side of the connection as a connection reset etc.

ninja- commented Sep 22, 2015

@normanmaurer well it's about a difference in how nio vs epoll closes connection. nio in every case calls shutdown() before close and epoll doesn't which is visible to other side of the connection as a connection reset etc.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 22, 2015

Member

@ninja- so you say nio basically does:

shutdown(fd);
close(fd);

While epoll does:

close(fd)

Is this what you trying to say here ?

Member

normanmaurer commented Sep 22, 2015

@ninja- so you say nio basically does:

shutdown(fd);
close(fd);

While epoll does:

close(fd)

Is this what you trying to say here ?

@ninja-

This comment has been minimized.

Show comment
Hide comment
@ninja-

ninja- Sep 22, 2015

@normanmaurer exactly. I am also unsure where would be a correct place for shutdown() to match nio. Maybe we could assume it would cause no harm and just make Native.close always call shutdown first? (as well as UnixFileDescriptor.close because these are two seperate implementations at netty)

ninja- commented Sep 22, 2015

@normanmaurer exactly. I am also unsure where would be a correct place for shutdown() to match nio. Maybe we could assume it would cause no harm and just make Native.close always call shutdown first? (as well as UnixFileDescriptor.close because these are two seperate implementations at netty)

@normanmaurer normanmaurer added this to the 4.0.32.Final milestone Sep 22, 2015

@normanmaurer normanmaurer added the defect label Sep 22, 2015

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 22, 2015

Member

@ninja- ok gotcha... let me check

Member

normanmaurer commented Sep 22, 2015

@ninja- ok gotcha... let me check

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Sep 23, 2015

Member

@ninja- please check #4256

Member

normanmaurer commented Sep 23, 2015

@ninja- please check #4256

@normanmaurer normanmaurer added the defect label Sep 23, 2015

normanmaurer added a commit that referenced this issue Sep 23, 2015

[#4170] Shutdown socket before close fd when using epoll transport
Motivation:

We should call shutdown(...) on the socket before closing the filedescriptor to ensure it is closed gracefully.

Modifications:

Call shutdown(...) before close.

Result:

Sockets are gracefully shutdown when using native transport.

normanmaurer added a commit that referenced this issue Sep 25, 2015

[#4170] Shutdown socket before close fd when using epoll transport
Motivation:

We should call shutdown(...) on the socket before closing the filedescriptor to ensure it is closed gracefully.

Modifications:

Call shutdown(...) before close.

Result:

Sockets are gracefully shutdown when using native transport.

normanmaurer added a commit that referenced this issue Sep 25, 2015

[#4170] Shutdown socket before close fd when using epoll transport
Motivation:

We should call shutdown(...) on the socket before closing the filedescriptor to ensure it is closed gracefully.

Modifications:

Call shutdown(...) before close.

Result:

Sockets are gracefully shutdown when using native transport.

normanmaurer added a commit that referenced this issue Sep 25, 2015

[#4170] Shutdown socket before close fd when using epoll transport
Motivation:

We should call shutdown(...) on the socket before closing the filedescriptor to ensure it is closed gracefully.

Modifications:

Call shutdown(...) before close.

Result:

Sockets are gracefully shutdown when using native transport.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Mar 11, 2016

EPOLL SO_LINGER=0 sends FIN+RST
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.

netty#4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.

Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior

Result:
EPOLL is capable of sending just a RST to terminate a connection.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Mar 11, 2016

EPOLL SO_LINGER=0 sends FIN+RST
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.

netty#4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.

Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior

Result:
EPOLL is capable of sending just a RST to terminate a connection.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Mar 17, 2016

EPOLL SO_LINGER=0 sends FIN+RST
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.

netty#4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.

Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior

Result:
EPOLL is capable of sending just a RST to terminate a connection.

Scottmitch added a commit that referenced this issue Mar 17, 2016

EPOLL SO_LINGER=0 sends FIN+RST
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.

#4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.

Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior

Result:
EPOLL is capable of sending just a RST to terminate a connection.

Scottmitch added a commit that referenced this issue Mar 17, 2016

EPOLL SO_LINGER=0 sends FIN+RST
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.

#4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.

Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior

Result:
EPOLL is capable of sending just a RST to terminate a connection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment