Both the client-side Send-Q and the server-side Recv-Q are stuck in a hang state #8712

Open
YangruiEmma opened this issue Jan 11, 2019 · 5 comments

YangruiEmma commented Jan 11, 2019

The problem is similar to #6525, but it happens only occasionally. I don't know why the server-side Recv-Q is in a hung state; I checked the Netty I/O threads and they are all in RUNNABLE state.

Expected behavior

No direct memory leak; Send-Q and Recv-Q do not get stuck.

Actual behavior

  1. The client-side direct memory leaks and keeps growing.
  2. Many Entry objects referenced by the ChannelOutboundBuffer are not released (the refCnt of the msg in each Entry is 1). The flushed value of the ChannelOutboundBuffer is 60,000+ (6w+).
  3. From the heap dump we can see that one channel is unwritable (the unwritable field of its ChannelOutboundBuffer is 1).
  4. Taking the problem channel's address from the dump and checking the connection state shows:
    client-side: Recv-Q (QR) = 0, Send-Q (QS) = 17376
    server-side: Recv-Q (QR) = 96086, Send-Q (QS) = 0
    Both Send-Q and Recv-Q are stuck and never change.
  5. The client cannot send messages to the server; a tcpdump shows the client just sending packets (cksum) while the server-side returns win 0 (zero receive window).
  6. Requests on the other, normal channels make direct memory rise quickly: there are 4 channels between this client and server and only one of them is unwritable, but requests on the other 3 normal channels still make direct memory rise quickly. After we closed the problem channel the direct memory stopped rising, but it was still not released.

WRITE_BUFFER_HIGH_WATER_MARK is configured to 64 * 1024.
QPS < 100, QPM < 5000.
Both the client and server side use EpollEventLoopGroup.
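For reference, a minimal sketch of a client bootstrap matching the configuration described above. This is an assumed reconstruction, not the actual code from this report; the host, port and empty initializer are placeholders.

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;
import io.netty.channel.socket.SocketChannel;

public class ClientSetup {
    public static void main(String[] args) throws Exception {
        EpollEventLoopGroup group = new EpollEventLoopGroup();
        try {
            Bootstrap b = new Bootstrap();
            b.group(group)
             .channel(EpollSocketChannel.class)
             // 64 KiB high-water mark as described in the report; once the
             // ChannelOutboundBuffer holds more than this, the channel
             // becomes unwritable.
             .option(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 64 * 1024)
             .handler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     // application handlers go here (omitted)
                 }
             });
            // "server-host" and 8080 are placeholders.
            b.connect("server-host", 8080).sync().channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```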

Steps to reproduce

The server-side Recv-Q backs up, then the client-side Send-Q backs up and direct memory leaks. But I don't know how the Recv-Q backlog happens; QPS is below 100.

My main questions are:

  1. Why does the server-side Recv-Q back up and hang, how can I check it, and is it related to 4.0.21?
  2. With the dead channel present (unable to send data), why does direct memory grow quickly and continuously? (Only heartbeat requests use that channel, so slow memory growth would be reasonable.)

Minimal yet complete reproducer code (or URL to code)

Netty version

client-side: 4.0.27
server-side: 4.0.21

JVM version (e.g. java -version)

both client-side and server-side:
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

OS version (e.g. uname -a)

both client-side and server-side:
Linux *** 2.6.32-431.20.3.el6.mt20161028.x86_64 #1 SMP Fri Oct 28 17:50:35 CST 2016 x86_64 x86_64 x86_64 GNU/Linux

@YangruiEmma (Author)

@NiteshKant Could you give me some suggestions? Thank you very much!!

@YangruiEmma (Author)

@normanmaurer Could you give me some suggestions ? Thank you very much!!!

@YangruiEmma (Author)

@trustin Could you give me some suggestions? Please >.< Thank you very much!!!

@normanmaurer (Member)

Please upgrade to the latest version and check again. I suspect you need to trigger a flush() again.
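For illustration, a minimal sketch of what re-triggering a flush could look like, assuming the flush is tied to writability changes. The handler name is made up and this is not a confirmed fix for this issue.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Sketch: flush again when writability flips back to true, so data queued
// with write() is not left sitting in the ChannelOutboundBuffer.
public class FlushOnWritableHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        if (ctx.channel().isWritable()) {
            ctx.channel().flush();
        }
        ctx.fireChannelWritabilityChanged();
    }
}
```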

@YangruiEmma (Author) commented Apr 10, 2019

@normanmaurer Thanks! I have found the reasons for the two main questions mentioned in this issue.

  1. Why does the server-side Recv-Q back up and hang, how can I check it, and is it related to 4.0.21?

Indeed, it is due to 4.0.21: the phenomenon only occurred with 4.0.21 and never happened again after the server was upgraded to 4.0.24.
But I still don't know why 4.0.21 causes the Recv-Q backlog.

  2. With the dead channel present (unable to send data), why does direct memory grow quickly and continuously? (Only heartbeat requests use that channel, so slow memory growth would be reasonable.)

Because of the default pooled direct memory (64k) in ByteBufUtil, which #7704 has since updated to 0.
The UnpooledUnsafeDirectByteBuf of a normal request won't be released unless its size is > 64k, and these UnpooledUnsafeDirectByteBuf objects are shared by all channels of one thread.
A heartbeat is tiny, but the heartbeat holds an UnpooledUnsafeDirectByteBuf (allocated by a normal request) from the object pool to send its request and does not release it (the Send-Q of the heartbeat channel is backed up).
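As a side note, a sketch of how this per-thread cache can be disabled on versions that still default to 64k. This is my assumption, not part of the original analysis: the io.netty.threadLocalDirectBufferSize system property is what ByteBufUtil reads, and #7704 changed its default to 0; verify the property against your Netty version before relying on it.

```java
// Sketch: turn off ByteBufUtil's per-thread direct buffer cache, which #7704
// later made the default behavior.
public class DisableThreadLocalDirectBuffers {
    public static void main(String[] args) {
        // Must run before any Netty buffer classes are loaded; equivalent to
        // passing -Dio.netty.threadLocalDirectBufferSize=0 on the JVM command line.
        System.setProperty("io.netty.threadLocalDirectBufferSize", "0");
        // ... bootstrap the client/server afterwards ...
    }
}
```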
Using OQL, I found a large number of ThreadLocalUnsafeDirectByteBuf instances:
capacity 32,768 B: 11,510 instances
capacity 16,384 B: 3,005 instances
capacity 8,192 B: 75,256 instances
capacity 4,096 B: 38,186 instances
capacity 2,048 B: 4,718 instances
capacity 256 B: 28,035 instances
Total: 32,768 * 11,510 + 16,384 * 3,005 + 8,192 * 75,256 + 4,096 * 38,186 + 2,048 * 4,718 + 256 * 28,035 ≈ 1.13 GB
That is why the direct memory leak happens.
My fix for this memory leak: close the channel when it stays unwritable continuously (see the sketch below).
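A minimal illustrative sketch of that idea; the handler name and timeout are hypothetical, not taken from the original fix.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.concurrent.ScheduledFuture;

import java.util.concurrent.TimeUnit;

// Sketch: close a channel that stays unwritable longer than a timeout,
// so its ChannelOutboundBuffer cannot pin direct buffers indefinitely.
public class UnwritableWatchdogHandler extends ChannelInboundHandlerAdapter {

    private final long timeoutMillis;
    private ScheduledFuture<?> closeTask;

    public UnwritableWatchdogHandler(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void channelWritabilityChanged(final ChannelHandlerContext ctx) throws Exception {
        if (!ctx.channel().isWritable()) {
            if (closeTask == null) {
                // Arm the watchdog: if the channel is still unwritable when the
                // timer fires, give up on it and close it.
                closeTask = ctx.executor().schedule(new Runnable() {
                    @Override
                    public void run() {
                        if (!ctx.channel().isWritable()) {
                            ctx.close();
                        }
                    }
                }, timeoutMillis, TimeUnit.MILLISECONDS);
            }
        } else if (closeTask != null) {
            // Writability recovered in time: disarm the watchdog.
            closeTask.cancel(false);
            closeTask = null;
        }
        ctx.fireChannelWritabilityChanged();
    }
}
```

Scheduling the check on ctx.executor() keeps all state on the channel's event loop, so no extra synchronization is needed.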
