Both the client-side Send-Q and the server-side Recv-Q are stuck in a hang state #8712

Open
YangruiEmma opened this issue Jan 11, 2019 · 5 comments

YangruiEmma commented Jan 11, 2019

The problem is similar to #6525, but it happens only occasionally. I don't know why the server-side Recv-Q is in a hung state; I checked the Netty I/O threads and they are all in RUNNABLE state.

Expected behavior

No direct memory leak; Send-Q and Recv-Q do not get stuck.

Actual behavior

  1. The client-side direct memory leaks and keeps growing.
  2. Many Entry objects referenced by the ChannelOutboundBuffer are not released (the refCnt of the msg in each Entry is 1). The flushed value of the ChannelOutboundBuffer is 60,000+ (6w+).
  3. From the heap dump we can see that one channel is unwritable (the unwritable field of its ChannelOutboundBuffer is 1).
  4. Taking the problem channel's address from the dump and checking the connection state shows:
    client-side: Recv-Q (QR) = 0, Send-Q (QS) = 17376
    server-side: Recv-Q (QR) = 96086, Send-Q (QS) = 0
    Both Send-Q and Recv-Q are stuck and never change.
  5. The client cannot send messages to the server; a tcpdump shows the client just sending packets (cksum) while the server-side returns win 0 (zero receive window).
  6. Requests on the other, normal channels make direct memory rise quickly: there are 4 channels between this client and server and only one of them is unwritable, but requests on the other 3 normal channels still make direct memory rise quickly. After we closed the problem channel the direct memory stopped rising, but it was still not released.

WRITE_BUFFER_HIGH_WATER_MARK is configured to 64 * 1024.
QPS < 100, QPM < 5000.
Both the client and server side use EpollEventLoopGroup.
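For reference, a minimal sketch of a client bootstrap matching the configuration described above. This is an assumed reconstruction, not the actual code from this report; the host, port and empty initializer are placeholders.

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;
import io.netty.channel.socket.SocketChannel;

public class ClientSetup {
    public static void main(String[] args) throws Exception {
        EpollEventLoopGroup group = new EpollEventLoopGroup();
        try {
            Bootstrap b = new Bootstrap();
            b.group(group)
             .channel(EpollSocketChannel.class)
             // 64 KiB high-water mark as described in the report; once the
             // ChannelOutboundBuffer holds more than this, the channel
             // becomes unwritable.
             .option(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 64 * 1024)
             .handler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     // application handlers go here (omitted)
                 }
             });
            // "server-host" and 8080 are placeholders.
            b.connect("server-host", 8080).sync().channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```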

Steps to reproduce

The server-side Recv-Q backs up, then the client-side Send-Q backs up and direct memory leaks. But I don't know how the Recv-Q backlog happens; QPS is below 100.

My main questions are:

  1. Why does the server-side Recv-Q back up and hang, how can I check it, and is it related to 4.0.21?
  2. With the dead channel present (unable to send data), why does direct memory grow quickly and continuously? (Only heartbeat requests use that channel, so slow memory growth would be reasonable.)

Minimal yet complete reproducer code (or URL to code)

Netty version

client-side: 4.0.27
server-side: 4.0.21

JVM version (e.g. java -version)

both client-side and server-side:
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

OS version (e.g. uname -a)

both client-side and server-side:
Linux *** 2.6.32-431.20.3.el6.mt20161028.x86_64 #1 SMP Fri Oct 28 17:50:35 CST 2016 x86_64 x86_64 x86_64 GNU/Linux

@YangruiEmma (Author)

@NiteshKant Could you give me some suggestions? Thank you very much!!

@YangruiEmma (Author)

@normanmaurer Could you give me some suggestions ? Thank you very much!!!

@YangruiEmma (Author)

@trustin Could you give me some suggestions? Please >.< Thank you very much!!!

@normanmaurer (Member)

Please upgrade to the latest version and check again. I suspect you need to trigger a flush() again.
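For illustration, a minimal sketch of what re-triggering a flush could look like, assuming the flush is tied to writability changes. The handler name is made up and this is not a confirmed fix for this issue.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Sketch: flush again when writability flips back to true, so data queued
// with write() is not left sitting in the ChannelOutboundBuffer.
public class FlushOnWritableHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        if (ctx.channel().isWritable()) {
            ctx.channel().flush();
        }
        ctx.fireChannelWritabilityChanged();
    }
}
```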

@YangruiEmma (Author) commented Apr 10, 2019

@normanmaurer Thanks! I have found the reasons for the two main questions mentioned in this issue.

  1. Why does the server-side Recv-Q back up and hang, how can I check it, and is it related to 4.0.21?

Indeed, it is due to 4.0.21: the phenomenon only occurred with 4.0.21 and never happened again after the server was upgraded to 4.0.24.
But I still don't know why 4.0.21 causes the Recv-Q backlog.

  2. With the dead channel present (unable to send data), why does direct memory grow quickly and continuously? (Only heartbeat requests use that channel, so slow memory growth would be reasonable.)

Because of the default pooled direct memory (64k) in ByteBufUtil, which #7704 has since updated to 0.
The UnpooledUnsafeDirectByteBuf of a normal request won't be released unless its size is > 64k, and these UnpooledUnsafeDirectByteBuf objects are shared by all channels of one thread.
A heartbeat is tiny, but the heartbeat holds an UnpooledUnsafeDirectByteBuf (allocated by a normal request) from the object pool to send its request and does not release it (the Send-Q of the heartbeat channel is backed up).
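As a side note, a sketch of how this per-thread cache can be disabled on versions that still default to 64k. This is my assumption, not part of the original analysis: the io.netty.threadLocalDirectBufferSize system property is what ByteBufUtil reads, and #7704 changed its default to 0; verify the property against your Netty version before relying on it.

```java
// Sketch: turn off ByteBufUtil's per-thread direct buffer cache, which #7704
// later made the default behavior.
public class DisableThreadLocalDirectBuffers {
    public static void main(String[] args) {
        // Must run before any Netty buffer classes are loaded; equivalent to
        // passing -Dio.netty.threadLocalDirectBufferSize=0 on the JVM command line.
        System.setProperty("io.netty.threadLocalDirectBufferSize", "0");
        // ... bootstrap the client/server afterwards ...
    }
}
```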
Using OQL, I found a large number of ThreadLocalUnsafeDirectByteBuf instances:
capacity 32,768 B: 11,510 instances
capacity 16,384 B: 3,005 instances
capacity 8,192 B: 75,256 instances
capacity 4,096 B: 38,186 instances
capacity 2,048 B: 4,718 instances
capacity 256 B: 28,035 instances
Total: 32,768 * 11,510 + 16,384 * 3,005 + 8,192 * 75,256 + 4,096 * 38,186 + 2,048 * 4,718 + 256 * 28,035 ≈ 1.13 GB
That is why the direct memory leak happens.
My fix for this memory leak: close the channel when it stays unwritable continuously (see the sketch below).
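A minimal illustrative sketch of that idea; the handler name and timeout are hypothetical, not taken from the original fix.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.concurrent.ScheduledFuture;

import java.util.concurrent.TimeUnit;

// Sketch: close a channel that stays unwritable longer than a timeout,
// so its ChannelOutboundBuffer cannot pin direct buffers indefinitely.
public class UnwritableWatchdogHandler extends ChannelInboundHandlerAdapter {

    private final long timeoutMillis;
    private ScheduledFuture<?> closeTask;

    public UnwritableWatchdogHandler(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void channelWritabilityChanged(final ChannelHandlerContext ctx) throws Exception {
        if (!ctx.channel().isWritable()) {
            if (closeTask == null) {
                // Arm the watchdog: if the channel is still unwritable when the
                // timer fires, give up on it and close it.
                closeTask = ctx.executor().schedule(new Runnable() {
                    @Override
                    public void run() {
                        if (!ctx.channel().isWritable()) {
                            ctx.close();
                        }
                    }
                }, timeoutMillis, TimeUnit.MILLISECONDS);
            }
        } else if (closeTask != null) {
            // Writability recovered in time: disarm the watchdog.
            closeTask.cancel(false);
            closeTask = null;
        }
        ctx.fireChannelWritabilityChanged();
    }
}
```

Scheduling the check on ctx.executor() keeps all state on the channel's event loop, so no extra synchronization is needed.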
