Memory leak related to TCP connection management with use of STOMP broker relay [SPR-11531] #16156
Comments
Rossen Stoyanchev commented After some testing and looking through the code, I found an issue with closing connections, which should be fixed now. Even with that issue, though, the heartbeat support means the broker should eventually (after 10 seconds by default) notice the client is no longer active, and the connections should be getting closed. I am wondering if you're by any chance explicitly setting the heartbeat settings to something other than the default. I've also added logging messages that show the number of active WebSocket sessions and the number of TCP connections to the broker, to make it easier to debug. Could you try the latest snapshot?
Kevin Jordan commented I'll try the snapshot. I've got the heartbeat set to the default through stomp.js:
I also wonder if it has something to do with going through an EC2 load balancer, although since it's using WebSockets I wouldn't think it should have more than one connection open.
Rossen Stoyanchev commented Is the load balancer configured to use sticky sessions? Indeed for the websocket transport it wouldn't matter, but for other transports (for IE < 10 clients for example) it would. See the comments on "load balancer" in the sockjs protocol test suite. Do give this change a try. The added log messages could also be helpful for tracking down issues.
Kevin Jordan commented I'll run a new memory dump at the end of the day but I left a client up overnight and my server didn't run out of memory yet, so that's good so far.
Kevin Jordan commented I don't think that fixed it. I've attached a new memory analysis after 1 day of running.
Rossen Stoyanchev commented Alright, we'll keep looking for what's causing this. Could you provide a basic description of the test scenario? Also I wonder what logging is enabled; in particular, the following log message appears on every new request. If that is enabled, what numbers do you see?
Rossen Stoyanchev commented Also, what browsers are the clients running? Specifically, I wonder if you know which SockJS transports are being used (e.g. IE 7/8/9 would imply HTTP-based transports). Also, you mentioned EC2 load balancing; is that configured for sticky sessions?
Rossen Stoyanchev commented Kevin Jordan, any updates? Given a basic description I can try to reproduce this on my side.
Kevin Jordan commented The clients are Chrome and Firefox. The EC2 load balancer isn't set up for sticky sessions, but it also only has one instance running currently. It may be related somehow to Amazon EC2 and/or the VPC structure. I don't know whether our local testing box simply gets so many webapp updates pushed to it (Tomcat is restarted several times a day) that I never see the memory problem, or whether it's something with Ubuntu 12.04 (the latest version available in OpsWorks) or something inside AWS. I thought maybe it had something to do with the Java version, since my testing box was behind on that (Oracle JDK 7.0.45 locally vs Oracle JDK 7.0.51 on Amazon), but after updating the testing box it doesn't show the problem, at least overnight (a jmap dump gives a size of about 300MB, with none of the results showing a lot of NettyTcpConnections). The testing box is also outside of Amazon and has a lot more memory than an m1.small provides. I may set up a duplicate environment and see if I can reproduce it there too. If possible, you might try running an example under an EC2 environment as well.
Rossen Stoyanchev commented Okay thanks for that detail. What about the test itself, how many clients are you running, what are those clients doing, and what does the server do? In very general terms of course. I'm mainly interested in getting a sense of the overall message flow, and events such as clients connecting and disconnecting.
Kevin Jordan commented Not very many clients. Maybe 5-10 at most during the day. It seems like the connections build up regardless of whether someone is connected or not. So I'm assuming those are connections to my RabbitMQ server and not connections to the client.
Rossen Stoyanchev commented We maintain only one connection to RabbitMQ (the "system" connection in the configuration) for sending messages from within the application to the broker for broadcasting purposes. As clients connect we create additional connections so that each client can have its own STOMP session to the broker, but those connections are opened and closed in tandem with the client connections.
The "system" connection will try to reconnect if it gets closed, and I wonder if that's somehow causing an issue. If you are able to confirm that the number of connections grows even without clients, that would be very helpful. Also, #16178 just came in and may be related. Could the RabbitMQ service be becoming unavailable at times, triggering retries in turn? By the way, whenever the broker becomes available/unavailable an ApplicationContext event is published, so you can monitor and log this; for example, see the QuoteService in the portfolio sample.
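As a minimal sketch of what such a listener could look like (the class name and log messages below are made up for illustration and are not taken from the portfolio sample's QuoteService), assuming Spring 4.0 and SLF4J on the classpath:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ApplicationListener;
import org.springframework.messaging.simp.broker.BrokerAvailabilityEvent;
import org.springframework.stereotype.Component;

// Hypothetical listener: logs every broker availability change so that an
// unexpected disconnect/reconnect cycle of the "system" connection shows up in the logs.
@Component
public class BrokerAvailabilityLogger implements ApplicationListener<BrokerAvailabilityEvent> {

    private static final Logger logger = LoggerFactory.getLogger(BrokerAvailabilityLogger.class);

    @Override
    public void onApplicationEvent(BrokerAvailabilityEvent event) {
        if (event.isBrokerAvailable()) {
            logger.info("STOMP broker relay reports the broker is available");
        }
        else {
            logger.warn("STOMP broker relay reports the broker is unavailable");
        }
    }
}
```

If the broker were flapping, the resulting warn/info pairs in the log would line up with the reconnect attempts of the "system" connection.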
Kevin Jordan commented I'd sort of doubt that the RabbitMQ service became unavailable as it's on the same host.
Rossen Stoyanchev commented I have it reproduced. Thanks.
Rossen Stoyanchev commented Kevin Jordan, there is a new reactor-tcp snapshot with the fix.
Rossen Stoyanchev commented It's in Maven Central now (reactor-tcp 1.0.1.RELEASE).
Kevin Jordan commented I've got a new version of my app going out and I'll let you know if I still get a memory leak or not.
Kevin Jordan commented I'll run a jmap on it later tonight, but if there is still a leak I think it's cut down quite a bit.
Kevin Jordan commented Ran another jmap tonight and no signs of a leak. I think this can be closed now.
Rossen Stoyanchev commented Great to hear, thanks for confirming!
Kevin Jordan opened SPR-11531 and commented
This may be more of a bug with Reactor, which Spring's WebSocket support uses, but it seems to build up a lot of connections. I get almost 800MB in just a few short days. I don't have 65k connections active at any given time, so it shouldn't be this way. In my configuration I'm using STOMP with RabbitMQ.
Configuration:
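(The configuration attached to the original report is not reproduced in this thread. Purely for context, a typical Spring 4.0 setup of the kind described, a SockJS/WebSocket endpoint with a STOMP broker relay pointing at RabbitMQ, looks roughly like the sketch below; the endpoint path, destination prefixes, host, and credentials are illustrative assumptions, not the values from this report.)

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

// Illustrative sketch only: endpoint path, prefixes, host, and credentials are assumed values.
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // SockJS endpoint that browser clients (e.g. stomp.js) connect to
        registry.addEndpoint("/ws").withSockJS();
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // /app/** goes to annotated controllers; /topic and /queue are relayed to RabbitMQ
        registry.setApplicationDestinationPrefixes("/app");
        registry.enableStompBrokerRelay("/topic", "/queue")
                .setRelayHost("localhost")
                .setRelayPort(61613)
                .setSystemLogin("guest")
                .setSystemPasscode("guest");
    }
}
```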
Could it somehow be opening connections for non-WebSocket requests? Do I have a misconfiguration? It works fine except for the memory leak.
I've attached a screenshot from MAT from a memory dump of my webapp.
Affects: 4.0 GA, 4.0.2
Attachments: