Dropping session queues leaks netty ByteBufs, resulting in OOM #608
I've made a change to address this issue. I need to do some more testing, but hope to send a PR shortly. @hylkevds, I'm also interested in your opinion on this since you've been chasing down memory leaks. This, IMO, is a significant one. My change involves simplifying the session creation logic and adding a reference count to the Session.
Working changes are in https://github.com/jbutler/moquette/tree/fix/session-management
Are you still testing with 0.14? That one has known memory leaks. Also, 0.15 has one known ByteBuf leak left, plus a known leak where subscriptions are not cleaned from sessions on unsubscribe, so you should apply PRs #609 and #612. I've done some testing with clients that reconnect every 15 seconds, alternating clean and persistent sessions, and I've not found obvious leaks yet. One tricky bit is that a heap dump should be taken after the clients have connected and disconnected with clean sessions; otherwise it's expected that the server stores a lot of stuff :) The session-creation code does seem more complex than it needs to be, but I've not traced it out for correctness yet. Still, more testing to be done. The HiveMQ client generates lots of warnings when alternating clean and persistent sessions, so something is not quite right yet.
I've reproduced the issue on the mainline. The easiest way is to use a modified client that does not send PUBACKs. This results in the inflight window filling up, after which subsequent messages are stored in the session queue. Everything in the inflight window is leaked when the Session objects are GC'd. "Case 2" in the SessionRegistry results in the session queue being dropped from the map, which means that a new session queue is created the next time a connection with that client ID is made. So the sequence of events leading up to the session queue being leaked is a bit more complex, but the leak is still present. The sequence would be:

1. Client connects with a persistent session and accumulates queued messages.
2. Client disconnects.
3. Client reconnects with a clean session, and the old session queue is dropped without its messages being released.
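For anyone trying to reproduce it the same way, a minimal sketch of such a modified client, assuming the Eclipse Paho client's manual-ack mode (the broker URL, client ID, and topic are made up):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;

public class NonAckingSubscriber {
    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://localhost:1883", "slow-client");
        client.setManualAcks(true); // we take over responsibility for sending PUBACKs...
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false); // persistent session, so the queue survives disconnects
        client.connect(opts);
        client.subscribe("topic", 1, (topic, msg) -> {
            // ...and then never call client.messageArrivedComplete(msg.getId(), msg.getQos()),
            // so the broker never sees a PUBACK and keeps the message inflight.
        });
    }
}
```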
Interestingly enough, the clean session in step 3 above also re-uses the previous Session object. That may be why the HiveMQ client generates warnings: Moquette is resuming a previous session rather than creating a new one. Changing the session_present flag to false and using a fresh Session would probably fix those warnings. Anyway, I will try pulling in your subscription fix. I still need to iterate on my PR for this issue. My change appears to fix this issue in my simple testing, but I know there will still be problems in some edge cases.
I had a bit of a look into the SessionRegistry, and found one direct issue: it drops the session queue without releasing the messages from it.
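A sketch of the missing cleanup, assuming the session queue holds retained Netty ByteBuf payloads (the method and queue shape are hypothetical, not Moquette's actual types):

```java
import io.netty.buffer.ByteBuf;
import io.netty.util.ReferenceCountUtil;
import java.util.Queue;

final class QueueCleanup {
    // Dropping a queue from the registry map must drain it first; each
    // polled payload is released so its reference count can reach zero.
    static void dropQueue(Queue<ByteBuf> sessionQueue) {
        ByteBuf payload;
        while ((payload = sessionQueue.poll()) != null) {
            ReferenceCountUtil.release(payload);
        }
    }
}
```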
Also, unsubscribe does not remove the subscription from the session:
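The shape of that fix might be something like this (a self-contained sketch with hypothetical names, not the actual Moquette classes):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class Session {
    private final Set<String> subscriptions = ConcurrentHashMap.newKeySet();

    void addSubscription(String topicFilter) {
        subscriptions.add(topicFilter);
    }

    // The missing half: without this call on the UNSUBSCRIBE path, the
    // Session accumulates subscriptions for as long as it lives.
    void removeSubscription(String topicFilter) {
        subscriptions.remove(topicFilter);
    }
}
```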
And, as you mentioned, there are the …
I've added my take on the refactor of SessionRegistry in #619
This means we actually have to be careful when cleaning up the old session. We can't just throw out the queue.
Thanks for taking a look. I'm not sure if you saw the change I made; I actually took things one step further and completely got rid of queue sharing.
Rather than copying session state into a new Session object, I've opted to re-use the old Session when resuming a session. A new Session is only created in the other cases. Each newly created session gets a brand new queue. IMO that's much easier to reason about. Another thing to mention: cleaning up Session state is not as simple as just calling a "clean" method when replacing an old Session. There is another race that we need to protect against: we need to ensure that messages cannot be published to the Session while we are tearing it down. For example, a message could be added to the queue after we have already drained it, and that message would be leaked.
I've gotten around this by adding reference counting to the Session object. I can't think of a better way to stop incoming messages that doesn't involve locking / synchronizing between threads, which would obviously hurt performance. Let me send out a PR instead of linking back to my dev branch; it will be easier to discuss there.
I've not looked too closely at your version yet; I'm mainly trying to understand the current code first. Good point about the race with the session clean-up. There can be many threads adding messages to the Session, and they already use an AtomicInteger for the inflightSlots to see if they can send directly or need to add to a queue. The Session status is also an AtomicReference.
I thought about adding a new state for this; delayed clean-up could work. Reference counting is pretty simple though: basically just extend Netty's AbstractReferenceCounted and release the queued messages when the count drops to zero.
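A minimal sketch of that idea, assuming a Session that owns a queue of retained ByteBuf payloads (the class shape is hypothetical, not Moquette's actual Session):

```java
import io.netty.buffer.ByteBuf;
import io.netty.util.AbstractReferenceCounted;
import io.netty.util.ReferenceCountUtil;
import io.netty.util.ReferenceCounted;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class Session extends AbstractReferenceCounted {
    private final Queue<ByteBuf> sessionQueue = new ConcurrentLinkedQueue<>();

    void enqueue(ByteBuf payload) {
        sessionQueue.add(payload.retain()); // the queue owns one reference per message
    }

    @Override
    protected void deallocate() {
        // Runs exactly once, when the last holder calls release(): whatever
        // is still queued is released here instead of leaking on GC.
        ByteBuf payload;
        while ((payload = sessionQueue.poll()) != null) {
            ReferenceCountUtil.release(payload);
        }
    }

    @Override
    public ReferenceCounted touch(Object hint) {
        return this;
    }
}
```

Every holder (the registry, each bound connection) would then retain() the Session while it needs it and release() when done, so the buffers are freed by whichever thread happens to release last.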
The last thing I'm trying to work through is how to best disconnect the old connection. Closing the inbound channel associated with the old connection triggers the code in moquette/broker/src/main/java/io/moquette/broker/MQTTConnection.java, lines 278 to 292 (at bf3f831).
Of course, this would be bad! The easiest thing might be to just add some sort of flag so that a take-over disconnect is not treated as a lost connection.
Reference counting on the Session is an interesting idea! Could we solve the disconnect problem by having each bound connection hold its own reference to the Session as well?
I like that! Then the only time the Registry releases a persistent Session is when a new clean Session is created to replace it. @andsel, curious on your thoughts when you have a moment. Most of the coding is done (implementing the changes @hylkevds suggested won't take long), but I'd appreciate your buy-in before I put in more effort.
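Continuing the hypothetical Session sketch from above, that ownership scheme could look like this (all names assumed):

```java
import java.util.Map;

final class SessionOwnership {
    // Each connection takes its own reference while it is bound.
    static void bind(Session session) {
        session.retain();
    }

    static void connectionClosed(Session session) {
        session.release(); // drop the connection's reference
    }

    // The registry's reference is only dropped when a clean Session replaces
    // a persistent one; if no connection still holds it, deallocate() runs.
    static void replaceWithClean(Map<String, Session> pool, String clientId, Session fresh) {
        Session old = pool.put(clientId, fresh);
        if (old != null) {
            old.release();
        }
    }
}
```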
Thanks folks for putting down this discussion. I need some time to understand the full picture you have laid out above. I don't much like using reference counting for sessions, because we offload the work of memory management onto the application code while that's a matter for the GC. However, if that's the more practical solution I'm not against it; I just need time to think about it.
Summary for the Reference Counting discussion:
Options:
Ideally we would use the Java 9 Cleaner API, but that would tie us to Java 9+.
Thanks @hylkevds for the great summary. The reference counting seems promising. I foresee some points where we need to pay attention.
Possible problems
Yeah, reference counting helps, but doesn't completely solve all the problems. There are actually two problems:

1. Knowing when it is safe to release the messages a Session still holds.
2. One thread terminating a Session while another thread tries to revive or claim it.
Reference counting solves the first problem: when the counter goes to 0, the Session's messages can safely be released. There is, however, the second issue of one thread trying to terminate the Session while another thread is trying to revive/claim it. I think the worst-case scenario is when a live Session is claimed simultaneously by two new connections, one with cleanSession and one without. In the MQTT 5 case it could be between a Connection trying to revive a Session and the SessionCleaning trying to terminate that Session after it timed out. I think we are going to need a bit of locking to decide who wins that fight, but that locking should not involve any working threads that add or remove messages to/from the session. This locking should not happen often: only on connect events with an existing Session, and it only involves other threads that try to claim/terminate the Session. I don't think it'll impact performance much.
Good call-out on problem 2. I think we can avoid that problem entirely for clean sessions if we refactor the Session creation logic like I've done in #620. If clean sessions always get a brand new Session, and that Session can never be resumed by another connection, then there's nothing to worry about. But I still have a bug when two clients connect simultaneously using persistent sessions. Related to your concern are the problems created by having the …
Locking on connect events seems reasonable to me. Steering the conversation to Session management for a second, since I've made some significant changes in this area and I'd like to justify them: I'll propose that we distill cases 1-4 down into 3. The logic is: if there is no existing Session, or the new connection requests a clean session, or the old Session was clean, create a brand new Session; otherwise resume the old one.
With this approach, we never need to re-use session queues. In fact, the SessionRegistry itself doesn't even need to track session queues; that responsibility can live entirely within the QueueRepository. This fixes two issues: the leaked queues this issue is about, and the races that come from sharing queues between Session objects.
Of course, it would be possible to address those problems individually, but I believe that starting with whether a session is clean or not simplifies the decision matrix quite a bit. Thoughts?
BTW, Java 9 Cleaners look neat! But I'll selfishly advocate for a Java 8 compatible solution. My project allows my customers to run on Java 8, and I'd prefer not to maintain diffs in this particular code path.
Yes, first checking if a new session is in order makes things a lot simpler.
I think adding / removing the registry reference should always happen, not just for persistent sessions. All sessions have a registry reference, otherwise they'd be cleaned up immediately :) On the session handling: whenever we try to resume, take over, or terminate a session, we should lock that Session, re-check under the lock that it is still the one registered, perform the state transition, and only then unlock (see the sketch after the next paragraph).
The same bit of code should never need to hold a lock on two Session instances, to avoid deadlocks. The only case where two instances are involved is when creating a new session and terminating an old one, and in that case we don't need a lock on the new session, since adding it to the registry is the last step. Only after that step do we have the old Session for termination.
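A sketch of those rules combined, reusing the hypothetical Session from the earlier sketch (a real implementation would likely loop on the map entry rather than racing on put):

```java
import java.util.concurrent.ConcurrentMap;

final class SessionClaims {
    // Lock only the existing Session, re-check its registry entry under the
    // lock, and publish a replacement to the registry as the very last step,
    // so no code path ever holds locks on two Sessions at once.
    static Session claimOrCreate(ConcurrentMap<String, Session> pool,
                                 String clientId, boolean cleanSession) {
        Session existing = pool.get(clientId);
        if (existing != null && !cleanSession) {
            synchronized (existing) {
                // Another thread may have terminated or replaced it between
                // the get() and acquiring the lock, hence the re-check.
                if (pool.get(clientId) == existing) {
                    existing.retain(); // the new connection's reference
                    return existing;
                }
            }
        }
        Session fresh = new Session();
        Session old = pool.put(clientId, fresh); // last step: make it visible
        if (old != null) {
            old.release(); // registry drops its reference; no lock on fresh needed
        }
        return fresh;
    }
}
```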
Hi folks, I have some doubts about a few sentences:
"All transitions of Session's state in …"
This is not clear to me. If there is no session in the registry, how could there be an old session that was clean? We should only speak of the new session, or am I missing something?
Sorry, you are absolutely correct! The SessionRegistry removes its reference while terminating the Session, regardless of whether it is clean. My code does this the correct way; I just explained it wrong here :) This was pre-morning-coffee.
Sorry, that was confusing. I suppose it is actually 3 unrelated cases all in one. Here it is in code:
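(A reconstruction of that check from the description below; the names are hypothetical, not the original snippet:)

```java
final class SessionDecision {
    // All three cases collapse into "the client gets a brand-new Session
    // with a brand-new queue"; only the remaining case resumes the old one.
    static boolean needsNewSession(Session oldSession, boolean newCleanSession) {
        return oldSession == null      // case 1: nothing to resume
            || newCleanSession         // case 2: the client asked for a clean start
            || oldSession.isClean();   // case 3: a clean old session must never be resumed
    }
}
```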
Essentially, in all of these cases, the client should get a brand new Session (from an MQTT spec perspective). If the Session doesn't exist, there isn't one to resume (even if we wanted to). If the new Session is clean, then we don't want any state from the old Session. And if the old Session is clean, then it should not be resumed (according to the spec). |
With the new changes it is no longer necessary to replace anything in the SessionRegistry when a Session is resumed. We do need to be careful, since replacing a connection on an existing Session is a multi-step process (e.g. disconnect the old client, bind the new connection, move the Session state back to CONNECTING/CONNECTED). There is some discussion in the PR that highlights a new potential race condition.
@jbutler I would suggest creating a PR focused only on the reference counting, which solves the problem of the missed ByteBuf deallocation when a Session is cleaned; that would simplify my job of reviewing and understanding it. From this discussion a couple of interesting points have come out, each of which should be tracked in its own issue:
Looking at a possible thread interleaving that manifests the problem:
Thread 1, once it resumes execution at time 4, puts a message in the wrong inflight zone and sends it to a client that wasn't subscribed. I think that here we have 2 distinct problems:
Problem 2 is due to the fact that the broker needs a unique status variable per clientId to coordinate the connection flow; this is the main motivation why case 2 of the connection flow needs to wipe queues and inflight state. Problem 1 regards 2 threads that are trying to update an established Session.
The interaction between threads 1 and 3 would be solved by the simplification of the connect logic plus reference counting: never re-use a Session unless it's a resumed session, and never re-use queues. If a publishing thread adds messages to a disconnecting Session, the messages will simply be cleaned up when the last release() happens. Always using the Netty thread for message handling is a good idea, since in the above example thread 1 probably runs into an exception when trying to send the message. Always putting it on the queue would fix that, but can a Netty thread be notified that there is work waiting for it on the sending side? That only leaves the problem of handing over a non-clean Session from one connection to another, especially when there are multiple connections trying to "grab" the Session.
Sorry for the delay; good discussion here. I agree that simplifying the Session creation logic eliminates some of the problems. I will break up my PR and start with just adding reference counting to the Session object, so it's easier to review. It's a holiday weekend in the States, so it will take me a few days, but I am planning to pick this up mid next week.
Thanks @jbutler! There is no rush. I'll prepare the PR to switch the execution of publishes onto the peer's socket thread.
Tinkering with the code to understand whether "Netty's thread per session" is feasible, it turns out we can't use it, because the broker can have sessions that are still live (they keep subscriptions and store messages) but are not linked to an MQTTConnection (the client dropped it, for example). So we could have a pool of threads that manages the sessions. Each session is pinned to a specific thread, and a thread can have multiple sessions. The linkage happens by clientId, so that every interaction with or change to the session is physically executed by one thread, and always the same one. The change operations are simply serialized actions enqueued to that specific thread. This would eliminate the need for synchronization, and also the reference-counting problem of deciding which thread has to release the buffers referenced by the session. It comes at the cost of one more CPU context switch per published message: now we have 1 context switch per publish (the publisher thread has to switch to the subscriber thread); then we would have publisher thread -> session's thread -> subscriber thread, so 2 context switches. WDYT of this idea?
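A sketch of that idea in plain java.util.concurrent terms (the names are assumed, not the eventual Moquette implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SessionEventLoops {
    private final ExecutorService[] loops;

    SessionEventLoops(int numLoops) {
        // A fixed set of single-threaded executors, e.g. one per core.
        loops = new ExecutorService[numLoops];
        for (int i = 0; i < numLoops; i++) {
            loops[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same clientId -> same thread, always: all mutations of a given
    // Session are serialized, so no locks or CAS logic are needed on it.
    void execute(String clientId, Runnable sessionCommand) {
        int loop = Math.floorMod(clientId.hashCode(), loops.length);
        loops[loop].execute(sessionCommand);
    }
}
```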
I can't look at the code till next week... What needs to be done for a disconnected Session? Publish threads add messages to the queue, but that's it. Having a separate thread manage the messages doesn't solve the need for reference counting, since we still have a queue that is accessed by multiple threads.
If the netty Thread can be notified when new work arrives, we don't need a separate worker Thread, since if there is no netty Thread, no work needs to be done.
In-flight queues probably have to be per-connection instead of per-Session...
Storing messages on the queue.
My idea is to let a unique thread do all the handling of a session; enqueueing messages is part of the job of this management thread.
No; in this new shape the writing of messages, and writing on the Session object in general, is done only by one thread that receives its jobs from a queue. Obviously there isn't a thread per session, but we could create a pool of threads equal to the number of cores, and have one thread manage the state changes for a subset of Session instances. In this way access to the Session is serialized. The problem with the current design is that we need complex "status and CAS" logic for interaction with the session, plus we also need to introduce the concept of reference counting to avoid memory errors.
After more testing I noticed that failedPublishes currently doesn't work, since adding to the queue never causes an exception, so it never triggers. Fixing that, I noticed that both the Paho and HiveMQ clients never seem to re-send failed publishes; instead they just "hang". The async client of HiveMQ does continue, but still doesn't send any dup packets. Having a quick look at the Paho code, it seems it only ever sets the dup flag when restoring after a disconnect. (I found a bit more info in this old email thread: https://www.eclipse.org/lists/paho-dev/msg03429.html) This is, of course, quite bad for our plan to ignore messages, and we may have to go back to delaying the response instead of completely dropping it... On a brighter note, I did experiment with creating only one message per queue for a publish, and that does indeed improve things massively. The implementation is quite an ugly hack, but I've added the commit to #637 so you can have a look. (88104d0)
In the next weeks I'll fix this; I missed that the exception wasn't raised.
From the email thread you linked, it's a specification change introduced in MQTT …
With the change to only have one command per queue, and making sure the queue size is configurable, dropping the connection may actually be fine. We will have to come up with an alternative for the internalPublish method used in embedded mode, since there is no connection to drop; a checked exception would be suitable. Using a tiny (configurable) timeout with a blocking queue may still improve things as well, since a publish with many subscribers will still result in a large number of acks, though those are more spread out due to the nature of the network. On a slow 4-core laptop, using a tiny queue (size 32 and a 1 ms block limit) with 100 listeners and sending bursts of 20 QoS 2 messages does cause quite a few overflows in ack messages, but only very rarely dropped publish messages. Logging a warning about the queue being overrun, with the suggestion to increase the queue size, is of course very important (see the sketch below).
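A sketch of the tiny-timeout idea using a standard bounded queue (class name, sizes, and the logging are assumptions, not the actual change):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class SessionCommandQueue {
    private final BlockingQueue<Runnable> queue;
    private final long offerTimeoutMillis;

    SessionCommandQueue(int capacity, long offerTimeoutMillis) { // e.g. 32 and 1 ms
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.offerTimeoutMillis = offerTimeoutMillis;
    }

    boolean submit(Runnable command) throws InterruptedException {
        // Producers wait at most the configured budget; a burst of acks can
        // ride out a brief overflow without the command being dropped.
        if (queue.offer(command, offerTimeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        // Overrun: warn loudly and suggest raising the queue size, rather
        // than dropping the client's connection.
        System.err.println("Session queue overrun; consider increasing the queue size");
        return false;
    }
}
```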
@hylkevds with the latest commits to #631 I've improved a flood test, plus reworked your idea to squeeze the publishes a little. Nothing far from your original idea; I just kept all the collection and iteration over the batched publishes in one place and changed the …
#631 is starting to look good :) Dropping the connection on a sub or unsub sounds like a bad idea, especially for a clean session, since that will result in lost packets. And especially for the unsub, since the client is trying to reduce our load... For the test, it makes sense that the queue overflows with subscriptions when making that many in a short time :)
That's great! Could you list all the tasks it needs to get approved, so that we can sort out all the items? Kind of a TODO list.
Right, so disconnecting the client in case of a full queue is not a viable solution; we can just log a warning for this.
We could end up dropping an unlucky client that sends 1 message every minute but has the bad luck to send exactly the message that makes the queue overflow. In normal circumstances, when there is a heavy sender it's reasonable that it has a higher probability of being hit, but it's a coarse-grained solution. However, as said before, dropping the connection is not viable.
Yes, we can assume that the clientIds are equally distributed across all queues, so we could simply use a barrier initialized at …
I think that's it. After that we can actually get back to the original topic of this issue: dropping queues and the Session handling :)
Great! I found and fixed two small issues while testing:
With that, this branch is ready to merge! Edit: I did a quick log of how the queues are assigned on my machine: https://gist.github.com/hylkevds/e50091cab9ba1d17d11d49adbe32d901
Thanks @hylkevds, good catch; commits cherry-picked. About the hashing, I also tried:

```java
final int murmur = MurmurHash3.hash32x86(name.getBytes(UTF_8));
final int targetQueueId = Math.abs(murmur) % EVENT_LOOPS;
```

And got maybe even more uneven results:
I thought that the problem could be in the Math.abs, so I tried ((hash % EVENT_LOOPS) + EVENT_LOOPS) % EVENT_LOOPS instead, but the results are pretty consistent with the Math.abs version.
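For reference, the two formulations side by side in plain Java (no Moquette specifics): they map negative hashes into range equally well, just to different buckets, which matches the results being consistent. The one practical difference is Integer.MIN_VALUE, whose Math.abs is still negative:

```java
final class QueueHashing {
    static int byAbs(int hash, int eventLoops) {
        // Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE, so this can
        // yield a negative index when eventLoops is not a power of two.
        return Math.abs(hash) % eventLoops;
    }

    static int byFloorMod(int hash, int eventLoops) {
        // Equivalent to Math.floorMod(hash, eventLoops): always in [0, eventLoops).
        return ((hash % eventLoops) + eventLoops) % eventLoops;
    }
}
```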
I agree.
Now that the thread safety is fixed, we can get back to the original topic of session and session queue handling. Since session handling is now always done by the same thread, we should be able to simplify the SessionRegistry. The issues I see:
I've started working on it in #648.
I'm working on a just-discovered buffer leak, reproduced in #649.
About the queue management:
After the queues are fine and don't leak, we can think about the simplification of the SessionRegistry.
I think the concept of having an interface for the queues is not bad, though it may be a good idea to merge the subscriptions store into it as well. The problem with the queues is mainly how they are handled in the SessionRegistry:
That solves the problems in this issue, except for the two listed above. |
I think that keeping the subscriptions store separate from the queues is better.
Seems ok to me.
The idea is OK. When #648 is ready to be reviewed, please describe what it does and how it solves the problem, and mark me as reviewer. Then we can approach the problem of pending commands for a dropped session when a new session with the same clientId has just been recreated, and the separate problem of the timeout.
Ok, I updated the comment for #648. |
In #662, which builds on #648, I've added a timeout to disconnected Sessions, to get some feeling for the effect and implications. Of course there are many ways to implement this; here I just picked one to have something to discuss. First result: clearly visible is the initial increase in memory use, until the sessions start timing out. After that, things calm down and stay nice and constant.
I've updated #662 a bit, to make the session timeout configurable. |
Expected behavior
The steps described below should not result in a Moquette OOM.
Actual behavior
After a couple hours, Moquette is unable to allocate memory for incoming PUBLISH messages.
Steps to reproduce
OOM can be reproduced using two clients:

- Client A publishes to `topic` at QoS 1 at a high TPS.
- Client B subscribes to `topic` at QoS 1. Client B is a slow device: either a slow physical device (e.g. a Raspberry Pi), or a simulated one that artificially delays its PUBACKs. Client B connects and disconnects, alternating between clean and persistent sessions.

To accelerate reproduction, I also reduced `INFLIGHT_WINDOW_SIZE` from 10 to 1.

Minimal yet complete reproducer code (or URL to code) or complete log file
Publisher and subscriber code:
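As a stand-in for the original snippets, a hypothetical Paho-based pair matching the description (broker URL, client IDs, payload size, and timings are made up):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

class FloodPublisher {
    public static void main(String[] args) throws Exception {
        MqttClient pub = new MqttClient("tcp://localhost:1883", "publisher");
        pub.connect();
        byte[] payload = new byte[1024];
        while (true) {
            pub.publish("topic", payload, 1, false); // QoS 1 at a high TPS
        }
    }
}

class SlowSubscriber {
    public static void main(String[] args) throws Exception {
        boolean clean = false;
        while (true) {
            MqttClient sub = new MqttClient("tcp://localhost:1883", "slow-subscriber");
            sub.setManualAcks(true);
            MqttConnectOptions opts = new MqttConnectOptions();
            opts.setCleanSession(clean); // alternate clean and persistent sessions
            sub.connect(opts);
            sub.subscribe("topic", 1, (topic, msg) -> {
                Thread.sleep(1000); // artificially delay the PUBACK
                sub.messageArrivedComplete(msg.getId(), msg.getQos());
            });
            Thread.sleep(30_000);
            sub.disconnect();
            clean = !clean;
        }
    }
}
```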
I suspect this leak occurs when a client connects using a persistent session, disconnects, and reconnects using a clean session. When this happens, the SessionRegistry drops the session queue. Dropping the session queue then leaks all messages in the queue, unless I'm missing something.
Thoughts?
Moquette MQTT version
0.14
JVM version (e.g. `java -version`)

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)
OS version (e.g. `uname -a`)

Darwin 19.6.0 Darwin Kernel Version 19.6.0