Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disconnect hangs #49

Closed
c-bg opened this issue Oct 29, 2019 · 23 comments
Closed

Disconnect hangs #49

c-bg opened this issue Oct 29, 2019 · 23 comments

Comments

@c-bg
Copy link

c-bg commented Oct 29, 2019

Hello,
Thanks a lot for developing this nice tool.
I've got a question regarding the connection Disconnect function; when I use it, it hangs, and when I debug it, the debugger stops at the following line:

c.DisconnectReceipt = <-c.input

Do you have any ideas why this might be the case?
Thanks a lot for your help in advance.
Feel free to let me know if you need any further information.

@c-bg
Copy link
Author

c-bg commented Oct 29, 2019

Furthermore, the line above does not seem to take the ReadDeadline into account, if I am not mistaken?

@gmallard
Copy link
Owner

I do have test(s) for disconnect receipts, and do not see hangs.

What stompngo version are you running ?

Do you have a short example that shows this behavior ?

Can you get me a stack trace of the hang (on Linux press Ctrl-\ when the hang occurs)?

Can you enable logging and show me that output ?

ReadDeadLine is reset between all calls to the TCP reader. They have to be. Otherwise nothing would work right.

You could always request no receipt as a work around.

@c-bg
Copy link
Author

c-bg commented Oct 29, 2019

Thank you very much for your prompt reply, much appreciated!

stompngo version: v1.0.12

I'm currently working on a minimal example, I'll come back to you as soon as I know more or have it available.

Please find the recorded stack trace with enabled connection logging attached:
stackdumpWithLogMasked.txt

I enabled the logging with conn.SetLogger. Please let me know if you'd like me to log something else as well.

ReadDeadLine is reset between all calls to the TCP reader. They have to be. Otherwise nothing would work right.

Alright, I see. Thanks a lot for the information.

You could always request no receipt as a work around.

Unfortunately I have to request a receipt to make sure that the disconnection has worked properly, for reasons that are probably beyond the scope of this.

Thank you very much for your help.

If you have any questions/comments, please do not hesitate to say so.

@gmallard
Copy link
Owner

Are you fully draining your input queues/topics ?

It looks like the system is deadlocked trying to send you a MESSAGE frame which you have not received.

@c-bg
Copy link
Author

c-bg commented Oct 29, 2019

thanks a lot for your help. now I'm draining the topic in a parallel goroutine while disconnecting, and it doesn't hang anymore -- however, now, it never receives a receipt. conn.DisconnectReceipt is empty, as well as all subsequent messages on conn.

@c-bg
Copy link
Author

c-bg commented Oct 29, 2019

short update: I see now that it works for some servers, for some not. I'll further investigate whether the error is in on the side of server's STOMP implementation, and come back to you if that error source is excluded. In the meantime I'll close this issue.
Thanks a lot for your prompt and helpful input, much appreciated!

@c-bg c-bg closed this as completed Oct 29, 2019
@c-bg
Copy link
Author

c-bg commented Oct 30, 2019

By now I can say that the error is not on the side of the servers.
My most recent insights are: Your library sees and records the receipt (visible in the logs):tada:, but when I try to access it with conn.DisconnectReceipt, it's sometimes no longer there. 🤔 Because of that, I can't verify that the receipt ID is the right one. Does the connection shutdown logic delete the recorded receipt, s.t. this issue would be due to a race condition?

@c-bg c-bg reopened this Oct 30, 2019
@gmallard
Copy link
Owner

The receipt is never deleted by stompngo code.

Show me the logs you are looking at please. The receipt should actually be in the logs (see code in disconnect.go).

Also ..... I need some code that recreates this. Because I cannot recreate it here .......

@c-bg
Copy link
Author

c-bg commented Oct 31, 2019

I've now taken another look at the logs and found something potentially interesting.

LOG:  2019/10/31 13:42:09.991834 connection.go:182: ID:< SOME ID, MASKED > < $HOME - MASKED >/go/pkg/mod/github.com/gmallard/stompngo@v1.0.12/disconnect.go 84 [DISCONNECT dr receipt:6
 {{  []} <nil>}]

Contrary to my original suspicion, the logged receipt in this line is not the one in the received message (which is in fact empty, as logged) -- it is a second mentioning of the earlier sent receipt.
This is the code creating the log entry:

c.log(DISCONNECT, "dr", ch, c.DisconnectReceipt)

where ch is a clone of the headers the user passed as the method's argument + a receipt, if the user didn't disallow it explicitly.

I suggest to check in the library whether the received receipt is the valid & expected one, e.g. with

conn.DisconnectReceipt.Message.Headers.ContainsKV(stomp.HK_RECEIPT_ID, receipt)

and with listening to the receipt with some safety margin. For example, one can start listening already right before sending the DISCONNECT frame, and keep listening some time until a receipt arrives. If the receipt doesn't arrive within a given timeout, one can mark the disconnection as unsuccessful -- I'm interested in hearing your input on how to handle this best, what follow-up actions to take if the disconnection didn't work. So far I've closed the underlying connection with a defer statement and stopped program execution to avoid the message buffer exploding on the server side.

I've implemented this mentioned procedure, let me know if you'd like me to contribute the code.
The only part that's missing is to start listening to the receipt already before sending the DISCONNECT frame, I'll do that now and keep you posted.

Also, I'm working on a small example, I'll come back to you as soon as I know more.

@c-bg
Copy link
Author

c-bg commented Oct 31, 2019

I'd like to give a short progress update.
As mentioned earlier, this issue doesn't always appear. I wrote a small example to connect and then disconnect with a receipt, but so far it works. There seems to be some sort of race condition or the like. Should I be able to reliably reproduce the error at some point, I'll tell you.

Since I have no idea where the issue comes from, and since the hypothetical race condition seems nontrivial to debug, I've now chosen the following procedure (and will change it back as soon as this issue is solved): If I don't get a receipt, it's not too bad, since I close the underlying connection afterwards anyways. So, I attempt to do it the nice way -- with a receipt -- but if it doesn't work, the program doesn't throw an error, it just warns, closes the underlying connection, and exits, since this is anyway the last action (at least in the current implementation).

If you'd like to suggest a different procedure or would recommend a different approach, please feel free to let me know, I'd be happy hear your feedback.

@gmallard
Copy link
Owner

gmallard commented Nov 1, 2019

Put a check in your code that looks at what is actually in conn.DisconnectReceipt. I want to know if it is actually an ERROR frame.

That is entirely possible, and my code does not handle that properly. I am working on a fix for that.

@c-bg
Copy link
Author

c-bg commented Nov 1, 2019

Ah, I think there might be a small misunderstanding :) I did not receive ERROR frames, but entirely empty ones; I have checks for this in the code:

if r.Message.Command == "" && (len(r.Message.Headers) == 0) && (len(r.Message.Body) == 0) {
    emptyMsgCount += 1
}

In general , thanks a lot for looking into this, much appreciated!

@gmallard
Copy link
Owner

gmallard commented Nov 2, 2019

Two items:

  1. There is an enhancement/correction to Disconnect on HEAD of the dev branch. You might try that to see if it provides any more data about this.

  2. This section in the specification:

https://stomp.github.io/stomp-specification-1.2.html#Connection_Lingering

Read the last sentence of that section.

@c-bg
Copy link
Author

c-bg commented Nov 8, 2019

Sorry for the delayed message.
Thanks a lot for pointing me to that sentence in the specification, that is very helpful!
Thanks also for the changes in the dev branch, they look good to me. If possible I'll try them out sometime soon and tell you how it went.

@gmallard
Copy link
Owner

Note that there is another reference to this behavior in:

https://stomp.github.io/stomp-specification-1.2.html#DISCONNECT

Question: have you tried this with heartbeats on?

@c-bg
Copy link
Author

c-bg commented Nov 11, 2019

Indeed, this is very helpful! Thanks a lot, your help is much appreciated.
Yes, I had heartbeats on (my heartbeats "offer" before negotiation: [0, 10000]).

@gmallard
Copy link
Owner

There is another enhancement to DISCONNECT on the dev branch.

It allows the user to optionally specify a "disconnect timeout". After the timeout expires, if no RECEIPT frame has been received, DISCONNECT will return an error indicating a timeout.

@gmallard
Copy link
Owner

Also, I pushed v1.0.13 to github a bit ago. Includes all the DISCONNECT enhancements. Please give that a try.

@c-bg
Copy link
Author

c-bg commented Nov 19, 2019

Please excuse the delayed reply.
Thanks a lot for the new features, this looks good to me.
At the moment I've switched from ActiveMQ to something else. I'm quite busy atm, but generally I'd be happy to try the changes out -- I'll let you know if I get a chance :)
Two small suggestions looking at the code: How about implementing timeouts with

case <- time.After(<timeout>)

instead of creating a ticker and assigning the channel output to _ here?
This is just a minor suggestion, nothing crucial.
Source: https://gobyexample.com/timeouts , last visited: 19. November 2019

@gmallard
Copy link
Owner

I am always curious about brokers. I would like to ask: from AMQ to what ?

Here I test with:

  • AMQ
  • Rabbit
  • Artemis
  • Apollo (the last release, since development has ceased)

Using time.After is a good idea, thanks, I will probably tweak the code to do that.

@c-bg
Copy link
Author

c-bg commented Nov 20, 2019

I switched from AMQ to a totally different approach that doesn't involve messaging at all -- not because of the messaging itself, but because my program needs to fetch information from somewhere else after all. :)

You're welcome, as already said just a small suggestion, nothing important.

Thank you very much for helping with this, very much appreciated! And in general thank you very much for maintaining this library.

@gmallard
Copy link
Owner

Apologies for the very tardy reply.

Removal of TIcker form that code has been on DEV since 12/23/19. Commit:

950baf2

I hope you have been able to try that.

I am going to close this issue based on the above. If you need to re-open it, please d so.

Regards, Guy

@c-bg
Copy link
Author

c-bg commented Jul 23, 2020

Thanks much for your message :) no worries about the delay.
I opened the issue during an internship, by now I don't have access to the code anymore for which I experienced the issue. Closing the issue sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants