Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature add bad heartbeat logic #22

Merged
merged 2 commits into from Apr 24, 2015

Conversation

maxgarvey
Copy link
Contributor

add new logic to deal with panic caused by bad heartbeating

@gmallard
Copy link
Owner

ACK that I have seen this PR.

I will run the heartbeat tests against ActiveMQ, Apollo, and Rabbit. That will take some time, several days at least.

With this change ..... I will also test with the race detector. I am a little uneasy about that 'rgr' variable to count sending goroutines.

If I have no problems with all the above I will accept this. Otherwise, I'll let you know.

If you have some time and this info is available, I would ask that you document here what the negotiated heartbeat values were when that panic occurred. And also what broker was in use.

@maxgarvey
Copy link
Contributor Author

Cool, sounds good; thanks for answering back so rapidly

Regarding the info requested:
the broker used was RabbitMQ
the heartbeat values at initialization is "5000,5000"

I never got printouts of what the values of first and last were when it failed;

the steps to repro the bug were to get the service using stompngo running with a healthy rabbit connection, and then filling the hard drive on the rabbit instance such that the GUI isn't reachable/getting 500s. Doing this, eventually (like <30s) the process would error out with the message above.

Please let me know if there's additional info that would be helpful

@gmallard
Copy link
Owner

Thanks for that information. I do not think I will be able to recreate that scenario - fill the hard drive, etc.

But this afternoon I ran a full set of heartbeat tests against that code, and saw no ill effects. I want to play a little more with some other informal tests I have, but it looks good at this point. Tests this afternoon were against:

  • ActiveMQ 5.9.0
  • Apollo snapshot - trunk-20140429.143513-255
  • Rabbit - 3.2.4-1 (Ubuntu 14.04 repository)

I am going to ignore 'races' for now - when I first wrote stompngo the race detector did not exist. And the entire package needs to be looked at. That is now on my TODO list. I am sure there are races, but I will examine the entire package in different effort.

gmallard added a commit that referenced this pull request Apr 24, 2015
@gmallard gmallard merged commit f6f1061 into gmallard:dev Apr 24, 2015
@gmallard
Copy link
Owner

You should look at the HEAD of the current dev branch. Race conditions uncovered by tests with -race have been eliminated.

A number of them were in heartbeat code.

Just make sure I have not broken your system with those changes.

@maxgarvey
Copy link
Contributor Author

Thanks again for making these changes;

I'm not sure that I will be able to load test these changes today due to another part of testing infrastructure being down, but I will do so ASAP and let you know

@gmallard
Copy link
Owner

No problem, no hurry. I just wanted to let you know that you should probably retest, and report any problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants