Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd not sending heartbeats #859

Closed
JensRantil opened this issue Mar 5, 2017 · 3 comments
Closed

nsqd not sending heartbeats #859

JensRantil opened this issue Mar 5, 2017 · 3 comments

Comments

@JensRantil
Copy link

Background: Just downloaded nsq release 0.3.8 for Darwin to test for a potential project. I am new to the project so I apologize if I've missed something crucial here.

Problem: When starting nsqlookupd, nsqd and nsqadmin, it looks like nsqd stops sending heartbeats to nsqlookupd shortly thereafter.

Steps to recreate:

  1. Download release 0.3.8 (for Darwin) from Github.
  2. Open up three terminal.
  3. In terminal 1, execute ./nsqlookupd.
  4. In terminal 2, start a clean nsqd: rm *.dat && ./nsqd -lookupd-tcp-address localhost:4160. Notice that there is communication happening between the nsqd and nsqlookupd.
  5. In terminal 3, execute ./nsqadmin -lookupd-http-address localhost:4161
  6. In browser, open http://localhost:4171/nodes and make sure that your single nsqd is listed there.
  7. Notice that terminal 2 (log below) sends 4 heartbeats, but then stops sending heartbeats.
  8. After 5-6 minutes, reload http://localhost:4171/nodes and notice there is no nsqd listed there despite the fact that it's running.

Expected: That heartbeats are sent to continuously to nsqlookupd and that nsqd is constrantly listed in nsqadmin GUI.

Discussion:

  • Suspecting that heartbeats were only sent when there were messages ready for consumtion on nsqd I also sent a few messages to it using to_nsq script. I double checked there was a queue depth in nsqadmin. nsqlookupd logs [nsqlookupd] 2017/03/05 18:49:25.876432 DB: client([::1]:52076) REGISTER category:topic key:testtopic subkey:, but nsqd did not start sending heartbeats and it doesn't show up in nsqlookupd. 😢
  • I asked a question about this on IRC, but didn't get any answer within my personal timeout ;) Decided to create an issue instead.
  • It feels like I'm missing something here as this is basic stuff. Please clarify and maybe we can improve documentation/default somehow!

Logs:

Terminal 1:

$ ./nsqlookupd
[nsqlookupd] 2017/03/05 18:19:11.323092 nsqlookupd v0.3.8 (built w/go1.6.2)
[nsqlookupd] 2017/03/05 18:19:11.323495 TCP: listening on [::]:4160
[nsqlookupd] 2017/03/05 18:19:11.323526 HTTP: listening on [::]:4161
[nsqlookupd] 2017/03/05 18:19:16.223427 TCP: new client([::1]:51680)
[nsqlookupd] 2017/03/05 18:19:16.223456 CLIENT([::1]:51680): desired protocol magic '  V1'
[nsqlookupd] 2017/03/05 18:19:16.223674 CLIENT([::1]:51680): IDENTIFY Address:MacBook-Pro-3.lan TCP:4150 HTTP:4151 Version:0.3.8
[nsqlookupd] 2017/03/05 18:19:16.223711 DB: client([::1]:51680) REGISTER category:client key: subkey:
[nsqlookupd] 2017/03/05 18:19:31.223651 CLIENT([::1]:51680): pinged (last ping 14.999954538s)
[nsqlookupd] 2017/03/05 18:19:46.223603 CLIENT([::1]:51680): pinged (last ping 14.999967991s)
[nsqlookupd] 2017/03/05 18:20:01.223691 CLIENT([::1]:51680): pinged (last ping 15.000067492s)
[nsqlookupd] 2017/03/05 18:20:16.223746 CLIENT([::1]:51680): pinged (last ping 15.000070542s)
[nsqlookupd] 2017/03/05 18:21:22.327779 200 GET /nodes ([::1]:51707) 91.301µs
[nsqlookupd] 2017/03/05 18:23:51.552302 200 GET /nodes ([::1]:51714) 34.067µs
[nsqlookupd] 2017/03/05 18:31:56.538593 200 GET /nodes ([::1]:52054) 30.236µs
[nsqlookupd] 2017/03/05 18:32:09.292243 CLIENT([::1]:51680): closing
[nsqlookupd] 2017/03/05 18:32:09.292265 DB: client([::1]:51680) UNREGISTER category:client key: subkey:
[nsqlookupd] 2017/03/05 18:32:09.292272 ERROR: client([::1]:51680) - EOF
^C[nsqlookupd] 2017/03/05 18:33:33.343569 HTTP: closing [::]:4161
[nsqlookupd] 2017/03/05 18:33:33.343627 TCP: closing [::]:4160

Terminal 2:

./nsqd -lookupd-tcp-address localhost:4160
[nsqd] 2017/03/05 18:19:16.219923 nsqd v0.3.8 (built w/go1.6.2)
[nsqd] 2017/03/05 18:19:16.220225 ID: 219
[nsqd] 2017/03/05 18:19:16.220353 NSQ: persisting topic/channel metadata to nsqd.219.dat
[nsqd] 2017/03/05 18:19:16.222319 LOOKUP(localhost:4160): adding peer
[nsqd] 2017/03/05 18:19:16.222341 LOOKUP connecting to localhost:4160
[nsqd] 2017/03/05 18:19:16.222321 TCP: listening on [::]:4150
[nsqd] 2017/03/05 18:19:16.222343 HTTP: listening on [::]:4151
[nsqd] 2017/03/05 18:19:16.224334 LOOKUPD(localhost:4160): peer info {TCPPort:4160 HTTPPort:4161 Version:0.3.8 BroadcastAddress:}
[nsqd] 2017/03/05 18:19:31.223448 LOOKUPD(localhost:4160): sending heartbeat
[nsqd] 2017/03/05 18:19:46.223422 LOOKUPD(localhost:4160): sending heartbeat
[nsqd] 2017/03/05 18:20:01.223472 LOOKUPD(localhost:4160): sending heartbeat
[nsqd] 2017/03/05 18:20:16.223524 LOOKUPD(localhost:4160): sending heartbeat
^C[nsqd] 2017/03/05 18:32:09.288531 NSQ: persisting topic/channel metadata to nsqd.219.dat
[nsqd] 2017/03/05 18:32:09.288664 TCP: closing [::]:4150
[nsqd] 2017/03/05 18:32:09.288790 HTTP: closing [::]:4151
[nsqd] 2017/03/05 18:32:09.290848 NSQ: closing topics
[nsqd] 2017/03/05 18:32:09.290877 QUEUESCAN: closing
[nsqd] 2017/03/05 18:32:09.290881 ID: closing
[nsqd] 2017/03/05 18:32:09.290903 LOOKUP: closing
$ date
Sun Mar  5 18:32:11 CET 2017

Terminal 3:

$ ./nsqadmin -lookupd-http-address localhost:4161
[nsqadmin] 2017/03/05 18:19:22.303003 nsqadmin v0.3.8 (built w/go1.6.2)
[nsqadmin] 2017/03/05 18:19:22.303330 HTTP: listening on [::]:4171
[nsqadmin] 2017/03/05 18:21:22.170614 200 GET /nodes ([::1]:51701) 651.889µs
[nsqadmin] 2017/03/05 18:21:22.199770 200 GET /static/base.css ([::1]:51702) 12.598944ms
[nsqadmin] 2017/03/05 18:21:22.200757 200 GET /static/main.js ([::1]:51704) 7.505969ms
[nsqadmin] 2017/03/05 18:21:22.203325 200 GET /static/bootstrap.min.css ([::1]:51701) 11.392668ms
[nsqadmin] 2017/03/05 18:21:22.204103 200 GET /static/vendor.js ([::1]:51703) 10.115815ms
[nsqadmin] 2017/03/05 18:21:22.325957 CI: querying nsqlookupd http://localhost:4161/nodes
[nsqadmin] 2017/03/05 18:21:22.327803 200 GET /static/nsq_blue.png ([::1]:51704) 364.439µs
[nsqadmin] 2017/03/05 18:21:22.329194 200 GET /api/nodes ([::1]:51703) 3.278707ms
[nsqadmin] 2017/03/05 18:21:22.524397 200 GET /static/favicon.png ([::1]:51704) 236.103µs
[nsqadmin] 2017/03/05 18:23:51.401074 200 GET /nodes ([::1]:51704) 640.186µs
[nsqadmin] 2017/03/05 18:23:51.415142 200 GET /static/bootstrap.min.css ([::1]:51704) 1.483991ms
[nsqadmin] 2017/03/05 18:23:51.421146 200 GET /static/vendor.js ([::1]:51702) 3.414312ms
[nsqadmin] 2017/03/05 18:23:51.425865 200 GET /static/base.css ([::1]:51703) 121.399µs
[nsqadmin] 2017/03/05 18:23:51.434007 200 GET /static/main.js ([::1]:51701) 4.115398ms
[nsqadmin] 2017/03/05 18:23:51.542458 200 GET /static/nsq_blue.png ([::1]:51702) 229.482µs
[nsqadmin] 2017/03/05 18:23:51.550946 CI: querying nsqlookupd http://localhost:4161/nodes
[nsqadmin] 2017/03/05 18:23:51.553812 200 GET /api/nodes ([::1]:51702) 2.889014ms
[nsqadmin] 2017/03/05 18:23:51.756247 200 GET /static/favicon.png ([::1]:51702) 226.668µs
[nsqadmin] 2017/03/05 18:31:56.374134 200 GET /nodes ([::1]:52048) 662.327µs
[nsqadmin] 2017/03/05 18:31:56.385205 200 GET /static/bootstrap.min.css ([::1]:52048) 1.563993ms
[nsqadmin] 2017/03/05 18:31:56.396541 200 GET /static/base.css ([::1]:52049) 118.878µs
[nsqadmin] 2017/03/05 18:31:56.400205 200 GET /static/main.js ([::1]:52051) 8.700146ms
[nsqadmin] 2017/03/05 18:31:56.404636 200 GET /static/vendor.js ([::1]:52050) 11.379851ms
[nsqadmin] 2017/03/05 18:31:56.537500 CI: querying nsqlookupd http://localhost:4161/nodes
[nsqadmin] 2017/03/05 18:31:56.538086 200 GET /static/nsq_blue.png ([::1]:52051) 327.236µs
[nsqadmin] 2017/03/05 18:31:56.538817 200 GET /api/nodes ([::1]:52050) 1.3424ms
[nsqadmin] 2017/03/05 18:31:56.736195 200 GET /static/favicon.png ([::1]:52051) 217.25µs
^C[nsqadmin] 2017/03/05 18:34:10.113381 HTTP: closing [::]:4171
@mreiferson
Copy link
Member

mreiferson commented Mar 5, 2017

I can't reproduce, but I think this could have something to do with the fact that Go 1.6.2 isn't compatible with macOS Sierra — is that the version you're running? I'm on 10.12.3.

I suggest installing NSQ on macOS with brew instead.

Since I'm not seeing this, if you're interested in debugging further, you could send nsqd a SIGQUIT (CTRL + \) to get a stack trace and see if there are any deadlocks related to lookupd goroutines.

@JensRantil
Copy link
Author

@mreiferson I am on MacOSX Sierra (10.12.3), too. Just double checking, how did you try to recreate this? Using the release downloadable here on Github, or by building your own binary/brew?

That said, I Installed 0.3.8 using brew and heartbeat is continuously being submitted now.

I dumped the stacks and you can find them here: https://gist.github.com/JensRantil/2d97bba40ebd9512049b9a8a693fbb12 I don't expect you to spend time to look at them, but I do note that the lookupLoop function is at least running. I'll leave this for now since it works great when installing from brew.

@mreiferson
Copy link
Member

@JensRantil I tried both, couldn't reproduce in either 😞. I'll take a look at that gist, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants