nsqd: stops sending messages #851
Comments
@thelinuxlich can you provide some context on your configuration, and relevant logs from nsqd prior to and after restarting it? |
I have two nsqd nodes connecting to two nsqlookupd nodes. The two nsqd nodes also host a microservice that writes messages directly to their local nsqd instances. Those messages are consumed by two microservices that connect directly to their nsqlookupd instances. Logs usually only print this:
Jan 25 13:31:25 aidax-collector-2 nsqd[19813]: [nsqd] 2017/01/25 13:31:25.501527 LOOKUPD(10.240.200.16:4160): sending heartbeat
But there was an error last week:
Jan 18 17:09:41 aidax-collector-2 nsqd[322]: [nsqd] 2017/01/18 17:09:41.682483 LOOKUPD(10.240.200.16:4160): ERROR PING - read tcp 10.240.226.74:54986->10.240.200.16:4160: i/o timeout
But since it's heartbeating every 15s, shouldn't it reconnect normally instead of saving to disk forever? |
The consumer should reconnect to nsqd and receive messages. But the report is still a bit too vague. Please include:
Those heartbeats, and the "error ping", are between nsqd and nsqlookupd. They do not indicate any connections from consumers. The problem may be in the consumer, or in network access between consumer and nsqd. |
I'm using nsq 0.3.8 in all nodes. The nsqd service configuration:
The "curl localhost:4151/stats" output of the nsqd node that I had to restart because it stopped sending messages:
The consumer uses https://github.com/dudleycarr/nsqjs. Here is sample JS code showing how I use it:
const nsq = require("nsqjs"),
reader = new nsq.Reader("batch", "default", {
maxInFlight: 100,
lookupdHTTPAddresses: ['127.0.0.1:4161']
});
reader.connect();
reader.on("message", rec => {
const msg = rec.body.toString();
do_something_with_it(msg);
rec.finish();
}); |
I could be misreading the stats output, but it does look to me like the channel has processed all of the messages. Are you sure that there are unprocessed messages? |
There were millions of messages unprocessed, but I restarted the nsqd daemon before posting the issue here. |
When the message processing stalls, could you please try restarting the nsqjs worker instead of the |
I tried restarting the nsqjs worker first and it didn't work. Only after I restarted the nsqd node did it reappear in the nsqadmin UI and send all those messages saved to disk. |
If it happens again, please capture stats and recent log lines before restarting anything, and post them here. Thanks! |
I have had this issue as well for quite some time, but I believe it has more to do with nsq.js than with NSQ itself. I was able to work around it by restarting the job processors. If you work with Docker containers, you can run a job that checks the message count on the queue and, if nothing has been processed for, let's say, 30 minutes, restarts the containers responsible for consuming these messages. I would recommend having a look at the Golang driver https://github.com/nsqio/go-nsq and checking with a small PoC whether you can reproduce the same problem with Golang as well. Cheers. |
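The watchdog idea above could be sketched roughly as follows. This is only an illustration: the snapshot shape (`takenAt`, `depth`, `finished`) and the function name `isStalled` are hypothetical, and how the snapshots are collected (for example from nsqd's stats endpoint) and how the containers get restarted are left to the caller.

```javascript
// Hypothetical watchdog check: decide whether a channel looks stalled by
// comparing two snapshots taken some time apart.
// Assumed snapshot shape (not an nsqd or nsqjs API):
//   { takenAt: <ms since epoch>, depth: <queued messages>, finished: <messages finished so far> }
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

function isStalled(previous, current, windowMs = THIRTY_MINUTES_MS) {
  const elapsed = current.takenAt - previous.takenAt;
  // There is still work waiting on the queue.
  const hasBacklog = current.depth > 0;
  // No message was finished between the two snapshots.
  const noProgress = current.finished === previous.finished;
  return elapsed >= windowMs && hasBacklog && noProgress;
}

// Example with made-up numbers: backlog present, nothing finished in 31 minutes.
const before = { takenAt: 0, depth: 5000, finished: 120 };
const after = { takenAt: 31 * 60 * 1000, depth: 9000, finished: 120 };
const shouldRestart = isStalled(before, after);
console.log(shouldRestart);
```

A restart triggered this way only hides the symptom, so it is worth capturing stats and logs before acting on the result.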
Whether or not this bug is a JS client issue, it'd be useful to get some clarity on the three available libraries:
Is there a recommended client? |
@mcorb I'm 99.99999% sure this issue here is related to client libraries, not My recommendation would be to use |
I've just seen this happening with nsqd 1.0.0-compat & the latest version of nsqjs. The setup is about 30 nsqds on individual hosts (only 4 active during this test), and 1 nsqlookupd. 4 worker processes, all successfully pulling data from 3 of the 4 nsqd nodes. 1 node was just piling up unprocessed messages, even though the workers reported connecting to it. Restarting the workers did not fix it. Here's a screenshot of the state of nsqadmin after nsqd restart. The workers did NOT need to be restarted to start getting data from the previously-dead node. The affected nsqd log was full of heartbeats & nothing else:
|
If In-Flight and Ready are both zero, then no messages will be received by that client connection, even though it is connected. @ceejbot in your case this is the problem, all zeros. If the It may be necessary to set |
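To spot the condition described above without reading nsqadmin by eye, a check along these lines might help. It is a sketch under assumptions: the field names `clients`, `ready_count`, `in_flight_count` and `client_id` are assumed to match the per-channel client entries in nsqd's JSON stats output, and `findStarvedClients` is a hypothetical helper, not part of any NSQ library.

```javascript
// Hypothetical helper: list client connections on a channel that cannot
// receive anything because both Ready and In-Flight are zero.
// Field names are an assumption about nsqd's JSON stats, verify them
// against the output of your own nsqd version.
function findStarvedClients(channel) {
  const clients = channel.clients || [];
  return clients.filter(
    (c) => (c.ready_count || 0) === 0 && (c.in_flight_count || 0) === 0
  );
}

// Example with made-up data: one connected client with all zeros.
const exampleChannel = {
  channel_name: "default",
  depth: 1200,
  clients: [
    { client_id: "worker-1", ready_count: 0, in_flight_count: 0 },
    { client_id: "worker-2", ready_count: 1, in_flight_count: 3 },
  ],
};

const starved = findStarvedClients(exampleChannel).map((c) => c.client_id);
console.log(starved);
```

A client can legitimately show zeros for a moment, so a single sample is weak evidence. Zeros that persist while the channel depth keeps growing are the pattern discussed in this thread.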
If this is a clue that is at all helpful: attempting to empty the |
I'm happy to tweak the in-flight & ready params, but I am baffled about the other behavior described if that's indeed the problem. |
The |
Setting to 2x my nsqd instance count seems to have worked a treat, thanks! |
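A toy model may make it clearer why a value of 2x the nsqd count helps. This is a simplified illustration, not the actual RDY logic of nsqjs or go-nsq (real clients may rotate RDY among connections over time). The function names `distributeReady` and `suggestMaxInFlight` are hypothetical.

```javascript
// Simplified model (an assumption, not a real client algorithm): split a
// client's maxInFlight budget evenly across its nsqd connections and
// return the RDY count each connection would get.
function distributeReady(maxInFlight, connectionCount) {
  if (connectionCount <= 0) return [];
  const base = Math.floor(maxInFlight / connectionCount);
  const remainder = maxInFlight % connectionCount;
  // The first `remainder` connections receive one extra slot.
  return Array.from({ length: connectionCount }, (_, i) =>
    base + (i < remainder ? 1 : 0)
  );
}

// Rule of thumb taken from this thread: 2x the number of nsqd instances.
function suggestMaxInFlight(nsqdCount, factor = 2) {
  return nsqdCount * factor;
}

// With 4 nsqd connections and a budget of 2, two connections get RDY 0.
const tooSmall = distributeReady(2, 4);
// With the suggested budget, every connection gets a non-zero RDY.
const enough = distributeReady(suggestMaxInFlight(4), 4);
console.log(tooSmall, enough);
```

In nsqjs the budget corresponds to the `maxInFlight` option passed to `nsq.Reader`, as in the sample code earlier in the thread.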
Finally addressing this in nsqio/go-nsq#208, obviously other client libraries will need to adopt this approach if it lands (I'll likely handle Also, there hasn't been evidence here that this is an |
Sometimes one of my nsqd nodes simply stops sending messages to the consumer/reader (using nsqjs) and starts saving these messages to disk. When I connect with nsqadmin to the nsqlookupd address, I can see that this node is not in the Topic Message Queue list. So I restart its nsqd daemon, and all of a sudden it starts sending all the messages saved on disk to the consumer and reappears in the Topic Message Queue list. Its log is not very informative before that; it shows only heartbeats being sent to nsqlookupd.
Is there a reason why this should be happening or is it a bug?