I noticed that nsqd correctly marks its health state as "not ok" (e.g. via the /ping endpoint) when the disk is full and a message is published to a new topic. However, it does not do this when a message is published to an existing topic. Even worse, publishing to an existing topic resets the state to healthy/OK.
Steps to replicate:
./nsqd -mem-queue-size=0
curl 'http://localhost:4151/ping'
## see OK
echo "Haha" | ./to_nsq --topic t1 --nsqd-tcp-address 127.0.0.1:4150 --rate 1000
curl 'http://localhost:4151/ping'
## see OK
sudo fallocate -l 80G penuh80g
df
## see 0 available space left
curl 'http://localhost:4151/ping'
## see OK
echo "Haha" | ./to_nsq --topic t1 --nsqd-tcp-address 127.0.0.1:4150 --rate 1000
curl 'http://localhost:4151/ping'
## still OK
echo "Hoho" | ./to_nsq --topic t2 --nsqd-tcp-address 127.0.0.1:4150 --rate 1000
curl 'http://localhost:4151/ping'
## now NOK
echo "Haha" | ./to_nsq --topic t1 --nsqd-tcp-address 127.0.0.1:4150 --rate 1000
## sending to an existing topic resets the error status -- now OK
df
## see 0 available space left
Hey @dodysw2, apologies for the delay in responding here.
This behavior can likely be explained by the sync behavior of the underlying diskqueue. For an existing topic, where the underlying diskqueue files have already been created, a single write of a ~4 byte message isn't going to force a sync to the filesystem, which nsqd then interprets as a successful write.
I suspect if you configure nsqd with --sync-every=1 then you'll see the behavior you expect.
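One way to check this suggestion is to repeat the steps from the report with the extra flag and classify each /ping response. The following is a minimal sketch, not part of nsqd: `ping_state` is a hypothetical helper, and it assumes only what the report shows, namely that a healthy nsqd answers /ping with a body starting with OK.

```shell
# Hypothetical helper: classify the body returned by /ping.
# Assumption: a healthy nsqd replies with a body starting with "OK"
# (as seen in the steps above); anything else counts as unhealthy.
ping_state() {
  case "$1" in
    OK*) echo healthy ;;
    *)   echo unhealthy ;;
  esac
}

# Usage, shown as comments because it needs a running nsqd:
#   ./nsqd -mem-queue-size=0 --sync-every=1
#   echo "Haha" | ./to_nsq --topic t1 --nsqd-tcp-address 127.0.0.1:4150
#   ## fill the disk as in the original steps, then publish to t1 again
#   ping_state "$(curl -s 'http://localhost:4151/ping')"
```

If the explanation above holds, the last command should print unhealthy once the disk is full, even though t1 already exists.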
There's probably some improvement to be made here, but the "healthiness" indicator in nsqd isn't intended to be incredibly sophisticated. There are so many different failure modes that I'm not convinced it's worth the effort.
NOTE: in your example debugging steps, the --rate parameter to to_nsq doesn't actually send 1000 messages. It just rate limits the messages read from stdin to 1000 per second 😁
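To actually publish 1000 messages, stdin has to carry 1000 lines, since to_nsq publishes one message per line. A small sketch of that, where `gen_messages` and the `msg-N` payloads are made up for illustration:

```shell
# Hypothetical helper: print N numbered lines, one per message.
gen_messages() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    echo "msg-$i"
    i=$((i + 1))
  done
}

# Usage, shown as a comment because it needs a running nsqd:
#   gen_messages 1000 | ./to_nsq --topic t1 --nsqd-tcp-address 127.0.0.1:4150 --rate 1000
```

A larger volume of writes is also more likely to reach a sync point than a single tiny message, which may make the disk-full error show up on an existing topic.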