nsqd: heartbeat for producers #131
Conversation
interesting, this is definitely something that should be added, thanks. It completely slipped our mind as we don't actually produce messages over the wire protocol ourselves (we publish over the HTTP interfaces)
@jehiah ready for review
LGTM
Without a way to turn off the heartbeat for producers, any client not using threads or an event loop will miss the heartbeat and will start disconnecting whenever the producer blocks to collect its data. This makes it impossible to write simple producers like a log watcher that pushes into NSQ over TCP, or to embed one in a FastCGI script to log its request stats. Now I'm not able to block on a socket waiting for a web request without nsqd disconnecting my producer's connection. Not every producer is dedicating a thread or running an event loop. There is a reason Redis, memcached, RabbitMQ, and Postgres do not force a heartbeat that must be responded to! We should be able to write a very simple "while STDIN, publish into NSQ" script, and with this patch that's not possible anymore without breaking out the threads or event loops. :(

@mreiferson @dustismo @jehiah Thoughts? Ideas? Thanks guys for your time, -Dan
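For context, the "while STDIN, publish into NSQ" producer being described might look like the sketch below. The framing (the `  V2` magic on connect, then `PUB <topic>\n` followed by a 4-byte big-endian body size) follows the nsqd TCP protocol spec; the host, port, and topic values are illustrative defaults (4150 is nsqd's default TCP port), not anything from this thread.

```python
import socket
import struct
import sys

MAGIC_V2 = b"  V2"  # protocol magic, sent once immediately after connecting

def encode_pub(topic: str, body: bytes) -> bytes:
    """Frame a PUB command: 'PUB <topic>\\n' + 4-byte big-endian size + body."""
    return b"PUB " + topic.encode("ascii") + b"\n" + struct.pack(">I", len(body)) + body

def publish_stdin(host: str = "127.0.0.1", port: int = 4150, topic: str = "logs") -> None:
    """A naive line-at-a-time producer: blocks on stdin between sends.

    Once nsqd enforces heartbeats, this loop breaks -- nothing here reads
    the heartbeat frames, so nsqd eventually tears the connection down.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(MAGIC_V2)
        for line in sys.stdin:
            conn.sendall(encode_pub(topic, line.rstrip("\n").encode("utf-8")))
```

This is exactly the kind of single-threaded, no-event-loop client the complaint is about: it never reads from the socket at all.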
Hi @dmarkham - I understand your perspective. We consider the TCP protocol to be the more sophisticated approach to publishing and made this change to address an inconsistency with consumers. Would it sufficiently meet your needs to have "simple" publishers like the ones you mention publish via HTTP? To put this in perspective: at bitly, we exclusively use the HTTP interface for publishing, and our NSQ cluster hits peaks of over 80k messages per second. Thanks for the feedback!
First off, thank you for responding. For consumers a heartbeat is a great idea; it is their job to listen to things streaming from NSQ. I agree I could use the HTTP interface. I'm rather shocked/impressed this is not more expensive than you are willing to pay at that scale, since you have a very clean TCP spec with low overhead. With that said, I will start benchmarking my setup with the HTTP interface before drawing any real conclusions, and I will dig in and see what the HTTP keep-alive default/max is configured to be on the nsqd servers. I'm highly motivated to get something I enjoy using to work well.

Even though my examples were about a simple way to publish into NSQ, I should have given more details on my concerns. I have been working to see if I can replace most logfile writing with dropping the messages directly into NSQ, so I was pretty sure I needed a persistent, low-latency connection to NSQ. What are your thoughts on this? Or would you still recommend spooling things to logs before pushing into NSQ anyway?

Once I felt like it met my latency benchmark to write directly into NSQ vs. logfiles, I started working on the persistent part. This is where I hit my issues with the TCP interface. Something felt wrong when I can hold open long-lasting TCP connections to all of my other backend services, yet I cannot publish data without maintaining a heartbeat with NSQ. I'm all for consistency, but I think you're just being consistent with the wrong thing (the consumer, rather than other publishing models). I would love for you guys to give this one more think-over, taking into consideration the publishing models people are currently programming against: publishing data into things like Redis/memcached/RabbitMQ**/RDBMSs. The things in this list, and the others I can think of, really only use heartbeats for cluster health -- not client health, and definitely not publisher health -- which I also find interesting. OK, I'm off to start testing the HTTP interface for my usage.

But it mentally burns me that the TCP interface is going unused in my stack because of a heartbeat. Thanks for your time, -Dan

**With RabbitMQ/AMQP you can ask for a heartbeat, but it's not forced.
Hey Dan, I'll wait for @mreiferson to respond to the meat of your post; I just wanted to add some anecdata re:

We also use the HTTP interface exclusively, if only because we had battle-tested client libs ready. We use a rather unoptimized libcurl implementation (PHP binding) on the producer side, and nsq_to_http on the consumer side. The costs of bandwidth are obvious, but the "resource" costs (CPU time, sockets' memory usage, etc.) will present themselves on the clients' ends — and be highly dependent on the RTT. We average around 0.7ms RTT on the consumer leg with virtually no retransmissions (<0.01%), and our producers are local to nsqd. That makes the impact negligible, but in a different architecture/scenario it could certainly be another story entirely. For what it's worth, here in the shadow of @mreiferson's ❗ 80k msgs/second.. ;) we process around 600 million messages a day this way, synchronously, with an average payload (that is, Content-Length) of 357 bytes. All that said, would love it if you share your benchmarks! [and thanks for taking the time to write all of this, already]
My testing scripts, notes, numbers, versions, etc. At the very least I now know what rate with HTTP I'm working towards. I'm sure I can improve it some on the client side (LWP replacement).
@michaelhood You both have me beat on messages per day. I'm getting pretty close to 100M/day for my planned use of NSQ. I'm mostly concerned about the latency/cost per message. My goal is to spend about the same amount of time to hand off a message to NSQ as to write it to memcached/Redis/a logfile. Are you writing your messages to disk first, or directly into NSQ? Thanks for any thoughts. I'm just a heartbeat away (pun intended) from this working out perfectly.
Seems to me you can still do a simple synchronous TCP connector with heartbeats. Just check if any result frame is a heartbeat. Something like:

Would that work for your purpose?
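A minimal sketch of that idea, checking every result frame for a heartbeat in a fully synchronous client. The frame layout (4-byte big-endian size that includes the 4-byte frame type, then the payload) and the `_heartbeat_` body follow the nsqd TCP protocol spec; the helper names and callable-based I/O are illustrative, not code from this thread.

```python
import struct

HEARTBEAT = b"_heartbeat_"  # body of nsqd's heartbeat response frame
NOP = b"NOP\n"              # the expected reply to a heartbeat

def read_frame(recv):
    """Read one frame: 4-byte size, 4-byte frame type, then the payload.

    The size field counts the frame type, hence the `size - 4` payload read.
    """
    size = struct.unpack(">I", recv(4))[0]
    frame_type = struct.unpack(">I", recv(4))[0]
    return frame_type, recv(size - 4)

def sync_response(recv, send):
    """Synchronously await a real response, answering any heartbeats with NOP."""
    while True:
        frame_type, data = read_frame(recv)
        if data == HEARTBEAT:
            send(NOP)
            continue
        return frame_type, data
```

`recv` and `send` are plain callables (e.g. `sock.recv` / `sock.sendall`) so the loop itself stays trivially testable.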
@dmarkham I'll take a look at your code as soon as I have a few minutes, but I expect it's not the bytes but rather the relatively slow LWP. I tried to make a quick histogram from a production box just now, but nsq is too fast for the precision output by histogram.py (from bitly/data_hacks) @jehiah ;) I'll put together some stuff.
@dustismo I've not had a chance to look these changes over in detail, but unless I'm missing something, could he not also receive a heartbeat at a time not-immediately-after sending a message (and therefore not polling for a reply)? edited for clarity |
@dustismo I think your idea could get me out of sync with the server. If I'm sent a heartbeat frame before I start a PUB, then when I go to publish my data the server would be expecting a NOP, not a PUB. So I'm guessing I'm not able to do that: the server should be expecting a NOP after a heartbeat, not a PUB. And yes, LWP can't be the best tool for this; I have already started poking around with libcurl and friends, though WWW::Curl is not playing nice ATM. By the time I strip down these HTTP libs and remove all the unneeded headers, it's going to look SO very close to the TCP interface with no heartbeat, and be more difficult to parse. Something is to be said for the fast TCP protocol you guys have built: it's super simple to parse and easy to write a fast lib against. I will have to build a Perl lib regardless; not everything can be in golang overnight! Thanks again for your time looking at my messy prototype code.
@dmarkham thanks for taking the time to put together those benchmarks. As @michaelhood pointed out, it's probably not so much the size difference (both fit into a single packet, aka a single syscall) but rather the simple fact that it's HTTP (client lib + golang's stdlib server). As a simple test to evaluate the client-side cost, you could try just writing a pre-formatted HTTP request to a raw (persistent) TCP socket connected to

To answer your other question: for messages going through NSQ, we don't first write them to disk. Like you're evaluating now as a potential solution, our applications that produce messages write to a local

Ultimately, I think you're headed down the correct path in terms of topology. Relatedly, this blog post might be useful: http://word.bitly.com/post/38385370762/spray-some-nsq-on-it

re: @dustismo's suggestion - the heartbeat doesn't expect an immediate response, so something like the following would work:

```python
responded = False
send(pub_msg)
while True:
    res = read()
    if not is_heartbeat(res):
        return res
    if not responded:
        send(nop)
        responded = True
```

Regardless, the real problem with this for you (as I understand it) is that you're blocked sitting in an

Taking a step back, we try really hard for NSQ to "just work". That means some of our opinions about how things should work are baked into the system. Flexibility is great, but there is a fine line we try to be conscious of (you can't please everyone). I'm going to consider your perspective and requirements (which I understand to be stateless, low-latency, low-overhead publishing) and think through what this means for NSQ. I'm certainly open to suggestions as to the specific changes that might resolve this for all interested parties. CC @jehiah for his thoughts
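The "pre-formatted HTTP request to a raw (persistent) TCP socket" experiment suggested above might look like the sketch below. The `/put?topic=` endpoint path matches nsqd's HTTP interface of that era (newer versions also accept `/pub`), and the host string would normally point at nsqd's HTTP port (4151 by default); both are assumptions worth verifying against your nsqd version.

```python
def build_http_put(host: str, topic: str, body: bytes) -> bytes:
    """Pre-format a keep-alive HTTP publish request for a raw TCP socket.

    Writing this byte string straight to a persistent socket isolates the
    wire cost of HTTP publishing from any client-library (e.g. LWP) overhead.
    """
    head = ("POST /put?topic=%s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "Content-Length: %d\r\n"
            "Connection: keep-alive\r\n"
            "\r\n" % (topic, host, len(body))).encode("ascii")
    return head + body
```

The caller would then `sendall()` the result on a long-lived socket and read back the HTTP response, reusing the connection for each message.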
This is great; I think everyone understands my concerns. It also sounds like everyone is open to at least reviewing a patch. I have some catching up to do in the code. Unless someone beats me to it or has a better idea, I'm going to try and put a patch together for review. My first thought was to add:
@dmarkham - if you're gonna take a swing... we built some flexibility into the
Noted. I have a version almost working with a HEARTBEAT command; I'll move it to the IDENTIFY JSON blob before submitting it for further review. The client conn's readDeadline being managed in a different place than the heartbeat Ticker makes it interesting to manage them together after the messagePump has started up. I'll have something creative to look at soon.
OK, here is my swing at it: https://github.com/dmarkham/nsq/tree/heartbeat_configure. I'm highly interested in any observations on what I could improve.
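The branch itself isn't reproduced in this thread, but the shape the discussion points at, configuring the heartbeat via the IDENTIFY JSON blob, can be sketched as below. The `heartbeat_interval` field (milliseconds; -1 to disable) matches what the nsqd protocol eventually specified, though the exact field name should be checked against the spec for your nsqd version.

```python
import json
import struct

def encode_identify(heartbeat_interval_ms: int) -> bytes:
    """Frame an IDENTIFY command whose JSON body configures the heartbeat.

    Layout: 'IDENTIFY\\n' + 4-byte big-endian body size + JSON body.
    A heartbeat_interval of -1 asks nsqd to disable heartbeats entirely,
    which is what a simple blocking producer would want.
    """
    body = json.dumps({"heartbeat_interval": heartbeat_interval_ms}).encode("ascii")
    return b"IDENTIFY\n" + struct.pack(">I", len(body)) + body
```

A producer would send this once, right after the protocol magic, before its first PUB.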
@dmarkham nice, thanks :) would you mind opening a pull request so we can review the change?
@michaelhood I like the use of the awk multiplier for histogram.py. Slick trick (and it's always cool to see data_hacks in use). I've been silent on this thread, but I concur with @mreiferson's comment that heartbeat configuration would be best served in the
Would be nice to have a heartbeat for the producers; it would make connection liveness easier to keep track of for slow producers.