Crash gnatsd #63
I tried to write a Haskell NATS binding and made gnatsd crash quite easily: I happened to send a command split across several TCP packets. TCP is a stream-oriented protocol, so that should be perfectly valid. This program causes parse errors:
And this one crashes gnatsd:
BTW: the protocol design isn't nice. The INBOXes should not be assigned randomly, the responses should be somehow tagged (i.e. IMAP style). The cluster behaviour is undocumented; I guess there are no guarantees as to what would happen on crashes/network partitions etc.
First, thanks for the feedback. NATS strives to be extremely performant but also needs to be resilient and always available. I will take a look at the crash example above. Could you tell me a bit more about how you are running gnatsd, so that I have the full picture?
I also agree that both the client protocol and the cluster protocol need to be documented; that is on my TODO list. NATS is by design a fire-and-forget system. It does guarantee at-most-once delivery, and in-order delivery per publishing client connection. However, do note that for systems that claim some form of guarantee, it's best to look at the level at which that guarantee really runs out, especially around persistence, exactly-once delivery semantics, etc. I spent much of my career designing and building messaging systems that have those guarantees, and in turn developed many systems using some of those features. I found that depending on these guarantees was a bad pattern in distributed system design, hence NATS was born.
As far as INBOXes, I hear you and agree that it seems like they should be directed. However, in NATS, and any messaging system IMO, you should not assume how any message may be used in the future. For that reason, responses are not special, they are just unique, but can be matched at any time by wildcards such as ">" or "INBOX.>". This allows all of the system and its messages, requests and responses to be available if needed. I did this specifically in response to a system I had designed in the past where you could see requests but responses were directed and there was no easy way to see them (note I was trying to diagnose a bug in a system and wanted to see all the traffic).
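To make the wildcard point concrete, here is a minimal sketch of how a subscription pattern such as ">" or "INBOX.>" could be matched against a reply subject. It assumes the usual NATS token rules (subjects are dot-separated, "*" matches exactly one token, ">" matches one or more trailing tokens). The function name `subject_matches` and the sample subjects are hypothetical, and this is not how gnatsd itself implements matching.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a NATS-style pattern.

    Illustrative only: a hypothetical helper, not gnatsd's matcher.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" must be the last pattern token and must cover
            # at least one remaining subject token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            # Pattern has more tokens than the subject.
            return False
        if p != "*" and p != s_tokens[i]:
            # Literal token mismatch ("*" accepts any single token).
            return False
    # Without a trailing ">", the token counts must agree.
    return len(p_tokens) == len(s_tokens)


# A monitoring client subscribed to "INBOX.>" would see every reply,
# even though each reply subject is unique.
print(subject_matches("INBOX.>", "INBOX.a1b2c3"))  # True
print(subject_matches(">", "orders.created"))      # True
print(subject_matches("INBOX.>", "orders.created"))  # False
```

Because replies are ordinary subjects, a diagnostic subscriber needs no special server support to observe them; it only needs a pattern broad enough to cover them.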
I have recreated your Python test in the testing framework and can see the crash. Most of the protocol is very resilient to a control line being completed across packets, and there are specific tests for that: https://github.com/apcera/gnatsd/blob/master/server/split_test.go
However, the CONNECT handling is not; I will fix that now. Thanks again for the report.
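For readers following along, the sketch below illustrates what being resilient to split control lines means in practice: bytes are buffered as they arrive, and a line is handed to the parser only once its terminating CRLF has been seen. It assumes CRLF-terminated control lines and ignores message payloads. The class name `ControlLineBuffer` is hypothetical, and this is not gnatsd's actual parser, which is written in Go.

```python
from typing import List


class ControlLineBuffer:
    """Reassemble CRLF-terminated control lines from arbitrary TCP chunks.

    Illustrative only: a hypothetical helper, not gnatsd's parser.
    """

    def __init__(self) -> None:
        self._pending = b""  # bytes received but not yet terminated by CRLF

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every control line completed so far."""
        self._pending += chunk
        # Everything before the last CRLF is complete; the remainder
        # (possibly empty) waits for the next chunk.
        *complete, self._pending = self._pending.split(b"\r\n")
        return complete


buf = ControlLineBuffer()
# A CONNECT command arriving in three pieces, with the CRLF itself split.
print(buf.feed(b"CONN"))              # []
print(buf.feed(b"ECT {}\r"))          # []
print(buf.feed(b"\nPING\r\nPO"))      # [b'CONNECT {}', b'PING']
print(buf.feed(b"NG\r\n"))            # [b'PONG']
```

The point is that the boundaries of `feed` calls carry no meaning: any split of the same byte stream yields the same sequence of lines.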