Crash gnatsd #63

Closed
ondrap opened this Issue Nov 16, 2014 · 2 comments

ondrap commented Nov 16, 2014

While writing a Haskell NATS binding, I found that gnatsd crashes quite easily when a command is sent in several TCP packets. TCP is a stream-oriented protocol, so splitting a command across packets should be perfectly valid. This program causes parse errors:

#!/usr/bin/env python3
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 4222))
print(s.recv(1024))  # greeting sent by the server on connect
# Send one CONNECT command split across three separate writes.
s.send(b'CONNECT ')
time.sleep(0.1)
s.send(b'{"verbose":true,"ssl_required":false,"user":"test","pedantic":true,"pass":"password"}')
time.sleep(0.1)
s.send(b'\r\n')
print(s.recv(1024))

And this one crashes gnatsd:

#!/usr/bin/env python3
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 4222))
print(s.recv(1024))  # greeting sent by the server on connect
# Send the CONNECT command and its payload first, then the terminating CRLF separately.
s.send(b'CONNECT {"verbose":true,"ssl_required":false,"user":"test","pedantic":true,"pass":"password"}')
time.sleep(0.1)
s.send(b'\r\n')
print(s.recv(1024))

BTW: the protocol design isn't nice. The INBOXes should not be assigned randomly; the responses should be tagged somehow (e.g. IMAP style). The cluster behaviour is undocumented; I guess there are no guarantees as to what would happen on crashes, network partitions, etc.

Member

derekcollison commented Nov 16, 2014

First, thanks for the feedback. NATS strives to be extremely performant but also needs to be resilient and always available. I will take a look at the crash example above. Could you give me a bit more information on how you are running gnatsd, so that I have all of the information?

I also agree that both the client protocol and the cluster protocol need to be documented; that is on my TODO list. NATS is by design a fire-and-forget system. It does guarantee at-most-once delivery, and in-order delivery per publishing client connection. However, do note that for systems that do claim some form of guarantee, it's best to look at the level where that guarantee really runs out, especially around persistence, exactly-once delivery semantics, etc. I spent much of my career designing and building messaging systems that have those guarantees, and in turn developed many systems utilizing some of those features. I found that depending on these guarantees was a bad pattern in distributed system design, hence NATS was born.

As far as INBOXes go, I hear you and agree that it seems like they should be directed. However, in NATS, and in any messaging system IMO, you should not assume how any message may be used in the future. For that reason, responses are not special; they are just unique, and can be matched at any time by wildcards such as ">" or "INBOX.>". This allows all of the system's messages, requests and responses alike, to be available if needed. I did this specifically in response to a system I had designed in the past, where you could see requests but responses were directed and there was no easy way to see them (note I was trying to diagnose a bug in a system and wanted to see all the traffic).
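To make the wildcard point concrete, here is a minimal sketch of how a subject could be matched against a pattern, assuming dot-separated tokens where "*" matches exactly one token and ">" matches one or more trailing tokens. The function name `subject_matches` and the sample subjects are hypothetical; this is an illustration, not gnatsd's actual matching code.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches `pattern` (hypothetical helper).

    Assumes '.'-separated tokens, '*' matching exactly one token and
    '>' matching one or more tokens at the end of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # No '>' seen: both sides must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)


if __name__ == "__main__":
    # A subscriber on ">" or "INBOX.>" would see a reply sent to a
    # unique inbox subject such as "INBOX.abc123" (made-up name).
    for pattern in (">", "INBOX.>", "INBOX.*", "foo.>"):
        print(pattern, subject_matches(pattern, "INBOX.abc123"))
```

Because a reply subject is just another subject, a debugging client subscribed to "INBOX.>" would observe every response without the requesters needing to cooperate.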

Member

derekcollison commented Nov 16, 2014

So I have recreated your Python test in the testing framework and can see the crash. Most of the protocol is very resilient to control lines that are completed across packets, and there are specific tests for that: https://github.com/apcera/gnatsd/blob/master/server/split_test.go

However, CONNECT is not; I will fix that now. Thanks again for the report.
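For readers writing their own client or server, a minimal sketch of the general technique follows: buffer incoming bytes and only hand a control line to the parser once its terminating CRLF has arrived, however the stream was split into reads. The class name `ControlLineBuffer` is hypothetical, and this is not how gnatsd's Go parser is implemented; it only illustrates split-tolerant line handling.

```python
class ControlLineBuffer:
    """Accumulates stream chunks and yields complete CRLF-terminated lines.

    Hypothetical helper for illustration; it does not bound the buffer
    size, which a real server would need to do.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add one chunk from the socket; return any lines now complete."""
        self._buf.extend(chunk)
        lines = []
        while True:
            end = self._buf.find(b"\r\n")
            if end < 0:
                break  # no full line yet; keep the partial data buffered
            lines.append(bytes(self._buf[:end]))
            del self._buf[:end + 2]  # drop the line and its CRLF
        return lines


if __name__ == "__main__":
    buf = ControlLineBuffer()
    # Same split as the crashing script: command first, CRLF later.
    print(buf.feed(b'CONNECT {"verbose":true}'))
    print(buf.feed(b"\r\n"))
```

The same buffer also copes with a CRLF that is itself split between two reads, since the search for the terminator always runs over the accumulated bytes rather than over a single chunk.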
