Bad decoding for languages other than English #165

Closed
Cyperwu opened this issue Oct 18, 2017 · 13 comments


Cyperwu commented Oct 18, 2017

Hi!

On line 767, var buf = client.inbound.toString('binary', 0, MAX_CONTROL_LINE_SIZE);
decodes the buffer as 'binary', which causes encoding errors for text in languages other than English.


I tried changing 'binary' to 'utf8' on line 767 and adding a function that computes the UTF-8 byte length of a string:

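function byteLength(str) {
  // Returns the number of bytes str occupies when encoded as UTF-8.
  // Start from the UTF-16 length (1 byte per code unit), then add the extra bytes.
  var s = str.length;
  for (var i = str.length - 1; i >= 0; i--) {
    var code = str.charCodeAt(i);
    if (code > 0x7f && code <= 0x7ff) s++;           // 2-byte sequence: 1 extra byte
    else if (code > 0x7ff && code <= 0xffff) s += 2; // 3-byte sequence: 2 extra bytes
    // A trail surrogate closes a surrogate pair (4 bytes in total):
    // skip the lead surrogate so the pair is not counted twice.
    if (code >= 0xDC00 && code <= 0xDFFF) i--;
  }
  return s;
}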

Then I changed line 936 from var psize = m[0].length; to var psize = byteLength(m[0]);, and that seems to solve my problem.
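
For comparison, Node's built-in Buffer.byteLength should give the same count as a hand-written helper for well-formed strings. A minimal sketch follows; the `sizes` helper is hypothetical and not part of node-nats, and the sample strings are only illustrative:

```javascript
// Hypothetical helper (not part of node-nats): compares the UTF-16 length
// of a string with its UTF-8 byte length.
function sizes(str) {
  return {
    chars: str.length,                     // UTF-16 code units
    bytes: Buffer.byteLength(str, 'utf8'), // bytes once encoded as UTF-8
  };
}

console.log(sizes('hi'));          // chars and bytes agree for ASCII
console.log(sizes('中文'));         // 2 chars, 6 bytes
console.log(sizes('hello.日本語')); // 9 chars, 15 bytes
```

Once the buffer is decoded as 'utf8', a size taken from .length would undercount any text containing non-ASCII characters, which is presumably why the change to line 936 was needed alongside the change to line 767.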

Member

aricart commented Oct 18, 2017

@Cyperwu Thanks for the report. Could you be so kind as to send us a sample input (subject) and payload that we can take a look at? Thanks!

Member

aricart commented Oct 18, 2017

@Cyperwu I am guessing here that the subject is not ASCII, and likely it should be. While the payload can be anything, the NATS protocol must be ASCII. Can you confirm this is the case?

Author

Cyperwu commented Oct 19, 2017

@aricart No, the subject is not ASCII. But the NATS documentation doesn't seem to say that any particular encoding is required for subjects.

Oh, sorry. I've found it now.

Author

Cyperwu commented Oct 19, 2017

@aricart I tried subscribing directly to a non-alphanumeric subject and received no messages.
But if we use a wildcard to match the rest of the subject, even non-alphanumeric subjects work.

PUB mainChannel.噢.额 3 44\r\n
Hello\r\n\r\n

SUB mainChannel.> 3\r\n\r\n will still get me messages
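
As a sketch of why byte counts matter on the wire: per the NATS protocol documentation, PUB has the form `PUB <subject> [reply-to] <#bytes>\r\n[payload]\r\n`, where `#bytes` is the payload size in bytes. The `pubFrame` helper below is hypothetical and only illustrates the framing; it is not how node-nats builds its messages. The subject mirrors the one in this comment, even though the maintainers expect subjects to be ASCII.

```javascript
// Hypothetical helper: builds a PUB frame without a reply-to subject.
function pubFrame(subject, payload) {
  const body = Buffer.from(payload, 'utf8');
  // body.length is the payload size in bytes, not in characters.
  const head = Buffer.from('PUB ' + subject + ' ' + body.length + '\r\n', 'utf8');
  return Buffer.concat([head, body, Buffer.from('\r\n')]);
}

const frame = pubFrame('mainChannel.噢.额', 'Hello');
// JSON.stringify makes the \r\n terminators visible in the output.
console.log(JSON.stringify(frame.toString('utf8')));
```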

Contributor

isobit commented Oct 20, 2017

Perhaps an issue should be filed with gnatsd requesting support for UTF-8 topic names? If the NATS protocol requires that topics are ASCII-encoded then it seems like this is a non-issue for client libs.

Member

wallyqs commented Oct 20, 2017

A reproducible example that makes it fail would be nice. I tried it out of curiosity, and there does not seem to be an issue receiving the message when using non-ASCII characters in the protocol line either.

asyncio-nats (master) $ python examples/nats-sub ">"
Connected to NATS at 127.0.0.1:4222...
Received a message on 'hello.日本語 ': hi

node-nats (master) $ node examples/node-sub ">" 
Listening on [>]
Received "hi"

ruby-nats (master) $ ruby bin/nats-pub "hello.日本語" hi
Published [hello.日本語] : 'hi'

Author

Cyperwu commented Oct 23, 2017

@isobit I agree.

Author

Cyperwu commented Oct 23, 2017

@wallyqs What if you try node examples/node-sub "hello.日本語 " ?

Member

wallyqs commented Oct 23, 2017

@Cyperwu that works for me locally too...

node examples/node-sub "hello.日本語 "
Listening on [hello.日本語 ]
Received "中文"

node examples/node-pub "hello.日本語 " 中文
Published [hello.日本語 ] : "中文"

Author

Cyperwu commented Oct 23, 2017

@wallyqs In my case it is Received "中文" from "hello.æ�¥æ�¬èª"
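
The garbled subject looks like what you get when UTF-8 bytes are decoded one byte per character. A minimal sketch, assuming that is the cause here (the thread does not confirm it):

```javascript
const subject = 'hello.日本語';
const wire = Buffer.from(subject, 'utf8'); // bytes as they would travel over the socket

// 'binary' is Node's alias for 'latin1': every byte becomes one character.
const garbled = wire.toString('binary');
console.log(JSON.stringify(garbled));

// The bytes themselves are intact, so re-encoding as 'binary' and decoding
// as 'utf8' recovers the original text.
const recovered = Buffer.from(garbled, 'binary').toString('utf8');
console.log(recovered === subject);
```

Some of the resulting characters are unprintable control codes, which would explain the replacement marks in the output above.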

Member

aricart commented Oct 23, 2017

I also get:

> ~/go/src/git...e-nats/examples]$ node-sub "hello.日本語 "
Listening on [hello.日本語 ]
Received "中文"

@Cyperwu what OS are you running on?

Author

Cyperwu commented Oct 24, 2017

@aricart That's odd. My machine is running OS X El Capitan 10.11.5.

Member

aricart commented Feb 8, 2021

v2 doesn't deal with any encoding; messages are simply byte arrays.
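
In practice that leaves text encoding to the caller. A minimal sketch using the standard TextEncoder and TextDecoder globals; no NATS client API is shown here, and the variable names are illustrative only:

```javascript
const encoder = new TextEncoder(); // always produces UTF-8
const decoder = new TextDecoder(); // defaults to UTF-8

// Publisher side: turn the text into a byte array before handing it over.
const payload = encoder.encode('中文');
console.log(payload.length); // byte count, not character count

// Subscriber side: decode the received bytes back into text.
const text = decoder.decode(payload);
console.log(text);
```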

aricart closed this as completed Feb 8, 2021