Not sure if this is the right approach, but I wanted to discuss how to handle encodings.
As I understand it, the string payload should be encoding-unaware. The length should be the byte size of the string when it is treated as ASCII-8BIT (binary).
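To illustrate the length point, here is a minimal sketch. `dump_string` is a hypothetical helper written for this comment, not the library's API. It frames a string using `bytesize`, so a multibyte character counts as the number of bytes it occupies rather than as one character.

```ruby
# Hypothetical helper (not this library's API): frame a string as "<len>:<payload>,".
def dump_string(str)
  # bytesize counts bytes regardless of the string's encoding,
  # whereas length counts characters.
  "#{str.bytesize}:#{str},"
end

s = "h\u00e9llo"         # "héllo" in UTF-8; the é takes two bytes
chars  = s.length        # 5 characters
bytes  = s.bytesize      # 6 bytes
framed = dump_string(s)  # "6:héllo,"
```

Framing with `length` here would claim 5 bytes and a reader would stop one byte short of the terminator.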
Use 8bit bytesize for length
Allow encoding to be specified
Default to 'internal' encoding
Merge branch 'master' into encoding
Only set default internal encoding on 1.9
Using bytesize is definitely correct according to the TNetstring spec, and being able to set the encoding while parsing is probably handy for nested objects. However, if the data string has an encoding other than binary, should we respect that?
Ah, so for encode we should be calling obj.encode('binary')?
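One thing to watch with `encode('binary')`: as far as I know it tries to transcode, which fails for non-ASCII characters, while `force_encoding('binary')` only relabels the existing bytes. A small sketch of the difference (the variable names are just for illustration):

```ruby
s = "h\u00e9llo"  # UTF-8, 6 bytes

# encode attempts a real conversion; é has no mapping into ASCII-8BIT.
begin
  s.encode('binary')
  transcoded = true
rescue Encoding::UndefinedConversionError
  transcoded = false
end

# force_encoding keeps the bytes untouched and only changes the label.
raw = s.dup.force_encoding('binary')
same_bytes = (raw.bytes == s.bytes)
```

So for dumping, relabeling a copy is probably what we want, not transcoding.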
Oops. I misunderstood this when I read it the first time; I should have looked at the code. #4 is a duplicate of this. I think UTF-8 should be assumed for effortless JSON API compatibility, and another tag character (perhaps ' or `) should be used for raw bytestrings. And yes, bytesize is a must.
Also, if you know a file is UTF-8 or expect it to be, it could be streamed into a string by first getting the size from the filesystem, since the filesystem reports sizes in bytes. I don't think UTF-8 strings are unwieldy at all, so long as that's what we want and we get the bytesize thing out of the way.
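A rough sketch of the file idea, assuming a hypothetical `dump_file` helper (not part of the library): the length prefix comes from `File.size`, which reports bytes, and the payload is read as raw bytes so nothing is transcoded on the way in.

```ruby
require 'tempfile'

# Hypothetical helper: frame a file's contents using the size the filesystem reports.
def dump_file(path)
  size    = File.size(path)     # size in bytes, independent of any encoding
  payload = File.binread(path)  # raw bytes, tagged as binary
  "#{size}:" + payload + ","
end

# Usage: write a UTF-8 string as raw bytes to a temp file, then frame it.
file = Tempfile.new('payload')
file.close
File.binwrite(file.path, "h\u00e9llo")
framed = dump_file(file.path)
file.unlink
```

This reads the whole file at once for brevity; a streaming version would write the prefix first and then copy the file in chunks.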
@benatkin what do you mean by "supporting UTF-8"? str.force_encoding("UTF-8") before returning it?
@rkh Yes, that's what I mean. Basically require that all implementations do that if indications are that it's UTF-8 (having a standard tag rather than a binary tag). The file would need to be read in as a bytestring, though.
I'm 👎 to that.
Netstrings are a lower level transport. It should be the responsibility of the application to decide which encoding is being used.
Calling force_encoding from UTF-8 to something else is a smell.
I don't like it either. People should not have to set the encoding more than once or twice.
I may have fucked up the assumptions about Encoding.default_internal. I think they are only intended to be used for FS-related encodings.
Oh man, how I hate to deal with that stuff.
I think we just want to change encoding = 'internal' to encoding = nil and not do any sort of encoding unless it's explicit.
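For reference, a minimal sketch of what an opt-in encoding could look like on the parsing side. `parse_string` and its signature are assumptions made for illustration, not the code in this branch, and it only handles a single string element.

```ruby
# Hypothetical parser for one "<len>:<payload>," element.
# With encoding = nil the payload stays binary; nothing is relabeled
# unless the caller asks for it explicitly.
def parse_string(data, encoding = nil)
  data = data.dup.force_encoding('binary')  # index by bytes, not characters
  length, rest = data.split(':', 2)
  raise ArgumentError, 'missing ":" separator' unless rest
  size = Integer(length, 10)
  payload = rest[0, size]
  raise ArgumentError, 'missing "," terminator' unless rest[size, 1] == ','
  encoding ? payload.force_encoding(encoding) : payload
end

raw  = parse_string("6:h\u00e9llo,")           # binary, 6 bytes
utf8 = parse_string("6:h\u00e9llo,", 'UTF-8')  # relabeled as UTF-8
```

The application decides the encoding once, at the call site, which seems to match the "lower level transport" argument above.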