Added tests for UTF-8 validity of option values. #150

Merged
merged 7 commits into juju:master on Jun 25, 2014

Conversation

@themue (Contributor) commented Jun 23, 2014

So far the tests of the API and the set command checked only ASCII values. Usage of UTF-8 should lead to no problems. The use of UTF-8 runes in the test value verifies this.

@@ -83,19 +83,21 @@ func (s *clientSuite) TestCompatibleSettingsParsing(c *gc.C) {
c.Assert(err, gc.ErrorMatches, `unknown option "yummy"`)
}

var setTestValue = "a value with spaces\nand newline\nand UTF-8 characters: äöüéàô 😄👍"
Inline review comment:
s/var/const/

@mjs commented Jun 24, 2014

Question one: what ultimately ends up decoding the UTF-8 strings? Does Juju just pass these straight through and it's up to a charm to deal with it?

Question two: should Juju be rejecting invalid UTF-8 at the client or API level to prevent bad strings from propagating?

@@ -37,6 +37,8 @@ func (s *SetSuite) SetUpTest(c *gc.C) {
setupConfigFile(c, s.dir)
}

var setTestValue = "a value with spaces\nand newline\nand UTF-8 characters: äöüéàô 😄👍"

Inline review comment (Member):

I personally prefer ASCII-only source text (I realize Go does specify that source is UTF-8), and using Unicode escapes for non-ASCII characters.
I believe the last time this came up we went with "let's use ASCII for our source", when someone wanted a variable named Façade. I'd probably be stricter about variables/functions than about the contents of strings, though.
So while I'd still recommend Unicode escapes, I wouldn't require them.

Inline review comment (Contributor, Author):

Adding stuff like the umlauts was fine for me, because I type them almost every day, but I felt somewhat odd about the emojis too. So yes, I'll change them into Unicode escapes.

@jameinel (Member) commented:

Q1: when reading the files or command line we just use a Go "string" value, which doesn't really have an encoding. However, we pass them to json.Marshal when we want to send them to the API, and that assumes UTF-8 encoded strings.
Q2: I do think we want a test that uses non-UTF-8 (\xff\xff or some such) and asserts that it fails.

There is http://golang.org/pkg/unicode/utf8/ with utf8.Valid([]byte) and utf8.ValidString(string) that we can use to assert that we are getting proper UTF-8 strings when we want to, rather than assuming it as a side effect.
I don't really want this change to bloat up into "let's go around and validate all of our input"; this was a bit more about "we should be clear that we currently support UTF-8 and have tests for it so that we don't regress that support".
I believe back when clients accessed the DB directly, we actually supported plain 8-bit strings with no encoding. But since we now go via a JSON layer, I think we would have to play base64 encoding tricks if we wanted to support that.
Since it was never clearly stated or tested, we can't say for sure what people were doing in the past. So now we can at least improve things incrementally and say "these things are UTF-8 encoded strings; if you need more than that, come talk to us and we can work out how to support binary data properly".

So yeah, I'd like to see a test that non-UTF-8 gets rejected, but otherwise this is better than what we have.
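
As an illustration of the kind of test being discussed (not code from this PR), a minimal standalone sketch using the unicode/utf8 package could look like this; the test name and the sample value are assumptions:

package config_test

import (
	"testing"
	"unicode/utf8"
)

// TestRejectInvalidUTF8 is a hypothetical sketch of the suggested test:
// feed a deliberately broken byte sequence and assert that it is not
// reported as valid UTF-8.
func TestRejectInvalidUTF8(t *testing.T) {
	invalid := []byte("broken value: \xff\xff")
	if utf8.Valid(invalid) {
		t.Fatal("expected \\xff\\xff to be reported as invalid UTF-8")
	}
	if utf8.ValidString(string(invalid)) {
		t.Fatal("expected the string form to be reported as invalid as well")
	}
}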

@themue (Contributor, Author) commented Jun 24, 2014

Just tested a bit more with the JSON encoding. Sadly, InvalidUTF8Error isn't returned anymore for data containing invalid UTF-8 sequences. Instead, all invalid sequences are marshalled to the escape "\ufffd". When unmarshalling a string containing that Unicode escape, it is converted into the valid code point U+FFFD (as bytes: [239 191 189]).

So to ensure valid transfer of data via the API, a change would be needed inside the jsonrpc package. Instead of marshalling directly to writers, as with the websocket.JSON codec, the data could be marshalled first, checked for the string "\ufffd" (without a second escaping backslash), and an error returned in that case.

The alternative would be to check possible string arguments, including those in maps or slices, in the client functions of the API with utf8.ValidString(). But the risk of missing this later when adding new functions is high.
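
For reference, a small standalone program (not part of the PR) demonstrating the behaviour described above, assuming only the standard encoding/json package:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Invalid UTF-8 bytes are silently replaced rather than rejected:
	// Marshal emits the escape \ufffd instead of returning InvalidUTF8Error.
	out, err := json.Marshal("bad bytes: \xff\xff")
	fmt.Println(string(out), err) // "bad bytes: \ufffd\ufffd" <nil>

	// Unmarshalling turns the escape into the real code point U+FFFD,
	// whose UTF-8 encoding is the byte sequence 239 191 189.
	var back string
	_ = json.Unmarshal(out, &back)
	fmt.Println([]byte(back))
}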

@mjs commented Jun 24, 2014

Could we at least validate at the edges? "juju set" should error immediately if given invalid UTF-8. Then it doesn't matter what the API does with invalid UTF-8 as invalid UTF-8 won't be introduced in the first place (not by juju set at least).

Aside: U+FFFD is the Unicode "replacement character", which represents unknown or invalid characters. This seems like a reasonable way for the API to handle such data.

@themue (Contributor, Author) commented Jun 25, 2014

Yes, we discovered a possible bug that has never occurred because nobody sends invalid UTF-8 strings. Only the idea of reading option values out of files made us think about it. So it seems good enough to do the check at a higher level, like the command. Alternatively it could be done in charm.Config.ParseSettingsStrings(), so that non-command settings are validated as well.

@jameinel (Member) commented:

I think we could potentially put the check in the schema package, and have schema say "string objects must be valid UTF-8 strings".
It might also show up where we thought we were handling strings as byte arrays but weren't actually treating the data correctly.

I would be happy enough just having a utf8.ValidString() check in 'cmd/juju/set*' and not bloating this fix further, though.
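
A rough sketch of what such a command-level check could look like; the helper name, the settings map, and the error text are illustrative assumptions, not the actual cmd/juju/set code:

package main

import (
	"fmt"
	"unicode/utf8"
)

// validateSettings is a hypothetical helper: reject invalid UTF-8 in option
// values at the command level, before they are ever sent to the API.
func validateSettings(settings map[string]string) error {
	for key, value := range settings {
		if !utf8.ValidString(value) {
			return fmt.Errorf("value of option %q contains invalid UTF-8", key)
		}
	}
	return nil
}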

@mjs commented Jun 25, 2014

+1 to the simple check in cmd/juju/set... That's all I was suggesting.

Frank Mueller added 2 commits June 25, 2014 11:56
…havior. After a discussion this morning we accept this. It has never been an issue in real life and only occurred during a test trying to transport binary data as a string.
@themue (Contributor, Author) commented Jun 25, 2014

$$merge$$

@jujubot (Collaborator) commented Jun 25, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

jujubot added a commit that referenced this pull request Jun 25, 2014
Added tests for UTF-8 validity of option values.

So far the tests of the API and the set command checked only ASCII values. Usage of UTF-8 should lead to no problems. The use of UTF-8 runes in the test value verifies this.
@jujubot jujubot merged commit 35d38ca into juju:master Jun 25, 2014
@themue themue deleted the set-test-utf-encoding branch June 25, 2014 14:18