GM should fail gracefully when encountering scripts with invalid encoding #1588

Closed
Ventero opened this Issue Jul 10, 2012 · 4 comments

@Ventero
Ventero commented Jul 10, 2012

When opening a script file with invalid encoding (for example http://ventero.de/temp/encoding.user.js - which doesn't contain valid UTF-8), Greasemonkey throws two errors:

Error: Component returned failure code: 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIScriptableUnicodeConverter.convertFromByteArray]
Source: resource://greasemonkey/remoteScript.js
Line: 104

and

Error: this._uri is undefined
Source: resource://greasemonkey/remoteScript.js
Line: 188

and doesn't display the install dialogue or the script's source.
It's probably a good idea to display a message explaining the problem.
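
As a rough illustration of "display a message explaining the problem", the sketch below catches the decoding failure and turns it into a readable message instead of letting the exception escape. It is written in Python purely for illustration and is not Greasemonkey's code; `load_script_source` and its return shape are hypothetical.

```python
def load_script_source(raw):
    """Return (source, problem); exactly one of the two is None."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        problem = (
            "This script could not be read as UTF-8 "
            f"(unexpected byte 0x{raw[exc.start]:02x} at offset {exc.start})."
        )
        return None, problem


# Hypothetical input: a script body containing a lone 0xe8 byte.
source, problem = load_script_source(b"'\xe8';")
if problem is not None:
    print(problem)
```

The caller can then show `problem` to the user rather than silently failing to open the install dialogue.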

@arantius
Collaborator

I agree that things should fail gracefully. But the linked script does not contain UTF-8.

$ curl -s http://ventero.de/temp/encoding.user.js | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 e8 27 3b  |erScript==..'.';|

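The character between the single quotes is 0xe8, or 0b11101000 in binary. In UTF-8, a byte with the MSB set is part of a multi-byte sequence, and the 0b1110xxxx pattern marks the first byte of a three-byte character, so continuation bytes (0b10xxxxxx) must follow. Here the next byte is 0x27, so this lone 0xe8 byte is not valid UTF-8.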

If I load that script (with GM disabled, to see the source directly) then I see a question mark in a box there. If I manually set ISO-8859-1 character encoding (Firefox has guessed UTF-8 without anything from the headers) then I get an 'è'.

http://www.fileformat.info/info/unicode/char/e8/index.htm
This character is correctly encoded to UTF-8 as two bytes: 0xC3 0xA8.
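
The byte-level claims above can be checked with a short sketch. The Python below is only an illustration (the thread itself uses curl and hexdump); `tail` is a hypothetical name for the last four bytes shown in the hexdump.

```python
# Last four bytes of the test script, copied from the hexdump: ' 0xe8 ' ;
tail = bytes([0x27, 0xE8, 0x27, 0x3B])

# 0xE8 is 0b11101000: the 1110xxxx pattern marks the first byte of a
# three-byte UTF-8 sequence, so two 10xxxxxx continuation bytes must follow.
bits = format(0xE8, "08b")

try:
    tail.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0x27 (') is not a continuation byte, so strict decoding fails.
    utf8_ok = False

# The same bytes are valid ISO-8859-1, where 0xE8 maps to U+00E8.
latin1_text = tail.decode("iso-8859-1")

# Encoding U+00E8 as UTF-8 yields two bytes.
utf8_bytes = "\u00e8".encode("utf-8")

print(bits, utf8_ok, latin1_text, utf8_bytes.hex())
```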

I do still get the same errors from a script that's really delivered in UTF-8 w/out a charset in the headers:

$ curl -v http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js 2>&1 | grep Content-Type
< Content-Type: text/html
$ curl -s http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 c3 a8 27  |erScript==..'..'|

In this case, interestingly enough, Firefox guesses ISO-8859-1 even though the content is UTF-8.

@Ventero
Ventero commented Jul 10, 2012

Yeah, sorry, I was experimenting a bit, so I've now uploaded a new version and edited my post. The script now indeed contains valid ISO-8859-1.

It's a bit weird though that for you Firefox guesses UTF-8 as the character encoding, while it (correctly, especially when following the RFC) guesses ISO-8859-1 for me. If I disable GM and open the script, I see the è just fine.

But even then, GM still throws the error.

@Ventero
Ventero commented Jul 10, 2012

Interestingly, for me a script which contains valid UTF-8 and is served without charset (but with "Content-Type: text/javascript") works just fine (http://ventero.de/temp/encoding-utf8.user.js). Your example is sent with "Content-Type: text/html", so that GM doesn't try to install it anyway.

@arantius
Collaborator

Greasemonkey: 1.0beta5
Firefox: 14.0

Test scripts:

$ for x in iso no utf8; do echo -en "$x\t"; curl -sv "http://arantius.com/misc/gm-test/utf8-encoding-$x-header.user.js" 2>&1 | grep Content-Type; done
iso     < Content-Type: text/plain; charset=iso-8859-1
no      < Content-Type: text/plain
utf8    < Content-Type: text/plain; charset=utf-8

If I disable Greasemonkey, navigate to each as a page, and inspect View > Character Encoding, the first two both choose ISO-8859-1 and display incorrectly; the last displays correctly. In all cases, if I enable Greasemonkey and install the script, it works, as the actual delivered content is UTF-8. This is as I believe it should be: it's easy for script authors to control the content of hosted files, and often harder for them to get the HTTP headers set correctly.

Now, if I put ISO-8859-1 encoded content in the file (for all six test cases see http://arantius.com/misc/gm-test/ ), Firefox makes the same guesses. They work for the first two cases (the guess is right), but when the headers explicitly say the content is UTF-8 while the body contains ISO-8859-1, Firefox fails to display it correctly.
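
The six header/body combinations can be modelled with a small sketch. This is only an approximation of how a page is displayed, written in Python for illustration; it assumes that a missing charset header leads to an ISO-8859-1 guess (as observed above), and `displayed` and `INTENDED` are hypothetical names.

```python
INTENDED = "'\u00e8';"  # what the script author meant to write


def displayed(body, header_charset):
    # Assumption: with no charset in the headers, the guess is ISO-8859-1.
    charset = header_charset or "iso-8859-1"
    # errors="replace" stands in for the replacement glyph shown for
    # bytes that cannot be decoded.
    return body.decode(charset, errors="replace")


results = {}
for body_name in ("utf-8", "iso-8859-1"):
    body = INTENDED.encode(body_name)
    for header in ("iso-8859-1", None, "utf-8"):
        results[(body_name, header)] = displayed(body, header) == INTENDED

for (body_name, header), ok in results.items():
    verdict = "displays correctly" if ok else "displays incorrectly"
    print(f"body={body_name} header={header}: {verdict}")
```

Under these assumptions, the only combinations that display correctly are those where the declared or guessed charset matches the encoding of the body.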

In all file-contains-ISO cases, Greasemonkey dies during the install. This needs to be fixed.

@arantius arantius closed this in f934d77 Jul 16, 2012