
GM should fail gracefully when encountering scripts with invalid encoding #1588

Ventero opened this Issue Jul 10, 2012 · 4 comments


Ventero commented Jul 10, 2012

When opening a script file with invalid encoding (for example - which doesn't contain valid UTF-8), Greasemonkey throws two errors:

Error: Component returned failure code: 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIScriptableUnicodeConverter.convertFromByteArray]
Source: resource://greasemonkey/remoteScript.js
Line: 104


Error: this._uri is undefined
Source: resource://greasemonkey/remoteScript.js
Line: 188

and doesn't display the install dialogue or the script's source.
It's probably a good idea to display a message explaining the problem.


I agree that things should fail gracefully. But the linked script does not contain UTF-8.

$ curl -s | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 e8 27 3b  |erScript==..'.';|

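The byte between the single quotes is 0xe8, or 0b11101000 in binary. In UTF-8, a byte with the MSB set is part of a multi-byte sequence, and the 1110 prefix marks the first byte of a three-byte character, which must be followed by two continuation bytes of the form 10xxxxxx. Here the next byte is 0x27 (a single quote), so this lone 0xe8 byte is not valid UTF-8.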

If I load that script (with GM disabled, to see the source directly) then I see a question mark in a box there. If I manually set ISO-8859-1 character encoding (Firefox has guessed UTF-8 without anything from the headers) then I get an 'è'.
This character is correctly encoded to UTF-8 as two bytes: 0xC3 0xA8.
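
The byte-level reasoning above can be checked with a short Python sketch. The helper names (`utf8_sequence_length`, `is_valid_utf8`) are made up for illustration; only the three bytes `27 e8 27` come from the hexdump.

```python
# The bytes from the hexdump above: single quote, 0xE8, single quote.
raw = b"'\xe8'"


def utf8_sequence_length(lead: int) -> int:
    """Return the length of the UTF-8 sequence that `lead` would start,
    or 0 if `lead` cannot start a sequence (continuation or invalid byte)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx
    return 0


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE8 announces a three-byte sequence, but the next byte is 0x27.
expected_length = utf8_sequence_length(0xE8)
valid_as_utf8 = is_valid_utf8(raw)

# Read as ISO-8859-1, the same byte is U+00E8 (e with grave accent).
as_latin1 = raw.decode("iso-8859-1")

# The UTF-8 encoding of that character is the two bytes C3 A8.
utf8_hex = "\u00e8".encode("utf-8").hex()

print(expected_length, valid_as_utf8, utf8_hex)
```

The strict decode is what rejects the input: 0xE8 promises two continuation bytes and gets an ASCII quote instead.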

I do still get the same errors from a script that's really delivered in UTF-8 without a charset in the headers:

$ curl -v 2>&1 | grep Content-Type
< Content-Type: text/html
$ curl -s | hexdump -C | tail -n 3 | head -n 1
00000060  65 72 53 63 72 69 70 74  3d 3d 0a 0a 27 c3 a8 27  |erScript==..'..'|

In this case, interestingly enough, Firefox guesses ISO even though the content is UTF-8.

Ventero commented Jul 10, 2012

Yeah, sorry, I was experimenting a bit, so I've now uploaded a new version and edited my post. The script now indeed contains valid ISO-8859-1.

It's a bit weird though that for you Firefox guesses UTF-8 as the charset, while it (correctly, especially when following the RFC) guesses ISO-8859-1 for me. If I disable GM and open the script, I see the è just fine.

But even then, GM still throws the error.

Ventero commented Jul 10, 2012

Interestingly, for me a script which contains valid UTF-8 and is served without a charset (but with "Content-Type: text/javascript") works just fine. Your example is sent with "Content-Type: text/html", so GM doesn't try to install it anyway.


Greasemonkey: 1.0beta5
Firefox: 14.0

Test scripts:

$ for x in iso no utf8; do echo -en "$x\t"; curl -sv "$x-header.user.js" 2>&1 | grep Content-Type; done
iso     < Content-Type: text/plain; charset=iso-8859-1
no      < Content-Type: text/plain
utf8    < Content-Type: text/plain; charset=utf-8

If I disable Greasemonkey, navigate to each as a page, and inspect View > Character Encoding, the first two both choose ISO-8859-1 and display incorrectly; the last displays correctly. In all cases, if I enable Greasemonkey and install the script, it works, because the actual delivered content is UTF-8. This is as I believe it should be: it's easy for script authors to control the content of hosted files, and often harder for them to get the HTTP headers set correctly.

Now, if I put ISO-8859 encoded content in the file (for all six test cases see ), Firefox makes the same guesses. They work for the first two cases (the guess is right), but when the headers explicitly say the content is UTF-8 while the body contains ISO-8859, Firefox fails to display it correctly.
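
The six header/body combinations can be reproduced offline with a small Python sketch. It assumes the browser decodes with the header's charset and falls back to ISO-8859-1 when none is given, as observed above; the payload and the names `BODIES`, `HEADER_CHARSET` and `displays_correctly` are illustrative, not taken from the test scripts.

```python
from itertools import product

# Illustrative payload: an e-grave between single quotes.
TEXT = "'\u00e8'"

# The two body encodings under test.
BODIES = {
    "utf8": TEXT.encode("utf-8"),
    "iso": TEXT.encode("iso-8859-1"),
}

# Charset the browser ends up using for each header variant.
# Assumption: with no charset in the header, the guess is ISO-8859-1.
HEADER_CHARSET = {
    "iso": "iso-8859-1",
    "no": "iso-8859-1",
    "utf8": "utf-8",
}


def displays_correctly(header: str, body: str) -> bool:
    """True if decoding the body with the header's charset yields TEXT."""
    shown = BODIES[body].decode(HEADER_CHARSET[header], errors="replace")
    return shown == TEXT


for header, body in product(HEADER_CHARSET, BODIES):
    status = "ok" if displays_correctly(header, body) else "garbled"
    print(f"{header:4} header, {body:4} body: {status}")
```

Only the combinations where the charset in use matches the body's real encoding display correctly, which is the pattern reported for page display.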

In all file-contains-ISO cases, Greasemonkey dies during the install. This needs to be fixed.
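
One way to fail gracefully is to try strict UTF-8 first and, on failure, fall back to another charset while reporting what went wrong. The Python sketch below illustrates that idea only; it is not the change made in f934d77, and `decode_script` and its `fallback_charset` parameter are hypothetical names.

```python
from typing import Optional, Tuple


def decode_script(
    data: bytes, fallback_charset: str = "iso-8859-1"
) -> Tuple[Optional[str], Optional[str]]:
    """Decode script bytes without raising.

    Returns (source, message). `message` is None when the bytes are valid
    UTF-8; otherwise it describes the problem. `source` is None only when
    the fallback charset fails as well.
    """
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        # Record where strict UTF-8 decoding stopped.
        detail = (
            f"byte 0x{data[err.start]:02x} at offset {err.start} "
            "is not valid UTF-8"
        )

    try:
        text = data.decode(fallback_charset)
    except (UnicodeDecodeError, LookupError):
        return None, f"{detail}; could not decode as {fallback_charset} either"
    return text, f"{detail}; decoded as {fallback_charset} instead"


source, message = decode_script(b"'\xe8';")
print(message)
```

The caller can show `message` to the user instead of dying with NS_ERROR_ILLEGAL_INPUT, and can still display the source whenever the fallback succeeds.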

@arantius arantius closed this in f934d77 Jul 16, 2012