When opening a script file with invalid encoding (for example http://ventero.de/temp/encoding.user.js - which doesn't contain valid UTF-8), Greasemonkey throws two errors:
Error: Component returned failure code: 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIScriptableUnicodeConverter.convertFromByteArray]
Error: this._uri is undefined
and doesn't display the install dialogue or the script's source.
It's probably a good idea to display a message explaining the problem.
I agree that things should fail gracefully. But the linked script does not contain UTF-8.
$ curl -s http://ventero.de/temp/encoding.user.js | hexdump -C | tail -n 3 | head -n 1
00000060 65 72 53 63 72 69 70 74 3d 3d 0a 0a 27 e8 27 3b |erScript==..'.';|
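That character between the single quotes is 0xe8, or 0b11101000 in binary. In UTF-8, a byte with the high bit set is always part of a multi-byte sequence. The 1110 prefix marks 0xe8 as the first byte of a three-byte sequence, but the next byte here is 0x27, which is not a continuation byte. A lone 0xe8 byte is not valid UTF-8.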
If I load that script (with GM disabled, to see the source directly) then I see a question mark in a box there. If I manually set ISO-8859-1 character encoding (Firefox has guessed UTF-8 without anything from the headers) then I get an 'è'.
This character is correctly encoded to UTF-8 as two bytes: 0xC3 0xA8.
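To make the byte-level argument concrete, here is a minimal sketch of a structural UTF-8 check run against the bytes from the hexdumps in this thread. The function names are made up for illustration and are not part of Greasemonkey. The check only looks at lead and continuation bytes; it does not reject overlong forms or surrogates.

```javascript
// Expected total length of a UTF-8 sequence, judged from its first byte.
// Returns 0 for a byte that cannot start a sequence.
function expectedLength(lead) {
  if (lead < 0x80) return 1;                   // 0xxxxxxx: plain ASCII
  if (lead >= 0xc2 && lead <= 0xdf) return 2;  // 110xxxxx
  if (lead >= 0xe0 && lead <= 0xef) return 3;  // 1110xxxx
  if (lead >= 0xf0 && lead <= 0xf4) return 4;  // 11110xxx
  return 0;                                    // continuation byte or invalid lead
}

// Continuation bytes always look like 10xxxxxx.
function isContinuation(b) {
  return (b & 0xc0) === 0x80;
}

// Returns the offset of the first malformed sequence, or -1 if none is found.
// Structural check only (an assumption of this sketch).
function firstInvalidOffset(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const len = expectedLength(bytes[i]);
    if (len === 0) return i;
    for (let k = 1; k < len; k++) {
      if (i + k >= bytes.length || !isContinuation(bytes[i + k])) return i;
    }
    i += len;
  }
  return -1;
}

// Tail of the first hexdump: 27 e8 27 3b (ISO-8859-1 content).
const latin1Body = [0x27, 0xe8, 0x27, 0x3b];
// 'è' encoded as UTF-8: 27 c3 a8 27.
const utf8Body = [0x27, 0xc3, 0xa8, 0x27];

console.log(firstInvalidOffset(latin1Body)); // offset of the lone 0xe8
console.log(firstInvalidOffset(utf8Body));   // -1, no malformed sequence
```

The 0xe8 byte announces a three-byte sequence, but the byte after it is 0x27, so the check fails at that offset. The 0xc3 0xa8 pair passes.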
I do still get the same errors from a script that's really delivered in UTF-8 without a charset in the headers:
$ curl -v http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js 2>&1 | grep Content-Type
< Content-Type: text/html
$ curl -s http://arantius.com/misc/gm-test/utf8-encoding-no-header.user.js | hexdump -C | tail -n 3 | head -n 1
00000060 65 72 53 63 72 69 70 74 3d 3d 0a 0a 27 c3 a8 27 |erScript==..'..'|
In this case, interestingly enough, Firefox guesses ISO-8859-1 even though the content is UTF-8.
Yeah, sorry, I was experimenting a bit, so I've now uploaded a new version and edited my post. The script now indeed contains valid ISO-8859-1.
It's a bit weird though that for you Firefox guesses UTF-8 as the character encoding, while it (correctly, especially when following the RFC) guesses ISO-8859-1 for me. If I disable GM and open the script, I see the è just fine.
But even then, GM still throws the error.
$ for x in iso no utf8; do echo -en "$x\t"; curl -sv "http://arantius.com/misc/gm-test/utf8-encoding-$x-header.user.js" 2>&1 | grep Content-Type; done
iso < Content-Type: text/plain; charset=iso-8859-1
no < Content-Type: text/plain
utf8 < Content-Type: text/plain; charset=utf-8
If I disable Greasemonkey and navigate to each as a page, then inspect View > Character Encoding, the first two both choose ISO-8859-1 and display incorrectly; the last displays correctly. In all cases, if I enable Greasemonkey and install the script, it works, as the actual delivered content is UTF-8. This is as I believe it should be: it's easy for script authors to control the content of hosted files, and often harder for them to get the HTTP headers set correctly.
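As an illustration of why the first two cases display incorrectly, this small sketch decodes the same four bytes from the hexdump above (27 c3 a8 27) both ways. It is not Firefox's actual detection logic, only the two candidate decodings. ISO-8859-1 maps each byte to the code point of the same value, so `String.fromCharCode` is used as a stand-in for that decoder.

```javascript
// The UTF-8 encoded body tail from the hexdump: ' c3 a8 '
const bytes = new Uint8Array([0x27, 0xc3, 0xa8, 0x27]);

// Decoded as UTF-8: the two bytes c3 a8 combine into U+00E8 (è).
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoded as ISO-8859-1: every byte becomes its own character,
// so c3 a8 turns into U+00C3 U+00A8 ("Ã¨").
const asLatin1 = String.fromCharCode(...bytes);

console.log(asUtf8.length);   // 3 characters: quote, è, quote
console.log(asLatin1.length); // 4 characters: quote, Ã, ¨, quote
```

A wrong guess of ISO-8859-1 for UTF-8 content never fails outright, it just shows two characters where one was meant. The reverse mismatch (ISO-8859-1 bytes read as UTF-8) produces an invalid sequence instead.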
Now, if I put ISO-8859-1 content in the file (for all six test cases see http://arantius.com/misc/gm-test/ ), Firefox makes the same guesses. These work for the first two cases, where the guess is right. But when the headers explicitly say the content is UTF-8 and the body contains ISO-8859-1, Firefox fails to display it correctly.
In all file-contains-ISO cases, Greasemonkey dies during the install. This needs to be fixed.
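One way the graceful failure could look is sketched below. This is not Greasemonkey's code: the errors above come from nsIScriptableUnicodeConverter, while this sketch swaps in the standard `TextDecoder` with `fatal: true`, which throws on malformed input. The helper name `decodeScriptSource` and the message text are hypothetical.

```javascript
// Hypothetical helper: decode downloaded script bytes strictly as UTF-8.
// Returns the text on success, or a user-facing message on failure,
// instead of letting the decoder's exception escape.
function decodeScriptSource(bytes) {
  try {
    // fatal: true makes decode() throw on malformed input
    // rather than silently inserting U+FFFD replacement characters.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { ok: true, text };
  } catch (e) {
    return {
      ok: false,
      message: "This script is not valid UTF-8 and cannot be installed. " +
               "Please ask the author to save it as UTF-8.",
    };
  }
}

// ISO-8859-1 body from the first hexdump: decoding fails cleanly.
const bad = decodeScriptSource(new Uint8Array([0x27, 0xe8, 0x27, 0x3b]));
// UTF-8 body from the second hexdump: decoding succeeds.
const good = decodeScriptSource(new Uint8Array([0x27, 0xc3, 0xa8, 0x27]));

console.log(bad.ok, good.ok);
```

The caller can then show `message` in place of the install dialogue rather than continuing with an undefined script object.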
Cleanly display a message when downloading a non-UTF-8 script.