Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning when document encoding may create issues #134

Closed
wants to merge 2 commits into from

Conversation

feross
Copy link
Owner

@feross feross commented Sep 14, 2016

Fixes: #133

Print the following warning if the document encoding may create issues
for buffer:

The character encoding of the HTML document was not declared as UTF-8.
Therefore, the buffer package may behave incorrectly in some
situations. Consider adding <meta charset="utf-8"> to the <head>
section of the HTML document.

Print the following warning if the document encoding may create issues
for `buffer`:

"The character encoding of the HTML document was not declared as UTF-8.
Therefore, the `buffer` package may behave incorrectly in some
situations. Consider adding <meta charset="utf-8"> to the <head>
section of the HTML document."
@jessetane
Copy link
Collaborator

It's not so much that Buffer requires the document's character encoding to be utf8, just that the source file encoding matches the document's.

@feross
Copy link
Owner Author

feross commented Sep 14, 2016

@jessetane Can you suggest an improved warning message?

@jessetane
Copy link
Collaborator

This might be tricky, since I'm not sure it's possible to reliably detect arbitrary encoding mismatches, the test here just happens to work for this particular case... I'll think about it.

@jessetane
Copy link
Collaborator

Ok so, there is document.characterSet, which on my machine at least defaults to 'windows-1252' (not exactly latin-1 after all!). It's unclear to me though whether it should be Buffer's job to look at this though?

@feross
Copy link
Owner Author

feross commented Sep 14, 2016

The goal of this package is to mimic the node.js buffer API exactly. So if the encoding is set in a way that will make it work differently than node.js then I think it's fair to warn the user.

The question for me is: does the encoding have to be UTF-8, or are there other encodings that are okay? i.e. does the 'ト'.length !== 1 test pass for other encodings and is that sufficient to guarantee that we'll get correct behavior?

At the very least, it seems that if 'ト'.length !== 1 is true, then we have a problem and should warn the user, right?

@jessetane
Copy link
Collaborator

Buffer will function properly as long as no characters that compose its source are misinterpreted due to an encoding mismatch. In #133 the issue was related to the fact that the string literal was misinterpreted, not Buffer itself. Buffer can likely be served safely in a wide range of formats including ones that don't even support Unicode, such as 'windows-1252'.

I think in general the rule is just that you just have to know what format your data is in, and if you send that data to anyone else, you must be explicit about it - no exceptions!

@feross
Copy link
Owner Author

feross commented Sep 14, 2016

Buffer can likely be served safely in a wide range of formats including ones that don't even support Unicode, such as 'windows-1252'.

Gotcha. Then it sounds like this check doesn't belong inside this package.

@feross feross closed this Sep 14, 2016
@feross feross deleted the add-warning branch September 14, 2016 19:34
@jessetane
Copy link
Collaborator

Yeah, it should be the browser's job, since only it can know for sure that it took a guess at the encoding. Here's what Firefox says if you miss the meta tag / http header:

The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.

@dcousens
Copy link
Collaborator

dcousens commented Sep 15, 2016

If you are using utf-8 as your encoding, and your document isn't utf-8... you're probably going to have a bad time.
And since this repository is fundamental to most of JS cryptography... I'm not sure if we should just leave it to the vendors? I dunno.
I still think the warning has merit, personally. But I also agree that my browser let me down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants