Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ’ with '. The ’ character cannot be converted to other charsets #16

Closed
wants to merge 1 commit into from

Conversation

esarafianou
Copy link

@esarafianou esarafianou commented May 3, 2016

I suggest this change due to a problem I've encountered. The UTF8 character (U+2019) can’t be converted to other charsets (ASCII-8BIT in my case) which causes an error like that:

incompatible character encodings: ASCII-8BIT and UTF-8

@coveralls
Copy link

coveralls commented May 3, 2016

Coverage Status

Coverage remained the same at 92.593% when pulling 88c5226 on esarafianou:master into 6377c00 on mathiasbynens:master.

@mathiasbynens
Copy link
Owner

mathiasbynens commented May 3, 2016

This sounds like a problem with your local setup rather than a bug in utf8.js.

Why do you need to convert the source code of your dependencies to ASCII?

@mathiasbynens
Copy link
Owner

mathiasbynens commented May 3, 2016

Does http://stackoverflow.com/a/6916663/96656 help?

@dionyziz
Copy link

dionyziz commented May 3, 2016

@mathiasbynens you're right that the problem is in our particular use case. However, don't you agree that it's good practice to maintain source code files in ASCII? This change solves both.

@mathiasbynens
Copy link
Owner

However, don't you agree that it's good practice to maintain source code files in ASCII?

I strongly disagree. UTF-8 should be used/supported everywhere.

@dionyziz
Copy link

dionyziz commented May 3, 2016

OK.

Then let's discuss the issue we have at hand. We wish to use bettercap to inject Javascript source code into unprotected websites (our attack is documented at https://ruptureit.com). As ASCII is compatible with most existing encodings, it is suitable to encode the Javascript we inject in the same encoding as the target website, so that we don't have to change their encoding headers too.

Do you think a more appropriate solution would be to convert all of our dependencies to ASCII using a tool such as iconv?

@mathiasbynens
Copy link
Owner

mathiasbynens commented May 3, 2016

As ASCII is compatible with most existing encodings, it is suitable to encode the JavaScript we inject in the same encoding as the target website, so that we don't have to change their encoding headers too.

The scenario you describe is not even a problem for utf8.js since it only contains non-ASCII symbols in comments.

Do you think a more appropriate solution would be to convert all of our dependencies to ASCII using a tool such as iconv?

That certainly sounds more realistic than expecting all your dependencies to stick to ASCII-only, IMHO.

I’m curious — what would this PR solve exactly? You don’t seem to be using utf8.js in Rupture.

@esarafianou
Copy link
Author

esarafianou commented May 3, 2016

We do use it. If you run realtime/install_realtime.sh or client/install_client.sh, you 'll find out that npm installs utf8.js

@mathiasbynens
Copy link
Owner

Ah, indeed — it’s an indirect dependency. The dependency chain is as follows: socket.io → engine.io → engine.io-parser → utf8.

Still I don’t fully understand why this is a problem. Even <script src=utf8.js></script> on a non-UTF-8 page shouldn’t cause issues since the actual source code (ignoring the comments) is 100% ASCII-only.

Did http://stackoverflow.com/a/6916663/96656 help at all?

@dionyziz
Copy link

dionyziz commented May 4, 2016

We are including the code itself inline in the injected response in an inline <script></script>. This is preferable in the case of an attack as, first, it limits the number of requests to our service (which makes it less noisy), even though in normal web services it would be bad practice (notice that we are not injecting it in our own website – instead, we are only injecting it maliciously in target HTTP websites that the victim visits, e.g. cnn.com, in order to achieve this attack).

The result is a large string which has been packed with gulp/browserify (switching to babel/webpack soon). So, using <script src="..." /> or the ruby on rails solutions for UTF-8 will not work in this case.

I agree with you that it's more sensible to remove UTF-8 characters completely using iconv. This will make future additional problematic characters disappear (although if they are actually in the code, it could cause problems). Additionally, we should probably ask our packing script to also remove JS comments from the bundle that we inject – no reason to keep them there.

@mathiasbynens
Copy link
Owner

I agree with you that it's more sensible to remove UTF-8 characters completely using iconv. This will make future additional problematic characters disappear (although if they are actually in the code, it could cause problems). Additionally, we should probably ask our packing script to also remove JS comments from the bundle that we inject – no reason to keep them there.

You could consider minifying the bundled script on-the-fly using something like https://github.com/mishoo/UglifyJS2. That not only strips comments but also compresses the payload resulting in better performance. (For utf8.js specifically, just minifying is enough to remove all non-ASCII characters — no iconv needed.)

@dionyziz
Copy link

Sounds like a plan. Thanks for all your help and suggestions. Feel free to close this issue then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants