New Check: Utf8EncodingCheck #5

Closed
maxvetrenko opened this issue May 26, 2014 · 1 comment

Comments

@maxvetrenko
Owner

Source files must be UTF-8 encoded, as the Google Java Style Guide requires: http://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s2.2-file-encoding

@romani

romani commented Jun 7, 2014

A byte order mark is not required for UTF-8 files, so we cannot rely on it to detect the encoding: http://en.wikipedia.org/wiki/Byte-order_mark#UTF-8

23:11 ~/java/git-others/checkstyle/checkstyle [master|✔] $ sudo apt-get install moreutils
.....
23:11 ~/java/git-others/checkstyle/checkstyle [master|✔] $ file -i pom.xml 
pom.xml: application/xml; charset=utf-8
23:11 ~/java/git-others/checkstyle/checkstyle [master|✔] $ file -i import-control.xml 
import-control.xml: application/xml; charset=us-ascii
23:12 ~/java/git-others/checkstyle/checkstyle [master|✔] $ isutf8 pom.xml 
23:12 ~/java/git-others/checkstyle/checkstyle [master|✔] $ isutf8 import-control.xml 
23:12 ~/java/git-others/checkstyle/checkstyle [master|✔] $ xxd pom.xml | head -2 
0000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
0000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554  .0" encoding="UT
23:13 ~/java/git-others/checkstyle/checkstyle [master|✔] $ xxd import-control.xml | head -2 
0000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
0000010: 2e30 223f 3e0a 3c21 444f 4354 5950 4520  .0"?>.<!DOCTYPE 
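
The xxd dumps above show both files starting directly with "<?xml" (3c 3f 78 6d 6c), with no byte order mark in front. As a minimal sketch of the same observation in Java, the following checks whether content begins with the UTF-8 BOM bytes EF BB BF. The class and method names are hypothetical and are not part of checkstyle.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class Utf8BomSketch {
    // The UTF-8 encoding of U+FEFF, which some editors write at the start of a file.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the content starts with the three UTF-8 BOM bytes. */
    static boolean startsWithUtf8Bom(byte[] content) {
        if (content.length < UTF8_BOM.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, UTF8_BOM.length), UTF8_BOM);
    }

    public static void main(String[] args) {
        // First bytes of both files in the xxd output: "<?xml"
        byte[] xmlHeader = {0x3c, 0x3f, 0x78, 0x6d, 0x6c};
        System.out.println(startsWithUtf8Bom(xmlHeader));
    }
}
```

A missing BOM says nothing about whether the file is valid UTF-8, which is why this test alone is not enough for the check.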

We might need to port the "isutf8" application from C to Java. The sources are at https://joeyh.name/code/moreutils/ , file "isutf8.c".

Attention: we cannot require UTF-8 exclusively. Plain ASCII is preferable and must be accepted; see my example above, where "file -i" reports import-control.xml as us-ascii and "isutf8" still prints no complaint about it.
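
As a rough sketch of what such a check could look like without porting any C code, the JDK's own strict UTF-8 decoder can be used to validate bytes. Because ASCII is a subset of UTF-8, pure ASCII content passes automatically. This is an assumption about one possible approach, not the design agreed for the check, and it has not been compared against "isutf8" for edge cases. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8ValiditySketch {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] content) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if every byte is in the 7-bit ASCII range. */
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) { // Java bytes are signed, so 0x80..0xFF appear negative
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ascii = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = {(byte) 0xE9}; // "é" in ISO-8859-1, not valid UTF-8 on its own
        System.out.println(isAscii(ascii) + " " + isValidUtf8(ascii));
        System.out.println(isAscii(latin1) + " " + isValidUtf8(latin1));
    }
}
```

For real files the bytes could be read with java.nio.file.Files.readAllBytes; streaming large files would need the incremental decode API instead.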

Alternatively, we might use jchardet (http://jchardet.sourceforge.net/), which could give us detection support for most encodings, not only UTF-8.

This issue was closed.