To fix garbled output in GBK environments, try https://stackoverflow.com/questions/3017695/how-to-… #12

nobodxbodon · 2017-08-27T07:09:59Z

Not sure whether this will help.

…configure-encoding-in-maven

nobodxbodon · 2017-09-04T02:14:54Z

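@azige I tried this on Linux earlier (with the system locale also set to GBK) and it did not seem to help. The symptoms were different, though: you still get some characters displayed correctly, while my output is entirely garbled. Do you have time to try it on Chinese Windows?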

azige · 2017-09-10T07:13:22Z

It doesn't seem to have any effect. Changing project.reporting.outputEncoding to GBK makes no difference either.

nobodxbodon · 2017-10-01T03:00:48Z

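@azige Still investigating. To confirm: is anything garbled besides this one spot? On Ubuntu 14.04, after switching the system encoding to GBK with export LC_ALL=zh_CN.gbk, the beginning of the mvn output is garbled as well: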

[INFO] ------------------------------------------------------------------------
[INFO] Building JUnit4���Ļ� 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------

If you see the same thing, the problem is probably not specific to the checkstyle plugin.

azige · 2017-10-01T05:43:24Z

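Could you check whether the charset returned by Charset.defaultCharset() in that environment matches your setting?

If this can't be fixed, let's shelve it for now. The impact is small, and I have a workaround on my side.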
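For anyone repeating this check, here is a minimal sketch that prints the encodings the JVM picked up from its environment. The class name DefaultCharsetCheck is made up for illustration, and sun.jnu.encoding is an implementation-specific property that may be null on some JVMs.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    /** Builds a short report of the encodings the JVM picked up from its environment. */
    static String report() {
        return "Charset.defaultCharset() = " + Charset.defaultCharset().name() + "\n"
             + "file.encoding            = " + System.getProperty("file.encoding") + "\n"
             // Implementation-specific property; may be null on some JVMs.
             + "sun.jnu.encoding         = " + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once in the plain shell and once inside the Maven build should show whether the two see the same settings.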

nobodxbodon · 2017-10-01T16:07:32Z

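Charset.defaultCharset() returns UTF-8. Adding <?xml version="1.0" encoding="UTF-8"?> at the top seems to fix the garbled name output.

This may not settle the issue (finding the root cause), but I'd still like to understand the direct cause of the garbling...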

String 原文本 = "开始检查……";
System.out.println(new String(原文本.getBytes("GBK"), UTF_8));
System.out.println(new String(原文本.getBytes(UTF_8), "GBK"));

Output:

��ʼ��顭��
寮�濮嬫鏌モ�︹��

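The second line explains what happens when a GBK console prints a UTF-8 string, but it does not explain why a UTF-8 console still prints partially garbled text (��?始检查�?��??).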
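A self-contained version of the experiment above, as a sketch. The class MojibakeDemo and the helper misread are hypothetical names. The test string is "开始检查……" written with Unicode escapes so the source file compiles regardless of its own encoding, and the code assumes the runtime ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this runtime
        Charset utf8 = StandardCharsets.UTF_8;
        String text = "\u5F00\u59CB\u68C0\u67E5\u2026\u2026"; // the test string from this thread

        System.out.println(misread(text, gbk, utf8));  // GBK bytes read as UTF-8
        System.out.println(misread(text, utf8, gbk));  // UTF-8 bytes read as GBK
        System.out.println(misread(text, utf8, utf8)); // matching charsets: text survives
    }
}
```

What the console finally shows also depends on the terminal's own encoding, so the printed lines are only meaningful on a UTF-8 terminal.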

nobodxbodon · 2017-10-02T06:56:53Z

Minimal code that reproduces the problem: https://gist.github.com/nobodxbodon/8b7ff2df54b845a5a0851b887f038686

nobodxbodon · 2017-10-03T06:59:40Z

Preliminary analysis:
The UTF-8 representation of "开始检查……" is:
\xE5\xBC\x80\xE5\xA7\x8B\xE6\xA3\x80\xE6\x9F\xA5\xE2\x80\xA6\xE2\x80\xA6

Why the output decoded as GBK is 寮�濮嬫鏌モ�︹��:

Segment	1	2	3	4	5	6	7	8	9	10	11
UTF-8 bytes	E5 BC	80	E5 A7	8B E6	A3 80	E6 9F	A5 E2	80	A6 E2	80	A6
GBK character	寮	�	濮	嬫	(blank)	鏌	モ	�	︹	�	�

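GBK encodes each non-ASCII character in two bytes, with codes ranging from 8140 to FEFE and the lead byte between 81 and FE (source). A lone 80 is therefore not a valid GBK code and displays as �; A3 80 is valid but has no assigned character, so it displays as a blank.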
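The segmentation in the table can be reproduced with a small left-to-right scanner. This is a simplified sketch, not the JDK's actual GBK decoder: GbkSegmenter is a hypothetical class, and the trail-byte range 40-FE excluding 7F comes from the GBK definition rather than from this thread. A lead byte without a valid trail byte is treated as a stray single byte, which is an assumption that real decoders may handle differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class GbkSegmenter {
    static boolean isLead(int b)  { return b >= 0x81 && b <= 0xFE; }
    static boolean isTrail(int b) { return b >= 0x40 && b <= 0xFE && b != 0x7F; }

    /** Splits bytes left to right into lead/trail pairs and single (ASCII or stray) bytes. */
    static List<String> segment(byte[] bytes) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i] & 0xFF;
            boolean pair = isLead(b) && i + 1 < bytes.length && isTrail(bytes[i + 1] & 0xFF);
            if (pair) {
                out.add(String.format("%02X %02X", b, bytes[i + 1] & 0xFF));
                i += 2;
            } else {
                out.add(String.format("%02X", b)); // ASCII byte or stray byte
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\u5F00\u59CB\u68C0\u67E5\u2026\u2026"; // the test string from this thread
        // prints [E5 BC, 80, E5 A7, 8B E6, A3 80, E6 9F, A5 E2, 80, A6 E2, 80, A6]
        System.out.println(segment(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The eleven segments match the table, including the trailing A6 that has no partner byte.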

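The more important question is why a UTF-8 console shows �?始检查�?��??. During the conversion to GBK, every "invalid" byte (see the table above) was replaced with 3f: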

Encoding	Char 1	Char 2	Char 3	Char 4	Char 5	Char 6
Original character	开	始	检	查	…	…
UTF-8 bytes	e5 bc 80	e5 a7 8b	e6 a3 80	e6 9f a5	e2 80 a6	e2 80 a6
After conversion to GBK	e5 bc 3f	e5 a7 8b	e6 a3 80	e6 9f a5	e2 3f a6	e2 3f 3f
Decoded back as UTF-8	�?	始	检	查	�?�	�??

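3f is the UTF-8 encoding of ?, so it displays as ? once the bytes are decoded as UTF-8 again. The remaining fragments, such as e5 bc, e2, and a6, are not complete UTF-8 sequences and therefore display as �.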
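One possible reading of the pipeline behind this table is sketched here: the UTF-8 bytes are decoded as GBK, and the resulting string is encoded as GBK again. This is an assumption about where the conversion happens, not a confirmed description of what Maven or checkstyle does. RoundTripDemo and its helpers are hypothetical names, and the exact bytes may vary with the JDK's GBK mapping tables.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    /** Formats bytes as lowercase hex separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the text's UTF-8 bytes as GBK, then encodes the resulting string as GBK again. */
    static byte[] throughGbk(String text) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this runtime
        String misread = new String(text.getBytes(StandardCharsets.UTF_8), gbk);
        return misread.getBytes(gbk); // characters GBK cannot encode become the replacement byte
    }

    public static void main(String[] args) {
        String text = "\u5F00\u59CB\u68C0\u67E5\u2026\u2026"; // the test string from this thread
        byte[] after = throughGbk(text);
        System.out.println("before: " + hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("after:  " + hex(after));
        System.out.println(new String(after, StandardCharsets.UTF_8));
    }
}
```

Comparing the two hex lines shows which bytes survived the round trip and which were replaced.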

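As for why these bytes become 3f during the conversion to GBK instead of being preserved: when no matching character is found, the replacement bytes of CharsetEncoder are used, and they default to ?: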

protected CharsetEncoder(Charset cs,
                         float averageBytesPerChar,
                         float maxBytesPerChar)
{
    this(cs,
         averageBytesPerChar, maxBytesPerChar,
         new byte[] { (byte)'?' });   // <--- the replacement byte array
}
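The replacement behaviour can be observed directly, as in the following sketch. ReplacementDemo is a hypothetical class and the code assumes the GBK charset is available. It uses U+FFFD as the sample character because the tables above show it turning into 3f; switching the encoder to CodingErrorAction.REPORT makes the same input fail instead of being silently replaced.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class ReplacementDemo {
    static final Charset GBK = Charset.forName("GBK"); // assumption: GBK is available

    /** Returns the replacement bytes of a fresh GBK encoder. */
    static byte[] replacementBytes() {
        return GBK.newEncoder().replacement();
    }

    /** Returns true if encoding fails when unmappable characters are reported, not replaced. */
    static boolean failsWhenReporting(String text) {
        CharsetEncoder strict = GBK.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.encode(CharBuffer.wrap(text));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String sample = "\uFFFD"; // the character shown as a replacement mark in this thread
        System.out.printf("replacement byte: %02x%n", replacementBytes()[0]);
        System.out.printf("String.getBytes:  %02x%n", sample.getBytes(GBK)[0]);
        System.out.println("strict encoder fails: " + failsWhenReporting(sample));
    }
}
```

A strict encoder like this could help locate where data is lost, since the failure surfaces at the first unmappable character.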

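As for how the byte array is segmented during conversion: judging from the two tables above, it appears to be a simple left-to-right minimal match.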

To fix garbled output in GBK environments, try https://stackoverflow.com/questions/3017695/how-to-…

d87eb5b

…configure-encoding-in-maven

nobodxbodon requested a review from azige August 27, 2017 07:09

nobodxbodon mentioned this pull request Sep 21, 2017

Prototype of a simple, easy-to-use Chinese programming (scripting) language program-in-chinese/overview#33

Closed

This was referenced Oct 3, 2017

GBK<->UTF8 conversion problem: garbled Maven checkstyle output program-in-chinese/overview#26

Closed

zh-cn ,,,, cmd gbk encode checkstyle/checkstyle#3569

Open

nobodxbodon mentioned this pull request Oct 18, 2017

Index: examples of localizing or reinventing the English keywords of existing programming languages in Chinese program-in-chinese/overview#25

Open
