Help text has wrong encoding when "Beta: Use Unicode UTF-8 for worldwide language support" is enabled #43

Closed
everything411 opened this issue May 16, 2021 · 17 comments
Labels
bug Something isn't working openjdk

Comments

@everything411

Describe the bug
I'm using the Simplified Chinese version of Windows 10 with "Beta: Use Unicode UTF-8 for worldwide language support" enabled. The java help text is still encoded in GBK, so it is not displayed correctly. Other text printed by java.exe is also affected. I don't know whether this happens for other languages.

I also tried AdoptOpenJDK, and it has the same problem, so maybe this is an upstream bug? Where should I report it?

Thanks

Steps to reproduce the behavior:

  1. Windows 10, Chinese Simplified
  2. Enable "Beta: Use Unicode UTF-8 for worldwide language support" in control panel
  3. java --help
  4. See error

Expected behavior
Help text is shown correctly.

Screenshots
screenshot

Output of java --help (help text encoded in GBK while the system is using UTF-8):
java-help.txt

@karianna karianna added this to the July 2021 PSU milestone May 20, 2021
@karianna karianna added this to Needs triage in Microsoft Build of OpenJDK via automation May 20, 2021
@karianna karianna added the bug Something isn't working label May 20, 2021
@karianna
Member

Hi @everything411 - If this is also occurring with AdoptOpenJDK then the likely issue is with OpenJDK itself (or some common configuration that you need to set). We'll see if we can reproduce and advise on next steps (or submit an upstream issue on your behalf).

@karianna karianna changed the title from "Help text has wrong encode when "Beta: Use Unicode UTF-8 for worldwide language support" is enabled" to "Help text has wrong encoding when "Beta: Use Unicode UTF-8 for worldwide language support" is enabled" Jun 7, 2021
@karianna karianna added bug Something isn't working and removed bug Something isn't working labels Jun 17, 2021
@brunoborges brunoborges removed this from the July 2021 PSU milestone Sep 13, 2021
@brunoborges
Member

@gdams could you take a look into this please and verify if this also occurs on Adoptium?

@everything411
Author

@gdams could you take a look into this please and verify if this also occurs on Adoptium?

Fixed in the latest Adoptium JDK 17 beta. The encoding is still wrong in Adoptium JDK 16, JDK 11, and JDK 8.
(Screenshots attached: jdk17, jdk11, jdk16)

@brunoborges
Member

@everything411 could you please check whether this happens with the MS Build of OpenJDK binaries? In which versions does the problem appear, and in which does it not?

@brunoborges brunoborges added this to the October PSU milestone Oct 7, 2021
@everything411
Author

@brunoborges

MS Build of OpenJDK 17: the problem does not appear
MS Build of OpenJDK 11: the problem appears

So it seems that this problem is fixed in upstream JDK 17 but not in other JDK versions?

@brunoborges
Member

@everything411 thanks for testing! If you don't mind one final question: is there an OpenJDK 11 build that you've seen that doesn't have this problem?

Maybe Zulu, or Oracle JDK?

@brunoborges
Member

Hi @everything411 @cyhhao

Could you please check if this issue is still happening with the packages published at microsoft.com/openjdk ?

@everything411
Author

@brunoborges

> chcp
Active code page: 65001

java11

> java --version
openjdk 11.0.14.1 2022-02-08 LTS
OpenJDK Runtime Environment Microsoft-31205 (build 11.0.14.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-31205 (build 11.0.14.1+1-LTS, mixed mode)

java17

> java --version
openjdk 17.0.3 2022-04-19 LTS
OpenJDK Runtime Environment Microsoft-32931 (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-32931 (build 17.0.3+7-LTS, mixed mode, sharing)

Same result as before: OK for JDK 17, bad encoding for JDK 11.

I also noticed that javac's help text is still broken in both JDK 11 and JDK 17. The text is GBK-encoded and then printed to the UTF-8 console, which produces the "�" characters.

> javac
�÷�: javac <options> <source files>
����, ���ܵ�ѡ�����:
  @<filename>                  ���ļ���ȡѡ����ļ���
  -Akey[=value]                ���ݸ�ע�ʹ�������ѡ��
  --add-modules <ģ��>(,<ģ��>)*
        ���˳�ʼģ��֮��Ҫ�����ĸ�ģ��; ��� <module>
                Ϊ ALL-MODULE-PATH, ��Ϊģ��·���е�����ģ�顣
  --boot-class-path <path>, -bootclasspath <path>
        �����������ļ���λ��

The encoding of compile error messages is broken, too: GBK-encoded text is printed to the UTF-8 console.

> java .\test.java
.\test.java:7: ����: δ������쳣����FileNotFoundException; ���������в���������Ա��׳�
                InputStreamReader fileReader = new InputStreamReader(new FileInputStream(new File("not exist")), StandardCharsets.UTF_8);
                                                                     ^
1 ������
错误: 编译失败
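
The "�" characters in the output above are what a UTF-8 console shows when it receives GBK bytes. A minimal sketch of that mismatch follows, assuming the JDK in use ships the GBK charset (full JDK builds normally do, a trimmed runtime may not). The class and method names are made up for illustration, and the Chinese text is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkOnUtf8Console {
    // Encode text as GBK, then interpret those bytes as UTF-8,
    // which is what a UTF-8 console does with GBK output.
    static String gbkBytesReadAsUtf8(String text) {
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));
        // Malformed UTF-8 sequences are replaced with U+FFFD.
        return new String(gbkBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The escapes spell the first word of the Chinese javac usage line.
        String garbled = gbkBytesReadAsUtf8("\u7528\u6cd5: javac");
        // Print code points instead of the text itself, so the result
        // is readable whatever encoding the console uses.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Any U+FFFD in the printed code points corresponds to a "�" on screen. The ASCII part of the line survives because GBK and UTF-8 encode ASCII identically.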

I also found that the encoding of runtime exception text is fine in JDK 11 but broken in JDK 17.

For JDK 11: "系统找不到指定的文件" means "The system cannot find the file specified" (roughly "No such file or directory") in English.

> java .\test.java
Exception in thread "main" java.io.FileNotFoundException: not exist (系统找不到指定的文件)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)

For JDK 17: "绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�" is meaningless. It appears to be the text "系统找不到指定的文件" encoded in UTF-8, wrongly decoded as GBK, and then re-encoded in UTF-8 and printed to the UTF-8 console.

> java .\test.java
Exception in thread "main" java.io.FileNotFoundException: not exist (绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)
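
The suspected round trip described above (UTF-8 bytes decoded as GBK) can be sketched as follows. This only illustrates the hypothesis in this comment, not what the JDK does internally. The class and method names are hypothetical, and the sketch assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ReadAsGbk {
    private static final Charset GBK = Charset.forName("GBK");

    // UTF-8 bytes wrongly decoded as GBK: the suspected cause of the garbled text.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), GBK);
    }

    // Reverse the mistake; this only works when garble() lost no bytes.
    static String recover(String garbled) {
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Windows message quoted above, written with Unicode escapes
        // so the example does not depend on the source file encoding.
        String message = "\u7cfb\u7edf\u627e\u4e0d\u5230\u6307\u5b9a\u7684\u6587\u4ef6";
        String garbled = garble(message);
        System.out.println("garbled differs from original: " + !garbled.equals(message));
        System.out.println("recovered equals original: " + recover(garbled).equals(message));
    }
}
```

If the recovered text equals the original, the garbled string carries the same bytes as the correct message, which would point to a wrong decoding step rather than a wrong message.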

@everything411
Author

output of java.exe -XshowSettings:properties -version for jdk11

> java.exe -XshowSettings:properties -version
Property settings:
    awt.toolkit = sun.awt.windows.WToolkit
    file.encoding = GBK
    file.separator = \
    java.awt.graphicsenv = sun.awt.Win32GraphicsEnvironment
    java.awt.printerjob = sun.awt.windows.WPrinterJob
    java.class.path =
    java.class.version = 55.0
    java.home = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 11.0.14.1+1-LTS
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 11
    java.vendor = Microsoft
    java.vendor.url = https://www.microsoft.com
    java.vendor.url.bug = https://github.com/microsoft/openjdk/issues
    java.vendor.version = Microsoft-31205
    java.version = 11.0.14.1
    java.version.date = 2022-02-08
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 11
    java.vm.vendor = Microsoft
    java.vm.version = 11.0.14.1+1-LTS
    jdk.debug = release
    line.separator = \r \n
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.desktop = windows
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = cp65001
    sun.stdout.encoding = cp65001
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.timezone =
    user.variant =

openjdk version "11.0.14.1" 2022-02-08 LTS
OpenJDK Runtime Environment Microsoft-31205 (build 11.0.14.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-31205 (build 11.0.14.1+1-LTS, mixed mode)

output of java.exe -XshowSettings:properties -version for jdk17

    file.encoding = GBK
    file.separator = \
    java.class.path =
    java.class.version = 61.0
    java.home = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 17.0.3+7-LTS
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 17
    java.vendor = Microsoft
    java.vendor.url = https://www.microsoft.com
    java.vendor.url.bug = https://github.com/microsoft/openjdk/issues
    java.vendor.version = Microsoft-32931
    java.version = 17.0.3
    java.version.date = 2022-04-19
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode, sharing
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 17
    java.vm.vendor = Microsoft
    java.vm.version = 17.0.3+7-LTS
    jdk.debug = release
    line.separator = \r \n
    native.encoding = GBK
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = UTF-8
    sun.stdout.encoding = UTF-8
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.variant =

openjdk version "17.0.3" 2022-04-19 LTS
OpenJDK Runtime Environment Microsoft-32931 (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-32931 (build 17.0.3+7-LTS, mixed mode, sharing)

@everything411
Author

I tried Temurin JDK 18, and both java and javac are OK.

> java.exe
用法:java [options] <主类> [args...]
           (执行类)
   或  java [options] -jar <jar 文件> [args...]
           (执行 jar 文件)
   或  java [options] -m <模块>[/<主类>] [args...]
       java [options] --module <模块>[/<主类>] [args...]
           (执行模块中的主类)
   或  java [options] <源文件> [args]
           (执行单个源文件程序)

> javac.exe
用法: javac <options> <source files>
其中, 可能的选项包括:
  @<filename>                  从文件读取选项和文件名
  -Akey[=value]                传递给注释处理程序的选项
  --add-modules <模块>(,<模块>)*
        除了初始模块之外要解析的根模块; 如果 <module>
                为 ALL-MODULE-PATH, 则为模块路径中的所有模块。
  --boot-class-path <path>, -bootclasspath <path>
        覆盖引导类文件的位置

> java.exe .\test.java
.\test.java:5: 错误: 未报告的异常错误FileNotFoundException; 必须对其进行捕获或声明以便抛出
                InputStreamReader fileReader = new InputStreamReader(new FileInputStream(new File("not exist")), StandardCharsets.UTF_8);
                                                                     ^
1 个错误
错误: 编译失败

However, runtime exception text is still broken, the same problem as in JDK 17.

Exception in thread "main" java.io.FileNotFoundException: not exist (绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)

output of java.exe -XshowSettings:properties -version for jdk18

Property settings:
    file.encoding = UTF-8
    file.separator = \
    java.class.path =
    java.class.version = 62.0
    java.home = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 18.0.1+10
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 18
    java.vendor = Eclipse Adoptium
    java.vendor.url = https://adoptium.net/
    java.vendor.url.bug = https://github.com/adoptium/adoptium-support/issues
    java.vendor.version = Temurin-18.0.1+10
    java.version = 18.0.1
    java.version.date = 2022-04-19
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode, sharing
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 18
    java.vm.vendor = Eclipse Adoptium
    java.vm.version = 18.0.1+10
    jdk.debug = release
    line.separator = \r \n
    native.encoding = GBK
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = UTF-8
    sun.stdout.encoding = UTF-8
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.variant =

openjdk version "18.0.1" 2022-04-19
OpenJDK Runtime Environment Temurin-18.0.1+10 (build 18.0.1+10)
OpenJDK 64-Bit Server VM Temurin-18.0.1+10 (build 18.0.1+10, mixed mode, sharing)
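
The three property dumps above differ mainly in a handful of encoding-related keys: file.encoding is GBK on JDK 11 and 17 but UTF-8 on JDK 18, native.encoding is absent from the JDK 11 dump, and sun.stdout.encoding changes from cp65001 to UTF-8. A small sketch for printing just those keys from inside a program is shown below. The class name is hypothetical, and the sun.* keys are JDK-internal, so they may be missing or renamed on other versions.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingProperties {
    // Collect the encoding-related system properties seen in the dumps above.
    static Map<String, String> encodingProperties() {
        String[] keys = {
            "file.encoding", "native.encoding", "sun.jnu.encoding",
            "sun.stdout.encoding", "sun.stderr.encoding"
        };
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : keys) {
            // Keys that this JDK does not define map to null.
            values.put(key, System.getProperty(key));
        }
        return values;
    }

    public static void main(String[] args) {
        encodingProperties().forEach((k, v) -> System.out.println(k + " = " + v));
        // The charset Java APIs use when none is given explicitly.
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
    }
}
```

Running this under each JDK gives a compact way to compare the settings without reading the full -XshowSettings:properties output.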

@imba-tjd

imba-tjd commented May 6, 2022

Digression: Why do you want to enable this beta utf-8 option?

@everything411
Author

@imba-tjd Linux and macOS both default to UTF-8. I need to share source code containing Chinese characters between my Windows machine and my Linux machine (WSL1 and WSL2 use UTF-8, too).

@imba-tjd

imba-tjd commented May 6, 2022

Did you try reading Chinese input from stdin? Try this:

System.out.println(new Scanner(System.in).nextLine());

You will find that it fails to read the input if the beta UTF-8 option is enabled.
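
One way to tell a Java decoding problem from a console problem is to feed the Scanner known UTF-8 bytes first and only then read the real console. The sketch below assumes Java 10 or later (for the Scanner constructor that takes a Charset). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class StdinDecodingCheck {
    // Read the first line from a stream using an explicit charset.
    static String firstLine(InputStream in, Charset charset) {
        Scanner scanner = new Scanner(in, charset);
        return scanner.hasNextLine() ? scanner.nextLine() : "";
    }

    public static void main(String[] args) {
        // Step 1: known-good UTF-8 bytes, no console involved.
        String expected = "\u4e2d\u6587";
        byte[] utf8 = (expected + "\n").getBytes(StandardCharsets.UTF_8);
        String fromBytes = firstLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println("in-memory UTF-8 decoded correctly: " + fromBytes.equals(expected));

        // Step 2: the real console. Print code points so the result
        // is readable whatever the console encoding is.
        System.out.println("Type a line and press Enter:");
        String typed = firstLine(System.in, StandardCharsets.UTF_8);
        for (char c : typed.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If step 1 succeeds while step 2 prints unexpected code points, the bytes were probably already wrong before Java decoded them.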

@everything411
Author

I noticed this bug before, too. In fact, it affects not only Java: C scanf, C++ cin, and C# Console.ReadLine also fail to accept Chinese input when UTF-8 is enabled.

I believe this is a bug in the Windows console rather than in the language runtimes.
See https://docs.microsoft.com/zh-cn/windows/console/classic-vs-vt and microsoft/terminal#7777

@d3r3kk d3r3kk removed their assignment Jul 21, 2022
@eirikbakke

The issue https://bugs.openjdk.org/browse/JDK-8272352 might be relevant here; it was backported to OpenJDK 11.0.17 and Java 17.0.5 quite recently.

(Just passing by... I saw this thread as I was fixing Unicode problems in the NetBeans IDE.)

@karianna
Member

@everything411 Are you able to try with our latest 17.0.5 build? As @eirikbakke mentions, the upstream issue seems to be fixed.

@everything411
Author

@karianna I can confirm that all bugs I reported here no longer exist now.

Microsoft Build of OpenJDK automation moved this from TODO to Closed Nov 30, 2022

6 participants