8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern #5704
I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
The failure is caused by non-ASCII chars in the format string of sscanf, which is not portable on our Windows platform.
So it would be nice to remove these non-ASCII chars (
This is because:
You may argue that the non-ASCII may be used by the parser itself.
So I suggest removing this non-ASCII code to make HotSpot more portable.
FWIW, there was some prior discussion here about this code as well: #3107
tl;dr MSVC uses the system locale's code page to parse this, which must be set to
Though, I can't comment on the changes in this PR.
Thanks @JornVernee for your comments.
My system locale is zh-cn.
But changing the locale isn't acceptable, since many of our apps require zh-cn in our country.
According to the JBS, C4819 warning was first observed with VS2017 and was disabled by JDK-8216154.
If the non-ASCII code is useless, it should be removed to make HotSpot more portable.
I understand, and that is totally reasonable to me.
There might be another way to change the locale just for the compilation, but I haven't had time to test that (so for now I think the official advice is to us
In your case the compiler produced some warnings, but I'm wondering if using a different encoding could also silently create subtle behavioral changes. I think it would be good if a specific encoding could be used at build time.
I agree with your reasoning, but I cannot comment on the contents of the patch, because I'm not a maintainer of this code.
Thanks for your suggestions, @JornVernee .
Let's see what others think of the change.
I hope the non-ASCII code is not actually used.
Thanks @vnkozlov for your very helpful comments.
I have one question: how can we specify (non-ascii chars) and (non-printable ascii chars) through
I just learned from https://bugs.openjdk.java.net/browse/JDK-8027829 that we can use unicode like
My example was made from: https://bugs.openjdk.java.net/secure/attachment/17128/UnicodeIdentifierTest.java
And I tried to exclude some specific methods like this
But none of them worked.
So if there is no other way to specify non-ASCII chars, it seems safe to remove the non-ASCII code.
If I missed something, please let me know.
Some misc remarks from a build PoV:
From what I see in the discussion here, there seems to be no clarity about what range of characters the specification allows. This needs to be absolutely clear for any changes here -- we can't filter out legal characters just because they are problematic to build on non-en_US platforms.
However, I'm thinking that you need to take a step back and see what you are really trying to solve. To me, it seems that sscanf is not the right tool for the job, and the fact that it has worked until now is more a lucky coincidence. It seems, from a quick glance, that you should consider the input a byte array, and process it like that, instead of a string, if the encoding is unclear, and the spec is talking about character values (like 0x7f) rather than what characters they are supposed to represent in a specific encoding.
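A minimal sketch of the byte-array approach suggested above, written without sscanf. The names `is_name_byte` and `scan_name` are hypothetical, and the set of rejected bytes is an assumption chosen for illustration; the real set would have to come from the specification.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper, not HotSpot code: decides whether byte `c` may appear
// in a name. The rejected set below is an assumption for this example.
static bool is_name_byte(unsigned char c) {
  if (c <= 0x20 || c == 0x7f) return false;      // NUL, control bytes, space, DEL
  return std::strchr("()[],;*", c) == nullptr;   // assumed delimiter bytes
}

// Copies the leading run of name bytes from `input` into `out` and returns
// the number of bytes consumed. Bytes >= 0x80 are accepted as opaque data,
// so the result does not depend on the encoding of the source file or on
// the locale of the running process.
static size_t scan_name(const char* input, std::string& out) {
  size_t n = 0;
  while (is_name_byte(static_cast<unsigned char>(input[n]))) {
    n++;
  }
  out.assign(input, n);
  return n;
}
```

Because the checks compare numeric byte values rather than character literals, the source file stays pure ASCII while multi-byte sequences in the input still pass through unchanged.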
Thanks @magicus .
The background is that we want to build CI/CD pipelines for Windows platforms to help the OpenJDK development.
We already have enough Linux and macOS pipelines but still do not have one for Windows.
But to my surprise, OpenJDK fails to build on our Windows platforms.
You may suggest changing the locale settings.
It's not our goal to make CompileCommand work with non-ASCII chars.
(The Chinese characters in this comment may not be displayed properly inside an e-mail reader. Please see this comment on GitHub #5704)
-XX:CompileCommand does not process \uxxxx sequences. However, if your shell's locale is UTF8, you can do something like this, by directly entering them on the command-line, without escaping with \u:
The current limitations of the MethodMatcher class are:
Note that a "locale" contains 3 parts: language, country and character encoding. For example,
The first two support non-ASCII characters in -XX:CompileCommand, but the third one doesn't.
MethodMatcher uses
I don't think we can solve  easily. To handle non-ASCII characters that are not encoded in UTF8, we need to call NewPlatformString() in src/java.base/share/native/libjli/java.c. However, this executes Java code, but -XX:CompileCommand needs to be processed before any Java code execution. ==> Proposal: IGNORE it for now.
For , there are two distinct issues:
(a) The restriction checks are invalid when the JVM is running in a non-UTF8 encoding -- this is a moot point because we can't handle  anyway, so the data given to sscanf() is already bad. => Proposal: IGNORE it for now
(b) VC++ compilation warning when methodMatcher.cpp is compiled in non-UTF8 environments
This is just a warning, and (I think) it doesn't change the object file at all. I.e., the literal strings in methodMatcher.obj are exactly the same as if methodMatcher.cpp is compiled under a UTF8 environment.
Proposal: use pragma to disable the warning.
@DamonFool could you try this experiment:
(If this doesn't work, an alternative is to avoid using sscanf and write our own parser).
Thanks @iklam for your excellent analysis.
So HotSpot does support non-ASCII chars as names.
I will do your experiment next week.
No need to hurry :-). In case you can't find an English Windows, I think you can use the
I also note the warning
It is already different with the original
Not sure if this fact is sufficient to say the literal strings will be different in methodMatcher.obj.
Hi @iklam ,
methodMatcher.obj  built with
There seems to be no difference when checking with
The warnings disappear when the system locale is
So it's far more complicated than I had thought.
Thank you all for your help and valuable comments.
My experiments above with
As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get
On 2021-10-05 08:41, Ioi Lam wrote:
So, what does that mean? That we should explicitly limit
My experiments show that CompileCommand doesn't work on non-US-English Windows environments.
The patch has been updated.
What do you think?
@DamonFool This change now passes all automated pre-integration checks.
After integration, the commit message for the final commit will be:
At the time when this comment was updated there had been 126 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
Going to push as commit c833b4d.
Your commit was automatically rebased without conflicts.