New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8285517: System.getenv() returns unexpected value if environment variable has non ASCII character #8378
Conversation
…able has non ASCII character
|
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the issue.
As to the test, it has lots of redundancy, and too complicated to me. Can you simplify the test?
@@ -142,11 +143,22 @@ static Platform get() { | |||
private static final Platform platform = Platform.get(); | |||
private static final LaunchMechanism launchMechanism = platform.launchMechanism(); | |||
private static final byte[] helperpath = toCString(StaticProperty.javaHome() + "/lib/jspawnhelper"); | |||
private static Charset jnuCharset = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply call jnuCharset()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change jnuCharset to "private static".
String nativeEncoding = System.getProperty("native.encoding"); | ||
String fileEncoding = System.getProperty("file.encoding"); | ||
Charset cs; | ||
if (nativeEncoding != null && !nativeEncoding.equals(fileEncoding)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for comparing file.encoding
? Does Charset.forName(nativeEncoding, Charset.defaultCharset())
suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove file.encoding related code.
final static int maxSize = 4096; | ||
|
||
public static void main(String[] args) throws Exception { | ||
ProcessBuilder pb = new ProcessBuilder(javeExe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can utilize jdk.test.lib.process.ProcessTools
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ProcessTools class.
|
||
/* | ||
* @test | ||
* @bug 8285517 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the test should work only on a particular env, should describe here and add @requires
tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add "@requires (os.family == "linux")"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @modules jdk.charsets
should be added so that the test should be immediately ignored if it does not have EUC-JP
charset.
|
||
|
||
final class ProcessEnvironment | ||
{ | ||
private static final HashMap<Variable,Value> theEnvironment; | ||
private static final Map<String,String> theUnmodifiableEnvironment; | ||
static final int MIN_NAME_LENGTH = 0; | ||
static final Charset cs = Charset.forName(StaticProperty.nativeEncoding(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cs should have an all-CAPS name to make it clear its a static value throughout the implementation.
And private
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to "private static final"
Class<?> cls = Class.forName("java.lang.ProcessEnvironment"); | ||
Method enviorn_mid = cls.getDeclaredMethod("environ"); | ||
enviorn_mid.setAccessible(true); | ||
byte[][] environ = (byte[][]) enviorn_mid.invoke(null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "enviorn_mid" -> "environ_mid".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo
import java.util.StringJoiner; | ||
|
||
public class i18nEnvArg { | ||
final static String text = "\u6F22\u5B57"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be renamed to indicate it is the string under test? like KANJI_TEXT or EUC_JP_TEXT or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use EUC_JP_TEXT
environ.put("LANG", "ja_JP.eucjp"); | ||
Process p = pb.start(); | ||
InputStream is = p.getInputStream(); | ||
byte[] ba = new byte[maxSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use InputStream.readAllBytes()
here to return a byte buffer.
Its not clear why all the bytes are read? I assume its only for a possible error from the child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use InputStream.readAllBytes()
+ "java"; | ||
final static int maxSize = 4096; | ||
|
||
public static void main(String[] args) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 main()'s in this test; it would be useful to add a comment indicating the purpose of each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments
bb.put((byte)'='); | ||
bb.put(environ[i+1]); | ||
byte[] envb = Arrays.copyOf(ba, bb.position()); | ||
if (Arrays.equals(kanji, envb)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to exit the loop on the first match.
I think AIX always has environment variables not set by the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this testing, use 2 environment variables:
- LANG=ja_JP.eucjp
- \u6F22\u5B57=\u6F22\u5B57
If the other environment variable is defined, it's printed.
} else if (b >= 0x20 && b <= 0x7F) { | ||
sb.append((char)b); | ||
} else { | ||
sb.append(String.format("\\x%02X", (int)b & 0xFF)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methods of java.util.HexFormat may be useful here.
It seems that the "5C" would fit into this "else" clause and not need a separate statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use HexFormat class
…able has non ASCII character
Hello @naotoj and @RogerRiggs . |
Can you clarify the use of different charset properties for the ProcessEnvironment vs the CommandLine strings? They are used to decode or encode strings in the APIs to native functions respectively. I think I would have expected to see |
Hello @RogerRiggs . |
|
||
/* | ||
* @test | ||
* @bug 8285517 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @modules jdk.charsets
should be added so that the test should be immediately ignored if it does not have EUC-JP
charset.
.shouldHaveExitValue(0); | ||
} | ||
|
||
public static class Start { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need Start
class? Can it be merged with the parent to reduce the complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, it expected one...
I created i18nEnvArg.sh for start-up, then add @modules jdk.charsets
and remove i18nEnvArg$Start
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if my comment was unclear. I meant to not create a shell script (shell scripts in the test are discouraged), but to re-use the main class twice, invoking with different arguments to main(), as both creating a testJvm process (one for eucjp jvm, and the other for Start).
String javeExe = System.getProperty("java.home") + | ||
File.separator + | ||
"bin" + | ||
File.separator + | ||
"java"; | ||
ProcessBuilder pb = new ProcessBuilder(javeExe, | ||
"--add-opens=java.base/java.lang=ALL-UNNAMED", | ||
"-classpath", | ||
System.getProperty("java.class.path"), | ||
"i18nEnvArg$Verify", | ||
EUC_JP_TEXT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can utilize ProcessTools
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ProcessTools
Class<?> cls = Class.forName("java.lang.ProcessEnvironment"); | ||
Method environ_mid = cls.getDeclaredMethod("environ"); | ||
environ_mid.setAccessible(true); | ||
byte[][] environ = (byte[][]) environ_mid.invoke(null, | ||
(Object[])null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I like this piece, invoking a package private method of ProcessEnvironment
class. Can we simply verify the result of System.getenv(EUC_JP_TEXT)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 3 processes required for this testing.
- Start test process on ja_JP.eucjp locale
- Set the test data by encoder
- Verify the encoded data by decoder
To verify the encoded data, I think I need to check the byte data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to verify the intermediate byte encoding? Could we simply compare the text given to environ.put() in Start process and System.getenv() in Verify process match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @naotoj .
I think if 2nd process' encoder (like UTF-8) and 3rd process' decoder are same encoding, System.getenv() returns expected data.
Actually the testcase checks 3 parts on Verify process:
- Expected environment variable is defined or not (it uses System.getenv())
- Expected argument is received or not
- Expected environment variable is encoded by proper encoding
When I ran this testcase with jdk18.0.1, I got following result:
Unexpected argument was received: \u6F22\u5B57<->\u7FB2\u221A\uFFFD
Unexpected environment variables: \xE6\xBC\xA2\xE5\xAD\x97\x3D\xE6\xBC\xA2\xE5\xAD\x97
It means 1st test was passed, 2nd and 3rd test were failed.
I don't think environment variable issue can be seen without 3rd test.
Please let me know if you find out another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the test is very brittle; I'm pretty sure it will fail on AIX that adds its own environment variables.
It should not fail if it finds the two entries it expects. It should ignore other entries.
I don't see what value it has over checking the entries from System.getEnv(), please elaborate.
…able has non ASCII character
Hello @RogerRiggs and @naotoj . I put new testcase. During my code changes, SecurityManager related testcases were failed. |
…able has non ASCII character
Hello @naotoj . |
@SuppressWarnings("removal") | ||
String jnuEncoding = AccessController.doPrivileged((PrivilegedAction<String>) () | ||
-> System.getProperty("sun.jnu.encoding")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@SuppressWarnings("removal") | ||
String jnuEncoding = AccessController.doPrivileged((PrivilegedAction<String>) () | ||
-> System.getProperty("sun.jnu.encoding")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to issue doPrivileged()
, which is deprecated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @naotoj .
If I just use System.getProperty("sun.jnu.encoding")
, following testcases were failed.
java/lang/ProcessBuilder/SecurityManagerClinit.java
java/lang/ProcessHandle/PermissionTest.java
Please give me your suggestion again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. Never mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worthwhile to cache the sun.jnu.encoding
property value in jdk/internal/util/StaticProperty.
And perhaps even cache the Charset (not just the string).
var cmds = List.of( | ||
"--add-modules=" + System.getProperty("test.modules"), | ||
"-classpath", | ||
System.getProperty("test.class.path"), | ||
"-Dtest.jdk=" + System.getProperty("test.jdk"), | ||
"-Dtest.class.path=" + System.getProperty("test.class.path"), | ||
"-Dtest.modules=" + System.getProperty("test.modules"), | ||
"i18nEnvArg", | ||
"Start"); | ||
pb = ProcessTools.createTestJvm(cmds); | ||
Map<String, String> environ = pb.environment(); | ||
environ.clear(); | ||
environ.put("LANG", "ja_JP.eucjp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many duplicate pieces of code here and in the else
block below. Can you simplify this if
statement more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
But I'm not sure, it's expected one.
ByteArrayOutputStream baos = new ByteArrayOutputStream(); | ||
PrintStream ps = new PrintStream(baos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can utilize try-with-resources pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use shouldNotContain()
to find the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting try (ByteArrayOutputStream baos = ...) {
so that no need to clean them up, but I see you removed them. But I prefer not to use shouldNotContain("ERROR: ")
but to check the return value as before.
…able has non ASCII character
Hello @RogerRiggs and @naotoj . |
Charset cs = jnuEncoding != null | ||
? Charset.forName(jnuEncoding, Charset.defaultCharset()) | ||
: Charset.defaultCharset(); | ||
if (new String(tested.getBytes(cs), cs).equals(tested)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it always true that the round trip encoding to bytes and back (using the same Charset) to a string is equal()?
And if it is always true, then the if(...) can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above code is related to following code:
jdk/test/jdk/java/lang/ProcessBuilder/Basic.java
Lines 1569 to 1570 in 5212535
testEncoding("Latin1", "\u00f1\u00e1"); | |
testEncoding("Unicode", "\u22f1\u11e1"); |
If
Charset.defaultCharset()
is UTF-8
, this code is not skipped.I think this code will be skipped on JDK17 on C locale.
byte[] eucjp = "LANG=ja_JP.eucjp".getBytes(cs); | ||
String s = System.getenv(EUC_JP_TEXT); | ||
if (!EUC_JP_TEXT.equals(s)) { | ||
System.err.println("ERROR: getenv() returns unexpected data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the unexpected data s
to the output string.
for(char ch : EUC_JP_TEXT.toCharArray()) { | ||
System.err.printf("\\u%04X", (int)ch); | ||
} | ||
System.err.print("<->"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be clearer if labeled as the actual/incorrect value and on a separate line.
} | ||
if (!EUC_JP_TEXT.equals(args[0])) { | ||
System.err.print("ERROR: Unexpected argument was received: "); | ||
for(char ch : EUC_JP_TEXT.toCharArray()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the expected value, the previous "Unexpected" labeling might be mis-understood.
Class<?> cls = Class.forName("java.lang.ProcessEnvironment"); | ||
Method environ_mid = cls.getDeclaredMethod("environ"); | ||
environ_mid.setAccessible(true); | ||
byte[][] environ = (byte[][]) environ_mid.invoke(null, | ||
(Object[])null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the test is very brittle; I'm pretty sure it will fail on AIX that adds its own environment variables.
It should not fail if it finds the two entries it expects. It should ignore other entries.
I don't see what value it has over checking the entries from System.getEnv(), please elaborate.
jnuCharset = Charset.forName(SUN_JNU_ENCODING, Charset.defaultCharset()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it is OK to initialize Charset
here, as sun_jnu_encoding
is initialized in System.initPhase1()
and pulling Charset
there may cause some init order change. I'd only cache the encoding string here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the initialization of sun.jnu.encoding
in System.java:2142'ish.
This happens before StaticProperty is initialized; at line 2147.
If the sun.jnu.encoding
is not supported (this is before Providers are enabled)
then it is forced to UTF-8
.
So it is the case that the encoding is supported by that point.
Note that Charset.isSupported(...)
does the full lookup in its implementation.
If it is not supported, the system still comes up using UTF-8 and a warning is printed at line 2282.
Comparing the class initialization order may be a useful thing to cross check.
…able has non ASCII character
Hello @RogerRiggs .
The new testcase verifies internal byte data for EUC_JP_TEXT environment variable and value. |
I checked internal data
|
|
||
|
||
final class ProcessEnvironment | ||
{ | ||
private static final HashMap<Variable,Value> theEnvironment; | ||
private static final Map<String,String> theUnmodifiableEnvironment; | ||
static final int MIN_NAME_LENGTH = 0; | ||
private static final Charset jnuCharset = StaticProperty.jnuCharset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the Charset is cached in StaticProperty, I don't think its worthwhile to make a local copy of the same ref.
Just refer to StaticProperty.jnuCharset()
where it is needed. (Here and in ProcessImpl)
…able has non ASCII character
Hello @RogerRiggs and @naotoj . |
@@ -241,4 +246,29 @@ public static String fileEncoding() { | |||
public static String javaPropertiesDate() { | |||
return JAVA_PROPERTIES_DATE; | |||
} | |||
|
|||
/** | |||
* Return the {@code sun.jnu.encoding} system property. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be eliminated by changing the@return
block tag below to {@return the {@code sun.jnu.encoding} ...}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @naotoj .
I appreciate your comment.
I'd like to confirm one thing.
I applied following change
/**
* {@return the {@code sun.jnu.encoding} system property}
*
* <strong>{@link SecurityManager#checkPropertyAccess} is NOT checked
* in this method. The caller of this method should take care to ensure
* that the returned property is not made accessible to untrusted code.</strong>
*/
By javadoc command with -html5 option, above part was converted to
Returns the sun.jnu.encoding system property. SecurityManager.checkPropertyAccess(java.lang.String)
is NOT checked in this method. The caller of this method should take care to ensure that the returned
property is not made accessible to untrusted code.
The 1st word changed from Return to Returns.
Is it OK ?
And it seems @return
is translated to Japanese on Japanese environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1st word changed from Return to Returns.
Is it OK ?
That's more of plain English to me. Maybe some natives can correct it if needed.
And it seems @return is translated to Japanese on Japanese environment.
Yes. I believe that's what is supposed to work.
} | ||
|
||
/** | ||
* Return charset for {@code sun.jnu.encoding} system property. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. charset
can be capitalized and changed to {@code}
.
…able has non ASCII character
Hello @naotoj . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for the fix!
@takiguc This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 212 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
Hello @RogerRiggs . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine.
(I don't personally like how {@return looks in the source; but its done now. Usually it is preferred to stick to the style in the file and not mix stylistic changes with functional changes).
Thanks for the updates.
/integrate |
Going to push as commit 890771e.
Your commit was automatically rebased without conflicts. |
On JDK19 with Linux ja_JP.eucjp locale,
System.getenv() returns unexpected value if environment variable has Japanese EUC characters.
It seems this issue happens because of JEP 400.
Arguments for ProcessBuilder have same kind of issue.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8378/head:pull/8378
$ git checkout pull/8378
Update a local copy of the PR:
$ git checkout pull/8378
$ git pull https://git.openjdk.java.net/jdk pull/8378/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8378
View PR using the GUI difftool:
$ git pr show -t 8378
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8378.diff