Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8285517: System.getenv() returns unexpected value if environment variable has non ASCII character #8378

Closed
wants to merge 9 commits into from
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2018, 2021, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2018, 2022, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Expand All @@ -26,6 +26,7 @@
package jdk.internal.util;

import java.util.Properties;
import java.nio.charset.Charset;

/**
* System Property access for internal use only.
Expand All @@ -52,6 +53,8 @@ public final class StaticProperty {
private static final String NATIVE_ENCODING;
private static final String FILE_ENCODING;
private static final String JAVA_PROPERTIES_DATE;
private static final String SUN_JNU_ENCODING;
private static final Charset jnuCharset;

private StaticProperty() {}

Expand All @@ -69,6 +72,8 @@ private StaticProperty() {}
NATIVE_ENCODING = getProperty(props, "native.encoding");
FILE_ENCODING = getProperty(props, "file.encoding");
JAVA_PROPERTIES_DATE = getProperty(props, "java.properties.date", null);
SUN_JNU_ENCODING = getProperty(props, "sun.jnu.encoding");
jnuCharset = Charset.forName(SUN_JNU_ENCODING, Charset.defaultCharset());
}
Comment on lines +76 to 77
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it is OK to initialize Charset here, as sun_jnu_encoding is initialized in System.initPhase1() and pulling Charset there may cause some init order change. I'd only cache the encoding string here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the initialization of sun.jnu.encoding in System.java:2142'ish.
This happens before StaticProperty is initialized; at line 2147.
If the sun.jnu.encoding is not supported (this is before Providers are enabled)
then it is forced to UTF-8.
So it is the case that the encoding is supported by that point.
Note that Charset.isSupported(...) does the full lookup in its implementation.

If it is not supported, the system still comes up using UTF-8 and a warning is printed at line 2282.

Comparing the class initialization order may be a useful thing to cross check.


private static String getProperty(Properties props, String key) {
Expand Down Expand Up @@ -241,4 +246,29 @@ public static String fileEncoding() {
public static String javaPropertiesDate() {
return JAVA_PROPERTIES_DATE;
}

/**
* Return the {@code sun.jnu.encoding} system property.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be eliminated by changing the@return block tag below to {@return the {@code sun.jnu.encoding} ...}.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @naotoj .
I appreciate your comment.
I'd like to confirm one thing.
I applied following change

    /**
     * {@return the {@code sun.jnu.encoding} system property}
     *
     * <strong>{@link SecurityManager#checkPropertyAccess} is NOT checked
     * in this method. The caller of this method should take care to ensure
     * that the returned property is not made accessible to untrusted code.</strong>
     */

By javadoc command with -html5 option, above part was converted to

Returns the sun.jnu.encoding system property. SecurityManager.checkPropertyAccess(java.lang.String) 
is NOT checked in this method. The caller of this method should take care to ensure that the returned 
property is not made accessible to untrusted code.

The 1st word changed from Return to Returns.
Is it OK ?
And it seems @return is translated to Japanese on Japanese environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1st word changed from Return to Returns.
Is it OK ?

That's more of plain English to me. Maybe some natives can correct it if needed.

And it seems @return is translated to Japanese on Japanese environment.

Yes. I believe that's what is supposed to work.

*
* <strong>{@link SecurityManager#checkPropertyAccess} is NOT checked
* in this method. The caller of this method should take care to ensure
* that the returned property is not made accessible to untrusted code.</strong>
*
* @return the {@code sun.jnu.encoding} system property
*/
public static String jnuEncoding() {
return SUN_JNU_ENCODING;
}

/**
* Return charset for {@code sun.jnu.encoding} system property.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. charset can be capitalized and changed to {@code}.

*
* <strong>If {@code sun.jnu.encoding} system property has invalid
* encoding name, {@link Charset#defaultCharset()} is returned.</strong>
*
* @return charset for {@code sun.jnu.encoding} system property
*/
public static Charset jnuCharset() {
return jnuCharset;
}
}
13 changes: 8 additions & 5 deletions src/java.base/unix/classes/java/lang/ProcessEnvironment.java
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2003, 2011, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2003, 2022, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Expand Down Expand Up @@ -55,14 +55,17 @@
package java.lang;

import java.io.*;
import java.nio.charset.Charset;
import java.util.*;
import jdk.internal.util.StaticProperty;


final class ProcessEnvironment
{
private static final HashMap<Variable,Value> theEnvironment;
private static final Map<String,String> theUnmodifiableEnvironment;
static final int MIN_NAME_LENGTH = 0;
private static final Charset jnuCharset = StaticProperty.jnuCharset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the Charset is cached in StaticProperty, I don't think its worthwhile to make a local copy of the same ref.
Just refer to StaticProperty.jnuCharset() where it is needed. (Here and in ProcessImpl)


static {
// We cache the C environment. This means that subsequent calls
Expand Down Expand Up @@ -163,7 +166,7 @@ public static Variable valueOfQueryOnly(Object str) {
}

public static Variable valueOfQueryOnly(String str) {
return new Variable(str, str.getBytes());
return new Variable(str, str.getBytes(jnuCharset));
}

public static Variable valueOf(String str) {
Expand All @@ -172,7 +175,7 @@ public static Variable valueOf(String str) {
}

public static Variable valueOf(byte[] bytes) {
return new Variable(new String(bytes), bytes);
return new Variable(new String(bytes, jnuCharset), bytes);
}

public int compareTo(Variable variable) {
Expand All @@ -196,7 +199,7 @@ public static Value valueOfQueryOnly(Object str) {
}

public static Value valueOfQueryOnly(String str) {
return new Value(str, str.getBytes());
return new Value(str, str.getBytes(jnuCharset));
}

public static Value valueOf(String str) {
Expand All @@ -205,7 +208,7 @@ public static Value valueOf(String str) {
}

public static Value valueOf(byte[] bytes) {
return new Value(new String(bytes), bytes);
return new Value(new String(bytes, jnuCharset), bytes);
}

public int compareTo(Value value) {
Expand Down
6 changes: 4 additions & 2 deletions src/java.base/unix/classes/java/lang/ProcessImpl.java
Expand Up @@ -35,6 +35,7 @@
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Locale;
Expand Down Expand Up @@ -141,12 +142,13 @@ static Platform get() {

private static final Platform platform = Platform.get();
private static final LaunchMechanism launchMechanism = platform.launchMechanism();
private static final Charset jnuCharset = StaticProperty.jnuCharset();
private static final byte[] helperpath = toCString(StaticProperty.javaHome() + "/lib/jspawnhelper");

private static byte[] toCString(String s) {
if (s == null)
return null;
byte[] bytes = s.getBytes();
byte[] bytes = s.getBytes(jnuCharset);
byte[] result = new byte[bytes.length + 1];
System.arraycopy(bytes, 0,
result, 0,
Expand All @@ -170,7 +172,7 @@ static Process start(String[] cmdarray,
byte[][] args = new byte[cmdarray.length-1][];
int size = args.length; // For added NUL bytes
for (int i = 0; i < args.length; i++) {
args[i] = cmdarray[i+1].getBytes();
args[i] = cmdarray[i+1].getBytes(jnuCharset);
size += args[i].length;
}
byte[] argBlock = new byte[size];
Expand Down
9 changes: 7 additions & 2 deletions test/jdk/java/lang/ProcessBuilder/Basic.java
Expand Up @@ -27,7 +27,7 @@
* 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313
* 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958
* 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464
* 8067796 8224905 8263729 8265173 8272600 8231297 8282219
* 8067796 8224905 8263729 8265173 8272600 8231297 8282219 8285517
* @key intermittent
* @summary Basic tests for Process and Environment Variable code
* @modules java.base/java.lang:open
Expand All @@ -50,6 +50,7 @@
import static java.lang.ProcessBuilder.Redirect.*;

import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
Expand Down Expand Up @@ -598,7 +599,11 @@ private static void testEncoding(String encoding, String tested) {
try {
// If round trip conversion works, should be able to set env vars
// correctly in child.
if (new String(tested.getBytes()).equals(tested)) {
String jnuEncoding = System.getProperty("sun.jnu.encoding");
Charset cs = jnuEncoding != null
? Charset.forName(jnuEncoding, Charset.defaultCharset())
: Charset.defaultCharset();
if (new String(tested.getBytes(cs), cs).equals(tested)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it always true that the round trip encoding to bytes and back (using the same Charset) to a string is equal()?
And if it is always true, then the if(...) can be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above code is related to following code:

testEncoding("Latin1", "\u00f1\u00e1");
testEncoding("Unicode", "\u22f1\u11e1");

If Charset.defaultCharset() is UTF-8, this code is not skipped.
I think this code will be skipped on JDK17 on C locale.

out.println("Testing " + encoding + " environment values");
ProcessBuilder pb = new ProcessBuilder();
pb.environment().put("ASCIINAME",tested);
Expand Down
138 changes: 138 additions & 0 deletions test/jdk/java/lang/System/i18nEnvArg.java
@@ -0,0 +1,138 @@
/*
* Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/

/*
* @test
* @bug 8285517
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test should work only on a particular env, should describe here and add @requires tag.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "@requires (os.family == "linux")"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also @modules jdk.charsets should be added so that the test should be immediately ignored if it does not have EUC-JP charset.

* @summary System.getenv() and argument don't return locale dependent data by JEP400
* @requires (os.family == "linux")
* @modules jdk.charsets
* @library /test/lib
* @build jdk.test.lib.process.*
* @run main i18nEnvArg
*/

import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import jdk.test.lib.process.ProcessTools;

public class i18nEnvArg {
final static String EUC_JP_TEXT = "\u6F22\u5B57";

/*
* Checks OS is Linux and OS has ja_JP.eucjp locale or not.
* Sets EUC_JP's environment variable and argunments against ProcessBuilder
*/
public static void main(String[] args) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3 main()'s in this test; it would be useful to add a comment indicating the purpose of each.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments

ProcessBuilder pb;
List<String> cmds = new ArrayList<>();
cmds.addAll(List.of(
"--add-modules=" + System.getProperty("test.modules"),
"-classpath",
System.getProperty("test.class.path"),
"-Dtest.class.path=" + System.getProperty("test.class.path"),
"-Dtest.modules=" + System.getProperty("test.modules")));
if (args.length == 0) {
cmds.addAll(
List.of("-Dtest.jdk=" + System.getProperty("test.jdk"),
"i18nEnvArg",
"Start"));
} else {
String jnuEncoding = System.getProperty("sun.jnu.encoding");
Charset dcs = jnuEncoding != null
? Charset.forName(jnuEncoding)
: Charset.defaultCharset();
Charset cs = Charset.forName("x-euc-jp-linux");
if (!dcs.equals(cs)) {
return;
}
cmds.addAll(
List.of("--add-opens=java.base/java.lang=ALL-UNNAMED",
"i18nEnvArg$Verify",
EUC_JP_TEXT));
}
pb = ProcessTools.createTestJvm(cmds);
Map<String, String> environ = pb.environment();
environ.clear();
environ.put("LANG", "ja_JP.eucjp");
if (args.length != 0) {
environ.put(EUC_JP_TEXT, EUC_JP_TEXT);
}
ProcessTools.executeProcess(pb)
.outputTo(System.out)
.errorTo(System.err)
.shouldNotContain("ERROR:");
}

public static class Verify {
private final static int maxSize = 4096;
/*
* Verify environment variable and argument are encoded by Linux's eucjp or not
*/
public static void main(String[] args) throws Exception {
Charset cs = Charset.forName("x-euc-jp-linux");
byte[] euc = (EUC_JP_TEXT + "=" + EUC_JP_TEXT).getBytes(cs);
byte[] eucjp = "LANG=ja_JP.eucjp".getBytes(cs);
String s = System.getenv(EUC_JP_TEXT);
if (!EUC_JP_TEXT.equals(s)) {
System.err.println("ERROR: getenv() returns unexpected data");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the unexpected data s to the output string.

}
if (!EUC_JP_TEXT.equals(args[0])) {
System.err.print("ERROR: Unexpected argument was received: ");
for(char ch : EUC_JP_TEXT.toCharArray()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the expected value, the previous "Unexpected" labeling might be mis-understood.

System.err.printf("\\u%04X", (int)ch);
}
System.err.print("<->");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be clearer if labeled as the actual/incorrect value and on a separate line.

for(char ch : args[0].toCharArray()) {
System.err.printf("\\u%04X", (int)ch);
}
System.err.println();
}
Class<?> cls = Class.forName("java.lang.ProcessEnvironment");
Method environ_mid = cls.getDeclaredMethod("environ");
environ_mid.setAccessible(true);
byte[][] environ = (byte[][]) environ_mid.invoke(null,
(Object[])null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I like this piece, invoking a package private method of ProcessEnvironment class. Can we simply verify the result of System.getenv(EUC_JP_TEXT)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 3 processes required for this testing.

  1. Start test process on ja_JP.eucjp locale
  2. Set the test data by encoder
  3. Verify the encoded data by decoder

To verify the encoded data, I think I need to check the byte data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to verify the intermediate byte encoding? Could we simply compare the text given to environ.put() in Start process and System.getenv() in Verify process match?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @naotoj .
I think if 2nd process' encoder (like UTF-8) and 3rd process' decoder are same encoding, System.getenv() returns expected data.
Actually the testcase checks 3 parts on Verify process:

  1. Expected environment variable is defined or not (it uses System.getenv())
  2. Expected argument is received or not
  3. Expected environment variable is encoded by proper encoding

When I ran this testcase with jdk18.0.1, I got following result:

Unexpected argument was received: \u6F22\u5B57<->\u7FB2\u221A\uFFFD
Unexpected environment variables: \xE6\xBC\xA2\xE5\xAD\x97\x3D\xE6\xBC\xA2\xE5\xAD\x97

It means 1st test was passed, 2nd and 3rd test were failed.
I don't think environment variable issue can be seen without 3rd test.
Please let me know if you find out another way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the test is very brittle; I'm pretty sure it will fail on AIX that adds its own environment variables.
It should not fail if it finds the two entries it expects. It should ignore other entries.

I don't see what value it has over checking the entries from System.getEnv(), please elaborate.

HexFormat hf = HexFormat.of().withUpperCase().withPrefix("\\x");
byte[] ba = new byte[maxSize];
for(int i = 0; i < environ.length; i += 2) {
ByteBuffer bb = ByteBuffer.wrap(ba);
bb.put(environ[i]);
bb.put((byte)'=');
bb.put(environ[i+1]);
byte[] envb = Arrays.copyOf(ba, bb.position());
if (Arrays.equals(eucjp, envb)) continue;
if (!Arrays.equals(euc, envb)) {
System.err.println("ERROR: Unexpected environment variables: " +
hf.formatHex(envb));
}
}
}
}
}