Set -Dsun.jnu.encoding=UTF-8 in elasticsearch.in.sh #15552

Closed
wants to merge 1 commit into from

@ywelsch ywelsch commented Dec 18, 2015

Currently we set -Dfile.encoding=UTF-8 in elasticsearch.in.sh (which affects how file contents are encoded) but not its counterpart sun.jnu.encoding, which affects how file names are encoded. On Unix systems, the value of sun.jnu.encoding is usually derived from the LANG environment variable.
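To see which values a given JVM actually picked up, something like the following can be run with the same JAVA_OPTS. This is only a rough sketch: the class name EncodingProbe is made up for illustration, and sun.jnu.encoding is an internal property that is assumed to be present on HotSpot-based JDKs (it is printed as "null" if it is not).

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch.
final class EncodingProbe {

    // Collects the encoding-related settings visible to the running JVM.
    static Map<String, String> snapshot() {
        Map<String, String> values = new LinkedHashMap<>();
        // Default charset for file contents.
        values.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        // Internal property used for file names (assumed to exist on HotSpot JDKs).
        values.put("sun.jnu.encoding", String.valueOf(System.getProperty("sun.jnu.encoding")));
        values.put("defaultCharset", Charset.defaultCharset().name());
        // On Unix, the locale environment usually drives sun.jnu.encoding.
        values.put("LANG", String.valueOf(System.getenv("LANG")));
        return values;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it once with LANG=C and once with LANG=en_US.UTF-8 should make any difference between the two properties visible.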

With UTF-8 index names, this can lead to interesting issues. I first observed this when setting up my build server: the REST test rest-api-spec/src/main/resources/rest-api-spec/test/index/10_with_id.yaml failed. This test creates an index named test-weird-index-中文.

To reproduce:

  • In elasticsearch.in.sh, change the line
    JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"
    to
    JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=ANSI_X3.4-1968"
  • Start Elasticsearch.
  • Execute curl -XPOST localhost:9200/test-weird-index-中文
  • Watch an endless stream of exceptions fly by:
[2015-12-18 22:04:27,637][WARN ][gateway                  ] [One Above All] [test-weird-index-¦ᄌᆳ₩ヨヌ][4]: failed to list shard for shard_started on node [0SZxDwUvSoSdMLkVhBRsUQ]
FailedNodeException[Failed node [0SZxDwUvSoSdMLkVhBRsUQ]]; nested: RemoteTransportException[[One Above All][127.0.0.1:9300][internal:gateway/local/started_shards[n]]]; nested: ElasticsearchException[failed to load started shards]; nested: InvalidPathException[Malformed input or input contains unmappable characters: test-weird-index-¦ᄌᆳ₩ヨヌ];
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:187)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$700(TransportNodesAction.java:94)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$2.handleException(TransportNodesAction.java:160)
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:821)
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:799)
    at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:361)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: RemoteTransportException[[One Above All][127.0.0.1:9300][internal:gateway/local/started_shards[n]]]; nested: ElasticsearchException[failed to load started shards]; nested: InvalidPathException[Malformed input or input contains unmappable characters: test-weird-index-¦ᄌᆳ₩ヨヌ];
Caused by: ElasticsearchException[failed to load started shards]; nested: InvalidPathException[Malformed input or input contains unmappable characters: test-weird-index-¦ᄌᆳ₩ヨヌ];
    at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:154)
    at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:59)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:211)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:207)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: test-weird-index-¦ᄌᆳ₩ヨヌ
    at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
    at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
    at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
    at sun.nio.fs.AbstractPath.resolve(AbstractPath.java:53)
    at org.elasticsearch.env.NodeEnvironment$NodePath.resolve(NodeEnvironment.java:91)
    at org.elasticsearch.env.NodeEnvironment$NodePath.resolve(NodeEnvironment.java:84)
    at org.elasticsearch.env.NodeEnvironment.availableShardPaths(NodeEnvironment.java:634)
    at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:125)
    ... 8 more
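The innermost exception comes from sun.nio.fs.UnixPath.encode, which has to turn the path string into bytes. A minimal sketch of why that cannot work here, assuming the file-name charset is the one named by sun.jnu.encoding (the class FileNameEncodingCheck is hypothetical and only mimics the check, it is not the JDK's code):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the JDK's actual implementation.
final class FileNameEncodingCheck {

    // True if every character of the name can be encoded in the given charset.
    static boolean representable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // "test-weird-index-" followed by the two CJK characters from the REST test.
        String indexName = "test-weird-index-\u4e2d\u6587";

        // ANSI_X3.4-1968 is an alias for US-ASCII.
        Charset ascii = Charset.forName("ANSI_X3.4-1968");

        System.out.println(ascii.name() + ": " + representable(indexName, ascii));           // US-ASCII: false
        System.out.println("UTF-8: " + representable(indexName, StandardCharsets.UTF_8));    // UTF-8: true
    }
}
```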

@ywelsch ywelsch added >bug :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts labels Dec 18, 2015

rmuir commented Dec 18, 2015

We can't set this: it will break systems like Windows. It's something we shouldn't touch.

Things like index names (which act as identifiers) cannot safely be Unicode file names without serious effort: Windows is case-insensitive, and OS X is even worse (it does Unicode normalization, among other things). Some Windows filesystems, like FAT32, simply aren't encoded in Unicode but use the platform encoding instead.
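A small sketch of the two collision risks mentioned above, using only java.text.Normalizer and plain string comparison. The class NameCollisionDemo and its example names are invented for illustration, and the code only models what a normalizing or case-insensitive filesystem might do; it does not touch the disk.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration of identifier collisions on some filesystems.
final class NameCollisionDemo {

    // Two names that differ only in Unicode normalization form.
    static boolean sameAfterNormalization(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Two names that differ only in case.
    static boolean sameIgnoringCase(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9";    // e-acute as a single code point
        String decomposed = "cafe\u0301"; // 'e' followed by a combining acute accent

        System.out.println(composed.equals(decomposed));                  // false: distinct identifiers
        System.out.println(sameAfterNormalization(composed, decomposed)); // true: may be one file name
        System.out.println(sameIgnoringCase("Logs", "logs"));             // true: may be one file name
    }
}
```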

Even if we went to crazy extremes to get all this stuff right, some filesystems just can't encode unicode names, and the JDK still struggles (e.g. https://bugs.openjdk.java.net/browse/JDK-7130915): it won't be reliable.

Restrict to basic ASCII...


ywelsch commented Dec 18, 2015

I'm ok with restricting index names to ASCII. I'll close this PR and open an issue for ASCII index names.
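For the follow-up issue, the check itself could be as small as the sketch below. The helper name isAsciiIndexName is hypothetical and this is not Elasticsearch's actual index name validation; it only tests for the 7-bit ASCII range and ignores any other naming rules.

```java
// Hypothetical sketch, not Elasticsearch's real validation logic.
final class AsciiIndexNames {

    // True if every character is in the 7-bit ASCII range (below 0x80).
    static boolean isAsciiIndexName(String name) {
        return name.chars().allMatch(c -> c < 0x80);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiIndexName("test-index"));                    // true
        System.out.println(isAsciiIndexName("test-weird-index-\u4e2d\u6587")); // false
    }
}
```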

@ywelsch ywelsch closed this Dec 18, 2015
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020