Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing OS specific exception messages that vary with locale? #42688

Open
jasontedor opened this issue May 30, 2019 · 10 comments
Open

Parsing OS specific exception messages that vary with locale? #42688

jasontedor opened this issue May 30, 2019 · 10 comments
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team

Comments

@jasontedor
Copy link
Member

jasontedor commented May 30, 2019

In some places in the codebase we parse exception messages that come from the OS. These are dependent on the locale and it means our parsing will not be successful unless they are in the en_us locale. This is done, for example, when parsing network exceptions and we want to know the cause. This is used to make determinations whether or not to retry, for example, in CCR. If we don’t retry, we treat the exception as fatal and resort to taking no further action until the user intervenes. So, this distinction is meaningful. How should we handle this?

Relates #41689

@jasontedor jasontedor added :Core/Infra/Core Core issues without another label discuss labels May 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@rjernst
Copy link
Member

rjernst commented May 31, 2019

I think the problem here is the use of toString(). getMessage() does not return a localized message, so we should be able to "depend" on it, while toString() internally calls getLocalizedMessage(). I've opened #42742 to address the issue that spurred this discussion.

@jasontedor
Copy link
Member Author

See my comment there. Further, this issue is about messages that are coming from the operating system, not from within the JDK.

For example, when an IOException is thrown based on an error invoking an OS library call, we end up in sun.nio.fs.UnixException.translateToIOException. There you see:

        // fallback to the more general exception
        return new FileSystemException(file, other, errorString());

The error string is obtained from:

    String errorString() {
        if (msg != null) {
            return msg;
        } else {
            return Util.toString(UnixNativeDispatcher.strerror(errno()));
        }
    }

and we see that the JDK is invoking the native method strerror to get the OS error message.

With:

#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
  printf("%s\n",
    strerror_l(
      20,
      newlocale(
        LC_CTYPE_MASK |
        LC_NUMERIC_MASK |
        LC_TIME_MASK |
        LC_COLLATE_MASK |
        LC_MONETARY_MASK |
        LC_MESSAGES_MASK,
        "",
        (locale_t)0)));
}

we see the output from varying the locale:

centos-7:~/temp$ LANG=en_US.UTF-8 ./a.out 
Not a directory
centos-7:~/temp$ LANG=zh_CN.UTF-8 ./a.out 
不是目录
centos-7:~/temp$ 

and if the default is set to zh_CN.UTF-8:

centos-7:~/temp$ ./a.out 
不是目录

This means that the JDK has no information to provide a different error message in getMessage and getLocalizedMessage, the message is coming from the operating system based on the locale settings.

@jasontedor jasontedor removed the discuss label Jun 6, 2019
@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@gwbrown
Copy link
Contributor

gwbrown commented Dec 16, 2020

I'm marking this as team-discuss to get input from the team on if anyone been able to come up with a better strategy to address this in the past 18 months, and if not, what should we do with this issue? This is clearly a concern we would like to address, but don't really have any reasonable options for doing so.

@gwbrown gwbrown added team-discuss and removed needs:triage Requires assignment of a team area label labels Dec 16, 2020
@DaveCTurner
Copy link
Contributor

One option is to restrict the locales (at least, the values of LC_MESSAGES) in which Elasticsearch can be run. Of course this is rather hostile to users in locales that don't speak one of this restricted list of languages, but at the same time we must recognise that today there's essentially no localisation of messages generated by Elasticsearch itself and I'm unsure of the status of localisation of messages emitted by the JDK too. For instance, does this "Connection reset by peer" message ever get translated?

I expect that most (all?) of the messages we care about could be triggered synthetically with sufficient effort, much like Jason did above for Not a directory. I think it'd be an improvement to do so in tests to verify that the messages we're capturing are really the ones returned on supported OSs in supported locales that represent the situations about which we care.

@rjernst
Copy link
Member

rjernst commented Mar 6, 2021

We discussed this within core/infra and wondered if we should do similar to what @DaveCTurner is suggesting, except make it internal to Elasticsearch. That is, instead of requiring users set their locales with a limited set of supported values, we set LANG=en_US.UTF-8 inside elasticsearch-env. This would ensure we have the English versions of low level messages we expect, but do not trouble the user with setting it at all. As Dave said, there is no localization of messages by Elasticsearch, so any situation a user is in now with partially localized messages from the jdk/OS isn't particularly useful, and actually probably more troublesome if trying to search for an error line that may contain both English and another language.

So, the concrete proposal we would have then is to set LANG within our scripts, as well as with unit tests. @jasontedor @DaveCTurner thoughts?

@jasontedor
Copy link
Member Author

This is a good and pragmatic solution. I like it. It doesn't help on Windows, but I'm willing to accept that tradeoff since I don't know how much of our current OS message dependence has been tuned for Windows to begin with.

@DaveCTurner
Copy link
Contributor

Can we safely set LANG without introducing BWC problems? I would expect that to affect date-time parsing in cases where the user doesn't specify a locale in the mapping. LC_MESSAGES has narrower scope and should be enough for this issue.

I'd like to hear from @sajjadwahmed just to make sure that we're ok running in a single locale going forwards from the PM point of view too. It'll mean that non-English-speaking sysadmins may fail to recognise some familiar OS error messages. It might also exclude us from procurement processes that include constraints on how OS error messages are displayed to the admin. I'm not sure if that's a thing, but I've definitely seen questions about localisation in that context in previous jobs.

Re. Windows, today we check for connection was aborted and forcibly closed which I believe are the rough Windows equivalents of the libc messages. I don't know much about localisation on Windows but I did a bit of digging and couldn't find a clean method for configuring it per-process (but I did find an unclean one!). The right way seems to be to set it per account, so maybe we can set this up on the account as which Elasticsearch runs. @sajjadwahmed, it'd be good to hear from you on this front too.

@nik9000
Copy link
Member

nik9000 commented Mar 8, 2021 via email

@bytebilly
Copy link
Contributor

From a Product perspective, setting the locale (better if just LC_MESSAGES) automatically to the supported value for the Elasticsearch process only is an acceptable tradeoff.

Asking users to change their locale, or system locale, seems a pretty strong requirement that may impact on the overall perception of the product, and so it should be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

7 participants