Parsing OS specific exception messages that vary with locale? #42688
Comments
Pinging @elastic/es-core-infra |
I think the problem here is the use of |
See my comment there. Further, this issue is about messages that are coming from the operating system, not from within the JDK. For example, when an
// fallback to the more general exception
return new FileSystemException(file, other, errorString());
The error string is obtained from:
and we see that the JDK is invoking the native method
With:
#include <locale.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
printf("%s\n",
strerror_l(
20,
newlocale(
LC_CTYPE_MASK |
LC_NUMERIC_MASK |
LC_TIME_MASK |
LC_COLLATE_MASK |
LC_MONETARY_MASK |
LC_MESSAGES_MASK,
"",
(locale_t)0)));
}
we see the output from varying the locale:
and if the default is set to
This means that the JDK has no information to provide a different error message in |
I'm marking this as |
One option is to restrict the locales (at least, the values of I expect that most (all?) of the messages we care about could be triggered synthetically with sufficient effort, much like Jason did above for |
We discussed this within core/infra and wondered if we should do similar to what @DaveCTurner is suggesting, except make it internal to Elasticsearch. That is, instead of requiring users set their locales with a limited set of supported values, we set So, the concrete proposal we would have then is to set LANG within our scripts, as well as with unit tests. @jasontedor @DaveCTurner thoughts? |
This is a good and pragmatic solution. I like it. It doesn't help on Windows, but I'm willing to accept that tradeoff since I don't know how much of our current OS message dependence has been tuned for Windows to begin with. |
Can we safely set LANG without introducing BWC problems? I would expect that to affect date-time parsing in cases where the user doesn't specify a locale in the mapping. LC_MESSAGES has narrower scope and should be enough for this issue.
I'd like to hear from @sajjadwahmed just to make sure that we're ok running in a single locale going forwards from the PM point of view too. It'll mean that non-English-speaking sysadmins may fail to recognise some familiar OS error messages. It might also exclude us from procurement processes that include constraints on how OS error messages are displayed to the admin. I'm not sure if that's a thing, but I've definitely seen questions about localisation in that context in previous jobs.
Re. Windows, today we check for "connection was aborted" and "forcibly closed", which I believe are the rough Windows equivalents of the libc messages. I don't know much about localisation on Windows but I did a bit of digging and couldn't find a clean method for configuring it per-process (but I did find an unclean one: https://github.com/xupefei/Locale-Emulator/blob/ee43cb462448c5bb3e897c90c57f05215b974602/LEProc/LERegistryRedirector.cs). The right way seems to be to set it per account, so maybe we can set this up on the account that Elasticsearch runs as. @sajjadwahmed, it'd be good to hear from you on this front too. |
I believe if you leave off the locale you get parsing in the empty string locale. I don't have any comments on the other points though.
https://github.com/elastic/elasticsearch/blob/ffe61fb0972c6e3f501f7ced5488f9e1711f092a/server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java#L220
|
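The date-time concern raised above can be illustrated with plain JDK classes. This is only a sketch of how a formatter's locale changes which month names it accepts; the class and method names are made up for illustration, and it does not show how Elasticsearch's date field mapper actually chooses its locale.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical sketch: the same input parses or fails depending on the formatter's locale.
final class LocaleDateParsingSketch {

    // Returns true if the text parses as "day month-name year" under the given locale.
    static boolean parses(String text, Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("d MMMM uuuu", locale);
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The English month name is accepted by an English formatter ...
        System.out.println(parses("8 March 2021", Locale.ENGLISH)); // true
        // ... but rejected by a German one, which expects a German month name.
        System.out.println(parses("8 March 2021", Locale.GERMAN));  // false
    }
}
```

If a process-wide setting changed the JVM's default locale, any parsing that relies on that default could shift in the same way, which is why a narrower setting such as LC_MESSAGES looks safer.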
From a Product perspective, setting the locale (better if just Asking users to change their locale, or system locale, seems a pretty strong requirement that may impact on the overall perception of the product, and so it should be avoided. |
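For illustration only, here is a rough sketch of what pinning just the message locale for a child process could look like from Java. The thread proposes doing this in the startup scripts, not in Java, so this is a stand-in rather than the proposed implementation. The class name, the method name, and the value "C" are assumptions that do not come from this thread.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: prepare a child process whose OS error messages use a fixed locale.
final class LocalePinnedLauncherSketch {

    // Builds (but does not start) a process with LC_MESSAGES pinned.
    // "C" is an assumed value; the thread does not state which value would be used.
    static ProcessBuilder withPinnedMessages(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        Map<String, String> env = builder.environment();
        // Only the message category is changed; LANG and the other LC_* variables are
        // left as inherited. Note that LC_ALL, if set, takes precedence over LC_MESSAGES.
        env.put("LC_MESSAGES", "C");
        return builder;
    }

    public static void main(String[] args) {
        // The command here is a placeholder and is never executed.
        ProcessBuilder builder = withPinnedMessages(List.of("java", "-version"));
        System.out.println("LC_MESSAGES=" + builder.environment().get("LC_MESSAGES"));
    }
}
```

This does nothing for Windows, where, as noted earlier in the thread, no clean per-process mechanism was found.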
In some places in the codebase we parse exception messages that come from the OS. These messages depend on the locale, so our parsing only succeeds when they are in the en_US locale. We do this, for example, when parsing network exceptions to determine the cause, which in turn decides whether or not to retry, for example in CCR. If we don't retry, we treat the exception as fatal and take no further action until the user intervenes, so the distinction is meaningful. How should we handle this?
Relates #41689
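A minimal sketch of the failure mode, assuming a message check of roughly this shape. The helper below is hypothetical and is not the actual Elasticsearch code; the Italian string is an illustrative non-English message, not one verified against any particular libc.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical sketch: classifying an exception by its English OS message text.
final class MessageMatchSketch {

    // Treats the exception as a connection reset only if the English text is present.
    static boolean looksLikeConnectionReset(Throwable t) {
        String message = t.getMessage();
        return message != null
            && message.toLowerCase(Locale.ROOT).contains("connection reset by peer");
    }

    public static void main(String[] args) {
        IOException english = new IOException("Connection reset by peer");
        // Illustrative localized message for the same underlying error.
        IOException localized = new IOException("Connessione interrotta dal corrispondente");

        System.out.println(looksLikeConnectionReset(english));   // true
        System.out.println(looksLikeConnectionReset(localized)); // false
    }
}
```

Under a check like this, the localized exception would be classified as non-retryable even though the underlying error is the same.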