Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locale.getpreferredencoding() dies when setlocale fails #42982

Closed
catherinedevlin mannequin opened this issue Mar 5, 2006 · 23 comments
Closed

locale.getpreferredencoding() dies when setlocale fails #42982

catherinedevlin mannequin opened this issue Mar 5, 2006 · 23 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@catherinedevlin
Copy link
Mannequin

catherinedevlin mannequin commented Mar 5, 2006

BPO 1443504
Nosy @loewis, @birkenfeld, @ashemedai, @bitdancer
Files
  • patches-2.5.1-Linux.diff
  • locale.diff: Module/_localemodule.c patch to fix invalid locale semantics
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-05-06.08:27:57.901>
    created_at = <Date 2006-03-05.13:50:20.000>
    labels = ['type-bug', 'library']
    title = 'locale.getpreferredencoding() dies when setlocale fails'
    updated_at = <Date 2009-05-06.08:27:57.900>
    user = 'https://bugs.python.org/catherinedevlin'

    bugs.python.org fields:

    activity = <Date 2009-05-06.08:27:57.900>
    actor = 'asmodai'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-05-06.08:27:57.901>
    closer = 'asmodai'
    components = ['Library (Lib)']
    creation = <Date 2006-03-05.13:50:20.000>
    creator = 'catherinedevlin'
    dependencies = []
    files = ['8833', '13850']
    hgrepos = []
    issue_num = 1443504
    keywords = ['patch']
    message_count = 23.0
    messages = ['27684', '27685', '57964', '86856', '86857', '86897', '86900', '86905', '86909', '86911', '86983', '86985', '87036', '87037', '87038', '87039', '87040', '87051', '87052', '87079', '87308', '87315', '87321']
    nosy_count = 7.0
    nosy_names = ['loewis', 'georg.brandl', 'catherinedevlin', 'jminka', 'heikki', 'asmodai', 'r.david.murray']
    pr_nums = []
    priority = 'low'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1443504'
    versions = ['Python 2.6', 'Python 3.0']

    @catherinedevlin
    Copy link
    Mannequin Author

    catherinedevlin mannequin commented Mar 5, 2006

    I'm on Ubuntu 5.10, with Python 2.4.2-0ubuntu2, and
    when I open a terminal window and run python, I get

    >>> import locale
    >>> locale.getpreferredencoding()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib/python2.4/locale.py", line 417, in
    getpreferredencoding
        setlocale(LC_CTYPE, "")
      File "/usr/lib/python2.4/locale.py", line 381, in
    setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting

    However, if I su - root - or even su right back to my
    own account (catherine) ! - then everything works.

    This is of concern (to me, anyway) because this error
    crashes bzr.

    I chose "Esperanto" as my language when setting up
    Ubuntu. (No, I wasn't trying to be funny - I really do
    speak Esperanto!) That may be why I found the problem,
    but I don't think this is simply a problem with flawed
    Esperanto support in Ubuntu - because the routine works
    after su is used, and because
    locale.nl_langinfo(CODESET) works fine (please read on).

    Anyway, within locale.getpreferredencoding(), line 417

    • setlocale(LC_CTYPE, "") - seems to be the problem...
    >>> locale.setlocale(locale.LC_CTYPE)
    'C'
    >>> locale.setlocale(locale.LC_CTYPE, "")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib/python2.4/locale.py", line 381, in
    setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    >>> locale.setlocale(locale.LC_CTYPE, None)
    'C'

    This makes me wonder if setlocale(LC_TYPE, "") is
    really so very necessary. It seems to be there to prep
    for the nl_langinfo call, but it doesn't actually seem
    strictly necessary for that call to work.

    >>> locale.nl_langinfo(locale.CODESET)
    'ANSI_X3.4-1968'

    ... I get that result whether before or after calling
    setlocale, and I get it under any account (including
    root, where setlocale does not raise an exception).

    Thus, as far as I can tell, it isn't really necessary
    to set setlocale(LC_CTYPE, "") or die trying, and
    accepting the nl_langinfo result without a
    successful setlocale(LC_CTYPE, "") would be preferable
    to raising an unhandled exception. I suggest that
    setlocale(LC_TYPE, "") be enclosed in a try block.

                try:
                    setlocale(LC_CTYPE, "")
                except:
                    None
    

    Since I don't really understand what it's doing in the
    first place, I don't know if this is really a good patch.

    Thanks!

    @catherinedevlin catherinedevlin mannequin added stdlib Python modules in the Lib dir labels Mar 5, 2006
    @jminka
    Copy link
    Mannequin

    jminka mannequin commented Mar 17, 2006

    Logged In: YES
    user_id=1116964

    I've got the same problem with bzr on Gentoo. If LANG or
    LC_ALL consists '/', then bzr has the problem (e.g. en_US is
    ok, en_US/ISO8859-1 is wrong).

    @heikki
    Copy link
    Mannequin

    heikki mannequin commented Nov 29, 2007

    We noticed this too in Chandler. We worked around this issue with the
    patch I am attaching. Maybe not a correct fix, though.

    @devdanzin devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Apr 7, 2009
    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented Apr 30, 2009

    Shouldn't the fallback be to setlocale(LC_CTYPE, "C") instead of
    silently passing, though?

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented Apr 30, 2009

    You don't want to completely nix the setlocale(LC_CTYPE, "") call
    though. The "" denotes to grab the native environment, in other words,
    to grab whatever the current user's LC_CTYPE environment variable is set
    to (see locale -a) and then set the program's LC_CTYPE to that.

    Of course, this might be set to something that might be valid (e.g.
    cy_GB.ISO8859-15), but has no matching entry in /usr/share/locale (or
    wherever your system provides it) and as such it fails.

    Reading SUS (The Single Unix Specification) I see that it explicitly says:

    "Upon successful completion, setlocale() shall return the string
    associated with the specified category for the new locale. Otherwise,
    setlocale() shall return a null pointer and the locale of the process is
    not changed."

    So the patch seems to be correct actually. We try to setlocale(LC_CTYPE,
    "") to grab a locale from the environment to set LC_CTYPE, but we fail
    for whatever, so we should just pass since we should not adjust LC_CTYPE.

    Mmm, but it seems setlocale() in locale.py is not adhering to the
    standard by not allowing the "" case properly. _parse_localename() is
    being overly pedantic about this by raising a ValueError.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 1, 2009

    The patch looks fine to me.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 1, 2009

    OK, then I'll apply it.

    But I am curious about your thoughts about the _parse_localename()
    method being called from setlocale() raising a ValueError, whereas a
    setlocale(LC_CTYPE, "") should not fail at all, which it currently does
    if the locale in the environment is not valid.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 1, 2009

    But I am curious about your thoughts about the _parse_localename()
    method being called from setlocale() raising a ValueError, whereas a
    setlocale(LC_CTYPE, "") should not fail at all, which it currently does
    if the locale in the environment is not valid.

    I fail to see how this is related to this issue. In the OP's report,
    the exception was locale.Error, not ValueError, and _parse_localename
    isn't ever being called from setlocale() - why do you think it is being
    called? AFAICT, the only callers of _parse_localename are getlocale and
    getdefaultlocale (which, IMO, should both be deprecated).

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 1, 2009

    Sorry, I was actually off by a method last night.

    It turns out the problem lies in _localemodule.c.

    Let me start with the basic question: is our setlocale() supposed to
    mirror POSIX' operations/semantics?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 1, 2009

    Let me start with the basic question: is our setlocale() supposed to
    mirror POSIX' operations/semantics?

    Yes, it is.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 2, 2009

    I will first point out where our current implementation is broken, in my
    opinion of course, after which I propose a small patch.

    Both C90 (7.4.1.1) and C99 (7.11.1.1) state:

    "A value of "C" for locale specifies the minimal environment for C
    translation; a value of "" for locale specifies the locale-specific
    native environment. Other implementation-defined strings may be passed
    as the second argument to setlocale.

    [...]

    If a pointer to a string is given for locale and the selection can be
    honored, the setlocale function returns a pointer to the string
    associated with the specified category for the new locale. If the
    selection cannot be honored, the setlocale function returns a null
    pointer and the program’s locale is not changed."

    Note that neither C or POSIX defines any errors or sets errno or the
    likes. It simply returns a null pointer.

    In C you would typically start your program with a call like:

    #include <locale.h>
    
    int main(int argc, char *argv[]) {
    	setlocale(LC_CTYPE, "");
    ...
    

    }

    This will try to set the locale to what the native environment
    specifies, but will not error out if the value, if any, it receives does
    not map to a valid locale. It will return a null pointer if it cannot
    set the locale. Execution continues and the locale is set to the default
    "C".

    Our current behaviour in Python does not adhere to these semantics. To
    illustrate:

    # Obvious non-existing locale
    >>> from locale import setlocale, LC_CTYPE
    >>> setlocale(LC_CTYPE, 'B')
    Error: unsupported locale setting
    
    # Valid locale, but not available on my system
    >>> from os import getenv
    >>> from locale import setlocale, LC_CTYPE
    >>> getenv('LANG')
    >>> 'cy_GB.UTF-8'
    >>> setlocale(LC_CTYPE, '')
    Error: unsupported locale setting

    Neither Perl or PHP throw any error when setlocale() is passed an
    invalid locale. Python is being unnecessarily disruptive by throwing an
    error.

    As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to
    be adjusted. Patch against trunk enclosed. This changes the semantics of
    our current implementation to the following:

    >>> from locale import setlocale, LC_CTYPE
    >>> rv = setlocale(LC_CTYPE, 'B')
    >>> type(rv)
    <class 'NoneType'>
    >>> rv = setlocale(LC_CTYPE, 'C')
    >>> type(rv)
    <class 'str'>
    >>> rv
    'C'

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 2, 2009

    If a pointer to a string is given for locale and the selection can be
    honored, the setlocale function returns a pointer to the string
    associated with the specified category for the new locale. If the
    selection cannot be honored, the setlocale function returns a null
    pointer and the program’s locale is not changed."

    Note that neither C or POSIX defines any errors or sets errno or the
    likes. It simply returns a null pointer.

    Still, this is considered as an error case.

    #include <locale.h>

    int main(int argc, char *argv[]) {
    setlocale(LC_CTYPE, "");

    ...
    }

    This will try to set the locale to what the native environment
    specifies, but will not error out if the value

    Yes, but that's a bug in the C code, which fails to check the
    return value of setlocale. The fact that the bug is wide-spread
    doesn't make it any better.

    As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to
    be adjusted

    -1. Errors should never pass silently. That's the whole point of exceptions.

    @birkenfeld
    Copy link
    Member

    Interestingly, my setlocale(3p) man page says:

    """
    ERRORS
    No errors are defined.
    """

    So isn't it debatable if returning the NULL pointer really is an error?

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 3, 2009

    I asked that as well on the POSIX/SUS list and Don Cragun responded with:

    "If you make the last argument to setlocale() be a pointer to
    unallocated memory, implementations would be allowed to set errno to
    EFAULT and terminate the process with a core dump even when this section
    says "No errors are defined." An implementation could also set errno to
    ENOENT (e.g., if the "B" locale wasn't known) or to EINVAL (e.g., if the
    "B" locale existed but the LC_CTYPE portion of the locale was not in the
    proper format). That wording just means that the standard doesn't
    require implementations to detect errors like these nor to report
    specific error values for different possible errors."

    On the subject whether or not returning a null pointer should be
    considered he said:

    "The standard is silent on this issue.
    Why does it make any difference to an application?
    If setlocale(LC_CTYPE, "B") returns a null pointer, the LC_CTYPE portion
    of the locale was not changed. If setlocale(LC_CTYPE, "B") does not
    return a null pointer, the LC_CTYPE portion of the locale was
    successfully changed."

    I am just wondering why we want to be quite different from how many
    other languages are approaching the issue. Sure enough, we can use a
    try: construct, but it kind of defeats the principle of least
    astonishment by being different from the rest on this issue.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 3, 2009

    On the subject whether or not returning a null pointer should be
    considered he said:

    ->

    On the subject whether or not returning a null pointer should be
    considered an error he said:

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 3, 2009

    Georg pointed out a mistake I introduced in my patch, updated now.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 3, 2009

    Really correct this time.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 3, 2009

    """
    ERRORS
    No errors are defined.
    """

    So isn't it debatable if returning the NULL pointer really is an error?

    As Jeroen reports, this really means two different things
    a) "no errors" really means "no errno codes". Whether or not
    an error may occur is an independent issue.
    b) "are defined" really means that POSIX doesn't define any
    standard errno codes; the system may indeed still set errno
    (C99, 7.5p3)

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 3, 2009

    I am just wondering why we want to be quite different from how many
    other languages are approaching the issue.

    Because we have exceptions, and they don't. Would you also propose
    that open() should return None, just because fopen(3) returns NULL?

    While it may be debatable whether applications care about the error
    when passing "" as the locale, there is also the second case where
    applications pass an explicit locale

      setlocale(locale.LC_ALL, "de_DE@euro")

    When they do that, they surely want to be told if this actually
    worked.

    Sure enough, we can use a
    try: construct, but it kind of defeats the principle of least
    astonishment by being different from the rest on this issue.

    There is also the backwards compatibility issue: your change
    will break existing code.

    @bitdancer
    Copy link
    Member

    On Sun, 3 May 2009 at 08:55, Jeroen Ruigrok van der Werven wrote:

    I am just wondering why we want to be quite different from how many
    other languages are approaching the issue. Sure enough, we can use a
    try: construct, but it kind of defeats the principle of least
    astonishment by being different from the rest on this issue.

    Only if you imagine that the principal applies to expectations inherited
    from other languages. In a Python context, which is what the principle
    actually refers to, it would be astonishing if the error were to be
    silently ignored.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 6, 2009

    Committed the initial patch in r72375 for trunk and r72376 for py3k.

    Any other branches that would need the merge? 3.0?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 6, 2009

    It looks like a bug fix to me - so it would apply to all four active
    branches.

    @ashemedai
    Copy link
    Mannequin

    ashemedai mannequin commented May 6, 2009

    Committed in r72381 and r72395.

    @ashemedai ashemedai mannequin closed this as completed May 6, 2009
    @ashemedai ashemedai mannequin closed this as completed May 6, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants