Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arbitrary code execution in gettext.c2py #72749

Closed
CarlEkerot mannequin opened this issue Oct 30, 2016 · 32 comments
Closed

Arbitrary code execution in gettext.c2py #72749

CarlEkerot mannequin opened this issue Oct 30, 2016 · 32 comments
Assignees
Labels
3.7 only security fixes deferred-blocker stdlib Python modules in the Lib dir type-security A security issue

Comments

@c81f654c-6959-4ba7-821a-1c7079e63a37
Copy link
Mannequin

CarlEkerot mannequin commented Oct 30, 2016

BPO 28563
Nosy @loewis, @tiran, @serhiy-storchaka, @timgraham, @zhangyangyu, @CarlEkerot
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • gettext_c2py.patch
  • gettext_c2py_func.patch: Based on Xiang Zhang's patch
  • gettext_c2py_v2.patch
  • gettext-parse-plural.patch
  • gettext-parse-plural-2.patch
  • gettext-non-integer-plural.patch
  • gettext-non-integer-plural-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-11-14.17:39:32.157>
    created_at = <Date 2016-10-30.16:58:41.674>
    labels = ['type-security', 'deferred-blocker', '3.7', 'library']
    title = 'Arbitrary code execution in gettext.c2py'
    updated_at = <Date 2017-03-31.16:36:31.156>
    user = 'https://github.com/CarlEkerot'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:31.156>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-11-14.17:39:32.157>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2016-10-30.16:58:41.674>
    creator = 'Carl Ekerot'
    dependencies = []
    files = ['45345', '45349', '45373', '45381', '45387', '45481', '45482']
    hgrepos = []
    issue_num = 28563
    keywords = ['patch']
    message_count = 32.0
    messages = ['279734', '280037', '280048', '280082', '280084', '280103', '280104', '280107', '280112', '280118', '280119', '280120', '280122', '280157', '280191', '280196', '280211', '280276', '280287', '280289', '280293', '280299', '280307', '280337', '280780', '280781', '280784', '280789', '280791', '280792', '280800', '280802']
    nosy_count = 7.0
    nosy_names = ['loewis', 'christian.heimes', 'python-dev', 'serhiy.storchaka', 'Tim.Graham', 'xiang.zhang', 'Carl Ekerot']
    pr_nums = ['552']
    priority = 'deferred blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue28563'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Oct 30, 2016

    The c2py-function in the gettext module is seriously flawed in many ways due
    to its use of eval to create a plural function:

    return eval('lambda n: int(%s)' % plural)

    My first discovery was that nothing prevents an input plural string that
    resembles a function call:

    gettext.c2py("n()")(lambda: os.system("sh"))

    This is of course a low risk bug, since it requires control of both the plural
    function string and the argument.

    Gaining arbitrary code execution using only the plural function string requires
    that the security checks are bypassed. The security checks utilize the tokenize
    module and makes sure that no NAME-tokens that are not "n" exist in the string.
    However, it does not check if the parser succeeds without any token.ERRORTOKEN
    being present. Hence, the following input will pass the security checks:

    gettext.c2py( '"(eval(foo) && ""' )(0)

    ----> 1 gettext.c2py( '"(eval(foo) && ""' )(0)
    gettext.pyc in <lambda>(n)
    NameError: global name 'foo' is not defined

    It will pass since it recognizes the entire input as a STRING token, and
    subsequently fails with an ERRORTOKEN.

    Passing a string in the argument to eval will however break the exploit since
    the parser will read the start-of-string in the eval argument as end-of-string,
    and the eval argument will be read as a NAME-token.

    Instead of passing a string to eval, we can build a string from characters in
    the docstrings available in the context of the gettext module:

       gettext.c2py('"(eval('
           'os.__doc__[152:155] + ' # os.
           'os.__doc__[46:52] + '   # system
           'os.__doc__[318] + '     # (
           'os.__doc__[55] + '      # '
           'os.__doc__[10] + '      # s
           'os.__doc__[42] + '      # h
           'os.__doc__[55] + '      # '
           'os.__doc__[329]'        # )
           ') && ""')(0)

    This will successfully spawn a shell in Python 2.7.11.

    Bonus: With the new string interpolation in Python 3.7, exploiting gettext.c2py
    becomes trivial:

    gettext.c2py('f"{os.system(\'sh\')}"')(0)

    The tokenizer will recognize the entire format-string as just a string, thus
    bypassing the security checks.

    To mitigate these vulnerabilities, eval should be avoided by implementing a
    custom parser for the gettext plural function DSL.

    @c81f654c-6959-4ba7-821a-1c7079e63a37 CarlEkerot mannequin added 3.7 only security fixes stdlib Python modules in the Lib dir type-security A security issue labels Oct 30, 2016
    @zhangyangyu
    Copy link
    Member

    gettext_c2py.patch tries to avoid the problem. It still uses eval but manually parse the expression using tokens extracted from gettext. Tests are passed.

    But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'.

    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Nov 4, 2016

    It doesn't solve the case when an identifier or number is used as a function:

       >>> import os
       >>> gettext.c2py("n()")(lambda: os.system("sh"))
       $ 
       0
       >>> gettext.c2py("1()")(0)
       Traceback (most recent call last):
         File "<stdin>", line 1, in <module>
         File "<string>", line 1, in <lambda>
       TypeError: 'int' object is not callable

    This is more of an unintended behavior than a security issue.

    Xiang Zhang: I've created a patch based on yours which handles the above case. I've also added a corresponding test case.

    Imo it would be even better if we could avoid eval. One possible (and safe) way would be to construct a safe subset of Python using the ast module. This would however still require that the C-style syntax is translated to Python. As you mention, there are issues parsing and translating nested ternary operators, and I doubt it will be possible to cover all cases with the regex replace utilized today.

    @serhiy-storchaka
    Copy link
    Member

    But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'.

    What if make repeated replacements with regular expression r'([^?:]?)\?([^?:]?):([^?:]*)'?

    @serhiy-storchaka
    Copy link
    Member

    It doesn't solve the case when an identifier or number is used as a function:

    In the first case we should convert an argument to integer.

        ns = {}
        exec('''if True:
            def func(arg):
                n = int(arg)
                return {}
            '''.format(plural), ns)
        return ns['func']

    Or raise an exception if argument is not integer.

    In the second case I think we can just left it as is. This is not valid C expression.

    Or we can add try/except in above code and convert TypeError to ValueError if this is more preferable exception.

    @zhangyangyu
    Copy link
    Member

    gettext.c2py("n()")(lambda: os.system("sh"))
    gettext.c2py("1()")(0)

    Empty parentheses should be disallowed. Function calls are not allowed in plural expression. And non-integer argument should be disallowed either, just as Serhiy's example shows.

    What if make repeated replacements with regular expression r'([^?:]?)\?([^?:]?):([^?:]*)'?

    How does it work for '1?2:3?4:5'? :-( I am considering a parser.

    @serhiy-storchaka
    Copy link
    Member

    How does it work for '1?2:3?4:5'?

    '1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5'

    But there are other problems. Precedence of some operators is different in C and Python. Chained comparison in Python cause different result that in C (e.g. '2 == 2 == 2'). Seems there is no other way besides a simple parser.

    gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed. It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue. I'm going to commit a variant of gettext_c2py.patch before 3.6 release if there will be not better patches.

    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Nov 5, 2016

    Verified gettext.c2py with gettext_c2py.patch applied agains the plural forms actually used in localization, listed over at http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html. I tested all of the none-trivial forms, and from what I can tell they generate valid syntax and are correct.

    @zhangyangyu
    Copy link
    Member

    '1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5'

    This is not right. It's right associative so it should be

    1?2:(3?4:5) -> 1?2:(4 if 3 else 5) -> 2 if 1 else (4 if 3 else 5)

    It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue.

    Agree. But I am interested in trying.

    gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed.

    It gets drawbacks so I don't include tests. I'll add in next try.

    @tiran
    Copy link
    Member

    tiran commented Nov 5, 2016

    The gettext module might be vulnerable to f-string attacks, too. Also see bpo-18317.

    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Nov 5, 2016

    The gettext module might be vulnerable to f-string attacks

    It is. See the example in the first comment:

    gettext.c2py('f"{os.system(\'sh\')}"')(0)

    This vulnerability seems to be solved in Xiang's patch. The DoS aspect is interesting though, since there's no constraints against malicious use of the power-operator, say "9**9**9**..**9". This too would be solved by implementing a parser with only simple arithmetics.

    @tiran
    Copy link
    Member

    tiran commented Nov 5, 2016

    Argh, sorry. I meant to write "The gettext module might be vulnerable to more than f-string attacks.".

    May I suggest that you have a look at my old patch? It uses an AST visitor to inspect the AST of a gettext plural expression. It allows only a limited set of AST types as well as limited amount of expressions. I consider it a superior solution and a fix for more generic attacks.

    I haven't tested my patch with f-strings yet. It either refuses f-strings FormattedValue already or can be easily modified to reject it.

    @zhangyangyu
    Copy link
    Member

    Christian, I think our patches are quite similar in function. They only allow limited tokens.

    I consider it a superior solution and a fix for more generic attacks

    Mine now still allows **. But it can be easily fixed.

    But both our patches still translate a C expression to Python and still suffer from nested ternary operator and different semantics between C and Python, e.g. (2==2==2 as Serhiy notes). :-( I plan to try a simple parser.

    @zhangyangyu
    Copy link
    Member

    gettext_c2py_v2.patch implements a simple C expression parser. More tests are included.

    Carl, hope you are willing to test it.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 6, 2016
    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Nov 7, 2016

    Judging by the code, this seems to be a much more rigid implementation. I've only run the unit tests and some variations of my initial examples, and everything seems to work as intended. Will look at it more closely this afternoon.

    One thing that caught my attention in the patch is that gettext.c2py is removed entirely. I know that this function is not exposed in the docs or in __all__, but it still has "public scope" and it's likely used directly in the wild (googling the signature confirms this). Perhaps it should give a DeprectationWarning and delegate to _Plural?

    @zhangyangyu
    Copy link
    Member

    Perhaps it should give a DeprectationWarning and delegate to _Plural?

    I hold a conservative opinion about this. c2py in my mind should be a inner help method. It's not documented so if there are users using it, they are risking changes. And as reported here, it's buggy and dangerous.

    @serhiy-storchaka
    Copy link
    Member

    Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names.

    Here is simpler implementation with using only one recursive function.

    @zhangyangyu
    Copy link
    Member

    Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names.

    It's fine. That's a Pratt parser. Yes, the names are strange. Your patch looks more simpler. I left several comments.

    @serhiy-storchaka
    Copy link
    Member

    Just for reference:

    https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
    http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y

    Yet one security issue is that too deep recursion can cause MemoryError or even a crash in Python compiler. My patch creates too much nested parenthesis. Updated patch minimize using parenthesis to minimum and adds guards against too deep recursion.

    @zhangyangyu
    Copy link
    Member

    LGTM. And I expect there could be a comment about the special decimal number.

    @serhiy-storchaka
    Copy link
    Member

    What a comment you need Xiang? Isn't existing comment enough?

    @c81f654c-6959-4ba7-821a-1c7079e63a37
    Copy link
    Mannequin Author

    CarlEkerot mannequin commented Nov 8, 2016

    Looks good to me. It behaves as intended on every input I can think of.

    @zhangyangyu
    Copy link
    Member

    What a comment you need Xiang? Isn't existing comment enough?

    Serhiy, I mean the case a number starting with 0, e.g. 0123. The plural form is a C expression and in C 0123 is an octal number. c2py now interprets it as a decimal number.

    @1762cc99-3127-4a62-9baf-30c3d0f51ef7
    Copy link
    Mannequin

    python-dev mannequin commented Nov 8, 2016

    New changeset e0cc3fadd7b3 by Serhiy Storchaka in branch '2.7':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/e0cc3fadd7b3

    New changeset 7e66c5dc4218 by Serhiy Storchaka in branch '3.3':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/7e66c5dc4218

    New changeset 730cf8cdf02c by Serhiy Storchaka in branch '3.4':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/730cf8cdf02c

    New changeset e0dd4cc9661a by Serhiy Storchaka in branch '3.5':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/e0dd4cc9661a

    New changeset 7ea64ab9a5c8 by Serhiy Storchaka in branch '3.6':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/7ea64ab9a5c8

    New changeset 4ca9a4f96edc by Serhiy Storchaka in branch 'default':
    Issue bpo-28563: Fixed possible DoS and arbitrary code execution when handle
    https://hg.python.org/cpython/rev/4ca9a4f96edc

    @16418c89-2ec4-4af8-8812-31de79becc13
    Copy link
    Mannequin

    timgraham mannequin commented Nov 14, 2016

    Hi, this broke a couple tests with Django because it's passing number as a float rather than an integer. For example:

    ======================================================================
    ERROR: test_localized_formats (template_tests.filter_tests.test_filesizeformat.FunctionTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/tim/code/django/tests/template_tests/filter_tests/test_filesizeformat.py", line 28, in test_localized_formats
        self.assertEqual(filesizeformat(1023), '1023\xa0Bytes')
      File "/home/tim/code/django/django/template/defaultfilters.py", line 895, in filesizeformat
        value = ungettext("%(size)d byte", "%(size)d bytes", bytes_) % {'size': bytes_}
      File "/home/tim/code/django/django/utils/translation/__init__.py", line 91, in ungettext
        return _trans.ungettext(singular, plural, number)
      File "/home/tim/code/django/django/utils/translation/trans_real.py", line 385, in ngettext
        return do_ntranslate(singular, plural, number, 'ngettext')
      File "/home/tim/code/django/django/utils/translation/trans_real.py", line 372, in do_ntranslate
        return getattr(t, translation_function)(singular, plural, number)
      File "/home/tim/code/cpython/Lib/gettext.py", line 441, in ngettext
        tmsg = self._catalog[(msgid1, self.plural(n))]
      File "<string>", line 4, in func
    ValueError: Plural value must be an integer, got 1023.0.

    Can you advise if this patch could be adapted or if Django should adapt?

    By the way, I used this patch to get a more useful error message:

    diff --git a/Lib/gettext.py b/Lib/gettext.py
    index 7032efa..2076a3f 100644
    --- a/Lib/gettext.py
    +++ b/Lib/gettext.py
    @@ -185,7 +185,7 @@ def c2py(plural):
             exec('''if True:
                 def func(n):
                     if not isinstance(n, int):
    -                    raise ValueError('Plural value must be an integer.')
    +                    raise ValueError('Plural value must be an integer, got %%s.
                     return int(%s)
                 ''' % result, ns)
             return ns['func']

    @zhangyangyu
    Copy link
    Member

    GNU gettext only allows the plural value(n) to be an integer(unsigned long int). Django deliberately converts the value to long in filesizeformat. Any reason not to use int?

    @serhiy-storchaka
    Copy link
    Member

    Proposed patch makes gettext accepting non-integer numbers. But DeprecationWarning is emitted if this number have fractional part (because in this case the result of current code can be different from the result of old code). I think Django tests should be passed without errors and warnings.

    In future versions (probably in 3.7) this warning will be replaced with TypeError or ValueError, and new warning will be added for all non-integer numbers.

    @16418c89-2ec4-4af8-8812-31de79becc13
    Copy link
    Mannequin

    timgraham mannequin commented Nov 14, 2016

    Thanks, that does fix that first test. There is one more that still fails:

    $ python -Wall tests/runtests.py humanize_tests.tests.HumanizeTests.test_i18n_intword
    Testing against Django installed in '/home/tim/code/django/django' with up to 3 processes
    Creating test database for alias 'default'...
    Creating test database for alias 'other'...
    /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
      tmsg = self._catalog[(msgid1, self.plural(n))]
    /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
      tmsg = self._catalog[(msgid1, self.plural(n))]
    F

    ======================================================================
    FAIL: test_i18n_intword (humanize_tests.tests.HumanizeTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/tim/code/django/tests/humanize_tests/tests.py", line 133, in test_i18n_intword
        self.humanize_tester(test_list, result_list, 'intword')
      File "/home/tim/code/django/tests/humanize_tests/tests.py", line 39, in humanize_tester
        msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
    AssertionError: '1,2 Million' != '1,2 Millionen' : intword test failed, produced '1,2 Million', should've produced '1,2 Millionen'

    I think the relevant code is https://github.com/django/django/blob/cbae4d31847d75d889815bfe7c04af035f45e28d/django/contrib/humanize/templatetags/humanize.py#L60-L63. I'm not too familiar with this code, but I'll try to get a better explanation if it's not clear to you.

    @serhiy-storchaka
    Copy link
    Member

    What about following patch?

    @16418c89-2ec4-4af8-8812-31de79becc13
    Copy link
    Mannequin

    timgraham mannequin commented Nov 14, 2016

    Yes, that fixes the second test. Current warning (with stacklevel=3):

    /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.29
    tmsg = self._catalog[(msgid1, self.plural(n))]

    Possibly the stacklevel should instead be 4:

    /home/tim/code/django/django/utils/translation/trans_real.py:373: DeprecationWarning: Plural value must be an integer, got 1.29
    return getattr(t, translation_function)(singular, plural, number)

    @1762cc99-3127-4a62-9baf-30c3d0f51ef7
    Copy link
    Mannequin

    python-dev mannequin commented Nov 14, 2016

    New changeset 5b6cde995b3a by Serhiy Storchaka in branch '3.3':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/5b6cde995b3a

    New changeset 6ca91a14a555 by Serhiy Storchaka in branch '2.7':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/6ca91a14a555

    New changeset f78a05cda5aa by Serhiy Storchaka in branch '3.4':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/f78a05cda5aa

    New changeset 6a2754055ff8 by Serhiy Storchaka in branch '3.5':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/6a2754055ff8

    New changeset 4c201c65ce5d by Serhiy Storchaka in branch '3.6':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/4c201c65ce5d

    New changeset d0efb8532589 by Serhiy Storchaka in branch 'default':
    Issue bpo-28563: Make plural form selection more lenient and accepting
    https://hg.python.org/cpython/rev/d0efb8532589

    @serhiy-storchaka
    Copy link
    Member

    Yes, you are right, the stacklevel should be 4. Thank you for your report and testing Tim.

    Committed the patch without any deprecation. Will add a deprecation in separate issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 only security fixes deferred-blocker stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants