{{ message }}
/ cpython Public

# CVE-2020-10735: Prevent DoS by large int<->str conversions#95778

Closed
opened this issue Aug 8, 2022 · 23 comments
Closed

# CVE-2020-10735: Prevent DoS by large int<->str conversions #95778

opened this issue Aug 8, 2022 · 23 comments
Assignees
Labels
3.7 3.8 3.9 3.10 3.11 3.12 type-bug An unexpected behavior, bug, or error type-feature A feature request or enhancement type-security A security issue

## Problem

A Denial Of Service (DoS) issue was identified in CPython because we use binary bignum’s for our int implementation. A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.

It is quite common for Python code implementing network protocols and data serialization to do int(untrusted_string_or_bytes_value) on input to get a numeric value, without having limited the input length or to do log("processing thing id %s", unknowingly_huge_integer) or any similar concept to convert an int to a string without first checking its magnitude. (http, json, xmlrpc, logging, loading large values into integer via linear-time conversions such as hexadecimal stored in yaml, or anything computing larger values based on user controlled inputs… which then wind up attempting to output as decimal later on). All of these can suffer a CPU consuming DoS in the face of untrusted data.

Everyone auditing all existing code for this, adding length guards, and maintaining that practice everywhere is not feasible nor is it what we deem the vast majority of our users want to do.

This issue has been reported to the Python Security Response Team multiple times by a few different people since early 2020, most recently a few weeks ago while I was in the middle of polishing up the PR so it’d be ready before 3.11.0rc2.

## Mitigation

After discussion on the Python Security Response Team mailing list the conclusion was that we needed to limit the size of integer to string conversions for non-linear time conversions (anything not a power-of-2 base) by default. And offer the ability to configure or disable this limit.

The Python Steering Council is aware of this change and accepts it as necessary.

added type-bug An unexpected behavior, bug, or error type-security A security issue 3.11 3.10 3.9 3.8 3.7 3.12 labels Aug 8, 2022
self-assigned this Aug 8, 2022
mentioned this issue Sep 2, 2022
changed the title Placeholder issue for a specific security fix CVE-2020-10735: Prevent DoS by large int<->str conversions Sep 2, 2022
added the type-feature A feature request or enhancement label Sep 2, 2022
This was referenced Sep 2, 2022
added a commit that referenced this issue Sep 2, 2022
 gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499) 
 511ca94 
Integer to and from text conversions via CPython's bignum int type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
added a commit that referenced this issue Sep 2, 2022
 [3.11] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96500 
 f8b71da 
)

Integer to and from text conversions via CPython's bignum int type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
added a commit that referenced this issue Sep 2, 2022
 [3.10] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96501 
 8f0fa4b 
)

Integer to and from text conversions via CPython's bignum int type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).

### mgorny commented Sep 3, 2022

 Thanks a lot for doing this, and especially for the backports to older versions. It's really nice I don't have to figure them all out myself ;-).

mentioned this issue Sep 3, 2022

### mdickinson commented Sep 3, 2022 • edited

 As commented on the PR, the current code in main doesn't prevent the potential DOS for the int to str direction: the size check is only performed after the quadratic-time algorithm has run. For example: Python 3.12.0a0 (heads/main-dirty:ac4ddab405, Sep 3 2022, 16:35:07) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> n = 10**(10**7) # takes a few seconds to run >>> s = str(n) # expect instant ValueError, but takes many minutes One possible fix would be to keep the existing post-loop check (so that we get the benefit of knowing exactly what the threshold is), but also adding a cruder pre-check that would catch the majority of bad cases before the loop is entered. I think we want something like the following as a pre-check  if (size_a >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2) { PyInterpreterState *interp = _PyInterpreterState_GET(); int max_str_digits = interp->int_max_str_digits; if ((max_str_digits > 0) && (size_a >= 10 * max_str_digits / (3 * PyLong_SHIFT) + 2)) { } } Here 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2 is a constant, so in normal use we get an extra predictable(?) never-taken branch. For the logic used in computing the constant: 10/3 is an upper bound for log10(2), so if size_a >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2 then (size_a - 1) * (3 * PyLong_SHIFT) >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD so (size_a - 1) * PyLong_SHIFT >= log10(2) * _PY_LONG_MAX_STR_DIGITS_THRESHOLD so abs(a) >= PyLong_BASE ** (size_a - 1) >= 10**_PY_LONG_MAX_STR_DIGITS_THRESHOLD and similar logic applies for the size_a > 10 * max_str_digits / (3 * PyLong_SHIFT) + 2) expression within the loop. One complication: we can no longer produce an error message that says exactly how many decimal digits there were, but I think that's going to be true no matter how we fix this.

### mdickinson commented Sep 3, 2022

 I'll see if I can turn the above into a PR.

### mdickinson commented Sep 3, 2022

 BTW, did the PR authors consider using type Py_ssize_t instead of int for the new int_max_str_digits field? The max value for int could potentially be as low as 32767, which could be quite limiting. Either Py_ssize_t or a fixed-size type like int32_t or int64_t might work better.

### merwok commented Sep 3, 2022

 -X options and envvars are documented in 4 or 5 spots, I think a couple were missed by the PR. I’ll make a PR to fix this little issue.

mentioned this issue Sep 3, 2022

### gpshead commented Sep 4, 2022

 BTW, did the PR authors consider using type Py_ssize_t instead of int for the new int_max_str_digits field? The max value for int could potentially be as low as 32767, which could be quite limiting. Either Py_ssize_t or a fixed-size type like int32_t or int64_t might work better. I did consider Py_ssize_t instead of int, but I'm unaware of any platform supported by CPython with a 16-bit int.

mentioned this issue Sep 4, 2022
added a commit that referenced this issue Sep 4, 2022
 gh-95778: Correctly pre-check for int-to-str conversion (#96537) 
 b126196 
Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
mentioned this issue Sep 4, 2022
pushed a commit to miss-islington/cpython that referenced this issue Sep 4, 2022
 pythongh-95778: Correctly pre-check for int-to-str conversion (python… 
 51ba378 
…GH-96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
pushed a commit to gpshead/cpython that referenced this issue Sep 4, 2022
 [3.10] pythongh-95778: Correctly pre-check for int-to-str conversion (p… 
 8731a0b 
…ythonGH-96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
mentioned this issue Sep 4, 2022
added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
 pythongh-95778: Correctly pre-check for int-to-str conversion (python… 
 a1247fd 
…#96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
added a commit that referenced this issue Sep 4, 2022
 gh-95778: Correctly pre-check for int-to-str conversion (GH-96537) 
 8a776d1 
Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
 pythongh-95778: Correctly pre-check for int-to-str conversion (python… 
 17bd053 
…#96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
 pythongh-95778: Correctly pre-check for int-to-str conversion (python… 
 39837b6 
…#96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
added a commit that referenced this issue Sep 4, 2022
 [3.10] gh-95778: Correctly pre-check for int-to-str conversion (GH-96537 
 eace09e 
) (#96563)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
mentioned this issue Sep 4, 2022
pushed a commit that referenced this issue Sep 5, 2022
 [3.9] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96502) 
 cec1e9d 
* Correctly pre-check for int-to-str conversion (#96537)

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
mentioned this issue Sep 5, 2022
mentioned this issue Sep 5, 2022
mentioned this issue Sep 5, 2022

### adamant-pwn commented Sep 5, 2022 • edited

 Are you aware that there are algorithms to do conversions faster than quadratic time? For example, one may use divide and conquer + FFT to achieve something around O(n log^2 n) in complexity. Not as great as linear, but certainly much better than quadratic.

### gpshead commented Sep 5, 2022

 Yes, using a better algorithm is covered over in #90716.

### gpshead commented Sep 5, 2022

 But if some version of Tim's suggestions in #90716 went in, billion digit conversions might become feasible. :-) But at that point, maybe we'd be able to drop these limits entirely? Removing the limit might not be feasible as after this feature some code could come to depend on its existence indirectly to protect other things in whole systems from large values, knowingly or not. That doesn't mean we couldn't, just that it'd need consideration. We also need to consider the worst case performance of any fancier algorithm before removing the ability to limit it. Untrusted data seeking a DoS would target that. Ex: If a faster conversion algorithm is typically $O(n * log_2(n))$ but devolves into $O(n^2)$ on some values, we need to handle the worst case unexpected computation. A notebook could even be designed to avoid half the problem by having their REPL repr and storage code auto-switch to hex for large values.

mentioned this issue Sep 5, 2022
pushed a commit that referenced this issue Sep 5, 2022
 [3.8] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96503) 
 b5e331f 
* Correctly pre-check for int-to-str conversion

Converting a large enough int to a decimal string raises ValueError as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10


In GitHub markdown math-speak, writing $M$ for max_str_digits, $L$ for PyLong_SHIFT and $s$ for size_a, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>

### imachug commented Sep 5, 2022

 This is appalling. Situation: people pass untrusted data to slow built-in functions. Reasonable solution: a) patch vulnerable programs, b) optimize said functions. CPython's solution: just limit stuff. Your rationale is founded on the fact that developers constantly add bugs and fixing them is not an option. Fixing bugs is like, more than half of what developers normally do. When ReDoS was discovered, did we limit the number of iterations of the matching algorithm? No. Then why should we do the same here? There are dangerous tools, but they are often much more useful due to lack of artificial limits. When did we drop "Special cases aren't special enough to break the rules." To add insult to injury, people seem to be talking about limiting repr too. How many bugs that will cause, I can't tell: I saw people use repr(...) to generate non-Python code as a universal solution that works with both integers and strings, and I foresee problems when it starts generating non-digit strings. You're limiting integer to string conversions. How soon are you going to limit repr(list) because of the billion laughs attack? Why is _PY_LONG_MAX_STR_DIGITS_THRESHOLD even a constant? No one's going to recompile Python, and if people actually start using this security "feature", chances are it'll be enabled everywhere, and then programs that use big arithmetic and parse numbers from the user input will silently stop working. It's not even big arithmetic, really: the discussed limit to repr seems to be only 512 bit, which is much less than thousands-of-bits numbers used in cryptography right now, in Python. In fact, the problem doesn't go away if it's variable: the problem is that it's global, affecting the whole application rather than the user-facing part. When did we drop "Explicit is better than implicit?" And then, has anyone actually researched how much trusted code parses integers from untrusted sources directly? To me it seems that even ReDoS should be more popular than this. I think people seldom call int on user input directly: they either use parsing libraries like json and yaml, or (hear me out) use (web) frameworks that parse the data themselves. Either way, there's a finite number of places where checks should be added, and even if it's tens of places, it's so much better than limiting every single program, including those that don't even face the outer world, because it'll be actually configurable. I don't think I even understand why a limit is added in the first place. If Python used a quadratic algorithm for multiplication instead of FFT, that would not be considered a vulnerability but an inefficiency. There are much faster algorithms than the one used by CPython, I literally found one by googling "subquadratic base conversion". There: Richard P. Brent and Paul Zimmermann, Modern Computer Arithmetic. #90716, the issue that was tracking this, was dropped because what, developers tried to invent stuff themselves instead of comparing with state of art, and just chose the simplest option after an inevitable failure?

### gpshead commented Sep 5, 2022 • edited

 As a reminder to everybody the Python Community Code Of Conduct applies here. Closing. This is fixed. We'll open new issues for any follow up work necessary.

closed this as completed Sep 5, 2022

### stevendaprano commented Sep 6, 2022

 Everyone auditing all existing code for this, adding length guards, and maintaining that practice everywhere is not feasible nor is it what we deem the vast majority of our users want to do. If applications don't validate their data before running str <-> int conversions, I'm not entirely sure that having an exploit that allows attackers to crash the application with an unexpected ValueError instead of merely slowing it down for a few seconds is a great improvement. I'm only a naive developer but this kinda feels like the fix is worse than the problem its solving. What I am missing?

pushed a commit that referenced this issue Sep 6, 2022
 [3.7] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (GH-9… 
 15ec1af 
…6504)

Converting between int and str in bases other than 2
(binary), 4, 8 (octal), 16 (hexadecimal), or 32 such as base 10 (decimal) now
raises a ValueError if the number of digits in string form is above a
limit to avoid potential denial of service attacks due to the algorithmic
complexity. This is a mitigation for CVE-2020-10735
(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10735).

This new limit can be configured or disabled by environment variable, command
line flag, or :mod:sys APIs. See the Integer String Conversion Length
Limitation documentation.  The default limit is 4300
digits in string form.

Patch by Gregory P. Smith [Google] and Christian Heimes [Red Hat] with feedback
from Victor Stinner, Thomas Wouters, Steve Dower, Ned Deily, and Mark Dickinson.

### gpshead commented Sep 6, 2022 • edited by ambv

 Please redirect further discussion to discuss.python.org.

locked as resolved and limited conversation to collaborators Sep 6, 2022

# Further Discussion

... is taking place in discuss.python.org threads

### vstinner commented Sep 15, 2022

 There is a minor bug when parsing the -X option: #96848

### vstinner commented Sep 16, 2022

 I propose PR #96874 to mention sys.set_int_max_str_digits() in the error message.