-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CVE-2021-3733: ReDoS in urllib.request #87241
Comments
Hi, I find this regex '(?:^|,)[ \t]*([^ \\t]+)[ \t]+' may be stucked by input. The vulnerable regex is located in Line 946 in 5c5a938
The ReDOS vulnerability of the regex is mainly due to the sub-pattern ',([^ \\t]+)' and can be exploited with the following string You can execute redos_python.py to reproduce the ReDos vulnerability. I am willing to suggest that you replace '(?:^|,)[ \t]([^ \\t]+)[ \t]+' with '(?:^|,)[ \t]([^ \\t,]+)[ \t]+' Looking forward for your response! Best, |
I agree. There is no catastrophic backtracking here (it was fixed in bpo-39503), but the complexity of matching the regular expression is linear. Searching the pattern in a sequence of commas has quadratic complexity, because every step has linear complexity and we advance only one character at every attempt. The proposed solution looks correct to me and fixes the issue. Yeting Li, do you mind to create a pull request for it? I can do it myself, but since you have found the problem and the solution, it would be better if the commit be attributed to you. |
+1. The suggested fix looks good to me. |
Thank you for your quick reply! I agree. Catastrophic backtracking is typically regarded as a regex with exponential worst-case matching time. Besides regexes with exponential worst-case time complexity, ReDoS also includes ones with other super-linear (e.g., quadratic) worst-case time complexity. Thanks again for your reply, I'm trying to create a pull request for it. |
I see that you attached a redos_python.py benchmark (which looks like a benchmark that I wrote recently ;-)) but you didn't give results. Can you please show that your fix is effective to avoid catastrophic performances? Is this issue related to the CVE-2020-8492? Is it the same issue or is it different? |
Sorry for the delay. I analyzed the performance of the current version '(?:^|,)[ \t]([^ \\t]+)[ \t]+' and the fixed version '(?:^|,)[ \t]([^ \\t,]+)[ \t]+'. I ran the following HTTP header ten times: header = '' + ',' * (10 ** 5) The current version takes about 139.178s-140.946s, while the repaired version takes about 0.006s. You can analyze them with the code below. from time import perf_counter
for _ in range(0, 10):
BEGIN = perf_counter()
header = repeat_10_5_simple
headers = Headers(header)
handler.http_error_auth_reqed("WWW-Authenticate", host, req, Headers(header))
DURATION = perf_counter() - BEGIN
print(f"took {DURATION} seconds!") For CVE-2020-8492, it is the backtracking performance caused by some ambiguity during the matching, and this issue is caused by the regex engine constantly moves the matching regex across the malicious string that does not have a match for the regex. Because the locations of the vulnerabilities are the same, so I refer to your code. Thanks for the code ;-)! |
I guess that a more generic protection against future attacks would be to limit the maximum length of a HTTP header. 100,000 characters for a HTTP Basic authentification does not sound reasonable. But for now, let's fix the regex. |
For a regex has polynomial worst-case complexity, limiting the maximum input length is indeed a very effective method. As shown below, as the input length becomes smaller, the matching time becomes significantly smaller. header = '' + ',' * (10 ** 4) 1.617s |
redos_python2.py: Updated benchmark. I confirm that PR 24391 fix a worst case performance, starting with 100 characters. Since the complexity is quadratic, strings longer 10^4 characters are likely to hang a client for several minutes. == Reference (vulnerable) == simple: Mean +- std dev: 2.10 us +- 0.05 us == With the PR 24391 fix == simple: Mean +- std dev: 2.15 us +- 0.15 us == Comparison == simple: Mean +- std dev: [ref] 2.10 us +- 0.05 us -> [fix] 2.15 us +- 0.15 us: 1.02x slower Geometric mean: 15.59x faster |
RedHat has now assigned CVE-2021-3733 to this security bug. |
I created https://python-security.readthedocs.io/vuln/urllib-basic-auth-regex2.html to track this vulnerability. |
is there any affect of CVE-2021-3733 this security bug on python 2.7.18? |
@OmkarPatil11 Unfortunately yes. Here how a fix for 3.11 looks like: class AbstractBasicAuthHandler:
# XXX this allows for multiple auth-schemes, but will stupidly pick
# the last one with a realm specified.
# allow for double- and single-quoted realm values
# (single quotes are a violation of the RFC, but appear in the wild)
rx = re.compile('(?:^|,)' # start of the string or ','
'[ \t]*' # optional whitespaces
- '([^ \t]+)' # scheme like "Basic"
+ '([^ \t,]+)' # scheme like "Basic"
'[ \t]+' # mandatory whitespaces
# realm=xxx
# realm='xxx'
# realm="xxx"
'realm=(["\']?)([^"\']*)\\2',
re.I)
# XXX could pre-emptively send auth info already accepted (RFC 2617,
# end of section 2, and section 1.2 immediately after "credentials"
# production). And here is the same fragment from v2.7.18, unfixed: Lines 852 to 864 in 8d21aa2
|
@OmkarPatil11 , Python 2.7.x is not supported. It doesn't receive security fixes too. Time to upgrade applications to Python 3.8 + |
Contact your Linux distribution to get a fix on Python 2.7. On Fedora, there are:
|
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: