bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler #18284

Merged
merged 2 commits into from Apr 2, 2020

vstinner
Member

@vstinner vstinner commented Jan 30, 2020

The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking.

Vulnerability reported by Matt Schwager.

https://bugs.python.org/issue39503

@vstinner
Member Author

@vstinner vstinner commented Jan 30, 2020

cc @serhiy-storchaka

@mschwager

@mschwager mschwager commented Jan 30, 2020

This fix looks good to me!

@@ -937,7 +937,7 @@ class AbstractBasicAuthHandler:

 # allow for double- and single-quoted realm values
 # (single quotes are a violation of the RFC, but appear in the wild)
-rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
+rx = re.compile('(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+'
Member

@serhiy-storchaka serhiy-storchaka Jan 30, 2020

(?:.*,)* is equivalent to (?:.*,)?.

But since this regular expression is only used with search(), (?:.*,)*[ \t]* can be removed altogether.

I'll analyze whether it is correct or there is an error in the regular expression.
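A quick way to check the equivalence claim is to put each prefix in front of the rest of the realm regex and compare what search() returns. This is a sketch: the sample headers are made up, and the `realm=...` tail is assumed to be the one in CPython's urllib/request.py at the time (the diff above shows only the first line of the regex).

```python
import re

# Assumed tail of the original AbstractBasicAuthHandler regex (scheme + realm).
TAIL = '[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\\2'

star = re.compile('(?:.*,)*' + TAIL, re.I)      # original prefix
optional = re.compile('(?:.*,)?' + TAIL, re.I)  # suggested equivalent

# Hypothetical WWW-Authenticate values used only for this comparison.
samples = [
    'Basic realm="x"',
    'basic realm="1", x, other realm="2"',
    'Basic realm="ACME Widget Store", Digest realm="other realm"',
    "Basic realm='single', a, b, c",
    'no realm here, at, all',
]

def summarize(rx, text):
    """Return (span, groups) of the first search() match, or None."""
    mo = rx.search(text)
    return None if mo is None else (mo.span(), mo.groups())

# Both prefixes can end only at the start or right after a comma,
# and both try the rightmost comma first, so the results agree.
for text in samples:
    assert summarize(star, text) == summarize(optional, text), text
```

Identical results do not mean identical cost: when the search fails, the `*` form can split the same run of commas into iterations in many different ways, which is where the exponential backtracking comes from.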

Member

@serhiy-storchaka serhiy-storchaka Jan 30, 2020

Well, I cannot say that I completely understand the code, but to make it sensible we can either:

  1. Replace rx.search() with rx.match() and replace (?:.*,)* with (?:.*,)?.

or

  2. Keep rx.search() and replace (?:.*,)* with (?:^|,).

Do not keep (?:[^,]*,)*. It is a waste of resources.

Member

@serhiy-storchaka serhiy-storchaka Jan 30, 2020

Hmm, options 1 and 2 are not equivalent if the field value contains more than one challenge. Option 2 is closer to the current behavior. But correct support for more than one challenge would require rewriting the code.

https://tools.ietf.org/html/rfc7235#section-4.1
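The difference between the two options shows up as soon as one header value carries two challenges. The sketch below assumes the same `realm=...` tail as the original regex; `realm_option1` and `realm_option2` are hypothetical helper names.

```python
import re

# Assumed tail of the original regex: scheme, whitespace, realm=...
TAIL = '[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\\2'

# Option 1: anchored match() with an optional "anything up to a comma" prefix.
option1 = re.compile('(?:.*,)?' + TAIL, re.I)
# Option 2: unanchored search(); a challenge starts at ^ or right after a comma.
option2 = re.compile('(?:^|,)' + TAIL, re.I)

def realm_option1(header):
    mo = option1.match(header)
    return mo.group(3) if mo else None

def realm_option2(header):
    mo = option2.search(header)
    return mo.group(3) if mo else None

two_challenges = 'Basic realm="first", Basic realm="second"'
print(realm_option1(two_challenges))  # second: the greedy prefix jumps to the last comma
print(realm_option2(two_challenges))  # first: the leftmost challenge wins
```

With a single challenge both options return the same realm; they only diverge on multi-challenge values.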

The current patch seems to take O(n^3) time to evaluate: much better than O(2^n), but still very slow. With 2000 commas it takes about a minute to evaluate; with 65000 it takes much, much longer. Testing using the code from here gave the following (commas, seconds) values:
[(100, 0.124), (250, 0.261), (500, 0.923), (750, 2.85), (1000, 6.433), (1250, 12.608), (1500, 21.576), (2000, 50.751)]
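The measurement script linked in this comment is not reproduced here. A minimal harness along the same lines might look like the following; the input shape (`'Basic '` followed by commas and no `realm=`) and the helper names are assumptions, not the script that produced the numbers above.

```python
import re
import time

# Regex from the first version of this PR (the "[^,]*," prefix); the
# realm=... continuation is assumed from CPython's urllib/request.py.
PATCHED = re.compile('(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+'
                     'realm=(["\']?)([^"\']*)\\2', re.I)

def time_search(rx, n_commas):
    """Time one rx.search() on a hypothetical header with n_commas commas."""
    header = 'Basic ' + ',' * n_commas  # assumed worst-case-style input
    start = time.perf_counter()
    result = rx.search(header)
    elapsed = time.perf_counter() - start
    return result, elapsed

def measure(rx, sizes):
    """Return (n_commas, seconds) pairs, in the same shape as the list above."""
    rows = []
    for n in sizes:
        result, elapsed = time_search(rx, n)
        # There is no "realm=" anywhere, so every search must fail,
        # forcing the engine through all of its backtracking.
        assert result is None
        rows.append((n, round(elapsed, 3)))
    return rows

if __name__ == '__main__':
    # Keep sizes small: the cost grows roughly with the cube of the comma count.
    print(measure(PATCHED, [50, 100, 200]))
```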

Member

@serhiy-storchaka serhiy-storchaka Mar 24, 2020

I was wrong, the first option is equivalent to the current behavior (returns the last realm).

@serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Jan 30, 2020

Would be nice to add a test.

@mschwager

@mschwager mschwager commented Jan 30, 2020

Just a heads up, CVE-2020-8492 has been created. I'm not sure how Python CVEs are generally tracked, but it may be useful to include the information on the bug tracker issue 👍

@vstinner
Member Author

@vstinner vstinner commented Feb 3, 2020

Not only (?:.*,)* is inefficient, but it's also useless. It can be removed. Simplified example:

>>> import re
# reference
>>> all(re.search("(?:a,)*b", text) for text in ("a,b", "a,a,b", "b"))
True
# only match last ","
>>> all(re.search("(?:,)?b", text) for text in ("a,b", "a,a,b", "b"))
True
# don't match the prefix
>>> all(re.search("b", text) for text in ("a,b", "a,a,b", "b"))
True

We can either simplify the regex to prevent the "catastrophic backtracking" or even remove the prefix.

UPDATE: Oops, my example was wrong, I fixed it :-)

@bcaller
Contributor

@bcaller bcaller commented Feb 4, 2020

Does this also fix https://bugs.python.org/issue38826 ?

@encukou
Member

@encukou encukou commented Mar 24, 2020

> Not only (?:.*,)* is inefficient, but it's also useless. It can be removed.

It seems to me that the (?:.*,)* is there so that the last realm is selected, as mentioned in the comment above the regex. See:

>>> header = 'basic realm="1", x, other realm="2"'
>>> re.search("(?:.*,)*[ \t]*([^ \t]+)[ \t]+", header).group(1)
'other'
>>> re.search("[ \t]*([^ \t]+)[ \t]+", header).group(1)
'basic'
>>> 

I don't see a way to fix this by just changing the regex while preserving the previous behavior. Then again, corner cases of the previous behavior might be wrong.

@vstinner vstinner changed the title bpo-39503: Fix urllib basic auth regex bpo-39503: CVE-2020-8492: Fix urllib basic auth regex Mar 25, 2020
@vstinner vstinner force-pushed the urllib_basic_auth_regex branch from 7c4bae4 to cc59fb1 Mar 25, 2020
@vstinner
Member Author

@vstinner vstinner commented Mar 25, 2020

I rebased my PR and added more tests.

@vstinner
Member Author

@vstinner vstinner commented Mar 25, 2020

@serhiy-storchaka: I don't understand whether you consider the fix wrong or merely insufficient (that is, it remains possible to cause a denial of service).

@serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Mar 25, 2020

@vstinner: your fix helps, but we can do better. It has cubic complexity; my suggestion has quadratic complexity. It is possible to implement an algorithm with linear complexity, but not with such small changes.

@davidfraser

@davidfraser davidfraser commented Mar 25, 2020

I also added some tests and implemented a regex with lower complexity; see master...davidfraser:urllib_basic_auth_regex

@davidfraser

@davidfraser davidfraser commented Mar 25, 2020

It's worth seeing how the results of this regex are actually used

Note my comment in f79379c:

Note that the original regex was roughly O(2**n)
The search for commas and spaces is unnecessary
(and insufficient to ensure that this starts a new scheme).
Replace with a simpler search for an initial scheme, since
we already check that the text starts with 'basic'.

@vstinner
Member Author

@vstinner vstinner commented Mar 25, 2020

WWW-Authenticate is badly specified. The RFC doesn't specify if a single HTTP header can contain multiple challenges.

I found these resources:

A variant is to send multiple WWW-Authenticate headers, with one challenge per header.

By the way, AbstractBasicAuthHandler code contains this interesting comment:

        # XXX could be multiple headers
        authreq = headers.get(authreq, None)
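The XXX comment matters because `get()` on an email-style message returns only the first field with a given name. Assuming `headers` is an `http.client.HTTPMessage` (a subclass of `email.message.Message`), the difference between `get()` and `get_all()` looks like this:

```python
from http.client import HTTPMessage

msg = HTTPMessage()
# Assigning the same header name twice appends a second field; it does not overwrite.
msg['WWW-Authenticate'] = 'Digest realm="digest realm"'
msg['WWW-Authenticate'] = 'Basic realm="basic realm"'

print(msg.get('WWW-Authenticate'))      # only the first field: the Digest challenge
print(msg.get_all('WWW-Authenticate'))  # both fields, in the order they were added
```

Note that `get_all()` returns `None`, not an empty list, when the header is absent.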

Current behavior:

  • Even if there are multiple WWW-Authenticate headers, only the first one is parsed. That's a bug: the Basic challenge may be in a later WWW-Authenticate header. Moreover, there may be two Basic challenges with two different realms.

  • The scheme is parsed with scheme = str.split()[0]; if scheme.lower() != "basic", a ValueError is raised.

  • The regex is used to parse the realm.

  • If the header contains multiple realm=xxx parameters, the last realm is used, even if it belongs to another challenge using a different scheme. IMO it's a bug: we should not check the scheme at the beginning of the header and then use the last realm at the end of the string.

For example, the header WWW-Authenticate: Basic realm="ACME Widget Store", Digest realm="other realm" is accepted since it starts with Basic, but the extracted realm is "other realm": the wrong realm is used.
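The two-step behavior described above can be reproduced in a few lines. `old_realm` is a hypothetical helper that mimics the pre-fix steps (with a simplified error message); `OLD_RX` is the regex removed in the diff earlier in this thread, with its `realm=...` continuation assumed from CPython's urllib/request.py of that era.

```python
import re

# Pre-fix regex: greedy prefix up to the LAST comma, then scheme and realm.
OLD_RX = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                    'realm=(["\']?)([^"\']*)\\2', re.I)

def old_realm(authreq):
    """Mimic the pre-fix logic: check the first word, then search for a realm."""
    scheme = authreq.split()[0]
    if scheme.lower() != 'basic':
        raise ValueError('unsupported scheme: %r' % scheme)
    mo = OLD_RX.search(authreq)
    # groups() is (scheme, quote, realm); only the realm is used.
    return mo.group(3) if mo else None

header = 'Basic realm="ACME Widget Store", Digest realm="other realm"'
print(old_realm(header))  # other realm, which belongs to the Digest challenge
```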

@vstinner vstinner force-pushed the urllib_basic_auth_regex branch from cc59fb1 to 477be6e Mar 25, 2020
@vstinner
Member Author

@vstinner vstinner commented Mar 25, 2020

@serhiy-storchaka:

> 2. Keep rx.search() and replace (?:.*,)* with (?:^|,).

Sorry, I misunderstood this proposal. In fact, I proposed something similar, except that I missed the "start of the string" (regex ^) case. I modified my PR to use this approach. I also added comments to the regex to explain it.

I decided to write a way more complex change to not only fix the vulnerability, but also fix the parser since it didn't look possible to fix the regex without changing the behavior. Currently, the code uses the last realm if there are multiple challenges per header. I fixed this behavior to use the realm of the first Basic challenge.

I also modified the code to support multiple headers, instead of only parsing the first one.
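A simplified sketch of the new behavior, under stated assumptions: challenges are found with the `(?:^|,)` prefix discussed above, every header value is scanned, and the realm of the first Basic challenge wins. The function names are hypothetical and this is not the code that was merged.

```python
import re

# A challenge starts at the beginning of the string or right after a comma.
CHALLENGE_RX = re.compile('(?:^|,)'    # start of the string or ','
                          '[ \t]*'     # optional whitespace
                          '([^ \t]+)'  # scheme, e.g. "Basic"
                          '[ \t]+'     # mandatory whitespace
                          'realm=(["\']?)([^"\']*)\\2',
                          re.I)

def iter_challenges(header):
    """Yield (scheme, realm) for each challenge with a realm in one header value."""
    for mo in CHALLENGE_RX.finditer(header):
        scheme, _quote, realm = mo.groups()
        yield scheme, realm

def first_basic_realm(header_values):
    """Return the realm of the first Basic challenge across all header values.

    header_values is a list of WWW-Authenticate values; returns None if no
    Basic challenge with a realm is found.
    """
    for header in header_values:
        for scheme, realm in iter_challenges(header):
            if scheme.lower() == 'basic':
                return realm
    return None
```

For a response message, something like `first_basic_realm(msg.get_all('WWW-Authenticate') or [])` covers the case where the header is missing.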

@vstinner vstinner force-pushed the urllib_basic_auth_regex branch 2 times, most recently from 137cf0b to ee8ff4f Mar 25, 2020
@vstinner vstinner changed the title bpo-39503: CVE-2020-8492: Fix urllib basic auth regex bpo-39503: CVE-2020-8492: Fix urllib AbstractBasicAuthHandler Mar 25, 2020
@vstinner vstinner force-pushed the urllib_basic_auth_regex branch from ee8ff4f to 3c7dae4 Mar 25, 2020
@vstinner vstinner changed the title bpo-39503: CVE-2020-8492: Fix urllib AbstractBasicAuthHandler bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler Mar 25, 2020
@vstinner vstinner force-pushed the urllib_basic_auth_regex branch from 3c7dae4 to c461645 Mar 25, 2020
@vstinner
Member Author

@vstinner vstinner commented Mar 25, 2020

Ok, the PR is now ready for a new round of reviews. I fixed the vulnerability, but I also changed the code to parse all WWW-Authenticate HTTP headers and accept multiple challenges per header.

The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking. Vulnerability reported by Ben Caller
and Matt Schwager.

AbstractBasicAuthHandler of urllib.request now parses all
WWW-Authenticate HTTP headers and accepts multiple challenges per
header: use the realm of the first Basic challenge.

Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-bot

@bedevere-bot bedevere-bot commented Apr 2, 2020

GH-19292 is a backport of this pull request to the 3.7 branch.

@bedevere-bot

@bedevere-bot bedevere-bot commented Apr 2, 2020

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x SLES 3.x has failed when building commit 0b297d4.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/6/builds/675) and take a look at the build logs.
  4. Check if the failure is related to this commit (0b297d4) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/6/builds/675

Failed tests:

  • test_imaplib

Failed subtests:

  • test_logout - test.test_imaplib.RemoteIMAP_STARTTLSTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

404 tests OK.

10 slowest tests:

  • test_concurrent_futures: 3 min 8 sec
  • test_multiprocessing_spawn: 2 min 44 sec
  • test_tokenize: 1 min 47 sec
  • test_multiprocessing_forkserver: 1 min 39 sec
  • test_unparse: 1 min 26 sec
  • test_multiprocessing_fork: 1 min 24 sec
  • test_capi: 1 min 21 sec
  • test_asyncio: 1 min 1 sec
  • test_lib2to3: 56.4 sec
  • test_signal: 51.2 sec

1 test failed:
test_imaplib

15 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_ossaudiodev
test_readline test_sqlite test_startfile test_tix test_tk
test_ttk_guionly test_winconsoleio test_winreg test_winsound
test_zipfile64

1 re-run test:
test_imaplib

Total duration: 7 min 14 sec

Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 989, in _command
    self.send(data + CRLF)
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 331, in send
    self.sock.sendall(data)
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/ssl.py", line 1173, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe


Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/test/test_imaplib.py", line 951, in tearDown
    self.server.logout()
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 641, in logout
    typ, dat = self._simple_command('LOGOUT')
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 1213, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 991, in _command
    raise self.abort('socket error: %s' % val)
imaplib.IMAP4.abort: socket error: [Errno 32] Broken pipe

@miss-islington
Contributor

@miss-islington miss-islington commented Apr 2, 2020

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒🤖

@miss-islington
Contributor

@miss-islington miss-islington commented Apr 2, 2020

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 2, 2020
(cherry picked from commit 0b297d4)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot

@bedevere-bot bedevere-bot commented Apr 2, 2020

GH-19296 is a backport of this pull request to the 3.8 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 2, 2020
(cherry picked from commit 0b297d4)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot

@bedevere-bot bedevere-bot commented Apr 2, 2020

GH-19297 is a backport of this pull request to the 3.7 branch.

vstinner pushed a commit that referenced this issue Apr 2, 2020
…-19296)

Co-authored-by: Victor Stinner <vstinner@python.org>

(cherry picked from commit 0b297d4)
vstinner pushed a commit that referenced this issue Apr 2, 2020
…-19297)

Co-authored-by: Victor Stinner <vstinner@python.org>

(cherry picked from commit 0b297d4)
vstinner added a commit to vstinner/cpython that referenced this issue Apr 2, 2020
(cherry picked from commit 0b297d4)
vstinner added a commit to vstinner/cpython that referenced this issue Apr 3, 2020
(cherry picked from commit 0b297d4)
vstinner added a commit to vstinner/cpython that referenced this issue Apr 3, 2020
(cherry picked from commit 0b297d4)
ned-deily pushed a commit that referenced this issue Apr 3, 2020
…-19304)

(cherry picked from commit 0b297d4)
vegerot pushed a commit to vegerot/cpython that referenced this issue Jun 10, 2020
vegerot pushed a commit to vegerot/cpython that referenced this issue Jun 10, 2020
larryhastings pushed a commit that referenced this issue Jun 20, 2020
…9305)

gmelikov pushed a commit to gmelikov/cpython that referenced this issue Aug 22, 2020
chrisburr pushed a commit to chrisburr/cpython that referenced this issue Dec 9, 2020
10 participants