Commit

Merge branch '2.11-authorization' into 2.11
Gallaecio committed Feb 14, 2024
2 parents a55e933 + 1c4e932 commit 5bcb8fd
Showing 3 changed files with 65 additions and 3 deletions.
27 changes: 26 additions & 1 deletion docs/news.rst
@@ -5,15 +5,26 @@ Release notes

 .. _release-2.11.1:

-Scrapy 2.11.1 (YYYY-MM-DD)
+Scrapy 2.11.1 (unreleased)
 --------------------------

 Highlights:

+- Security bug fixes.
+
 - Support for Twisted >= 23.8.0.

 - Documentation improvements.

+Security bug fixes
+~~~~~~~~~~~~~~~~~~
+
+- The ``Authorization`` header is now dropped on redirects to a different
+  domain. Please, see the `cw9j-q3vf-hrrv security advisory`_ for more
+  information.
+
+.. _cw9j-q3vf-hrrv security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-cw9j-q3vf-hrrv
+
 Modified requirements
 ~~~~~~~~~~~~~~~~~~~~~

@@ -61,6 +72,7 @@ Quality assurance

- Fixed a test issue on PyPy 7.3.14. (:issue:`6204`, :issue:`6205`)


.. _release-2.11.0:

Scrapy 2.11.0 (2023-09-18)
@@ -2929,6 +2941,19 @@ affect subclasses:

 (:issue:`3884`)

+.. _release-1.8.4:
+
+Scrapy 1.8.4 (unreleased)
+-------------------------
+
+**Security bug fixes:**
+
+- The ``Authorization`` header is now dropped on redirects to a different
+  domain. Please, see the `cw9j-q3vf-hrrv security advisory`_ for more
+  information.
+
+.. _cw9j-q3vf-hrrv security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-cw9j-q3vf-hrrv
+

.. _release-1.8.3:

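The release notes speak of redirects "to a different domain". Going by the redirect.py change in this commit, the check compares the `netloc` component of the source and target URLs. The sketch below illustrates what that comparison treats as different. It uses the standard library's `urlparse` rather than Scrapy's `urlparse_cached`, and `is_cross_netloc` is a hypothetical helper for illustration, not part of Scrapy.

```python
from urllib.parse import urlparse


def is_cross_netloc(source_url: str, target_url: str) -> bool:
    """Return True when the two URLs have different netloc components.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    return urlparse(source_url).netloc != urlparse(target_url).netloc


# Same host, different path: the netloc is unchanged.
print(is_cross_netloc("https://example.com", "https://example.com/a"))
# Different registered domain: the netloc differs.
print(is_cross_netloc("https://example.com", "https://example.org/a"))
# A subdomain or an explicit port also changes the netloc.
print(is_cross_netloc("https://example.com", "https://www.example.com/a"))
print(is_cross_netloc("https://example.com", "https://example.com:8443/a"))
```

Under this comparison a change of scheme alone (for example `https` to `http` on the same host) does not count as a different netloc, since the scheme is not part of `netloc`.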
10 changes: 8 additions & 2 deletions scrapy/downloadermiddlewares/redirect.py
@@ -17,11 +17,17 @@ def _build_redirect_request(source_request, *, url, **kwargs):
         **kwargs,
         cookies=None,
     )
-    if "Cookie" in redirect_request.headers:
+    has_cookie_header = "Cookie" in redirect_request.headers
+    has_authorization_header = "Authorization" in redirect_request.headers
+    if has_cookie_header or has_authorization_header:
         source_request_netloc = urlparse_cached(source_request).netloc
         redirect_request_netloc = urlparse_cached(redirect_request).netloc
         if source_request_netloc != redirect_request_netloc:
-            del redirect_request.headers["Cookie"]
+            if has_cookie_header:
+                del redirect_request.headers["Cookie"]
+            # https://fetch.spec.whatwg.org/#ref-for-cors-non-wildcard-request-header-name
+            if has_authorization_header:
+                del redirect_request.headers["Authorization"]
     return redirect_request


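The logic of the redirect.py change can be sketched outside Scrapy with a plain dict and the standard library. This is a minimal illustration under stated assumptions, not the middleware itself: `headers_for_redirect` and `SENSITIVE_HEADERS` are hypothetical names, and `urlparse` stands in for Scrapy's `urlparse_cached`.

```python
from urllib.parse import urlparse

# Header names treated as credentials in this sketch, matching the two
# headers handled by the commit.
SENSITIVE_HEADERS = ("Cookie", "Authorization")


def headers_for_redirect(headers: dict, source_url: str, target_url: str) -> dict:
    """Return a copy of ``headers`` to send when following a redirect.

    Hypothetical helper, not Scrapy API. Unlike Scrapy's ``Headers`` class,
    a plain dict is case-sensitive, so names must match exactly.
    """
    redirected = dict(headers)
    # Credentials are only dropped when the redirect leaves the source netloc.
    if urlparse(source_url).netloc != urlparse(target_url).netloc:
        for name in SENSITIVE_HEADERS:
            redirected.pop(name, None)
    return redirected


original = {"Cookie": "a=b", "Authorization": "a", "A": "B"}
# Redirect within the same netloc: all headers are kept.
print(headers_for_redirect(original, "https://example.com", "https://example.com/a"))
# Redirect to another netloc: only the non-credential header remains.
print(headers_for_redirect(original, "https://example.com", "https://example.org/a"))
```

The inputs mirror the ones used by `test_cross_domain_header_dropping` in this commit, so the two calls correspond to its internal and external redirect cases.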
31 changes: 31 additions & 0 deletions tests/test_downloadermiddleware_redirect.py
@@ -247,6 +247,37 @@ def test_utf8_location(self):
         perc_encoded_utf8_url = "http://scrapytest.org/a%C3%A7%C3%A3o"
         self.assertEqual(perc_encoded_utf8_url, req_result.url)

+    def test_cross_domain_header_dropping(self):
+        safe_headers = {"A": "B"}
+        original_request = Request(
+            "https://example.com",
+            headers={"Cookie": "a=b", "Authorization": "a", **safe_headers},
+        )
+
+        internal_response = Response(
+            "https://example.com",
+            headers={"Location": "https://example.com/a"},
+            status=301,
+        )
+        internal_redirect_request = self.mw.process_response(
+            original_request, internal_response, self.spider
+        )
+        self.assertIsInstance(internal_redirect_request, Request)
+        self.assertEqual(original_request.headers, internal_redirect_request.headers)
+
+        external_response = Response(
+            "https://example.com",
+            headers={"Location": "https://example.org/a"},
+            status=301,
+        )
+        external_redirect_request = self.mw.process_response(
+            original_request, external_response, self.spider
+        )
+        self.assertIsInstance(external_redirect_request, Request)
+        self.assertEqual(
+            safe_headers, external_redirect_request.headers.to_unicode_dict()
+        )
+

 class MetaRefreshMiddlewareTest(unittest.TestCase):
     def setUp(self):
