Skip to content

Commit

Permalink
Merge pull request from GHSA-9x8m-2xpf-crp3
Browse files Browse the repository at this point in the history
* Enforce matching proxy request meta and Proxy-Authorization header

* Cover proxy credential security fix in the release notes

* Remove extra empty line

* Reword the security issue description

* Address scenario where Proxy-Authorization is unexpectedly removed by a prior middleware

* Set the release date of Scrapy 2.6.2 and 1.8.3
  • Loading branch information
Gallaecio committed Jul 25, 2022
1 parent 4205609 commit af7dd16
Show file tree
Hide file tree
Showing 3 changed files with 438 additions and 35 deletions.
106 changes: 103 additions & 3 deletions docs/news.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,57 @@ Release notes

.. _release-2.6.2:

Scrapy 2.6.2 (to be determined)
-------------------------------
Scrapy 2.6.2 (2022-07-25)
-------------------------

**Security bug fix:**

- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
processes a request with :reqmeta:`proxy` metadata, and that
:reqmeta:`proxy` metadata includes proxy credentials,
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
the ``Proxy-Authentication`` header, but only if that header is not already
set.

There are third-party proxy-rotation downloader middlewares that set
different :reqmeta:`proxy` metadata every time they process a request.

Because of request retries and redirects, the same request can be processed
by downloader middlewares more than once, including both
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
any third-party proxy-rotation downloader middleware.

These third-party proxy-rotation downloader middlewares could change the
:reqmeta:`proxy` metadata of a request to a new value, but fail to remove
the ``Proxy-Authentication`` header from the previous value of the
:reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
to a different proxy.

To prevent the unintended leaking of proxy credentials, the behavior of
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
as follows when processing a request:

Fixes additional regressions introduced in 2.6.0:
- If the request being processed defines :reqmeta:`proxy` metadata that
includes credentials, the ``Proxy-Authorization`` header is always
updated to feature those credentials.

- If the request being processed defines :reqmeta:`proxy` metadata
without credentials, the ``Proxy-Authorization`` header is removed
*unless* it was originally defined for the same proxy URL.

To remove proxy credentials while keeping the same proxy URL, remove
the ``Proxy-Authorization`` header.

- If the request has no :reqmeta:`proxy` metadata, or that metadata is a
falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
removed.

It is no longer possible to set a proxy URL through the
:reqmeta:`proxy` metadata but set the credentials through the
``Proxy-Authorization`` header. Set proxy credentials through the
:reqmeta:`proxy` metadata instead.

Also fixes the following regressions introduced in 2.6.0:

- :class:`~scrapy.crawler.CrawlerProcess` supports again crawling multiple
spiders (:issue:`5435`, :issue:`5436`)
Expand Down Expand Up @@ -1925,6 +1972,59 @@ affect subclasses:
(:issue:`3884`)


.. _release-1.8.3:

Scrapy 1.8.3 (2022-07-25)
-------------------------

**Security bug fix:**

- When :class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware`
processes a request with :reqmeta:`proxy` metadata, and that
:reqmeta:`proxy` metadata includes proxy credentials,
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` sets
the ``Proxy-Authentication`` header, but only if that header is not already
set.

There are third-party proxy-rotation downloader middlewares that set
different :reqmeta:`proxy` metadata every time they process a request.

Because of request retries and redirects, the same request can be processed
by downloader middlewares more than once, including both
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` and
any third-party proxy-rotation downloader middleware.

These third-party proxy-rotation downloader middlewares could change the
:reqmeta:`proxy` metadata of a request to a new value, but fail to remove
the ``Proxy-Authentication`` header from the previous value of the
:reqmeta:`proxy` metadata, causing the credentials of one proxy to be sent
to a different proxy.

To prevent the unintended leaking of proxy credentials, the behavior of
:class:`~scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware` is now
as follows when processing a request:

- If the request being processed defines :reqmeta:`proxy` metadata that
includes credentials, the ``Proxy-Authorization`` header is always
updated to feature those credentials.

- If the request being processed defines :reqmeta:`proxy` metadata
without credentials, the ``Proxy-Authorization`` header is removed
*unless* it was originally defined for the same proxy URL.

To remove proxy credentials while keeping the same proxy URL, remove
the ``Proxy-Authorization`` header.

- If the request has no :reqmeta:`proxy` metadata, or that metadata is a
falsy value (e.g. ``None``), the ``Proxy-Authorization`` header is
removed.

It is no longer possible to set a proxy URL through the
:reqmeta:`proxy` metadata but set the credentials through the
``Proxy-Authorization`` header. Set proxy credentials through the
:reqmeta:`proxy` metadata instead.


.. _release-1.8.2:

Scrapy 1.8.2 (2022-03-01)
Expand Down
54 changes: 30 additions & 24 deletions scrapy/downloadermiddlewares/httpproxy.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,31 +45,37 @@ def _get_proxy(self, url, orig_type):
return creds, proxy_url

def process_request(self, request, spider):
# ignore if proxy is already set
creds, proxy_url = None, None
if 'proxy' in request.meta:
if request.meta['proxy'] is None:
return
# extract credentials if present
creds, proxy_url = self._get_proxy(request.meta['proxy'], '')
request.meta['proxy'] = proxy_url
if creds and not request.headers.get('Proxy-Authorization'):
request.headers['Proxy-Authorization'] = b'Basic ' + creds
return
elif not self.proxies:
return

parsed = urlparse_cached(request)
scheme = parsed.scheme
if request.meta['proxy'] is not None:
creds, proxy_url = self._get_proxy(request.meta['proxy'], '')
elif self.proxies:
parsed = urlparse_cached(request)
scheme = parsed.scheme
if (
(
# 'no_proxy' is only supported by http schemes
scheme not in ('http', 'https')
or not proxy_bypass(parsed.hostname)
)
and scheme in self.proxies
):
creds, proxy_url = self.proxies[scheme]

# 'no_proxy' is only supported by http schemes
if scheme in ('http', 'https') and proxy_bypass(parsed.hostname):
return
self._set_proxy_and_creds(request, proxy_url, creds)

if scheme in self.proxies:
self._set_proxy(request, scheme)

def _set_proxy(self, request, scheme):
creds, proxy = self.proxies[scheme]
request.meta['proxy'] = proxy
def _set_proxy_and_creds(self, request, proxy_url, creds):
if proxy_url:
request.meta['proxy'] = proxy_url
elif request.meta.get('proxy') is not None:
request.meta['proxy'] = None
if creds:
request.headers['Proxy-Authorization'] = b'Basic ' + creds
request.headers[b'Proxy-Authorization'] = b'Basic ' + creds
request.meta['_auth_proxy'] = proxy_url
elif '_auth_proxy' in request.meta:
if proxy_url != request.meta['_auth_proxy']:
if b'Proxy-Authorization' in request.headers:
del request.headers[b'Proxy-Authorization']
del request.meta['_auth_proxy']
elif b'Proxy-Authorization' in request.headers:
del request.headers[b'Proxy-Authorization']
Loading

0 comments on commit af7dd16

Please sign in to comment.