Scrapy ignores proxy credentials when using "proxy" meta key #2526

Closed
vezunch1k opened this issue Feb 2, 2017 · 1 comment

@vezunch1k vezunch1k commented Feb 2, 2017

The code yield Request(link, meta={'proxy': 'http://user:password@ip:port'}) ignores the user:password part of the proxy URL.
The problem can be worked around by setting the "Proxy-Authorization" header with base64-encoded credentials, but it would be better to handle this inside Scrapy.
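
For reference, the header workaround mentioned above could look roughly like this. It is a minimal sketch, assuming the proxy uses HTTP Basic authentication; the helper name basic_proxy_auth is made up for illustration and is not a Scrapy API.

```python
import base64


def basic_proxy_auth(user, password):
    """Build a Basic-scheme value for the Proxy-Authorization header.

    Hypothetical helper, not part of Scrapy. Assumes the proxy accepts
    HTTP Basic authentication and latin-1 encodable credentials.
    """
    creds = f"{user}:{password}".encode("latin-1")
    return b"Basic " + base64.b64encode(creds)


# Possible use inside a spider callback (left as a comment so that the
# sketch runs without Scrapy installed):
#
#   yield Request(
#       link,
#       meta={"proxy": "http://ip:port"},  # credentials removed from the URL
#       headers={"Proxy-Authorization": basic_proxy_auth("user", "password")},
#   )
```

The credentials are passed separately here instead of being embedded in the proxy URL, since the URL form is exactly what gets ignored.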

@redapple redapple (Contributor) commented Feb 2, 2017

Thanks @vezunch1k for reporting.
I can indeed reproduce this.

HttpProxyMiddleware does not touch outgoing requests that already have the "proxy" key set in their meta dict. In particular, it does not set the Proxy-Authorization header for them.

Scrapy's downloader Agent, in turn, uses only the host and port of the proxy URL and ignores any credentials in it, assuming the Proxy-Authorization header is already set if it's needed:

>>> from scrapy.core.downloader.webclient import _parse
>>> _parse('https://username:password@10.20.30.40:8888')
('https', 'username:password@10.20.30.40:8888', '10.20.30.40', 8888, '/')

        proxy = request.meta.get('proxy')
        if proxy:
            _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
            scheme = _parse(request.url)[0]
            proxyHost = to_unicode(proxyHost)
            omitConnectTunnel = b'noconnect' in proxyParams
            if  scheme == b'https' and not omitConnectTunnel:
                proxyConf = (proxyHost, proxyPort,
                             request.headers.get(b'Proxy-Authorization', None))
                return self._TunnelingAgent(reactor, proxyConf,
                    contextFactory=self._contextFactory, connectTimeout=timeout,
                    bindAddress=bindaddress, pool=self._pool)

Proxy credentials in the proxy URL are correctly processed by HttpProxyMiddleware when the http(s)_proxy environment variables are used,
so it makes sense to me to handle them as well when the "proxy" key is used directly.
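
Until something like that exists in Scrapy itself, the same behaviour could be approximated in a project-level downloader middleware. The sketch below is only an illustration under stated assumptions: the class name ProxyMetaAuthMiddleware is hypothetical, Basic authentication is assumed, and it relies only on the request having dict-like meta and headers attributes. It is not how HttpProxyMiddleware is implemented.

```python
import base64
from urllib.parse import unquote, urlsplit


class ProxyMetaAuthMiddleware:
    """Hypothetical downloader middleware; not part of Scrapy.

    Moves user:password from request.meta['proxy'] into a
    Proxy-Authorization header (Basic scheme assumed).
    """

    def process_request(self, request, spider):
        proxy = request.meta.get("proxy")
        if not proxy:
            return None
        parts = urlsplit(proxy)
        if parts.username is None:
            # No credentials embedded in the proxy URL; nothing to do.
            return None
        # Rebuild the proxy URL without the "user:password@" prefix.
        netloc = parts.netloc.rpartition("@")[2]
        request.meta["proxy"] = parts._replace(netloc=netloc).geturl()
        # Respect a header that the spider has already set explicitly.
        if b"Proxy-Authorization" not in request.headers:
            creds = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
            token = base64.b64encode(creds.encode("latin-1"))
            request.headers[b"Proxy-Authorization"] = b"Basic " + token
        # Returning None lets the request continue through the other middlewares.
        return None
```

To try it, the class would need to be listed in the project's DOWNLOADER_MIDDLEWARES setting; the module path and priority to use there depend on the project.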
