scrapy may keep wrong proxy setting when following redirects #767

Closed
redapple opened this issue Jun 26, 2014 · 3 comments

Comments

@redapple
Contributor

When:

  • http_proxy is set for HttpProxyMiddleware,
  • and an http:// request is redirected to an https:// location,

Scrapy reuses the http_proxy settings for the redirected https:// request.

The same happens when an https:// request is redirected to an http:// location.

The Proxy-Authorization header is propagated to the redirected request as well.
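A minimal sketch of why the proxy goes stale, assuming a hypothetical per-scheme proxy table: `PROXIES`, `proxy_for` and `has_stale_proxy` are illustrative names, not Scrapy APIs, and the proxy hosts are placeholders. Only the `proxy` meta key is Scrapy's own (it is the per-request proxy setting).

```python
from urllib.parse import urlparse

# Hypothetical per-scheme proxy table; the hosts are placeholders, not
# values taken from Scrapy or from this issue.
PROXIES = {
    "http": "http://httpproxy.example:8080",
    "https": "http://httpsproxy.example:3128",
}


def proxy_for(url, proxies=PROXIES):
    """Return the proxy configured for the URL's scheme, or None."""
    return proxies.get(urlparse(url).scheme)


def has_stale_proxy(meta, redirect_url, proxies=PROXIES):
    """True if the proxy carried over in meta differs from the one the
    redirect target's scheme should use."""
    return meta.get("proxy") != proxy_for(redirect_url, proxies)


# The original http:// request is assigned the http proxy...
meta = {"proxy": proxy_for("http://www.facebook.com")}
# ...and copying that meta onto the https:// redirect keeps the wrong proxy.
print(has_stale_proxy(meta, "https://www.facebook.com"))  # True
```

The same check is also True when only http_proxy is configured, because `proxy_for` then returns None for https:// while the copied meta still points at the http proxy.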

To test:

  • http://www.facebook.com redirects to https://www.facebook.com
  • https://instagram.com/ redirects to http://instagram.com

Note: there is an interesting discussion of HTTP redirection and headers at https://code.google.com/p/go/issues/detail?id=4800

@dangra
Member

dangra commented Jun 26, 2014

A possible solution is to clean up all proxy-related meta keys and headers in the process_response() hook of HttpProxyMiddleware.
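One way to picture that cleanup, as a sketch detached from Scrapy's actual middleware plumbing: `FakeRequest`, `strip_proxy_state` and the key lists are hypothetical. `proxy` is the meta key Scrapy uses for per-request proxies, and `Proxy-Authorization` is the header named in the report; the real middleware may track more state than this.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a Scrapy request: only the attributes the
# cleanup touches. Scrapy's real Headers class is case-insensitive, so
# the header removal below compares names case-insensitively too.
@dataclass
class FakeRequest:
    url: str
    meta: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


# Assumed proxy-related state; the real middleware may track more keys.
PROXY_META_KEYS = ("proxy",)
PROXY_HEADERS = ("Proxy-Authorization",)


def strip_proxy_state(request):
    """Drop proxy meta keys and headers so that the next pass through
    the proxy middleware recomputes them for the request's own URL."""
    for key in PROXY_META_KEYS:
        request.meta.pop(key, None)
    drop = {name.lower() for name in PROXY_HEADERS}
    for name in [n for n in request.headers if n.lower() in drop]:
        del request.headers[name]
    return request


req = FakeRequest(
    "http://www.facebook.com",
    meta={"proxy": "http://httpproxy.example:8080", "depth": 1},
    headers={"proxy-authorization": "Basic dXNlcjpwYXNz", "Accept": "text/html"},
)
strip_proxy_state(req)
print(req.meta)     # {'depth': 1}
print(req.headers)  # {'Accept': 'text/html'}
```

Unrelated meta keys and headers survive, so only proxy state is recomputed when a redirected copy of the request goes back through the middleware chain.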

@nramirezuy
Contributor

I think using something like scrapy.utils.datatypes.MergeDict for the headers could help: we want headers added at the spider level to be kept, but the rest to be removed.
The Redirect and Retry middlewares return a Request from process_response, which sends that request back to the beginning of the `DownloaderMiddleware` chain, so any headers added by middlewares on that instance will be added again if they are still needed.
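A sketch of the layering idea, with one swap stated plainly: it uses the standard library's `collections.ChainMap` in place of `scrapy.utils.datatypes.MergeDict`, and `headers_for_redirect` is a hypothetical helper. Lookups search the layers in order and writes go to the first layer, so middleware-added headers can be discarded as a unit while the spider's headers stay intact. A plain ChainMap is case-sensitive, unlike HTTP headers.

```python
from collections import ChainMap

# Swapped-in stand-in for scrapy.utils.datatypes.MergeDict: lookups search
# the maps in order, and writes always go to the first map.
spider_headers = {"Authorization": "Bearer spider-token", "Accept": "text/html"}
headers = ChainMap({}, spider_headers)

# Middlewares write into the first (per-pass) layer only.
headers["Proxy-Authorization"] = "Basic dXNlcjpwYXNz"
headers["User-Agent"] = "demo-bot"


def headers_for_redirect(headers):
    """Discard the middleware layer and start a fresh one on top of the
    spider layer. The redirected request re-enters the downloader
    middleware chain, so middlewares re-add whatever still applies."""
    return ChainMap({}, *headers.parents.maps)


redirected = headers_for_redirect(headers)
print("Proxy-Authorization" in redirected)  # False
print(redirected["Authorization"])           # Bearer spider-token
```

Note that the spider-level `Authorization` header survives this step, which is exactly the cross-domain concern raised next in the thread.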

But what about redirects to a different domain when an Authorization header is set? Should we have a meta key listing the headers to keep?
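A sketch of what such a meta key could look like. The key name `redirect_keep_headers`, the helper `redirect_headers`, and the choice to compare hostnames are all assumptions for illustration, not anything Scrapy provides. On a same-host redirect every header is kept; on a cross-host redirect `Authorization` is dropped unless the request explicitly opts in.

```python
from urllib.parse import urlparse

# Headers treated as credentials on cross-host redirects; an assumption
# for this sketch, limited to the Authorization header under discussion.
SENSITIVE_HEADERS = {"authorization"}


def redirect_headers(headers, meta, old_url, new_url):
    """Return the headers to send on a redirect.

    Same host: keep everything. Different host: drop sensitive headers
    unless the hypothetical meta key 'redirect_keep_headers' lists them.
    """
    if urlparse(old_url).hostname == urlparse(new_url).hostname:
        return dict(headers)
    keep = {name.lower() for name in meta.get("redirect_keep_headers", ())}
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS or name.lower() in keep
    }


hdrs = {"Authorization": "Bearer t", "Accept": "text/html"}
print(redirect_headers(hdrs, {}, "https://a.example/x", "https://b.example/y"))
# {'Accept': 'text/html'}
```

Comparing hostnames means a scheme-only change such as http://www.facebook.com to https://www.facebook.com keeps Authorization; whether that is desirable is part of the open question.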

@Gallaecio
Member

GHSA-jm3v-qxmh-hxwv


4 participants