[MRG+1] Redirect codes in meta #3687

Merged on Mar 26, 2019 (14 commits)
11 changes: 7 additions & 4 deletions docs/topics/downloader-middleware.rst
@@ -725,9 +725,11 @@ RedirectMiddleware
 This middleware handles redirection of requests based on response status.
 
 .. reqmeta:: redirect_urls
+.. reqmeta:: redirect_reasons
 
-The urls which the request goes through (while being redirected) can be found
-in the ``redirect_urls`` :attr:`Request.meta <scrapy.http.Request.meta>` key.
+The urls which the request goes through (while being redirected) and their
+corresponding status can be found in the ``redirect_urls`` and ``redirect_reasons``
+:attr:`Request.meta <scrapy.http.Request.meta>` keys respectively.
 
 The :class:`RedirectMiddleware` can be configured through the following
 settings (see the settings documentation for more info):
@@ -792,8 +794,9 @@ settings (see the settings documentation for more info):
 * :setting:`METAREFRESH_ENABLED`
 * :setting:`METAREFRESH_MAXDELAY`
 
-This middleware obey :setting:`REDIRECT_MAX_TIMES` setting, :reqmeta:`dont_redirect`
-and :reqmeta:`redirect_urls` request meta keys as described for :class:`RedirectMiddleware`
+This middleware obey :setting:`REDIRECT_MAX_TIMES` setting, :reqmeta:`dont_redirect`,
+:reqmeta:`redirect_urls` and :reqmeta:`redirect_reasons` request meta keys as described
+for :class:`RedirectMiddleware`
 
 
 MetaRefreshMiddleware settings
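As a minimal sketch of how a callback might consume the two keys documented above, the helper below pairs each URL in `redirect_urls` with the entry at the same index in `redirect_reasons`. The name `redirect_chain` is hypothetical and not part of Scrapy. It accepts any mapping shaped like `Request.meta`, and it assumes both lists grow in lockstep, as the change to `_redirect` in this PR appears to guarantee.

```python
def redirect_chain(meta):
    """Pair each redirected URL with the reason it was redirected.

    `meta` is any mapping shaped like Request.meta (an assumption of this
    sketch); both keys are absent when the request was never redirected.
    """
    urls = meta.get('redirect_urls', [])
    reasons = meta.get('redirect_reasons', [])
    # zip() stops at the shorter list, so a missing key yields an empty
    # chain rather than raising an error.
    return list(zip(urls, reasons))


# Meta as it might look after two 301 redirects (illustrative values).
meta = {
    'redirect_urls': ['http://scrapytest.org/first',
                      'http://scrapytest.org/redirected1'],
    'redirect_reasons': [301, 301],
}
for url, reason in redirect_chain(meta):
    print(url, reason)
```

In a spider callback, the same helper could be called as `redirect_chain(response.meta)`.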
1 change: 1 addition & 0 deletions docs/topics/request-response.rst
@@ -299,6 +299,7 @@ Those are:
 * :reqmeta:`dont_merge_cookies`
 * :reqmeta:`cookiejar`
 * :reqmeta:`dont_cache`
+* :reqmeta:`redirect_reasons`
Member:

Could you please build docs locally and check that this link works (I think it doesn't)?

Contributor Author (@maramsumanth, Mar 16, 2019):

@kmike, I just copied redirect_urls from above and replaced url with reason.
Could you please tell me how to check docs locally, because I don't know how to open rst files.

Contributor Author:

> Could you please build docs locally and check that this link works (I think it doesn't)?

@kmike could you please help me fix this.

Member:

You must define the target first, where you added the documentation. In this case, you could add .. reqmeta:: redirect_reasons right below .. reqmeta:: redirect_urls in downloader-middleware.rst.

 * :reqmeta:`redirect_urls`
 * :reqmeta:`bindaddress`
 * :reqmeta:`dont_obey_robotstxt`
2 changes: 2 additions & 0 deletions scrapy/downloadermiddlewares/redirect.py
@@ -34,6 +34,8 @@ def _redirect(self, redirected, request, spider, reason):
             redirected.meta['redirect_ttl'] = ttl - 1
             redirected.meta['redirect_urls'] = request.meta.get('redirect_urls', []) + \
                 [request.url]
+            redirected.meta['redirect_reasons'] = request.meta.get('redirect_reasons', []) + \
+                [reason]
             redirected.dont_filter = request.dont_filter
             redirected.priority = request.priority + self.priority_adjust
             logger.debug("Redirecting (%(reason)s) to %(redirected)s from %(request)s",
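The two added lines build a new list with `+` rather than appending in place, so each request in the chain keeps its own history. A minimal stand-alone model of that behaviour follows, using plain dicts in place of `Request.meta`. The function name `next_meta` is hypothetical and only mirrors the bookkeeping shown in the hunk above.

```python
def next_meta(meta, url, reason):
    """Return the meta for the redirected request, leaving `meta` untouched."""
    new_meta = dict(meta)  # shallow copy: list values are still shared
    # Concatenation creates fresh lists; list.append() would instead
    # mutate the lists shared with the original request's meta.
    new_meta['redirect_urls'] = meta.get('redirect_urls', []) + [url]
    new_meta['redirect_reasons'] = meta.get('redirect_reasons', []) + [reason]
    return new_meta


first = {}
second = next_meta(first, 'http://scrapytest.org/first', 301)
third = next_meta(second, 'http://scrapytest.org/redirected1', 'meta refresh')

print(second['redirect_reasons'])  # [301]
print(third['redirect_reasons'])   # [301, 'meta refresh']
```

Because the lists are never mutated, `second` still reports a single reason after `third` has been derived from it.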
19 changes: 19 additions & 0 deletions tests/test_downloadermiddleware_redirect.py
@@ -139,6 +139,16 @@ def test_redirect_urls(self):
         self.assertEqual(req3.url, 'http://scrapytest.org/redirected2')
         self.assertEqual(req3.meta['redirect_urls'], ['http://scrapytest.org/first', 'http://scrapytest.org/redirected'])
 
+    def test_redirect_reasons(self):
+        req1 = Request('http://scrapytest.org/first')
+        rsp1 = Response('http://scrapytest.org/first', headers={'Location': '/redirected1'}, status=301)
+        req2 = self.mw.process_response(req1, rsp1, self.spider)
+        rsp2 = Response('http://scrapytest.org/redirected1', headers={'Location': '/redirected2'}, status=301)
+        req3 = self.mw.process_response(req2, rsp2, self.spider)
+
+        self.assertEqual(req2.meta['redirect_reasons'], [301])
+        self.assertEqual(req3.meta['redirect_reasons'], [301, 301])
+
     def test_spider_handling(self):
         smartspider = self.crawler._create_spider('smarty')
         smartspider.handle_httpstatus_list = [404, 301, 302]
@@ -259,6 +269,15 @@ def test_redirect_urls(self):
         self.assertEqual(req3.url, 'http://scrapytest.org/redirected2')
         self.assertEqual(req3.meta['redirect_urls'], ['http://scrapytest.org/first', 'http://scrapytest.org/redirected'])
 
+    def test_redirect_reasons(self):
+        req1 = Request('http://scrapytest.org/first')
+        rsp1 = HtmlResponse('http://scrapytest.org/first', body=self._body(url='/redirected'))
+        req2 = self.mw.process_response(req1, rsp1, self.spider)
+        rsp2 = HtmlResponse('http://scrapytest.org/redirected', body=self._body(url='/redirected1'))
+        req3 = self.mw.process_response(req2, rsp2, self.spider)
+
+        self.assertEqual(req2.meta['redirect_reasons'], ['meta refresh'])
+        self.assertEqual(req3.meta['redirect_reasons'], ['meta refresh', 'meta refresh'])
 
 if __name__ == "__main__":
     unittest.main()
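
The two tests show that `redirect_reasons` can hold mixed types: integer HTTP status codes from `RedirectMiddleware` and the string `'meta refresh'` from `MetaRefreshMiddleware`. Code that consumes the list may want to handle both. One possible sketch, where `describe_reason` is a hypothetical helper and not part of Scrapy, is:

```python
from http import HTTPStatus


def describe_reason(reason):
    """Render one redirect_reasons entry as human-readable text."""
    if isinstance(reason, int):
        try:
            # Standard status codes map to a phrase such as "Moved Permanently".
            return f"HTTP {reason} {HTTPStatus(reason).phrase}"
        except ValueError:
            # Non-standard status codes have no registered phrase.
            return f"HTTP {reason}"
    # Anything else (for example 'meta refresh') is passed through as text.
    return str(reason)


for reason in [301, 'meta refresh']:
    print(describe_reason(reason))
```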