[MRG+1] PY3: Fix SitemapSpider to extract sitemap urls from robots.txt properly #1767

orangain · 2016-02-06T15:01:34Z

Purpose

Fixes #1766: SitemapSpider fails to extract sitemap URLs from robots.txt in Python 3.

Changes

Pass response.text as an argument of sitemap_urls_from_robots() instead of response.body.
Add a unit test.
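
The bytes-versus-text distinction behind this change can be illustrated with a minimal sketch. It does not use Scrapy's code: `extract_sitemap_urls` is a hypothetical stand-in for a line-based robots.txt parser, and the robots.txt content and URL are made up. It assumes the parser compares each line against a `str` prefix, which is how passing bytes would go wrong under Python 3 in this sketch.

```python
def extract_sitemap_urls(robots_text):
    """Hypothetical stand-in for a robots.txt sitemap extractor (not Scrapy's code)."""
    for line in robots_text.splitlines():
        # The prefix is a str, so this comparison only works on str lines in Python 3.
        if line.lstrip().lower().startswith("sitemap:"):
            yield line.split(":", 1)[1].strip()


# Made-up robots.txt content, as raw bytes (like a response body) and as decoded text.
body = b"User-agent: *\nSitemap: http://example.com/sitemap.xml\n"
text = body.decode("utf-8")

# Decoded text is parsed as expected.
urls_from_text = list(extract_sitemap_urls(text))

# Raw bytes fail in Python 3: bytes.startswith() rejects a str prefix.
try:
    list(extract_sitemap_urls(body))
    bytes_failed = False
except TypeError:
    bytes_failed = True
```

In Python 2 the same bytes input would work because `str` is a byte string there, which is one plausible reason a problem like this only shows up under Python 3.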

codecov-io · 2016-02-06T15:09:55Z

Current coverage is `83.33%`

Merging #1767 into master will increase coverage by +0.04% as of f19c27b

Powered by Codecov. Updated on successful CI builds.

eliasdorneles · 2016-02-06T18:33:11Z

Thanks for the patch @orangain, looks good! 👍

kmike · 2016-02-08T05:11:55Z

Thanks @orangain!

redapple · 2016-02-08T17:40:31Z

Needs backporting to the 1.1 branch.

Fix SitemapSpider to extract sitemap urls from robots.txt properly

25c5615

This will fix scrapy#1766.

eliasdorneles changed the title ~~PY3: Fix SitemapSpider to extract sitemap urls from robots.txt properly~~ [MRG+1] PY3: Fix SitemapSpider to extract sitemap urls from robots.txt properly Feb 6, 2016

kmike added a commit that referenced this pull request Feb 8, 2016

Merge pull request #1767 from orangain/sitemap-robotstxt

44bc4c0

[MRG+1] PY3: Fix SitemapSpider to extract sitemap urls from robots.txt properly

kmike merged commit 44bc4c0 into scrapy:master Feb 8, 2016

redapple mentioned this pull request Feb 9, 2016

[MRG+1] [1.1.x][backport] #1766, #1770, #1750, #1662, #1765 #1774

Merged

1 task
