Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
File "/usr/local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "/usr/local/lib/python3.8/site-packages/sh_scrapy/middlewares.py", line 30, in process_spider_output
for x in result:
File "/usr/local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "/usr/local/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/usr/local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "/usr/local/lib/python3.8/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "/usr/local/lib/python3.8/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
for r in iterable:
File "/usr/local/lib/python3.8/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python3.8/site-packages/scrapy/spiders/sitemap.py", line 53, in _parse_sitemap
s = Sitemap(body)
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/sitemap.py", line 18, in __init__
self._root = lxml.etree.fromstring(xmltext, parser=xmlp)
File "src/lxml/etree.pyx", line 3234, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1764, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
Steps to Reproduce
Crawl a sitemap that contains at least one blank .xml page.
Expected behavior:
Handle the XMLSyntaxError by logging a warning for an invalid sitemap page and returning.
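The requested behavior could look roughly like the sketch below. This is only an illustration, not Scrapy's actual code or the eventual fix: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (so the exception caught is `ParseError` rather than `lxml.etree.XMLSyntaxError`), and the function name `parse_sitemap_urls` and its `url` parameter are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)


def parse_sitemap_urls(body, url="<unknown>"):
    """Return the <loc> URLs found in a sitemap body.

    Hypothetical helper: if the body is empty or not well-formed XML,
    log a warning and return an empty list instead of raising.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        # With lxml, the equivalent exception would be lxml.etree.XMLSyntaxError.
        logger.warning("Ignoring invalid sitemap page %s: %s", url, exc)
        return []
    # Compare local tag names so the sitemap namespace does not matter.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
```

With this shape, a blank page yields no URLs and one warning in the log, while the rest of the crawl continues.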
Actual behavior:
Throws XMLSyntaxError
Versions
Scrapy version 1.7.2
Description
SitemapSpider throws an lxml.etree.XMLSyntaxError when hitting a blank sitemap page while crawling a sitemap.
Example sitemap with blank pages: https://bikeradar.com/sitemap.xml