
Return single element from coroutine callback #4609

Conversation

elacuesta (Member)

Currently, single elements cannot be returned from coroutine callbacks.

Consider the following sample spider:

from scrapy import Spider, Request

class AsyncDefSpider(Spider):
    name = "async_def"
    start_urls = ["https://example.org"]

    async def parse(self, response):
        # a single Request object is returned, not an iterable
        return Request("https://example.com")

which produces:

2020-06-01 17:10:24 [scrapy.core.engine] INFO: Spider opened
2020-06-01 17:10:24 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-06-01 17:10:24 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-06-01 17:10:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.org> (referer: None)
2020-06-01 17:10:25 [scrapy.core.scraper] ERROR: Spider error processing <GET https://example.org> (referer: None)
Traceback (most recent call last):
  File "/.../scrapy/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/.../scrapy/scrapy/utils/python.py", line 346, in __next__
    return next(self.data)
  File "/.../scrapy/scrapy/utils/python.py", line 346, in __next__
    return next(self.data)
  File "/.../scrapy/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/.../scrapy/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/.../scrapy/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/.../scrapy/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/.../scrapy/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/.../scrapy/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/.../scrapy/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/.../scrapy/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/.../scrapy/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
TypeError: 'Request' object is not iterable
2020-06-01 17:10:25 [scrapy.core.engine] INFO: Closing spider (finished)
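
For reference, the workaround before this change is to have the coroutine return an iterable; a minimal sketch of the same spider adjusted accordingly (spider name changed for illustration):

from scrapy import Spider, Request

class AsyncDefListSpider(Spider):
    name = "async_def_list"
    start_urls = ["https://example.org"]

    async def parse(self, response):
        # wrapping the single element in a list makes the callback's
        # result iterable, which is what the spider middlewares expect
        return [Request("https://example.com")]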

If the returned object is a dict, it is iterable and therefore gets iterated (yielding its string keys), so the following error appears instead:

2020-06-01 17:03:36 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'str' in <GET https://example.org>
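
A tiny standalone illustration (not from the PR) of where the 'str' comes from: iterating a dict yields its keys, which are strings.

d = {"name": "value"}
print(list(d))  # ['name'] -- each element is a str key, not an item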

elacuesta (Member, Author) commented on the following lines of the diff:

d = deferred_from_coro(result)
d.addCallback(iterate_spider_output)
return d
return arg_to_iter(result)

It's safe to remove the deferred_from_coro call here: at this point the condition it checks is False, so the passed object is returned unmodified.
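
For context, a minimal sketch of what the patched helper plausibly looks like, assuming the diff lines above sit inside iterate_spider_output in scrapy/utils/spider.py (the file Codecov lists as impacted); the exact coroutine check is an assumption, not taken from the diff:

from types import CoroutineType

from scrapy.utils.defer import deferred_from_coro
from scrapy.utils.misc import arg_to_iter

def iterate_spider_output(result):
    if isinstance(result, CoroutineType):  # assumed guard around the diff lines
        # Resolve the coroutine to a Deferred, then re-run this helper on
        # whatever the coroutine returned, so that a single Request/item
        # falls through to arg_to_iter below.
        d = deferred_from_coro(result)
        d.addCallback(iterate_spider_output)
        return d
    # arg_to_iter: None -> [], iterables pass through unchanged, single
    # values (including dicts) are wrapped in a one-element list.
    return arg_to_iter(result)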

codecov bot commented Jun 1, 2020

Codecov Report

Merging #4609 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4609      +/-   ##
==========================================
- Coverage   84.63%   84.63%   -0.01%     
==========================================
  Files         163      163              
  Lines        9978     9982       +4     
  Branches     1486     1487       +1     
==========================================
+ Hits         8445     8448       +3     
- Misses       1266     1267       +1     
  Partials      267      267              
Impacted Files                       Coverage Δ
scrapy/utils/spider.py               77.77% <100.00%> (+2.77%) ⬆️
scrapy/core/downloader/__init__.py   89.47% <0.00%> (-1.51%) ⬇️
scrapy/utils/trackref.py             85.71% <0.00%> (+2.85%) ⬆️

@kmike kmike requested a review from wRAR June 1, 2020 20:33
@Gallaecio Gallaecio merged commit 91e505e into scrapy:master Jun 2, 2020
@elacuesta elacuesta deleted the return-single-element-from-coroutine-callback branch June 2, 2020 12:07