Summary
Currently, if a spider method is a generator that yields results and contains a nested function that returns a value, the following warning is issued:
[py.warnings] WARNING: /Library/Python/3.7/site-packages/scrapy/core/scraper.py:148: UserWarning: The "MySpider.parse" method is a generator and includes a "return" statement with a value different than None. This could lead to unexpected behaviour. Please see https://docs.python.org/3/reference/simple_stmts.html#the-return-statement for details about the semantics of the "return" statement within generators
warn_on_generator_with_return_value(spider, callback)
An example of a simple spider that triggers the warning:
    import scrapy

    class MySpider(scrapy.Spider):
        name = "MySpider"
        start_urls = ["https://scrapy.org"]

        def parse(self, response):
            def is_external(url):
                href = url.css('::attr(href)').get()
                return href.startswith('http') and 'scrapy.org' not in href

            links = [link for link in response.css('a') if is_external(link)]
            for link in links:
                yield {'link': link.css('::attr(href)').get(), 'text': link.css('::text').get()}
I know the example is a bit artificial, since the nested function could be moved out, but there is nothing conceptually wrong with nested functions.
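To illustrate the point without Scrapy, here is a minimal sketch in plain Python. The names `outer` and `is_even` are made up for this example. It shows that a `return` inside a nested helper belongs to the helper only, so the enclosing function is still an ordinary generator:

```python
import inspect

def outer():
    # Nested helper: its "return" ends is_even(), not outer().
    def is_even(n):
        return n % 2 == 0

    for n in range(5):
        if is_even(n):
            yield n

# outer() is a generator function that only yields; it never returns a value itself.
is_generator = inspect.isgeneratorfunction(outer)
results = list(outer())
```

The generator yields 0, 2 and 4, and nothing about the helper's return value leaks into the generator's own semantics.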
Motivation
I have a midsize spider function that includes some nested helper functions that I'd like to keep close to where they are called.
Describe alternatives you've considered
Moving the nested function out of the generator is an easy fix, but it constrains the expressivity of the code.
Additional context
Related function: is_generator_with_return_value
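One possible direction, sketched below, is to walk the function's AST but stop descending when a nested scope begins. This is not Scrapy's actual implementation; `has_own_return_value` and the sample sources are hypothetical names for this sketch, and it assumes Python 3.8+ (for `ast.Constant`).

```python
import ast
import textwrap

# Node types that open a new scope; a "return" inside them is not ours.
_NESTED_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)

def has_own_return_value(source):
    """Return True if the first function in `source` has `return <value>`
    in its own body, ignoring nested functions, lambdas and classes."""
    tree = ast.parse(textwrap.dedent(source))
    func_node = tree.body[0]
    stack = list(func_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, _NESTED_SCOPES):
            continue  # do not look inside nested scopes
        if isinstance(node, ast.Return) and node.value is not None:
            # An explicit "return None" is treated like a bare "return".
            is_none = isinstance(node.value, ast.Constant) and node.value.value is None
            if not is_none:
                return True
        stack.extend(ast.iter_child_nodes(node))
    return False

# Generator whose only "return <value>" lives in a nested helper.
NESTED = """
def parse(self, response):
    def is_external(url):
        return url.startswith('http')
    for url in response:
        if is_external(url):
            yield url
"""

# Generator that really does return a value itself.
DIRECT = """
def parse(self, response):
    yield 1
    return 42
"""

nested_flagged = has_own_return_value(NESTED)
direct_flagged = has_own_return_value(DIRECT)
```

With a check along these lines, only `DIRECT` would be flagged, while the nested helper in `NESTED` would no longer trigger a warning.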