Smarter generator check for combined yield/return statements: ignore nested functions #4720

Closed
soid opened this issue Aug 11, 2020 · 0 comments · Fixed by #4721
Summary

Currently, if a spider method is a generator that yields results and contains a nested function that returns a value, the following warning is issued:

[py.warnings] WARNING: /Library/Python/3.7/site-packages/scrapy/core/scraper.py:148: UserWarning: The "MySpider.parse" method is a generator and includes a "return" statement with a value different than None. This could lead to unexpected behaviour. Please see https://docs.python.org/3/reference/simple_stmts.html#the-return-statement for details about the semantics of the "return" statement within generators
  warn_on_generator_with_return_value(spider, callback)

An example of a simple spider that triggers the warning:

import scrapy

class MySpider(scrapy.Spider):
    name = "MySpider"
    start_urls = ["https://scrapy.org"]
    
    def parse(self, response):
        
        def is_external(url):
            href = url.css('::attr(href)').get()
            return href.startswith('http') and 'scrapy.org' not in href
        
        links = [link for link in response.css('a') if is_external(link)]
        for link in links:
            yield {'link': link.css('::attr(href)').get(), 'text': link.css('::text').get()}

I know it's a somewhat artificial example, since the nested function can be moved, but there is nothing conceptually wrong with nested functions.

Motivation

I have a midsize spider function that includes some nested helper functions that I'd like to keep close to where they are called.

Describe alternatives you've considered

Moving the nested function out of the generator is an easy fix, but it constrains the expressiveness of the code.

Additional context

Related function: is_generator_with_return_value
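
One way the check could ignore nested functions is to walk the callback's syntax tree without descending into inner `def` or `lambda` nodes. The following is only a rough sketch of that idea, not Scrapy's actual code: the names `iter_own_nodes` and `generator_returns_value` are hypothetical, the helper takes source text rather than a callable to stay self-contained, and it assumes Python 3.8+ (where `None` parses as `ast.Constant`).

```python
import ast
import textwrap


def iter_own_nodes(func_def):
    """Yield the nodes of func_def without descending into nested functions."""
    pending = list(ast.iter_child_nodes(func_def))
    while pending:
        node = pending.pop()
        yield node
        # A nested def or lambda has its own scope: its return/yield
        # statements do not belong to the outer function, so skip its children.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            continue
        pending.extend(ast.iter_child_nodes(node))


def generator_returns_value(source):
    """Return True if the first function in source both yields and
    returns a value other than None in its own body."""
    func_def = ast.parse(textwrap.dedent(source)).body[0]
    has_yield = False
    returns_value = False
    for node in iter_own_nodes(func_def):
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            has_yield = True
        elif isinstance(node, ast.Return) and node.value is not None:
            # An explicit "return None" is harmless in a generator.
            is_none = isinstance(node.value, ast.Constant) and node.value.value is None
            returns_value = returns_value or not is_none
    return has_yield and returns_value


# Hypothetical inputs: a generator with a nested helper that returns a value,
# and a generator that itself returns a value.
NESTED = '''
def parse(self, response):
    def is_external(url):
        return url.startswith("http")
    for link in response:
        if is_external(link):
            yield link
'''

DIRECT = '''
def parse(self, response):
    yield 1
    return 2
'''
```

With this approach, `generator_returns_value(NESTED)` is false because the only `return` belongs to `is_external`, while `generator_returns_value(DIRECT)` is still true, so the warning would remain for the case it was meant to catch.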
