is_generator_with_return_value raises IndentationError with a flush left doc string #4477
Comments
I can reproduce:

import scrapy

class GeneratorSpider(scrapy.Spider):
    name = "generator"
    start_urls = ["https://example.org"]

    def parse(self, response):
        """
docstring
        """
        yield {"url": response.url}
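To make the failure mode concrete, here is a minimal sketch that skips Scrapy entirely. `SOURCE` is a hypothetical stand-in for what `inspect.getsource` would return for the `parse` method above, assuming the docstring body starts at column 0. It only illustrates the mechanism and is not the code path Scrapy runs.

```python
import ast
import textwrap

# Hypothetical stand-in for inspect.getsource(GeneratorSpider.parse):
# the method is indented, but the docstring body is flush left.
SOURCE = (
    "    def parse(self, response):\n"
    '        """\n'
    "docstring\n"
    '        """\n'
    '        yield {"url": response.url}\n'
)

# dedent() only strips whitespace common to every non-blank line. The
# flush-left "docstring" line makes that common prefix empty, so the
# text comes back unchanged.
dedented = textwrap.dedent(SOURCE)
unchanged = dedented == SOURCE

# Parsing the still-indented "def" then fails.
try:
    ast.parse(dedented)
    error = None
except IndentationError as exc:
    error = exc

print(unchanged, type(error).__name__)
```

So `dedent` itself never raises; the exception comes from `ast.parse` seeing a `def` that is still indented.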
Seems to me like the following should work fine for docstrings:

diff --git scrapy/utils/misc.py scrapy/utils/misc.py
index 52cfba20..fa3d31cc 100644
--- scrapy/utils/misc.py
+++ scrapy/utils/misc.py
@@ -184,7 +184,10 @@ def is_generator_with_return_value(callable):
         return value is None or isinstance(value, ast.NameConstant) and value.value is None

     if inspect.isgeneratorfunction(callable):
-        tree = ast.parse(dedent(inspect.getsource(callable)))
+        code = inspect.getsource(callable)
+        if callable.__doc__:
+            code = code.replace(callable.__doc__, "")
+        tree = ast.parse(dedent(code))
         for node in ast.walk(tree):
             if isinstance(node, ast.Return) and not returns_none(node):
                 _generator_callbacks_cache[callable] = True

Unfortunately, I found that multiline strings are also a problem. This snippet also raises the exception:

import scrapy

class GeneratorSpider(scrapy.Spider):
    name = "generator"
    start_urls = ["https://example.org"]

    def parse(self, response):
        url = """
docstring
        """
        yield {"url": url}

The following works, but I don't really like it:

diff --git scrapy/utils/misc.py scrapy/utils/misc.py
index 52cfba20..881e6c07 100644
--- scrapy/utils/misc.py
+++ scrapy/utils/misc.py
@@ -184,7 +184,8 @@ def is_generator_with_return_value(callable):
         return value is None or isinstance(value, ast.NameConstant) and value.value is None

     if inspect.isgeneratorfunction(callable):
-        tree = ast.parse(dedent(inspect.getsource(callable)))
+        code = "class _:\n" + dedent(inspect.getsource(callable))
+        tree = ast.parse(code)
         for node in ast.walk(tree):
             if isinstance(node, ast.Return) and not returns_none(node):
                 _generator_callbacks_cache[callable] = True

Thoughts?
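For comparison, a rough sketch of how the two patches above behave on both snippets, working on source strings so it runs without Scrapy. The helper names (`strip_docstring`, `wrap_in_class`, `parses`) and the hard-coded `DOC` value are assumptions made for illustration; they are not part of `scrapy/utils/misc.py`.

```python
import ast
import textwrap

# Hypothetical stand-ins for inspect.getsource() on the two parse() methods.
DOCSTRING_SOURCE = (
    "    def parse(self, response):\n"
    '        """\n'
    "docstring\n"
    '        """\n'
    '        yield {"url": response.url}\n'
)
STRING_SOURCE = (
    "    def parse(self, response):\n"
    '        url = """\n'
    "docstring\n"
    '        """\n'
    '        yield {"url": url}\n'
)
# Assumed value of callable.__doc__ for the first method.
DOC = "\ndocstring\n        "


def parses(code):
    """Return True if ast.parse() accepts the code."""
    try:
        ast.parse(code)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


def strip_docstring(source, doc):
    """First patch: drop the docstring text, then dedent."""
    if doc:
        source = source.replace(doc, "")
    return textwrap.dedent(source)


def wrap_in_class(source):
    """Second patch: put the (possibly still indented) def in a class body."""
    return "class _:\n" + textwrap.dedent(source)


docstring_fixed = parses(strip_docstring(DOCSTRING_SOURCE, DOC))
string_fixed = parses(strip_docstring(STRING_SOURCE, None))  # no docstring
wrapped_docstring = parses(wrap_in_class(DOCSTRING_SOURCE))
wrapped_string = parses(wrap_in_class(STRING_SOURCE))
print(docstring_fixed, string_fixed, wrapped_docstring, wrapped_string)
```

The docstring patch has nothing to remove in the second snippet, which is why it still fails there, while the class wrapper accepts both because an indented `def` is valid inside a class body.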
It appears, based on some lightweight testing, that just removing all the leading indentation from the start of the source also works:

    if inspect.isgeneratorfunction(callable):
        code = inspect.getsource(callable)
        code = re.sub(r"^\s+", "", code)
        tree = ast.parse(code)

since it turns this:

    def parse(self, response):
        url = """
docstring
        """
        yield {"url": url}

into

def parse(self, response):
        url = """
docstring
        """
        yield {"url": url}
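A small self-contained check of that transformation, again using a hypothetical source string in place of `inspect.getsource`. Without `re.MULTILINE`, `^` matches only at the start of the string, so only the `def` line loses its indentation and the body stays where it was.

```python
import ast
import re

# Hypothetical stand-in for inspect.getsource() on the parse() method.
SOURCE = (
    "    def parse(self, response):\n"
    '        url = """\n'
    "docstring\n"
    '        """\n'
    '        yield {"url": url}\n'
)

# Strip whitespace at the very start of the string only.
stripped = re.sub(r"^\s+", "", SOURCE)
lines = stripped.splitlines()
first_line = lines[0]
body_line = lines[1]

# The body is still indented deeper than the def, so this parses.
tree = ast.parse(stripped)
print(first_line)
print(type(tree.body[0]).__name__)
```

This leans on Python accepting any consistent body indentation that is deeper than the `def` line; whether it holds up for every callback shape was not checked here.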
@elacuesta I think your workaround is quite smart, although I wonder whether it could fail in some corner cases, such as callbacks that are not spider methods but external functions. Still, if it solves more problems than it causes, it may be worth it.
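One such corner case can be sketched as follows, assuming the callback is a plain module-level function, so its source has no leading indentation at all. `wrap_in_class` is a hypothetical helper mirroring the `"class _:\n" + dedent(...)` patch, not an existing Scrapy function.

```python
import ast
import textwrap

# Hypothetical source of a callback defined at module level.
TOP_LEVEL_SOURCE = (
    "def parse(response):\n"
    '    yield {"url": response.url}\n'
)


def wrap_in_class(source):
    """Mirror of the class-wrapper workaround."""
    return "class _:\n" + textwrap.dedent(source)


# The def sits at column 0, so the class has no indented body.
try:
    ast.parse(wrap_in_class(TOP_LEVEL_SOURCE))
    error = None
except IndentationError as exc:
    error = exc

print(type(error).__name__)
```

If this is the case being hinted at, the wrapper would need to indent the source (or only be applied when the first line is indented) before it is safe for external functions.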
Description
Code that is accepted by the Python interpreter raises IndentationError when its source is fed through textwrap.dedent and then parsed.
Steps to Reproduce
1. Create is_generator_bug.py with the content below (which I simplified from the is_generator_with_return_value method body)
2. Run python is_generator_bug.py
Expected behavior: [What you expect to happen]
No Error
Actual behavior: [What actually happens]
Reproduces how often: [What percentage of the time does it reproduce?]
100%
Versions
Additional context