TypeError module, class, method, function, traceback, frame, or code object was expected, got partial #5592

Closed
tonal opened this issue Aug 7, 2022 · 4 comments · Fixed by #5599

Comments

@tonal
Contributor

tonal commented Aug 7, 2022

Description

Error when the response callback is a partial

Steps to Reproduce

  1. Configure the callback:
yield FormRequest(
      talc_get_products, method='POST', formdata=dict(slug='paneli'),
      dont_filter=True, callback=partial(
        self.parse_list_detail,
        parse_detail=partial(
          self.parse_detail, price_css='.product-mod-summ-price ::text',
          price_css_arg=dict(re=self.re_price))))
  2. In the log:
2022-08-07 12:50:32,493 [scrapy.core.scraper] ERROR: Spider error processing <POST https://www.talc.ru/wp-content/themes/kurna-tut/js/ajax/get-products.php> (referer: None)
Traceback (most recent call last):
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: <200 https://www.talc.ru/wp-content/themes/kurna-tut/js/ajax/get-products.php>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/scrapy/utils/defer.py", line 67, in mustbe_deferred
    result = f(*args, **kw)
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 61, in _process_spider_input
    return scrape_func(response, request, spider)
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/scrapy/core/scraper.py", line 157, in call_spider
    warn_on_generator_with_return_value(spider, callback)
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/scrapy/utils/misc.py", line 246, in warn_on_generator_with_return_value
    if is_generator_with_return_value(callable):
  File "/home/user/projects/remains_grab/remains/venv/lib/python3.10/site-packages/scrapy/utils/misc.py", line 229, in is_generator_with_return_value
    code = re.sub(r"^[\t ]+", "", inspect.getsource(callable))
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/inspect.py", line 1147, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/inspect.py", line 1129, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/inspect.py", line 940, in findsource
    file = getsourcefile(object)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/inspect.py", line 817, in getsourcefile
    filename = getfile(object)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/inspect.py", line 797, in getfile
    raise TypeError('module, class, method, function, traceback, frame, or '

Expected behavior: The callback runs.

Actual behavior: The TypeError shown in the log is raised instead.

Reproduces how often: Always

Versions

$ scrapy version --verbose
Scrapy       : 2.6.2
lxml         : 4.9.1.0
libxml2      : 2.9.14
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 22.4.0
Python       : 3.10.6 (main, Aug  3 2022, 10:03:21) [GCC 9.4.0]
pyOpenSSL    : 22.0.0 (OpenSSL 3.0.5 5 Jul 2022)
cryptography : 37.0.4
Platform     : Linux-5.15.0-43-generic-x86_64-with-glibc2.31

@elacuesta
Member

You shouldn't use partials as callbacks. They used to indirectly cause serialization issues (see #1138 (comment)). This exception is different because the code is newer, but the point remains. Use Request.cb_kwargs instead to pass arguments to the callback.

@tonal
Contributor Author

tonal commented Aug 8, 2022

@elacuesta @kmike
It is very unexpected that warn_on_generator_with_return_value, a function designed only to issue a warning, raises an error.
Why can't the function check for partials before issuing the warning?

@Gallaecio
Member

I think it may be worth failing more gracefully on partial functions.
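
One possible shape of "failing more gracefully", sketched with the standard library only: unwrap nested `functools.partial` objects before asking `inspect` for the source, and treat a missing source as "nothing to check". The helper names `unwrap_partial` and `get_source_or_none` and the sample `parse` callback are hypothetical. This is not the code from the linked pull request.

```python
import inspect
from functools import partial


def unwrap_partial(func):
    """Follow nested functools.partial objects down to the wrapped callable."""
    while isinstance(func, partial):
        func = func.func
    return func


def get_source_or_none(func):
    """Return the source of func, or None when inspect cannot provide it."""
    try:
        return inspect.getsource(unwrap_partial(func))
    except (TypeError, OSError):
        # TypeError: unsupported object (for example a builtin).
        # OSError: the source file could not be found.
        return None


def parse(response, price_css=None):
    # Stand-in for a spider callback; a generator, as in the report.
    yield {"price_css": price_css}


# A partial wrapped in another partial, similar to the nesting in the report.
callback = partial(partial(parse, price_css=".price ::text"))
```

With helpers like these, a check that only exists to emit a warning can skip callables whose source is unavailable instead of raising.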

@tonal
Contributor Author

tonal commented Aug 16, 2022

My monkey patch:

def _manky_path_is_generator_with_return_value():
  """Patch scrapy.utils.misc so that functools.partial callbacks are accepted."""
  import ast
  import inspect
  from functools import partial
  import re
  import scrapy.utils.misc as pathed

  # Reuse Scrapy's own cache and AST walker so that only the check changes.
  _generator_callbacks_cache = pathed._generator_callbacks_cache
  walk_callable = pathed.walk_callable

  def is_generator_with_return_value(callable):
    """
    Returns True if a callable is a generator function which includes a
    'return' statement with a value different than None, False otherwise
    """
    if callable in _generator_callbacks_cache:
      return _generator_callbacks_cache[callable]

    def returns_none(return_node):
      # A bare "return" has no value node; "return None" is a Constant node.
      value = return_node.value
      return (
        value is None or (
          isinstance(value, ast.Constant) and value.value is None))

    if inspect.isgeneratorfunction(callable):
      # inspect.getsource() rejects partial objects, so unwrap them first.
      func = callable
      while isinstance(func, partial):
        func = func.func

      # Was: inspect.getsource(callable)
      code = re.sub(r"^[\t ]+", "", inspect.getsource(func))
      tree = ast.parse(code)
      for node in walk_callable(tree):
        if isinstance(node, ast.Return) and not returns_none(node):
          _generator_callbacks_cache[callable] = True
          return _generator_callbacks_cache[callable]

    _generator_callbacks_cache[callable] = False
    return _generator_callbacks_cache[callable]

  # Install the replacement on the module that Scrapy calls it from.
  pathed.is_generator_with_return_value = is_generator_with_return_value

tonal added a commit to tonal/scrapy that referenced this issue Aug 16, 2022
@elacuesta elacuesta linked a pull request Oct 20, 2022 that will close this issue