
Error on api_response #10

Closed
farovictor opened this issue Oct 19, 2023 · 2 comments

@farovictor

I'm getting the following error when my spider yields a ScrapflyScrapyRequest:

ERROR scraper.py:246 Error downloading <GET https://immobilienscout24.de/expose/146870274>
Traceback (most recent call last):
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/api_response.py", line 105, in __call__
    return self.content_loader(content)
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/api_response.py", line 51, in _date_parser
    value[k] = _date_parser(v)
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/api_response.py", line 53, in _date_parser
    value[k] = v
TypeError: 'bytes' object does not support item assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 75, in process_exception
    response = yield deferred_from_coro(method(request=request, exception=exception, spider=spider))
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/scrapy/middleware.py", line 70, in process_exception
    raise exception
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
    result = context.run(
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 49, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/scrapy/downloader.py", line 82, in on_body_downloaded
    scrapfly_api_response:ScrapeApiResponse = spider.scrapfly_client._handle_response(
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/client.py", line 295, in _handle_response
    api_response = self._handle_api_response(
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/client.py", line 453, in _handle_api_response
    body = self.body_handler(response.content)
  File "/Users/dev/scraper/.venv/lib/python3.9/site-packages/scrapfly/api_response.py", line 107, in __call__
    raise EncoderError(content=content.decode('utf-8')) from e
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte
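
The first traceback shows `_date_parser` recursing into the decoded payload and then executing `value[k] = v` on a `bytes` object. `bytes` is immutable, so item assignment raises the `TypeError`. The sketch below is a hypothetical recursive walker, not the SDK's actual `_date_parser`. It assumes dates arrive as ISO 8601 strings, and it only mutates dicts and lists, returning `bytes` and other scalars unchanged.

```python
from datetime import datetime
from typing import Any


def parse_dates(value: Any) -> Any:
    """Hypothetical recursive date parser (NOT the scrapfly-sdk implementation).

    Converts ISO 8601 strings to datetime objects inside nested dicts and
    lists. Immutable values such as bytes are returned as-is instead of being
    assigned into, which is the operation that raised the TypeError above.
    """
    if isinstance(value, dict):
        # Replacing values of existing keys while iterating is safe:
        # the dict's size does not change.
        for k, v in value.items():
            value[k] = parse_dates(v)
        return value
    if isinstance(value, list):
        for i, v in enumerate(value):
            value[i] = parse_dates(v)
        return value
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return value  # ordinary string, not a date
    # bytes, int, float, bool, None: nothing to parse and nothing to mutate
    return value
```

For example, `parse_dates({"created": "2023-10-19T12:00:00", "body": b"\x84\x01"})` converts `created` to a `datetime` and leaves the binary `body` untouched. Checking the container type before assigning, rather than assuming every non-scalar is mutable, avoids the failure mode in the traceback.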

This error occurs in about 2% of my requests and appears to be random: some URLs hit it again after a few tries, but in most cases it does not repeat.

Environment Setup:

  • python = 3.9.9
  • macOS = Apple M2
  • scrapfly-sdk = {extras = ["all"], version = "^0.8.9"}
@jjsaunier
Member

I ran a batch of 1k requests against your target with no issues.

I have added better handling to capture the text representation of this faulty binary response in 712b37a#diff-42acd60fcbec2f0f0da4cdc1f17124c2696319b3868b29b7026c76b25d675dfeR111. If you hit it again, please share the base64 with me.
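
The second traceback comes from the error handler itself: it calls `content.decode('utf-8')` on a binary body. Byte 0x84 is a UTF-8 continuation byte (0x80 to 0xBF), so it cannot start a character. In msgpack, 0x84 is also the header of a four-entry map, which is consistent with the body being msgpack rather than text. Base64 represents arbitrary bytes losslessly in ASCII, which is why it is a safe way to share such a body. The helpers below are hypothetical, not part of scrapfly-sdk, and the `base64:` prefix is an illustrative convention.

```python
import base64


def describe_body(content: bytes) -> str:
    """Hypothetical helper: return the body as text if it is valid UTF-8,
    otherwise a prefixed base64 string that can be pasted into an issue."""
    try:
        return content.decode("utf-8")
    except UnicodeDecodeError:
        # Binary payloads such as msgpack are usually not valid UTF-8;
        # base64 encodes every byte using ASCII characters only.
        return "base64:" + base64.b64encode(content).decode("ascii")


def restore_body(text: str) -> bytes:
    """Inverse of describe_body, for whoever debugs the shared payload.

    Note: a genuine text body that itself starts with "base64:" would be
    misread; this is acceptable for a debugging aid, not for production.
    """
    if text.startswith("base64:"):
        return base64.b64decode(text[len("base64:"):])
    return text.encode("utf-8")
```

For example, `describe_body(b"\x84\xa1a")` returns `"base64:hKFh"`, and `restore_body` turns that string back into the original three bytes.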

If you can, share a minimal setup that reproduces your conditions (with a poetry.lock or a requirements.txt pinning exact versions). Since your stack trace includes /twisted/internet/, I assume Scrapy is involved? (I tested both the regular SDK and the Scrapy integration.)

@farovictor
Author

We have moved to a more up-to-date version of Scrapy and have had no occurrences of this since. I will close this issue.
