
Scrapyrt freezes when bytes are passed to an item field. #95

@rythm-of-the-red-man

Description

I'm using scrapyrt in Docker. When a bytes string is passed to the item loader, scrapyrt freezes and eventually throws this error:

2019-10-10 10:16:25+0000 [-] Unhandled error in Deferred:
2019-10-10 10:16:25+0000 [-] Unhandled Error
	Traceback (most recent call last):
	  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
	    _inlineCallbacks(r, g, status)
	  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1421, in _inlineCallbacks
	    status.deferred.callback(getattr(e, "value", None))
	  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 460, in callback
	    self._startRunCallbacks(result)
	  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
	    self._runCallbacks()
	--- <exception caught here> ---
	  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	  File "/usr/local/lib/python3.7/site-packages/scrapyrt/resources.py", line 37, in finish_request
	    request.write(self.render_object(obj, request))
	  File "/usr/local/lib/python3.7/site-packages/scrapyrt/resources.py", line 90, in render_object
	    r = self.json_encoder.encode(obj) + "\n"
	  File "/usr/local/lib/python3.7/json/encoder.py", line 199, in encode
	    chunks = self.iterencode(o, _one_shot=True)
	  File "/usr/local/lib/python3.7/json/encoder.py", line 257, in iterencode
	    return _iterencode(o, 0)
	  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/serialize.py", line 36, in default
	    return super(ScrapyJSONEncoder, self).default(o)
	  File "/usr/local/lib/python3.7/json/encoder.py", line 179, in default
	    raise TypeError(f'Object of type {o.__class__.__name__} '
	builtins.TypeError: Object of type bytes is not JSON serializable
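The final TypeError comes from Python's standard `json` encoder, which has no handler for `bytes`; `ScrapyJSONEncoder` delegates unknown types to the base class, which raises. A minimal sketch of the failure, independent of scrapyrt:

```python
import json

# The stdlib JSON encoder cannot serialize bytes; ScrapyJSONEncoder
# falls through to it for unknown types, producing this exact error.
try:
    json.dumps({"screenshot": b"\x89PNG"})
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```
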

To reproduce, let the item be:

```python
class SplashScreenshotItem(scrapy.Item):
    screenshot = scrapy.Field(output_processor=TakeFirst())
```

If I define the spider as follows:

```python
import base64

from scrapy import Spider
from scrapy.loader import ItemLoader
from scrapy_splash import SplashRequest

from ..items import SplashScreenshotItem


class GoogleSpider(Spider):
    name = "googlespider"

    def parse(self, response):
        url = 'http://www.google.com'
        yield SplashRequest(url=url,
                            callback=self.return_png,
                            args={
                                'html': 1,
                                'png': 1,
                                'width': 600,
                            },
                            endpoint='render.json')

    def return_png(self, response):
        new = ItemLoader(item=SplashScreenshotItem())
        new.add_value('screenshot', base64.b64decode(response.data['png']))
        yield new.load_item()
```

The API freezes and returns no response until I shut down the container.
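A possible workaround (my assumption, not confirmed by the maintainers) is to keep the screenshot as base64 text instead of decoding it to raw bytes, since a string passes through the JSON encoder unchanged. A small hypothetical helper sketching the idea:

```python
import base64
import json

def to_json_safe(value):
    # Hypothetical helper: re-encode raw bytes as an ASCII base64 string
    # so the resulting item can be serialized by the JSON encoder.
    if isinstance(value, bytes):
        return base64.b64encode(value).decode('ascii')
    return value

# Raw PNG bytes become a plain string that json.dumps accepts.
item = {'screenshot': to_json_safe(b'\x89PNG')}
print(json.dumps(item))  # {"screenshot": "iVBORw=="}
```

In the spider above, the simplest form of this would be to skip `base64.b64decode` entirely and store `response.data['png']` as-is, since Splash already returns the screenshot base64-encoded.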

Metadata

Labels

bug (project maintainers identified this issue as a potential bug in the project)
