
exceptions.AttributeError: 'dict' object has no attribute 'fields #86

Closed
dfdeshom opened this Issue Feb 3, 2012 · 9 comments


@dfdeshom
dfdeshom commented Feb 3, 2012

Hi,
we were running into a problem where we would get the following exception when crawling URLs:

Traceback (most recent call last):
[...]

    field = item.fields[field_name]
exceptions.AttributeError: 'dict' object has no attribute 'fields'

After some digging, we found that the exception was caused by the newly introduced item exporters, where the default is to export the item as JSON. This exception makes more sense:

 ERROR: Error caught on signal handler: <bound method ?.item_scraped of <scrapy.contrib.feedexport.FeedExporter object at 0x2a90cd0>>
Traceback (most recent call last):
[...]
File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
exceptions.TypeError: datetime.datetime(2012, 2, 2, 0, 0, tzinfo=<UTC>) is not JSON serializable

The item exporters extension scrapy.contrib.feedexport.FeedExporter should either be disabled by default or changed to a format that is guaranteed not to fail (like pickle), since not all items are JSON-serializable.
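For reference, one way to deal with the datetime case (short of disabling the exporter) is Scrapy's per-field serializers, which the exporters apply before encoding. A minimal sketch, with a hypothetical item definition:

    from scrapy.item import Item, Field

    def serialize_datetime(value):
        # render the datetime as an ISO 8601 string so the JSON encoder accepts it
        return value.isoformat()

    class ArticleItem(Item):
        # hypothetical fields, for illustration only
        url = Field()
        published = Field(serializer=serialize_datetime)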

@piteer1
piteer1 commented Apr 10, 2012

I can confirm a similar issue in the master branch:

ERROR: Error caught on signal handler: <bound method ?.item_scraped of <scrapy.contrib.feedexport.FeedExporter object at 0x27b9b10>>
        Traceback (most recent call last):
          File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 249, in addCallbacks
            self._runCallbacks()
          File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 441, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/home/carocean/scrapy/scrapy/core/scraper.py", line 208, in _itemproc_finished
            item=output, response=response, spider=spider)
          File "/home/carocean/scrapy/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
            *arguments, **named)
        --- <exception caught here> ---
          File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 125, in maybeDeferred
            result = f(*args, **kw)
          File "/home/carocean/scrapy/scrapy/xlib/pydispatch/robustapply.py", line 47, in robustApply
            return receiver(*arguments, **named)
          File "/home/carocean/scrapy/scrapy/contrib/feedexport.py", line 177, in item_scraped
            slot.exporter.export_item(item)
          File "/home/carocean/scrapy/scrapy/contrib/exporter/__init__.py", line 85, in export_item
            itemdict = dict(self._get_serialized_fields(item))
          File "/home/carocean/scrapy/scrapy/contrib/exporter/__init__.py", line 59, in _get_serialized_fields
            field_iter = item.iterkeys()
        exceptions.AttributeError: 'NoneType' object has no attribute 'iterkeys'

It would be nice if this extension were simply disabled by default.

@pablohoffman
Scrapy project member

In Scrapy, it is disabled by default unless you configure the FEED_URI setting or pass the -o option to the scrapy command.
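For example, either of these turns the feed exporter on (a sketch; the file and spider names are arbitrary):

    # settings.py
    FEED_URI = 'items.json'
    FEED_FORMAT = 'jsonlines'

or, equivalently, on the command line:

    scrapy crawl myspider -o items.json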

Are you getting this error when running from Scrapyd?

@dfdeshom

Hi, to disable scrapy.contrib.feedexport.FeedExporter, I had to set it to None in my EXTENSIONS dict instead of 0. I am getting this error when using Scrapyd.
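In other words, this is the settings.py change that worked (note None rather than 0):

    # settings.py
    EXTENSIONS = {
        'scrapy.contrib.feedexport.FeedExporter': None,  # 0 did not disable it
    }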

@dfdeshom

Off-topic, but: in general, to disable any extension, middleware, etc., I have to set it to None, which can be a little confusing since you would think that 0 would work too.

@piteer1
piteer1 commented Apr 29, 2012

@pablohoffman Hello, yes, I was running it from Scrapyd. It looked like this extension was enabled even though I didn't have any FEED_URI setting configured. I worked around this problem by changing the default Scrapy settings (I removed this exporter from the list of enabled ones).

@pablohoffman
Scrapy project member

@dfdeshom I agree that 0 should also disable middlewares, and I'm happy to merge a pull request for this change.
@piteer1 Yes, since 0.15, Scrapyd stores items by default using the JSON serializer. We made this change because we thought it more useful to have the data stored by default in the most common case, so that you have a complete working environment when running spiders on Scrapyd; otherwise, by default, items are just scraped and not stored anywhere. If you want to disable this, set the items_dir Scrapyd setting to none, as the documentation says: http://doc.scrapy.org/en/latest/topics/scrapyd.html#items-dir (but check out the latest code, because I just submitted a fix).
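For reference, in the INI-style scrapyd.conf that the linked docs describe, that would look something like this (a sketch; per the comment above, check the linked docs and latest code for the exact value that disables it in your version):

    # scrapyd.conf
    [scrapyd]
    items_dir =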

@nmalzieu
nmalzieu commented Jan 2, 2013

Hi @pablohoffman, sorry to disturb you on a closed issue, but can we totally disable scrapy.contrib.feedexport.FeedExporter with Scrapyd? I use the DEBUG log level, but I don't want the serialized items dumped into my logs; I store them in a MongoDB database with a pipeline.
Thank you!
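For context, the pipeline mentioned here is presumably something along these lines; a minimal sketch, where the class, database, and collection names are made up and a reasonably recent pymongo is assumed:

    import pymongo

    class MongoPipeline(object):

        def open_spider(self, spider):
            # hypothetical connection details
            self.client = pymongo.MongoClient('localhost', 27017)
            self.db = self.client['scraping']

        def close_spider(self, spider):
            self.client.close()

        def process_item(self, item, spider):
            # store the item, then let it continue through the pipeline
            self.db['items'].insert_one(dict(item))
            return item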

@pablohoffman
Scrapy project member

@nmalzieu can you ask this on the mailing list?

@nmalzieu
nmalzieu commented Jan 2, 2013

@pablohoffman OK thanks
