
itemadapter #13

Merged (19 commits, Jul 1, 2020)
12 changes: 11 additions & 1 deletion docs/declaring-loaders.rst
@@ -22,6 +22,12 @@ Item Loaders are declared by using a class definition syntax. Here is an example
def price_out(self, values):
return float(values[0])

loader = ProductLoader()
loader.add_value('name', 'plasma TV')
loader.add_value('price', '999.98')
loader.load_item()
# {'name': 'Plasma Tv', 'price': 999.98}

As you can see, input processors are declared using the ``_in`` suffix while
output processors are declared using the ``_out`` suffix. You can also
declare default input/output processors using the
@@ -32,7 +38,11 @@ The precedence order, for both input and output processors, is as follows:

1. Item Loader field-specific attributes: ``field_in`` and ``field_out`` (most
precedence)
2. Item Loader defaults: :meth:`ItemLoader.default_input_processor` and
2. Field metadata (``input_processor`` and ``output_processor`` keys).
Check out
`itemadapter field metadata <https://github.com/scrapy/itemadapter#metadata-support>`_
for more information.
3. Item Loader defaults: :meth:`ItemLoader.default_input_processor` and
:meth:`ItemLoader.default_output_processor` (least precedence)

See also: :ref:`extending-loaders`.
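The three-level lookup can be sketched in plain Python. ``resolve_input_processor`` below is a hypothetical helper written for illustration; in the actual code the same logic is spread across ``ItemLoader.get_input_processor`` and ``_get_item_field_attr``.

```python
# Sketch of the processor precedence described above (hypothetical helper).
def resolve_input_processor(loader, field_name, field_meta):
    # 1. Field-specific loader attribute, e.g. ``name_in``
    proc = getattr(loader, '%s_in' % field_name, None)
    if proc:
        return proc
    # 2. ``input_processor`` key in the field metadata (via itemadapter)
    proc = field_meta.get('input_processor')
    if proc:
        return proc
    # 3. Loader-wide default
    return loader.default_input_processor

class Loader:
    pass

loader = Loader()
# Instance attributes sidestep Python method binding here; the real
# ItemLoader uses class attributes plus an unbound_method() helper.
loader.name_in = str.title
loader.default_input_processor = str.strip

assert resolve_input_processor(loader, 'name', {}) is str.title   # level 1 wins
assert resolve_input_processor(
    loader, 'price', {'input_processor': float}) is float          # level 2
assert resolve_input_processor(loader, 'price', {}) is str.strip   # level 3
```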
10 changes: 9 additions & 1 deletion docs/index.rst
@@ -13,6 +13,14 @@ To install ``itemloaders``, run::

pip install itemloaders

.. note:: Under the hood, ``itemloaders`` uses
`itemadapter <https://github.com/scrapy/itemadapter>`_ as a common interface.
This means you can use any of the types supported by ``itemadapter`` here.

.. warning:: ``dataclasses`` and ``attrs`` support is still experimental.
Please refer to :attr:`~ItemLoader.default_item_class` in the
:ref:`api-reference` for more information.
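The common-interface point can be illustrated with the standard library alone. ``Product`` is an illustrative dataclass, the commented ``ItemLoader`` calls show the intended (experimental) usage, and ``asdict`` stands in for the dict-like view that ``itemadapter`` provides.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class Product:
    name: Optional[str] = None
    price: Optional[float] = None

# Any itemadapter-supported type can back a loader, e.g.:
#   ItemLoader(item={})           # plain dict
#   ItemLoader(item=Product())    # dataclass (experimental)
# itemadapter exposes each of them through the same dict-like view,
# roughly equivalent to:
assert asdict(Product(name='plasma TV')) == {'name': 'plasma TV', 'price': None}
```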


Getting Started with ``itemloaders``
====================================
@@ -28,7 +36,7 @@ CSS or XPath Selectors. You can add more than one value to
the same item field; the Item Loader will know how to "join" those values later
using a proper processing function.

.. note:: Collected data is internally stored as lists,
.. note:: Collected data is stored internally as lists,
allowing several values to be added to the same field.
If an ``item`` argument is passed when creating a loader,
each of the item's values will be stored as-is if it's already
49 changes: 33 additions & 16 deletions itemloaders/__init__.py
@@ -6,6 +6,7 @@
from collections import defaultdict
from contextlib import suppress

from itemadapter import ItemAdapter
from parsel.utils import extract_regex, flatten

from itemloaders.common import wrap_loader_context
@@ -63,6 +64,19 @@ class ItemLoader:
An Item class (or factory), used to instantiate items when not given in
the ``__init__`` method.

.. warning:: Currently, this class (or factory) must be
callable without any arguments.
If you are using ``dataclasses``, please consider the following
alternative::

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Product:
name: Optional[str] = field(default=None)
price: Optional[float] = field(default=None)

.. attribute:: default_input_processor

The default input processor to use for those fields which don't specify
@@ -91,12 +105,13 @@ def __init__(self, item=None, selector=None, parent=None, **context):
context.update(selector=selector)
if item is None:
item = self.default_item_class()
self._local_item = item
context['item'] = item
self.context = context
self.parent = parent
self._local_item = context['item'] = item
self._local_values = defaultdict(list)
# values from initial item
for field_name, value in item.items():
for field_name, value in ItemAdapter(item).items():
self._values[field_name] += arg_to_iter(value)

@property
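The seeding loop above copies the initial item's values into ``_values``, normalizing each one into a list. A stdlib-only approximation of that normalization (the real ``arg_to_iter`` helper lives elsewhere in the codebase; this sketch mirrors its documented behavior):

```python
from collections import defaultdict

# Approximation of arg_to_iter: None -> [], bare values -> one-item
# lists, other iterables (except str/bytes/dict) pass through unchanged.
def arg_to_iter(value):
    if value is None:
        return []
    if hasattr(value, '__iter__') and not isinstance(value, (str, bytes, dict)):
        return value
    return [value]

values = defaultdict(list)
for field_name, value in {'name': 'plasma TV', 'tags': ['cheap']}.items():
    values[field_name] += arg_to_iter(value)

assert values == {'name': ['plasma TV'], 'tags': ['cheap']}
```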
@@ -242,14 +257,13 @@ def load_item(self):
data collected is first passed through the :ref:`output processors
<processors>` to get the final value to assign to each item field.
"""
item = self.item
adapter = ItemAdapter(self.item)
for field_name in tuple(self._values):
value = self.get_output_value(field_name)
if value is not None:
print(type(value))
item[field_name] = value
adapter[field_name] = value

return item
return adapter.item
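The flow of ``load_item`` can be mimicked with a plain dict standing in for ``ItemAdapter``. The ``load_item`` function and the processor mapping below are hypothetical, written only to show the shape of the loop:

```python
# Sketch: run each field's collected values through its output processor
# and skip None results, as the method above does.
def load_item(values, output_processors):
    item = {}
    for field_name, collected in values.items():
        proc = output_processors.get(field_name, lambda v: v)
        value = proc(collected)
        if value is not None:
            item[field_name] = value
    return item

item = load_item(
    {'name': ['Plasma Tv'], 'price': ['999.98'], 'stock': [None]},
    {'name': lambda v: v[0], 'price': lambda v: float(v[0]),
     'stock': lambda v: v[0]},
)
# 'stock' produced None, so it is left out of the final item.
assert item == {'name': 'Plasma Tv', 'price': 999.98}
```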

def get_output_value(self, field_name):
"""
@@ -269,25 +283,28 @@ def get_collected_values(self, field_name):
return self._values[field_name]

def get_input_processor(self, field_name):
"""Return the input processor for the given field."""
proc = getattr(self, '%s_in' % field_name, None)
if not proc:
proc = self.get_default_input_processor_for_field(field_name)
proc = self._get_item_field_attr(
field_name,
'input_processor',
self.default_input_processor
)
return unbound_method(proc)

def get_default_input_processor_for_field(self, field_name):
return self.default_input_processor

def get_output_processor(self, field_name):
"""Return the output processor for the given field."""
proc = getattr(self, '%s_out' % field_name, None)
if not proc:
proc = self.get_default_output_processor_for_field(field_name)

proc = self._get_item_field_attr(
field_name,
'output_processor',
self.default_output_processor
)
return unbound_method(proc)

def get_default_output_processor_for_field(self, field_name):
return self.default_output_processor
def _get_item_field_attr(self, field_name, key, default=None):
field_meta = ItemAdapter(self.item).get_field_meta(field_name)
return field_meta.get(key, default)
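``_get_item_field_attr`` relies on itemadapter's ``get_field_meta``, which for dataclass items reads the per-field ``metadata`` mapping. A stdlib-only sketch of where that metadata lives (``Product`` is illustrative; the ``input_processor`` key matches the lookup above):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Product:
    name: Optional[str] = field(
        default=None, metadata={'input_processor': str.title})

# get_field_meta()-style lookup, straight from the dataclass machinery:
meta = Product.__dataclass_fields__['name'].metadata
proc = meta.get('input_processor')
assert proc('plasma tv') == 'Plasma Tv'
```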

def _process_input_value(self, field_name, value):
proc = self.get_input_processor(field_name)
2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -1,7 +1,7 @@
w3lib>=1.21.0
parsel>=1.5.2
jmespath>=0.9.5

itemadapter>=0.1.0

pytest==5.4.1
flake8==3.7.9
3 changes: 2 additions & 1 deletion setup.py
@@ -40,7 +40,8 @@
# scrapy's requirements
'w3lib>=1.17.0',
'parsel>=1.5.0',
'jmespath>=0.9.5'
'jmespath>=0.9.5',
'itemadapter>=0.1.0',
],
# extras_require=extras_require,
)