The to_item method of a page object class must return an item.
An item is a data container object supported by the itemadapter library, such as a dict, an attrs class, or a dataclasses.dataclass class. For example:
import attrs
@attrs.define
class MyItem:
    foo: int
    bar: str
Because itemadapter allows implementing support for arbitrary classes, any kind of Python object can potentially work as an item.
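For comparison with the attrs example above, here is a minimal sketch of the same item declared as a standard-library dataclass. The field values are illustrative, and asdict is used only to show the data the item holds; it is not something to_item requires.

```python
from dataclasses import asdict, dataclass


@dataclass
class MyItem:
    foo: int
    bar: str


# Illustrative values only.
item = MyItem(foo=1, bar="a")

# asdict() is a standard-library helper that returns the fields as a plain dict.
item_as_dict = asdict(item)
```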
When inheriting from ItemPage, indicate the item class to return between brackets:
@attrs.define
class MyPage(ItemPage[MyItem]):
    ...
ItemPage.to_item builds an instance of the specified item class based on the page object class fields:
page = MyPage(...)
item = await page.to_item()
assert isinstance(item, MyItem)
You can also define ItemPage subclasses that are meant only to be subclassed, not used directly, and leave ItemPage unannotated in them. You can then annotate those classes when subclassing them:
@attrs.define
class MyBasePage(ItemPage):
    ...
@attrs.define
class MyPage(MyBasePage[MyItem]):
    ...
To change the item class of a subclass that has already defined its item class, use Returns:
@attrs.define
class MyOtherPage(MyPage, Returns[MyOtherItem]):
    ...
To keep your code maintainable, we recommend that you:
- Use proper item classes based on dataclasses or attrs instead of dict, to make it easier to detect issues like field name typos or missing required fields.
- Reuse item classes.
  For example, if you want to extract product details data from 2 e-commerce websites, try to use the same item class for both of them. Or at least try to define a base item class with shared fields, and only keep website-specific fields in website-specific items.
- Keep item classes as logic-free as possible.
  For example, any parsing and field cleanup logic is better handled through page object classes, e.g. using field processors.
  Having code that makes item field values different from their counterpart page object field values can subvert the expectations of users of your code, who might need to access page object fields directly, for example for field subset selection.
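The first two recommendations can be illustrated with a minimal sketch that uses only standard-library dataclasses. The Product and ExampleShopProduct classes and their fields are hypothetical names chosen for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Hypothetical base item holding the fields shared by every website."""

    name: str
    price: Optional[str] = None


@dataclass
class ExampleShopProduct(Product):
    """Hypothetical website-specific item: adds only what is unique to one site."""

    loyalty_points: Optional[int] = None


# A dict silently accepts a misspelled key ("nmae" instead of "name").
as_dict = {"nmae": "Chair", "price": "9.99"}

# An item class rejects the same typo as soon as the item is built.
typo_error = None
try:
    Product(nmae="Chair", price="9.99")
except TypeError as error:
    typo_error = error

# A missing required field is caught the same way.
missing_error = None
try:
    Product(price="9.99")
except TypeError as error:
    missing_error = error

# The website-specific item reuses the shared fields through inheritance.
item = ExampleShopProduct(name="Chair", price="9.99", loyalty_points=10)
```

Code written against Product keeps working with ExampleShopProduct instances, which is the main benefit of sharing a base item class between websites.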
If you are looking for ready-made item classes, check out zyte-common-items.