
ItemLoader: zeros as field values #2498

Closed
medse opened this issue Jan 15, 2017 · 4 comments


medse commented Jan 15, 2017

I'm scraping an e-shop. There's an XPath expression for an item's price, which matches nothing if the item is out of stock.
I use ItemLoader and add_xpath():

item.add_xpath('price', './/span[@class="price rub"]/text()')

I want to set the price to 0.0 if it is missing, so I handle the empty case in the price_in declaration inside my ItemLoader subclass:

price_in = Compose(TakeFirst(), lambda v: float(v) if v else 0)

But the value of zero isn't stored in the _values dict because of the following code in _add_value() in scrapy/loader/__init__.py:

    def _add_value(self, field_name, value):
        value = arg_to_iter(value)
        processed_value = self._process_input_value(field_name, value)
        if processed_value:
            self._values[field_name] += arg_to_iter(processed_value)

I don't know why the logic is like this, but it stores neither zeros nor empty strings. Is this intended? I've only been using Scrapy for about a week, so I don't know the usage conventions, but this seems strange to me.
Maybe change the condition to "processed_value is not None"?
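The difference between the two conditions can be sketched outside Scrapy. The helpers below are illustrative stand-ins (not Scrapy's actual code) that mimic the `if processed_value:` check quoted above and the proposed `is not None` fix:

```python
from collections import defaultdict

def add_value_truthy(values, field, processed):
    # Mimics the quoted _add_value(): any falsy result (0, 0.0, '') is dropped.
    if processed:
        values[field] += [processed]

def add_value_not_none(values, field, processed):
    # The proposed fix: only an explicit None is discarded.
    if processed is not None:
        values[field] += [processed]

values_a = defaultdict(list)
values_b = defaultdict(list)

# 0.0 is the price the input processor produced for an out-of-stock item.
add_value_truthy(values_a, 'price', 0.0)
add_value_not_none(values_b, 'price', 0.0)

print(values_a['price'])  # [] -- the zero never reaches the item
print(values_b['price'])  # [0.0] -- the zero is preserved
```

With the truthiness check, the processor's perfectly valid 0.0 is indistinguishable from "no value at all".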

kmike (Member) commented Jan 15, 2017

See also: #741

medse (Author) commented Jan 15, 2017

Thanks, @kmike .
If I understand correctly, it's exactly the same problem (and, incidentally, the same fix). But it hasn't been merged in 2.5 years?

IAlwaysBeCoding (Contributor) commented
I do a lot of scraping from e-commerce stores as well (that is my specialty). I pass the Item instance, after loading it, through this function:

    def default_missing_keys(item, default_value='', except_keys=()):
        # Fields declared on the Item class but absent from the loaded item.
        # Note: except_keys defaults to a tuple, not a mutable list.
        missing_keys = set(item.fields.keys()) - set(item.keys())
        for missing_key in missing_keys:
            if missing_key not in except_keys:
                item[missing_key] = default_value

Essentially, I take the keys declared on the Item class and subtract the keys actually present on the item. That gives me all the missing keys, which I then fill with a default value.

It is an ugly hack, but at the moment Scrapy doesn't have a default for missing keys.

wRAR (Member) commented Oct 28, 2023

scrapy/itemloaders#73

wRAR closed this as completed Oct 28, 2023

5 participants