# [Scrapy Beginners Series Part 2: Cleaning Dirty Data & Dealing With Edge Cases](https://scrapeops.io/python-scrapy-playbook/scrapy-beginners-guide-cleaning-data/)

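## Strategies for handling edge cases

- Problems with the site
  - Some products are out of stock or have no price displayed.
  - Prices are shown in pounds, but we want them in US dollars.
  - Product URLs are relative rather than absolute.
  - Some products are duplicated.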

&nbsp;

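- Ways to deal with them
  - Try/Except
    - Wrap parts of the parser in a Try/Except block so that, if scraping a particular field raises an error, the spider can fall back to another parser.
  - Conditional parsing
    - Have the spider check whether specific DOM elements are present in the HTML response and choose a parser accordingly.
  - Item Loaders
    - Item Loaders let you clean and transform data while it is being parsed.
  - Item Pipelines
    - Item Pipelines let you design a series of post-processing steps that clean, manipulate, and validate data before it is stored.
  - Cleaning during data analysis
    - Parse the data for every relevant field, then clean it up later in your data analysis pipeline.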
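
As a rough illustration of the Try/Except strategy, the sketch below tries a strict parse first and falls back to a looser one if that raises. The function name `parse_price` and the sample strings are hypothetical, not taken from the site, and the code is plain Python so it can be tried outside of a spider.

```python
import re


def parse_price(raw):
    """Return the price in `raw` as a float, or None if no price can be found."""
    try:
        # Primary parser: expects a clean value such as "£8.75"
        return float(raw.strip().lstrip("£"))
    except (ValueError, AttributeError):
        # Fallback parser: pull the first number out of messier text.
        # AttributeError covers raw being None (element missing from the page).
        match = re.search(r"\d+(?:\.\d+)?", raw or "")
        return float(match.group()) if match else None


print(parse_price("£8.75"))             # clean value, handled by the primary parser
print(parse_price("Sale price £8.75"))  # messy value, handled by the fallback
print(parse_price("Sold out"))          # no price at all
print(parse_price(None))                # element was missing
```

Returning `None` rather than raising leaves the decision about what to do with a price-less product to a later step, such as an Item Pipeline.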

## Organizing data with Scrapy Items

- Items store the data in a more structured way.
- They make Scrapy Item Pipelines and Item Loaders easier to use.
- They let you set up unit tests with Scrapy extensions such as [Spidermon](https://scrapeops.io/python-scrapy-playbook/extensions/scrapy-spidermon-guide).

Create the project

```bash
scrapy startproject chocolatescraper
```

Change into the project directory

```bash
cd chocolatescraper
```

Create a generic spider

```bash
scrapy genspider chocolatespider chocolate.co.uk/collections/all
```

```python
# items.py

import scrapy


class ChocolateProduct(scrapy.Item):
    # One field per value scraped for each product
    name = scrapy.Field()
    price = scrapy.Field()
    url = scrapy.Field()
```

```python
# chocolatespider.py

import scrapy
from chocolatescraper.items import ChocolateProduct


class ChocolateSpider(scrapy.Spider):

    # The name of the spider
    name = 'chocolatespider'

    # These are the URLs that we will start scraping
    start_urls = ['https://www.chocolate.co.uk/collections/all']

    def parse(self, response):
        products = response.css('product-item')

        for product in products:
            # Create a fresh item per product so yielded items do not share state
            product_item = ChocolateProduct()
            product_item['name'] = product.css('a.product-item-meta__title::text').get()
            product_item['price'] = product.css('span.price').get().replace('<span class="price">\n              <span class="visually-hidden">Sale price</span>','').replace('</span>','')
            product_item['url'] = product.css('div.product-item-meta a').attrib['href']
            yield product_item

        # Follow the pagination link, if there is one
        next_page = response.css('[rel="next"] ::attr(href)').get()

        if next_page is not None:
            next_page_url = 'https://www.chocolate.co.uk' + next_page
            yield response.follow(next_page_url, callback=self.parse)
```

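## Pre-processing data with Scrapy Item Loaders

- [Item Loaders](https://docs.scrapy.org/en/latest/topics/loaders.html)
  - Removing characters
    - Strip extra tags and special characters from the parsed data.
    - For example, remove a currency symbol.
  - Type conversion
    - Convert a string to an int.
  - URL conversion
    - Change a relative URL into an absolute URL.
  - Combining fields
    - Merge two or more scraped values into a single field.
  - Replacing values
    - Replace one value with another.
    - For example, replace a `$` sign with a `£` sign.
  - Unit conversion
    - Convert a number from one unit to another.
    - For example, convert the string `120g` to `0.12kg`.
  - Appending data
    - Add a value to the start or end of a field value.
    - For example, append "kilograms" to the end of a number.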
### Item Loader example

- Remove the `£` sign from the scraped data.
- Convert the scraped relative URLs into full absolute URLs.

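```python
# itemloaders.py

from itemloaders.processors import TakeFirst, MapCompose
from scrapy.loader import ItemLoader


class ChocolateProductLoader(ItemLoader):

    # By default, output the first non-empty value collected for each field
    default_output_processor = TakeFirst()
    # Keep only the text after the last '£' sign
    price_in = MapCompose(lambda x: x.split("£")[-1])
    # Prefix the relative URL with the site's base URL
    url_in = MapCompose(lambda x: 'https://www.chocolate.co.uk' + x)
```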

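Processors for a specific field are defined with the `_in` suffix for input processors and the `_out` suffix for output processors.

This example declares input processors for both the `price` and `url` fields.

- price: the `price` input processor splits the value it receives on the `£` sign and keeps the last part (the text after the sign).
- url: the `url` input processor appends the relative URL it receives to the base URL.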
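
To make the behaviour of the two input processors concrete, the same lambdas can be called directly on sample values. The sample strings are assumptions about what the selectors return, not captured output. The last lines show `urllib.parse.urljoin` from the standard library as one possible alternative to string concatenation; it is not what the loader above uses.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.chocolate.co.uk"

# The same lambdas that are wrapped in MapCompose in the loader
clean_price = lambda x: x.split("£")[-1]
make_absolute = lambda x: BASE_URL + x

print(clean_price("£8.75"))  # text after the '£' sign
print(clean_price("8.75"))   # no '£' present: the value comes back unchanged
print(make_absolute("/products/blonde-caramel-chocolate-bar"))

# urljoin copes with both relative and already-absolute URLs
print(urljoin(BASE_URL, "/products/blonde-caramel-chocolate-bar"))
print(urljoin(BASE_URL, "https://www.chocolate.co.uk/products/gift-voucher"))
```

Plain concatenation would produce a broken URL if the site ever returned an absolute `href`, which is the case `urljoin` guards against.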

```python
# chocolatespider.py

import scrapy
from chocolatescraper.itemloaders import ChocolateProductLoader
from chocolatescraper.items import ChocolateProduct


class ChocolateSpider(scrapy.Spider):

    # The name of the spider
    name = 'chocolatespider'

    # These are the URLs that we will start scraping
    start_urls = ['https://www.chocolate.co.uk/collections/all']

    def parse(self, response):
        products = response.css('product-item')

        for product in products:
            # One loader per product; the processors run as values are added
            chocolate = ChocolateProductLoader(item=ChocolateProduct(), selector=product)
            chocolate.add_css('name', "a.product-item-meta__title::text")
            chocolate.add_css('price', 'span.price', re='<span class="price">\n              <span class="visually-hidden">Sale price</span>(.*)</span>')
            chocolate.add_css('url', 'div.product-item-meta a::attr(href)')
            yield chocolate.load_item()

        # Follow the pagination link, if there is one
        next_page = response.css('[rel="next"] ::attr(href)').get()

        if next_page is not None:
            next_page_url = 'https://www.chocolate.co.uk' + next_page
            yield response.follow(next_page_url, callback=self.parse)
```

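## Processing data with Scrapy Item Pipelines

Now that we have a spider that uses an Item and an Item Loader, we will use Item Pipelines to manipulate the scraped data before it is saved.

We will use Item Pipelines to:

- Convert `price` from a string to a float (so that it can be multiplied by an exchange rate).
- Convert `price` from British pounds to US dollars.
- Drop items that currently have no price (because they are sold out).
- Check whether an item is a duplicate and, if so, drop it.

### Converting the price

- Once an Item has been scraped by the spider, it is sent to the `Item Pipeline` for validation and processing.
- Each `Item Pipeline` is a Python class that implements a simple method called `process_item`.
- The `process_item` method takes in an Item, performs an action on it, and decides whether the Item should continue through the pipeline or be dropped.
- The pipeline below takes the `ChocolateProduct` Item, converts the price to a float, and multiplies the scraped price by an exchange rate to convert it from pounds to dollars.
- Pipelines live in the project's `pipelines.py` file, at the same folder level as the `itemloaders.py` and `items.py` files we worked on above.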

```python
# pipelines.py

from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem


class PriceToUSDPipeline:

    # Hard-coded GBP to USD exchange rate
    gbpToUsdRate = 1.3

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)

        # Check that a price is present
        if adapter.get('price'):

            # Convert the price to a float
            floatPrice = float(adapter['price'])

            # Convert the price from GBP to USD using our hard-coded exchange rate
            adapter['price'] = floatPrice * self.gbpToUsdRate

            return item

        else:
            # Drop the item if it has no price
            raise DropItem(f"Missing price in {item}")
```
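
Multiplying floats can leave binary rounding artefacts in the stored price, such as `11.700000000000001`. If that matters, one option is to do the arithmetic with `decimal.Decimal` and round to two places. This is a sketch of an alternative, not the pipeline used above; the function name `gbp_to_usd` is hypothetical and the 1.3 rate is the same hard-coded value.

```python
from decimal import Decimal, ROUND_HALF_UP

GBP_TO_USD_RATE = Decimal("1.3")  # same hard-coded rate as the pipeline


def gbp_to_usd(price_text):
    """Convert a GBP price string such as '9.00' to USD, rounded to cents."""
    usd = Decimal(price_text) * GBP_TO_USD_RATE
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(gbp_to_usd("9.00"))  # exact decimal arithmetic, two decimal places
print(gbp_to_usd("8.75"))  # a half cent is rounded up
```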

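### Removing duplicates

- To remove duplicate `ChocolateProduct` Items, we check the product name and, if that name has already been seen, drop the Item (it is not returned in the output).
- This second pipeline class also goes in the same `pipelines.py` file in the project.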

```python
# pipelines.py (continued; reuses the ItemAdapter and DropItem imports above)

class DuplicatesPipeline:

    def __init__(self):
        # Product names that have already passed through the pipeline
        self.names_seen = set()

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        if adapter['name'] in self.names_seen:
            raise DropItem(f"Duplicate item found: {item!r}")
        else:
            self.names_seen.add(adapter['name'])
            return item
```
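
The seen-set logic can be checked on its own before wiring it into Scrapy. The sketch below applies the same idea to a plain list of dicts; the `drop_duplicates` helper and the sample products are made up for illustration.

```python
def drop_duplicates(items, key="name"):
    """Yield each item whose `key` value has not been seen before."""
    seen = set()
    for item in items:
        if item[key] in seen:
            continue  # the pipeline would raise DropItem at this point
        seen.add(item[key])
        yield item


products = [
    {"name": "Blonde Caramel", "price": 6.5},
    {"name": "Marmalade", "price": 6.5},
    {"name": "Blonde Caramel", "price": 6.5},  # duplicate name
]
unique = list(drop_duplicates(products))
print([p["name"] for p in unique])
```

Keying on the name means two distinct products that happen to share a name would also be treated as duplicates; passing `key="url"` is one way to tighten that.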

### Enabling the pipelines

- To enable the `PriceToUSDPipeline` and `DuplicatesPipeline` we created, add them to the `ITEM_PIPELINES` setting in the `settings.py` file.

```python
# settings.py

ITEM_PIPELINES = {
    'chocolatescraper.pipelines.PriceToUSDPipeline': 100,
    'chocolatescraper.pipelines.DuplicatesPipeline': 200,
}
```

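- The integer assigned to each class in this setting determines the order in which the pipelines run.
- Items pass from lower-valued classes to higher-valued ones.
- By convention, the values are defined in the 0 to 1000 range.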
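
The ordering rule amounts to sorting the pipeline paths by their integer value, lowest first. The snippet below only illustrates that rule with plain Python; it is not how Scrapy loads pipelines internally.

```python
# Same entries as settings.py, deliberately listed out of order
ITEM_PIPELINES = {
    'chocolatescraper.pipelines.DuplicatesPipeline': 200,
    'chocolatescraper.pipelines.PriceToUSDPipeline': 100,
}

# Sort the class paths by their assigned integer, lowest first
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for path in run_order:
    print(ITEM_PIPELINES[path], path)
```

With these values, items reach `PriceToUSDPipeline` first, so anything dropped for a missing price never reaches `DuplicatesPipeline`.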

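## Testing the data processing

When we run the spider:

- the Item Loaders clean the data,
- the Item Pipelines convert the price data and remove duplicates,
- every chocolate product is crawled, and
- the prices in the output are in US dollars.

Run the spider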

```bash
scrapy crawl chocolatespider
```

```bash
2023-06-04 21:15:48 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: chocolatescraper)
2023-06-04 21:15:48 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.11 (main, Apr 24 2023, 17:34:58) [Clang 14.0.3 (clang-1403.0.22.14.1)], pyOpenSSL 23.1.1 (OpenSSL 3.1.0 14 Mar 2023), cryptography 40.0.2, Platform macOS-13.3.1-x86_64-i386-64bit
2023-06-04 21:15:48 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'chocolatescraper',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'chocolatescraper.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['chocolatescraper.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2023-06-04 21:15:48 [asyncio] DEBUG: Using selector: KqueueSelector
2023-06-04 21:15:48 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2023-06-04 21:15:48 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2023-06-04 21:15:48 [scrapy.extensions.telnet] INFO: Telnet Password: 702a4199dad145b1
2023-06-04 21:15:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2023-06-04 21:15:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-06-04 21:15:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-06-04 21:15:49 [scrapy.middleware] INFO: Enabled item pipelines:
['chocolatescraper.pipelines.PriceToUSDPipeline',
 'chocolatescraper.pipelines.DuplicatesPipeline']
2023-06-04 21:15:49 [scrapy.core.engine] INFO: Spider opened
2023-06-04 21:15:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-06-04 21:15:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-06-04 21:15:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.chocolate.co.uk/robots.txt> from <GET https://chocolate.co.uk/robots.txt>
2023-06-04 21:15:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/robots.txt> (referer: None)
2023-06-04 21:15:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.chocolate.co.uk/collections/all> from <GET https://chocolate.co.uk/collections/all>
2023-06-04 21:15:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/robots.txt> (referer: None)
2023-06-04 21:15:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/collections/all> (referer: None)
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': '100% Dark Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/100-dark-hot-chocolate-flakes'}
{'name': '100% Dark Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/100-dark-hot-chocolate-flakes'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': '2.5kg Bulk 41% Milk Hot Chocolate Drops',
 'url': 'https://www.chocolate.co.uk/products/2-5kg-bulk-of-our-41-milk-hot-chocolate-drops'}
{'name': '2.5kg Bulk 41% Milk Hot Chocolate Drops',
 'url': 'https://www.chocolate.co.uk/products/2-5kg-bulk-of-our-41-milk-hot-chocolate-drops'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': '2.5kg Bulk 61% Dark Hot Chocolate Drops',
 'url': 'https://www.chocolate.co.uk/products/2-5kg-of-our-best-selling-61-dark-hot-chocolate-drops'}
{'name': '2.5kg Bulk 61% Dark Hot Chocolate Drops',
 'url': 'https://www.chocolate.co.uk/products/2-5kg-of-our-best-selling-61-dark-hot-chocolate-drops'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': '41% Milk Hot Chocolate Drops',
 'price': 11.375,
 'url': 'https://www.chocolate.co.uk/products/41-colombian-milk-hot-chocolate-drops'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': '61% Dark Hot Chocolate Drops',
 'price': 11.375,
 'url': 'https://www.chocolate.co.uk/products/62-dark-hot-chocolate'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': '70% Dark Hot Chocolate Flakes',
 'price': 12.934999999999999,
 'url': 'https://www.chocolate.co.uk/products/70-dark-hot-chocolate-flakes'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Almost Perfect',
 'url': 'https://www.chocolate.co.uk/products/almost-perfect'}
{'name': 'Almost Perfect',
 'url': 'https://www.chocolate.co.uk/products/almost-perfect'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Assorted Chocolate Malt Balls',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/assorted-chocolate-malt-balls'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Caramel',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/blonde-caramel-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Chocolate Honeycomb',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/blonde-chocolate-honeycomb'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Chocolate Honeycomb - Bag',
 'price': 11.05,
 'url': 'https://www.chocolate.co.uk/products/blonde-chocolate-sea-salt-honeycomb'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Chocolate Malt Balls',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/blonde-chocolate-malt-balls'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Chocolate Truffles',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/blonde-chocolate-truffles'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Blonde Hot Chocolate Flakes',
 'price': 12.934999999999999,
 'url': 'https://www.chocolate.co.uk/products/blonde-hot-chocolate-flakes'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Bulk 41% Milk Hot Chocolate Drops 750 grams',
 'price': 22.75,
 'url': 'https://www.chocolate.co.uk/products/bulk-41-milk-hot-chocolate-drops-750-grams'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Bulk 61% Dark Hot Chocolate Drops 750 grams',
 'price': 22.75,
 'url': 'https://www.chocolate.co.uk/products/750-gram-bulk-61-dark-hot-chocolate-drops'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Caramelised Milk',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/caramelised-milk-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Chocolate Caramelised Pecan Nuts',
 'url': 'https://www.chocolate.co.uk/products/chocolate-caramelised-pecan-nuts'}
{'name': 'Chocolate Caramelised Pecan Nuts',
 'url': 'https://www.chocolate.co.uk/products/chocolate-caramelised-pecan-nuts'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Chocolate Celebration Hamper',
 'price': 71.5,
 'url': 'https://www.chocolate.co.uk/products/celebration-hamper'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Cinnamon Toast',
 'url': 'https://www.chocolate.co.uk/products/cinnamon-toast-chocolate-bar'}
{'name': 'Cinnamon Toast',
 'url': 'https://www.chocolate.co.uk/products/cinnamon-toast-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Collection of 4 of our Best Selling Chocolate Malt Balls',
 'price': 39.0,
 'url': 'https://www.chocolate.co.uk/products/collection-of-our-best-selling-chocolate-malt-balls'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Colombia 61%',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/colombian-dark-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all>
{'name': 'Colombian 41%',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/ecuadorian-41-milk-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Crunchy Biscuit',
 'url': 'https://www.chocolate.co.uk/products/crunchy-biscuit-blonde-chocolate-bar'}
{'name': 'Crunchy Biscuit',
 'url': 'https://www.chocolate.co.uk/products/crunchy-biscuit-blonde-chocolate-bar'}
2023-06-04 21:15:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/collections/all?page=2> (referer: https://www.chocolate.co.uk/collections/all)
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Ginger',
 'price': 11.635,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-ginger'}
2023-06-04 21:15:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Honeycomb',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-honeycomb-1'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Honeycomb - Bag',
 'price': 11.05,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-honeycomb'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Malt Balls',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-malt-balls'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Orange Peel',
 'price': 11.635,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-orange-peel'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Dark Chocolate Truffles',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/dark-chocolate-truffles'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Flat White Coffee',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/flat-white-coffee'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Gift Voucher',
 'url': 'https://www.chocolate.co.uk/products/gift-voucher'}
{'name': 'Gift Voucher',
 'url': 'https://www.chocolate.co.uk/products/gift-voucher'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Ginger Biscuit',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/ginger-biscuit-chocolate-bar'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Honeycomb Crunch',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/honeycomb-crunch'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Hot Chocolate Shaker',
 'url': 'https://www.chocolate.co.uk/products/hot-chocolate-shaker-new'}
{'name': 'Hot Chocolate Shaker',
 'url': 'https://www.chocolate.co.uk/products/hot-chocolate-shaker-new'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'June Box of the Month',
 'url': 'https://www.chocolate.co.uk/products/box-of-the-month-subscription-1'}
{'name': 'June Box of the Month',
 'url': 'https://www.chocolate.co.uk/products/box-of-the-month-subscription-1'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Luxury Chocolate Hamper',
 'price': 110.5,
 'url': 'https://www.chocolate.co.uk/products/luxury-chocolate-hamper'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Luxury Chocolate Hamper with bottle of Champagne',
 'price': 130.0,
 'url': 'https://www.chocolate.co.uk/products/copy-of-luxury-chocolate-hamper'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Luxury Hot Chocolate Hamper',
 'url': 'https://www.chocolate.co.uk/products/luxury-hot-chocolate-hamper'}
{'name': 'Luxury Hot Chocolate Hamper',
 'url': 'https://www.chocolate.co.uk/products/luxury-hot-chocolate-hamper'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Marc De Champagne Truffles',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/marc-de-champagne-truffles'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Marmalade',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/marmalade-orange-chocolate-bar'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Milk Chocolate Honeycomb',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/milk-chocolate-honeycomb-2'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Milk Chocolate Honeycomb - Bag',
 'price': 11.05,
 'url': 'https://www.chocolate.co.uk/products/milk-chocolate-honeycomb'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Milk Chocolate Malt Balls',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/milk-chocolate-malt-balls'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=2>
{'name': 'Milk Chocolate Truffles',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/milk-chocolate-truffles'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Mixed Chocolate Mini Eggs',
 'url': 'https://www.chocolate.co.uk/products/mixed-milk-chocolate-mini-eggs'}
{'name': 'Mixed Chocolate Mini Eggs',
 'url': 'https://www.chocolate.co.uk/products/mixed-milk-chocolate-mini-eggs'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': "Mother's Day Mystery Hamper",
 'url': 'https://www.chocolate.co.uk/products/mystery-box'}
{'name': "Mother's Day Mystery Hamper",
 'url': 'https://www.chocolate.co.uk/products/mystery-box'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Mystery Box',
 'url': 'https://www.chocolate.co.uk/products/mothers-day-mystery-hamper'}
{'name': 'Mystery Box',
 'url': 'https://www.chocolate.co.uk/products/mothers-day-mystery-hamper'}
2023-06-04 21:15:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/collections/all?page=3> (referer: https://www.chocolate.co.uk/collections/all?page=2)
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Oat M!lk Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/oat-m-lk-hot-chocolate-flakes'}
{'name': 'Oat M!lk Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/oat-m-lk-hot-chocolate-flakes'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Orange Dark Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/orange-dark-hot-chocolate-flakes'}
{'name': 'Orange Dark Hot Chocolate Flakes',
 'url': 'https://www.chocolate.co.uk/products/orange-dark-hot-chocolate-flakes'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Orange Malt Balls',
 'url': 'https://www.chocolate.co.uk/products/orange-malt-balls'}
{'name': 'Orange Malt Balls',
 'url': 'https://www.chocolate.co.uk/products/orange-malt-balls'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Peppermint Malt Balls',
 'url': 'https://www.chocolate.co.uk/products/peppermint-malt-balls'}
{'name': 'Peppermint Malt Balls',
 'url': 'https://www.chocolate.co.uk/products/peppermint-malt-balls'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Peppermint Swirl',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/peppermint-swirl-chocolate-bar'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Pistachio & Cranberry',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/pistachio-cranberry-chocolate-bar'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Praline Quail Eggs',
 'url': 'https://www.chocolate.co.uk/products/praline-quail-eggs'}
{'name': 'Praline Quail Eggs',
 'url': 'https://www.chocolate.co.uk/products/praline-quail-eggs'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Pretzel Caramel',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/pretzel-caramel-chocolate-bar'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Roasted Almonds',
 'url': 'https://www.chocolate.co.uk/products/cocoa-dusted-almonds'}
{'name': 'Roasted Almonds',
 'url': 'https://www.chocolate.co.uk/products/cocoa-dusted-almonds'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salt Caramel Malt Balls',
 'price': 11.700000000000001,
 'url': 'https://www.chocolate.co.uk/products/salt-caramel-malt-balls'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Salt Caramelised Hazelnuts',
 'url': 'https://www.chocolate.co.uk/products/salt-caramelised-hazelnuts'}
{'name': 'Salt Caramelised Hazelnuts',
 'url': 'https://www.chocolate.co.uk/products/salt-caramelised-hazelnuts'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salted Caramel',
 'price': 6.5,
 'url': 'https://www.chocolate.co.uk/products/blonde-chocolate-sea-salt'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salted Caramel Florentines',
 'price': 14.950000000000001,
 'url': 'https://www.chocolate.co.uk/products/salted-caramel-florentines'}
2023-06-04 21:15:53 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Salted Caramel Quail Eggs',
 'url': 'https://www.chocolate.co.uk/products/salted-caramel-quail-eggs'}
{'name': 'Salted Caramel Quail Eggs',
 'url': 'https://www.chocolate.co.uk/products/salted-caramel-quail-eggs'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salted Caramels Blonde Chocolate',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/salted-caramels-blonde-chocolate'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salted Caramels Dark Chocolate',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/salt-caramels-dark-chocolate'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Salted Caramels Milk Chocolate',
 'price': 25.935,
 'url': 'https://www.chocolate.co.uk/products/salt-caramels-milk-chocolate'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Araguani - 72%',
 'price': 100.75,
 'url': 'https://www.chocolate.co.uk/products/valrhona-araguani-72'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Bahibe 46%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-bahibe'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Biskelia 34%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-biskelia-34'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Caraibe 66%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-caraibe-66'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Caramelia 36%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-caramelia-36'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Dulcey 32%',
 'price': 100.75,
 'url': 'https://www.chocolate.co.uk/products/valrhona-dulcey-32'}
2023-06-04 21:15:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=3>
{'name': 'Valrhona Guanaja 70%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-guanaja-70'}
2023-06-04 21:15:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.chocolate.co.uk/collections/all?page=4> (referer: https://www.chocolate.co.uk/collections/all?page=3)
2023-06-04 21:15:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=4>
{'name': 'Valrhona Jivara 40%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-jivara-40'}
2023-06-04 21:15:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=4>
{'name': 'Valrhona Manjari 64%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-manjari-64'}
2023-06-04 21:15:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.chocolate.co.uk/collections/all?page=4>
{'name': 'Valrhona Opalys 33%',
 'price': 97.5,
 'url': 'https://www.chocolate.co.uk/products/valrhona-opalys-33'}
2023-06-04 21:15:54 [scrapy.core.scraper] WARNING: Dropped: Missing price in {'name': 'Velvet White',
 'url': 'https://www.chocolate.co.uk/products/velvet-white-chocolate-bar'}
{'name': 'Velvet White',
 'url': 'https://www.chocolate.co.uk/products/velvet-white-chocolate-bar'}
2023-06-04 21:15:54 [scrapy.core.engine] INFO: Closing spider (finished)
2023-06-04 21:15:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 5080,
 'downloader/request_count': 8,
 'downloader/request_method_count/GET': 8,
 'downloader/response_bytes': 176276,
 'downloader/response_count': 8,
 'downloader/response_status_count/200': 6,
 'downloader/response_status_count/301': 2,
 'elapsed_time_seconds': 5.117398,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 6, 4, 12, 15, 54, 291660),
 'httpcompression/response_bytes': 817301,
 'httpcompression/response_count': 6,
 'item_dropped_count': 23,
 'item_dropped_reasons_count/DropItem': 23,
 'item_scraped_count': 53,
 'log_count/DEBUG': 64,
 'log_count/INFO': 10,
 'log_count/WARNING': 23,
 'memusage/max': 57262080,
 'memusage/startup': 57262080,
 'request_depth_max': 3,
 'response_received_count': 6,
 'robotstxt/request_count': 2,
 'robotstxt/response_count': 2,
 'robotstxt/response_status_count/200': 2,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2023, 6, 4, 12, 15, 49, 174262)}
2023-06-04 21:15:54 [scrapy.core.engine] INFO: Spider closed (finished)
```