# [Intro to Web Scraping With Scrapy](https://scrapeops.io/python-scrapy-playbook/scrapy-web-scraping-intro/)

## What Is Python Scrapy?

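## How Scrapy Differs from Other Scraping Libraries and Frameworks

Scrapy comes with built-in functionality for:

- Making GET, POST, and other types of requests
- Extracting data from pages with CSS and XPath selectors
- Detecting failed requests and retrying them automatically
- Parallelizing requests with built-in concurrency
- Crawling entire websites through pagination, sitemaps, and link following
- Cleaning, validating, and post-processing scraped data with pipelines
- Saving data to CSV/JSON files, databases, and object storage

It also offers many other features.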

## When Should You Use Scrapy?

1. Scraping at large scale
2. Learning a powerful web scraping framework
3. Starting out with web scraping on a small project
4. Scraping JavaScript-heavy websites

## Scrapy Overview

### 1 The Scrapy Project

```bash
scrapy startproject <project_name>
```

```bash
(venv) myproject (😁 :main *) :$ tree

.
├── myproject
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg
```
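`settings.py` is where project-wide behaviour such as retries, concurrency, and data export is configured. The excerpt below is a hypothetical sketch. The setting names are standard Scrapy settings, but the values are illustrative assumptions, not the defaults that `scrapy startproject` generates.

```python
# settings.py (hypothetical excerpt; the values are illustrative only)

BOT_NAME = "myproject"

# Check robots.txt before crawling. The crawl logs in these notes show this enabled.
ROBOTSTXT_OBEY = True

# Retry failed requests automatically (handled by RetryMiddleware).
RETRY_ENABLED = True
RETRY_TIMES = 3  # maximum retries in addition to the first attempt

# Built-in concurrency: how many requests may be in flight at once.
CONCURRENT_REQUESTS = 16
CONCURRENT_REQUESTS_PER_DOMAIN = 8
DOWNLOAD_DELAY = 0.5  # seconds to wait between requests to the same site

# Feed exports: write every scraped item to a JSON file.
FEEDS = {
    "quotes.json": {"format": "json"},
}
```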

### 2 Scrapy Spiders

```bash
(venv) myproject (😁 :main *) :$ scrapy crawl quotes

2023-05-29 00:12:47 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: myproject)
2023-05-29 00:12:47 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.11 (main, Apr 24 2023, 17:34:58) [Clang 14.0.3 (clang-1403.0.22.14.1)], pyOpenSSL 23.1.1 (OpenSSL 3.1.0 14 Mar 2023), cryptography 40.0.2, Platform macOS-13.3.1-x86_64-i386-64bit
2023-05-29 00:12:47 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'myproject',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'myproject.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['myproject.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2023-05-29 00:12:47 [asyncio] DEBUG: Using selector: KqueueSelector
2023-05-29 00:12:47 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2023-05-29 00:12:47 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2023-05-29 00:12:47 [scrapy.extensions.telnet] INFO: Telnet Password: f4aa98445ab9ce8b
2023-05-29 00:12:47 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2023-05-29 00:12:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-05-29 00:12:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-05-29 00:12:48 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-05-29 00:12:48 [scrapy.core.engine] INFO: Spider opened
2023-05-29 00:12:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-05-29 00:12:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-05-29 00:12:49 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://quotes.toscrape.com/robots.txt> (referer: None)
2023-05-29 00:12:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/> (referer: None)
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein', 'tags': ['change', 'deep-thoughts', 'thinking', 'world']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'author': 'J.K. Rowling', 'tags': ['abilities', 'choices']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'author': 'Jane Austen', 'tags': ['aliteracy', 'books', 'classic', 'humor']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'author': 'Marilyn Monroe', 'tags': ['be-yourself', 'inspirational']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'author': 'Albert Einstein', 'tags': ['adulthood', 'success', 'value']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“It is better to be hated for what you are than to be loved for what you are not.”', 'author': 'André Gide', 'tags': ['life', 'love']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“I have not failed. I've just found 10,000 ways that won't work.”", 'author': 'Thomas A. Edison', 'tags': ['edison', 'failure', 'inspirational', 'paraphrased']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”", 'author': 'Eleanor Roosevelt', 'tags': ['misattributed-eleanor-roosevelt']}
2023-05-29 00:12:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“A day without sunshine is like, you know, night.”', 'author': 'Steve Martin', 'tags': ['humor', 'obvious', 'simile']}
2023-05-29 00:12:50 [scrapy.core.engine] INFO: Closing spider (finished)
2023-05-29 00:12:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 448,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 11660,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'elapsed_time_seconds': 1.769356,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 5, 28, 15, 12, 50, 108697),
 'item_scraped_count': 10,
 'log_count/DEBUG': 15,
 'log_count/INFO': 10,
 'memusage/max': 56995840,
 'memusage/startup': 56995840,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/404': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2023, 5, 28, 15, 12, 48, 339341)}
2023-05-29 00:12:50 [scrapy.core.engine] INFO: Spider closed (finished)
```

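- Asynchronous
  - Scrapy is built on the Twisted framework, so sending a request to a website does not block. Scrapy can keep working on other requests while it waits for a response.
  - When Scrapy sends a request and receives a successful response, it calls the callback defined on the original Scrapy `Request`. In this case the callback is the `parse` method.
  - `yield scrapy.Request(url, callback=self.parse)`
- Spider Name
  - Every spider in a Scrapy project must have a unique name so that Scrapy can identify it.
  - Set the name with the `name = 'quotes'` attribute. This is the name passed to `scrapy crawl quotes`.
- Start Requests
  - Define the spider's starting point with the `start_requests()` method.
  - From these initial requests, the spider can generate follow-up requests one after another.
- Parse
  - Use the `parse()` method to process the website's responses and extract the data you need.
  - The extracted data is then sent to the Item Pipelines with the `yield` statement.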


```bash
(venv) myproject (😁 :main *) :$ scrapy crawl quotes_2

2023-05-29 00:32:54 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: myproject)
2023-05-29 00:32:54 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.11 (main, Apr 24 2023, 17:34:58) [Clang 14.0.3 (clang-1403.0.22.14.1)], pyOpenSSL 23.1.1 (OpenSSL 3.1.0 14 Mar 2023), cryptography 40.0.2, Platform macOS-13.3.1-x86_64-i386-64bit
2023-05-29 00:32:54 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'myproject',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'myproject.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['myproject.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2023-05-29 00:32:54 [asyncio] DEBUG: Using selector: KqueueSelector
2023-05-29 00:32:54 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2023-05-29 00:32:54 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2023-05-29 00:32:55 [scrapy.extensions.telnet] INFO: Telnet Password: ee10bb6e5f88581c
2023-05-29 00:32:55 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2023-05-29 00:32:55 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-05-29 00:32:55 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-05-29 00:32:55 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-05-29 00:32:55 [scrapy.core.engine] INFO: Spider opened
2023-05-29 00:32:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-05-29 00:32:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-05-29 00:32:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://quotes.toscrape.com/robots.txt> (referer: None)
2023-05-29 00:32:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/> (referer: None)
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein', 'tags': ['change', 'deep-thoughts', 'thinking', 'world']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'author': 'J.K. Rowling', 'tags': ['abilities', 'choices']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'author': 'Jane Austen', 'tags': ['aliteracy', 'books', 'classic', 'humor']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'author': 'Marilyn Monroe', 'tags': ['be-yourself', 'inspirational']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'author': 'Albert Einstein', 'tags': ['adulthood', 'success', 'value']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“It is better to be hated for what you are than to be loved for what you are not.”', 'author': 'André Gide', 'tags': ['life', 'love']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“I have not failed. I've just found 10,000 ways that won't work.”", 'author': 'Thomas A. Edison', 'tags': ['edison', 'failure', 'inspirational', 'paraphrased']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”", 'author': 'Eleanor Roosevelt', 'tags': ['misattributed-eleanor-roosevelt']}
2023-05-29 00:32:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'text': '“A day without sunshine is like, you know, night.”', 'author': 'Steve Martin', 'tags': ['humor', 'obvious', 'simile']}
2023-05-29 00:32:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/2/> (referer: https://quotes.toscrape.com/)
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': "“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.”", 'author': 'Marilyn Monroe', 'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“It takes a great deal of bravery to stand up to our enemies, but just as much to stand up to our friends.”', 'author': 'J.K. Rowling', 'tags': ['courage', 'friends']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': "“If you can't explain it to a six year old, you don't understand it yourself.”", 'author': 'Albert Einstein', 'tags': ['simplicity', 'understand']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': "“You may not be her first, her last, or her only. She loved before she may love again. But if she loves you now, what else matters? She's not perfect—you aren't either, and the two of you may never be perfect together but if she can make you laugh, cause you to think twice, and admit to being human and making mistakes, hold onto her and give her the most you can. She may not be thinking about you every second of the day, but she will give you a part of her that she knows you can break—her heart. So don't hurt her, don't change her, don't analyze and don't expect more than she can give. Smile when she makes you happy, let her know when she makes you mad, and miss her when she's not there.”", 'author': 'Bob Marley', 'tags': ['love']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“I like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.”', 'author': 'Dr. Seuss', 'tags': ['fantasy']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“I may not have gone where I intended to go, but I think I have ended up where I needed to be.”', 'author': 'Douglas Adams', 'tags': ['life', 'navigation']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': "“The opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.”", 'author': 'Elie Wiesel', 'tags': ['activism', 'apathy', 'hate', 'indifference', 'inspirational', 'love', 'opposite', 'philosophy']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“It is not a lack of love, but a lack of friendship that makes unhappy marriages.”', 'author': 'Friedrich Nietzsche', 'tags': ['friendship', 'lack-of-friendship', 'lack-of-love', 'love', 'marriage', 'unhappy-marriage']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“Good friends, good books, and a sleepy conscience: this is the ideal life.”', 'author': 'Mark Twain', 'tags': ['books', 'contentment', 'friends', 'friendship', 'life']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/2/>
{'text': '“Life is what happens to us while we are making other plans.”', 'author': 'Allen Saunders', 'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans']}
2023-05-29 00:32:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/3/> (referer: https://quotes.toscrape.com/page/2/)
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“I love you without knowing how, or when, or from where. I love you simply, without problems or pride: I love you in this way because I do not know any other way of loving but this, in which there is no I or you, so intimate that your hand upon my chest is my hand, so intimate that when I fall asleep your eyes close.”', 'author': 'Pablo Neruda', 'tags': ['love', 'poetry']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“For every minute you are angry you lose sixty seconds of happiness.”', 'author': 'Ralph Waldo Emerson', 'tags': ['happiness']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“If you judge people, you have no time to love them.”', 'author': 'Mother Teresa', 'tags': ['attributed-no-source']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“Anyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.”', 'author': 'Garrison Keillor', 'tags': ['humor', 'religion']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“Beauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.”', 'author': 'Jim Henson', 'tags': ['humor']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“Today you are You, that is truer than true. There is no one alive who is Youer than You.”', 'author': 'Dr. Seuss', 'tags': ['comedy', 'life', 'yourself']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“If you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.”', 'author': 'Albert Einstein', 'tags': ['children', 'fairy-tales']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“It is impossible to live without failing at something, unless you live so cautiously that you might as well not have lived at all - in which case, you fail by default.”', 'author': 'J.K. Rowling', 'tags': []}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“Logic will get you from A to Z; imagination will get you everywhere.”', 'author': 'Albert Einstein', 'tags': ['imagination']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/3/>
{'text': '“One good thing about music, when it hits you, you feel no pain.”', 'author': 'Bob Marley', 'tags': ['music']}
2023-05-29 00:32:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/4/> (referer: https://quotes.toscrape.com/page/3/)
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': "“The more that you read, the more things you will know. The more that you learn, the more places you'll go.”", 'author': 'Dr. Seuss', 'tags': ['learning', 'reading', 'seuss']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“Of course it is happening inside your head, Harry, but why on earth should that mean that it is not real?”', 'author': 'J.K. Rowling', 'tags': ['dumbledore']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“The truth is, everyone is going to hurt you. You just got to find the ones worth suffering for.”', 'author': 'Bob Marley', 'tags': ['friendship']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“Not all of us can do great things. But we can do small things with great love.”', 'author': 'Mother Teresa', 'tags': ['misattributed-to-mother-teresa', 'paraphrased']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“To the well-organized mind, death is but the next great adventure.”', 'author': 'J.K. Rowling', 'tags': ['death', 'inspirational']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': "“All you need is love. But a little chocolate now and then doesn't hurt.”", 'author': 'Charles M. Schulz', 'tags': ['chocolate', 'food', 'humor']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': "“We read to know we're not alone.”", 'author': 'William Nicholson', 'tags': ['misattributed-to-c-s-lewis', 'reading']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“Any fool can know. The point is to understand.”', 'author': 'Albert Einstein', 'tags': ['knowledge', 'learning', 'understanding', 'wisdom']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“I have always imagined that Paradise will be a kind of library.”', 'author': 'Jorge Luis Borges', 'tags': ['books', 'library']}
2023-05-29 00:32:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/4/>
{'text': '“It is never too late to be what you might have been.”', 'author': 'George Eliot', 'tags': ['inspirational']}
2023-05-29 00:32:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/5/> (referer: https://quotes.toscrape.com/page/4/)
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“A reader lives a thousand lives before he dies, said Jojen. The man who never reads lives only one.”', 'author': 'George R.R. Martin', 'tags': ['read', 'readers', 'reading', 'reading-books']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“You can never get a cup of tea large enough or a book long enough to suit me.”', 'author': 'C.S. Lewis', 'tags': ['books', 'inspirational', 'reading', 'tea']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“You believe lies so you eventually learn to trust no one but yourself.”', 'author': 'Marilyn Monroe', 'tags': []}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“If you can make a woman laugh, you can make her do anything.”', 'author': 'Marilyn Monroe', 'tags': ['girls', 'love']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“Life is like riding a bicycle. To keep your balance, you must keep moving.”', 'author': 'Albert Einstein', 'tags': ['life', 'simile']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“The real lover is the man who can thrill you by kissing your forehead or smiling into your eyes or just staring into space.”', 'author': 'Marilyn Monroe', 'tags': ['love']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': "“A wise girl kisses but doesn't love, listens but doesn't believe, and leaves before she is left.”", 'author': 'Marilyn Monroe', 'tags': ['attributed-no-source']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“Only in the darkness can you see the stars.”', 'author': 'Martin Luther King Jr.', 'tags': ['hope', 'inspirational']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“It matters not what someone is born, but what they grow to be.”', 'author': 'J.K. Rowling', 'tags': ['dumbledore']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/5/>
{'text': '“Love does not begin and end the way we seem to think it does. Love is a battle, love is a war; love is a growing up.”', 'author': 'James Baldwin', 'tags': ['love']}
2023-05-29 00:32:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/6/> (referer: https://quotes.toscrape.com/page/5/)
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“There is nothing I would not do for those who are really my friends. I have no notion of loving people by halves, it is not my nature.”', 'author': 'Jane Austen', 'tags': ['friendship', 'love']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“Do one thing every day that scares you.”', 'author': 'Eleanor Roosevelt', 'tags': ['attributed', 'fear', 'inspiration']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“I am good, but not an angel. I do sin, but I am not the devil. I am just a small girl in a big world trying to find someone to love.”', 'author': 'Marilyn Monroe', 'tags': ['attributed-no-source']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“If I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.”', 'author': 'Albert Einstein', 'tags': ['music']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“If you only read the books that everyone else is reading, you can only think what everyone else is thinking.”', 'author': 'Haruki Murakami', 'tags': ['books', 'thought']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“The difference between genius and stupidity is: genius has its limits.”', 'author': 'Alexandre Dumas fils', 'tags': ['misattributed-to-einstein']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': "“He's like a drug for you, Bella.”", 'author': 'Stephenie Meyer', 'tags': ['drug', 'romance', 'simile']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“There is no friend as loyal as a book.”', 'author': 'Ernest Hemingway', 'tags': ['books', 'friends', 'novelist-quotes']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': '“When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.”', 'author': 'Helen Keller', 'tags': ['inspirational']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/6/>
{'text': "“Life isn't about finding yourself. Life is about creating yourself.”", 'author': 'George Bernard Shaw', 'tags': ['inspirational', 'life', 'yourself']}
2023-05-29 00:32:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/7/> (referer: https://quotes.toscrape.com/page/6/)
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': "“That's the problem with drinking, I thought, as I poured myself a drink. If something bad happens you drink in an attempt to forget; if something good happens you drink in order to celebrate; and if nothing happens you drink to make something happen.”", 'author': 'Charles Bukowski', 'tags': ['alcohol']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“You don’t forget the face of the person who was your last hope.”', 'author': 'Suzanne Collins', 'tags': ['the-hunger-games']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': "“Remember, we're madly in love, so it's all right to kiss me anytime you feel like it.”", 'author': 'Suzanne Collins', 'tags': ['humor']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“To love at all is to be vulnerable. Love anything and your heart will be wrung and possibly broken. If you want to make sure of keeping it intact you must give it to no one, not even an animal. Wrap it carefully round with hobbies and little luxuries; avoid all entanglements. Lock it up safe in the casket or coffin of your selfishness. But in that casket, safe, dark, motionless, airless, it will change. It will not be broken; it will become unbreakable, impenetrable, irredeemable. To love is to be vulnerable.”', 'author': 'C.S. Lewis', 'tags': ['love']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“Not all those who wander are lost.”', 'author': 'J.R.R. Tolkien', 'tags': ['bilbo', 'journey', 'lost', 'quest', 'travel', 'wander']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“Do not pity the dead, Harry. Pity the living, and, above all those who live without love.”', 'author': 'J.K. Rowling', 'tags': ['live-death-love']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“There is nothing to writing. All you do is sit down at a typewriter and bleed.”', 'author': 'Ernest Hemingway', 'tags': ['good', 'writing']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“Finish each day and be done with it. You have done what you could. Some blunders and absurdities no doubt crept in; forget them as soon as you can. Tomorrow is a new day. You shall begin it serenely and with too high a spirit to be encumbered with your old nonsense.”', 'author': 'Ralph Waldo Emerson', 'tags': ['life', 'regrets']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': '“I have never let my schooling interfere with my education.”', 'author': 'Mark Twain', 'tags': ['education']}
2023-05-29 00:32:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/7/>
{'text': "“I have heard there are troubles of more than one kind. Some come from ahead and some come from behind. But I've bought a big bat. I'm all ready you see. Now my troubles are going to have troubles with me!”", 'author': 'Dr. Seuss', 'tags': ['troubles']}
2023-05-29 00:32:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/8/> (referer: https://quotes.toscrape.com/page/7/)
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“If I had a flower for every time I thought of you...I could walk through my garden forever.”', 'author': 'Alfred Tennyson', 'tags': ['friendship', 'love']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“Some people never go crazy. What truly horrible lives they must lead.”', 'author': 'Charles Bukowski', 'tags': ['humor']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”', 'author': 'Terry Pratchett', 'tags': ['humor', 'open-mind', 'thinking']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!”', 'author': 'Dr. Seuss', 'tags': ['humor', 'philosophy']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': "“What really knocks me out is a book that, when you're all done reading it, you wish the author that wrote it was a terrific friend of yours and you could call him up on the phone whenever you felt like it. That doesn't happen much, though.”", 'author': 'J.D. Salinger', 'tags': ['authors', 'books', 'literature', 'reading', 'writing']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“The reason I talk to myself is because I’m the only one whose answers I accept.”', 'author': 'George Carlin', 'tags': ['humor', 'insanity', 'lies', 'lying', 'self-indulgence', 'truth']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': "“You may say I'm a dreamer, but I'm not the only one. I hope someday you'll join us. And the world will live as one.”", 'author': 'John Lennon', 'tags': ['beatles', 'connection', 'dreamers', 'dreaming', 'dreams', 'hope', 'inspirational', 'peace']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': '“I am free of all prejudice. I hate everyone equally. ”', 'author': 'W.C. Fields', 'tags': ['humor', 'sinister']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': "“The question isn't who is going to let me; it's who is going to stop me.”", 'author': 'Ayn Rand', 'tags': []}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/8/>
{'text': "“′Classic′ - a book which people praise and don't read.”", 'author': 'Mark Twain', 'tags': ['books', 'classic', 'reading']}
2023-05-29 00:32:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/9/> (referer: https://quotes.toscrape.com/page/8/)
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“Anyone who has never made a mistake has never tried anything new.”', 'author': 'Albert Einstein', 'tags': ['mistakes']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': "“A lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.”", 'author': 'Jane Austen', 'tags': ['humor', 'love', 'romantic', 'women']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“Remember, if the time should come when you have to make a choice between what is right and what is easy, remember what happened to a boy who was good, and kind, and brave, because he strayed across the path of Lord Voldemort. Remember Cedric Diggory.”', 'author': 'J.K. Rowling', 'tags': ['integrity']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“I declare after all there is no enjoyment like reading! How much sooner one tires of any thing than of a book! -- When I have a house of my own, I shall be miserable if I have not an excellent library.”', 'author': 'Jane Austen', 'tags': ['books', 'library', 'reading']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“There are few people whom I really love, and still fewer of whom I think well. The more I see of the world, the more am I dissatisfied with it; and every day confirms my belief of the inconsistency of all human characters, and of the little dependence that can be placed on the appearance of merit or sense.”', 'author': 'Jane Austen', 'tags': ['elizabeth-bennet', 'jane-austen']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“Some day you will be old enough to start reading fairy tales again.”', 'author': 'C.S. Lewis', 'tags': ['age', 'fairytales', 'growing-up']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“We are not necessarily doubting that God will do the best for us; we are wondering how painful the best will turn out to be.”', 'author': 'C.S. Lewis', 'tags': ['god']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“The fear of death follows from the fear of life. A man who lives fully is prepared to die at any time.”', 'author': 'Mark Twain', 'tags': ['death', 'life']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“A lie can travel half way around the world while the truth is putting on its shoes.”', 'author': 'Mark Twain', 'tags': ['misattributed-mark-twain', 'truth']}
2023-05-29 00:32:59 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/9/>
{'text': '“I believe in Christianity as I believe that the sun has risen: not only because I see it, but because by it I see everything else.”', 'author': 'C.S. Lewis', 'tags': ['christianity', 'faith', 'religion', 'sun']}
2023-05-29 00:33:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/10/> (referer: https://quotes.toscrape.com/page/9/)
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“The truth." Dumbledore sighed. "It is a beautiful and terrible thing, and should therefore be treated with great caution.”', 'author': 'J.K. Rowling', 'tags': ['truth']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': "“I'm the one that's got to die when it's time for me to die, so let me live my life the way I want to.”", 'author': 'Jimi Hendrix', 'tags': ['death', 'life']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“To die will be an awfully big adventure.”', 'author': 'J.M. Barrie', 'tags': ['adventure', 'love']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“It takes courage to grow up and become who you really are.”', 'author': 'E.E. Cummings', 'tags': ['courage']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“But better to get hurt by the truth than comforted with a lie.”', 'author': 'Khaled Hosseini', 'tags': ['life']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“You never really understand a person until you consider things from his point of view... Until you climb inside of his skin and walk around in it.”', 'author': 'Harper Lee', 'tags': ['better-life-empathy']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“You have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.”', 'author': "Madeleine L'Engle", 'tags': ['books', 'children', 'difficult', 'grown-ups', 'write', 'writers', 'writing']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“Never tell the truth to people who are not worthy of it.”', 'author': 'Mark Twain', 'tags': ['truth']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': "“A person's a person, no matter how small.”", 'author': 'Dr. Seuss', 'tags': ['inspirational']}
2023-05-29 00:33:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/10/>
{'text': '“... a mind needs books as a sword needs a whetstone, if it is to keep its edge.”', 'author': 'George R.R. Martin', 'tags': ['books', 'mind']}
2023-05-29 00:33:00 [scrapy.core.engine] INFO: Closing spider (finished)
2023-05-29 00:33:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2890,
 'downloader/request_count': 11,
 'downloader/request_method_count/GET': 11,
 'downloader/response_bytes': 110832,
 'downloader/response_count': 11,
 'downloader/response_status_count/200': 10,
 'downloader/response_status_count/404': 1,
 'elapsed_time_seconds': 5.363247,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 5, 28, 15, 33, 0, 654456),
 'item_scraped_count': 100,
 'log_count/DEBUG': 114,
 'log_count/INFO': 10,
 'memusage/max': 57061376,
 'memusage/startup': 57057280,
 'request_depth_max': 9,
 'response_received_count': 11,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/404': 1,
 'scheduler/dequeued': 10,
 'scheduler/dequeued/memory': 10,
 'scheduler/enqueued': 10,
 'scheduler/enqueued/memory': 10,
 'start_time': datetime.datetime(2023, 5, 28, 15, 32, 55, 291209)}
2023-05-29 00:33:00 [scrapy.core.engine] INFO: Spider closed (finished)
```

### 3 Scrapy Items

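Scrapy Items are how we store and process the data we scrape.<br>
They provide structured containers for scraped data, so that it can be easily cleaned, validated, and stored with Scrapy ItemLoaders, Item Pipelines, and Feed Exporters.

Using Items has several advantages:

- They structure your data and give it a clear schema.
- They make scraped data easy to clean and process.
- They enable validation, deduplication, and monitoring of your data feeds.
- They make it easy to store and export data with Scrapy Feed Exports.
- They let you use Scrapy Item Pipelines and Item Loaders.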
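The quote items in this tutorial have `text`, `author`, and `tags` fields (as in the log output), but the item class itself isn't shown. As a minimal sketch, not the original article's code, the item could be declared as a dataclass, which Scrapy accepts as an item type since version 2.2. The more common `scrapy.Item` form is given in the comment; note that a dataclass item is built with keyword arguments instead of being filled in dict-style.

```python
# items.py (sketch): a dataclass-based item with the fields seen in the logs.
# Scrapy 2.2+ accepts dataclass instances as items. The classic equivalent is:
#   class QuoteItem(scrapy.Item):
#       text = scrapy.Field()
#       author = scrapy.Field()
#       tags = scrapy.Field()
from dataclasses import asdict, dataclass, field


@dataclass
class QuoteItem:
    text: str = ""                                  # the quote itself
    author: str = ""                                # who said it
    tags: list[str] = field(default_factory=list)   # topic tags (fresh list per item)


# In a spider callback you would write: yield QuoteItem(text=..., author=..., tags=[...])
item = QuoteItem(
    text="“A day without sunshine is like, you know, night.”",
    author="Steve Martin",
    tags=["humor", "obvious", "simile"],
)
print(asdict(item))  # exporters see the item as a plain mapping of its fields
```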

```bash
(venv) myproject (😁 :main *) :$ scrapy crawl quotes_3

2023-05-29 01:00:22 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: myproject)
2023-05-29 01:00:22 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.11 (main, Apr 24 2023, 17:34:58) [Clang 14.0.3 (clang-1403.0.22.14.1)], pyOpenSSL 23.1.1 (OpenSSL 3.1.0 14 Mar 2023), cryptography 40.0.2, Platform macOS-13.3.1-x86_64-i386-64bit
2023-05-29 01:00:22 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'myproject',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'myproject.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['myproject.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2023-05-29 01:00:22 [asyncio] DEBUG: Using selector: KqueueSelector
2023-05-29 01:00:22 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2023-05-29 01:00:22 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2023-05-29 01:00:22 [scrapy.extensions.telnet] INFO: Telnet Password: 8e51090101f34e18
2023-05-29 01:00:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2023-05-29 01:00:23 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-05-29 01:00:23 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-05-29 01:00:23 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-05-29 01:00:23 [scrapy.core.engine] INFO: Spider opened
2023-05-29 01:00:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-05-29 01:00:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-05-29 01:00:23 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://quotes.toscrape.com/robots.txt> (referer: None)
2023-05-29 01:00:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/> (referer: None)
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
 'text': '“The world as we have created it is a process of our thinking. It '
         'cannot be changed without changing our thinking.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'J.K. Rowling',
 'tags': ['abilities', 'choices'],
 'text': '“It is our choices, Harry, that show what we truly are, far more '
         'than our abilities.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
 'text': '“There are only two ways to live your life. One is as though nothing '
         'is a miracle. The other is as though everything is a miracle.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Jane Austen',
 'tags': ['aliteracy', 'books', 'classic', 'humor'],
 'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
         'novel, must be intolerably stupid.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Marilyn Monroe',
 'tags': ['be-yourself', 'inspirational'],
 'text': "“Imperfection is beauty, madness is genius and it's better to be "
         'absolutely ridiculous than absolutely boring.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Albert Einstein',
 'tags': ['adulthood', 'success', 'value'],
 'text': '“Try not to become a man of success. Rather become a man of value.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'André Gide',
 'tags': ['life', 'love'],
 'text': '“It is better to be hated for what you are than to be loved for what '
         'you are not.”'}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Thomas A. Edison',
 'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
 'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Eleanor Roosevelt',
 'tags': ['misattributed-eleanor-roosevelt'],
 'text': '“A woman is like a tea bag; you never know how strong it is until '
         "it's in hot water.”"}
2023-05-29 01:00:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/>
{'author': 'Steve Martin',
 'tags': ['humor', 'obvious', 'simile'],
 'text': '“A day without sunshine is like, you know, night.”'}
2023-05-29 01:00:24 [scrapy.core.engine] INFO: Closing spider (finished)
2023-05-29 01:00:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 448,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 11660,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'elapsed_time_seconds': 0.994178,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 5, 28, 16, 0, 24, 197070),
 'item_scraped_count': 10,
 'log_count/DEBUG': 15,
 'log_count/INFO': 10,
 'memusage/max': 56983552,
 'memusage/startup': 56979456,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/404': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2023, 5, 28, 16, 0, 23, 202892)}
2023-05-29 01:00:24 [scrapy.core.engine] INFO: Spider closed (finished)
```

### 4 Scrapy Item Pipelines

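[Item Pipelines](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) are Scrapy's data processing stage: every scraped item passes through them, and that is where you can clean, process, validate, and store the data.

Typical uses include:

- Cleaning data (e.g. removing currency symbols from prices).
- Formatting data (e.g. converting strings to ints).
- Enriching data (e.g. converting relative links to absolute links).
- Validating data (e.g. checking that a scraped price is a viable price).
- Storing data in a database, queue, file, or object storage bucket.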
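As a minimal sketch of the cleaning and formatting uses above, the hypothetical `QuoteCleaningPipeline` below strips the curly quotation marks that quotes.toscrape.com puts around each quote and normalizes the tags. It only assumes the `text` and `tags` fields seen in the logs, and it works for both dict items and `scrapy.Item` objects because it uses `item["key"]` access.

```python
# pipelines.py (sketch): clean the fields of each quote item.
# QuoteCleaningPipeline is a hypothetical name used for illustration.


class QuoteCleaningPipeline:
    # Curly quotation marks that wrap each quote on quotes.toscrape.com
    QUOTE_CHARS = "“”"

    def process_item(self, item, spider):
        # Cleaning: trim whitespace, then the curly quotes, then any whitespace
        # that was inside the quotes (e.g. "...equally. ”")
        item["text"] = item["text"].strip().strip(self.QUOTE_CHARS).strip()
        # Formatting: lowercase the tags and drop duplicates, keeping their order
        item["tags"] = list(dict.fromkeys(tag.lower() for tag in item["tags"]))
        # Return the item so later pipelines and feed exports receive it
        return item
```

Register it in `ITEM_PIPELINES` with a lower number than any storage pipeline, so items are cleaned before they are saved.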

```python
# pipelines.py
# Item pipeline that stores scraped items in a Postgres database

import psycopg2


class PostgresDemoPipeline:


    def __init__(self):
        ## Connection details
        hostname = 'localhost'
        username = 'postgres'
        password = '******' # your password
        database = 'quotes'

        ## Connect to the database (the "quotes" database must already exist)
        self.connection = psycopg2.connect(
            host=hostname, user=username, password=password, dbname=database
        )

        ## Create a cursor, used to execute commands
        self.cur = self.connection.cursor()

        ## Create the quotes table if it doesn't exist yet, and commit the change
        self.cur.execute("""
            CREATE TABLE IF NOT EXISTS quotes(
                id serial PRIMARY KEY,
                content text,
                tags text,
                author VARCHAR(255)
            )
        """)
        self.connection.commit()


    def process_item(self, item, spider):
        ## Insert the item; psycopg2 fills the %s placeholders safely
        ## (tags are stored as the string form of the Python list)
        self.cur.execute(
            """ insert into quotes (content, tags, author) values (%s,%s,%s)""",
            (
                item["text"],
                str(item["tags"]),
                item["author"]
            )
        )

        ## Commit the insert to the database
        self.connection.commit()
        return item


    def close_spider(self, spider):
        ## Close the cursor and the database connection
        self.cur.close()
        self.connection.close()
```
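The pipeline does nothing until it is enabled in `settings.py`. A sketch of that entry, assuming the project is named `myproject` as in the earlier logs (psycopg2 itself is commonly installed with `pip install psycopg2-binary`):

```python
# settings.py (sketch): the module path assumes the project is named "myproject"
ITEM_PIPELINES = {
    # Values range from 0 to 1000; pipelines with lower numbers run earlier
    "myproject.pipelines.PostgresDemoPipeline": 300,
}
```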

### 5 Scrapy Middlewares

Scrapy is a complete web scraping framework: out of the box, with no configuration on your part, it manages much of the complexity of scraping at scale behind the scenes.

Most of this functionality lives in its **Middlewares**, which come in two forms: [Downloader Middlewares](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) and [Spider Middlewares](https://docs.scrapy.org/en/latest/topics/spider-middleware.html).

#### Downloader Middlewares

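- Downloader Middlewares are specific hooks that:
  - sit between the Scrapy engine and the Downloader
  - process requests as they pass from the engine to the Downloader
  - process responses as they pass from the Downloader to the engine

By default, Scrapy enables the following downloader middlewares: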


```python
# Scrapy's built-in default; don't redefine it in settings.py, use DOWNLOADER_MIDDLEWARES instead

DOWNLOADER_MIDDLEWARES_BASE = {
    # Engine side
    'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
    'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
    'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
    'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
    'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
    'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
    'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
    'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
    # Downloader side
}
```


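Among other things, these default middlewares handle:

- request timeouts
- the headers sent with each request
- the user agent used for requests
- retrying failed requests
- managing cookies, caching, and response compression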
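Rather than replacing these middlewares, you can often tune them through their settings. The fragment below is a sketch with illustrative values, not recommendations; each name is a built-in Scrapy setting read by the middleware noted in its comment.

```python
# settings.py (sketch): tune the built-in downloader middlewares via their settings.
# Values are illustrative only.

DOWNLOAD_TIMEOUT = 30          # DownloadTimeoutMiddleware: seconds before a request times out
DEFAULT_REQUEST_HEADERS = {    # DefaultHeadersMiddleware: headers added to every request
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
USER_AGENT = "myproject (+https://www.example.com)"  # UserAgentMiddleware (placeholder URL)
RETRY_TIMES = 3                # RetryMiddleware: retries in addition to the first attempt
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]    # RetryMiddleware: statuses that trigger a retry
COOKIES_ENABLED = True         # CookiesMiddleware
HTTPCACHE_ENABLED = True       # HttpCacheMiddleware: cache responses locally while developing
COMPRESSION_ENABLED = True     # HttpCompressionMiddleware: accept compressed responses
```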


You can disable any of these default middlewares by setting its value to `None` in the `DOWNLOADER_MIDDLEWARES` setting in `settings.py`.<br>
For example, the following disables `RobotsTxtMiddleware`:

```python
# settings.py

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': None,
}
```


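- You can also override existing middlewares or insert entirely new ones, for example to:
  - modify a request just before it is sent to the website (changing the proxy, user agent, etc.)
  - modify a received response before passing it to the spider
  - retry the request instead of passing the response to the spider, when the response doesn't contain the correct data
  - pass a response to the spider without fetching the web page
  - silently drop some requests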
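As a sketch of the first use above (changing a request just before it is sent), the hypothetical `RandomUserAgentMiddleware` picks a user agent for each request. The user-agent strings are placeholders, not real browser strings.

```python
# middlewares.py (sketch): choose a User-Agent for every outgoing request.
# RandomUserAgentMiddleware and its USER_AGENTS list are hypothetical examples.
import random


class RandomUserAgentMiddleware:
    # Placeholder values; replace with real, up-to-date user-agent strings
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_3) ExampleBrowser/1.0",
    ]

    def process_request(self, request, spider):
        # Set the User-Agent header right before the request is downloaded
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None tells Scrapy to keep processing the request normally
        return None
```

To try it, register `myproject.middlewares.RandomUserAgentMiddleware` in `DOWNLOADER_MIDDLEWARES` (for example with priority `400`). The built-in `UserAgentMiddleware` only sets `User-Agent` when the header is missing, so the value chosen here is the one that gets sent.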

Here is an example of inserting your own middleware so that every request is sent through a proxy.<br>
Add it to the `middlewares.py` file:

```python
## middlewares.py

import base64


class MyProxyMiddleware:
    """Send every request through an authenticated HTTP proxy."""

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds the middleware through this hook, passing in the settings
        return cls(crawler.settings)

    def __init__(self, settings):
        # Proxy connection details, read from settings.py
        self.user = settings.get('PROXY_USER')
        self.password = settings.get('PROXY_PASSWORD')
        self.endpoint = settings.get('PROXY_ENDPOINT')
        self.port = settings.get('PROXY_PORT')

    def process_request(self, request, spider):
        # Build the Basic auth header value from "user:password"
        user_credentials = f'{self.user}:{self.password}'
        basic_authentication = 'Basic ' + base64.b64encode(user_credentials.encode()).decode()
        # The 'proxy' meta key tells Scrapy which proxy to send this request through
        request.meta['proxy'] = f'http://{self.endpoint}:{self.port}'
        request.headers['Proxy-Authorization'] = basic_authentication
        # Returning None lets Scrapy continue processing the request normally
        return None
```

Enable it in `settings.py` and fill in your proxy connection details:

```python
## settings.py

PROXY_USER = 'username'
PROXY_PASSWORD = 'password'
PROXY_ENDPOINT = 'proxy.proxyprovider.com'
PROXY_PORT = '8000'

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.MyProxyMiddleware': 350,
}
```

#### Spider Middlewares

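Spider Middlewares are specific hooks that sit between the Scrapy engine and the spiders, processing the spiders' input (responses) and output (items and requests).

By default, Scrapy enables the following spider middlewares: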

```python
# Scrapy's built-in default; don't redefine it in settings.py, use SPIDER_MIDDLEWARES instead

SPIDER_MIDDLEWARES_BASE = {
    # Engine side
    'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
    'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
    'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
    'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800,
    'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
    # Spider side
}
```

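Spider middlewares are used for purposes such as:

- Post-processing the output of spider callbacks: changing, adding, or removing requests and items
- Post-processing start_requests
- Handling spider exceptions
- Calling the `errback` instead of the callback for some requests, based on the response content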
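A sketch of the first use (post-processing callback output): the hypothetical `DropEmptyQuotesMiddleware` removes quote items whose `text` is empty and passes everything else, including requests, through unchanged. It uses the synchronous `process_spider_output` hook and assumes items are dicts or `scrapy.Item` objects, both of which are mappings.

```python
# middlewares.py (sketch): drop quote items with empty text from callback output.
# DropEmptyQuotesMiddleware is a hypothetical name used for illustration.
from collections.abc import Mapping


class DropEmptyQuotesMiddleware:
    def process_spider_output(self, response, result, spider):
        # `result` is the iterable of items and requests produced by the callback
        for element in result:
            # Items (dicts, scrapy.Item) are mappings; requests are not
            if isinstance(element, Mapping) and not (element.get("text") or "").strip():
                continue  # skip the empty quote
            yield element
```

It could be enabled with something like `SPIDER_MIDDLEWARES = {'myproject.middlewares.DropEmptyQuotesMiddleware': 543}` (path and number are assumptions). For pure item filtering an item pipeline also works; a spider middleware is the natural place when the same logic must also see requests.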

As with Downloader Middlewares, you can disable any of these default **Spider Middlewares** by setting its value to `None` in the `SPIDER_MIDDLEWARES` setting in `settings.py`:

```python
# settings.py

SPIDER_MIDDLEWARES = {
    'scrapy.spidermiddlewares.referer.RefererMiddleware': None,
}
```

### 6 Scrapy Settings

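- The `settings.py` file is the central control panel of a Scrapy project.
  - It lets you enable or disable default features and integrate your own custom middlewares and extensions.
  - Updating `settings.py` changes settings for the whole project.
  - Adding `custom_settings` to an individual spider changes settings for that spider only.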
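For project-wide configuration, here is a sketch of a few commonly adjusted built-in settings in `settings.py`. The values are illustrative assumptions, and any of them can be overridden for a single spider via `custom_settings`.

```python
# settings.py (sketch): project-wide settings apply to every spider in the project.
# Illustrative values only.

BOT_NAME = "myproject"
ROBOTSTXT_OBEY = True          # respect robots.txt (enabled in the logs above)
CONCURRENT_REQUESTS = 8        # max concurrent requests (Scrapy's default is 16)
DOWNLOAD_DELAY = 0.5           # approx. seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True    # adjust the delay to the server's response times
FEEDS = {
    # %(name)s is replaced by the spider name: one JSON file per spider
    "%(name)s.json": {"format": "json", "encoding": "utf8"},
}
```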

In the following example, the `custom_settings` attribute adds a spider-specific setting that saves the scraped data to a `data.csv` file.

```python
import scrapy
# Assumes the project is named "myproject" and QuoteItem is a scrapy.Item in items.py
from myproject.items import QuoteItem

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    # Spider-specific settings: export scraped items to data.csv in CSV format
    custom_settings = {
        'FEEDS': {'data.csv': {'format': 'csv'}},
    }


    def start_requests(self):
        url = 'https://quotes.toscrape.com/'
        yield scrapy.Request(url, callback=self.parse)


    def parse(self, response):
        for quote in response.css('div.quote'):
            # Create a new item per quote; reusing one instance across yields
            # can cause later quotes to overwrite earlier ones
            quote_item = QuoteItem()
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            yield quote_item
```

Scrapy supports a very wide range of settings. To explore them all, see the [full list](https://docs.scrapy.org/en/latest/topics/settings.html#built-in-settings-reference) of built-in settings that Scrapy provides.