## Demo 3 - Scraping Book Titles Using Scrapy

In this demo, you will use Scrapy to fetch book titles from books.toscrape.com.

<h3>1. Installing Scrapy and importing required modules</h3>

In [None]:
pip install scrapy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import scrapy
from scrapy.crawler import CrawlerRunner  # runs the spider in an already running reactor

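<h3>2. Setting up crochet</h3>

<h5>Scrapy runs on top of the Twisted library, and a Twisted reactor cannot be restarted once it has stopped in a given process. In a notebook, this means re-running a crawl cell typically fails until the kernel is restarted. crochet works around this by running the reactor in a background thread, so the spider can be tested repeatedly.</h5>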

In [None]:
pip install crochet

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import crochet
crochet.setup()

<h3>3. Inspecting the webpage</h3>

- Go to http://books.toscrape.com
- Inspect the title of the first book using the browser's developer tools

![image.png](attachment:image.png)

<hr>

![image.png](attachment:image.png)

<hr>

<h3>4. Building the spider</h3>

In [None]:
class BookSpider(scrapy.Spider):
    name = 'BookSpider'  # unique name that identifies the spider

    # URLs the spider requests first
    start_urls = ['http://books.toscrape.com/catalogue/page-1.html',
                  'http://books.toscrape.com/catalogue/page-2.html',
                  'http://books.toscrape.com/catalogue/page-3.html']

    def parse(self, response):
        '''
        Invoked by the Scrapy engine with the response for every URL in start_urls.
        Uses a CSS selector to scrape the book titles from the page.
        '''
        # Read the title attribute of the <a> inside each article.product_pod > h3
        book_list = response.css('article.product_pod>h3>a::attr(title)').getall()

        for title in book_list:
            print(title)

<h3>5. Crawling with the spider</h3>

<h6>Running the spider using CrawlerRunner</h6>

In [None]:
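# CrawlerRunner does not start the reactor itself; crochet is already running it
runner = CrawlerRunner({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
})
# Schedules the crawl and returns a Deferred that fires when the crawl finishes
runner.crawl(BookSpider)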

<Deferred at 0x7f8672f182d0>

##### Conclusion: This code demonstrates how to fetch book titles from a website using Scrapy.