## Creating a Scrapy project

### Setting up

In [None]:
# ! pip install scrapy
# ! scrapy startproject quotes_crawler
# %cd quotes_crawler   # %cd changes the notebook's own working directory; "! cd" would only affect a one-off subshell

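`scrapy startproject` creates a project skeleton in the folder `quotes_crawler`: a `scrapy.cfg` file that marks the project root, plus a `quotes_crawler` package containing `settings.py`, `items.py`, `pipelines.py`, `middlewares.py` and an empty `spiders/` package.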

### Creating the scraper / crawler

Inside quotes_crawler/spiders/quotes_spider.py:

In [1]:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # Loop through all quotes on the page
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").get(),
                'author': quote.css("small.author::text").get(),
                'tags': quote.css("div.tags a.tag::text").getall()
            }

        # Follow pagination link
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)

Special notes:

- `parse` is a generator. Each `yield` hands one object back to Scrapy (an item dict or a `Request`) and then the loop continues, so a single call can emit every quote on the page followed by the request for the next page.
- Using `return` inside the loop would end the method after the first quote. A callback may also `return` a list of items and requests, but then the whole list has to be built before Scrapy receives any of it.
- Scrapy is built around this pattern: it iterates over whatever the callback produces, passes items on to the item pipelines and feed exports, and schedules requests to be downloaded asynchronously. This is what lets a single run crawl many pages.
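The difference between `yield` and `return` in a loop can be seen in plain Python, without Scrapy. In this sketch `QUOTES`, `parse_with_yield` and `parse_with_return` are hypothetical names; the list stands in for the result of `response.css("div.quote")`.

```python
# Plain-Python illustration (no Scrapy involved).
QUOTES = ["quote one", "quote two", "quote three"]


def parse_with_yield(quotes):
    # Generator: each yield hands one item to the caller, then the loop resumes.
    for q in quotes:
        yield {"text": q}


def parse_with_return(quotes):
    # Ordinary function: return exits on the very first iteration.
    for q in quotes:
        return {"text": q}


print(list(parse_with_yield(QUOTES)))  # all three items
print(parse_with_return(QUOTES))       # only the first item
```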

### Running the spider

To run the spider and scrape the site (this must be run from inside the project directory, i.e. the folder containing `scrapy.cfg`):

In [None]:
!scrapy crawl quotes

Scrapy 2.13.3 - no active project

The crawl command is not available from this location.
These commands are only available from within a project: check, crawl, edit, list, parse.

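Use "scrapy" to see available commands

The error above means the command ran outside the project folder. In a notebook, `! cd quotes_crawler` only changes the directory of a short-lived subshell, so the following `!scrapy crawl quotes` still starts in the original directory. Run `%cd quotes_crawler` first, or run the command from a terminal inside the project folder.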
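Scrapy treats a directory as "inside a project" when it can find a `scrapy.cfg` file there or in a parent directory. The sketch below is a simplified, hypothetical helper (`find_scrapy_cfg` is not a Scrapy function) that performs that upward search, which is handy for checking where a notebook is currently running. Scrapy's own check can also pick up a project through the `SCRAPY_SETTINGS_MODULE` environment variable, which this sketch ignores.

```python
from pathlib import Path


def find_scrapy_cfg(start="."):
    # Walk from `start` up to the filesystem root and return the first scrapy.cfg found.
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


cfg = find_scrapy_cfg()
if cfg is None:
    print("No scrapy.cfg found: 'scrapy crawl' will report 'no active project'.")
else:
    print(f"Project root: {cfg.parent}")
```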


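To run the crawler and export the scraped items to a JSON file (note that `-o` appends to an existing file, which leaves invalid JSON if `quotes.json` already exists; `-O` overwrites it instead):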

In [None]:
# !scrapy crawl quotes -o quotes.json
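Once the export has run, the JSON feed is a single array whose elements carry the keys yielded in `parse` (`text`, `author`, `tags`). A minimal sketch for loading it back and counting tags with the standard library; `load_quotes` and `tag_counts` are hypothetical helper names, and the cell only reads the file if it exists.

```python
import json
import os
from collections import Counter


def load_quotes(path="quotes.json"):
    # A JSON feed is one array; each element has the keys yielded in parse().
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tag_counts(quotes):
    # Count how often each tag appears across all scraped quotes.
    return Counter(tag for quote in quotes for tag in quote["tags"])


if os.path.exists("quotes.json"):
    quotes = load_quotes()
    print(len(quotes), "quotes scraped")
    print(tag_counts(quotes).most_common(5))
```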

To export CSV instead, change the file extension; Scrapy infers the feed format from it:

In [11]:
# !scrapy crawl quotes -o quotes.csv
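In the CSV feed, Scrapy's CSV exporter writes a header row from the item keys and, by default, joins list fields such as `tags` into one comma-separated cell. The sketch below (with a hypothetical helper `load_quotes_csv`) reads the file with the standard `csv` module and splits `tags` back into a list; it assumes no individual tag contains a comma, which holds for the hyphenated tags on this site.

```python
import csv
import os


def load_quotes_csv(path="quotes.csv"):
    # One header row from the item keys; list values such as "tags" were
    # joined with commas on export, so split them back into lists here.
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["tags"] = row["tags"].split(",") if row["tags"] else []
    return rows


if os.path.exists("quotes.csv"):
    for row in load_quotes_csv()[:3]:
        print(row["author"], row["tags"])
```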