# Web Crawler

This project provides a web crawler built with Scrapy that extracts book titles, prices, and availability from a website. To get started, follow the installation steps below.
## Installation

Before installing the project, make sure you have Python installed on your system. You can download Python from python.org.
- Clone the GitHub repository to your local machine using the following command:

  ```shell
  git clone https://github.com/milosnowcat/crawler.py.git
  ```
- Navigate to the `crawler.py` directory:

  ```shell
  cd crawler.py
  ```
- Install the dependencies (including Scrapy) using pip:

  ```shell
  pip install -r requirements.txt
  ```
- Run the spider to start crawling:

  ```shell
  scrapy crawl books
  ```
This will start the web crawling process and extract book information from the specified website.
That's it! You have successfully installed and executed the Crawler.py project.
## Usage

Here is how to use the crawler once it is installed.
- Ensure you have followed the installation steps in the Installation section above.
- After running the `scrapy crawl books` command, the spider starts crawling the website http://books.toscrape.com/.
- The spider follows two rules defined in the `Crawler.py` class:
  - It follows links whose URL contains "catalogue/category".
  - It follows links whose URL contains "catalogue" but not "category"; the `parse_item` callback extracts book information from these pages.
- The extracted information includes book titles, prices, and availability.
- The crawled data is printed to the console in JSON format.
- You can customize the spider to save the data to a file, database, or perform other actions as needed.
- To stop the spider, press `Ctrl + C` in your terminal.
That's it! You have successfully used the Crawler.py project to scrape book information from a website using Scrapy.