# Web Crawler

This project provides a web crawler built with Scrapy that extracts book titles, prices, and availability from a website. To get started, follow the installation steps below.
## Installation

Before installing the project, make sure you have Python installed on your system. You can download Python from python.org.
- Clone the GitHub repository to your local machine using the following command:

  ```shell
  git clone https://github.com/milosnowcat/crawler.py.git
  ```
- Navigate to the `crawler.py` directory:

  ```shell
  cd crawler.py
  ```
- Install the dependencies (including Scrapy) using pip:

  ```shell
  pip install -r requirements.txt
  ```
- Run the spider to start crawling:

  ```shell
  scrapy crawl books
  ```
This will start the web crawling process and extract book information from the specified website.
That's it! You have successfully installed and executed the Crawler.py project.
## Usage

Here is how to use the crawler once it is installed.
- Ensure you have followed the installation steps in the Installation section above.
- After running the `scrapy crawl books` command, the spider starts crawling the website http://books.toscrape.com/.
- The spider follows two rules defined in the `Crawler.py` class:
  - It follows links whose URL contains "catalogue/category".
  - It follows links whose URL contains "catalogue" but not "category"; the `parse_item` callback extracts book information from these pages.
- The extracted information includes book titles, prices, and availability.
- The crawled data is printed to the console in JSON format.
- You can customize the spider to save the data to a file, database, or perform other actions as needed.
- To stop the spider, press `Ctrl + C` in your terminal.
That's it! You have successfully used the Crawler.py project to scrape book information from a website using Scrapy.