# 📚 Project Brief – Books to Scrape

## ⏱️ Estimated Time: **300 minutes (5 hours)**


This project will guide you through building a **web scraper** for the site [Books to Scrape](https://books.toscrape.com/). It is designed as a hands-on exercise to practice **Python, requests, Pandas, and data analysis**.

## 🎯 Context

The marketing team of an online bookstore wants to better understand its catalog. They want to collect information about every book and analyze categories, prices, ratings, and stock availability.

As a data scientist, your mission is to **scrape the website** and deliver structured datasets and insights.

## ✅ Goals

1. Scrape the website [Books to Scrape](https://books.toscrape.com/)
2. Extract for each book:
   - Title
   - Price
   - Stock availability
   - Rating
   - Product URL
   - Image URL
   - UPC
   - Category
3. Handle **pagination** across all pages
4. Save results into **one CSV per category**
5. Download book cover images into folders per category
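As a rough illustration of goal 4, the sketch below groups already-scraped book records by category and writes one CSV per group. The snake_case column names, the `write_category_csvs` helper, and the simple lowercase-and-hyphen slug are assumptions made for this example, not requirements of the brief.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Column order mirrors the field list above; the snake_case names are assumed.
FIELDS = [
    "title", "price", "availability", "rating",
    "product_url", "image_url", "upc", "category",
]


def write_category_csvs(books, out_dir):
    """Group book dicts by their 'category' key and write one CSV per group."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    grouped = defaultdict(list)
    for book in books:
        grouped[book["category"]].append(book)

    written = []
    for category, rows in grouped.items():
        # Simplified slug: lowercase, spaces replaced by hyphens.
        slug = category.lower().replace(" ", "-")
        path = out_dir / f"category_{slug}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Calling `write_category_csvs(books, "outputs/csv")` with a list of dictionaries keyed by `FIELDS` would produce files such as `outputs/csv/category_poetry.csv`.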


## 📦 Deliverables

- CSV files: `outputs/csv/category_<slug>.csv`
- Images: `outputs/images/<category>/<upc>_<slug-title>.jpg`
- Optional: a summary notebook that cleans the data and explores prices, ratings, and stock using Pandas and a visualization tool of your choice.
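One possible way to build the output paths listed above is sketched here. The `slugify`, `csv_path`, and `image_path` helpers are hypothetical names, and the slug rule (ASCII only, lowercase, hyphen-separated) is an assumption, since the brief does not define how slugs are formed.

```python
import re
import unicodedata
from pathlib import Path


def slugify(text):
    """Lowercase the text, drop accents, and join alphanumeric runs with hyphens."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def csv_path(category, root="outputs"):
    """Path of the CSV file for one category."""
    return Path(root) / "csv" / f"category_{slugify(category)}.csv"


def image_path(category, upc, title, root="outputs"):
    """Path of one cover image, stored in a folder named after the category."""
    filename = f"{upc}_{slugify(title)}.jpg"
    return Path(root) / "images" / slugify(category) / filename
```

Prefixing the filename with the UPC keeps image names unique even when two titles reduce to the same slug.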

## 💡 Hints

- Suggested libraries: `requests` (HTTP requests) and `lxml` (HTML parsing); `scrapy` is a full crawling framework that can be used instead of that pair
- Inspect the HTML with your browser (right-click → *Inspect*) to identify selectors
- Ratings are given as words in the `class` attribute (e.g., `star-rating Three`)
- Use helper functions to keep your code clean (e.g., `parse_book()`, `parse_category()`)
- Add delays between requests (`time.sleep`) to avoid hammering the server
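The rating hint can be turned into a small helper, sketched below. `rating_from_class` and `parse_price` are hypothetical names; the price helper additionally assumes that prices appear as text with a currency symbol in front of a decimal number (for example `£51.77`), which you should confirm by inspecting the page.

```python
import re

# Words used in the class attribute, mapped to numeric ratings.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_from_class(class_attr):
    """Return the numeric rating found in a class string, or None if absent."""
    for token in class_attr.split():
        if token in RATING_WORDS:
            return RATING_WORDS[token]
    return None


def parse_price(text):
    """Extract the first decimal number from a price string (assumed format)."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None
```

For instance, `rating_from_class("star-rating Three")` returns `3`. Storing ratings and prices as numbers from the start saves a cleaning step later in Pandas.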


## 🛠 Suggested Steps

1. Start by scraping **one book page** and extract the required fields
2. Extend your code to **one category** (handle multiple pages)
3. Generalize your scraper to cover **all categories**
4. Save the results into CSV files
5. Extend your scraper to also **download images**
6. (Optional) Explore the dataset with Pandas (average price per category, distribution of ratings, etc.)
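Step 2 (following a category across several pages) could be sketched as below. This version uses only the standard library so it stays self-contained; it assumes the link to the next page is an `<a>` inside `<li class="next">`, which you should verify in the browser inspector. `NextLinkFinder`, `iter_pages`, and the injected `fetch` callable are hypothetical names for this example.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> found inside <li class="next">."""

    def __init__(self):
        super().__init__()
        self.in_next = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "next" in (attrs.get("class") or "").split():
            self.in_next = True
        elif tag == "a" and self.in_next and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_next = False


def iter_pages(start_url, fetch, delay=0.0):
    """Yield (url, html) for each page, following 'next' links until none is left.

    `fetch` is any callable that takes a URL and returns the page HTML as text.
    """
    url = start_url
    while url:
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        # Relative hrefs are resolved against the page they were found on.
        url = urljoin(url, finder.href) if finder.href else None
        if url and delay:
            time.sleep(delay)  # pause before requesting the next page
```

With `requests`, the `fetch` argument could be something like `lambda u: requests.get(u, timeout=10).text`. Passing the fetcher in as a parameter also makes the pagination logic easy to test against saved HTML without touching the network.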

## 📝 Evaluation Criteria

- Correctness of the scraper (all required fields extracted)
- Clean code structure (functions, reusable logic)
- Proper CSV and image outputs
- (Optional) Extra credit for insightful visualizations