
# Web Scraping Books Data

This notebook extracts book information from [Books to Scrape](https://books.toscrape.com/).

It retrieves book names, prices, ratings, and availability, along with their categories.

The home page lists books in no particular order. This project first scrapes the available categories and their links, then uses each link to collect the books in that category.


## 1. Configuration



1. Import the necessary libraries.
2. Set the source URL.
3. Initialize the data structure for storing book details.

In [None]:
# Importing necessary libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd

In [None]:
# Source URL
source_url = "https://books.toscrape.com/"

In [None]:
# Dictionary to store book details
books = {"Name": [],
         "Category": [],
         "Price": [],
         "Rating": [],
         "Availability": []}

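Each scraped book contributes one value to each of the five lists, and `pd.DataFrame` raises an error if the lists end up with different lengths. The sketch below shows one way to keep them aligned by appending a complete record in a single step. The helper names `new_store`, `add_book`, and `lengths_match` are hypothetical and not part of the original notebook.

```python
# Hypothetical helpers (not in the original notebook) that keep the
# column lists aligned while records are collected.

def new_store():
    """Return an empty dictionary of column lists, shaped like `books`."""
    return {"Name": [], "Category": [], "Price": [], "Rating": [], "Availability": []}


def add_book(store, name, category, price, rating, availability):
    """Append one complete record so every list grows by exactly one item."""
    record = {
        "Name": name,
        "Category": category,
        "Price": price,
        "Rating": rating,
        "Availability": availability,
    }
    for column, value in record.items():
        store[column].append(value)


def lengths_match(store):
    """Return True when all column lists have the same length."""
    return len({len(values) for values in store.values()}) <= 1


demo = new_store()
add_book(demo, "Under the Tuscan Sun", "Travel", "£37.33", "Three", "In stock")
print(lengths_match(demo))
```

Extracting all five values first and only then appending them means a failed lookup cannot leave one list longer than the others.
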
## 2. Retrieve Category Links

Define a function that collects the category names and their links from the homepage.

In [None]:
def get_categories(base_url):
    response = requests.get(base_url)
    response.raise_for_status()  # Raise an error for bad responses
    soup = BeautifulSoup(response.content, "html.parser")
    sublink = soup.find("ul", class_="nav nav-list")

    categories = []
    links = []

    for a in sublink.find_all("a"):
        categories.append(a.text.strip())
        links.append(a.get("href"))

    # Skip the first item (usually 'Books') as it’s a header
    return categories[1:], links[1:]

categories, category_links = get_categories(source_url)
print("Categories found:", categories)
print("Links to Categories:", category_links)

Categories found: ['Travel', 'Mystery', 'Historical Fiction', 'Sequential Art', 'Classics', 'Philosophy', 'Romance', 'Womens Fiction', 'Fiction', 'Childrens', 'Religion', 'Nonfiction', 'Music', 'Default', 'Science Fiction', 'Sports and Games', 'Add a comment', 'Fantasy', 'New Adult', 'Young Adult', 'Science', 'Poetry', 'Paranormal', 'Art', 'Psychology', 'Autobiography', 'Parenting', 'Adult Fiction', 'Humor', 'Horror', 'History', 'Food and Drink', 'Christian Fiction', 'Business', 'Biography', 'Thriller', 'Contemporary', 'Spirituality', 'Academic', 'Self Help', 'Historical', 'Christian', 'Suspense', 'Short Stories', 'Novels', 'Health', 'Politics', 'Cultural', 'Erotica', 'Crime']
Links to Categories: ['catalogue/category/books/travel_2/index.html', 'catalogue/category/books/mystery_3/index.html', 'catalogue/category/books/historical-fiction_4/index.html', 'catalogue/category/books/sequential-art_5/index.html', 'catalogue/category/books/classics_6/index.html', 'catalogue/category/books/phi

## 3. Scrape Book Data by Category

For each category, follow the pagination links and scrape the details of every book.

In [None]:
from urllib.parse import urljoin

def scrape_category(category_name, category_link, base_url):
    """Scrape every page of one category and append the results to `books`."""
    # Category links are relative to the homepage
    url = urljoin(base_url, category_link)

    while True:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")

        # Each book sits in an <article class="product_pod"> inside the <ol>
        ol = soup.find("ol")
        articles = ol.find_all("article", class_="product_pod")

        for art in articles:
            name = art.find("h3").find("a")["title"]
            books["Name"].append(name)

            price = art.find("p", class_="price_color").get_text()
            books["Price"].append(price)

            # The rating is the second class of the star-rating tag, e.g. "Three"
            rating = art.find("p", class_="star-rating")["class"][1]
            books["Rating"].append(rating)

            availability = art.find("p", class_="instock availability").get_text().strip()
            books["Availability"].append(availability)

            books["Category"].append(category_name)

        # Follow the "next" link, if any; its href is relative to the current page
        pagination = soup.find("li", class_="next")
        if pagination is None:
            break
        url = urljoin(url, pagination.find("a").get("href"))
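
The `next` link's href is relative to the page it appears on. As a minimal sketch, `urllib.parse.urljoin` from the standard library resolves such a link against the current URL, whichever page it is. The category link is taken from the output above; the `page-2.html` and `page-3.html` values are assumed examples of what the href contains.

```python
from urllib.parse import urljoin

# First page of a category: the homepage URL plus a relative category link
first_page = urljoin(
    "https://books.toscrape.com/",
    "catalogue/category/books/mystery_3/index.html",
)

# Assumed example hrefs of the "next" link on pages 1 and 2
second_page = urljoin(first_page, "page-2.html")
third_page = urljoin(second_page, "page-3.html")

print(first_page)
print(second_page)
print(third_page)
```

Because the last path segment is replaced regardless of its name, the same call works for the first page and for every later page.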

In [None]:
for cat, link in zip(categories, category_links):
    print(f"Scraping category: {cat}")
    scrape_category(cat, link, source_url)

Scraping category: Travel
Scraping category: Mystery
Scraping category: Historical Fiction
Scraping category: Sequential Art
Scraping category: Classics
Scraping category: Philosophy
Scraping category: Romance
Scraping category: Womens Fiction
Scraping category: Fiction
Scraping category: Childrens
Scraping category: Religion
Scraping category: Nonfiction
Scraping category: Music
Scraping category: Default
Scraping category: Science Fiction
Scraping category: Sports and Games
Scraping category: Add a comment
Scraping category: Fantasy
Scraping category: New Adult
Scraping category: Young Adult
Scraping category: Science
Scraping category: Poetry
Scraping category: Paranormal
Scraping category: Art
Scraping category: Psychology
Scraping category: Autobiography
Scraping category: Parenting
Scraping category: Adult Fiction
Scraping category: Humor
Scraping category: Horror
Scraping category: History
Scraping category: Food and Drink
Scraping category: Christian Fiction
Scraping category: 

## 4. Save Data to Excel

Convert the collected data into a pandas DataFrame and save it to an Excel file. Writing `.xlsx` files with `to_excel` needs an Excel writer engine such as `openpyxl` to be installed.

In [None]:
df = pd.DataFrame(books)
df.to_excel("bookscrape.xlsx", index=False)
print("Data saved to bookscrape.xlsx")

Data saved to bookscrape.xlsx


## 5. Display Data

Show the first few rows of the DataFrame to verify the scraped data.

In [None]:
df.head()

|   | Name | Category | Price | Rating | Availability |
|---|------|----------|-------|--------|--------------|
| 0 | It's Only the Himalayas | Travel | £45.17 | Two | In stock |
| 1 | Full Moon over Noah’s Ark: An Odyssey to Mount... | Travel | £49.43 | Four | In stock |
| 2 | See America: A Celebration of Our National Par... | Travel | £48.87 | Three | In stock |
| 3 | Vagabonding: An Uncommon Guide to the Art of L... | Travel | £36.94 | Two | In stock |
| 4 | Under the Tuscan Sun | Travel | £37.33 | Three | In stock |
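
The rows above show that prices and ratings are stored as text (for example `£45.17` and `Two`). If numeric values are needed for sorting or averaging, a conversion step along the following lines could be used. This is a sketch only: `parse_price`, `parse_rating`, and `RATING_WORDS` are hypothetical names, and the mapping assumes ratings run from `One` to `Five`.

```python
import re

# Assumed rating vocabulary; the output above shows "Two", "Three", and "Four"
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Convert a price string such as '£45.17' to a float."""
    # Drop everything except digits and the decimal point
    return float(re.sub(r"[^0-9.]", "", text))


def parse_rating(word):
    """Convert a rating word such as 'Two' to an integer, or None if unknown."""
    return RATING_WORDS.get(word)


print(parse_price("£45.17"))
print(parse_rating("Two"))
```

With the DataFrame from this notebook, the helpers could be applied column-wise, for example `df["Price"].map(parse_price)` and `df["Rating"].map(parse_rating)`.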
