### Working with Books on ToScrape
#### About ToScrape
Books to Scrape is a free web scraping sandbox that lists books with their genres, star ratings, and prices. Developers commonly use it to practice web scraping in Python with Requests and BeautifulSoup. Its HTML layout is structured and consistent, which makes it an ideal target for learning how to extract data.

The catalogue pages differ only in the page number in the URL:

```python
# https://books.toscrape.com/catalogue/category/books_1/page-2.html
# https://books.toscrape.com/catalogue/category/books_1/page-3.html
```

### 1. Defining the URL and Formatting It for Pagination

```python
import requests
from bs4 import BeautifulSoup
```

```python
base_url = "https://books.toscrape.com/catalogue/category/books_1/page-{}.html"
url = base_url.format(3)
print(url)
```

```
https://books.toscrape.com/catalogue/category/books_1/page-3.html
```


#### Explanation:

base_url is a template string: the {} placeholder marks where the page number goes.

base_url.format(3) replaces {} with 3, producing the URL for page 3.
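To build the URLs for a whole range of pages at once, the template can be formatted in a loop. The following is a minimal sketch; page_urls is a hypothetical helper name, not part of the original notebook. Because Python's range excludes its end value, the helper adds 1 to last so the range is inclusive.

```python
base_url = "https://books.toscrape.com/catalogue/category/books_1/page-{}.html"


def page_urls(first, last):
    """Hypothetical helper: return the URLs for pages first..last (inclusive)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # range() stops before its end value, so add 1 to include `last`.
    return [base_url.format(page) for page in range(first, last + 1)]


urls = page_urls(1, 3)
for u in urls:
    print(u)
```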

### 2. Sending an HTTP Request and Parsing HTML

```python
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
print(soup)
```

```
<!DOCTYPE html>

<!--[if lt IE 7]>      <html lang="en-us" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html lang="en-us" class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html lang="en-us" class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-us"> <!--<![endif]-->
<head>
<title>
    Books | 
     Books to Scrape - Sandbox

</title>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="24th Jun 2016 09:30" name="created"/>
<meta content="
    
" name="description"/>
<meta content="width=device-width" name="viewport"/>
<meta content="NOARCHIVE,NOCACHE" name="robots"/>
<!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
<!--[if lt IE 9]>
        <script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script>
        <![endif]-->
<link href="../../../static/oscar/favicon.ico" rel="shortcut icon"/>
<link href="../../../static/oscar/css/styles.css" rel="stylesheet" type="
```

#### Explanation:

requests.get(url): Sends a GET request to fetch the webpage content.

BeautifulSoup(response.text, "html.parser"): Parses the HTML content of the page.

print(soup): Displays the full parsed HTML (useful for debugging but prints a lot of text).
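Since print(soup) dumps the entire document, it is often more useful to print only the part you are checking. The sketch below parses a trimmed copy of the `<head>` from the output above, so it runs without a network request, and pulls out the page title and the robots meta tag. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the <head> shown in the output above, so this runs offline.
html = """
<html><head>
<title>
    Books |
     Books to Scrape - Sandbox

</title>
<meta content="NOARCHIVE,NOCACHE" name="robots"/>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.title is the first <title> tag; get_text() returns its raw text,
# including the newlines and indentation from the page source.
raw_title = soup.title.get_text()
# Collapse every run of whitespace into a single space.
title = " ".join(raw_title.split())
print(title)

# find() with attrs matches on attribute values; tag["content"] reads an attribute.
robots = soup.find("meta", attrs={"name": "robots"})["content"]
print(robots)
```

When fetching live pages, passing a timeout to requests.get (for example requests.get(url, timeout=10)) and calling response.raise_for_status() makes network failures and HTTP errors visible instead of silently parsing an error page.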

### 3. Selecting Books from the Page

```python
books = soup.select('.product_pod')
print(len(books))
```

```
20
```


#### Explanation:

.select('.product_pod'): Returns a list of every element matching the CSS selector .product_pod; each book on the page sits in an HTML container with the class "product_pod".

print(len(books)): Prints the number of books found on the page (usually 20 per page).
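The difference between select and select_one can be seen on a small offline sample. The HTML below is a simplified stand-in for the real listing: only the product_pod class comes from the notebook, and the surrounding `<ol>`/`<article>` structure is an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Two simplified book cards; the real page has 20 and much more markup.
html = """
<ol class="row">
  <li><article class="product_pod">
    <h3><a href="a.html" title="First Book">First Book</a></h3>
  </article></li>
  <li><article class="product_pod">
    <h3><a href="b.html" title="Second Book">Second Book</a></h3>
  </article></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every match for the CSS selector.
books = soup.select(".product_pod")
print(len(books))

# select_one() returns only the first match, or None if nothing matches.
first = soup.select_one(".product_pod")
print(first.h3.a["title"])

# A selector that matches nothing gives an empty list, not an error.
print(soup.select(".no_such_class"))
```

If len(books) prints 0 on a live page, the selector no longer matches the markup (or the request returned an error page), which makes it a quick first check when a scraper stops working.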

### 4. Filtering Books with a Two-Star Rating

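```python
for book in books:
    # The rating is encoded in the class attribute, e.g. class="star-rating Two".
    if "star-rating Two" in str(book):
        # The title attribute of the <a> inside <h3> holds the full title.
        title = book.h3.a.attrs['title']
        print(title)
```

```
Reasons to Stay Alive
Without Borders (Wanderlove #1)
```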


#### Explanation:
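Iterates over each book element in books.

Checks whether the string star-rating Two appears in the book's HTML; on this site, that class marks a two-star rating.

Extracts the book title from the title attribute of the `<a>` tag inside the `<h3>` tag (book.h3.a.attrs['title']). The attribute holds the full title, while the visible link text may be shortened for long titles.

Prints the title of each two-star-rated book.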
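Searching str(book) for the text "star-rating Two" works on this site, but it depends on how the class attribute is serialized: it would miss a tag written as class="Two star-rating". Checking the classes themselves is less brittle. The sketch below shows two alternatives on simplified offline markup; WORD_TO_STARS and star_rating are hypothetical names, and the One through Five class words are an assumption extrapolated from the Two class used above.

```python
from bs4 import BeautifulSoup

html = """
<article class="product_pod">
  <p class="star-rating Two"></p>
  <h3><a href="x.html" title="Two Star Book">Two Star Book</a></h3>
</article>
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="y.html" title="Three Star Book">Three Star Book</a></h3>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
books = soup.select(".product_pod")

# Option 1: a CSS selector that requires both classes on the same element.
two_star = [
    b.h3.a["title"] for b in books if b.select_one(".star-rating.Two") is not None
]
print(two_star)

# Option 2: read the rating word from the class list and map it to a number.
# Assumption: the site uses the words One..Five, as suggested by "Two".
WORD_TO_STARS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def star_rating(book):
    """Return the numeric rating of a product_pod element, or None."""
    tag = book.select_one(".star-rating")
    if tag is None:
        return None
    # BeautifulSoup exposes the multi-valued class attribute as a list.
    for cls in tag["class"]:
        if cls in WORD_TO_STARS:
            return WORD_TO_STARS[cls]
    return None


ratings = {b.h3.a["title"]: star_rating(b) for b in books}
print(ratings)
```

The numeric version also makes other filters easy, such as keeping every book rated three stars or more.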

### 5. Scraping Multiple Pages (Looping Through 50 Pages)

```python
for page in range(1, 51):  # pages 1 to 50 (range excludes its end value)
    url = base_url.format(page)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # Every book card on the current page
    books = soup.select(".product_pod")

    for book in books:
        # Keep only books whose class attribute marks a two-star rating
        if "star-rating Two" in str(book):
            title = book.h3.a.attrs['title']
            print(title)
```

```
Starving Hearts (Triangular Trade Trilogy, #1)
Libertarianism for Beginners
It's Only the Himalayas
How Music Works
Maude (1883-1993):She Grew Up with the country
You can't bury them all: Poems
Reasons to Stay Alive
Without Borders (Wanderlove #1)
Soul Reader
Security
Saga, Volume 5 (Saga (Collected Editions) #5)
Reskilling America: Learning to Labor in the Twenty-First Century
Political Suicide: Missteps, Peccadilloes, Bad Calls, Backroom Hijinx, Sordid Pasts, Rotten Breaks, and Just Plain Dumb Mistakes in the Annals of American Politics
Obsidian (Lux #1)
My Paris Kitchen: Recipes and Stories
Masks and Shadows
Lumberjanes, Vol. 2: Friendship to the Max (Lumberjanes #5-8)
Lumberjanes Vol. 3: A Terrible Plan (Lumberjanes #9-12)
Judo: Seven Steps to Black Belt (an Introductory Guide for Beginners)
I Hate Fairyland, Vol. 1: Madly Ever After (I Hate Fairyland (Compilations) #1-5)
Giant Days, Vol. 2 (Giant Days #5-8)
Everydata: The Misinformation Hidden in the Little Data You Consume Every
```

#### Explanation:
Loops from page 1 to 50, updating url dynamically for each page.

Fetches and parses each page's HTML using requests.get(url) and BeautifulSoup.

Extracts all books from the page using .select(".product_pod").

Filters books with a two-star rating (star-rating Two).

Extracts the title and prints it.
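Printing inside the loop makes the titles hard to reuse or save. A minimal sketch that returns them as a list instead is shown below. two_star_titles and its fetch_html parameter are hypothetical names: passing the download function in keeps the parsing testable offline, and the optional delay spaces out requests to the server. It also swaps the string search for the CSS class check .star-rating.Two.

```python
import time

from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/catalogue/category/books_1/page-{}.html"


def two_star_titles(fetch_html, pages, delay=0.0):
    """Hypothetical helper: collect two-star titles from the given page numbers.

    fetch_html is any callable that takes a URL and returns the page HTML.
    """
    titles = []
    for page in pages:
        soup = BeautifulSoup(fetch_html(base_url.format(page)), "html.parser")
        for book in soup.select(".product_pod"):
            # Both classes must be on the same element, e.g. class="star-rating Two".
            if book.select_one(".star-rating.Two") is not None:
                titles.append(book.h3.a["title"])
        if delay:
            time.sleep(delay)  # optional pause between requests
    return titles


# Offline usage: a fake fetcher that returns canned HTML for each URL.
fake_pages = {
    base_url.format(1): (
        '<article class="product_pod"><p class="star-rating Two"></p>'
        '<h3><a title="Page One Book">Page One Book</a></h3></article>'
    ),
    base_url.format(2): (
        '<article class="product_pod"><p class="star-rating Five"></p>'
        '<h3><a title="Page Two Book">Page Two Book</a></h3></article>'
    ),
}

result = two_star_titles(lambda url: fake_pages[url], range(1, 3))
print(result)  # ['Page One Book']
```

Against the live site, fetch_html could be a small function that calls requests.get(url, timeout=10), then response.raise_for_status(), and returns response.text; calling two_star_titles with range(1, 51) and a delay of about a second would then cover the same 50 pages as the loop above.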

### Summary of Features
- Dynamically builds URLs for paginated book listings.
- Extracts books from each page.
- Filters books by star rating (two-star rating in this case).
- Scrapes pages 1-50 in a single loop.
- Prints relevant book titles meeting the criteria.

This script is a solid foundation for a full-fledged book scraper and can be extended to collect prices, availability, and genres.
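As a starting point for those extensions, here is a sketch that turns each product_pod into a dictionary with the title, price, and availability. book_record is a hypothetical helper, and the price_color and availability class names are assumptions about the live markup; confirm them in your browser's developer tools before relying on them. The price is parsed by keeping only digits and the decimal point, so the currency symbol in front does not matter.

```python
import re

from bs4 import BeautifulSoup

# One simplified book card; the price and availability markup is assumed.
html = """
<article class="product_pod">
  <p class="star-rating Two"></p>
  <h3><a href="x.html" title="Example Title">Example ...</a></h3>
  <div class="product_price">
    <p class="price_color">£12.34</p>
    <p class="instock availability">
        In stock
    </p>
  </div>
</article>
"""


def book_record(book):
    """Hypothetical helper: convert one product_pod element into a dict."""
    price_text = book.select_one(".price_color").get_text(strip=True)
    availability = book.select_one(".availability").get_text()
    return {
        "title": book.h3.a["title"],
        # Keep only digits and the decimal point, then convert to a number.
        "price": float(re.sub(r"[^\d.]", "", price_text)),
        # Collapse the indentation and newlines around the text.
        "availability": " ".join(availability.split()),
    }


soup = BeautifulSoup(html, "html.parser")
records = [book_record(b) for b in soup.select(".product_pod")]
print(records)
```

A list of dictionaries like this can be passed straight to csv.DictWriter or a pandas DataFrame for saving and analysis.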