# Using the requests Module for Data Collection
Today we will see how to use the requests module to download the raw HTML of a webpage, which is the first step in scraping a website. The demos in this section use https://quotes.toscrape.com/ and https://books.toscrape.com/, which are safe to scrape for practice.

## 1. What is requests?
- requests is a third-party Python library (installed with `pip install requests`) for sending HTTP requests with very little code.
- It allows you to fetch the content of a webpage programmatically.
- It is commonly used as the first step before parsing HTML with BeautifulSoup.

## 2. Sending a Basic GET Request

In [2]:
import requests

url = "https://books.toscrape.com/"
response = requests.get(url)

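In [4]:
import os
os.makedirs("HTMLS", exist_ok=True)  # create the output folder if it is missing
with open("HTMLS/page1.html", "w", encoding="utf-8") as f:
    f.write(response.text)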

## 3. Checking the Response Status
Always check if the request was successful:

`print(response.status_code)`

Common status codes:
- `200`: OK (Success)
- `404`: Not Found
- `403`: Forbidden
- `500`: Internal Server Error
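
The codes above fall into broader classes (2xx success, 4xx client error, 5xx server error). As a minimal sketch, the helper below turns a status code into a readable label using the standard library's `http.HTTPStatus`. The function name `describe_status` is made up for this illustration and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable label for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "Not Found" for 404
    except ValueError:
        phrase = "Unknown"  # code is not a registered HTTP status

    # The first digit of the code identifies its class
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 403, 404, 500):
    print(describe_status(code))
```

With a real response you would call `describe_status(response.status_code)`.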

## 4. Important Response Properties
| Property               | Description                                              |
|------------------------|----------------------------------------------------------|
| `response.text`        | Response body decoded to a string (the HTML as text)     |
| `response.content`     | Raw bytes of the response body                           |
| `response.status_code` | HTTP status code as an integer                           |
| `response.headers`     | Response metadata such as content type and server info   |
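
To see how these four properties relate, here is a small sketch that gathers them into a dictionary. The helper name `summarize_response` is hypothetical, and the `fake` object is only a stand-in so the example runs without a network connection. In practice you would pass the object returned by `requests.get(...)`.

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Collect the properties from the table above into a plain dict."""
    return {
        "status_code": response.status_code,
        "content_type": response.headers.get("Content-Type", "unknown"),
        "num_bytes": len(response.content),  # raw bytes
        "num_chars": len(response.text),     # decoded text
    }


# Offline stand-in that mimics the attributes of a real response (assumption)
body = "<p>café</p>"
fake = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "text/html; charset=utf-8"},
    content=body.encode("utf-8"),
    text=body,
)

print(summarize_response(fake))
```

The byte count is larger than the character count here because `é` takes two bytes in UTF-8, which is why `response.content` and `response.text` are not interchangeable.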

## 5. Adding Headers to Mimic a Browser
Sometimes websites block automated requests. Adding a `User-Agent` header helps the request look like it is coming from a real browser.

In [5]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

response = requests.get(url, headers=headers)
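
If you make many requests, one option is to set the header once on a `requests.Session` instead of passing `headers=` every time. The sketch below builds a request without sending it, so you can inspect which `User-Agent` would go out. The variable names are illustrative only.

```python
import requests

# A Session keeps headers (and cookies) across requests, so the
# User-Agent only has to be set once.
session = requests.Session()
default_agent = session.headers["User-Agent"]
print("Default:", default_agent)  # requests identifies itself by default

session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})

# Build the request without sending it, then inspect its headers
request = requests.Request("GET", "https://books.toscrape.com/")
prepared = session.prepare_request(request)
print("Custom: ", prepared.headers["User-Agent"])
```

After this setup, `session.get(url, timeout=5)` sends the custom header automatically.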

## 6. Handling Connection Errors
Wrap your request in a try-except block to handle errors gracefully:

In [7]:
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # Raises an HTTPError for 4xx and 5xx status codes
    # print(response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
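
`RequestException` is the base class for the errors requests raises, so the cell above catches everything at once. If you want a different message per failure type, you can catch the more specific subclasses first. This is a sketch under assumptions: the helper name `fetch_or_explain` and its `get` parameter are invented here, and `get` exists only so the function can be tried with a stand-in instead of a live request.

```python
import requests


def fetch_or_explain(url, get=requests.get, timeout=5):
    """Return (response, None) on success or (None, message) on failure."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return None, f"Timed out after {timeout} seconds"
    except requests.exceptions.ConnectionError:
        return None, "Could not connect to the server"
    except requests.exceptions.HTTPError as exc:
        return None, f"Bad HTTP status: {exc}"
    except requests.exceptions.RequestException as exc:
        # Fallback for any other requests error
        return None, f"Request failed: {exc}"
    return response, None


# Example call (performs a real request, so it is left commented out):
# response, error = fetch_or_explain("https://books.toscrape.com/")
# print(error or response.status_code)
```

Order matters: the specific exceptions must come before `RequestException`, otherwise the general clause would catch them first.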

## 7. Best Practices for Fetching Pages
- Always check the HTTP status code.
- Use proper headers to mimic a browser.
- Set a timeout to avoid hanging indefinitely.
- Respect the website by not making too many rapid requests.
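
The practices above can be combined in one small helper. This is only a sketch: the function name `fetch_politely` and the default values for `delay` and `timeout` are assumptions chosen for illustration, not recommendations from the requests documentation. It expects a `requests.Session` (or any object with a compatible `get` method).

```python
import time


def fetch_politely(session, urls, delay=1.0, timeout=10):
    """Fetch each URL in turn, pausing between requests.

    Returns a dict mapping each successfully fetched URL to its HTML text.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # be gentle: wait before every request after the first
        response = session.get(url, timeout=timeout)  # never wait forever
        if response.status_code == 200:  # always check the status code
            pages[url] = response.text
        else:
            print(f"Skipping {url}: status {response.status_code}")
    return pages


# Example usage (performs real requests, so it is left commented out):
# import requests
# session = requests.Session()
# session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})
# pages = fetch_politely(session, ["https://books.toscrape.com/"])
```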

## 8. Summary
- requests makes it simple to fetch web pages using Python.
- It is the starting point for most web scraping workflows.
- Combining requests with BeautifulSoup allows for powerful data extraction.

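Putting it together, the cell below downloads catalogue pages 1 to 50 of books.toscrape.com into the `HTMLS` folder, with a timeout on every request and a short pause between requests:

In [10]:
import os
import time

os.makedirs("HTMLS", exist_ok=True)

for i in range(1, 51):
    response = requests.get(f"https://books.toscrape.com/catalogue/page-{i}.html", timeout=10)

    if response.status_code == 200:
        with open(f"HTMLS/page{i}.html", "w", encoding="utf-8") as f:
            f.write(response.text)
        print(f"Page {i} downloaded successfully.")
    else:
        print(f"Page {i} not downloaded. Status code: {response.status_code}")

    time.sleep(0.5)  # short pause so we do not send rapid-fire requests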

Page 1 downloaded successfully.
Page 2 downloaded successfully.
Page 3 downloaded successfully.
Page 4 downloaded successfully.
Page 5 downloaded successfully.
Page 6 downloaded successfully.
Page 7 downloaded successfully.
Page 8 downloaded successfully.
Page 9 downloaded successfully.
Page 10 downloaded successfully.
Page 11 downloaded successfully.
Page 12 downloaded successfully.
Page 13 downloaded successfully.
Page 14 downloaded successfully.
Page 15 downloaded successfully.
Page 16 downloaded successfully.
Page 17 downloaded successfully.
Page 18 downloaded successfully.
Page 19 downloaded successfully.
Page 20 downloaded successfully.
Page 21 downloaded successfully.
Page 22 downloaded successfully.
Page 23 downloaded successfully.
Page 24 downloaded successfully.
Page 25 downloaded successfully.
Page 26 downloaded successfully.
Page 27 downloaded successfully.
Page 28 downloaded successfully.
Page 29 downloaded successfully.
Page 30 downloaded successfully.
Page 31 downloaded 

### For more information, see the official documentation:
https://requests.readthedocs.io/en/latest/