# Using the `requests` Module for Data Collection
This section shows how to use the `requests` module to download the raw HTML of a webpage, which is the first step in scraping a website. The demos use https://quotes.toscrape.com/ and https://books.toscrape.com/, two sites that are safe to use for scraping practice.

## 1. What is `requests`?
- `requests` is a Python library for sending HTTP requests with very little code.
- It allows you to fetch the content of a webpage programmatically.
- It is commonly used as the first step before parsing HTML with BeautifulSoup.

## 2. Installing requests
- To install `requests`, run:

    ```pip install requests```

## 3. Sending a Basic GET Request
### Example:
```python
import requests

url = "https://example.com"
response = requests.get(url)

# Print the HTML content
print(response.text)
```

### Key points:

- `url`: The website you want to fetch.
- `response.text`: The HTML content of the page as a string.

## 4. Checking the Response Status
Always check if the request was successful:

```python 
print(response.status_code)
```

### Common Status Codes
- `200`: OK (Success)
- `404`: Not Found
- `403`: Forbidden
- `500`: Internal Server Error

### Good practice:
```python
if response.status_code == 200:
    print("Page fetched successfully!")
else:
    print("Failed to fetch the page.")
```
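The status codes listed above fall into standard ranges: 2xx for success, 4xx for client errors, and 5xx for server errors. As a minimal sketch, the helper below turns a numeric code into a readable label using `http.HTTPStatus` from the standard library. `describe_status` is a hypothetical name introduced here for illustration; it is not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable label for an HTTP status code."""
    # Look up the standard reason phrase, e.g. 404 -> "Not Found"
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"

    # Classify the code by its range
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"

    return f"{code} {phrase} ({kind})"
```

For example, `describe_status(response.status_code)` could replace the bare `print(response.status_code)` when logging many requests.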

## 5. Important Response Properties
| Property               | Description                             |
|------------------------|-----------------------------------------|
| `response.text`        | HTML content as Unicode text            |
| `response.content`     | Raw bytes of the response               |
| `response.status_code` | HTTP status code                        |
| `response.headers`     | Metadata like content-type, server info |
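
The difference between `response.content` and `response.text` comes down to decoding: `text` is the bytes in `content` decoded with the encoding that `requests` detects, which is available as `response.encoding`. The sketch below imitates that step with plain Python bytes so it runs without a network connection. `raw_bytes` and `encoding` are stand-ins assumed for illustration, not values taken from a real response.

```python
# Stand-in for response.content: the raw bytes a server might send
raw_bytes = "<p>Café £5</p>".encode("utf-8")

# Stand-in for response.encoding: the encoding used to decode the bytes
encoding = "utf-8"

# Roughly what response.text gives you: the bytes decoded into a str
text = raw_bytes.decode(encoding)

# Decoding with the wrong encoding garbles non-ASCII characters
wrong = raw_bytes.decode("latin-1")

print(type(raw_bytes).__name__, len(raw_bytes))  # bytes object, counted in bytes
print(type(text).__name__, len(text))            # str object, counted in characters
print(text)
print(wrong)
```

If prices or accented characters look garbled in `response.text`, setting `response.encoding = "utf-8"` before reading `text` is a common fix, assuming the page really is UTF-8.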

## 6. Adding Headers to Mimic a Browser
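Some websites block requests that do not appear to come from a browser. By default, `requests` identifies itself with a `python-requests/<version>` User-Agent, so sending a browser-like `User-Agent` header can help the request get through.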

```python
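import requests

url = "https://quotes.toscrape.com/"

# Browser-like User-Agent sent instead of the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

response = requests.get(url, headers=headers)

# Accessing response properties (body truncated to keep the output short)
print("Text:", response.text[:200])
print("Content (bytes):", response.content[:200])
print("Status Code:", response.status_code)
print("Headers:", response.headers)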
```
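To see what the `headers` argument changes, it can help to look at a request before it is sent. The sketch below is one possible approach, not the only one: it uses a `requests.Session` so the same headers apply to every request, and `prepare_request` to build the request without contacting the server. The variable names and the target URL are assumptions chosen for illustration.

```python
import requests

# The User-Agent that requests sends when none is supplied
default_ua = requests.utils.default_user_agent()
print("Default:", default_ua)

# Browser-like header (illustrative value)
browser_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# A Session keeps these headers for every request made through it
session = requests.Session()
session.headers.update(browser_headers)

# Build the request without sending it, then inspect the final headers
request = requests.Request("GET", "https://quotes.toscrape.com/")
prepared = session.prepare_request(request)
sent_ua = prepared.headers["User-Agent"]
print("Will send:", sent_ua)
```

Calling `session.get(url)` afterwards would send the request with the same browser-like header.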


## 7. Handling Connection Errors
Wrap your request in a try-except block to handle errors gracefully:

```python
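import requests

url = "https://quotes.toscrape.com/"

try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
    print(response.text)
except requests.exceptions.RequestException as e:
    # Base class for connection errors, timeouts, and HTTP errors
    print(f"An error occurred: {e}")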
```
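`RequestException` is the base class of the exceptions that `requests` raises, so catching it handles every failure the same way. When different failures need different messages, the more specific subclasses can be caught first. The helper below is a minimal sketch of that idea; `fetch_html` is a hypothetical name introduced here, and returning `None` on failure is a design choice assumed for illustration.

```python
import requests


def fetch_html(url, timeout=5):
    """Return the page HTML as a string, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        # The server took longer than `timeout` seconds
        print(f"Timed out after {timeout}s: {url}")
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, no network, and similar
        print(f"Could not connect: {url}")
    except requests.exceptions.HTTPError as e:
        # raise_for_status() found a 4xx or 5xx status code
        print(f"Bad status for {url}: {e}")
    except requests.exceptions.RequestException as e:
        # Anything else raised by requests, such as a malformed URL
        print(f"Request failed: {e}")
    else:
        return response.text
    return None
```

A caller can then write `html = fetch_html("https://quotes.toscrape.com/")` and check `if html is not None:` before going on to parse it.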

## 8. Best Practices for Fetching Pages
- Always check the HTTP status code.
- Use proper headers to mimic a browser.
- Set a timeout to avoid hanging indefinitely.
- Respect the website by not making too many rapid requests.
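
The practices above can be combined in one small loop. This is a sketch under stated assumptions rather than a definitive implementation: `fetch_all`, the `HEADERS` constant, and the one-second default delay are choices made for illustration.

```python
import time

import requests

# Browser-like header (illustrative value)
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_all(urls, delay=1.0, timeout=5):
    """Fetch each URL in turn and return a dict mapping URL to HTML."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause between requests to go easy on the server
        try:
            response = requests.get(url, headers=HEADERS, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.exceptions.RequestException as e:
            # Report the failure and move on to the next URL
            print(f"Skipping {url}: {e}")
            continue
        pages[url] = response.text
    return pages
```

For instance, `fetch_all([f"https://books.toscrape.com/catalogue/page-{i}.html" for i in range(1, 4)])` would fetch the first three catalogue pages with a one-second pause between requests.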

## 9. Summary
- `requests` makes it simple to fetch web pages using Python.
- It is the starting point for most web scraping workflows.
- Combining `requests` with BeautifulSoup allows for powerful data extraction.

```python
import os

import requests

# Make sure the output folder exists before writing to it
os.makedirs("htmls", exist_ok=True)

# Download the home page and save its HTML
a = requests.get("https://books.toscrape.com/")
with open("htmls/page1.html", "w", encoding="utf-8") as f:
    f.write(a.text)
```

```python
# Download catalogue pages 1 to 50, one file per page
for i in range(1, 51):
    a = requests.get(f"https://books.toscrape.com/catalogue/page-{i}.html")
    with open(f"htmls/page{i}.html", "w", encoding="utf-8") as f:
        f.write(a.text)
    print(f"Downloaded page {i} Successfully")
```

Downloaded page 1 Successfully
Downloaded page 2 Successfully
Downloaded page 3 Successfully
Downloaded page 4 Successfully
Downloaded page 5 Successfully
Downloaded page 6 Successfully
Downloaded page 7 Successfully
Downloaded page 8 Successfully
Downloaded page 9 Successfully
Downloaded page 10 Successfully
Downloaded page 11 Successfully
Downloaded page 12 Successfully
Downloaded page 13 Successfully
Downloaded page 14 Successfully
Downloaded page 15 Successfully
Downloaded page 16 Successfully
Downloaded page 17 Successfully
Downloaded page 18 Successfully
Downloaded page 19 Successfully
Downloaded page 20 Successfully
Downloaded page 21 Successfully
Downloaded page 22 Successfully
Downloaded page 23 Successfully
Downloaded page 24 Successfully
Downloaded page 25 Successfully
Downloaded page 26 Successfully
Downloaded page 27 Successfully
Downloaded page 28 Successfully
Downloaded page 29 Successfully
Downloaded page 30 Successfully
Downloaded page 31 Successfully
Downloaded page 32 Successfully
Downloaded page 3