## Introduction to Web Scraping

Web scraping is not always legal. Here are some important considerations:

- **Respectful Requests**: Avoid overwhelming a website with scraping requests to prevent your IP address from being blocked.
- **Check Permissions**: Always verify the website's terms of use before scraping. If an API is available, use it instead. Many websites restrict commercial use of their data.
- **Website Specifics**: Each website is unique and may change over time, so your scraping code may need updates or customization.
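
The first two considerations can be partly checked in code. The following is a minimal sketch, assuming Python's standard-library `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the helper names are hypothetical, and the rules are inlined so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# For a real site you would call set_url(".../robots.txt") and read() instead.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-bot"  # hypothetical name for your scraper


def load_rules(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def seconds_between_requests(parser: RobotFileParser, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


rules = load_rules(EXAMPLE_ROBOTS_TXT)
print(rules.can_fetch(USER_AGENT, "https://example.com/catalogue/"))    # path not disallowed
print(rules.can_fetch(USER_AGENT, "https://example.com/private/data"))  # path under /private/
print(seconds_between_requests(rules))
```

In a real scraper you would call `time.sleep(seconds_between_requests(rules))` between requests. Note that robots.txt only expresses crawling preferences; it does not replace reading the site's terms of use.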

### When to Scrape a Website:
- **No API Available**: Scraping is useful when the information you need isn't accessible through an API.
- **Anonymity**: Use a VPN if you want to scrape anonymously.

### Web Scraping Practice:
You can practice scraping on this sandbox: [http://toscrape.com/](http://toscrape.com/).

Today, we'll start by scraping [Wikipedia](https://www.wikipedia.org/), whose text is published under a free Creative Commons license that permits reuse.

This lesson is adapted from: [Complete Python 3 Bootcamp](https://github.com/Pierian-Data/Complete-Python-3-Bootcamp/blob/master/13-Web-Scraping/00-Guide-to-Web-Scraping.ipynb).

### Installation:
Make sure you have the following libraries installed:

- **Requests**: `pip install requests`
- **BeautifulSoup**: `pip install beautifulsoup4`

If you're using Anaconda:
- **Requests**: `conda install requests`
- **BeautifulSoup**: `conda install beautifulsoup4`

Or install directly in the notebook:
```python
!pip install requests
!pip install beautifulsoup4
```

```python
# Import the requests library to fetch web pages
import requests

# Fetch the Wikipedia page for "Black Lives Matter"
response = requests.get("https://en.wikipedia.org/wiki/Black_Lives_Matter")
```

```python
# Print the HTML content of the page
print(response.text)
```

### Understanding the Response
`response.text` holds the HTML of the Wikipedia page as a string. This is the raw data we will parse with BeautifulSoup.
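
Before parsing, it can help to confirm that the request actually succeeded, since an error page also arrives as text. The following is a minimal sketch; `html_or_raise` is a hypothetical helper name, and the Content-Type check is an assumption about what counts as a usable page.

```python
import requests


def html_or_raise(response: requests.Response) -> str:
    """Return the page HTML, or raise if the response is not a usable HTML page."""
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()

    # Assumption: a page worth parsing declares an HTML content type.
    content_type = response.headers.get("Content-Type", "")
    if "html" not in content_type:
        raise ValueError(f"Expected HTML, got Content-Type {content_type!r}")

    return response.text
```

A typical call would be `html = html_or_raise(requests.get(url, timeout=10))`, where the 10-second timeout is an arbitrary choice that keeps the request from hanging indefinitely.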

```python
# Import BeautifulSoup from bs4
from bs4 import BeautifulSoup

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Print the parsed HTML
print(soup.prettify())
```

### Extracting Data
Now that we have parsed the HTML, we can extract specific elements like titles, paragraphs, and links.

```python
# Extract the title of the page
title = soup.title.string
print("Page Title:", title)

# Extract the first paragraph
first_paragraph = soup.find('p').text
print("First Paragraph:", first_paragraph)
```
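
`soup.find('p')` returns only the first `<p>` tag, which on some pages is empty, and it returns `None` when nothing matches. The following sketch shows one defensive way to handle both cases. The sample HTML and the `first_nonempty_paragraph` helper are hypothetical, used so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <p class="placeholder">   </p>
    <p>First paragraph with <b>real</b> text.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""


def first_nonempty_paragraph(soup):
    """Return the text of the first <p> that contains visible text, or None."""
    for paragraph in soup.find_all("p"):
        text = paragraph.get_text().strip()
        if text:
            return text
    return None


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Guard against pages that have no <title> tag at all.
page_title = sample_soup.title.string if sample_soup.title else None

print("Page Title:", page_title)
print("First Paragraph:", first_nonempty_paragraph(sample_soup))
```

The same helper can be called on the `soup` object built from the Wikipedia response.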

### Extracting Links
We can also extract all the links on the page.

```python
# Find all anchor tags (<a>) in the HTML
links = soup.find_all('a')

# Print the href attribute of each link
for link in links:
    print(link.get('href'))
```
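
The printed values are often relative paths, in-page fragments, or `None` for anchors without an `href`. The following sketch shows one way to turn them into clean absolute URLs using the standard-library `urllib.parse`. The `raw_hrefs` list and the `absolute_links` helper are hypothetical stand-ins for the values `link.get('href')` returns.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://en.wikipedia.org/wiki/Black_Lives_Matter"

# Hypothetical href values of the kind link.get('href') can return.
raw_hrefs = [
    "/wiki/Main_Page",                         # site-relative path
    "#History",                                # fragment within the same page
    None,                                      # <a> tag without an href attribute
    "https://example.org/page",                # already absolute
    "//commons.wikimedia.org/wiki/Main_Page",  # protocol-relative URL
    "/wiki/Main_Page",                         # duplicate of the first entry
]


def absolute_links(hrefs, base_url):
    """Resolve hrefs against base_url, dropping missing, fragment-only, and duplicate links."""
    seen = set()
    result = []
    for href in hrefs:
        if not href or href.startswith("#"):
            continue
        url = urljoin(base_url, href)
        # Keep only web links (skips schemes such as mailto: or javascript:).
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


for url in absolute_links(raw_hrefs, BASE_URL):
    print(url)
```

With the objects from the lesson, the call would look like `absolute_links((link.get('href') for link in links), response.url)`.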

### Conclusion
This is a basic introduction to web scraping using Python. You can now fetch and parse web pages, extract specific data, and even save it for further analysis.
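
Saving the results can be as simple as writing a CSV file. The following is a minimal sketch using the standard-library `csv` module. The `write_links_csv` helper, the column names, and the example rows are hypothetical, and the example writes to an in-memory buffer rather than a real file.

```python
import csv
import io


def write_links_csv(rows, file_obj):
    """Write (text, url) pairs to file_obj as CSV with a header row."""
    writer = csv.writer(file_obj)
    writer.writerow(["text", "url"])
    writer.writerows(rows)


# Hypothetical rows; in the lesson these would come from the <a> tags found earlier.
example_rows = [
    ("Main Page", "https://en.wikipedia.org/wiki/Main_Page"),
    ("Example, with comma", "https://example.org/page"),
]

# An in-memory buffer keeps the sketch free of side effects on disk.
buffer = io.StringIO()
write_links_csv(example_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("links.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends for files it writes.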