
### **1. Understanding Web Scraping and its Applications**

---

Web scraping is the process of programmatically extracting data from websites. The data can be text, images, links, or any other content a page exposes. It lets you pull data from online sources that do not offer an API.

**Applications**:
- Price comparison
- Data collection for research
- Job listings
- Monitoring changes on websites
- And many more...

---

### **2. Introduction to the BeautifulSoup Library**

---

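Beautiful Soup is a Python library for parsing HTML and XML. It builds a parse tree from a page's source code, which you can then search and navigate to extract data. It does not download pages itself; the examples below fetch them with the `requests` library.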

**Installation**:
```python
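# The leading "!" runs a shell command from a notebook cell; omit it in a regular terminal.
!pip install beautifulsoup4 requests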
```

---

### **3. Extracting Data from Websites**

---

To begin with, let's extract the title of a webpage.

**Example**:

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://www.example.com'
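page = requests.get(URL, timeout=10)  # avoid hanging on an unresponsive server
page.raise_for_status()               # stop early on HTTP errors such as 404 or 500
soup = BeautifulSoup(page.content, 'html.parser')

# Extracting the title of the webpage (soup.title is None if the page has no <title>)
title = soup.title.string if soup.title else None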
print(title)
```
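The example above needs network access and a page that has a `<title>` tag. As a minimal offline sketch, the same lookup can be tried on an inline HTML string. `SAMPLE_HTML` and the helper `get_title` are hypothetical names introduced here for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; not taken from a real site.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body><p>Hello, world.</p></body>
</html>
"""


def get_title(html):
    """Return the page title, or None when the page has no <title> tag."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


title_present = get_title(SAMPLE_HTML)
title_missing = get_title("<html><body><p>No title here</p></body></html>")
print(title_present)
print(title_missing)
```

Checking for `None` before reading the text avoids an `AttributeError` on pages that have no title.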

---

### **4. Handling Different HTML Elements**

---

#### **a. Extracting Paragraphs**

```python
# Extracting all paragraphs from the page
paragraphs = soup.find_all('p')
for p in paragraphs:
    print(p.text)
```
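Paragraphs often contain nested tags and stray whitespace. The sketch below assumes a made-up `SAMPLE_HTML` snippet rather than a real page. It shows that `get_text()` includes the text of nested tags and that `str.strip()` can tidy the result.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
SAMPLE_HTML = """
<body>
  <p>First <b>bold</b> paragraph.</p>
  <p>   Second paragraph.   </p>
</body>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# get_text() concatenates the text of the tag and all of its descendants;
# strip() removes leading and trailing whitespace from the combined string.
paragraph_texts = [p.get_text().strip() for p in soup.find_all("p")]

for text in paragraph_texts:
    print(text)
```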

#### **b. Extracting Links**

```python
# Extracting all links from the page (href=True skips <a> tags that have no href)
links = soup.find_all('a', href=True)
for link in links:
    print(link['href'])
```
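Many `href` values are relative paths rather than full URLs. One possible way to normalise them is `urljoin` from the standard library, as sketched below. `BASE_URL` and `SAMPLE_HTML` are assumptions made for this illustration, not values from a real site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed address of the page the HTML came from (illustrative only).
BASE_URL = "https://www.example.com/docs/"

# Hypothetical HTML used only for illustration.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="guide.html">Guide</a>
<a href="https://www.python.org/">Python</a>
<a name="top">Anchor without href</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute;
# urljoin resolves each value against the page address.
absolute_links = [
    urljoin(BASE_URL, a["href"]) for a in soup.find_all("a", href=True)
]

for url in absolute_links:
    print(url)
```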

#### **c. Extracting with Class and ID**

HTML elements can carry attributes such as `class` and `id`, and Beautiful Soup can filter elements by them. The keyword argument is spelled `class_` because `class` is a reserved word in Python.

```python
# Extracting elements with a specific class
specific_class_elements = soup.find_all(class_='specific-class')

# Extracting the element with a specific ID
# (IDs should be unique, so find() returns one element, or None if absent)
specific_id_element = soup.find(id='specific-id')
```
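To see these filters return something concrete, here is a small sketch on an inline snippet. The HTML, the class name `note`, and the ID `main` are all made up for illustration. It also shows `select()`, which accepts a CSS selector as an alternative way to express the same query.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
SAMPLE_HTML = """
<div id="main">
  <p class="note">First note</p>
  <p class="note highlight">Second note</p>
  <p>Plain paragraph</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any element whose class list contains "note",
# so the paragraph with class="note highlight" is included too.
notes = soup.find_all("p", class_="note")
note_texts = [n.get_text() for n in notes]

# find() returns the first match (or None); suitable for unique IDs.
main_div = soup.find(id="main")

# The same query as a CSS selector: <p class="note"> inside #main.
css_note_texts = [n.get_text() for n in soup.select("#main p.note")]

print(note_texts)
print(main_div.name)
print(css_note_texts)
```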

#### **d. Navigating the Tree Structure**

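Beautiful Soup also lets you navigate the parse tree: from any element you can reach its children, its siblings, and its parent. In the code below, `soup.p` refers to the first `<p>` tag in the document and is `None` if the page has no paragraphs.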

```python
# Accessing the first child of an element
first_child = soup.p.contents[0]

# Accessing the parent of an element
parent_element = soup.p.parent
```
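Sibling access is not shown above, so the following sketch covers children, parent, and sibling lookups on one small tree. `SAMPLE_HTML` and the ID `box` are hypothetical. The snippet is written on a single line on purpose: with indented HTML, whitespace between tags becomes text nodes, which is why `find_next_sibling()` is used here instead of the raw `.next_sibling` attribute.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical HTML used only for illustration.
SAMPLE_HTML = '<div id="box"><p>One <b>bold</b></p><p>Two</p></div>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_p = soup.p  # first <p> in the document

# .contents is a list of direct children; the first one here is a text node.
first_child = first_p.contents[0]

# .children iterates over direct children; keep only tags, skipping text nodes.
child_tag_names = [c.name for c in first_p.children if isinstance(c, Tag)]

# .parent is the enclosing element.
parent_id = first_p.parent["id"]

# find_next_sibling() skips text nodes and returns the next matching tag.
next_p = first_p.find_next_sibling("p")

print(repr(str(first_child)))
print(child_tag_names)
print(parent_id)
print(next_p.get_text())
```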