# Most Used Functions in BeautifulSoup

BeautifulSoup is a Python library for parsing HTML and XML documents. It builds a parse tree from a page's source, which you can then search and navigate to extract data. This notebook covers some of the most commonly used functions and techniques in BeautifulSoup.

## 1. Parsing HTML Content

To parse HTML content, create a `BeautifulSoup` object, passing it the HTML content and the name of the parser to use. The examples here use `'html.parser'`, which ships with Python.

In [None]:
# Example: Parsing HTML content
from bs4 import BeautifulSoup

html_content = "<html><head><title>Example</title></head><body><h1>Hello, world!</h1></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')
print(soup.prettify())
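
Once the document is parsed, tags can also be reached as attributes of the parsed object. The following is a minimal sketch that reuses the HTML string from the example above; the variable name `doc` is only illustrative.

```python
from bs4 import BeautifulSoup

html_content = "<html><head><title>Example</title></head><body><h1>Hello, world!</h1></body></html>"
doc = BeautifulSoup(html_content, "html.parser")

# Attribute-style access returns the first tag with that name
title_text = doc.title.string
heading_text = doc.h1.get_text()

print(title_text)
print(heading_text)
```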

## 2. Finding Elements by Tag Name

The `find` method returns the first element with a given tag name (or `None` if there is no match), while `find_all` returns a list of every matching element.

In [None]:
# Example: Finding elements by tag name
# Finding a single element
title_tag = soup.find('title')
print(f'Title tag: {title_tag}')

# Finding all elements with the same tag name
h1_tags = soup.find_all('h1')
print(f'H1 tags: {h1_tags}')
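
The difference between the two methods matters most when nothing matches. A small sketch, using a made-up list as input, that also shows the `limit` argument of `find_all`:

```python
from bs4 import BeautifulSoup

html = "<ul><li>One</li><li>Two</li><li>Three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

missing = soup.find("table")              # no <table> in this document
no_matches = soup.find_all("table")       # empty result rather than None
first_two = soup.find_all("li", limit=2)  # stop after two matches

print(missing)
print(len(no_matches))
print([li.get_text() for li in first_two])
```

Because `find` can return `None`, it is worth checking the result before calling methods on it.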

## 3. Finding Elements by CSS Class

To find elements by CSS class, pass the `class_` keyword argument to `find` or `find_all`. The trailing underscore is needed because `class` is a reserved word in Python.

In [None]:
# Example: Finding elements by CSS class
html_content = "<html><body><div class='item'>Item 1</div><div class='item'>Item 2</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Finding elements by class name
div_tags = soup.find_all('div', class_='item')
print(f'Div tags with class "item": {div_tags}')
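
Class matching works per class name, so an element with several classes is found by any one of them. The sketch below uses made-up markup to show that, along with the `select` method, which accepts a CSS selector as an alternative way to express the same query.

```python
from bs4 import BeautifulSoup

html = "<div class='item featured'>A</div><div class='item'>B</div><p class='item'>C</p>"
soup = BeautifulSoup(html, "html.parser")

# Any tag whose class list contains "item"
by_class = soup.find_all(class_="item")

# Only <div> tags whose class list contains "featured"
featured = soup.find_all("div", class_="featured")

# The same kind of query written as a CSS selector
via_selector = soup.select("div.item")

print([tag.get_text() for tag in by_class])
print([tag.get_text() for tag in featured])
print([tag.get_text() for tag in via_selector])
```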

## 4. Finding Elements by ID

To find an element by its ID, pass the `id` keyword argument to `find`.

In [None]:
# Example: Finding elements by ID
html_content = "<html><body><div id='unique'>Unique Item</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Finding element by ID
div_tag = soup.find('div', id='unique')
print(f'Div tag with id "unique": {div_tag}')
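
The tag name is optional when searching by ID, and a CSS selector can do the same job. A brief sketch with illustrative markup:

```python
from bs4 import BeautifulSoup

html = "<div id='unique'>Unique Item</div><div id='other'>Other</div>"
soup = BeautifulSoup(html, "html.parser")

by_keyword = soup.find(id="unique")        # no tag name needed
by_selector = soup.select_one("#unique")   # CSS selector equivalent
absent = soup.find(id="nope")              # None when the ID does not exist

print(by_keyword.get_text())
print(by_selector.get_text())
print(absent)
```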

## 5. Extracting Text from Elements

The `get_text` method returns all the text inside an element, including the text of its descendants.

In [None]:
# Example: Extracting text from elements
html_content = "<html><body><div class='item'>Item 1</div><div class='item'>Item 2</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Extracting text from elements
for div in soup.find_all('div', class_='item'):
    print(f'Text: {div.get_text()}')
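
Real pages usually contain stray whitespace, which `get_text` returns as-is by default. The sketch below, on a made-up snippet, shows the `separator` and `strip` arguments, and how `.string` behaves on a tag with more than one child.

```python
from bs4 import BeautifulSoup

html = "<div><p> Hello </p><p> World </p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

raw = div.get_text()                               # whitespace kept as in the source
clean = div.get_text(separator=" ", strip=True)    # each piece stripped, then joined
single = div.string                                # None: <div> has two children

print(repr(raw))
print(repr(clean))
print(single)
```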

## 6. Navigating the Parse Tree

BeautifulSoup provides several ways to navigate the parse tree, including accessing parents, children, and siblings of elements.

In [None]:
# Example: Navigating the parse tree
html_content = "<html><body><div class='item'><p>Paragraph 1</p><p>Paragraph 2</p></div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Accessing children
div_tag = soup.find('div', class_='item')
for child in div_tag.children:
    print(f'Child: {child}')

# Accessing parent
p_tag = soup.find('p')
parent = p_tag.parent
print(f'Parent: {parent}')

# Accessing siblings
sibling = p_tag.find_next_sibling()
print(f'Next sibling: {sibling}')
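
The example above uses HTML with no whitespace between tags. When the markup is indented or spread over several lines, the text between tags shows up as children and siblings too. A small sketch, on illustrative markup, of how that affects navigation:

```python
from bs4 import BeautifulSoup

html = "<div>\n<p>A</p>\n<p>B</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

all_children = list(div.children)              # tags and whitespace strings
tag_children = div.find_all(recursive=False)   # direct child tags only

first_p = soup.p
raw_next = first_p.next_sibling                # the newline after the first <p>
tag_next = first_p.find_next_sibling("p")      # skips over strings to the next <p>

print(len(all_children), len(tag_children))
print(repr(raw_next))
print(tag_next)
```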

## 7. Modifying the Parse Tree

You can modify the parse tree by adding, removing, or replacing elements.

In [None]:
# Example: Modifying the parse tree
html_content = "<html><body><div class='item'>Item 1</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

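# Adding a new tag
# Pass 'class' through attrs: new_tag treats keyword arguments as literal
# attribute names, so class_='item' would create an attribute called "class_"
new_tag = soup.new_tag('div', attrs={'class': 'item'})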
new_tag.string = "Item 2"
soup.body.append(new_tag)
print(soup.prettify())

# Removing a tag (soup.div is the first <div>, so this removes "Item 1")
soup.div.decompose()
print(soup.prettify())

# Replacing a tag
html_content = "<html><body><div class='item'>Item 1</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')
# Set 'class' via attrs; a class_ keyword would become an attribute named "class_"
new_tag = soup.new_tag('span', attrs={'class': 'item'})
new_tag.string = "Item 1"
soup.div.replace_with(new_tag)
print(soup.prettify())
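
Two related methods are worth knowing: `insert_before` places a tag at a specific position, and `extract` removes a tag but returns it so it can be reused. A minimal sketch on a made-up list:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>b</li><li>c</li></ul>", "html.parser")

# Insert a new <li> in front of the first existing one
first = soup.new_tag("li")
first.string = "a"
soup.li.insert_before(first)

# Remove the last <li> from the tree, keeping a reference to it
removed = soup.find_all("li")[-1].extract()

print(soup)
print(removed)
```

Unlike `decompose`, which destroys the tag, `extract` leaves the removed tag usable as its own small tree.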

## 8. Searching with Regular Expressions

You can pass a compiled regular expression to `find_all` to match tag names or attribute values against a pattern.

In [None]:
# Example: Searching with regular expressions
import re

html_content = "<html><body><div class='item'>Item 1</div><div class='thing'>Item 2</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Searching with regular expressions
div_tags = soup.find_all('div', class_=re.compile(r'item|thing'))
print(f'Div tags matching pattern: {div_tags}')
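
Regular expressions are not limited to attribute values. The sketch below, using illustrative markup, matches tag names with one pattern and text content with another through the `string` argument.

```python
import re
from bs4 import BeautifulSoup

html = "<h1>Title</h1><p>Intro</p><h2>Section 1</h2><h3>Section 2</h3>"
soup = BeautifulSoup(html, "html.parser")

# Match any heading tag from <h1> to <h6>
headings = soup.find_all(re.compile(r"^h[1-6]$"))

# Match text nodes that start with "Section"
section_strings = soup.find_all(string=re.compile(r"^Section"))

print([tag.name for tag in headings])
print([str(s) for s in section_strings])
```

Patterns are applied as a search, so anchoring with `^` and `$` avoids accidental partial matches.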

## 9. Working with Attributes

You can access, modify, and delete attributes of elements.

In [None]:
# Example: Working with attributes
html_content = "<html><body><div class='item' id='unique'>Item 1</div></body></html>"
soup = BeautifulSoup(html_content, 'html.parser')

# Accessing attributes
div_tag = soup.find('div', class_='item')
print(f"ID attribute: {div_tag['id']}")

# Modifying attributes
div_tag['id'] = 'new_id'
print(soup.prettify())

# Deleting attributes
del div_tag['id']
print(soup.prettify())
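
Indexing a tag with an attribute it does not have raises a `KeyError`, so `get` is often the safer choice when an attribute may be absent. A short sketch on a made-up link, which also shows that `class` comes back as a list:

```python
from bs4 import BeautifulSoup

html = "<a href='/docs' class='btn primary'>Docs</a>"
soup = BeautifulSoup(html, "html.parser")
link = soup.a

href = link.get("href")                  # value of an existing attribute
title = link.get("title")                # None instead of a KeyError
classes = link["class"]                  # multi-valued attribute: a list
has_target = link.has_attr("target")     # explicit presence check

print(href, title, classes, has_target)
```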

## 10. Using BeautifulSoup with Requests

BeautifulSoup can be used with the `requests` library to scrape data from live websites.

In [None]:
# Example: Using BeautifulSoup with requests
import requests
from bs4 import BeautifulSoup

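# Sending a GET request (the timeout avoids waiting forever on an unresponsive server)
response = requests.get('http://example.com', timeout=10)
response.raise_for_status()  # raise an exception for 4xx/5xx responses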

# Parsing the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())
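
A common next step after fetching a page is collecting its links. The sketch below is one possible way to do that; `extract_links` is a hypothetical helper, and it runs on a static string (`sample_html`, made up for illustration) so that it works without network access. In a scraper, the text of the response would be passed in instead.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Illustrative input standing in for a downloaded page
sample_html = "<a href='/about'>About</a><a href='https://www.iana.org/'>IANA</a><a>No link</a>"
links = extract_links(sample_html, "http://example.com")
print(links)
```

Resolving relative links against the page URL with `urljoin` keeps the results usable outside the page they came from.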