Beautiful Soup is a Python library for **pulling data out of HTML and XML files**. It is one of the most widely used tools for **web scraping** in Python. 🐍

## 1. Installation

```bash
pip install beautifulsoup4
```

## 2. Importing and Parsing

First, you need to import the library and pass the raw HTML content (usually obtained from the `requests` library) to the `BeautifulSoup` constructor to create a parse tree.

In [None]:
from bs4 import BeautifulSoup
import requests

html_doc = """
<html><head><title>The Title</title></head>
<body>
<p class="story">Once upon a time...</p>
<a class="link" href="http://example.com/link1">Link 1</a>
<a class="link" href="http://example.com/link2">Link 2</a>
</body></html>
"""

# Create the BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
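
The constructor accepts either text (`str`) or raw bytes, which matters because a `requests` response exposes both (`.text` and `.content`). The sketch below shows both inputs producing the same tree. The sample markup and the variable names are made up for illustration, and the UTF-8 encoding is an assumption passed explicitly through `from_encoding`.

```python
from bs4 import BeautifulSoup

# Bytes of the kind response.content would hold; UTF-8 is assumed here.
raw_bytes = '<p>Café au lait</p>'.encode('utf-8')

# Parse the bytes directly, telling Beautiful Soup which encoding to use.
soup_from_bytes = BeautifulSoup(raw_bytes, 'html.parser', from_encoding='utf-8')

# Or decode first and parse the resulting string.
soup_from_str = BeautifulSoup(raw_bytes.decode('utf-8'), 'html.parser')

bytes_text = soup_from_bytes.p.string
str_text = soup_from_str.p.string
print(bytes_text == str_text)
```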

## 3. Basic Navigation & Access

| Method | Description | Example | Result |
| :--- | :--- | :--- | :--- |
| **`soup.tag`** | Accesses the **first** instance of a tag. | `soup.title` | `<title>The Title</title>` |
| **`.name`** | Gets the tag name. | `soup.title.name` | `'title'` |
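| **`.string`** | Gets the text inside the tag when the tag holds a single string; otherwise returns `None`. | `soup.title.string` | `'The Title'` |
| **`['attr']`** | Gets the value of a specific attribute. Multi-valued attributes such as `class` come back as a list. | `soup.p['class']` | `['story']` |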
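
The accessors in the table can be tried together on a small document. This is a minimal sketch that re-declares a trimmed copy of `html_doc` so it runs on its own. The `.get('id')` lookup on a missing attribute is added for illustration and is not part of the table.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document from section 2.
html_doc = """
<html><head><title>The Title</title></head>
<body>
<p class="story">Once upon a time...</p>
<a class="link" href="http://example.com/link1">Link 1</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

title_tag = soup.title            # first <title> tag in the tree
title_name = soup.title.name      # the tag name
title_text = soup.title.string    # the single string inside the tag
p_classes = soup.p['class']       # class is multi-valued, so this is a list
link_href = soup.a['href']        # single-valued attributes are plain strings

# tag['attr'] raises KeyError when the attribute is absent;
# tag.get('attr') returns None instead.
missing_id = soup.p.get('id')

print(title_name, title_text, p_classes, link_href, missing_id)
```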

## 4. Searching for Elements

The main power of Beautiful Soup comes from its search methods.

### A. `find()` (Find the First)

Used to find the **first matching tag**.

```python
# Find the first <a> tag
first_link = soup.find('a')

# Find the first <p> tag with the class 'story'
first_story = soup.find('p', class_='story')

# Result
print(first_link)  # <a class="link" href="http://example.com/link1">Link 1</a>
```
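
`find()` returns `None` when nothing matches, so chaining an attribute access onto a failed search raises `AttributeError`. A minimal sketch of guarding against that follows. The `<h1>` lookup and the `heading_text` name are only for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p class="story">Once upon a time...</p>
<a class="link" href="http://example.com/link1">Link 1</a>
</body></html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# A tag that exists: find() returns a Tag object.
first_story = soup.find('p', class_='story')
story_text = first_story.get_text()

# A tag that does not exist in this document: find() returns None.
heading = soup.find('h1')

# Check for None before touching attributes or text.
heading_text = heading.get_text() if heading is not None else ''

print(repr(story_text), repr(heading_text))
```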


### B. `find_all()` (Find All)

Used to find **all matching tags**. It returns them in a `ResultSet`, which behaves like a **list**.


```python
# Find all <a> tags
all_links = soup.find_all('a')

# Find all tags with the class 'link'
all_links_by_class = soup.find_all(class_='link')

# Result
print(len(all_links))  # 2
print(all_links[1].get('href'))  # http://example.com/link2
```



**Note:** When searching for the **`class`** attribute, you must use **`class_`** (with an underscore) because `class` is a reserved keyword in Python.
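
The `class_` keyword is not the only way to filter by class. As a sketch, the same two links can also be matched with an `attrs` dictionary or with a CSS selector through `select()`. The selector route assumes the `soupsieve` package is present, which is normally installed alongside `beautifulsoup4`.

```python
from bs4 import BeautifulSoup

html_doc = """
<body>
<p class="story">Once upon a time...</p>
<a class="link" href="http://example.com/link1">Link 1</a>
<a class="link" href="http://example.com/link2">Link 2</a>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

by_keyword = soup.find_all('a', class_='link')          # class_ keyword
by_attrs = soup.find_all('a', attrs={'class': 'link'})  # attrs dictionary
by_selector = soup.select('a.link')                     # CSS selector

# All three approaches select the same tags, in document order.
hrefs = [a['href'] for a in by_selector]
print(hrefs)
```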

In [26]:
from bs4 import BeautifulSoup
import requests as req

In [27]:
# verify=False turns off TLS certificate verification; drop it unless your network requires it.
page = req.get('https://quotes.toscrape.com/', verify=False)

In [28]:
soup = BeautifulSoup(page.content, 'html.parser')
soup

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Quotes to Scrape</title>
<link href="/static/bootstrap.min.css" rel="stylesheet"/>
<link href="/static/main.css" rel="stylesheet"/>
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
<a class="

In [29]:
page_heading = soup.find('a')
page_heading

<a href="/" style="text-decoration: none">Quotes to Scrape</a>

In [30]:
page_heading.getText()

'Quotes to Scrape'

In [31]:
page_heading.text

'Quotes to Scrape'
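
`getText()` and `.text` return the same string as `get_text()`, which is the name the Beautiful Soup documentation uses. `get_text()` also accepts `separator` and `strip` arguments that help when a tag contains nested markup and stray newlines. A small sketch on a hypothetical fragment shaped like one author line from the page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the author line of a quote block.
fragment = """
<span>by <small class="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
"""
span = BeautifulSoup(fragment, 'html.parser').span

# Default behaviour: every string inside the tag, concatenated as-is.
raw_text = span.get_text()

# Strip each piece and join the non-empty pieces with a single space.
clean_text = span.get_text(separator=' ', strip=True)

# The .text property is a shortcut for get_text() with default arguments.
same_as_property = span.text == raw_text

print(repr(raw_text))
print(repr(clean_text))
```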

In [43]:
quotes = soup.find_all('div', class_='quote')
for quote in quotes:
    # Each quote block holds the text in a <span> and the author in a <small>
    quote_text = quote.find('span', class_='text')
    author = quote.find('small', class_='author')
    print(quote_text.text)
    # print(quote_text.getText())  # equivalent to .text
    print(author.getText())

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
J.K. Rowling
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Albert Einstein
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Jane Austen
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Marilyn Monroe
“Try not to become a man of success. Rather become a man of value.”
Albert Einstein
“It is better to be hated for what you are than to be loved for what you are not.”
André Gide
“I have not failed. I've just found 10,000 ways that won't work.”
Thomas A. Edison
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Eleanor Roosevelt
“A day witho
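
Printing is fine for a quick look, but the same loop can also collect structured records for later use. The sketch below assumes a hypothetical two-quote sample that mirrors the structure of the real page, so it runs without a network request. The `records` name and the dictionary keys are illustrative choices.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the markup of one page of quotes.
sample_html = """
<div class="quote">
  <span class="text">First quote.</span>
  <span>by <small class="author">Author One</small></span>
</div>
<div class="quote">
  <span class="text">Second quote.</span>
  <span>by <small class="author">Author Two</small></span>
</div>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

records = []
for quote in soup.find_all('div', class_='quote'):
    text_tag = quote.find('span', class_='text')
    author_tag = quote.find('small', class_='author')
    # Skip blocks that lack either piece instead of failing on None.
    if text_tag is None or author_tag is None:
        continue
    records.append({
        'text': text_tag.get_text(strip=True),
        'author': author_tag.get_text(strip=True),
    })

print(records)
```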

Find quotes by a specific author:

In [37]:
print('Quotes by Author')

def show__quotes_by_author(author):
    """Return the text of the first quote whose author name contains `author`."""
    quotes = soup.find_all('div', class_='quote')
    for quote in quotes:
        q_author = quote.find('small', class_='author')
        if author in q_author.text:
            quote_text = quote.find('span', class_='text')
            # return exits the loop, so only the first match is reported
            return quote_text.text

show__quotes_by_author('Albert Einstein')


Quotes by Author


'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'
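
The function above returns at its first match, which is why only one quote appears in the output. A possible variant that gathers every match into a list is sketched below. The `quotes_by_author` name, its `soup` parameter, and the three-quote sample are hypothetical and chosen so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two quotes by one author, one by another.
sample_html = """
<div class="quote">
  <span class="text">Quote A</span>
  <span>by <small class="author">Author One</small></span>
</div>
<div class="quote">
  <span class="text">Quote B</span>
  <span>by <small class="author">Author Two</small></span>
</div>
<div class="quote">
  <span class="text">Quote C</span>
  <span>by <small class="author">Author One</small></span>
</div>
"""


def quotes_by_author(soup, author):
    """Return the text of every quote whose author name contains `author`."""
    matches = []
    for quote in soup.find_all('div', class_='quote'):
        author_tag = quote.find('small', class_='author')
        text_tag = quote.find('span', class_='text')
        if author_tag is None or text_tag is None:
            continue
        if author in author_tag.get_text():
            # Append and keep looping instead of returning early.
            matches.append(text_tag.get_text())
    return matches


soup = BeautifulSoup(sample_html, 'html.parser')
author_one_quotes = quotes_by_author(soup, 'Author One')
no_quotes = quotes_by_author(soup, 'Nobody')
print(author_one_quotes)
```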

In [39]:
authors = soup.find_all('small', string=lambda s: s is not None and 'Albert Einstein' in s)
authors

[<small class="author" itemprop="author">Albert Einstein</small>,
 <small class="author" itemprop="author">Albert Einstein</small>,
 <small class="author" itemprop="author">Albert Einstein</small>]

In [None]:
authors = soup.find_all('small', string=lambda s: s is not None and 'Albert Einstein' in s)

for author in authors:
    # <small> sits inside a <span>, which sits inside the <div class="quote">
    quote_t = author.parent.parent.find('span', class_='text')
    print(quote_t.text)

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“Try not to become a man of success. Rather become a man of value.”
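
Chaining `.parent.parent` depends on the author tag sitting exactly two levels below the quote block. As a sketch of a less brittle alternative, `find_parent()` walks upward until it reaches the first ancestor that matches. The sample markup below is hypothetical and mirrors the page structure so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two quotes by one author, one by another.
sample_html = """
<div class="quote">
  <span class="text">Quote A</span>
  <span>by <small class="author">Author One</small></span>
</div>
<div class="quote">
  <span class="text">Quote B</span>
  <span>by <small class="author">Author Two</small></span>
</div>
<div class="quote">
  <span class="text">Quote C</span>
  <span>by <small class="author">Author One</small></span>
</div>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

authors = soup.find_all(
    'small',
    string=lambda s: s is not None and 'Author One' in s,
)

quote_texts = []
for author in authors:
    # Climb to the nearest enclosing <div class="quote">, whatever the depth.
    quote_div = author.find_parent('div', class_='quote')
    if quote_div is None:
        continue
    quote_texts.append(quote_div.find('span', class_='text').get_text())

print(quote_texts)
```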
