# Beautiful Soup Objects

A tutorial for improving skills: 'Beautiful Soup Tutorial - Web Scraping in Python' (freeCodeCamp.org), by Marcus Mariano

**For more information about Marcus Mariano: [Web site](https://marcusmariano.github.io/mmariano/)**  

**Beautiful Soup Tutorial - Web Scraping in Python [here.](https://www.youtube.com/watch?v=87Gx3U0BDlo&t=219s)** 

In [1]:
from bs4 import BeautifulSoup as bs4

In [2]:
# To keep things simple and also reproducible, consider the following HTML code
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; their names:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>

<b class="boldest">Extremely bold</b>
<blockquote class="boldest">Extremely bold</blockquote>
<b id="1">Test 1</b>
<b another-attribute="1" id="verybold">Test 2</b>
"""


In [6]:
with open('index.html', 'w') as f:
    f.write(html_doc)

In [26]:
soup = bs4(html_doc, "lxml")
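
The cell above uses the third-party `lxml` parser, which has to be installed separately. As a minimal sketch, assuming only Beautiful Soup itself is available, Python's built-in `"html.parser"` can be passed instead. The names `fragment` and `fragment_soup` are made up for this illustration. One visible difference: `html.parser` does not wrap a fragment in `<html>` and `<body>` tags, whereas the `lxml` output in this notebook ends with an added `</body></html>`.

```python
from bs4 import BeautifulSoup

# A small fragment taken from the sample document (hypothetical variable names).
fragment = "<b>Extremely bold</b>"

# "html.parser" ships with Python, so no extra install is needed.
fragment_soup = BeautifulSoup(fragment, "html.parser")

# The fragment comes back unchanged: no <html>/<body> wrapper is added.
print(fragment_soup)

# Because there is no wrapper, looking up the <html> tag yields None.
print(fragment_soup.html)
```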

In [4]:
# Print the parsed document as-is (without extra indentation):
print(soup)

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; their names:
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
<b class="boldest">Extremely bold</b>
<blockquote class="boldest">Extremely bold</blockquote>
<b id="1">Test 1</b>
<b another-attribute="1" id="verybold">Test 2</b>
</body></html>


In [5]:
# Print out nicely formatted HTML:
print(soup.prettify())

<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; their names:
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
  <b class="boldest">
   Extremely bold
  </b>
  <blockquote class="boldest">
   Extremely bold
  </blockquote>
  <b id="1">
   Test 1
  </b>
  <b another-attribute="1" id="verybold">
   Test 2
  </b>
 </body>
</html>


## Tag:


In [6]:
# Tag:

# Accessing a tag name as an attribute returns the first
# occurrence of that tag, here the first "b" (bold) tag.
print(soup.b)


<b>The Dormouse's story</b>


In [7]:
# Tag:
print(soup.p)

<p class="title"><b>The Dormouse's story</b></p>


In [8]:

# The "find" method does the same thing: it returns only
# the first "b" tag found in the HTML doc.
print(soup.find('b'))

<b>The Dormouse's story</b>


In [9]:
# If we want to find all of the elements on the page
# with the "b" tag, we can use the "find_all" function.
print(soup.find_all('b'))

[<b>The Dormouse's story</b>, <b class="boldest">Extremely bold</b>, <b id="1">Test 1</b>, <b another-attribute="1" id="verybold">Test 2</b>]


In [10]:
len(soup.find_all('b'))

4
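
Besides a tag name, `find` and `find_all` also accept attribute filters. The following is a small sketch on a trimmed copy of the sample HTML; the variable names (`demo_html`, `demo_soup`, and so on) are invented for this example, and `"html.parser"` is used only so the snippet runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document (hypothetical variable names).
demo_html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
<b class="boldest">Extremely bold</b>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

# Filter by CSS class. The keyword is "class_" because "class" is reserved in Python.
sisters = demo_soup.find_all("a", class_="sister")
sister_names = [a.get_text() for a in sisters]
print(sister_names)

# Filter by id; "find" returns the first match (or None if nothing matches).
second_link = demo_soup.find("a", id="link2")
print(second_link["href"])

# "limit" caps the number of results returned by find_all.
first_only = demo_soup.find_all("a", limit=1)
print(len(first_only))
```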


## Name:


In [16]:
# Name:

# This gives the name of the tag. In this case, the 
# tag name is "b".
print(soup.b.name)

b


In [12]:
print(soup.p.name)

p


In [17]:
# We can alter the name and have that reflected in the
# source. For instance:
tag = soup.b
print(tag)
tag.name = "blockquote"
print(tag)

<b>The Dormouse's story</b>
<blockquote>The Dormouse's story</blockquote>
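
To make "reflected in the source" concrete, here is a short sketch showing that renaming a tag changes the parsed tree itself, not just the local variable. The one-line document and the name `rename_soup` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Tiny stand-in document (hypothetical), parsed with the built-in parser.
rename_soup = BeautifulSoup("<p><b>The Dormouse's story</b></p>", "html.parser")

renamed = rename_soup.b
renamed.name = "blockquote"

# The whole tree now contains <blockquote> instead of <b>.
print(rename_soup)

# Looking up the old name finds nothing; the new name finds the same tag object.
print(rename_soup.b)
print(rename_soup.blockquote is renamed)
```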


## Attributes:


In [11]:
# Attributes:

tag = soup.find_all('b')[2]
print(tag)

<b id="1">Test 1</b>


In [12]:
# This specific tag has the attribute "id", which
# can be accessed like so:
print(tag['id'])

1


In [15]:
tag = soup.find_all('b')[3]
print(tag)

<b another-attribute="1" id="verybold">Test 2</b>


In [16]:
# We can even access multiple attributes that are
# non-standard HTML attributes:
print(tag['id'])
print(tag['another-attribute'])

verybold
1


In [18]:
# If we want to see all attributes, we can access them
# as a dictionary object:
tag = soup.find_all('b')[3]
print(tag)

<b another-attribute="1" id="verybold">Test 2</b>


In [19]:
print(tag.attrs)

{'another-attribute': '1', 'id': 'verybold'}
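
Since attributes behave like a dictionary, a few related lookups may be useful. This is a sketch on a made-up tag (the second class `large` is an assumption added to show a multi-valued attribute): `class` comes back as a list, `get` returns `None` for a missing attribute instead of raising, and `has_attr` tests for presence.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with two classes, parsed with the built-in parser.
attr_soup = BeautifulSoup('<b class="boldest large" id="verybold">Test</b>', "html.parser")
attr_tag = attr_soup.b

# "class" is a multi-valued attribute, so Beautiful Soup returns a list.
classes = attr_tag["class"]
print(classes)

# Single-valued attributes such as "id" come back as plain strings.
tag_id = attr_tag["id"]
print(tag_id)

# .get() avoids a KeyError when the attribute is absent.
missing = attr_tag.get("href")
print(missing)

# has_attr() checks whether an attribute exists.
print(attr_tag.has_attr("id"))
```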


In [20]:
# These properties are mutable, and we can alter them
# in the following manner.
print(tag)
tag['another-attribute'] = 2
print(tag)

<b another-attribute="1" id="verybold">Test 2</b>
<b another-attribute="2" id="verybold">Test 2</b>


In [21]:
# We can also use Python's del statement, just as with
# dictionary keys, to remove attributes:
del tag['id']
del tag['another-attribute']
print(tag)

<b>Test 2</b>
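
After an attribute is deleted, indexing it again raises `KeyError`, just as with a dictionary. A minimal sketch, using a stand-in tag and the hypothetical names `del_soup` and `del_tag`:

```python
from bs4 import BeautifulSoup

# Stand-in tag (hypothetical), parsed with the built-in parser.
del_soup = BeautifulSoup('<b id="verybold">Test 2</b>', "html.parser")
del_tag = del_soup.b

del del_tag["id"]

# Direct indexing of a removed attribute raises KeyError.
try:
    del_tag["id"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# .get() is the safe alternative and returns None here.
print(del_tag.get("id"))

# The attribute dictionary is now empty.
print(del_tag.attrs)
```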


In [22]:
# NavigableString: the text inside a tag is available via ".string"
tag = soup.find_all('b')[3]
print(tag)
print(tag.string)


<b>Test 2</b>
Test 2


In [27]:
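# Note: the output below still shows the attributes deleted earlier, which
# suggests the soup was re-created from html_doc before this cell ran.
tag = soup.find_all('b')[3]
print(tag)
# We can use the "replace_with" method to replace
# the string inside the tag with something different: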
tag.string.replace_with("This is another string")
print(tag)

<b another-attribute="1" id="verybold">Test 2</b>
<b another-attribute="1" id="verybold">This is another string</b>
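
The value returned by `.string` is a `NavigableString`, not a plain `str`. The sketch below (with invented names such as `text_soup` and `mixed`) shows its type, how to convert it, and one caveat: `.string` is `None` when a tag has more than one child, in which case `get_text()` can be used instead.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Stand-in document (hypothetical), parsed with the built-in parser.
text_soup = BeautifulSoup("<b>Test 2</b>", "html.parser")
text_tag = text_soup.b

# .string is a NavigableString; str() gives an ordinary Python string.
is_navigable = isinstance(text_tag.string, NavigableString)
plain_text = str(text_tag.string)
print(is_navigable, plain_text)

# replace_with swaps the string in place inside the tree.
text_tag.string.replace_with("This is another string")
print(text_soup)

# A tag with several children has no single string, so .string is None.
mixed = BeautifulSoup("<p>One <b>two</b></p>", "html.parser").p
print(mixed.string)
print(mixed.get_text())
```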
