# Kinds of objects

Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. But you’ll only ever have to deal with about four *kinds* of objects: `Tag`, `NavigableString`, `BeautifulSoup`, and `Comment`.

## Tag

A `Tag` object corresponds to an XML or HTML tag in the original document:

In [2]:
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b

print(type(tag))

<class 'bs4.element.Tag'>


## Name

Every tag has a name, accessible as `.name`:

In [3]:
tag.name

'b'

If you change a tag's name, the change will be reflected in any HTML markup generated by Beautiful Soup:

In [4]:
tag.name = 'blockquote'
tag

<blockquote class="boldest">Extremely bold</blockquote>

## Attributes

A tag may have any number of attributes. The tag `<b class="boldest">` has an attribute “class” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary:

In [5]:
tag['class']

['boldest']

You can access that dictionary as `.attrs`:

In [6]:
tag.attrs

{'class': ['boldest']}

You can add, remove, and modify a tag’s attributes. Again, this is done by treating the tag as a dictionary:

In [7]:
tag['id'] = 'verybold'
tag['another-attribute'] = 1
tag

<blockquote another-attribute="1" class="boldest" id="verybold">Extremely bold</blockquote>

In [8]:
del tag['id']
del tag['another-attribute']
tag

<blockquote class="boldest">Extremely bold</blockquote>

In [9]:
tag['id']

KeyError: 'id'

In [10]:
print(tag.get('id'))

None
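The lookups above can be collected into one small sketch of safe attribute access. It assumes the built-in `html.parser` backend rather than `lxml` so that it runs without extra dependencies, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Assumption: 'html.parser' (bundled with Python) instead of 'lxml'.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

# Indexing a missing attribute raises KeyError; .get() returns a default instead.
missing = tag.get('id')              # no such attribute, so None
fallback = tag.get('id', 'no-id')    # explicit default value

# has_attr() tests for an attribute without reading its value.
has_class = tag.has_attr('class')
has_id = tag.has_attr('id')

print(missing, fallback, has_class, has_id)
```

Using `.get()` or `has_attr()` is usually preferable to catching `KeyError` when an attribute is optional in the markup.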


## Multi-valued attributes

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is `class` (that is, a tag can have more than one CSS class). Others include `rel`, `rev`, `accept-charset`, `headers`, and `accesskey`. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:

In [11]:
css_soup = BeautifulSoup('<p class="body"></p>', 'lxml')
css_soup.p['class']

['body']

In [12]:
css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'lxml')
css_soup.p['class']

['body', 'strikeout']
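As a contrast, here is a minimal sketch of how an attribute that is not multi-valued behaves. It assumes the built-in `html.parser` backend, and the `id` value containing a space is a made-up example: `class` is split into a list, while `id` is kept as a single string.

```python
from bs4 import BeautifulSoup

# Assumption: 'html.parser' backend; the markup is a hypothetical example.
soup = BeautifulSoup('<p id="my id" class="body strikeout"></p>', 'html.parser')
p = soup.p

classes = p['class']    # multi-valued attribute: split on whitespace into a list
element_id = p['id']    # not multi-valued: kept as one string, space included

# When the tag is turned back into markup, the class values are joined again.
markup = str(p)

print(classes, repr(element_id))
print(markup)
```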

## NavigableString

A string corresponds to a bit of text within a tag. Beautiful Soup uses the `NavigableString` class to contain these bits of text:

In [15]:
tag.string

'Extremely bold'

In [16]:
type(tag.string)

bs4.element.NavigableString

A `NavigableString` is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree. You can convert a `NavigableString` to a plain Unicode string with `str()`:

In [19]:
str(tag.string)

'Extremely bold'

In [20]:
type(str(tag.string))

str
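The difference between the two string types can be sketched as follows, assuming the built-in `html.parser` backend. A `NavigableString` still knows where it sits in the tree (here through `.parent`), while the plain `str` produced by the conversion does not.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumption: 'html.parser' backend instead of 'lxml'.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
text = soup.b.string

# NavigableString subclasses str, so ordinary string operations work on it.
is_navigable = isinstance(text, NavigableString)
is_str = isinstance(text, str)

# It also carries tree information, such as the tag that contains it.
parent_name = text.parent.name

# Converting with str() yields a plain string with no link back to the tree.
plain = str(text)
plain_has_parent = hasattr(plain, 'parent')

print(is_navigable, is_str, parent_name, plain_has_parent)
```

Converting to a plain `str` is useful when the text needs to outlive the parse tree, since the plain string holds no reference to it.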

## Comment

`Tag`, `NavigableString`, and `BeautifulSoup` cover almost everything you'll see in an HTML or XML file, but there are a few leftover bits. The only one you'll probably ever need to worry about is the comment:

In [21]:
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
comment = soup.b.string
type(comment)



bs4.element.Comment

A `Comment` object is just a special type of `NavigableString`:

In [22]:
comment

'Hey, buddy. Want to buy a used parser?'
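Because `Comment` subclasses `NavigableString`, comments and ordinary text can be told apart with `isinstance`. The sketch below assumes the built-in `html.parser` backend and adds some hypothetical visible text ("Bold text") next to the comment so that both kinds of string appear in the same tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Assumption: 'html.parser' backend; "Bold text" is added for illustration.
markup = "<b><!--Hey, buddy. Want to buy a used parser?-->Bold text</b>"
soup = BeautifulSoup(markup, 'html.parser')

# Collect every comment in the document by testing each string's type.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# Keep only the non-comment strings among the direct children of <b>.
plain = [
    child for child in soup.b.children
    if isinstance(child, NavigableString) and not isinstance(child, Comment)
]

print(comments)
print(plain)
```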