In [1]:
from bs4 import BeautifulSoup

In [4]:
with open('html_doc.html', 'r') as f:
    contents = f.read()
    soup = BeautifulSoup(contents, features="html.parser")

    for child in soup.descendants:
        if child.name:
            print(child.name)

html
head
title
body
p
b
p
a
a
a
p


In [5]:
# Parse the same file again into soup1; later sections use soup1 because soup is reassigned below
with open('html_doc.html', 'r') as f:
    contents = f.read()
    soup1 = BeautifulSoup(contents, features="html.parser")

    for child in soup1.descendants:
        if child.name:
            print(child.name)

html
head
title
body
p
b
p
a
a
a
p
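The html_doc.html file isn't included in the notebook. Judging from the soup1.contents output further down, it holds the “three sisters” document from the Beautiful Soup documentation. The sketch below assumes that content and parses it from an inline string (the name html_doc is just illustrative), so the tag-name walk above can be reproduced without the file:

```python
from bs4 import BeautifulSoup

# Assumption: html_doc.html contains this "three sisters" markup,
# reconstructed from the soup1.contents output later in the notebook.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html_doc, features="html.parser")

# .descendants yields both tags and strings; only tags have a non-None .name
tag_names = [child.name for child in soup.descendants if child.name]
print(tag_names)
```

If the real file differs, the printed names will differ too.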


### Ways to navigate the data structure in BeautifulSoup

In [6]:
# Get the title of the soup object
print(soup.title)
print(soup.title.name)
print(soup.title.string)
print(soup.title.parent)

<title>The Dormouse's story</title>
title
The Dormouse's story
<head><title>The Dormouse's story</title></head>


In [7]:
print(soup.p)   # The first paragraph
print(soup.p['class'])   # The value of the class attribute of the first paragraph
print(soup.a)   # The first link

# NB: I use soup.tag (e.g. soup.p) rather than passing the tag name as a string to find()/find_all().
# This is because soup.p is a shortcut for soup.find('p'): it returns only the first matching tag.

<p class="title"><b>The Dormouse's story</b></p>
['title']
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>


In [8]:
# To extract all the URLs found within a page's <a> tags:
for child in soup.find_all('a'):
    print(child.get('href'))

http://example.com/elsie
http://example.com/lacie
http://example.com/tillie


In [9]:
# To extract all the text from a page, use the get_text() method.
print(soup.get_text())

The Dormouse's story

The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...
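get_text() also takes an optional separator and a strip flag. A small sketch on a made-up snippet showing how these change the output:

```python
from bs4 import BeautifulSoup

snippet = BeautifulSoup('<p>Hello <b>world</b>!</p>', 'html.parser')

# Default: all strings concatenated exactly as they appear
print(repr(snippet.get_text()))   # 'Hello world!'

# separator goes between strings; strip=True trims each string first
joined = snippet.get_text("|", strip=True)
print(joined)                     # Hello|world|!
```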



## Kinds of Soup Objects
Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. Four kinds of objects are most common:
* Tag
* NavigableString
* BeautifulSoup
* Comment

### TAG
* A Tag object corresponds to an XML or HTML tag in the original document.
* It has a lot of attributes and methods.
* The most important features of a tag are its name and attributes.
* You can change a tag’s name; the change will be reflected in any markup generated by Beautiful Soup from then on.
### Attrs
A tag can have any number of attributes.
* e.g. <b id="boldest"> is a 'b' tag with an attribute “id” whose value is “boldest”.
* You can access a tag’s attributes by treating the tag like a dictionary.

In [10]:
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)  # b is the HTML tag for bold text

bs4.element.Tag

In [11]:
print(f'Old tag name is {tag.name}') # will return the tag name 'b'
tag.name = "blockquote"   # Changing the tag name
print(f"The new tag name is {tag.name}")

Old tag name is b
The new tag name is blockquote


In [12]:
tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
print(tag['id'])   # This lets us get the value of the 'id' attribute
print(tag.attrs)   # .attrs gives us the dictionary of all attributes.

boldest
{'id': 'boldest'}
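Because a tag behaves like a dictionary, you can also add, change, and remove attributes. A minimal sketch using the same <b id="boldest"> tag; the attribute name another-attribute is made up for illustration:

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b

tag['id'] = 'verybold'           # modify an existing attribute
tag['another-attribute'] = '1'   # add a new attribute
print(tag.attrs)                 # {'id': 'verybold', 'another-attribute': '1'}

del tag['id']                    # remove an attribute
print(tag.get('id'))             # None: .get() avoids a KeyError for missing attributes
print(tag.get('id', 'no id'))    # no id
```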


### Handling Multi-Valued Attributes.
* HTML 4 defines a few attributes that can have multiple values.
* The most common multi-valued attribute is class, since a tag can have more than one CSS class.
* Others include rel, rev, accept-charset, headers, and accesskey.
* By default, Beautiful Soup parses the value(s) of a multi-valued attribute into a list.

In [13]:
css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup1 = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')

print(f"Output with a single class: {css_soup.p['class']}")
print(f"Output with two classes: {css_soup1.p['class']}") # both are returned as lists

Output with a single class: ['body']
Output with two classes: ['body', 'strikeout']


In [14]:
# If an attribute looks like it has multiple values but isn't a multi-valued attribute, Beautiful Soup leaves the value as a single string.
id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
id_soup.p['id']

'my id'

In [15]:
# 1. You can force all attributes to be parsed as strings
# by passing multi_valued_attributes=None as a keyword argument into the BeautifulSoup constructor:
no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
no_list_soup.p['class']


'body strikeout'

In [16]:
# 2. You can use get_attribute_list() to always get the value as a list, whether or not it’s a multi-valued attribute:
id_soup.p.get_attribute_list('id')

['my id']

* If you parse a document as XML, there are no multi-valued attributes by default.
* Again, you can configure this using the multi_valued_attributes argument.
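Beautiful Soup's HTML tree builder keeps a table that maps tag names (or "*" for every tag) to the attributes it treats as multi-valued, and multi_valued_attributes accepts a dictionary in that shape. A sketch, assuming the made-up markup below, that overrides the table so only class is split; it uses html.parser, so no XML parser needs to be installed:

```python
from bs4 import BeautifulSoup

markup = '<a class="nav active" rel="nofollow noopener" href="/">Home</a>'

# Default HTML behaviour: class (any tag) and rel (on <a>) are both multi-valued
default_soup = BeautifulSoup(markup, 'html.parser')
print(default_soup.a['class'], default_soup.a['rel'])

# Custom rule: only "class", on any tag ("*"), is treated as multi-valued
class_only = {'*': ['class']}
custom_soup = BeautifulSoup(markup, 'html.parser', multi_valued_attributes=class_only)
print(custom_soup.a['class'], custom_soup.a['rel'])   # rel stays one string
```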

## NavigableString

* A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text.
* A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in the Navigating the tree and Searching the tree sections of the Beautiful Soup documentation.
* You can’t edit a string in place, but you can replace one string with another, using replace_with().
* Strings don’t support the .contents or .string attributes, or the find() method.
* If you want to use a NavigableString outside of Beautiful Soup, you should call str() on it to turn it into a normal Python Unicode string.

In [17]:
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
print(soup.b.string)
type(soup.b.string) # Print the type of string

Extremely bold


bs4.element.NavigableString

In [18]:
soup.b.string.replace_with("No longer bold") # Changing the string  in the soup object
soup.b

<b class="boldest">No longer bold</b>
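To take a string out of the tree, convert it with str(), as mentioned above. A short sketch showing the difference between the NavigableString and the plain str it produces:

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
nav = soup.b.string

plain = str(nav)   # an ordinary Python str, detached from the parse tree

print(type(nav).__name__, type(plain).__name__)   # NavigableString str
print(nav.parent.name)            # b: the NavigableString still knows its place in the tree
print(hasattr(plain, 'parent'))   # False: the plain str has no tree links
```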

## Navigating the Tree
* Here, we will learn how to move from one part of a document to another. I will be using the soup1 object, which holds the “three sisters” document parsed above. Recall that tags may contain strings and other tags. These elements are the tag’s children.
* Beautiful Soup provides a lot of different attributes for navigating and iterating over a tag’s children. Beautiful Soup strings can’t have children, so they don't support these attributes.

### A. GOING DOWN

#### 1. Navigating Using Tag names.

In [19]:
#1. The simplest way to navigate the parse tree is to say the name of the tag you want. e.g head/title/div etc
print(soup1.head)
print(soup1.title)

<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>


In [20]:
# Note that this returns only the first instance of the specified tag.
print(soup1.body.b) # This finds the first b tag anywhere inside the body tag.

<b>The Dormouse's story</b>


In [21]:
#Use find_all() method if you want to retrieve all the instances of a particular tag and not the first one alone.
soup1.find_all('a')

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

#### 2. Navigating using .contents and .children
* All the children of a specified tag are available in a list called .contents:

In [22]:
head_tag = soup1.head
print(head_tag)
print(head_tag.contents) # this will return the children of head tag in a list
head_tag.contents[0]  #return the first item in the list.

<head><title>The Dormouse's story</title></head>
[<title>The Dormouse's story</title>]


<title>The Dormouse's story</title>

In [23]:
# Let's check out the children of the title tag above
title_tag = head_tag.contents[0]
title_tag.contents

["The Dormouse's story"]

In [24]:
soup1.contents 
#The BeautifulSoup object itself has children. In this case, the <html> tag is the child of the BeautifulSoup object.

[<html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>
 <p class="story">Once upon a time there were three little sisters; and their names were
 <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
 and they lived at the bottom of a well.</p>
 <p class="story">...</p>
 </body></html>]

#### 2b. Iterate over tag's children.
* Instead of getting them as a list, you can iterate over a tag’s children using the .children generator:

In [25]:
for child in title_tag.children:
    print(child)

The Dormouse's story


#### .descendants
* The .contents and .children attributes only consider a tag’s direct children, whereas .descendants also reaches the children's children.
* The .descendants attribute lets you iterate over all of a tag’s children, recursively: its direct children, the children of its direct children, and so on.

In [26]:
# head tag has a direct child, which is a title tag.
head_tag.contents

[<title>The Dormouse's story</title>]

In [27]:
# The title tag itself also has a child, which is a string.
# Let's see this using .descendants
for child in head_tag.descendants:
    print(child)


<title>The Dormouse's story</title>
The Dormouse's story


* The head tag has only one child (the title tag) but two descendants (the title tag and the title tag's string).

#### 3. .string
* If a tag has only one child, and that child is a NavigableString, the child is made available as .string.
* If a tag’s only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child:

In [28]:
print(head_tag.string)
title_tag.string

The Dormouse's story


"The Dormouse's story"

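When a tag has more than one child, it's unclear what .string should refer to, so it is None. A sketch on made-up markup contrasting that with the single-child case, with get_text() as the usual fallback:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>one <b>two</b></p><div><span>only</span></div>', 'html.parser')

print(soup.p.string)      # None: <p> has two children ('one ' and <b>)
print(soup.div.string)    # only: <div>'s single child <span> has a .string
print(soup.p.get_text())  # one two: joins all the strings instead
```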
#### 4. .strings and .stripped_strings
If there's more than one thing inside a tag, you can still look at just the strings by using the .strings generator:

In [29]:
for string in soup1.strings:
    print(string)

The Dormouse's story




The Dormouse's story


Once upon a time there were three little sisters; and their names were

Elsie
,

Lacie
 and

Tillie
;
and they lived at the bottom of a well.


...




In [30]:
#These strings tend to have a lot of extra whitespace,
# which you can remove by using the .stripped_strings generator instead:

for string in soup1.stripped_strings:
    print(string)

The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
,
Lacie
and
Tillie
;
and they lived at the bottom of a well.
...


### B. GOING UP THE TREE.
* Every tag and every string has a parent: which is the tag that contains it.

#### 1. .parent
* The .parent attribute allows you to access an element's parent.
* In the example below, the head tag is the parent of the title tag:

In [31]:
title_tag = soup1.title
print(title_tag)
print(title_tag.parent)

<title>The Dormouse's story</title>
<head><title>The Dormouse's story</title></head>


In [32]:
#  The parent of a BeautifulSoup object will be None
print(soup1.parent)

None


#### 2. .parents
The .parents attribute lets you iterate over all of an element's parents:

In [33]:
link = soup1.a
print(link)

for parent in link.parents:
    print(parent.name)

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
p
body
html
[document]


### C. GOING SIDEWAYS.

In [34]:
#Let's consider this document
sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", 'html.parser')
print(sibling_soup.prettify())

<a>
 <b>
  text1
 </b>
 <c>
  text2
 </c>
</a>


* The b tag and the c tag are at the same level: they’re both direct children of the same tag. We call them siblings.
* When a document is pretty-printed, siblings show up at the same indentation level.
* The strings “text1” and “text2” are not siblings, because they don’t have the same parent.

#### 1. .next_sibling and .previous_sibling.
* They allow you to navigate between page elements that are on the same level of the parse tree:

In [35]:
print(sibling_soup.b.next_sibling)
print(sibling_soup.c.previous_sibling)

<c>text2</c>
<b>text1</b>


* b tag has a .next_sibling, but no .previous_sibling, because there’s nothing before it on the same level of the tree. For the same reason, c tag has a .previous_sibling but no .next_sibling.

In [36]:
print(sibling_soup.b.previous_sibling)
print(sibling_soup.c.next_sibling)

None
None


#### 2. .next_siblings and .previous_siblings
These allow you to iterate over a tag's siblings:

In [37]:
for sibling in soup1.a.next_siblings:
    print(repr(sibling))


',\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
' and\n'
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
';\nand they lived at the bottom of a well.'


In [38]:
for sibling in soup1.find(id="link3").previous_siblings:
    print(repr(sibling))


' and\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
',\n'
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
'Once upon a time there were three little sisters; and their names were\n'


### D. GOING BACK AND FORTH.

In [39]:
#Take a look at the beginning of the “three sisters” document:

# <html><head><title>The Dormouse's story</title></head>
# <p class="title"><b>The Dormouse's story</b></p>

* An HTML parser takes this string of characters and turns it into a series of events: “open an html tag”, “open a head tag”, “open a title tag”, “add a string”, “close the title tag”, “open a p tag”, and so on. 
* Beautiful Soup offers tools for reconstructing the initial parse of the document.

#### 1. .next_element and .previous_element.
* The .next_element attribute of a string/tag points to whatever was parsed immediately afterwards. 
* It might be the same as .next_sibling, but it’s usually drastically different.

In [40]:
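# Here’s the final <a> tag in the “three sisters” document.
# Its .next_sibling is a string: the conclusion of the sentence that was interrupted by the start of the <a> tag.
# Its .next_element is the string inside the <a> tag ('Tillie'), because that was parsed right after the tag opened.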

last_a_tag = soup1.find('a', id='link3')
print(last_a_tag)

print(f'The next sibling of the last a tag is---------- {last_a_tag.next_sibling}') 

print(f'The next element of the last a tag is---------  {last_a_tag.next_element}') 

<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
The next sibling of the last a tag is---------- ;
and they lived at the bottom of a well.
The next element of the last a tag is---------  Tillie


In [41]:
last_a_tag.previous_element

' and\n'

In [42]:
last_a_tag.previous_element.next_element

<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

#### 2. .next_elements and .previous_elements
* These iterators let you move forward or backward through the document in the order it was parsed.
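A sketch on a small made-up document showing the parse order that .next_elements and .previous_elements walk through. The helper label() is hypothetical, only there to print tags by name and strings by their text:

```python
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup('<p>one <b>two</b> three</p><p>four</p>', 'html.parser')
b_tag = soup.b

def label(element):
    # Hypothetical helper: show tags by name and strings by their text
    return element.name if isinstance(element, Tag) else str(element)

forward = [label(e) for e in b_tag.next_elements]
backward = [label(e) for e in b_tag.previous_elements]

print(forward)    # ['two', ' three', 'p', 'four']
print(backward)   # starts with 'one ' and then the enclosing <p>
```

Note that forward crosses out of the first <p> into the second one: parse order ignores tree boundaries.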

## Searching the Tree

There are many methods for searching the parse tree in Beautiful Soup, but they’re all very similar.
We are going to stick with the two most popular methods: find() and find_all().
The other methods take almost exactly the same arguments, so I’ll just cover them briefly.
We will continue to use the “three sisters” document as an example:
* To zoom in on the part of the document you are interested in, pass a filter argument to any of the search methods. The filter can be a STRING, a LIST, a REGULAR EXPRESSION, a FUNCTION, or just True.

#### A STRING 
* Pass a string to a search method and Beautiful Soup will perform a match against that exact string. 

In [43]:
# This code finds all the b tags in the document:
soup1.find_all('b')

[<b>The Dormouse's story</b>]

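#### A REGULAR EXPRESSION
* If you pass in a regular expression object, Beautiful Soup will filter against it using its search() method.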

In [44]:
# This code finds all the tags whose names start with the letter “b”; in this case, the <body> tag and the <b> tag:
import re
for tag in soup1.find_all(re.compile('^b')):
    print(tag.name)

body
b


In [45]:
#This code finds all the tags whose names contain the letter ‘t’:

for tag in soup1.find_all(re.compile("t")):
    print(tag.name)

html
title


#### A LIST
* If you pass in a list, Beautiful Soup will allow a string match against any item in that list. 

In [46]:
# This code finds all the <a> tags and all the <b> tags
soup1.find_all(["a", "b"])

[<b>The Dormouse's story</b>,
 <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

#### True
The value True matches everything it can. This code finds all the tags in the document, but none of the text strings:

In [47]:
for tag in soup1.find_all(True):
    print(tag.name)

html
head
title
body
p
b
p
a
a
a
p


### Let's take a look at the search methods in detail.

* find_all(name, attrs, recursive, string, limit, **kwargs)

* It looks through a tag’s descendants and retrieves all descendants that match your filters. 

In [48]:
soup1.find_all('title')

[<title>The Dormouse's story</title>]

In [49]:
# A string passed as the second positional argument (attrs) is matched against the CSS class
soup1.find_all('p', 'title')

[<p class="title"><b>The Dormouse's story</b></p>]

In [50]:
soup1.find_all('a')

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [51]:
soup1.find_all(id='link2')

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

In [52]:
soup1.find(string=re.compile("sisters"))

'Once upon a time there were three little sisters; and their names were\n'

##### The name Argument
NB: The value of the name argument can be a string, a list, a function, a regular expression, or True.

In [53]:
# Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names.
# Text strings will be ignored, as will tags whose names don’t match.
soup1.find_all('title')

[<title>The Dormouse's story</title>]

##### Keyword Argument
Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes.


In [54]:
#  If you pass in a value for an argument called id, Beautiful Soup will filter against each tag’s ‘id’ attribute:
soup1.find_all(id='link2')

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

In [55]:
#If you pass in a value for href, Beautiful Soup will filter against each tag’s ‘href’ attribute:

soup1.find_all(href=re.compile("elsie"))

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

In [56]:
# You can filter an attribute based on a string, a regular expression, a list, a function, or the value True.
#This code finds all tags whose id attribute has a value, regardless of what the value is:

soup1.find_all(id=True)

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [57]:
# Filter on multiple attributes at once by passing in more than one keyword argument:

soup1.find_all(href=re.compile("elsie"), id='link1')


[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

## Searching By CSS

We can also search for tags with a certain CSS class, but the name of the CSS attribute, “class”, is a reserved word in Python.
Therefore, to avoid a syntax error, you can search by CSS class using the keyword argument class_:

In [59]:
soup1.find_all("a", class_="sister")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [61]:
# As with any keyword argument, you can pass class_ a string/regular expression/function/True
soup1.find_all(class_=re.compile("itl"))

[<p class="title"><b>The Dormouse's story</b></p>]

In [62]:
#passing a function
def has_six_characters(css_class):
    return css_class is not None and len(css_class) == 6

soup1.find_all(class_=has_six_characters)


[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

* A single tag can have multiple values for its “class” attribute.
* Thus, when you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes:

In [63]:
css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
css_soup.find_all("p", class_="strikeout")

[<p class="body strikeout"></p>]

In [64]:
css_soup.find_all('p', class_='body')

[<p class="body strikeout"></p>]

In [65]:
# You can also search for the exact string value of the class attribute
css_soup.find_all('p', class_="body strikeout")

[<p class="body strikeout"></p>]

* NB: When you search for the exact string value of the class attribute, variations of that string won't match. The classes have to appear in the same order.
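For example, with the same css_soup markup, listing the classes in a different order finds nothing:

```python
from bs4 import BeautifulSoup

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')

same_order = css_soup.find_all('p', class_='body strikeout')
other_order = css_soup.find_all('p', class_='strikeout body')

print(same_order)    # [<p class="body strikeout"></p>]
print(other_order)   # []: same classes, different order
```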

In [68]:
# To avoid the ordering problem described above when you want tags that match two or more CSS classes,
# use a CSS selector: the classes can then be listed in any order.
print(css_soup.select("p.strikeout.body"))
css_soup.select("p.body.strikeout")


[<p class="body strikeout"></p>]


[<p class="body strikeout"></p>]

* In older versions of Beautiful Soup, which don’t have the class_ shortcut, you can pass an attrs dictionary instead.
* Create a dictionary whose value for “class” is the string (or regular expression, or whatever) you want to search for:

In [69]:
soup1.find_all("a", attrs={"class": "sister"})


[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

## The string Argument
* The string argument allows us to search for strings instead of tags.
* And just like the name and keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value True.

In [70]:
soup1.find_all(string="Elsie")

['Elsie']

In [72]:
soup1.find_all(string=["Tillie", "Elsie", "Lacie"])

['Elsie', 'Lacie', 'Tillie']

In [73]:
soup1.find_all(string=re.compile("Dormouse"))

["The Dormouse's story", "The Dormouse's story"]

In [74]:
soup1.find_all(string=True)

["The Dormouse's story",
 '\n',
 '\n',
 "The Dormouse's story",
 '\n',
 'Once upon a time there were three little sisters; and their names were\n',
 'Elsie',
 ',\n',
 'Lacie',
 ' and\n',
 'Tillie',
 ';\nand they lived at the bottom of a well.',
 '\n',
 '...',
 '\n']

## The limit Argument
* It limits the number of results a search method returns. This is especially useful when the document is large and you don't need all the results.
* You can pass in a number for limit, much like LIMIT in SQL.

In [75]:
soup1.find_all("a", limit=2)

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

## The Recursive Argument

* Setting recursive to False makes the search consider only the tag's direct children, not their children or grandchildren.

In [76]:
# The html tag has a title tag among its descendants, so it is returned in a list
soup1.html.find_all("title")

[<title>The Dormouse's story</title>]

In [77]:
# An empty list is returned because the html tag has no direct child named title
soup1.html.find_all("title", recursive=False)

[]

* Of all the Beautiful Soup search methods, only find() and find_all() support the recursive argument.

## Search methods commonly used with the parse tree.

#### 1. Calling a tag is like calling find_all()
* Since find_all() is the most popular method in the Beautiful Soup search API, we can use a shortcut for it. 
* If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object. 

In [80]:
#These two lines of code are equivalent:
soup1.find_all('a') == soup1('a')

True

#### 2. find() and find_all()
* find() takes the same arguments as find_all().
* find() scans the document and returns only the first tag or string that matches.
* Using find_all() with limit=1 is almost the same as using find(): the difference is that find_all() returns a list containing the single result, while find() just returns the result.
* If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None.
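A short sketch on a made-up snippet illustrating these differences:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><a id="x">one</a><a id="y">two</a></p>', 'html.parser')

first_via_find = soup.find('a')                # a single Tag (or None)
first_via_list = soup.find_all('a', limit=1)   # a list with at most one Tag
print(first_via_find is first_via_list[0])     # True: the same element

print(soup.find('table'))       # None: nothing matched
print(soup.find_all('table'))   # []: nothing matched
```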


#### 3. find_parents() and find_parent()
* They take the same arguments as find_all() and find().
* find_all() and find() work their way down the tree, looking at tag’s descendants. 
* These methods do the opposite: they work their way up the tree, looking at a tag’s/string’s parents. 

In [82]:
a_string = soup1.find(string="Lacie")
a_string

'Lacie'

In [83]:
a_string.find_parents("a")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

In [86]:
a_string.find_parent("p")

<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

#### 4. find_next_siblings() and find_next_sibling()
* These methods iterate over the rest of an element’s siblings in the tree.
* find_next_siblings() returns all the siblings that match, and find_next_sibling() only returns the first one.

In [88]:
first_link = soup1.a
first_link

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

In [89]:
first_link.find_next_siblings('a')

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

#### 5. find_all_next() and find_next()
* These methods use .next_elements to iterate over the tags and strings that come after an element in the document.
* find_all_next() method returns all matches, and find_next() only returns the first match:

In [90]:
first_link = soup1.a
first_link

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

In [91]:
first_link.find_all_next(string=True)

['Elsie',
 ',\n',
 'Lacie',
 ' and\n',
 'Tillie',
 ';\nand they lived at the bottom of a well.',
 '\n',
 '...',
 '\n']
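find_next() works the same way but returns only the first match, and the singular find_next_sibling() does the same for siblings. A sketch on a small made-up document (the ids a1, a2, and last are illustrative):

```python
from bs4 import BeautifulSoup

doc = '<p><a id="a1">Elsie</a>, <a id="a2">Lacie</a></p><p id="last">...</p>'
soup = BeautifulSoup(doc, 'html.parser')
first_link = soup.find(id='a1')

# find_next() follows .next_elements, so it can leave the link's own <p>
print(first_link.find_next('p'))                   # <p id="last">...</p>

# find_next_sibling() stays on the same level of the tree
print(first_link.find_next_sibling('a')['id'])     # a2

# find_all_next(True) collects every later tag (strings are skipped)
following_tags = [t.name for t in first_link.find_all_next(True)]
print(following_tags)                              # ['a', 'p']
```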

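## CSS selectors through the .css property
In Beautiful Soup 4.12.0 and later, BeautifulSoup and Tag objects support CSS selectors through their .css property. The select() method used in the cells below also works in older versions.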

In [116]:
#You can find tags:
soup1.select('title')

[<title>The Dormouse's story</title>]

In [117]:
#Find tags beneath other tags:
soup1.select("body a") #here we find all a tags below the body tag

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [118]:
soup1.select("html head title") # Find the title tag below the head tag below the html tag

[<title>The Dormouse's story</title>]

In [99]:
soup1.select("p:nth-of-type(3)")

[<p class="story">...</p>]

##### Find tag directly beneath other tag

In [119]:

soup1.select("head > title")

[<title>The Dormouse's story</title>]

In [120]:
soup1.select("head > title") == soup1.select("head title")

True

* In the document we are using as an example, title is a direct child of head, so it is also a descendant of head.
* Both selectors therefore return the same list, and the comparison evaluates to True.

In [101]:
soup1.select("p > a")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [103]:
soup1.select("p > a:nth-of-type(2)")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

In [104]:
soup1.select("body > a")

[]

#### Finding siblings of tags

In [105]:
soup1.select("#link1 ~ .sister")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

In [106]:
soup1.select("#link1 + .sister")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

#### Find tags by CSS class

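In [111]:
# soup1.css.select(".sister") needs Beautiful Soup 4.12.0+. In older versions, soup1.css is
# treated as the shortcut soup1.find("css"), which returns None, so the call fails with:
# AttributeError: 'NoneType' object has no attribute 'select'
# select() finds tags by CSS class in every version:
soup1.select(".sister")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]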

## Advanced Soup Sieve features.
* Soup Sieve offers a substantial API beyond the select() and select_one() methods
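A sketch of some of that extra API, calling the soupsieve package directly (it is installed as a Beautiful Soup dependency), so it also works on versions without the .css property. The markup is made up; select_one() is a Tag method, while compile(), match(), closest(), and filter() are soupsieve module functions:

```python
import soupsieve as sv
from bs4 import BeautifulSoup

doc = ('<div id="main"><p class="story">'
       '<a class="sister" id="link1">Elsie</a> '
       '<a class="sister" id="link2">Lacie</a>'
       '</p></div>')
soup = BeautifulSoup(doc, 'html.parser')

# select_one() returns the first match (or None) instead of a list
first = soup.select_one('a.sister')
print(first['id'])                               # link1

# Compile a selector once and reuse it
sister = sv.compile('a.sister')
sister_ids = [a['id'] for a in sister.select(soup)]
print(sister_ids)                                # ['link1', 'link2']

link2 = soup.find(id='link2')
print(sv.match('#link2', link2))                 # True: does this tag match?
print(sv.closest('div', link2)['id'])            # main: nearest match, checking the tag itself and then its ancestors

# filter() keeps only the items of an iterable that match the selector
filtered_ids = [a['id'] for a in sv.filter('#link1', soup.find_all('a'))]
print(filtered_ids)                              # ['link1']
```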