# Searching The Parse Tree

BeautifulSoup provides several methods for searching the parse tree, but this lesson covers only the `.find_all()` method. You can learn about the other search methods in the [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

The `.find_all(filter)` method searches the entire document for every tag that matches the given `filter`. The `filter` can be a string containing an HTML or XML tag name, a tag attribute, or even a regular expression. The examples in this notebook focus on tag names and tag attributes.

Let's begin by using the `.find_all()` method to find all the `<h2>` tags in our `sample.html` file. To do this, we pass the string `'h2'` to the `.find_all()` method, as shown in the code below:

In [1]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')

# Find all the h2 tags
h2_list = page_content.find_all('h2')

# Print the h2_list
print(h2_list)

[<h2 class="h2style" id="hub">Student Hub</h2>, <h2 class="h2style" id="know">Knowledge</h2>]


As we can see, the `.find_all()` method returns a list containing all the `<h2>` tags. Since lists are iterable, we can loop through `h2_list` to print each tag, as shown below:

In [2]:
# Print each tag in the h2_list
for tag in h2_list:
    print(tag)

<h2 class="h2style" id="hub">Student Hub</h2>
<h2 class="h2style" id="know">Knowledge</h2>
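Because the result behaves like a list, we can also count and index it, and each element is a tag whose text and attributes we can read. The sketch below illustrates this under a few assumptions: the `html` string is a small inline stand-in for `sample.html`, the variable names are illustrative, and the built-in `html.parser` is used instead of `lxml` so the example runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Inline stand-in for sample.html (an assumption; the real file has more content)
html = """
<h1 id="intro">Get Help From Peers and Mentors</h1>
<h2 class="h2style" id="hub">Student Hub</h2>
<h2 class="h2style" id="know">Knowledge</h2>
"""

# html.parser ships with Python; 'lxml' could be used here as well
soup = BeautifulSoup(html, 'html.parser')

# Find all the h2 tags
h2_list = soup.find_all('h2')

# The result has a length and supports indexing, like any list
h2_count = len(h2_list)
first_h2 = h2_list[0]

# Each element is a tag, so we can read its text and its attributes
h2_texts = [tag.get_text() for tag in h2_list]
h2_ids = [tag['id'] for tag in h2_list]

print(h2_count)
print(first_h2)
print(h2_texts)
print(h2_ids)
```

If no tag matches, `.find_all()` returns an empty list, so a loop over the result simply does nothing.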


# TODO: Find All The `<p>` Tags

In the cell below, use the `.find_all()` method to find all the `<p>` tags in the `sample.html` file. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then use the `.find_all()` method to find all the `<p>` tags from the `page_content` object. Save the list returned by the `.find_all()` method in a variable called `p_list`. Finally, loop through the list and print each tag in the list. Since the `<p>` tags contain subtags, use the `.prettify()` method to improve readability.

In [None]:
# Import BeautifulSoup


# Open the HTML file and create a BeautifulSoup Object

page_content = 

# Find all the p tags
p_list = 

# Print each tag in the p_list


## Searching For Multiple Tags

We can also search for more than one tag at a time by passing a list to the `.find_all()` method.

Suppose we want to search for all the `<h2>` and `<p>` tags in our `sample.html` file. Instead of using two statements (one for each tag):

```python
h2_list = page_content.find_all('h2')
p_list = page_content.find_all('p')
```

we can just pass the list `['h2', 'p']` to the `.find_all()` method, as shown in the code below:

In [4]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')
    
# Print all the h2 and p tags
for tag in page_content.find_all(['h2', 'p']):
    print(tag.prettify())

<h2 class="h2style" id="hub">
 Student Hub
</h2>

<p>
 Student Hub is our real time collaboration platform where you can work with peers and mentors. You will find Community rooms with other students and alumni.
</p>

<h2 class="h2style" id="know">
 Knowledge
</h2>

<p>
 Search or ask questions in
 <a href="https://knowledge.udacity.com/">
  Knowledge
 </a>
</p>

<p>
 Good luck and we hope you enjoy the course
</p>



We can see that we get all the `<h2>` and `<p>` tags in our file.
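The introduction mentioned that the filter can also be a regular expression. As a minimal sketch, assuming a small inline stand-in for `sample.html` (including a made-up `<title>`) and the built-in `html.parser`, we can pass a compiled pattern to `.find_all()` to match every heading tag by name. The name `heading_pattern` is just illustrative.

```python
import re
from bs4 import BeautifulSoup

# Inline stand-in for sample.html (an assumption; the title text is made up)
html = """
<html><head><title>Sample</title></head>
<body>
<h1 id="intro">Get Help From Peers and Mentors</h1>
<h2 class="h2style" id="hub">Student Hub</h2>
<p>Good luck and we hope you enjoy the course</p>
<h2 class="h2style" id="know">Knowledge</h2>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# Match tag names h1 through h6; the ^ and $ anchors keep <head> and <html> out
heading_pattern = re.compile(r'^h[1-6]$')

# Collect the names of the matching tags in document order
heading_names = [tag.name for tag in soup.find_all(heading_pattern)]
print(heading_names)
```

Without the anchors, a looser pattern such as `re.compile('h')` would also match `<html>` and `<head>`, because the pattern is searched for anywhere in the tag name.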

# TODO: Find All The `<a>` and `<link>` Tags

In the cell below, use the `.find_all()` method to find all the `<a>` and `<link>` tags in the `sample.html` file. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then find all the `<a>` and `<link>` tags from the `page_content` object by passing a list to the `.find_all()` method. Loop through the list and print each tag in the list. Use the `.prettify()` method to improve readability.

In [None]:
# Import BeautifulSoup


# Open the HTML file and create a BeautifulSoup Object

page_content = 
    
# Print all the a and link tags


## Searching For Tags With Particular Attributes

The `.find_all()` method also accepts keyword arguments that filter on tag attributes, so we can narrow the search to exactly the tags we want. For example, in our `sample.html` file, we have two `<h2>` tags:

1. `<h2 class="h2style" id="hub">Student Hub</h2>`

2. `<h2 class="h2style" id="know">Knowledge</h2>`

We can see that the first `<h2>` tag has the attribute `id="hub"`, while the second `<h2>` tag has the attribute `id="know"`. Suppose we only want to search our `sample.html` document for the `<h2>` tag that has `id="know"`. To do this, we pass `id` as a keyword argument to the `.find_all()` method, as shown below:

In [6]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')

# Find the h2 tag with id = know
h2_know = page_content.find_all('h2', id='know')

# Print each item in the h2_know
for tag in h2_know:
    print(tag)

<h2 class="h2style" id="know">Knowledge</h2>


We can see that the list returned by the `.find_all()` method has only one element, and it corresponds to the `<h2>` tag that has `id="know"`.
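The same idea works for the `class` attribute, with one catch: `class` is a reserved word in Python, so BeautifulSoup spells the keyword argument `class_`. The sketch below shows this, along with the equivalent `attrs` dictionary form. It assumes a small inline stand-in for `sample.html` and the built-in `html.parser`; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Inline stand-in for sample.html (an assumption, trimmed to a few tags)
html = """
<h2 class="h2style" id="hub">Student Hub</h2>
<h2 class="h2style" id="know">Knowledge</h2>
<p>Good luck and we hope you enjoy the course</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# class is a reserved word in Python, so the keyword is written class_
styled = soup.find_all('h2', class_='h2style')

# The attrs dictionary is another way to write an attribute filter
know = soup.find_all('h2', attrs={'id': 'know'})

styled_ids = [tag['id'] for tag in styled]
know_texts = [tag.get_text() for tag in know]

print(styled_ids)
print(know_texts)
```

The `attrs` form is handy when an attribute name is not a valid Python keyword argument, for example a name that contains a hyphen.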

# TODO: Find All The `<h1>` Tags With The Attribute `id='intro'`

In the cell below, use the `.find_all()` method to find all the `<h1>` tags in the `sample.html` file that have the attribute `id="intro"`. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then find all the `<h1>` tags that have the attribute `id="intro"` from the `page_content` object. Loop through the list and print each tag in the list.

In [None]:
# Import BeautifulSoup


# Open the HTML file and create a BeautifulSoup Object

page_content = 

# Print all the h1 tags with id = intro


## Searching For Attributes Directly

The `.find_all()` method also allows us to search for tag attributes directly, without naming a tag. For example, we can search for all the tags that have the attribute `id="intro"` by passing only this attribute to the `.find_all()` method, as shown below:

In [8]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')

# Print all the tags with id = intro
for tag in page_content.find_all(id='intro'):
    print(tag)

<h1 id="intro">Get Help From Peers and Mentors</h1>


We can see that we only get one tag, since the `<h1>` tag is the only tag in our document that has the attribute `id="intro"`.
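If we want every tag that has an `id` attribute, regardless of its value, we can pass `True` instead of a string. The following is a minimal sketch, assuming a small inline stand-in for `sample.html` and the built-in `html.parser`; the names `tags_with_id` and `found` are illustrative.

```python
from bs4 import BeautifulSoup

# Inline stand-in for sample.html (an assumption, trimmed to a few tags)
html = """
<h1 id="intro">Get Help From Peers and Mentors</h1>
<h2 class="h2style" id="hub">Student Hub</h2>
<h2 class="h2style" id="know">Knowledge</h2>
<p>Good luck and we hope you enjoy the course</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# id=True matches every tag that has an id attribute, whatever its value
tags_with_id = soup.find_all(id=True)

# Pair each tag name with its id so the result is easy to inspect
found = [(tag.name, tag['id']) for tag in tags_with_id]
print(found)
```

The `<p>` tag is left out of the result because it has no `id` attribute.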

# TODO: Find All Tags With Attribute `id='hub'`

In the cell below, use the `.find_all()` method to find all the tags in the `sample.html` file that have the attribute `id="hub"`. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then find all the tags that have the attribute `id="hub"` from the `page_content` object. Loop through the list and print each tag in the list.

In [None]:
# Import BeautifulSoup


# Open the HTML file and create a BeautifulSoup Object

page_content = 

# Print all the tags with id = hub


# Solution

[Solution notebook](searching_the_parse_tree_solution.ipynb)