# Searching by `class`

Suppose we want to find all the tags that have the attribute `class="h2style"`. Unfortunately, we can't simply pass `class` as a keyword argument to the `.find_all()` method. The reason is that `class`, the HTML attribute used to assign **CSS** classes, is also a reserved word in Python, so using it as a keyword argument raises a `SyntaxError`. To get around this problem, BeautifulSoup provides the keyword argument `class_` (notice the underscore at the end), which searches the `class` attribute. Let's see how this works.

In the code below, we will use the `.find_all()` method to search for all the tags in our `sample.html` file that have the attribute `class="h2style"`:

In [1]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')
    
# Print the tags that have the attribute class="h2style"
for tag in page_content.find_all(class_ = 'h2style'):
    print(tag)

<h2 class="h2style" id="hub">Student Hub</h2>
<h2 class="h2style" id="know">Knowledge</h2>


We can see that we get the two `<h2>` tags since they are the only ones in our document that have the attribute `class="h2style"`.
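
As a small aside, `class_` is not the only way to filter on the `class` attribute. The `.find_all()` method also accepts an `attrs` dictionary, where `'class'` is an ordinary string key and therefore does not clash with the Python keyword. The sketch below is only illustrative: it uses a short inline HTML string as a hypothetical stand-in for `sample.html`, and the built-in `html.parser` instead of `lxml`, so that it runs without any external file.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for sample.html, used only for illustration
html = """
<div class="section">
  <h2 class="h2style" id="hub">Student Hub</h2>
  <h2 class="h2style" id="know">Knowledge</h2>
  <h3 id="outro">Good Luck</h3>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Search with the class_ keyword argument
by_keyword = soup.find_all(class_='h2style')

# Search with an attrs dictionary; 'class' is just a string key here
by_attrs = soup.find_all(attrs={'class': 'h2style'})

# Both searches should return the same two <h2> tags
for tag in by_attrs:
    print(tag)
```

The `attrs` form can be handy when the attribute names are built dynamically, while `class_` is usually shorter to type.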

# TODO: Find All Tags With Attribute `class='section'`

In the cell below, use the `.find_all()` method to find all the tags in the `sample.html` file that have the attribute `class="section"`. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then find all the tags that have the attribute `class="section"` from the `page_content` object. Loop through the list and print each tag in the list. Use the `.prettify()` method to improve readability.

In [2]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Open the HTML file and create a BeautifulSoup Object
with open('sample.html', 'r') as f:
    page_content = BeautifulSoup(f, 'lxml')
    
# Print the tags that have the attribute class="section"
for tag in page_content.find_all(class_='section'):
    print(tag.prettify())

<div class="section">
 <h2 class="h2style" id="hub">
  Student Hub
 </h2>
 <p>
  Student Hub is our real time collaboration platform where you can work with peers and mentors. You will find Community rooms with other students and alumni.
 </p>
</div>

<div class="section">
 <h2 class="h2style" id="know">
  Knowledge
 </h2>
 <p>
  Search or ask questions in
  <a href="https://knowledge.udacity.com/">
   Knowledge
  </a>
 </p>
</div>
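
One detail that may matter on real pages: a tag can carry several CSS classes at once, and `class_` matches a tag if any one of its classes equals the search string. The sketch below illustrates this with a made-up inline HTML snippet (not taken from `sample.html`) and the built-in `html.parser`; the class names `intro` and the text contents are assumptions chosen for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML in which the first <div> has two classes
html = """
<div class="section intro">First</div>
<div class="section">Second</div>
<div class="outro">Third</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# class_='section' matches any tag that lists 'section' among its classes
matches = soup.find_all(class_='section')

# The class attribute is stored as a list of class names
texts = [tag.get_text() for tag in matches]
classes = [tag['class'] for tag in matches]

print(texts)
print(classes)
```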



# Searching With Regular Expressions

We can also pass a regular expression object to the `.find_all()` method. Let's see an example. The code below uses a regular expression to find all the tags whose names contain the letter `i`. Remember that in order to use regular expressions, we must import the `re` module. In this example we are only interested in the tag names and not their entire content, so we will use the `.name` attribute of the `Tag` object to print only the name of each tag, as shown below:

In [3]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Import the re module
import re 

# Open the HTML file and create a BeautifulSoup Object
with open('./sample.html') as f:
    page_content = BeautifulSoup(f, 'lxml')
    
# Print only the tag names of all the tags whose names contain the letter i
for tag in page_content.find_all(re.compile(r'i')):
    print(tag.name)

title
link
div
div
div
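
Note that BeautifulSoup tests a compiled pattern against each tag name with the pattern's `search()` method, so an unanchored pattern matches anywhere in the name. To restrict the match to the beginning of the name, the pattern can be anchored with `^`. The sketch below compares the two on a small, made-up HTML string (a hypothetical stand-in for a real page), again using the built-in `html.parser` so that it is self-contained.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML used only to illustrate the matching rules
html = (
    "<html><head><title>Demo</title></head>"
    "<body><div><p>Text</p></div><table></table></body></html>"
)

soup = BeautifulSoup(html, 'html.parser')

# Unanchored pattern: matches tag names that contain the letter t anywhere
contains_t = [tag.name for tag in soup.find_all(re.compile(r't'))]

# Anchored pattern: matches only tag names that start with the letter t
starts_with_t = [tag.name for tag in soup.find_all(re.compile(r'^t'))]

print(contains_t)
print(starts_with_t)
```

Here the unanchored pattern also picks up `html`, because the letter `t` appears in the middle of that name, while the anchored pattern keeps only `title` and `table`.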


# TODO: Find All Tags That Start With The Letter `h`

In the cell below, pass a regular expression to the `.find_all()` method to find all the tags whose names start with the letter `h`. Start by opening the `sample.html` file and passing the open filehandle to the BeautifulSoup constructor using the `lxml` parser. Save the BeautifulSoup object returned by the constructor in a variable called `page_content`. Then find all the tags whose names start with the letter `h` by passing a regular expression to the `.find_all()` method. Loop through the list and print each tag in the list.

In [4]:
# Import BeautifulSoup
from bs4 import BeautifulSoup

# Import the re module
import re

# Open the HTML file and create a BeautifulSoup Object
with open('sample.html', 'r') as f:
    page_content = BeautifulSoup(f, 'lxml')
    
# Print all the tags whose names start with the letter h
# (the ^ anchor restricts the match to the beginning of the tag name)
for tag in page_content.find_all(re.compile(r'^h')):
    print(tag.prettify())


<html lang="en-US">
 <head>
  <title>
   AI For Trading
  </title>
  <meta charset="utf-8"/>
  <link href="./teststyle.css" rel="stylesheet"/>
  <style>
   .h2style {background-color: tomato;color: white;padding: 10px;}
  </style>
 </head>
 <body>
  <h1 id="intro">
   Get Help From Peers and Mentors
  </h1>
  <div class="section">
   <h2 class="h2style" id="hub">
    Student Hub
   </h2>
   <p>
    Student Hub is our real time collaboration platform where you can work with peers and mentors. You will find Community rooms with other students and alumni.
   </p>
  </div>
  <hr/>
  <div class="section">
   <h2 class="h2style" id="know">
    Knowledge
   </h2>
   <p>
    Search or ask questions in
    <a href="https://knowledge.udacity.com/">
     Knowledge
    </a>
   </p>
  </div>
  <div class="outro">
   <h3 id="know">
    Good Luck
   </h3>
   <p>
    Good luck and we hope you enjoy the course
   </p>
  </div>
 </body>
</html>
<head>
 <title>
  AI For Trading
 </title>
 <meta charset="

# Solution

[Solution notebook](searching_by_class_and_regexes_solution.ipynb)