# Part 3: Parse HTML Code With Beautiful Soup

- Find Elements by ID
- Find Elements by HTML Class Name
- Extract Text From HTML Elements
- Extract Attributes From HTML Elements

## ⚠️ Durability Warning ⚠️

As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately, the job board that you'll see in the course, indeed.com, has started blocking scraping of its site since the course was recorded.

Just like in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.

All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically.

In [None]:
# scrape the site
import requests

url = "https://realpython.github.io/fake-jobs/"
response = requests.get(url)

Once you have the HTML content, you can keep working to pick out the information you need.

In [None]:
from bs4 import BeautifulSoup

In [None]:
# name the parser explicitly so results don't depend on which parsers are installed
soup = BeautifulSoup(response.content, "html.parser")

In [None]:
soup

What a soup!!! 🍜 Let's be picky and thin it out.

## Find Elements By ID

`id` attributes uniquely identify HTML elements. Let's find one we need with Developer Tools!

In [None]:
results = soup.find(id="ResultsContainer")

In [None]:
results

Better, but let's drill down some more.
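To see how an `id` lookup behaves without hitting the network, here is a minimal sketch on a small inline HTML string. The markup is a hypothetical stand-in for the real page, and `NoSuchContainer` is an invented ID used only to show the no-match case.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page is much larger.
html = """
<html><body>
  <div id="ResultsContainer">
    <div class="card-content"><h2>Senior Python Developer</h2></div>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
results = soup.find(id="ResultsContainer")
missing = soup.find(id="NoSuchContainer")  # invented ID, not on the page

print(results.name)
print(missing is None)
```

Because a failed lookup gives you `None` rather than raising an error, it can be worth checking the result before chaining further calls onto it.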

## Find Elements By Class Name

The job postings all have the same HTML `class`. Let's find all that are on this page.

In [None]:
jobs = results.find_all("div", class_="card-content")

In [None]:
len(jobs)  # how many?

In [None]:
jobs[0]  # let's check out just one of them
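The class lookup can be tried on a tiny example as well. This sketch assumes a simplified, made-up version of the page structure, with two job cards and one unrelated `div`, to show that `find_all()` only picks up elements carrying the requested class.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup that loosely mirrors the page structure.
html = """
<div id="ResultsContainer">
  <div class="card"><div class="card-content"><h2>Energy engineer</h2></div></div>
  <div class="card"><div class="card-content"><h2>Legal executive</h2></div></div>
  <div class="card-footer">Not a job posting</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find(id="ResultsContainer")

# class_ (with a trailing underscore) is used because class is a Python keyword.
jobs = results.find_all("div", class_="card-content")

# find_all() always returns a list-like object, which is empty when nothing matches.
no_match = results.find_all("div", class_="does-not-exist")

print(len(jobs))
print(len(no_match))
```

Note that the outer `card` and the `card-footer` elements are not matched, since the class has to be `card-content` itself rather than something that merely starts with `card`.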

## Extract Text From HTML Elements

Next, let's target a specific piece of text on the site and extract it from the surrounding HTML.

In [None]:
title_element = jobs[0].find("h2")
title_element

In [None]:
title = title_element.text
title

In [None]:
# clean it up - not necessary here, but often helpful to remove whitespace
title.strip()

And now for all jobs, in a concise list comprehension:

In [None]:
job_titles = [job.find("h2").text.strip() for job in jobs]

In [None]:
job_titles
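The list comprehension above assumes every job card contains an `<h2>`. As a hedged variation, the sketch below uses made-up markup in which one card has no title, skips that card, and uses `get_text(strip=True)` as an alternative to calling `.text.strip()`.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second card deliberately has no <h2>.
html = """
<div class="card-content"><h2 class="title">
   Senior Python Developer
  </h2></div>
<div class="card-content"><p>No title here</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
jobs = soup.find_all("div", class_="card-content")

title_element = jobs[0].find("h2")
raw = title_element.text                    # keeps the surrounding whitespace
clean = title_element.get_text(strip=True)  # whitespace removed

# Skip cards without an <h2> instead of failing with an AttributeError.
job_titles = [
    h2.get_text(strip=True)
    for job in jobs
    if (h2 := job.find("h2")) is not None
]

print(repr(raw))
print(clean)
print(job_titles)
```

The `:=` operator requires Python 3.8 or later. On older versions, a regular `for` loop with an `if` check does the same job.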

## Extract Attributes From HTML Elements

Apart from text content, HTML attributes can contain important information you want to parse, for example the URL where a link points to. Let's learn how to extract them.

In [None]:
# string= is the current name for this argument; text= is the older, deprecated spelling
apply_link = jobs[0].find("a", string="Apply")
apply_link

In [None]:
job_url = apply_link["href"]
job_url

With this, you can now access the specific job posting, for example by using `requests` again:

In [None]:
job_site = requests.get(job_url)
job_soup = BeautifulSoup(job_site.content, "html.parser")

In [None]:
job_soup.text
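Attribute access with square brackets, as used for `href` earlier, raises a `KeyError` when the attribute is missing. The sketch below, on made-up markup with placeholder `example.com` URLs, contrasts that with `.get()`, which returns `None` instead.

```python
from bs4 import BeautifulSoup

# Made-up footer markup; the URLs are placeholders, not real job postings.
html = """
<footer class="card-footer">
  <a href="https://example.com/learn" target="_blank">Learn</a>
  <a href="https://example.com/jobs/1.html">Apply</a>
  <a>Broken</a>
</footer>
"""

soup = BeautifulSoup(html, "html.parser")

apply_link = soup.find("a", string="Apply")
job_url = apply_link["href"]  # dictionary-style access to an attribute

# The third link has no href, so ["href"] would raise a KeyError here.
broken = soup.find("a", string="Broken")
safe_url = broken.get("href")  # returns None instead of raising

print(job_url)
print(safe_url)
print(apply_link.attrs)  # all attributes of the tag as a dictionary
```

Using `.get()` is a reasonable default when you loop over many links and can't be sure each one is well formed.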

You could set up a pipeline that follows each job posting's details link and fetches the more detailed job description from there. You could also define parameters for highlighting or discarding listings that contain certain key phrases.

There's a lot you can do to customize this automated job search script to your own specific interests.
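As one possible starting point for that kind of filtering, here is a minimal sketch that keeps only listings whose title contains one of a few key phrases. The function name `find_matching_jobs`, the inline markup, and the `example.com` URLs are all assumptions made for illustration. A fuller pipeline would fetch each returned URL with `requests.get()` and search the full description rather than just the title.

```python
from bs4 import BeautifulSoup

# Made-up listing markup with placeholder URLs.
html = """
<div id="ResultsContainer">
  <div class="card-content">
    <h2 class="title">Senior Python Developer</h2>
    <a href="https://example.com/jobs/0.html">Apply</a>
  </div>
  <div class="card-content">
    <h2 class="title">Energy Engineer</h2>
    <a href="https://example.com/jobs/1.html">Apply</a>
  </div>
  <div class="card-content">
    <h2 class="title">Python Programmer (Entry-Level)</h2>
    <a href="https://example.com/jobs/2.html">Apply</a>
  </div>
</div>
"""


def find_matching_jobs(html, keywords):
    """Return (title, url) pairs for jobs whose title contains any keyword."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for job in soup.find_all("div", class_="card-content"):
        title_element = job.find("h2")
        link = job.find("a", string="Apply")
        if title_element is None or link is None:
            continue  # skip cards that don't have the expected structure
        title = title_element.get_text(strip=True)
        # Case-insensitive substring match against each key phrase.
        if any(keyword.lower() in title.lower() for keyword in keywords):
            matches.append((title, link.get("href")))
    return matches


python_jobs = find_matching_jobs(html, ["python"])
for title, url in python_jobs:
    print(f"{title}: {url}")
```

Returning plain tuples keeps the sketch short. For a longer script, a small data class or a dictionary per job might be easier to extend.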