In [1]:
pip install icecream

Note: you may need to restart the kernel to use updated packages.


### We'll learn some basic scraping techniques using this mock site: <a href="https://sandeepmj.github.io/scrape-example-page/demo-text.html">demo page</a>.

The webpage is ```https://sandeepmj.github.io/scrape-example-page/demo-text.html```

### All web scraping requires a little sleuthing:

* Where and how is the content held on the page?
* How can we access it?
* Is there a pattern?
* Is there anything that breaks the pattern?

In [2]:
## import libraries
from bs4 import BeautifulSoup ## package to parse HTML and XML
from icecream import ic ## for debugging
import requests ## widely used HTTP library - fetches content from the web


In [8]:
## Requesting web content

## URL of the page to scrape
url = "https://sandeepmj.github.io/scrape-example-page/demo-text.html"
response = requests.get(url)

In [9]:
## did it work?
response.status_code

200
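
A status code of 200 means the request succeeded. Here is a minimal sketch of a more defensive fetch. The helper name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of this notebook:

```python
import requests


def fetch_html(url, timeout=10):
    """Hypothetical helper: return a page's HTML as a string, or raise on failure."""
    # timeout stops the call from waiting forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the demo URL should give the same string we later read from `response.text`.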

In [10]:
## what type of object did we capture?
type(response)

requests.models.Response

## Pull out what we want by using

- ```response.text``` for string content like HTML, XML etc.
- ```response.content``` for binary content like PDFs, images, etc.

In [11]:
## what object does it return
type(response.text)

str

## Create a BeautifulSoup object

In [12]:
## pass in the HTML string and name the parser to use
soup = BeautifulSoup(response.text, "html.parser")
type(soup)

bs4.BeautifulSoup

In [13]:
soup

<!DOCTYPE html>

<html lang="en">
<head>
<title>title tag</title>
<style>
body {padding: 20px; max-width: 700px; margin: 0 auto;}
</style>
</head>
<body>
<h1 class="title"><b>The title headline is Demo for BeautifulSoup</b></h1>
<p>Learning to scrape using BeautifulSoup.</p>
<div class="content article">
<section>
<p>Here's some pretty useless info:</p>
</section>
<section class="main" id="all_plants">
<h2 class="subhead" id="vegitation">Plants</h2>
<p class="article">Three plants that thrive in deep shade:</p>
<ol>
<li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
<li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
<li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
</ol>
</section>
<section class="main" id="all_animals">
<h2 class="subhead" id="creatures">Animals</h2>
<p class="arti

In [14]:
## prettify our printout
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <title>
   title tag
  </title>
  <style>
   body {padding: 20px; max-width: 700px; margin: 0 auto;}
  </style>
 </head>
 <body>
  <h1 class="title">
   <b>
    The title headline is Demo for BeautifulSoup
   </b>
  </h1>
  <p>
   Learning to scrape using BeautifulSoup.
  </p>
  <div class="content article">
   <section>
    <p>
     Here's some pretty useless info:
    </p>
   </section>
   <section class="main" id="all_plants">
    <h2 class="subhead" id="vegitation">
     Plants
    </h2>
    <p class="article">
     Three plants that thrive in deep shade:
    </p>
    <ol>
     <li>
      <a class="plants life" href="http://example.com/plant1" id="plant1">
       Plant 1
      </a>
      :
      <span class="cost">
       $10
      </span>
     </li>
     <li>
      <a class="plants life" href="http://example.com/plant2" id="plant2">
       Plant 2
      </a>
      :
      <span class="cost">
       $20
      </span>
     </li>
     <li>
 

In [15]:
## What type of object is it?
type(soup)

bs4.BeautifulSoup

In [16]:
## get title of page
soup.title

<title>title tag</title>

In [17]:
## What about the h1 tag with the class of title? 
## How can we have two titles?

soup.h1

<h1 class="title"><b>The title headline is Demo for BeautifulSoup</b></h1>

### string v. get_text()

In most cases, the final step in a scrape is to convert everything to a string. We don't want all the HTML.

We can use ```.string``` or ```get_text()```.

- ```.string``` only returns a value when a tag holds a single string; otherwise it returns ```None```.
- ```get_text()``` is more flexible because it accepts parameters such as ```strip``` and ```separator```.

I **only** use ```get_text()```.
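
To make the difference concrete, here is a small sketch that runs on a one-line HTML snippet modeled on the demo page. The names `item_html` and `item` are made up for this example:

```python
from bs4 import BeautifulSoup

# one list item copied from the demo page's markup
item_html = '<li><a href="http://example.com/plant1">Plant 1</a>: <span class="cost">$10</span></li>'
item = BeautifulSoup(item_html, "html.parser").li

print(item.string)      # None, because the <li> has more than one child
print(item.a.string)    # Plant 1
print(item.get_text())  # Plant 1: $10

# separator is placed between each piece of text; strip trims each piece first
print(item.get_text(separator=" | ", strip=True))  # Plant 1 | : | $10
```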


In [18]:
## return just a string of the tag:
soup.title.string

'title tag'

In [19]:
## get only title text and not html
soup.title.get_text()

'title tag'

In [21]:
## use string on soup (returns None because soup holds more than one child)
print(soup.string)

None


In [22]:
## get text from soup
soup.get_text()

"\n\n\ntitle tag\n\n\n\nThe title headline is Demo for BeautifulSoup\nLearning to scrape using BeautifulSoup.\n\n\nHere's some pretty useless info:\n\n\nPlants\nThree plants that thrive in deep shade:\n\nPlant 1: $10\nPlant 2: $20\nPlant 3 $30\n\n\n\nAnimals\n Three animals in the barn:\n\nAnimal 1: $500\nAnimal 2: $600 \nAnimal 3: $700\n\n\n\nObjects\n Three shiny rocks:\n\nRock 1\nRock 2\nRock 3\n\n\n\nThe seven classifications of animals\n\nKingdom\nPhylum\nClass\nOrder\nFamily\nGenus\nSpecies\n\n\n\n\n\n"

In [23]:
## strip the whitespace and newlines around each piece of text
soup.get_text(strip=True)

"title tagThe title headline is Demo for BeautifulSoupLearning to scrape using BeautifulSoup.Here's some pretty useless info:PlantsThree plants that thrive in deep shade:Plant 1:$10Plant 2:$20Plant 3$30AnimalsThree animals in the barn:Animal 1:$500Animal 2:$600Animal 3:$700ObjectsThree shiny rocks:Rock 1Rock 2Rock 3The seven classifications of animalsKingdomPhylumClassOrderFamilyGenusSpecies"

In [24]:
## get p tag text
soup.p.get_text()

'Learning to scrape using BeautifulSoup.'

# Targeting content



## Searching for IDs

```soup(id="ID_name")``` is shorthand for ```soup.find_all(id="ID_name")```, so it returns a list-like result.
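
A short sketch of the ID search forms side by side, using a trimmed copy of the demo markup. The variable name `demo_soup` is an assumption so the example does not overwrite `soup`:

```python
from bs4 import BeautifulSoup

html = '<section class="main" id="all_plants"><h2 class="subhead" id="vegitation">Plants</h2></section>'
demo_soup = BeautifulSoup(html, "html.parser")

called = demo_soup(id="vegitation")            # calling the soup object directly
found = demo_soup.find_all(id="vegitation")    # the explicit spelling of the same search
single = demo_soup.find(id="vegitation")       # first match only, returned as one Tag

print(called == found)    # True
print(single.get_text())  # Plants
```

Since an ID is meant to be unique on a page, `find(id=...)` saves the `[0]` indexing step.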

In [25]:
## SEARCH BY ID for "all_plants"
soup(id="all_plants")

[<section class="main" id="all_plants">
 <h2 class="subhead" id="vegitation">Plants</h2>
 <p class="article">Three plants that thrive in deep shade:</p>
 <ol>
 <li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
 <li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
 <li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
 </ol>
 </section>]

In [32]:
type(soup(id="all_plants"))

bs4.element.ResultSet

In [33]:
soup(id="all_plants")[0]

<section class="main" id="all_plants">
<h2 class="subhead" id="vegitation">Plants</h2>
<p class="article">Three plants that thrive in deep shade:</p>
<ol>
<li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
<li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
<li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
</ol>
</section>

In [27]:
## SEARCH BY ID for "vegitation"
soup(id="vegitation")

[<h2 class="subhead" id="vegitation">Plants</h2>]

In [28]:
## SEARCH BY ID for "plant1"
soup(id="plant1")

[<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>]

In [31]:
## SEARCH BY ID for "animal3"
soup(id="animal3")

[<a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>]

## Finding ```class```

Let's say we want to find the ```p tag``` content for the ```article class``` 

```find()``` returns the first occurrence of any item you are searching for.

There are three ways to target our content, but only Method 3 is precise enough to return exactly what we want.




In [34]:
## a wide net is not best
soup.p

<p>Learning to scrape using BeautifulSoup.</p>

### Method 1. Target the tag only.

```soup.find("tag_name")```


In [35]:
## simple but without precision
## still too wide a net
soup.find("p")

<p>Learning to scrape using BeautifulSoup.</p>

### Method 2. Target the class only




- Use ```soup.find(class_="class_name")``` to be clear what class we are looking for.
- ```class_``` is the keyword argument BeautifulSoup uses for the ```HTML class``` attribute. Because ```class``` is a Python reserved word (it defines classes), BeautifulSoup adds the ```_``` so the argument name is legal Python.
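
A tag can carry several classes at once, and ```class_``` matches if any one of them fits. A minimal sketch on a trimmed copy of the demo markup (`demo_soup` and `first_match` are names made up for this example):

```python
from bs4 import BeautifulSoup

html = '<div class="content article"><p class="article">Three plants that thrive in deep shade:</p></div>'
demo_soup = BeautifulSoup(html, "html.parser")

first_match = demo_soup.find(class_="article")

print(first_match.name)      # div
print(first_match["class"])  # ['content', 'article']
```

The outer `div` comes first in the document and has `article` among its classes, so it wins over the `p` tag.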


In [36]:
# find the first element of any tag type with the class "article"
## this is still too wide: it returns the div, not the p tag
soup.find(class_="article")

<div class="content article">
<section>
<p>Here's some pretty useless info:</p>
</section>
<section class="main" id="all_plants">
<h2 class="subhead" id="vegitation">Plants</h2>
<p class="article">Three plants that thrive in deep shade:</p>
<ol>
<li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
<li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
<li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
</ol>
</section>
<section class="main" id="all_animals">
<h2 class="subhead" id="creatures">Animals</h2>
<p class="article"> Three animals in the barn:</p>
<ol>
<li><a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>: <span class="cost">$500</span></li>
<li><a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>: <span class="cost">$

### Method 3. Precision, clarity and simplicity

In the previous example, we ran into trouble because ```class="article"``` applies to more than one kind of tag.

- Use the ```tag``` and the ```class``` to add precision, clarity and simplicity.

```soup.find("tag_name", class_="class_name")```

In [37]:
# find the first p tag with the class "article"
soup.find("p", class_="article")

<p class="article">Three plants that thrive in deep shade:</p>

## ```find_all``` tags, classes

- ```find_all``` is **the most widely** used BeautifulSoup command.
- Unlike ```find``` it returns **ALL** occurrences of a class or tag.
- Remember ```find``` returns just the first occurrence.
- ```soup.find_all("tag_name", class_="class_name")```
- It returns all occurrences in a **```ResultSet```** object that behaves like a **```list```**.
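
Because the result behaves like a list, the usual list tools apply. A small sketch on three paragraphs borrowed from the demo page (the names `demo_soup`, `articles` and `first_only` are assumptions):

```python
from bs4 import BeautifulSoup

html = """
<p class="article">Three plants that thrive in deep shade:</p>
<p class="article">Three animals in the barn:</p>
<p>Learning to scrape using BeautifulSoup.</p>
"""
demo_soup = BeautifulSoup(html, "html.parser")

articles = demo_soup.find_all("p", class_="article")
print(len(articles))               # 2
print(articles[0].get_text())      # Three plants that thrive in deep shade:
print(isinstance(articles, list))  # True

# limit caps how many matches are collected
first_only = demo_soup.find_all("p", limit=1)
print(len(first_only))             # 1
```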

In [38]:
## Return all p tag content with the class "article"
soup.find_all("p", class_="article")

[<p class="article">Three plants that thrive in deep shade:</p>,
 <p class="article"> Three animals in the barn:</p>,
 <p class="article"> Three shiny rocks:</p>]

In [39]:
## what type of object is returned
type(soup.find_all("p", class_="article"))

bs4.element.ResultSet

In [40]:
## Return all content in the sections with the main class
soup.find_all("section", class_="main")

[<section class="main" id="all_plants">
 <h2 class="subhead" id="vegitation">Plants</h2>
 <p class="article">Three plants that thrive in deep shade:</p>
 <ol>
 <li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
 <li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
 <li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
 </ol>
 </section>,
 <section class="main" id="all_animals">
 <h2 class="subhead" id="creatures">Animals</h2>
 <p class="article"> Three animals in the barn:</p>
 <ol>
 <li><a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>: <span class="cost">$500</span></li>
 <li><a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>: <span class="cost">$600</span> </li>
 <li><a class="animals life" href="http://example.com/anim

In [41]:
## how many items are in this object
len(soup.find_all("section", class_="main"))

2

In [45]:
for item in soup.find_all("section", class_="main"):
    print(item)
    print("*********")

<section class="main" id="all_plants">
<h2 class="subhead" id="vegitation">Plants</h2>
<p class="article">Three plants that thrive in deep shade:</p>
<ol>
<li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
<li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
<li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
</ol>
</section>
*********
<section class="main" id="all_animals">
<h2 class="subhead" id="creatures">Animals</h2>
<p class="article"> Three animals in the barn:</p>
<ol>
<li><a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>: <span class="cost">$500</span></li>
<li><a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>: <span class="cost">$600</span> </li>
<li><a class="animals life" href="http://example.com/animal3" id

In [48]:
## how many items are there if we targeted only the tag "section"
len(soup.find_all("section"))

5

### Find all life forms on the page

In [49]:
## code it here
soup.find_all("a", class_="life")

[<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>,
 <a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>,
 <a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a>,
 <a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>,
 <a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>,
 <a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>]

## The old ways

Earlier versions of BeautifulSoup did not support the ```class_``` keyword. They passed a dictionary instead:

```soup.find_all("tag", {"class": "class_name"})```

or, with the ```attrs``` keyword spelled out:

```soup.find_all("tag", attrs={"class": "class_name"})```

FYI, since you might still encounter these in older code and online answers.

To recap, the most current/modern way is:

```soup.find_all("tag_name", class_="class_name")```
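
As a quick check, all three spellings should return the same tags. A sketch on two links from the demo page (the variable names are made up for this example):

```python
from bs4 import BeautifulSoup

html = """
<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>
<a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>
<a class="animals" href="http://example.com/kingdom">Kingdom</a>
"""
demo_soup = BeautifulSoup(html, "html.parser")

modern = demo_soup.find_all("a", class_="life")
positional = demo_soup.find_all("a", {"class": "life"})
keyword = demo_soup.find_all("a", attrs={"class": "life"})

print(modern == positional == keyword)       # True
print([tag.get_text() for tag in modern])    # ['Plant 1', 'Animal 1']
```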

# Excluding classes

Most modern sites have tags that include multiple classes. 

What if you want to target tags that have a single class, but that class also appears on tags alongside other classes that hold other types of content?

For example, target the tags with the ```animals``` class that do not also have the ```life``` class.

In this case we use ```.select```, which takes a CSS selector. The attribute selector below matches only tags whose ```class``` attribute is exactly that one value.

```soup.select('[class="class_name"]')```
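
The exact-match selector is one option. Another is the CSS ```:not()``` pseudo-class, which states the exclusion directly. This is a sketch of an alternative rather than the notebook's method, and the variable names are assumptions:

```python
from bs4 import BeautifulSoup

html = """
<a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>
<a class="animals" href="http://example.com/kingdom">Kingdom</a>
<a class="animals" href="http://example.com/phylum">Phylum</a>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# class attribute must be exactly "animals"
exact = demo_soup.select('[class="animals"]')

# a tags with the animals class, minus any that also carry the life class
negated = demo_soup.select("a.animals:not(.life)")

print([tag.get_text() for tag in exact])    # ['Kingdom', 'Phylum']
print([tag.get_text() for tag in negated])  # ['Kingdom', 'Phylum']
```

The ```:not()``` form keeps working if a page later adds an unrelated class to those tags, while the exact-match form would stop matching them.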


In [51]:
'''
if we use find_all to look for the class animals,
it returns every tag with the animals class,
including those that also have the life class
'''

soup.find_all("a", class_="animals")

[<a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>,
 <a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>,
 <a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>,
 <a class="animals" href="http://example.com/kingdom">Kingdom</a>,
 <a class="animals" href="http://example.com/phylum">Phylum</a>,
 <a class="animals" href="http://example.com/class">Class</a>,
 <a class="animals" href="http://example.com/order">Order</a>,
 <a class="animals" href="http://example.com/family">Family</a>,
 <a class="animals" href="http://example.com/genus">Genus</a>,
 <a class="animals" href="http://example.com/species">Species</a>]

In [54]:
## write code here

soup.select('[class="animals"]')

[<a class="animals" href="http://example.com/kingdom">Kingdom</a>,
 <a class="animals" href="http://example.com/phylum">Phylum</a>,
 <a class="animals" href="http://example.com/class">Class</a>,
 <a class="animals" href="http://example.com/order">Order</a>,
 <a class="animals" href="http://example.com/family">Family</a>,
 <a class="animals" href="http://example.com/genus">Genus</a>,
 <a class="animals" href="http://example.com/species">Species</a>]

## Storing values

So far we haven't been saving any values in memory.


In [55]:
## Again, find all lifeforms and save them in an object called lifeforms
lifeforms = soup.find_all("a", class_="life")
lifeforms

[<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>,
 <a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>,
 <a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a>,
 <a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>,
 <a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>,
 <a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>]

In [56]:
## what kind of object is it?
type(lifeforms)

bs4.element.ResultSet

### Print lifeforms. Does it look familiar?

In [57]:
## print lifeforms
lifeforms

[<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>,
 <a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>,
 <a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a>,
 <a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>,
 <a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>,
 <a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>]

In [58]:
## print it out with a break between each
for life in lifeforms:
    print(life)
    print("************")

<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>
************
<a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>
************
<a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a>
************
<a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>
************
<a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>
************
<a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>
************


### You can't just get the text for the lifeforms.
### Why? You can't call ```.get_text()``` on a ```<class 'bs4.element.ResultSet'>``` object.


In [59]:
## try it
lifeforms.get_text()

AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

## Instead, iterate through the list and work on each item, which in this case is a ```<class 'bs4.element.Tag'>```

In [60]:
## see type of object
for life in lifeforms:
    print(type(life))

<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>


In [62]:
for life in lifeforms:
    print(life.get_text())
    print("*********")

Plant 1
*********
Plant 2
*********
Plant 3
*********
Animal 1
*********
Animal 2
*********
Animal 3
*********


In [63]:
## just the text, no html
## Using for loop
lifeforms_list = []

for life in lifeforms:
    lifeforms_list.append(life.get_text())

lifeforms_list

['Plant 1', 'Plant 2', 'Plant 3', 'Animal 1', 'Animal 2', 'Animal 3']

In [64]:
## just the text, no html
## Using list comprehension

lifeforms_lc = [life.get_text() for life in lifeforms]
lifeforms_lc

['Plant 1', 'Plant 2', 'Plant 3', 'Animal 1', 'Animal 2', 'Animal 3']

## Get the urls for each

In [69]:
## use for loop
all_urls_fl = []
for link in lifeforms:
    print(link)
    url = link.get("href")
    print(url)
    all_urls_fl.append(url)
    


<a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>
http://example.com/plant1
<a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>
http://example.com/plant2
<a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a>
http://example.com/plant3
<a class="animals life" href="http://example.com/animal1" id="animal1">Animal 1</a>
http://example.com/animal1
<a class="animals life" href="http://example.com/animal2" id="animal2">Animal 2</a>
http://example.com/animal2
<a class="animals life" href="http://example.com/animal3" id="animal3">Animal 3</a>
http://example.com/animal3


In [70]:
all_urls_fl

['http://example.com/plant1',
 'http://example.com/plant2',
 'http://example.com/plant3',
 'http://example.com/animal1',
 'http://example.com/animal2',
 'http://example.com/animal3']

In [71]:
# using list comprehension
all_urls_lc = [link.get("href") for link in lifeforms]
all_urls_lc

['http://example.com/plant1',
 'http://example.com/plant2',
 'http://example.com/plant3',
 'http://example.com/animal1',
 'http://example.com/animal2',
 'http://example.com/animal3']
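
One detail worth knowing: ```.get("href")``` returns ```None``` when a tag has no ```href```, while square-bracket access raises a ```KeyError```. A small sketch, where the second link without an ```href``` is invented for the example:

```python
from bs4 import BeautifulSoup

html = """
<a href="http://example.com/plant1" id="plant1">Plant 1</a>
<a id="no_link">No link here</a>
"""
demo_soup = BeautifulSoup(html, "html.parser")
links = demo_soup.find_all("a")

# .get() is forgiving: a missing attribute becomes None
hrefs = [link.get("href") for link in links]
print(hrefs)  # ['http://example.com/plant1', None]

# square brackets are strict: a missing attribute raises KeyError
try:
    links[1]["href"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True

# href=True keeps only the tags that actually have the attribute
with_href = demo_soup.find_all("a", href=True)
print(len(with_href))  # 1
```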

## Cost

Let's grab the cost

How do we target the cost?

In [73]:
## A wide target:
cost = soup.find_all("span")
cost

[<span class="cost">$10</span>,
 <span class="cost">$20</span>,
 <span class="cost">$30</span>,
 <span class="cost">$500</span>,
 <span class="cost">$600</span>,
 <span class="cost">$700</span>,
 <span>Rock 1</span>,
 <span>Rock 2</span>,
 <span>Rock 3</span>]

In [74]:
## narrow the target
cost = soup.find_all("span", class_="cost")
cost

[<span class="cost">$10</span>,
 <span class="cost">$20</span>,
 <span class="cost">$30</span>,
 <span class="cost">$500</span>,
 <span class="cost">$600</span>,
 <span class="cost">$700</span>]

In [75]:
## using for loop
cost_list_fl = []
for price in cost:
    cost_list_fl.append(price.get_text())
cost_list_fl

['$10', '$20', '$30', '$500', '$600', '$700']

In [76]:
## using list comprehension
cost_list_lc = [price.get_text() for price in soup.find_all("span", class_="cost")]
cost_list_lc

['$10', '$20', '$30', '$500', '$600', '$700']

In [77]:
## a helper function to clean string values
def clean_numbers(some_string_number):
    '''
    Enter a single number as a string, integer or float.
    Strings may include "$" and "," characters, which are removed.
    Returns the value rounded to an integer.
    '''
    if isinstance(some_string_number, str):
        # remove the currency symbol and thousands separators before converting
        amount = round(float(some_string_number.replace("$", "").replace(",", "")))
    else:
        amount = round(float(some_string_number))

    return amount

In [79]:
cost

[<span class="cost">$10</span>,
 <span class="cost">$20</span>,
 <span class="cost">$30</span>,
 <span class="cost">$500</span>,
 <span class="cost">$600</span>,
 <span class="cost">$700</span>]

In [78]:
## final cost
cost_final = [clean_numbers(amount.get_text()) for amount in cost]
cost_final

[10, 20, 30, 500, 600, 700]
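
The demo prices are whole dollars, so rounding never comes into play here. On other data it will, and Python's ```round()``` sends exact halves to the nearest even integer. The sketch below restates the helper in compact form under the made-up name `clean_numbers_sketch` to show this:

```python
def clean_numbers_sketch(value):
    """Compact restatement of the cleaning helper, for illustration only."""
    if isinstance(value, str):
        # remove the currency symbol and thousands separators
        value = value.replace("$", "").replace(",", "")
    return round(float(value))


print(clean_numbers_sketch("$1,250"))  # 1250
print(clean_numbers_sketch(19.99))     # 20
print(clean_numbers_sketch("$2.50"))   # 2, because halves round to the even neighbor
print(clean_numbers_sketch("$3.50"))   # 4
```

If cents matter in your data, keep the value as a float instead of rounding.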

## Prepare to Export

You now have three lists: the names of the lifeforms, their costs and the related URLs.

Let's create a list of dicts called ```life_dict_list```.

Each dict's keys are ```life_forms```, ```cost``` and ```link```; the values are the matching items from the three lists.
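
The pairing relies on ```zip```, which walks several lists in step. A minimal sketch with shortened, hand-typed lists (all names here are assumptions), including the one pitfall to watch for:

```python
names = ["Plant 1", "Plant 2"]
prices = [10, 20]
links = ["http://example.com/plant1", "http://example.com/plant2"]

# zip yields one (name, price, link) tuple per position
records = [
    {"life_forms": name, "cost": price, "link": link}
    for name, price, link in zip(names, prices, links)
]
print(records[0])  # {'life_forms': 'Plant 1', 'cost': 10, 'link': 'http://example.com/plant1'}

# pitfall: zip stops at the shortest list without warning
short = list(zip(names, [10]))
print(short)  # [('Plant 1', 10)]
```

Checking that the lists have the same ```len()``` before zipping is a cheap way to catch a scrape that missed an item.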


In [80]:
## create it here
life_dict_list = []

## distinct loop names so we don't overwrite the cost and url variables from earlier cells
for (name, price, link) in zip(lifeforms_lc, cost_final, all_urls_lc):
    life_dict_list.append({"life_forms": name, "cost": price, "link": link})

life_dict_list


[{'life_forms': 'Plant 1', 'cost': 10, 'link': 'http://example.com/plant1'},
 {'life_forms': 'Plant 2', 'cost': 20, 'link': 'http://example.com/plant2'},
 {'life_forms': 'Plant 3', 'cost': 30, 'link': 'http://example.com/plant3'},
 {'life_forms': 'Animal 1', 'cost': 500, 'link': 'http://example.com/animal1'},
 {'life_forms': 'Animal 2', 'cost': 600, 'link': 'http://example.com/animal2'},
 {'life_forms': 'Animal 3', 'cost': 700, 'link': 'http://example.com/animal3'}]

## Export as CSV

We'll use Pandas to export our data to an external file.

We'll cover this in more detail soon, but for now here it is:

In [81]:
## import pandas
import pandas as pd

In [82]:
## name the output file and load our list of dicts into a DataFrame
filename = "lifeforms.csv"
df = pd.DataFrame(life_dict_list)

In [83]:
df

  life_forms  cost                        link
0    Plant 1    10   http://example.com/plant1
1    Plant 2    20   http://example.com/plant2
2    Plant 3    30   http://example.com/plant3
3   Animal 1   500  http://example.com/animal1
4   Animal 2   600  http://example.com/animal2
5   Animal 3   700  http://example.com/animal3


In [84]:
df.to_csv(filename, encoding="UTF-8", index=False)
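
If pandas is not available, Python's built-in ```csv``` module can write the same kind of file. This is a sketch of an alternative, not the notebook's method. It writes to an in-memory buffer so nothing lands on disk, and the names `rows` and `buffer` are assumptions:

```python
import csv
import io

rows = [
    {"life_forms": "Plant 1", "cost": 10, "link": "http://example.com/plant1"},
    {"life_forms": "Animal 1", "cost": 500, "link": "http://example.com/animal1"},
]

# for a real file, use: open("lifeforms.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["life_forms", "cost", "link"])
writer.writeheader()     # first row holds the column names
writer.writerows(rows)   # one CSV row per dict

csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])  # life_forms,cost,link
print(csv_lines[1])  # Plant 1,10,http://example.com/plant1
```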

# BeautifulSoup

We covered some basic BeautifulSoup functionality:

- Remember ```soup``` is just the variable name we use to store an entire parsed webpage or file. We could call it anything we want.
- Searching by ```tags``` like ```title```, ```h1```, ```span``` etc.
- Searching by ```class``` or ```id```
- Finding all occurrences of an item using ```find_all()```
- Finding the first occurrence of an item using ```find()```
- Removing the html and returning just the string by using ```.string``` or ```get_text()```
- Grabbing just the URL(s) using ```get("href")```

These are the most frequently used BeautifulSoup functions. You can [find many more](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#) in the documentation. 
