This is the example code from the Beautiful Soup documentation on crummy.com, annotated with comments to deepen understanding.

In [1]:
from bs4 import BeautifulSoup

In [2]:
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story<b></p>

<p class="story">Once upon a time there were three little sister; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id=link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id=link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# This is HTML. Its syntax matters for understanding how to use Beautiful Soup:
# <a href="..."> defines a hyperlink, <p> marks a paragraph, and attributes such as
# class and id label elements so they can be found later.

In [3]:
soup = BeautifulSoup(html_doc, 'html.parser')
# Initialize BeautifulSoup. But what is 'html.parser'?
# It names Python's built-in HTML parser, which Beautiful Soup uses to turn the markup into a tree.
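
A quick sketch of what the parser argument buys you, assuming only that bs4 is installed ('html.parser' itself ships with Python). The markup below is made up for illustration; it leaves a <p> tag unclosed to show that the parser still builds a usable tree.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with an unclosed <p> tag
broken = "<p>unclosed paragraph"

# 'html.parser' selects the parser from Python's standard library
soup = BeautifulSoup(broken, "html.parser")

# Beautiful Soup closes the dangling tag when it serializes the tree
repaired = str(soup)
print(repaired)
```

Other parsers (lxml, html5lib) can be named in the same position if they are installed, and they may repair broken markup differently.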

In [4]:
print(soup.prettify())
# prettify() returns the markup re-indented, one tag or string per line, to make it easier to read.

<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
    <b>
    </b>
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sister; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id='link2"'>
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id='link3"'>
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
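
To see what prettify() changes, here is a minimal sketch comparing it with plain str() on a made-up one-line snippet. The exact whitespace prettify() emits can vary between bs4 versions, so the example only counts lines rather than relying on a particular layout.

```python
from bs4 import BeautifulSoup

# Made-up one-line snippet for illustration
snippet = "<p>Hello <b>world</b></p>"
soup = BeautifulSoup(snippet, "html.parser")

compact = str(soup)       # markup serialized as-is, on one line
pretty = soup.prettify()  # same tree, spread over indented lines

print(compact)
print(pretty)
print(len(pretty.splitlines()), "lines after prettify()")
```

prettify() is meant for reading; for further processing it is usually safer to work with the soup object itself.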


Navigating HTML with Beautiful Soup.

In [5]:
soup.title
# finds the first instance of the title tag and grabs the whole thing

<title>The Dormouse's story</title>

In [6]:
soup.title.name
# this pulls out the name of the title tag. 
# This seems kinda useless so far, so I'll have to wait and see where this goes

'title'

In [7]:
# let's just pull out the string inside the title tag
soup.title.string

"The Dormouse's story"

In [8]:
# what is the title tag under?
soup.title.parent.name

'head'
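
One place where .name stops looking useless is when you loop over tags without knowing in advance what they are. A minimal sketch on made-up markup, using find_all(True) to match every tag and .parents to walk up the tree:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
html = "<html><head><title>Demo</title></head><body><p>One <b>two</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag, in document order
tag_names = [tag.name for tag in soup.find_all(True)]
print(tag_names)

# .parent goes up one level; .parents keeps going up to the soup object itself,
# whose name is the special string '[document]'
ancestors = [parent.name for parent in soup.b.parents]
print(ancestors)
```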

In [9]:
# we can pull out the first p tag
soup.p

<p class="title"><b>The Dormouse's story<b></b></b></p>

In [12]:
# how do you get the class name?
soup.p['class']

['title']
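
Notice that the class came back as a list. A short sketch of why, on a made-up tag: class can hold several values, so Beautiful Soup always returns it as a list, while single-valued attributes such as id come back as plain strings.

```python
from bs4 import BeautifulSoup

# Made-up tag with two classes and an id
soup = BeautifulSoup('<p class="story intro" id="first">Hi</p>', "html.parser")
p = soup.p

classes = p["class"]      # multi-valued attribute: a list
tag_id = p["id"]          # single-valued attribute: a string
missing = p.get("title")  # .get() returns None instead of raising KeyError
all_attrs = p.attrs       # every attribute as a dict

print(classes, tag_id, missing)
print(all_attrs)
```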

In [13]:
# by now it's kinda obvious how to get a tag: just enter its name.
soup.a
# but notice how it pulled out only the first instance

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

In [14]:
# how do you get the other ones?
soup.find_all('a') # basically find all the a tags
# that comes back to you as a list

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id='link2"'>Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id='link3"'>Tillie</a>]

In [15]:
# as an aside, do this:
# put it into a variable to choose specific ones.
lst_var = soup.find_all('a')
lst_var[2]

<a class="sister" href="http://example.com/tillie" id='link3"'>Tillie</a>
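
Indexing the list works, but find_all() can also do the narrowing itself. A sketch of three filters, run on made-up markup (the "other" class is invented here so the class filter has something to exclude):

```python
import re
from bs4 import BeautifulSoup

# Made-up markup; the "other" class is not in the original page
html = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="other" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ (trailing underscore, since class is a Python keyword) filters on CSS class
sisters = soup.find_all("a", class_="sister")
# limit stops the search after that many matches
first_only = soup.find_all("a", limit=1)
# a compiled regex is searched for inside the attribute value
lacie = soup.find_all("a", href=re.compile("lacie"))

print([a.get_text() for a in sisters])
print(len(first_only))
print([a["id"] for a in lacie])
```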

In [30]:
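# or look a tag up by its id...
rndm_var = soup.find(id="link3")
# this returns None here. Why? The html_doc above is missing the opening quote
# in id=link3", so the parser stores the id as link3" (trailing quote included)
# and nothing matches "link3" exactly.

In [35]:
type(rndm_var)
# it's NoneType, because find() returns None when nothing matches.
# soup.find("a", attrs="link3") does not help either: a plain string passed as
# attrs is treated as a CSS class name, not an id.
# Matching the id exactly as it was parsed does work:
soup.find("a", id='link3"')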
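
Here is a side-by-side sketch of the id lookup, assuming the cause is the quoting of the id attribute in html_doc (id=link3" instead of id="link3"). The two one-line snippets are cut down from the page above; only the quoting differs.

```python
from bs4 import BeautifulSoup

# Correctly quoted id, as on the Beautiful Soup documentation page
good = BeautifulSoup(
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>',
    "html.parser",
)
# Opening quote missing, as typed into html_doc above
bad = BeautifulSoup(
    '<a href="http://example.com/tillie" class="sister" id=link3">Tillie</a>',
    "html.parser",
)

found_good = good.find(id="link3")   # the <a> tag
found_bad = bad.find(id="link3")     # None: the stored id is not exactly "link3"
parsed_id = bad.a["id"]              # the stray quote became part of the value

# A string passed as attrs is matched against the CSS class, not the id
by_class_string = good.find("a", attrs="link3")
# A dict passed as attrs matches attribute names to values
by_attrs_dict = good.find("a", attrs={"id": "link3"})

print(found_good)
print(found_bad)
print(parsed_id)
print(by_class_string)
print(by_attrs_dict)
```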

In [32]:
# how to extract the URLs found within a page's <a> tags:
for link in soup.find_all('a'):
    print(link.get('href'))

http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
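
On real pages some <a> tags have no href at all, and some hrefs are relative. A hedged sketch of handling both: the markup and base_url below are made up, and urljoin comes from Python's standard library rather than from Beautiful Soup.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "http://example.com/index.html"  # hypothetical page address
html = (
    '<a href="http://example.com/elsie">Elsie</a>'
    '<a name="top">no href here</a>'
    '<a href="/lacie">Lacie</a>'
)
soup = BeautifulSoup(html, "html.parser")

# .get('href') gives None for tags that lack the attribute
all_hrefs = [link.get("href") for link in soup.find_all("a")]
# href=True keeps only tags that actually have an href
real_hrefs = [link["href"] for link in soup.find_all("a", href=True)]
# urljoin resolves relative links against the page address
absolute = [urljoin(base_url, href) for href in real_hrefs]

print(all_hrefs)
print(real_hrefs)
print(absolute)
```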


In [36]:
# and what if we wanted to get the text from the page we created:
print(soup.get_text())
# this grabs all the text in the document, with the tags stripped out.


The Dormouse's story

The Dormouse's story
Once upon a time there were three little sister; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...
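
The blank lines in that output come straight from the whitespace in the source. A small sketch of the separator and strip arguments of get_text(), which tidy this up, shown on a made-up snippet:

```python
from bs4 import BeautifulSoup

# Made-up snippet: two paragraphs with no whitespace between them
soup = BeautifulSoup(
    "<p>Once upon a time <b>there</b> were</p><p>three sisters</p>",
    "html.parser",
)

# default: the strings are glued together exactly as they appear
raw = soup.get_text()
# strip=True trims each string; separator is placed between them
tidy = soup.get_text(separator=" ", strip=True)
# stripped_strings yields the same trimmed pieces one at a time
pieces = list(soup.stripped_strings)

print(repr(raw))
print(repr(tidy))
print(pieces)
```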



I found some random text online and will try the same things as above...

In [37]:
rndm_html = """
<h2>In post mean shot ye</h2>

<p>Woody equal ask saw sir weeks aware decay. Entrance prospect removing we packages strictly is no smallest he. For hopes may chief get hours day rooms. Oh no turned behind polite piqued enough at. Forbade few through inquiry blushes you. Cousin no itself eldest it in dinner latter missed no. Boisterous estimating interested collecting get conviction friendship say boy. Him mrs shy article smiling respect opinion excited. Welcomed humoured rejoiced peculiar to in an.</p>

<p>Far curiosity incommode now led smallness allowance. Favour bed assure son things yet. She consisted consulted elsewhere happiness disposing household any old the. Widow downs you new shade drift hopes small. So otherwise commanded sweetness we improving. Instantly by daughters resembled unwilling principle so middleton. Fail most room even gone her end like. Comparison dissimilar unpleasant six compliment two unpleasing any add. Ashamed my company thought wishing colonel it prevent he in. Pretended residence are something far engrossed old off.</p>

<p>Gave read use way make spot how nor. In daughter goodness an likewise oh consider at procured wandered. Songs words wrong by me hills heard timed. Happy eat may doors songs. Be ignorant so of suitable dissuade weddings together. Least whole timed we is. An smallness deficient discourse do newspaper be an eagerness continued. Mr my ready guest ye after short at.</p>

<p>On no twenty spring of in esteem spirit likely estate. Continue new you declared differed learning bringing honoured. At mean mind so upon they rent am walk. Shortly am waiting inhabit smiling he chiefly of in. Lain tore time gone him his dear sure. Fat decisively estimating affronting assistance not. Resolve pursuit regular so calling me. West he plan girl been my then up no.</p>

<p>It as announcing it me stimulated frequently continuing. Least their she you now above going stand forth. He pretty future afraid should genius spirit on. Set property addition building put likewise get. Of will at sell well at as. Too want but tall nay like old. Removing yourself be in answered he. Consider occasion get improved him she eat. Letter by lively oh denote an.</p>

<p>Satisfied conveying an dependent contented he gentleman agreeable do be. Warrant private blushes removed an in equally totally if. Delivered dejection necessary objection do mr prevailed. Mr feeling do chiefly cordial in do. Water timed folly right aware if oh truth. Imprudence attachment him his for sympathize. Large above be to means. Dashwood do provided stronger is. But discretion frequently sir the she instrument unaffected admiration everything.</p>

<p>Both rest of know draw fond post as. It agreement defective to excellent. Feebly do engage of narrow. Extensive repulsive belonging depending if promotion be zealously as. Preference inquietude ask now are dispatched led appearance. Small meant in so doubt hopes. Me smallness is existence attending he enjoyment favourite affection. Delivered is to ye belonging enjoyment preferred. Astonished and acceptance men two discretion. Law education recommend did objection how old.</p>

<p>Received shutters expenses ye he pleasant. Drift as blind above at up. No up simple county stairs do should praise as. Drawings sir gay together landlord had law smallest. Formerly welcomed attended declared met say unlocked. Jennings outlived no dwelling denoting in peculiar as he believed. Behaviour excellent middleton be as it curiosity departure ourselves.</p>

<p>Extremely we promotion remainder eagerness enjoyment an. Ham her demands removal brought minuter raising invited gay. Contented consisted continual curiosity contained get sex. Forth child dried in in aware do. You had met they song how feel lain evil near. Small she avoid six yet table china. And bed make say been then dine mrs. To household rapturous fulfilled attempted on so.</p>


"""

In [38]:
soup = BeautifulSoup(rndm_html, 'html.parser')

In [44]:
print(soup.prettify())

<h2>
 In post mean shot ye
</h2>
<p>
 Woody equal ask saw sir weeks aware decay. Entrance prospect removing we packages strictly is no smallest he. For hopes may chief get hours day rooms. Oh no turned behind polite piqued enough at. Forbade few through inquiry blushes you. Cousin no itself eldest it in dinner latter missed no. Boisterous estimating interested collecting get conviction friendship say boy. Him mrs shy article smiling respect opinion excited. Welcomed humoured rejoiced peculiar to in an.
</p>
<p>
 Far curiosity incommode now led smallness allowance. Favour bed assure son things yet. She consisted consulted elsewhere happiness disposing household any old the. Widow downs you new shade drift hopes small. So otherwise commanded sweetness we improving. Instantly by daughters resembled unwilling principle so middleton. Fail most room even gone her end like. Comparison dissimilar unpleasant six compliment two unpleasing any add. Ashamed my company thought wishing colonel it pr

In [40]:
soup.h2

<h2>In post mean shot ye</h2>

In [42]:
soup.find_all("p")

[<p>Woody equal ask saw sir weeks aware decay. Entrance prospect removing we packages strictly is no smallest he. For hopes may chief get hours day rooms. Oh no turned behind polite piqued enough at. Forbade few through inquiry blushes you. Cousin no itself eldest it in dinner latter missed no. Boisterous estimating interested collecting get conviction friendship say boy. Him mrs shy article smiling respect opinion excited. Welcomed humoured rejoiced peculiar to in an.</p>,
 <p>Far curiosity incommode now led smallness allowance. Favour bed assure son things yet. She consisted consulted elsewhere happiness disposing household any old the. Widow downs you new shade drift hopes small. So otherwise commanded sweetness we improving. Instantly by daughters resembled unwilling principle so middleton. Fail most room even gone her end like. Comparison dissimilar unpleasant six compliment two unpleasing any add. Ashamed my company thought wishing colonel it prevent he in. Pretended residence ar
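
The list of <p> tags is more useful once it is turned into plain strings. A minimal sketch, using a short made-up stand-in for the long text above, that pulls the text of each paragraph and counts its words:

```python
from bs4 import BeautifulSoup

# Short made-up stand-in for the long filler text
html = "<h2>Heading</h2><p>First paragraph here.</p><p>Second one.</p>"
soup = BeautifulSoup(html, "html.parser")

# get_text() on each tag returns only that tag's text
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
# split() with no arguments breaks on any run of whitespace
word_counts = [len(text.split()) for text in paragraphs]

for text, count in zip(paragraphs, word_counts):
    print(count, "words:", text)
```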

In [46]:
print(soup.get_text())


In post mean shot ye
Woody equal ask saw sir weeks aware decay. Entrance prospect removing we packages strictly is no smallest he. For hopes may chief get hours day rooms. Oh no turned behind polite piqued enough at. Forbade few through inquiry blushes you. Cousin no itself eldest it in dinner latter missed no. Boisterous estimating interested collecting get conviction friendship say boy. Him mrs shy article smiling respect opinion excited. Welcomed humoured rejoiced peculiar to in an.
Far curiosity incommode now led smallness allowance. Favour bed assure son things yet. She consisted consulted elsewhere happiness disposing household any old the. Widow downs you new shade drift hopes small. So otherwise commanded sweetness we improving. Instantly by daughters resembled unwilling principle so middleton. Fail most room even gone her end like. Comparison dissimilar unpleasant six compliment two unpleasing any add. Ashamed my company thought wishing colonel it prevent he in. Pretended res

In [None]:
# this text doesn't seem like a good way to showcase what I've learned. Will change it later.