### Use cases
* Using python to extract web link pages
* Ecommerce store Automation
* Hydrological Analysis
* Emergency Resource Allocation Planning
* Oil and Gas Production Intel

### Four BeautifulSoup Object Types
* BeautifulSoup Object
* Tag Object
* NavigableString Object
* Comment Object

In [90]:
from bs4 import BeautifulSoup 

#### The BeautifulSoup object

In [2]:
html_doc ='''<!doctype html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <link rel="stylesheet" href="https://unpkg.com/mvp.css@1.12/mvp.css" />
        <link rel="stylesheet" href="pyscript.css" />
        <script defer src="pyscript.min.js"></script>
        <title>PyScript</title>
    </head>

    <body>
        <main>
            <h1>&lt;py-script&gt;</h1>
            <ul>
                <li><a href="pyscript.js">pyscript.js</a></li>
                <li><a href="pyscript.min.js">pyscript.min.js</a></li>
                <li><a href="pyscript.css">pyscript.css</a></li>
                <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
                <li><a href="pyscript.js.map">pyscript.js.map</a></li>
            </ul>
            <div id="out"></div>
            <py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>

            <h2>Example</h2>
            <pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
    &lt;head&gt;
    &lt;meta charset=&quot;utf-8&quot; /&gt;
    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width,initial-scale=1&quot; /&gt;
    &lt;title&gt;PyScript Hello World&lt;/title&gt;
    &lt;link rel=&quot;stylesheet&quot; href=&quot;https://pyscript.net/latest/pyscript.css&quot; /&gt;
    &lt;script defer src=&quot;https://pyscript.net/latest/pyscript.js&quot;&gt;&lt;/script&gt;
    &lt;/head&gt;

    &lt;body&gt;
    Hello world! &lt;br&gt;
    This is the current date and time, as computed by Python:
    &lt;py-script&gt;
from datetime import datetime
now = datetime.now()
now.strftime(&quot;%m/%d/%Y, %H:%M:%S&quot;)
    &lt;/py-script&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre
            >
        </main>
    </body>
</html>
'''

In [3]:
soup= BeautifulSoup(html_doc, 'html.parser')
print(soup)

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="https://unpkg.com/mvp.css@1.12/mvp.css" rel="stylesheet"/>
<link href="pyscript.css" rel="stylesheet"/>
<script defer="" src="pyscript.min.js"></script>
<title>PyScript</title>
</head>
<body>
<main>
<h1>&lt;py-script&gt;</h1>
<ul>
<li><a href="pyscript.js">pyscript.js</a></li>
<li><a href="pyscript.min.js">pyscript.min.js</a></li>
<li><a href="pyscript.css">pyscript.css</a></li>
<li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
<li><a href="pyscript.js.map">pyscript.js.map</a></li>
</ul>
<div id="out"></div>
<py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>
<h2>Example</h2>
<pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
    &lt;head&gt;
    &lt;meta charset="u

In [4]:
print(soup.prettify()[:350])

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <link href="https://unpkg.com/mvp.css@1.12/mvp.css" rel="stylesheet"/>
  <link href="pyscript.css" rel="stylesheet"/>
  <script defer="" src="pyscript


### Tag objects

Working with names

In [5]:
soup = BeautifulSoup('<b body="description"">product Description</b>','html')

Tag = soup.b
print(Tag)

<b body="description">product Description</b>


In [6]:
type(Tag)

bs4.element.Tag

In [7]:
Tag.name

'b'

In [8]:
Tag.name = "bestbooks"
Tag

<bestbooks body="description">product Description</bestbooks>

In [9]:
Tag.name

'bestbooks'

### Working with attributes

In [10]:
Tag['body']

'description'

In [11]:
Tag.attrs

{'body': 'description'}

In [12]:
Tag['id'] =3
Tag.attrs

{'body': 'description', 'id': 3}

In [13]:
del Tag['body']

In [14]:
Tag

<bestbooks id="3">product Description</bestbooks>

In [15]:
del Tag['id']
Tag.attrs

{}

### Using tags to navigate a tree

In [16]:
Html_doc = '''<!doctype html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <link rel="stylesheet" href="https://unpkg.com/mvp.css@1.12/mvp.css" />
        <link rel="stylesheet" href="pyscript.css" />
        <script defer src="pyscript.min.js"></script>
        <title>PyScript</title>
    </head>

    <body>
        <main>
            <h1>&lt;py-script&gt;</h1>
            <ul>
                <li><a href="pyscript.js">pyscript.js</a></li>
                <li><a href="pyscript.min.js">pyscript.min.js</a></li>
                <li><a href="pyscript.css">pyscript.css</a></li>
                <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
                <li><a href="pyscript.js.map">pyscript.js.map</a></li>
            </ul>
            <div id="out"></div>
            <py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>

            <h2>Example</h2>
            <pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
    &lt;head&gt;
    &lt;meta charset=&quot;utf-8&quot; /&gt;
    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width,initial-scale=1&quot; /&gt;
    &lt;title&gt;PyScript Hello World&lt;/title&gt;
    &lt;link rel=&quot;stylesheet&quot; href=&quot;https://pyscript.net/latest/pyscript.css&quot; /&gt;
    &lt;script defer src=&quot;https://pyscript.net/latest/pyscript.js&quot;&gt;&lt;/script&gt;
    &lt;/head&gt;

    &lt;body&gt;
    Hello world! &lt;br&gt;
    This is the current date and time, as computed by Python:
    &lt;py-script&gt;
from datetime import datetime
now = datetime.now()
now.strftime(&quot;%m/%d/%Y, %H:%M:%S&quot;)
    &lt;/py-script&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre
            >
        </main>
    </body>
</html>
'''

soup = BeautifulSoup(html_doc, "html.parser")

In [17]:
soup.head

<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="https://unpkg.com/mvp.css@1.12/mvp.css" rel="stylesheet"/>
<link href="pyscript.css" rel="stylesheet"/>
<script defer="" src="pyscript.min.js"></script>
<title>PyScript</title>
</head>

In [18]:
soup.title

<title>PyScript</title>

In [19]:
soup.body

<body>
<main>
<h1>&lt;py-script&gt;</h1>
<ul>
<li><a href="pyscript.js">pyscript.js</a></li>
<li><a href="pyscript.min.js">pyscript.min.js</a></li>
<li><a href="pyscript.css">pyscript.css</a></li>
<li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
<li><a href="pyscript.js.map">pyscript.js.map</a></li>
</ul>
<div id="out"></div>
<py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>
<h2>Example</h2>
<pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
    &lt;head&gt;
    &lt;meta charset="utf-8" /&gt;
    &lt;meta name="viewport" content="width=device-width,initial-scale=1" /&gt;
    &lt;title&gt;PyScript Hello World&lt;/title&gt;
    &lt;link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" /&gt;
    &lt;script defer src="https://pyscript.net/latest/pyscript.js"&gt;&lt;/script&gt;
    &lt;/head&gt;

    &lt;body&gt;
    Hello world! &lt;br&gt;
    This is

In [20]:
soup.body.ul

<ul>
<li><a href="pyscript.js">pyscript.js</a></li>
<li><a href="pyscript.min.js">pyscript.min.js</a></li>
<li><a href="pyscript.css">pyscript.css</a></li>
<li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
<li><a href="pyscript.js.map">pyscript.js.map</a></li>
</ul>

In [21]:
soup.body.li

<li><a href="pyscript.js">pyscript.js</a></li>

In [22]:
soup.ul

<ul>
<li><a href="pyscript.js">pyscript.js</a></li>
<li><a href="pyscript.min.js">pyscript.min.js</a></li>
<li><a href="pyscript.css">pyscript.css</a></li>
<li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
<li><a href="pyscript.js.map">pyscript.js.map</a></li>
</ul>

In [23]:
soup.a

<a href="pyscript.js">pyscript.js</a>

### Beautifulsoup object

In [24]:
soup0 = BeautifulSoup('<b body="description">product description</b>')

### NavigableString objects


In [25]:
tag = soup0.b
type(tag)

bs4.element.Tag

In [26]:
tag.name

'b'

In [27]:
tag.string

'product description'

In [28]:
tag.int

In [29]:
tag.bool

In [30]:
type(tag.string)

bs4.element.NavigableString

In [31]:
navstring = tag.string
navstring

'product description'

In [32]:
navstring.replace_with("Null")
tag.string

'Null'

### Working with NavigableString objects

In [34]:
html_docc ="""<!doctype html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <link rel="stylesheet" href="https://unpkg.com/mvp.css@1.12/mvp.css" />
        <link rel="stylesheet" href="pyscript.css" />
        <script defer src="pyscript.min.js"></script>
        <title>PyScript</title>
    </head>

    <body>
        <main>
            <h1>&lt;py-script&gt;</h1>
            <ul>
                <li><a href="pyscript.js">pyscript.js</a></li>
                <li><a href="pyscript.min.js">pyscript.min.js</a></li>
                <li><a href="pyscript.css">pyscript.css</a></li>
                <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
                <li><a href="pyscript.js.map">pyscript.js.map</a></li>
            </ul>
            <div id="out"></div>
            <py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>

            <h2>Example</h2>
            <pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
    &lt;head&gt;
    &lt;meta charset=&quot;utf-8&quot; /&gt;
    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width,initial-scale=1&quot; /&gt;
    &lt;title&gt;PyScript Hello World&lt;/title&gt;
    &lt;link rel=&quot;stylesheet&quot; href=&quot;https://pyscript.net/latest/pyscript.css&quot; /&gt;
    &lt;script defer src=&quot;https://pyscript.net/latest/pyscript.js&quot;&gt;&lt;/script&gt;
    &lt;/head&gt;

    &lt;body&gt;
    Hello world! &lt;br&gt;
    This is the current date and time, as computed by Python:
    &lt;py-script&gt;
from datetime import datetime
now = datetime.now()
now.strftime(&quot;%m/%d/%Y, %H:%M:%S&quot;)
    &lt;/py-script&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre
            >
        </main>
    </body>
</html>
"""

soup11 = BeautifulSoup('html_doc11', 'html.parser')

In [35]:
for string in soup.stripped_strings: print(repr(string))

'PyScript'
'<py-script>'
'pyscript.js'
'pyscript.min.js'
'pyscript.css'
'pyscript.min.js.map'
'pyscript.js.map'
'import sys\n                print(sys.version)'
'Example'
'<!DOCTYPE html>\n<html lang="en">\n    <head>\n    <meta charset="utf-8" />\n    <meta name="viewport" content="width=device-width,initial-scale=1" />\n    <title>PyScript Hello World</title>\n    <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />\n    <script defer src="https://pyscript.net/latest/pyscript.js"></script>\n    </head>\n\n    <body>\n    Hello world! <br>\n    This is the current date and time, as computed by Python:\n    <py-script>\nfrom datetime import datetime\nnow = datetime.now()\nnow.strftime("%m/%d/%Y, %H:%M:%S")\n    </py-script>\n    </body>\n</html>'


In [36]:
title_tag = soup.title
title_tag

<title>PyScript</title>

In [37]:
title_tag.parent

<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="https://unpkg.com/mvp.css@1.12/mvp.css" rel="stylesheet"/>
<link href="pyscript.css" rel="stylesheet"/>
<script defer="" src="pyscript.min.js"></script>
<title>PyScript</title>
</head>

In [38]:
title_tag.string

'PyScript'

In [39]:
title_tag.string.parent

<title>PyScript</title>

### Working with Parsed Data
* An HTML or XML document is just passed to a BeautifulSoup() constructor.
* The constructor converts the document to unicode and then parses it with a built-in HTML parser (by defauly

`Printing data that's in a parsed tree
Searching and retrievning data from a parse tree`

In [40]:
#Data Parsing
import pandas as pd
import re

In [41]:
r = '''<!doctype html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <link rel="stylesheet" href="https://unpkg.com/mvp.css@1.12/mvp.css" />
        <link rel="stylesheet" href="pyscript.css" />
        <script defer src="pyscript.min.js"></script>
        <title>PyScript</title>
    </head>

    <body>
        <main>
            <h1>&lt;py-script&gt;</h1>
            <ul>
                <li><a href="pyscript.js">pyscript.js</a></li>
                <li><a href="pyscript.min.js">pyscript.min.js</a></li>
                <li><a href="pyscript.css">pyscript.css</a></li>
                <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
                <li><a href="pyscript.js.map">pyscript.js.map</a></li>
            </ul>
            <div id="out"></div>
            <py-script std-out="out">
                import sys
                print(sys.version)
            </py-script>

            <h2>Example</h2>
            <pre style="padding: 1em; border: 1px solid #000000">
&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
    &lt;head&gt;
    &lt;meta charset=&quot;utf-8&quot; /&gt;
    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width,initial-scale=1&quot; /&gt;
    &lt;title&gt;PyScript Hello World&lt;/title&gt;
    &lt;link rel=&quot;stylesheet&quot; href=&quot;https://pyscript.net/latest/pyscript.css&quot; /&gt;
    &lt;script defer src=&quot;https://pyscript.net/latest/pyscript.js&quot;&gt;&lt;/script&gt;
    &lt;/head&gt;

    &lt;body&gt;
    Hello world! &lt;br&gt;
    This is the current date and time, as computed by Python:
    &lt;py-script&gt;
from datetime import datetime
now = datetime.now()
now.strftime(&quot;%m/%d/%Y, %H:%M:%S&quot;)
    &lt;/py-script&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre
            >
        </main>
    </body>
</html>'''

In [43]:
soups = BeautifulSoup(r, 'lxml')
type(soups)

bs4.BeautifulSoup

In [44]:
print(soups.prettify()[0:100])

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-eq


### Getting data from a parse tree

In [45]:
text_only = soups.get_text()
print(text_only)









PyScript



<py-script>

pyscript.js
pyscript.min.js
pyscript.css
pyscript.min.js.map
pyscript.js.map



                import sys
                print(sys.version)
            
Example

<!DOCTYPE html>
<html lang="en">
    <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <title>PyScript Hello World</title>
    <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
    <script defer src="https://pyscript.net/latest/pyscript.js"></script>
    </head>

    <body>
    Hello world! <br>
    This is the current date and time, as computed by Python:
    <py-script>
from datetime import datetime
now = datetime.now()
now.strftime("%m/%d/%Y, %H:%M:%S")
    </py-script>
    </body>
</html>





### Searching and retrieving data from a parse tree

`Retrieving tags by filtering with name arguments`

In [47]:
soups.find_all('li')

[<li><a href="pyscript.js">pyscript.js</a></li>,
 <li><a href="pyscript.min.js">pyscript.min.js</a></li>,
 <li><a href="pyscript.css">pyscript.css</a></li>,
 <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>,
 <li><a href="pyscript.js.map">pyscript.js.map</a></li>]

`Retrieving tags by filtering with keyword argument`

In [55]:
soups.find_all(id="meta")

[]

`Retrieving tags by filtering with string arguments`

In [54]:
soups.find_all('ul')

[<ul>
 <li><a href="pyscript.js">pyscript.js</a></li>
 <li><a href="pyscript.min.js">pyscript.min.js</a></li>
 <li><a href="pyscript.css">pyscript.css</a></li>
 <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
 <li><a href="pyscript.js.map">pyscript.js.map</a></li>
 </ul>]

`Retrieving tags by filtering with list objects`

In [56]:
soups.find_all(['ul', 'b'])

[<ul>
 <li><a href="pyscript.js">pyscript.js</a></li>
 <li><a href="pyscript.min.js">pyscript.min.js</a></li>
 <li><a href="pyscript.css">pyscript.css</a></li>
 <li><a href="pyscript.min.js.map">pyscript.min.js.map</a></li>
 <li><a href="pyscript.js.map">pyscript.js.map</a></li>
 </ul>]

`Retrieving tags by filtering with regular expressions`

In [58]:
l = re.compile('l')
for tag in soups.find_all(l):print(tag.name)

html
link
link
title
ul
li
li
li
li
li


`Retrieving tags by filtering with a Boolean value`

In [59]:
for tag in soups.find_all(True):print(tag.name)

html
head
meta
meta
meta
link
link
script
title
body
main
h1
ul
li
a
li
a
li
a
li
a
li
a
div
py-script
h2
pre


`Retrieving weblinksby filtering with string objects`

In [60]:
for link in soups.find_all('a'): print(link.get('href'))


pyscript.js
pyscript.min.js
pyscript.css
pyscript.min.js.map
pyscript.js.map


`Retrieving strings by filtering with regular expressions`

In [63]:
soups.find_all(string=re.compile('py'))

['<py-script>',
 'pyscript.js',
 'pyscript.min.js',
 'pyscript.css',
 'pyscript.min.js.map',
 'pyscript.js.map',
 '\n<!DOCTYPE html>\n<html lang="en">\n    <head>\n    <meta charset="utf-8" />\n    <meta name="viewport" content="width=device-width,initial-scale=1" />\n    <title>PyScript Hello World</title>\n    <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />\n    <script defer src="https://pyscript.net/latest/pyscript.js"></script>\n    </head>\n\n    <body>\n    Hello world! <br>\n    This is the current date and time, as computed by Python:\n    <py-script>\nfrom datetime import datetime\nnow = datetime.now()\nnow.strftime("%m/%d/%Y, %H:%M:%S")\n    </py-script>\n    </body>\n</html>']

### Demonstrating Web pages
* 1. Scraping a webpage
* 2. Saving Web scraping results

In [100]:
import re
from bs4 import BeautifulSoup
import time

In [105]:
r =  urllib.request.urlopen('https://plotly.com/get-pricing/').read()
soup = BeautifulSoup(r, 'html.parser')
type(soup)

bs4.BeautifulSoup

### Scraping a webpage and saving your result

In [77]:
print(soup.prettify()[:100])

<!DOCTYPE html>
<html lang="en">
 <!-- Initalize title and data source variables -->
 <head>
  <!--



In [78]:
for link in soup.find_all('a'): print(link.get('href'))

/
#explanation
https://analytics.usa.gov/data/
https://open.gsa.gov/api/dap/
https://analytics.usa.gov/data/live/all-domains-30-days.csv
https://analytics.usa.gov/data/live/top-downloads-yesterday.csv
https://analytics.usa.gov/data/live/top-traffic-sources-30-days.csv
https://analytics.usa.gov/data/live/top-traffic-sources-30-days.json
https://analytics.usa.gov/data/live/top-exit-pages-30-days.csv
https://analytics.usa.gov/data/live/top-exit-pages-30-days.json
https://analytics.usa.gov/data/live/all-pages-realtime.csv
https://analytics.usa.gov/data/live/realtime.json
https://analytics.usa.gov/data/live/language.csv
https://analytics.usa.gov/data/live/language.json
https://analytics.usa.gov/data/live/top-countries-realtime.json
https://analytics.usa.gov/data/live/top-cities-realtime.json
https://analytics.usa.gov/data/live/devices.csv
https://analytics.usa.gov/data/live/browsers.csv
https://analytics.usa.gov/data/live/browsers.json
https://analytics.usa.gov/data/live/ie.csv
https://anal

In [80]:
for link in soup.findAll('a', attrs={'href':re.compile('^http')}): print(link) 

<a href="https://analytics.usa.gov/data/">Data</a>
<a href="https://open.gsa.gov/api/dap/" rel="noopener" target="_blank">API</a>
<a class="download-data" href="https://analytics.usa.gov/data/live/all-domains-30-days.csv"><span class="usa-label-big">CSV</span></a>
<a class="download-data" href="https://analytics.usa.gov/data/live/top-downloads-yesterday.csv"><span class="usa-label-big">CSV</span></a>
<a class="download-data" href="https://analytics.usa.gov/data/live/top-traffic-sources-30-days.csv"><span class="usa-label-big">CSV</span></a>
<a class="download-data" href="https://analytics.usa.gov/data/live/top-traffic-sources-30-days.json"><span class="usa-label-big">JSON</span></a>
<a class="download-data" href="https://analytics.usa.gov/data/live/top-exit-pages-30-days.csv"><span class="usa-label-big">CSV</span></a>
<a class="download-data" href="https://analytics.usa.gov/data/live/top-exit-pages-30-days.json"><span class="usa-label-big">JSON</span></a>
<a class="download-data" href=

In [88]:
file = open("parsed_data.txt", 'w')
for link in soup.findAll('a', attrs={'href': re.compile('^http')}):
    soup_link = str(link)
    print(soup_link)
    file.write(soup_link)
    file.flush()
    file.close()

<a href="https://analytics.usa.gov/data/">Data</a>
<a href="https://open.gsa.gov/api/dap/" rel="noopener" target="_blank">API</a>


ValueError: I/O operation on closed file.

In [89]:
%pwd

'/Users/leo/Downloads'

In [108]:
import urllib
import requests
from bs4 import BeautifulSoup
import time

r =  urllib.request.Request('https://bitcointalk.org/index.php?', headers={'User-Agent': 'opera'})
response = urllib.request.urlopen(r)
soup = BeautifulSoup(response.read(), 'html.parser')
for link in soup.find_all('a'): print(link.get('href'))

#
https://bitcointalk.org/index.php?action=login
https://bitcointalk.org/index.php?action=register
https://bitcoincore.org/en/download/
https://bitcointalk.org/bitcoin-24.0.1.torrent
https://bitcointalk.org/index.php?action=search;advanced
https://bitcointalk.org/index.php
https://bitcointalk.org/index.php?action=help
https://bitcointalk.org/index.php?action=search
https://bitcointalk.org/index.php?action=login
https://bitcointalk.org/index.php?action=register
/more.php
https://bitcointalk.org/index.php
https://bitcointalk.org/index.php#1
https://bitcointalk.org/index.php?action=unread;board=1.0
https://bitcointalk.org/index.php?board=1.0
https://bitcointalk.org/index.php?action=profile;u=164822
https://bitcointalk.org/index.php?action=profile;u=973017
https://bitcointalk.org/index.php?topic=5447359.msg62032022#new
https://bitcointalk.org/index.php?board=74.0
https://bitcointalk.org/index.php?board=77.0
https://bitcointalk.org/index.php?board=86.0
https://bitcointalk.org/index.php?boar

In [117]:
soup.find_all(string=re.compile('Solokan'))

[]

In [118]:
for tag in soup.find_all(True):print(tag.name)

html
head
meta
meta
meta
script
script
title
link
link
link
style
link
link
link
link
meta
script
script
body
div
table
tr
td
span
td
img
table
tr
td
span
a
img
tr
td
table
tr
td
span
b
a
a
table
tr
td
span
b
a
a
td
form
a
img
input
input
input
table
tr
td
td
td
a
td
td
a
td
a
td
a
td
a
td
a
td
div
table
tr
td
div
b
a
td
div
div
a
table
tr
td
a
img
td
b
a
br
div
i
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
a
a
a
tr
td
a
img
td
b
a
br
div
i
a
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
tr
td
a
img
td
b
a
br
div
i
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
a
a
a
a
tr
td
a
img
td
b
a
br
div
i
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
a
img
td
b
a
br
td
span
br
td
span
b
a
br
a
br
b
div
div
a
table
tr
td
a
img
td
b
a
br
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
tr
td
a
img
td
b
a
br
div
i
a
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
a
a
a
a
a
a
a
a
tr
td
a
img
td
b
a
br
div
i
a
td
span
br
td
span
b
a
br
a
br
b
tr
td
span
b
a
a
div
div
a
table
tr
td
a
img
td


In [119]:
text_only = soup.get_text()
print(text_only)








Bitcoin Forum - Index
















Bitcoin Forum









April 04, 2023, 08:34:08 AM







Welcome, Guest. Please login or register.				









News: Latest Bitcoin Core release: 24.0.1 [Torrent]




 
						








  

Home
 

Help


Search


Login


Register


More

 





Bitcoin Forum






Bitcoin






Bitcoin Discussion
						General discussion about the Bitcoin ecosystem that doesn't fit better elsewhere. News, the Bitcoin community, innovations, the general environment, etc. Discussion of specific Bitcoin-related services usually belongs in other sections.
					Moderator: hilariousandco


					2511781 Posts 
					98153 Topics
				


Last post  by Rupok
						in Re: Do you think low pro...
						on Today at 08:29:39 AM
					




Child Boards: Legal, Press, Meetups, Important Announcements






Development & Technical Discussion
						Technical discussion about Satoshi's Bitcoin client and the Bitcoin network in general. No third-party sites/clients, bug reports 