<span style="color:green">Making soup from a webpage</span>
### Hypertext Transfer Protocol (HTTP) is the foundation for data communication on the World Wide Web
- _Entering a URL_ in a browser sends a request for the resource (e.g., a webpage) at that address
- The _response_ is what the server sends back: the page itself on success, or an error status such as 404 Not Found
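The request/response cycle above can be sketched offline with the standard library. This sketch builds a `urllib.request.Request` without sending it and uses `http.HTTPStatus` to name a couple of status codes. The `User-Agent` string and the `describe_status` helper are hypothetical, chosen only for illustration.

```python
import urllib.request
from http import HTTPStatus

# Build (but do not send) a GET request for the Turing Award page.
# The User-Agent value is a hypothetical example.
req = urllib.request.Request(
    "https://en.wikipedia.org/wiki/Turing_Award",
    headers={"User-Agent": "soup-tutorial/0.1"},
)

print(req.get_method())  # GET (no request body, so urllib defaults to GET)
print(req.host)          # en.wikipedia.org
print(req.selector)      # /wiki/Turing_Award


def describe_status(code):
    """Hypothetical helper: turn a numeric HTTP status into 'code phrase'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

Passing `req` to `urllib.request.urlopen` would actually send it; an error response such as 404 raises `urllib.error.HTTPError` instead of returning normally.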


In [1]:
import pandas as pd
import urllib.request
from bs4 import BeautifulSoup as BS

#### The cells below use the standard-library `urllib.request` module; the third-party [Requests](http://docs.python-requests.org/en/master/user/quickstart/) package makes working with HTTP easier

In [2]:
# create a Request object for the Wikipedia page on the Turing Award
request = urllib.request.Request('https://en.wikipedia.org/wiki/Turing_Award')

# use urlopen to fetch the requested URL
result = urllib.request.urlopen(request)

# read the response body (the page's raw HTML, as bytes) into result_text
result_text = result.read()

In [3]:
print('request is a ', type(request))
print('result is a ', type(result))
print('result_text is a ', type(result_text))

request is a  <class 'urllib.request.Request'>
result is a  <class 'http.client.HTTPResponse'>
result_text is a  <class 'bytes'>
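`result_text` holds raw bytes, not a string. Turning bytes into text requires knowing the character encoding, which the server normally declares in the `Content-Type` header (`result.headers` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`). The sketch below reproduces that step offline; the header value and the sample bytes are assumptions standing in for the live response.

```python
from email.message import Message

# Stand-in for result.headers; the header value is an assumed example.
headers = Message()
headers["Content-Type"] = "text/html; charset=UTF-8"

# Stand-in for result_text: UTF-8 bytes for "<p>café</p>".
raw = b"<p>caf\xc3\xa9</p>"

# get_content_charset() returns the declared charset, lowercased, or None.
charset = headers.get_content_charset() or "utf-8"  # assume UTF-8 if undeclared
html = raw.decode(charset)

print(charset)                                        # utf-8
print(type(raw).__name__, "->", type(html).__name__)  # bytes -> str
print(html)                                           # <p>café</p>
```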


In [None]:
result_text

#### [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is a package that helps pull data from HTML (and XML) files

In [5]:
# create the soup by passing the HTML and a parser name (here Python's built-in 'html.parser') to BeautifulSoup
soup = BS(result_text, 'html.parser')
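`BeautifulSoup` accepts either `str` or `bytes`. Given bytes, as in the cell above, it guesses the encoding; the `from_encoding` argument states it explicitly. A minimal sketch on a hand-written sample (not the live page), assuming UTF-8:

```python
from bs4 import BeautifulSoup

raw = b"<html><body><p>caf\xc3\xa9</p></body></html>"

# Bytes in: tell Beautiful Soup the encoding instead of letting it guess.
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

# str in: decode first, so no encoding argument is needed.
soup_from_str = BeautifulSoup(raw.decode("utf-8"), "html.parser")

print(soup_from_bytes.p.string)  # café
print(soup_from_str.p.string)    # café
```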

#### Explore the soup object's methods and attributes; here are a few

In [6]:
soup.title

<title>Turing Award - Wikipedia</title>
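`soup.title` is shorthand for `soup.find('title')` and returns a `Tag`. A few of its attributes, shown on a small hand-written page whose title mirrors the output above (the `h1` element and its `id` are part of the made-up sample):

```python
from bs4 import BeautifulSoup

html = """<html><head><title>Turing Award - Wikipedia</title></head>
<body><h1 id="firstHeading">Turing Award</h1></body></html>"""
soup = BeautifulSoup(html, "html.parser")

title = soup.title                 # same as soup.find("title")
print(title.name)                  # title
print(title.string)                # Turing Award - Wikipedia
print(title.parent.name)           # head
print(soup.h1["id"])               # firstHeading (attributes read like a dict)
print(soup.find("h1").get_text())  # Turing Award
```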

In [None]:
print(soup.prettify())

In [8]:
soup.find_all('table')

[<table class="infobox vevent" style="width:22em"><tbody><tr><th class="summary" colspan="2" style="text-align:center;font-size:125%;font-weight:bold;background-color: #eedd82;">ACM Turing Award</th></tr><tr><td colspan="2" style="text-align:center">
 <a class="image" href="/wiki/File:Turing-statue-Bletchley_11.jpg"><img alt="Turing-statue-Bletchley 11.jpg" data-file-height="4928" data-file-width="3264" height="332" src="//upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Turing-statue-Bletchley_11.jpg/220px-Turing-statue-Bletchley_11.jpg" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Turing-statue-Bletchley_11.jpg/330px-Turing-statue-Bletchley_11.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Turing-statue-Bletchley_11.jpg/440px-Turing-statue-Bletchley_11.jpg 2x" width="220"/></a><div><a href="/wiki/Stephen_Kettle" title="Stephen Kettle">Stephen Kettle</a>'s slate statue of <a href="/wiki/Alan_Turing" title="Alan Turing">Alan Turing</a> at <a href="/wiki/B
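`find_all('table')` returns every table on the page. To target one, filter by attribute: because `class` is multi-valued, `class_='infobox'` matches the `class="infobox vevent"` table shown above. The sketch below uses a small hand-written table modeled on that infobox (not the live page), and the row-to-dict loop is just one possible approach.

```python
from bs4 import BeautifulSoup

html = """
<table class="infobox vevent">
  <tr><th colspan="2">ACM Turing Award</th></tr>
  <tr><th>Presented by</th><td>Association for Computing Machinery</td></tr>
  <tr><th>First awarded</th><td>1966</td></tr>
</table>
<table class="wikitable"><tr><td>another table</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.find_all("table")))  # 2

# class_ (trailing underscore, since class is a Python keyword) matches
# any one of the tag's CSS classes.
infobox = soup.find("table", class_="infobox")

# Equivalent CSS-selector form; returns the same Tag.
same = soup.select_one("table.infobox")

rows = {}
for tr in infobox.find_all("tr"):
    th, td = tr.find("th"), tr.find("td")
    if th is not None and td is not None:  # skip header-only rows like the title
        rows[th.get_text(strip=True)] = td.get_text(strip=True)

print(rows)  # {'Presented by': 'Association for Computing Machinery', 'First awarded': '1966'}
```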

#### Hypertext Markup Language (HTML) uses tags to organize and style text. The standard tags are covered in the [W3Schools HTML introduction](https://www.w3schools.com/html/html_intro.asp)

![assets/html.png](assets/html.png)
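Each tag in the parsed tree exposes its name and attributes. The sketch below reuses a fragment of the infobox caption from the `find_all('table')` output and resolves the relative `href` values against the page URL with `urllib.parse.urljoin`; using `urljoin` here is our choice, not part of the original notebook.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "https://en.wikipedia.org/wiki/Turing_Award"
html = (
    '<div><a href="/wiki/Stephen_Kettle" title="Stephen Kettle">Stephen Kettle</a>'
    "'s slate statue of "
    '<a href="/wiki/Alan_Turing" title="Alan Turing">Alan Turing</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag; .name is the tag name.
print([tag.name for tag in soup.find_all(True)])  # ['div', 'a', 'a']

# .attrs is a dict of the tag's attributes.
print(soup.a.attrs)  # {'href': '/wiki/Stephen_Kettle', 'title': 'Stephen Kettle'}

# Turn each link into (link text, absolute URL).
links = [(a.get_text(), urljoin(base_url, a["href"])) for a in soup.find_all("a")]
for text, url in links:
    print(text, url)

# The text content with all tags stripped.
print(soup.div.get_text())  # Stephen Kettle's slate statue of Alan Turing
```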