# Grabbing a Title

### Example Task 0 - Grabbing the title of a page

Let's start simple by grabbing the title of a page. Remember that the title is the HTML element wrapped in the **title** tag. For this task we will use **www.example.com**, a website made specifically to serve as an example domain. Let's go through the main steps:

In [1]:
import requests

In [3]:
# get a webpage through a request

In [4]:
result = requests.get("http://www.example.com")

In [5]:
type(result)

requests.models.Response
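Before reading the body of the response, it can help to confirm that the request actually succeeded. The sketch below wraps the call in a hypothetical helper, `fetch_html`, which is not part of the original notebook; the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper, not from the notebook)."""
    # timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text


# Example call (needs network access):
# html = fetch_html("http://www.example.com")
```

Called with `"http://www.example.com"`, this should return the same string that `result.text` holds. A URL without a scheme, such as `"www.example.com"`, is rejected by `requests` before anything is sent.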

In [6]:
result.text

'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <

In [7]:
# result.text holds the same HTML document that the website serves

![image.png](attachment:image.png)

In [8]:
# the same HTML is stored as one giant string in Python

In [9]:
# to parse this string, we use Beautiful Soup

In [10]:
import bs4

In [12]:
soup = bs4.BeautifulSoup(result.text, "lxml")

# the second argument names the parser: Beautiful Soup uses lxml behind the scenes to break the HTML string into tags and elements (lxml must be installed separately)
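If `lxml` is not installed, the call above fails with `bs4.FeatureNotFound`. As a minimal sketch, the hypothetical helper `make_soup` below (not from the original notebook) falls back to Python's built-in `html.parser`. The inline HTML string is a shortened stand-in for the real page so the example runs without network access.

```python
import bs4

# Shortened stand-in for the page fetched earlier (assumption: illustration only)
html = (
    "<html><head><title>Example Domain</title></head>"
    "<body><h1>Example Domain</h1></body></html>"
)


def make_soup(markup):
    """Parse with lxml when available, otherwise with the built-in parser."""
    try:
        return bs4.BeautifulSoup(markup, "lxml")
    except bs4.FeatureNotFound:
        # html.parser ships with Python, so it is always available
        return bs4.BeautifulSoup(markup, "html.parser")


soup = make_soup(html)
print(soup.title)
```

For a well-formed page like this one both parsers should give the same tree; they can differ in how they repair broken markup.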

In [13]:
soup

<!DOCTYPE html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>
<body>
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples

## WOW

***

In [15]:
soup.select('title')

[<title>Example Domain</title>]

In [16]:
# notice that select() always returns a LIST, even when only one element matches

In [17]:
soup.select('p')

[<p>This domain is for use in illustrative examples in documents. You may use this
     domain in literature without prior coordination or asking for permission.</p>,
 <p><a href="https://www.iana.org/domains/example">More information...</a></p>]

In [18]:
soup.select('h1')

[<h1>Example Domain</h1>]
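Because `select` always returns a list, a selector that matches nothing gives an empty list rather than an error. When only the first match is needed, `select_one` is an alternative that returns a single tag or `None`. The sketch below uses a shortened inline copy of the page (an assumption made so it runs offline).

```python
import bs4

# Shortened stand-in for the example page (assumption: illustration only)
html = """
<html><head><title>Example Domain</title></head>
<body><div>
<h1>Example Domain</h1>
<p>First paragraph.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div></body></html>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

paragraphs = soup.select("p")            # every match, as a list
missing = soup.select("h2")              # no match: an empty list
first_title = soup.select_one("title")   # first match only
no_heading = soup.select_one("h2")       # no match: None

print(len(paragraphs), len(missing))
print(first_title, no_heading)
```

Checking for an empty list (or `None`) before indexing avoids an `IndexError` on pages that lack the element.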

***

In [19]:
# grabbing actual text

In [20]:
soup.select('title')[0]

<title>Example Domain</title>

In [22]:
# this gets the first item from the list; here the list contains only one element, but we still need to index into it to reach the tag itself

In [24]:
soup.select('title')[0].getText()

'Example Domain'
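The text of a longer element often carries the line breaks and indentation of the original HTML. The sketch below, assuming the first paragraph of the example page as inline input, shows one way to collapse that whitespace. `get_text` is the snake_case spelling of the same method as `getText`.

```python
import bs4

# First paragraph of the example page, including its line break and indentation
html = """<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>"""
soup = bs4.BeautifulSoup(html, "html.parser")
paragraph = soup.select("p")[0]

raw_text = paragraph.get_text()           # keeps the newline and the indentation
clean_text = " ".join(raw_text.split())   # collapse every whitespace run to one space

print(repr(raw_text))
print(clean_text)
```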

***

In [25]:
site_paragraphs = soup.select('p')

In [26]:
site_paragraphs

[<p>This domain is for use in illustrative examples in documents. You may use this
     domain in literature without prior coordination or asking for permission.</p>,
 <p><a href="https://www.iana.org/domains/example">More information...</a></p>]

In [27]:
site_paragraphs[0]

<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>

In [28]:
type(site_paragraphs[0])

bs4.element.Tag

In [29]:
# Note that this is not a string but a specialized bs4 Tag object
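Since each item is a `Tag` rather than a string, it exposes more than its text. As a small sketch, using the second paragraph of the example page as inline input (an assumption so it runs offline), a tag can report its name, be searched again, and give access to its attributes.

```python
import bs4

# Second paragraph of the example page
html = '<p><a href="https://www.iana.org/domains/example">More information...</a></p>'
soup = bs4.BeautifulSoup(html, "html.parser")
paragraph = soup.select("p")[0]

print(type(paragraph))            # the Tag class, not str
print(paragraph.name)             # the tag's name

link = paragraph.select_one("a")  # a Tag can be searched like the soup itself
print(link["href"])               # attributes are read like dictionary keys
print(link.get_text())            # the visible link text

as_string = str(paragraph)        # convert explicitly when a plain string is needed
print(as_string)
```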